I'm using mlx.data to load an image dataset where each class is represented by a separate folder. My function files_and_classes generates a list of dictionaries containing file paths and corresponding labels.
Here’s my code:
from pathlib import Path
import mlx.data as dx


def files_and_classes(root: Path):
    """Load the files and classes from an image dataset that contains one folder per class."""
    images = list(root.rglob("*.jpg"))
    categories = [p.relative_to(root).parent.name for p in images]
    category_set = set(categories)
    category_map = {c: i for i, c in enumerate(sorted(category_set))}
    return [
        {
            "file": str(p.relative_to(root)).encode("ascii"),
            "label": category_map[c],
        }
        for c, p in zip(categories, images)
    ]


sample = files_and_classes(Path('/Users/kimduhyeon/Desktop/d2f/mlx/val'))
print(sample[0])
dset = dx.buffer_from_vector(sample)
print(dset[0])
Expected Output: I expected each entry in dset to contain the correct file path as a byte string and the corresponding label.
Actual Output: The first dictionary prints correctly from sample[0]:
{'file': b'film/000017270027.jpg', 'label': 1}
However, when accessing dset[0], the file field is empty:
{'label': array(1), 'file': array([], dtype=int8)}
Question: Why does the file field show up as an empty array (array([], dtype=int8)) after conversion to an mlx.data buffer? Is there a specific data type requirement for mlx.data.buffer_from_vector? How should I format the file field to avoid this issue?
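When debugging this, it helps to turn the field back into text so that a lost path is obvious at a glance. The actual output shows that the buffer hands the byte string back as an int8 array. A minimal sketch, assuming the returned field supports Python's buffer protocol (NumPy arrays do); the helper name `decode_file_field` is hypothetical and not part of mlx.data:

```python
from array import array


def decode_file_field(field, encoding: str = "ascii") -> str:
    """Convert a byte-string field (e.g. an int8 array) back to text.

    Hypothetical helper: accepts any object exposing the buffer protocol,
    such as a NumPy int8 array, a stdlib array.array('b'), or bytes.
    """
    raw = bytes(memoryview(field))  # copy the raw bytes out of the buffer
    return raw.decode(encoding)


# Stdlib stand-ins for what dset[0]["file"] might contain.
good = array("b", b"film/000017270027.jpg")
empty = array("b")

print(repr(decode_file_field(good)))   # 'film/000017270027.jpg'
print(repr(decode_file_field(empty)))  # '' means the path was lost
```

In the question's setup this would be called as `decode_file_field(dset[0]["file"])`; an empty string confirms the path did not survive the conversion.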
Asked Feb 13 at 5:19 by user18934955. 2 Answers
Answer 1: Remove the .encode("ascii") call. Instead of encoding the file path to bytes, leave it as a regular string; this way, mlx.data.buffer_from_vector can automatically infer a fixed-length string type for the file field:
"file": str(p.relative_to(root))
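A minimal sketch of this suggestion as a variant of the question's `files_and_classes`, assuming the rest of the pipeline is unchanged. The only difference is whether the relative path is stored as `str` or as ASCII `bytes`; the `as_bytes` flag is a hypothetical addition so both forms can be compared side by side. The sketch only builds the list of dicts and does not call mlx.data, so whether the string then survives `buffer_from_vector` still depends on the installed version.

```python
import tempfile
from pathlib import Path


def files_and_classes(root: Path, as_bytes: bool = False):
    """List one {"file", "label"} dict per .jpg, with one folder per class.

    as_bytes is a hypothetical switch: False keeps the relative path as a
    plain str (the suggestion above), True reproduces .encode("ascii").
    """
    # Sorted so the order (and therefore the printed output) is deterministic.
    images = sorted(root.rglob("*.jpg"))
    categories = [p.relative_to(root).parent.name for p in images]
    category_map = {c: i for i, c in enumerate(sorted(set(categories)))}
    samples = []
    for c, p in zip(categories, images):
        rel = str(p.relative_to(root))
        samples.append(
            {
                "file": rel.encode("ascii") if as_bytes else rel,
                "label": category_map[c],
            }
        )
    return samples


# Tiny throwaway dataset: two class folders with one empty "image" each.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for cls, name in [("digital", "a.jpg"), ("film", "b.jpg")]:
        (root / cls).mkdir()
        (root / cls / name).touch()
    as_text = files_and_classes(root)
    as_raw = files_and_classes(root, as_bytes=True)

print(as_text)
print(as_raw)
```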
Alternatively, you could store the path as a fixed-width NumPy byte string (this needs import numpy as np):
"file": np.array(str(p.relative_to(root)), dtype='S40')
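One caveat with the `dtype='S40'` variant: NumPy's fixed-width byte strings silently truncate anything longer than the declared width, so a path longer than 40 bytes would be cut short without an error. A minimal sketch of sizing the width from the data instead; the helper name `fixed_width_dtype` is hypothetical, and the resulting string (for example `'S43'`) would then be passed as `dtype=` to `np.array`:

```python
def fixed_width_dtype(paths, encoding: str = "ascii") -> str:
    """Return a NumPy-style 'S<n>' dtype string wide enough for every path.

    Hypothetical helper: n is the length in bytes of the longest encoded
    path, so no entry would be truncated when cast to that type.
    """
    widths = [len(p.encode(encoding)) for p in paths]
    # Fall back to width 1 for an empty list rather than emitting 'S0'.
    return f"S{max(widths, default=1)}"


paths = ["film/000017270027.jpg", "digital/some_much_longer_file_name_0001.jpg"]
dtype = fixed_width_dtype(paths)
too_long_for_s40 = [p for p in paths if len(p.encode("ascii")) > 40]

print(dtype)             # S43
print(too_long_for_s40)  # the second path would be truncated by 'S40'
```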
Answer 2: There was a strange issue where the code worked in the global environment on my MacBook but not in a virtual environment. Based on this, I concluded that my Python environments and environment variables had become tangled, so I reset my MacBook and reinstalled everything, making sure to use only a single, correctly configured Python environment.
Following the installation method described in the official mlx-data documentation (cloning the repository with git clone and then building its Python bindings), I was able to run everything without any issues.
I hope my experience can help others who might be struggling with similar problems.
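Since this answer traces the problem to a mixed-up Python setup, a quick check is to print, from inside the environment that runs the script, which interpreter is active and where mlx.data would be loaded from. A minimal sketch using only the standard library; it assumes the package is distributed under the name `mlx-data` (as in `pip install mlx-data`), and it reports `None` for anything it cannot find instead of failing:

```python
import importlib.metadata
import importlib.util
import sys


def describe_environment(module: str = "mlx.data", dist: str = "mlx-data") -> dict:
    """Report which interpreter is running and where `module` would load from."""
    try:
        spec = importlib.util.find_spec(module)
    except ModuleNotFoundError:
        # find_spec imports parent packages (here "mlx") and raises if one is missing.
        spec = None
    try:
        version = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "executable": sys.executable,
        # In a venv, sys.prefix points at the venv while sys.base_prefix does not.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_origin": spec.origin if spec else None,
        "dist_version": version,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this once in the global environment and once in the virtual environment makes it easy to see whether the two are picking up different interpreters or different mlx.data builds.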
Source: "python - Why is file field empty (array([], dtype=int8)) when loading dataset with mlx.data? - Stack Overflow", reposted at http://www.betaflare.com/web/1741563184a2385566.html.