DictSample
DictSample(_data=None, **kwargs)

Dynamic sample type providing dict-like access to raw msgpack data.
This class is the default sample type for datasets when no explicit type is specified. It stores the raw unpacked msgpack data and provides both attribute-style (sample.field) and dict-style (sample["field"]) access to fields.
DictSample is useful for:
- Exploring datasets without defining a schema first
- Working with datasets that have variable schemas
- Prototyping before committing to a typed schema
To convert to a typed schema, use Dataset.as_type() with a @packable-decorated class. Every @packable class automatically registers a lens from DictSample, so the conversion needs no additional setup.
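The dual access style described above can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: `MiniDictSample` and its internals are hypothetical, and it assumes only what this page states (fields live in a dictionary and are reachable both as attributes and by key).

```python
from typing import Any, Dict, List, Optional


class MiniDictSample:
    """Hypothetical stand-in for DictSample, for illustration only."""

    def __init__(self, _data: Optional[Dict[str, Any]] = None, **kwargs: Any) -> None:
        # Merge the positional dict and keyword fields into one private dict.
        object.__setattr__(self, "_data", dict(_data or {}, **kwargs))

    def __getattr__(self, name: str) -> Any:
        # Runs only when normal attribute lookup fails, so methods keep working.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key: str) -> Any:
        # Dict-style access raises KeyError for unknown fields.
        return self._data[key]

    def keys(self) -> List[str]:
        return list(self._data)


sample = MiniDictSample({"label": "cat"}, score=0.9)
print(sample.label)      # attribute access
print(sample["score"])   # dict access
print(sample.keys())     # inspect available fields
```

Because `__getattr__` runs only when normal lookup fails, a field whose name collides with a method such as `keys` is reachable only through `sample["keys"]` in this sketch. Whether DictSample behaves the same way is not stated on this page.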
Examples
>>> ds = load_dataset("path/to/data.tar") # Returns Dataset[DictSample]
>>> for sample in ds.ordered():
... print(sample.some_field) # Attribute access
... print(sample["other_field"]) # Dict access
... print(sample.keys()) # Inspect available fields
...
>>> # Convert to typed schema
>>> typed_ds = ds.as_type(MyTypedSample)
Note
NDArray fields are stored as raw bytes in DictSample. They are converted to NumPy arrays only when accessed through a typed sample class.
Attributes
| Name | Description |
|---|---|
| as_wds | Pack this sample’s data for writing to WebDataset. |
| packed | Pack this sample’s data into msgpack bytes. |
Methods
| Name | Description |
|---|---|
| from_bytes | Create a DictSample from raw msgpack bytes. |
| from_data | Create a DictSample from unpacked msgpack data. |
| get | Get a field value with optional default. |
| items | Return list of (field_name, value) tuples. |
| keys | Return list of field names. |
| to_dict | Return a copy of the underlying data dictionary. |
| values | Return list of field values. |
from_bytes
DictSample.from_bytes(bs)

Create a DictSample from raw msgpack bytes.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| bs | bytes | Raw bytes from a msgpack-serialized sample. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | DictSample | New DictSample instance with the unpacked data. |
from_data
DictSample.from_data(data)

Create a DictSample from unpacked msgpack data.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict[str, Any] | Dictionary with field names as keys. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | DictSample | New DictSample instance wrapping the data. |
get
DictSample.get(key, default=None)

Get a field value with optional default.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Field name to access. | required |
| default | Any | Value to return if field doesn’t exist. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | Any | The field value or default. |
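The lookup rule documented for `get` can be sketched on a plain dictionary. The helper `get_field` below is hypothetical and assumes that `get` mirrors the built-in `dict.get`: the default is returned only when the field is absent, not when it is present with a `None` value.

```python
from typing import Any, Dict


def get_field(data: Dict[str, Any], key: str, default: Any = None) -> Any:
    """Hypothetical helper mirroring the documented get() behaviour."""
    return data[key] if key in data else default


record = {"caption": "a cat", "width": 640, "mask": None}

caption = get_field(record, "caption")            # field exists
height = get_field(record, "height", default=-1)  # field missing, explicit default
depth = get_field(record, "depth")                # field missing, default is None
mask = get_field(record, "mask", default="n/a")   # field present but None
```

This pattern is convenient for datasets with variable schemas, where a field may exist in some samples and not in others.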
items
DictSample.items()

Return list of (field_name, value) tuples.
keys
DictSample.keys()

Return list of field names.
to_dict
DictSample.to_dict()

Return a copy of the underlying data dictionary.
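This page does not say whether the copy returned by `to_dict` is shallow or deep. The sketch below uses plain dictionaries to show the difference, under the assumption that the copy might be shallow (as with `dict(...)`); it does not call the library.

```python
import copy

data = {"label": "cat", "tags": ["indoor"]}

# A shallow copy has its own top-level keys but shares nested objects.
shallow = dict(data)
shallow["label"] = "dog"        # does not affect the original
shallow["tags"].append("pet")   # the original sees this change

# A deep copy is fully independent of the original.
deep = copy.deepcopy(data)
deep["tags"].append("extra")    # the original is unchanged
```

If nested values such as lists or dicts will be mutated, applying `copy.deepcopy` to the result is the cautious choice until the copy semantics are confirmed.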
values
DictSample.values()

Return list of field values.