DictSample

DictSample(_data=None, **kwargs)

Dynamic sample type providing dict-like access to raw msgpack data.

This class is the default sample type for datasets when no explicit type is specified. It stores the raw unpacked msgpack data and provides both attribute-style (sample.field) and dict-style (sample["field"]) access to fields.
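The dual access pattern described above can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: `MiniDictSample` is a hypothetical name, and the way keyword arguments are merged into the fields is an assumption based only on the constructor signature.

```python
from typing import Any, Optional


class MiniDictSample:
    """Hypothetical stand-in for DictSample; not the library's implementation."""

    def __init__(self, _data: Optional[dict] = None, **kwargs: Any) -> None:
        # Assumption: keyword arguments are merged in as additional fields.
        self.__dict__["_data"] = {**(_data or {}), **kwargs}

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails, so real attributes
        # and methods still take precedence over field names.
        try:
            return self.__dict__["_data"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def keys(self) -> list:
        return list(self._data)


sample = MiniDictSample({"label": "cat"}, score=0.9)
print(sample.label)     # attribute-style access
print(sample["score"])  # dict-style access
print(sample.keys())    # field names in insertion order
```

Routing attribute access through `__getattr__` rather than `__getattribute__` means a field named like an existing method (for example `keys`) would only be reachable with dict-style access in this sketch.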

DictSample is useful for:

- Exploring datasets without defining a schema first
- Working with datasets that have variable schemas
- Prototyping before committing to a typed schema

To convert to a typed schema, use Dataset.as_type() with a @packable-decorated class. Every @packable class automatically registers a lens from DictSample, so the conversion needs no manually defined lens.
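Conceptually, the conversion maps a record of untyped fields onto a class with declared fields. The sketch below shows only that idea, using a plain dataclass and a hypothetical helper `to_typed`; it does not reproduce the library's lens mechanism or the @packable decorator, and its handling of missing and extra fields is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class MyTypedSample:
    # Hypothetical typed schema; the real library would use a @packable class.
    label: str
    score: float


def to_typed(raw: dict, cls: type) -> Any:
    """Build a typed instance from untyped fields (illustrative helper only)."""
    wanted = {f.name for f in fields(cls)}
    missing = wanted - set(raw)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    # Extra fields in the raw record are ignored in this sketch.
    return cls(**{name: raw[name] for name in wanted})


raw = {"label": "cat", "score": 0.9, "extra": 1}
typed = to_typed(raw, MyTypedSample)
print(typed)
```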

Examples

>>> ds = load_dataset("path/to/data.tar")  # Returns Dataset[DictSample]
>>> for sample in ds.ordered():
...     print(sample.some_field)      # Attribute access
...     print(sample["other_field"])  # Dict access
...     print(sample.keys())          # Inspect available fields
...
>>> # Convert to typed schema
>>> typed_ds = ds.as_type(MyTypedSample)

Note

NDArray fields are stored as raw bytes in DictSample. They are only converted to numpy arrays when accessed through a typed sample class.

Attributes

Name Description
as_wds Pack this sample’s data for writing to WebDataset.
packed Pack this sample’s data into msgpack bytes.

Methods

Name Description
from_bytes Create a DictSample from raw msgpack bytes.
from_data Create a DictSample from unpacked msgpack data.
get Get a field value with optional default.
items Return list of (field_name, value) tuples.
keys Return list of field names.
to_dict Return a copy of the underlying data dictionary.
values Return list of field values.

from_bytes

DictSample.from_bytes(bs)

Create a DictSample from raw msgpack bytes.

Parameters

Name Type Description Default
bs bytes Raw bytes from a msgpack-serialized sample. required

Returns

Name Type Description
DictSample New DictSample instance with the unpacked data.

from_data

DictSample.from_data(data)

Create a DictSample from unpacked msgpack data.

Parameters

Name Type Description Default
data dict[str, Any] Dictionary with field names as keys. required

Returns

Name Type Description
DictSample New DictSample instance wrapping the data.

get

DictSample.get(key, default=None)

Get a field value with optional default.

Parameters

Name Type Description Default
key str Field name to access. required
default Any Value to return if field doesn’t exist. None

Returns

Name Type Description
Any The field value or default.
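A default value is most useful when records in one dataset do not all carry the same fields. The following sketch uses a hypothetical stand-in class, `MiniDictSample`, that delegates to `dict.get`; whether DictSample is implemented this way is an assumption.

```python
from typing import Any


class MiniDictSample:
    """Hypothetical stand-in for DictSample; not the library's implementation."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def get(self, key: str, default: Any = None) -> Any:
        # Same contract as dict.get: no exception for a missing field.
        return self._data.get(key, default)


samples = [
    MiniDictSample({"text": "hello", "lang": "en"}),
    MiniDictSample({"text": "bonjour"}),  # this record has no "lang" field
]
langs = [s.get("lang", "unknown") for s in samples]
print(langs)
```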

items

DictSample.items()

Return list of (field_name, value) tuples.

keys

DictSample.keys()

Return list of field names.

to_dict

DictSample.to_dict()

Return a copy of the underlying data dictionary.
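Because the result is a copy, changing it should not affect the sample. The sketch below demonstrates this with a hypothetical stand-in class; it assumes a shallow copy, which the description above does not specify.

```python
class MiniDictSample:
    """Hypothetical stand-in for DictSample; not the library's implementation."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def to_dict(self) -> dict:
        # Assumption: a shallow copy; nested containers would still be shared.
        return dict(self._data)


sample = MiniDictSample({"label": "cat"})
snapshot = sample.to_dict()
snapshot["label"] = "dog"  # modifies the copy only
snapshot["new"] = 1
print(sample.to_dict())
```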

values

DictSample.values()

Return list of field values.