AbstractIndex

AbstractIndex()

Protocol for index operations, implemented by LocalIndex and AtmosphereIndex.

This protocol defines the common interface for managing dataset metadata:

- Publishing and retrieving schemas
- Inserting and listing datasets
- (Future) Publishing and retrieving lenses

A single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.
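The idea that one index holds mixed sample types, told apart only by their schema references, can be sketched with a toy in-memory index. ToyEntry, ToyIndex, and the `demo` module name in the references are hypothetical stand-ins for illustration. They are not LocalIndex, AtmosphereIndex, or IndexEntry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEntry:
    """Hypothetical stand-in for IndexEntry: a name plus a schema reference."""
    name: str
    schema_ref: str


@dataclass
class ToyIndex:
    """Hypothetical in-memory index; not LocalIndex or AtmosphereIndex."""
    entries: dict[str, ToyEntry] = field(default_factory=dict)

    def insert_dataset(self, name: str, schema_ref: str) -> ToyEntry:
        entry = ToyEntry(name=name, schema_ref=schema_ref)
        self.entries[name] = entry
        return entry

    def list_datasets(self) -> list[ToyEntry]:
        return list(self.entries.values())


index = ToyIndex()
index.insert_dataset("images", "local://schemas/demo.ImageSample@1.0.0")
index.insert_dataset("texts", "local://schemas/demo.TextSample@1.0.0")

# One index, two sample types, distinguished only by schema_ref.
refs = {entry.name: entry.schema_ref for entry in index.list_datasets()}
```

Because the index class itself carries no type parameter, adding a third sample type needs no change to the index, only another schema reference.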

Optional Extensions

Some index implementations support additional features:

- data_store: An AbstractDataStore for reading/writing dataset shards. If present, load_dataset will use it for S3 credential resolution.
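Since data_store is optional, calling code may want to probe for it rather than assume it exists. The following is a minimal sketch of one way to do that. The two index classes and resolve_store are hypothetical, and how load_dataset actually performs this check is not shown in this reference.

```python
from typing import Any, Optional


class IndexWithStore:
    """Hypothetical index that exposes a data store."""
    data_store: Optional[Any] = object()


class IndexWithoutStore:
    """Hypothetical index with no data_store attribute at all."""


def resolve_store(index: Any) -> Optional[Any]:
    # getattr with a default covers both "attribute missing" and "attribute is None".
    return getattr(index, "data_store", None)


has_store = resolve_store(IndexWithStore()) is not None
no_store = resolve_store(IndexWithoutStore()) is None
```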

Examples

>>> def publish_and_list(index: AbstractIndex) -> None:
...     # Publish schemas for different types
...     schema1 = index.publish_schema(ImageSample, version="1.0.0")
...     schema2 = index.publish_schema(TextSample, version="1.0.0")
...
...     # Insert datasets of different types
...     index.insert_dataset(image_ds, name="images")
...     index.insert_dataset(text_ds, name="texts")
...
...     # List all datasets (mixed types)
...     for entry in index.list_datasets():
...         print(f"{entry.name} -> {entry.schema_ref}")

Attributes

Name Description
data_store Optional data store for reading/writing shards.
datasets Lazily iterate over all dataset entries in this index.
schemas Lazily iterate over all schema records in this index.

Methods

Name Description
decode_schema Reconstruct a Python Packable type from a stored schema.
get_dataset Get a dataset entry by name or reference.
get_schema Get a schema record by reference.
insert_dataset Insert a dataset into the index.
list_datasets Get all dataset entries as a materialized list.
list_schemas Get all schema records as a materialized list.
publish_schema Publish a schema for a sample type.

decode_schema

AbstractIndex.decode_schema(ref)

Reconstruct a Python Packable type from a stored schema.

This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.

Parameters

Name Type Description Default
ref str Schema reference string (local:// or at://). required

Returns

Name Type Description
Type[Packable] A dynamically generated Packable class with fields matching the schema definition. The class can be used with Dataset[T] to load and iterate over samples.

Raises

Name Type Description
KeyError If schema not found.
ValueError If schema cannot be decoded (unsupported field types).

Examples

>>> entry = index.get_dataset("my-dataset")
>>> SampleType = index.decode_schema(entry.schema_ref)
>>> ds = Dataset[SampleType](entry.data_urls[0])
>>> for sample in ds.ordered():
...     print(sample)  # sample is instance of SampleType
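To illustrate what "dynamically generates a class matching the schema definition" can look like, here is a sketch built on the standard library's dataclasses.make_dataclass. It is not the library's implementation: the record layout, TYPE_MAP, and build_class are assumptions made for the example, and the generated class is a plain dataclass rather than a real Packable.

```python
from dataclasses import fields, make_dataclass

# Hypothetical schema record; the real record layout may differ.
schema_record = {
    "name": "TextSample",
    "version": "1.0.0",
    "fields": [
        {"name": "text", "type": "str"},
        {"name": "label", "type": "int"},
    ],
}

# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {"str": str, "int": int, "float": float, "bytes": bytes}


def build_class(record: dict) -> type:
    """Generate a dataclass whose fields mirror the schema record."""
    specs = []
    for spec in record["fields"]:
        try:
            py_type = TYPE_MAP[spec["type"]]
        except KeyError:
            # Mirrors the documented ValueError for unsupported field types.
            raise ValueError(f"unsupported field type: {spec['type']!r}") from None
        specs.append((spec["name"], py_type))
    return make_dataclass(record["name"], specs)


TextSample = build_class(schema_record)
sample = TextSample(text="hello", label=1)
field_names = [f.name for f in fields(TextSample)]
```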

get_dataset

AbstractIndex.get_dataset(ref)

Get a dataset entry by name or reference.

Parameters

Name Type Description Default
ref str Dataset name, path, or full reference string. required

Returns

Name Type Description
IndexEntry IndexEntry for the dataset.

Raises

Name Type Description
KeyError If dataset not found.
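Because a missing dataset is reported with KeyError, callers that treat absence as a normal case can wrap the lookup. The sketch below uses a hypothetical dict-backed ToyIndex and a hypothetical find_dataset helper. Only the KeyError behavior is taken from this reference.

```python
from typing import Optional


class ToyIndex:
    """Hypothetical index backed by a dict; raises KeyError like the protocol."""

    def __init__(self, entries: dict) -> None:
        self._entries = entries

    def get_dataset(self, ref: str) -> dict:
        return self._entries[ref]  # dict lookup raises KeyError when missing


def find_dataset(index: ToyIndex, ref: str) -> Optional[dict]:
    """Return the entry, or None when the index does not know the reference."""
    try:
        return index.get_dataset(ref)
    except KeyError:
        return None


index = ToyIndex({"images": {"name": "images"}})
found = find_dataset(index, "images")
missing = find_dataset(index, "no-such-dataset")
```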

get_schema

AbstractIndex.get_schema(ref)

Get a schema record by reference.

Parameters

Name Type Description Default
ref str Schema reference string (local:// or at://). required

Returns

Name Type Description
dict Schema record as a dictionary with fields like ‘name’, ‘version’, ‘fields’, etc.

Raises

Name Type Description
KeyError If schema not found.

insert_dataset

AbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)

Insert a dataset into the index.

The sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.

Parameters

Name Type Description Default
ds Dataset The Dataset to register in the index (any sample type). required
name str Human-readable name for the dataset. required
schema_ref Optional[str] Optional explicit schema reference. If not provided, the schema may be auto-published or inferred from ds.sample_type. None
**kwargs Additional backend-specific options. {}

Returns

Name Type Description
IndexEntry IndexEntry for the inserted dataset.
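The schema_ref fallback described above can be sketched as follows. This is one possible reading of "may be auto-published", not a statement of what any backend does. ToyDataset, ToyIndex, TextSample, and the dict used in place of IndexEntry are all hypothetical.

```python
from typing import Optional


class TextSample:
    """Hypothetical sample type."""


class ToyDataset:
    """Hypothetical stand-in for Dataset; it only carries sample_type."""

    def __init__(self, sample_type: type) -> None:
        self.sample_type = sample_type


class ToyIndex:
    """Hypothetical index that models only the schema_ref fallback."""

    def __init__(self) -> None:
        self.published: list[str] = []

    def publish_schema(self, sample_type: type, *, version: str = "1.0.0") -> str:
        ref = f"local://schemas/{sample_type.__module__}.{sample_type.__name__}@{version}"
        self.published.append(ref)
        return ref

    def insert_dataset(self, ds: ToyDataset, *, name: str,
                       schema_ref: Optional[str] = None) -> dict:
        if schema_ref is None:
            # No explicit reference: publish a schema for ds.sample_type.
            schema_ref = self.publish_schema(ds.sample_type)
        return {"name": name, "schema_ref": schema_ref}


index = ToyIndex()
ds = ToyDataset(TextSample)
auto = index.insert_dataset(ds, name="texts")
explicit = index.insert_dataset(
    ds, name="texts-v2", schema_ref="local://schemas/demo.TextSample@2.0.0"
)
```

Passing schema_ref explicitly skips the publish step in this sketch, which is useful when the schema was already published under a specific version.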

list_datasets

AbstractIndex.list_datasets()

Get all dataset entries as a materialized list.

Returns

Name Type Description
list[IndexEntry] List of IndexEntry for each dataset.

list_schemas

AbstractIndex.list_schemas()

Get all schema records as a materialized list.

Returns

Name Type Description
list[dict] List of schema records as dictionaries.
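The list_* methods return materialized lists, while the datasets and schemas attributes iterate lazily. The difference matters when only the first few entries are needed. The sketch below uses a hypothetical ToyIndex with a read counter to make the difference visible. It says nothing about how a real backend fetches entries.

```python
from itertools import islice
from typing import Iterator


class ToyIndex:
    """Hypothetical index that counts how many entries it has produced."""

    def __init__(self, names: list[str]) -> None:
        self._names = list(names)
        self.reads = 0

    @property
    def datasets(self) -> Iterator[str]:
        # Generator: entries are produced one at a time, on demand.
        for name in self._names:
            self.reads += 1
            yield name

    def list_datasets(self) -> list[str]:
        # Materialized: every entry is produced before returning.
        return list(self.datasets)


index = ToyIndex(["a", "b", "c", "d"])
first_two = list(islice(index.datasets, 2))
reads_after_lazy = index.reads   # only the two requested entries were produced
everything = index.list_datasets()
reads_after_list = index.reads   # the full pass produced four more
```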

publish_schema

AbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)

Publish a schema for a sample type.

The sample_type parameter is annotated as type rather than Type[Packable] so that @packable-decorated classes are accepted: they satisfy the Packable protocol at runtime, but type checkers cannot verify that statically.
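The runtime check can be illustrated with typing.Protocol and typing.runtime_checkable from the standard library. ToyPackable, toy_packable, and publish are hypothetical. The real Packable protocol's members and the real @packable decorator may look quite different.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ToyPackable(Protocol):
    """Hypothetical stand-in for Packable; the real protocol's members may differ."""

    def pack(self) -> bytes: ...


def toy_packable(cls: type) -> type:
    """Hypothetical decorator that adds pack() after the class body is defined."""
    def pack(self) -> bytes:
        return repr(vars(self)).encode()
    cls.pack = pack
    return cls


@toy_packable
class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y


def publish(sample_type: type) -> str:
    # Annotated as plain `type`; the protocol check happens at runtime.
    if not issubclass(sample_type, ToyPackable):
        raise TypeError(f"{sample_type.__name__} does not satisfy ToyPackable")
    return sample_type.__name__


published_name = publish(Point)
```

A static checker sees only the class body of Point, which has no pack method, so annotating the parameter with the protocol type would reject a call that succeeds at runtime.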

Parameters

Name Type Description Default
sample_type type A Packable type (PackableSample subclass or @packable-decorated). Validated at runtime via the @runtime_checkable Packable protocol. required
version str Semantic version string for the schema. '1.0.0'
**kwargs Additional backend-specific options. {}

Returns

Name Type Description
str Schema reference string. Local: ‘local://schemas/{module.Class}@version’. Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’.
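For the local form of the reference, the qualified class name and the version can be separated with plain string handling. split_local_ref below is a hypothetical helper, not part of the index API, and it assumes the version is everything after the last "@". The example reference uses a made-up `demo` module.

```python
def split_local_ref(ref: str) -> tuple[str, str]:
    """Split a local schema reference into (qualified name, version)."""
    prefix = "local://schemas/"
    if not ref.startswith(prefix):
        raise ValueError(f"not a local schema reference: {ref!r}")
    # rpartition splits on the last "@", so dots in the name are left alone.
    name, sep, version = ref[len(prefix):].rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"missing name or version in: {ref!r}")
    return name, version


name, version = split_local_ref("local://schemas/demo.TextSample@1.0.0")
```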