AbstractIndex

AbstractIndex()

Protocol for index operations, implemented by LocalIndex and AtmosphereIndex.

This protocol defines the common interface for managing dataset metadata:

- Publishing and retrieving schemas
- Inserting and listing datasets
- (Future) Publishing and retrieving lenses

A single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.

Optional Extensions

Some index implementations support additional features:

- data_store: An AbstractDataStore for reading/writing dataset shards. If present, load_dataset will use it for S3 credential resolution.
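Because data_store is optional, calling code usually has to cope with an index that does not offer it. The following is a minimal sketch of one way to probe for it; `IndexWithStore`, `IndexWithoutStore`, and `resolve_data_store` are hypothetical names used only for illustration and are not part of the library.

```python
from typing import Any, Optional


class IndexWithStore:
    """Hypothetical index that exposes a data store (illustration only)."""

    def __init__(self, store: Any) -> None:
        self.data_store = store


class IndexWithoutStore:
    """Hypothetical index that has no data_store attribute at all."""


def resolve_data_store(index: Any) -> Optional[Any]:
    """Return the index's data store, or None when it is absent or unset."""
    # getattr with a default covers both a missing attribute and one set to None.
    return getattr(index, "data_store", None)
```

A caller can then branch on `None` instead of catching AttributeError.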

Examples

>>> def publish_and_list(index: AbstractIndex) -> None:
...     # Publish one schema per sample type; each call returns a reference string
...     image_schema_ref = index.publish_schema(ImageSample, version="1.0.0")
...     text_schema_ref = index.publish_schema(TextSample, version="1.0.0")
...
...     # Insert datasets of different types, linking each to its schema
...     index.insert_dataset(image_ds, name="images", schema_ref=image_schema_ref)
...     index.insert_dataset(text_ds, name="texts", schema_ref=text_schema_ref)
...
...     # List all datasets (mixed types)
...     for entry in index.list_datasets():
...         print(f"{entry.name} -> {entry.schema_ref}")

Attributes

Name	Description
data_store	Optional data store for reading/writing shards.
datasets	Lazily iterate over all dataset entries in this index.
schemas	Lazily iterate over all schema records in this index.
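The difference between the lazy attributes (datasets, schemas) and the materialized list_* methods can be sketched with a plain generator. `TinyIndex` below is a hypothetical stand-in that stores names instead of real entries; it only illustrates the iteration behaviour and makes no claim about how LocalIndex or AtmosphereIndex are implemented.

```python
from typing import Iterator, List


class TinyIndex:
    """Hypothetical stand-in; not the library's implementation."""

    def __init__(self, names: List[str]) -> None:
        self._names = names
        self.reads = 0  # counts how many entries have actually been produced

    def _iter_entries(self) -> Iterator[str]:
        for name in self._names:
            self.reads += 1
            yield name

    @property
    def datasets(self) -> Iterator[str]:
        """Lazy: nothing is read until the caller starts iterating."""
        return self._iter_entries()

    def list_datasets(self) -> List[str]:
        """Materialized: every entry is read before the list is returned."""
        return list(self.datasets)
```

Under these assumptions, taking only the first item from `datasets` touches one entry, while `list_datasets()` always touches all of them.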

Methods

Name	Description
decode_schema	Reconstruct a Python Packable type from a stored schema.
get_dataset	Get a dataset entry by name or reference.
get_schema	Get a schema record by reference.
insert_dataset	Insert a dataset into the index.
list_datasets	Get all dataset entries as a materialized list.
list_schemas	Get all schema records as a materialized list.
publish_schema	Publish a schema for a sample type.

decode_schema

AbstractIndex.decode_schema(ref)

Reconstruct a Python Packable type from a stored schema.

This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.

Parameters

Name	Type	Description	Default
ref	str	Schema reference string (local:// or at://).	required

Returns

Name	Type	Description
	Type[Packable]	A dynamically generated Packable class with fields matching the schema definition. The class can be used with `Dataset[T]` to load and iterate over samples.

Raises

Name	Type	Description
	KeyError	If schema not found.
	ValueError	If schema cannot be decoded (unsupported field types).

Examples

>>> entry = index.get_dataset("my-dataset")
>>> SampleType = index.decode_schema(entry.schema_ref)
>>> ds = Dataset[SampleType](entry.data_urls[0])
>>> for sample in ds.ordered():
...     print(sample)  # sample is instance of SampleType
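To make the idea of dynamically generating a class from a schema record more concrete, here is a rough sketch built on the standard library's `dataclasses.make_dataclass`. The record layout (a 'fields' list of name/type pairs), the `_TYPE_MAP` table, and `build_class` are all assumptions made for illustration; the real schema format and the real decode_schema logic may differ.

```python
from dataclasses import make_dataclass
from typing import Any, Dict

# Assumed mapping from schema type names to Python types (illustrative only).
_TYPE_MAP = {"str": str, "int": int, "float": float, "bytes": bytes, "bool": bool}


def build_class(record: Dict[str, Any]) -> type:
    """Build a dataclass whose fields mirror a hypothetical schema record."""
    specs = []
    for field in record["fields"]:
        type_name = field["type"]
        if type_name not in _TYPE_MAP:
            # Mirrors the documented ValueError for unsupported field types.
            raise ValueError(f"unsupported field type: {type_name!r}")
        specs.append((field["name"], _TYPE_MAP[type_name]))
    return make_dataclass(record["name"], specs)
```

A class built this way is an ordinary dataclass, so instances can be created with keyword arguments matching the schema's field names.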

get_dataset

AbstractIndex.get_dataset(ref)

Get a dataset entry by name or reference.

Parameters

Name	Type	Description	Default
ref	str	Dataset name, path, or full reference string.	required

Returns

Name	Type	Description
	IndexEntry	IndexEntry for the dataset.

Raises

Name	Type	Description
	KeyError	If dataset not found.
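Since a missing dataset is reported with KeyError, callers that want a soft lookup can wrap the call. The sketch below assumes only the documented get_dataset(ref) signature and its KeyError behaviour; `DictIndex` and `find_dataset` are hypothetical helpers, not library APIs.

```python
from typing import Any, Dict, Optional


class DictIndex:
    """Hypothetical index backed by a plain dict (illustration only)."""

    def __init__(self, entries: Dict[str, Any]) -> None:
        self._entries = entries

    def get_dataset(self, ref: str) -> Any:
        # A dict lookup raises KeyError for an unknown ref, as documented above.
        return self._entries[ref]


def find_dataset(index: Any, ref: str, default: Optional[Any] = None) -> Any:
    """Return the entry for ref, or default when the index does not know it."""
    try:
        return index.get_dataset(ref)
    except KeyError:
        return default
```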

get_schema

AbstractIndex.get_schema(ref)

Get a schema record by reference.

Parameters

Name	Type	Description	Default
ref	str	Schema reference string (local:// or at://).	required

Returns

Name	Type	Description
	dict	Schema record as a dictionary with fields such as 'name', 'version', and 'fields'.

Raises

Name	Type	Description
	KeyError	If schema not found.

insert_dataset

AbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)

Insert a dataset into the index.

The sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.

Parameters

Name	Type	Description	Default
ds	Dataset	The Dataset to register in the index (any sample type).	required
name	str	Human-readable name for the dataset.	required
schema_ref	Optional[str]	Optional explicit schema reference. If not provided, the schema may be auto-published or inferred from ds.sample_type.	`None`
**kwargs		Additional backend-specific options.	`{}`

Returns

Name	Type	Description
	IndexEntry	IndexEntry for the inserted dataset.
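The schema_ref fallback described above can be pictured as follows. This is only one plausible reading of "may be auto-published": the helper `resolve_schema_ref` and the `StubIndex` and `StubDataset` classes are hypothetical, and a real backend may behave differently.

```python
from typing import Any, List, Optional


class StubDataset:
    """Hypothetical dataset exposing only the sample_type attribute."""

    def __init__(self, sample_type: type) -> None:
        self.sample_type = sample_type


class StubIndex:
    """Hypothetical index that records which types it was asked to publish."""

    def __init__(self) -> None:
        self.published: List[type] = []

    def publish_schema(self, sample_type: type, *, version: str = "1.0.0") -> str:
        self.published.append(sample_type)
        qualified = f"{sample_type.__module__}.{sample_type.__name__}"
        return f"local://schemas/{qualified}@{version}"


def resolve_schema_ref(index: Any, ds: Any, schema_ref: Optional[str] = None) -> str:
    """Prefer an explicit schema_ref; otherwise publish from ds.sample_type."""
    if schema_ref is not None:
        return schema_ref
    return index.publish_schema(ds.sample_type)
```

Passing schema_ref explicitly avoids any publishing side effect, which can matter when the schema already exists in the index.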

list_datasets

AbstractIndex.list_datasets()

Get all dataset entries as a materialized list.

Returns

Name	Type	Description
	list[IndexEntry]	List of IndexEntry for each dataset.

list_schemas

AbstractIndex.list_schemas()

Get all schema records as a materialized list.

Returns

Name	Type	Description
	list[dict]	List of schema records as dictionaries.

publish_schema

AbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)

Publish a schema for a sample type.

The sample_type is accepted as type rather than Type[Packable] to support @packable-decorated classes, which satisfy the Packable protocol at runtime but cannot be statically verified by type checkers.
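The following sketch shows why a runtime check can succeed where a static one cannot. `PackableLike`, its `pack` method, and this `packable` decorator are simplified stand-ins invented for the example; they are not the library's Packable protocol or its @packable decorator.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PackableLike(Protocol):
    """Stand-in protocol with a single hypothetical method."""

    def pack(self) -> bytes: ...


def packable(cls: type) -> type:
    """Hypothetical decorator that attaches pack() after the class is defined."""

    def pack(self) -> bytes:
        return repr(vars(self)).encode()

    cls.pack = pack  # added at runtime, so a static checker never sees it
    return cls


@packable
class TextSample:
    def __init__(self, text: str) -> None:
        self.text = text


def validate_sample_type(sample_type: type) -> type:
    """Accept any class that structurally satisfies PackableLike at runtime."""
    if not (isinstance(sample_type, type) and issubclass(sample_type, PackableLike)):
        raise TypeError(f"{sample_type!r} does not satisfy PackableLike")
    return sample_type
```

Note that `issubclass` against a runtime-checkable protocol is only supported when the protocol declares methods and no data attributes, which is why the stand-in has a single method.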

Parameters

Name	Type	Description	Default
sample_type	type	A Packable type (PackableSample subclass or @packable-decorated). Validated at runtime via the @runtime_checkable Packable protocol.	required
version	str	Semantic version string for the schema.	`'1.0.0'`
**kwargs		Additional backend-specific options.	`{}`

Returns

Name	Type	Description
	str	Schema reference string. Local: 'local://schemas/{module.Class}@version'. Atmosphere: 'at://did:plc:…/ac.foundation.dataset.sampleSchema/…'.
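Given the two reference shapes above, a caller may want to tell them apart or pull the type name and version out of a local reference. The sketch below relies only on the scheme prefix and the documented local layout; `ref_backend`, `parse_local_ref`, and `LocalSchemaRef` are hypothetical helpers, and the internal structure of at:// references is deliberately not parsed.

```python
from typing import NamedTuple

_LOCAL_PREFIX = "local://schemas/"


class LocalSchemaRef(NamedTuple):
    qualified_name: str  # the {module.Class} part
    version: str


def ref_backend(ref: str) -> str:
    """Classify a schema reference by its URI scheme."""
    if ref.startswith("local://"):
        return "local"
    if ref.startswith("at://"):
        return "atmosphere"
    raise ValueError(f"unrecognized schema reference: {ref!r}")


def parse_local_ref(ref: str) -> LocalSchemaRef:
    """Split 'local://schemas/{module.Class}@version' into its two parts."""
    if not ref.startswith(_LOCAL_PREFIX):
        raise ValueError(f"not a local schema reference: {ref!r}")
    # rpartition splits on the last '@', so the version is whatever follows it.
    name, sep, version = ref[len(_LOCAL_PREFIX):].rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"malformed local schema reference: {ref!r}")
    return LocalSchemaRef(name, version)
```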