Protocol for index operations - implemented by LocalIndex and AtmosphereIndex.
This protocol defines the common interface for managing dataset metadata:

- Publishing and retrieving schemas
- Inserting and listing datasets
- (Future) Publishing and retrieving lenses
A single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.
Optional Extensions
Some index implementations support additional features:

- data_store: An AbstractDataStore for reading/writing dataset shards. If present, load_dataset will use it for S3 credential resolution.
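A minimal sketch of how calling code might detect this optional extension, assuming data_store is exposed as a plain attribute that is missing (or None) on implementations that do not support it. The helper resolve_data_store and both stand-in index classes are hypothetical and are not part of the library.

```python
from typing import Any, Optional


def resolve_data_store(index: Any) -> Optional[Any]:
    """Return the index's data_store if it exposes one, else None.

    Assumption: the extension is a plain attribute named ``data_store``.
    """
    return getattr(index, "data_store", None)


class PlainIndex:
    """Hypothetical index without the optional extension."""


class StoreBackedIndex:
    """Hypothetical index that exposes a data_store attribute."""

    def __init__(self, store: Any) -> None:
        self.data_store = store


store = object()  # stands in for an AbstractDataStore instance
plain_result = resolve_data_store(PlainIndex())
backed_result = resolve_data_store(StoreBackedIndex(store))
```

Using getattr with a default keeps the check tolerant of indexes that never define the attribute, which fits an extension described as optional.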
Examples
>>> def publish_and_list(index: AbstractIndex) -> None:
...     # Publish schemas for different types
...     schema1 = index.publish_schema(ImageSample, version="1.0.0")
...     schema2 = index.publish_schema(TextSample, version="1.0.0")
...
...     # Insert datasets of different types
...     index.insert_dataset(image_ds, name="images")
...     index.insert_dataset(text_ds, name="texts")
...
...     # List all datasets (mixed types)
...     for entry in index.list_datasets():
...         print(f"{entry.name} -> {entry.schema_ref}")
Reconstruct a Python Packable type from a stored schema.
This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.
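The general idea of generating a class from a stored schema can be sketched with the standard library. This is an illustration only: the record layout (a dict with "name" and "fields" keys), the type-name mapping, and the function type_from_schema are all assumptions made for the example, and the result is a plain dataclass rather than a real Packable type.

```python
from dataclasses import fields, make_dataclass

# Hypothetical mapping from schema type names to Python types.
_TYPE_MAP = {"str": str, "int": int, "float": float, "bytes": bytes}


def type_from_schema(record: dict) -> type:
    """Build a dataclass whose fields mirror a (hypothetical) schema record."""
    spec = [(f["name"], _TYPE_MAP[f["type"]]) for f in record["fields"]]
    return make_dataclass(record["name"], spec)


# Assumed record shape; the real schema record format may differ.
record = {
    "name": "TextSample",
    "fields": [
        {"name": "text", "type": "str"},
        {"name": "label", "type": "int"},
    ],
}

TextSample = type_from_schema(record)
sample = TextSample(text="hello", label=1)
field_names = [f.name for f in fields(TextSample)]
```

The caller never imports a TextSample class here; the type exists only because the record describes it, which is the property that lets datasets be loaded without knowing the sample type ahead of time.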
The sample_type is accepted as type rather than Type[Packable] to support @packable-decorated classes, which satisfy the Packable protocol at runtime but cannot be statically verified by type checkers.
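The following sketch shows why a decorated class can satisfy a protocol at runtime while remaining invisible to static analysis. The Packable protocol, the packable decorator, and the describe function below are simplified stand-ins written for this example; the real definitions may declare different members.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Packable(Protocol):
    """Stand-in protocol; the real Packable may declare other members."""

    def pack(self) -> bytes: ...


def packable(cls: type) -> type:
    """Stand-in decorator that attaches pack() after the class is created."""

    def pack(self) -> bytes:
        return repr(vars(self)).encode()

    cls.pack = pack  # added dynamically, so type checkers do not see it
    return cls


@packable
class Point:
    def __init__(self, x: int) -> None:
        self.x = x


def describe(sample_type: type) -> str:
    """Accept a plain ``type`` so decorated classes pass static checks."""
    return sample_type.__name__


point = Point(1)
is_packable = isinstance(point, Packable)  # runtime structural check
name = describe(Point)
```

Because pack() is attached after class creation, a checker looking at the body of Point finds no such method, so an annotation of Type[Packable] would likely be flagged even though the runtime check succeeds.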