local.Index

local.Index(
    redis=None,
    data_store=None,
    auto_stubs=False,
    stub_dir=None,
    **kwargs,
)

Redis-backed index for tracking datasets in a repository.

Implements the AbstractIndex protocol. Maintains a registry of LocalDatasetEntry objects in Redis, allowing enumeration and lookup of stored datasets.

When initialized with a data_store, insert_dataset() will write dataset shards to storage before indexing. Without a data_store, insert_dataset() only indexes existing URLs.

Attributes

Name Type Description
_redis Redis connection for index storage.
_data_store Optional AbstractDataStore for writing dataset shards.

Methods

Name Description
add_entry Add a dataset to the index.
clear_stubs Remove all auto-generated stub files.
decode_schema Reconstruct a Python PackableSample type from a stored schema.
decode_schema_as Decode a schema with explicit type hint for IDE support.
get_dataset Get a dataset entry by name (AbstractIndex protocol).
get_entry Get an entry by its CID.
get_entry_by_name Get an entry by its human-readable name.
get_import_path Get the import path for a schema’s generated module.
get_schema Get a schema record by reference (AbstractIndex protocol).
get_schema_record Get a schema record as LocalSchemaRecord object.
insert_dataset Insert a dataset into the index (AbstractIndex protocol).
list_datasets Get all dataset entries as a materialized list (AbstractIndex protocol).
list_entries Get all index entries as a materialized list.
list_schemas Get all schema records as a materialized list (AbstractIndex protocol).
load_schema Load a schema and make it available in the types namespace.
publish_schema Publish a schema for a sample type to Redis.

add_entry

local.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)

Add a dataset to the index.

Creates a LocalDatasetEntry for the dataset and persists it to Redis.

Parameters

Name Type Description Default
ds Dataset The dataset to add to the index. required
name str Human-readable name for the dataset. required
schema_ref str | None Optional schema reference. If None, generates from sample type. None
metadata dict | None Optional metadata dictionary. If None, uses ds._metadata if available. None

Returns

Name Type Description
LocalDatasetEntry The created LocalDatasetEntry object.

clear_stubs

local.Index.clear_stubs()

Remove all auto-generated stub files.

Only works if auto_stubs was enabled when creating the Index.

Returns

Name Type Description
int Number of stub files removed, or 0 if auto_stubs is disabled.
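
The return contract described above (a count of removed files, or 0 when auto_stubs is disabled) can be sketched with the standard library. This is a minimal illustration, not the library's implementation: the standalone clear_stubs function, its auto_stubs argument, and the assumption that stubs are .py files sitting directly in stub_dir are all hypothetical.

```python
import tempfile
from pathlib import Path


def clear_stubs(stub_dir, auto_stubs=True) -> int:
    """Delete generated modules in stub_dir and report how many were removed."""
    if not auto_stubs or stub_dir is None:
        return 0  # nothing is managed when stub generation is off
    removed = 0
    # Assumption: every .py file directly inside stub_dir is a generated stub.
    for path in Path(stub_dir).glob("*.py"):
        path.unlink()
        removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "MySample_1_0_0.py").write_text("# generated stub\n")
    (Path(tmp) / "notes.txt").write_text("not a stub\n")
    removed = clear_stubs(tmp)
    disabled = clear_stubs(tmp, auto_stubs=False)

print(removed, disabled)
```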

decode_schema

local.Index.decode_schema(ref)

Reconstruct a Python PackableSample type from a stored schema.

This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.

If auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.

Parameters

Name Type Description Default
ref str Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…). required

Returns

Name Type Description
Type[Packable] A PackableSample subclass, either imported from a generated module (if auto_stubs is enabled) or dynamically created.

Raises

Name Type Description
KeyError If schema not found.
ValueError If schema cannot be decoded.
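
To make the "dynamically generates a subclass" step concrete, here is a minimal sketch that builds a dataclass from a schema record using dataclasses.make_dataclass. The layout of the 'fields' entries, the TYPE_MAP table, and the build_sample_type helper are assumptions made for illustration. The real decoder produces a PackableSample subclass and may work quite differently.

```python
from dataclasses import fields, is_dataclass, make_dataclass

# Hypothetical schema record; only the top-level keys are taken from the docs.
schema = {
    "name": "MySample",
    "version": "1.0.0",
    "fields": [
        {"name": "text", "type": "str"},
        {"name": "value", "type": "int"},
    ],
}

# Assumed mapping from schema type names to Python types.
TYPE_MAP = {"str": str, "int": int, "float": float, "bytes": bytes}


def build_sample_type(record: dict) -> type:
    """Create a dataclass whose fields mirror the schema record."""
    try:
        spec = [(f["name"], TYPE_MAP[f["type"]]) for f in record["fields"]]
    except KeyError as exc:
        # Unknown type names or missing keys mean the schema cannot be decoded.
        raise ValueError(f"cannot decode schema: {exc}") from exc
    return make_dataclass(record["name"], spec)


MySample = build_sample_type(schema)
sample = MySample(text="hello", value=42)
print(sample.text, sample.value)
```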

decode_schema_as

local.Index.decode_schema_as(ref, type_hint)

Decode a schema with explicit type hint for IDE support.

This is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.

Parameters

Name Type Description Default
ref str Schema reference string. required
type_hint type[T] The stub type to use for type hints. Import this from the generated stub file. required

Returns

Name Type Description
type[T] The decoded type, cast to match the type_hint for IDE support.

Examples

>>> # After enabling auto_stubs and configuring IDE extraPaths:
>>> from local.MySample_1_0_0 import MySample
>>>
>>> # This gives full IDE autocomplete:
>>> DecodedType = index.decode_schema_as(ref, MySample)
>>> sample = DecodedType(text="hello", value=42)  # IDE knows signature!

Note

The type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.
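
The note above can be illustrated with typing.cast, which changes only what a static checker sees. In this sketch the module-level decode_schema and decode_schema_as functions and the MySampleStub class are hypothetical stand-ins, not the library's code.

```python
from typing import Type, TypeVar, cast

T = TypeVar("T")


def decode_schema(ref: str) -> type:
    """Stand-in decoder: builds a fresh, empty class named after the ref."""
    name = ref.rsplit("/", 1)[-1].split("@", 1)[0]
    return type(name, (), {})


def decode_schema_as(ref: str, type_hint: Type[T]) -> Type[T]:
    """Return the decoded class, typed as type_hint for static checkers."""
    # cast() is a no-op at runtime; type_hint is never returned or inspected.
    return cast(Type[T], decode_schema(ref))


class MySampleStub:
    """Plays the role of a class imported from a generated stub file."""


Decoded = decode_schema_as("atdata://local/sampleSchema/MySample@1.0.0", MySampleStub)
print(Decoded is MySampleStub)  # False: the runtime class is the decoded one
print(Decoded.__name__)  # MySample
```

Because the hint is never checked at runtime, a stub that has drifted from the stored schema would pass type checking and still fail when the class is used.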

get_dataset

local.Index.get_dataset(ref)

Get a dataset entry by name (AbstractIndex protocol).

Parameters

Name Type Description Default
ref str Dataset name. required

Returns

Name Type Description
LocalDatasetEntry IndexEntry for the dataset.

Raises

Name Type Description
KeyError If dataset not found.

get_entry

local.Index.get_entry(cid)

Get an entry by its CID.

Parameters

Name Type Description Default
cid str Content identifier of the entry. required

Returns

Name Type Description
LocalDatasetEntry LocalDatasetEntry for the given CID.

Raises

Name Type Description
KeyError If entry not found.

get_entry_by_name

local.Index.get_entry_by_name(name)

Get an entry by its human-readable name.

Parameters

Name Type Description Default
name str Human-readable name of the entry. required

Returns

Name Type Description
LocalDatasetEntry LocalDatasetEntry with the given name.

Raises

Name Type Description
KeyError If no entry with that name exists.

get_import_path

local.Index.get_import_path(ref)

Get the import path for a schema’s generated module.

When auto_stubs is enabled, this returns the import path that can be used to import the schema type with full IDE support.

Parameters

Name Type Description Default
ref str Schema reference string. required

Returns

Name Type Description
str | None Import path such as “local.MySample_1_0_0”, or None if auto_stubs is disabled.
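
The documented example path, local.MySample_1_0_0, suggests the module name combines the class name with the version, with dots replaced by underscores. The sketch below encodes that inferred pattern. The import_path_for helper is hypothetical, and the real naming rule may handle more cases.

```python
from typing import Optional


def import_path_for(name: str, version: str, auto_stubs: bool = True) -> Optional[str]:
    """Build a module path such as 'local.MySample_1_0_0' (inferred pattern)."""
    if not auto_stubs:
        return None  # no module is generated, so there is nothing to import
    # Dots would split the module name into packages, so use underscores.
    return f"local.{name}_{version.replace('.', '_')}"


print(import_path_for("MySample", "1.0.0"))
```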

Examples

>>> index = LocalIndex(auto_stubs=True)
>>> ref = index.publish_schema(MySample, version="1.0.0")
>>> index.load_schema(ref)
>>> print(index.get_import_path(ref))
local.MySample_1_0_0
>>> # Then in your code:
>>> # from local.MySample_1_0_0 import MySample

get_schema

local.Index.get_schema(ref)

Get a schema record by reference (AbstractIndex protocol).

Parameters

Name Type Description Default
ref str Schema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version). required

Returns

Name Type Description
dict Schema record as a dictionary with keys ‘name’, ‘version’, ‘fields’, ‘$ref’, etc.

Raises

Name Type Description
KeyError If schema not found.
ValueError If reference format is invalid.
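
The two reference formats can be told apart with a pair of regular expressions. This is a sketch only: the patterns are inferred from the format strings in the parameter description, and parse_schema_ref is a hypothetical helper rather than part of the documented API.

```python
import re
from typing import Tuple

# Patterns inferred from the two documented reference formats.
_NEW = re.compile(r"^atdata://local/sampleSchema/(?P<name>[^@/]+)@(?P<version>.+)$")
_LEGACY = re.compile(r"^local://schemas/(?P<name>[^@/]+)@(?P<version>.+)$")


def parse_schema_ref(ref: str) -> Tuple[str, str]:
    """Split a schema reference into (name, version)."""
    for pattern in (_NEW, _LEGACY):
        match = pattern.match(ref)
        if match:
            return match["name"], match["version"]
    # Mirrors the documented behaviour for malformed references.
    raise ValueError(f"invalid schema reference: {ref!r}")


print(parse_schema_ref("atdata://local/sampleSchema/MySample@1.0.0"))
print(parse_schema_ref("local://schemas/mymodule.MySample@1.0.0"))
```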

get_schema_record

local.Index.get_schema_record(ref)

Get a schema record as LocalSchemaRecord object.

Use this when you need the full LocalSchemaRecord with typed properties. For Protocol-compliant dict access, use get_schema() instead.

Parameters

Name Type Description Default
ref str Schema reference string. required

Returns

Name Type Description
LocalSchemaRecord LocalSchemaRecord with schema details.

Raises

Name Type Description
KeyError If schema not found.
ValueError If reference format is invalid.

insert_dataset

local.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)

Insert a dataset into the index (AbstractIndex protocol).

If a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.

Parameters

Name Type Description Default
ds Dataset The Dataset to register. required
name str Human-readable name for the dataset. required
schema_ref str | None Optional schema reference. None
**kwargs Additional options: metadata (optional metadata dict); prefix (storage prefix, default: dataset name); cache_local (if True, cache writes locally first). {}

Returns

Name Type Description
LocalDatasetEntry IndexEntry for the inserted dataset.
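
The two code paths of insert_dataset (write shards first when a data store is present, otherwise index the existing URLs) can be sketched with in-memory stand-ins. Everything here is hypothetical: InMemoryStore, its write_shards method, the store:// URL scheme, and the dict used as an index are illustrative and do not reflect the AbstractDataStore interface.

```python
from typing import List, Optional


class InMemoryStore:
    """Hypothetical data store that pretends to write shards."""

    def write_shards(self, urls: List[str], prefix: str) -> List[str]:
        # Return one new URL per input shard, under the given prefix.
        return [f"store://{prefix}/{i:06d}.tar" for i, _ in enumerate(urls)]


def insert_dataset(index: dict, urls: List[str], *, name: str,
                   data_store: Optional[InMemoryStore] = None,
                   prefix: Optional[str] = None) -> dict:
    if data_store is not None:
        # Write first, then index the URLs the store hands back.
        # The prefix falls back to the dataset name, as documented.
        urls = data_store.write_shards(urls, prefix or name)
    entry = {"name": name, "urls": list(urls)}
    index[name] = entry
    return entry


index: dict = {}
a = insert_dataset(index, ["file:///data/a.tar"], name="raw")
b = insert_dataset(index, ["file:///data/a.tar"], name="copied",
                   data_store=InMemoryStore())
print(a["urls"])
print(b["urls"])
```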

list_datasets

local.Index.list_datasets()

Get all dataset entries as a materialized list (AbstractIndex protocol).

Returns

Name Type Description
list[LocalDatasetEntry] List of IndexEntry for each dataset.

list_entries

local.Index.list_entries()

Get all index entries as a materialized list.

Returns

Name Type Description
list[LocalDatasetEntry] List of all LocalDatasetEntry objects in the index.

list_schemas

local.Index.list_schemas()

Get all schema records as a materialized list (AbstractIndex protocol).

Returns

Name Type Description
list[dict] List of schema records as dictionaries.

load_schema

local.Index.load_schema(ref)

Load a schema and make it available in the types namespace.

This method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the index’s types namespace for easy access.

Parameters

Name Type Description Default
ref str Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…). required

Returns

Name Type Description
Type[Packable] The decoded PackableSample subclass. Also available via index.types.<ClassName> after this call.

Raises

Name Type Description
KeyError If schema not found.
ValueError If schema cannot be decoded.
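
The "return the class and also register it by name" behaviour can be pictured as attribute assignment on a namespace object. This sketch uses types.SimpleNamespace and a stand-in load_schema function, both hypothetical. How the real types namespace is implemented is not stated here.

```python
from types import SimpleNamespace

# Stand-in for index.types.
types_ns = SimpleNamespace()


def load_schema(ref: str) -> type:
    """Decode a class (stand-in) and expose it as an attribute of types_ns."""
    name = ref.rsplit("/", 1)[-1].split("@", 1)[0]
    cls = type(name, (), {})  # placeholder for the real decoded sample type
    setattr(types_ns, name, cls)
    return cls


OtherType = load_schema("atdata://local/sampleSchema/OtherType@1.0.0")
print(types_ns.OtherType is OtherType)  # True: same class, two access paths
```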

Examples

>>> # Load and use immediately
>>> MyType = index.load_schema("atdata://local/sampleSchema/MySample@1.0.0")
>>> sample = MyType(name="hello", value=42)
>>>
>>> # Or access later via namespace
>>> index.load_schema("atdata://local/sampleSchema/OtherType@1.0.0")
>>> other = index.types.OtherType(data="test")

publish_schema

local.Index.publish_schema(sample_type, *, version=None, description=None)

Publish a schema for a sample type to Redis.

Parameters

Name Type Description Default
sample_type type A Packable type (@packable-decorated or PackableSample subclass). required
version str | None Semantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists. None
description str | None Optional human-readable description. If None, uses the class docstring. None

Returns

Name Type Description
str Schema reference string: ‘atdata://local/sampleSchema/{name}@version’.

Raises

Name Type Description
ValueError If sample_type is not a dataclass.
TypeError If sample_type doesn’t satisfy the Packable protocol, or if a field type is not supported.
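
The version defaulting rule (use the explicit version if given, otherwise bump the patch of the latest published version, otherwise start at 1.0.0) can be sketched as follows. next_version is a hypothetical helper that assumes plain MAJOR.MINOR.PATCH strings. The library's own logic may differ, for example around pre-release tags.

```python
from typing import List, Optional


def next_version(published: List[str], requested: Optional[str] = None) -> str:
    """Pick the version to publish under, following the documented defaults."""
    if requested is not None:
        return requested
    if not published:
        return "1.0.0"
    # Compare numerically so that 1.0.10 sorts after 1.0.9.
    latest = max(tuple(int(part) for part in v.split(".")) for v in published)
    major, minor, patch = latest
    return f"{major}.{minor}.{patch + 1}"


print(next_version([]))
print(next_version(["1.0.0", "1.0.9", "1.0.10"]))
print(next_version(["1.2.3"], requested="2.0.0"))
```

The numeric comparison matters: comparing the raw strings would rank "1.0.9" above "1.0.10" and bump the wrong version.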