local.Index

local.Index(
    redis=None,
    data_store=None,
    auto_stubs=False,
    stub_dir=None,
    **kwargs,
)

Redis-backed index for tracking datasets in a repository.

Implements the AbstractIndex protocol. Maintains a registry of LocalDatasetEntry objects in Redis, allowing enumeration and lookup of stored datasets.

When initialized with a data_store, insert_dataset() will write dataset shards to storage before indexing. Without a data_store, insert_dataset() only indexes existing URLs.

Attributes

Name	Type	Description
_redis		Redis connection for index storage.
_data_store		Optional AbstractDataStore for writing dataset shards.

Methods

Name	Description
add_entry	Add a dataset to the index.
clear_stubs	Remove all auto-generated stub files.
decode_schema	Reconstruct a Python PackableSample type from a stored schema.
decode_schema_as	Decode a schema with explicit type hint for IDE support.
get_dataset	Get a dataset entry by name (AbstractIndex protocol).
get_entry	Get an entry by its CID.
get_entry_by_name	Get an entry by its human-readable name.
get_import_path	Get the import path for a schema’s generated module.
get_schema	Get a schema record by reference (AbstractIndex protocol).
get_schema_record	Get a schema record as LocalSchemaRecord object.
insert_dataset	Insert a dataset into the index (AbstractIndex protocol).
list_datasets	Get all dataset entries as a materialized list (AbstractIndex protocol).
list_entries	Get all index entries as a materialized list.
list_schemas	Get all schema records as a materialized list (AbstractIndex protocol).
load_schema	Load a schema and make it available in the types namespace.
publish_schema	Publish a schema for a sample type to Redis.

add_entry

local.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)

Add a dataset to the index.

Creates a LocalDatasetEntry for the dataset and persists it to Redis.

Parameters

Name	Type	Description	Default
ds	Dataset	The dataset to add to the index.	required
name	str	Human-readable name for the dataset.	required
schema_ref	str | None	Optional schema reference. If None, a reference is generated from the dataset's sample type.	`None`
metadata	dict | None	Optional metadata dictionary. If None, uses ds._metadata if available.	`None`

Returns

Name	Type	Description
	LocalDatasetEntry	The created LocalDatasetEntry object.

clear_stubs

local.Index.clear_stubs()

Remove all auto-generated stub files.

Only works if auto_stubs was enabled when creating the Index.

Returns

Name	Type	Description
	int	Number of stub files removed, or 0 if auto_stubs is disabled.
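
The return contract described above (a count of removed files, or 0 when stubs are disabled) can be sketched with the standard library. This is an illustration, not the library's code: `remove_stub_files` is a hypothetical helper, and it assumes stubs are `.py` files sitting directly inside the stub directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def remove_stub_files(stub_dir: Optional[Path]) -> int:
    """Delete generated .py files in stub_dir and return how many were removed."""
    # Treat a missing directory like "auto_stubs disabled": nothing to remove.
    if stub_dir is None or not stub_dir.is_dir():
        return 0
    removed = 0
    # Materialize the listing first so deletion does not disturb iteration.
    for path in list(stub_dir.glob("*.py")):
        path.unlink()
        removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "MySample_1_0_0.py").write_text("class MySample: ...\n")
    (root / "notes.txt").write_text("kept\n")
    removed_count = remove_stub_files(root)
    remaining = sorted(p.name for p in root.iterdir())
```

Only files matching the stub pattern are touched; unrelated files in the same directory are left in place.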

decode_schema

local.Index.decode_schema(ref)

Reconstruct a Python PackableSample type from a stored schema.

This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.

If auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.

Parameters

Name	Type	Description	Default
ref	str	Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).	required

Returns

Name	Type	Description
	Type[Packable]	A PackableSample subclass, either imported from a generated module (if auto_stubs is enabled) or dynamically created.

Raises

Name	Type	Description
	KeyError	If schema not found.
	ValueError	If schema cannot be decoded.
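
The idea of building a class from a stored schema can be illustrated with the standard library alone. The sketch below is not the library's implementation: it assumes a hypothetical record layout in which 'fields' is a list of `{"name", "type"}` dicts, uses a made-up type-name mapping, and produces a plain dataclass rather than a PackableSample subclass.

```python
from dataclasses import fields, make_dataclass

# Hypothetical mapping from schema type names to Python types (an assumption).
_TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool, "bytes": bytes}


def build_type_from_record(record: dict) -> type:
    """Create a dataclass from a schema-like dict (illustrative only)."""
    try:
        name = record["name"]
        spec = [(f["name"], _TYPE_MAP[f["type"]]) for f in record["fields"]]
    except KeyError as exc:
        # Mirror the documented contract: an undecodable schema is a ValueError.
        raise ValueError(f"cannot decode schema record: {exc}") from exc
    return make_dataclass(name, spec)


record = {
    "name": "MySample",
    "version": "1.0.0",
    "fields": [{"name": "text", "type": "str"}, {"name": "value", "type": "int"}],
}
MySample = build_type_from_record(record)
sample = MySample(text="hello", value=42)
field_names = [f.name for f in fields(MySample)]
```

A class created this way exists only at runtime, which is why a generated module is needed before an IDE can offer autocomplete for it.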

decode_schema_as

local.Index.decode_schema_as(ref, type_hint)

Decode a schema with explicit type hint for IDE support.

This is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.

Parameters

Name	Type	Description	Default
ref	str	Schema reference string.	required
type_hint	type[T]	The stub type to use for type hints. Import this from the generated stub file.	required

Returns

Name	Type	Description
	type[T]	The decoded type, cast to match the type_hint for IDE support.

Examples

>>> # After enabling auto_stubs and configuring IDE extraPaths:
>>> from local.MySample_1_0_0 import MySample
>>>
>>> # This gives full IDE autocomplete:
>>> DecodedType = index.decode_schema_as(ref, MySample)
>>> sample = DecodedType(text="hello", value=42)  # IDE knows signature!

Note

The type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.
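
The note above can be made concrete with `typing.cast`, which changes what a type checker sees but returns its argument untouched. The sketch assumes the wrapper behaves like a cast; the source does not say how it is implemented, and `decode_as`, `StubSample`, and `RuntimeSample` are hypothetical names.

```python
from typing import TypeVar, cast

T = TypeVar("T")


def decode_as(decoded: type, type_hint: type[T]) -> type[T]:
    """Return the decoded class unchanged, typed as type_hint for checkers."""
    # cast() does nothing at runtime; it only informs the static type checker.
    return cast("type[T]", decoded)


class StubSample:
    """Stands in for a class imported from a generated stub module."""

    def __init__(self, text: str, value: int) -> None: ...


class RuntimeSample:
    """Stands in for the class actually decoded from the schema."""

    def __init__(self, text: str, value: int) -> None:
        self.text, self.value = text, value


DecodedType = decode_as(RuntimeSample, StubSample)
sample = DecodedType(text="hello", value=42)
```

The object that comes back is the runtime class, not the stub, so a stub that disagrees with the schema would only show up as an error when the code runs.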

get_dataset

local.Index.get_dataset(ref)

Get a dataset entry by name (AbstractIndex protocol).

Parameters

Name	Type	Description	Default
ref	str	Dataset name.	required

Returns

Name	Type	Description
	LocalDatasetEntry	IndexEntry for the dataset.

Raises

Name	Type	Description
	KeyError	If dataset not found.

get_entry

local.Index.get_entry(cid)

Get an entry by its CID.

Parameters

Name	Type	Description	Default
cid	str	Content identifier of the entry.	required

Returns

Name	Type	Description
	LocalDatasetEntry	LocalDatasetEntry for the given CID.

Raises

Name	Type	Description
	KeyError	If entry not found.

get_entry_by_name

local.Index.get_entry_by_name(name)

Get an entry by its human-readable name.

Parameters

Name	Type	Description	Default
name	str	Human-readable name of the entry.	required

Returns

Name	Type	Description
	LocalDatasetEntry	LocalDatasetEntry with the given name.

Raises

Name	Type	Description
	KeyError	If no entry with that name exists.

get_import_path

local.Index.get_import_path(ref)

Get the import path for a schema’s generated module.

When auto_stubs is enabled, this returns the import path that can be used to import the schema type with full IDE support.

Parameters

Name	Type	Description	Default
ref	str	Schema reference string.	required

Returns

Name	Type	Description
	str | None	Import path like “local.MySample_1_0_0”, or None if auto_stubs is disabled.

Examples

>>> index = LocalIndex(auto_stubs=True)
>>> ref = index.publish_schema(MySample, version="1.0.0")
>>> index.load_schema(ref)
>>> print(index.get_import_path(ref))
local.MySample_1_0_0
>>> # Then in your code:
>>> # from local.MySample_1_0_0 import MySample

get_schema

local.Index.get_schema(ref)

Get a schema record by reference (AbstractIndex protocol).

Parameters

Name	Type	Description	Default
ref	str	Schema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).	required

Returns

Name	Type	Description
	dict	Schema record as a dictionary with keys ‘name’, ‘version’, ‘fields’, ‘$ref’, etc.

Raises

Name	Type	Description
	KeyError	If schema not found.
	ValueError	If reference format is invalid.
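
To show what the two reference formats look like when taken apart, here is a minimal parsing sketch. The regular expressions are inferred from the format strings in the parameter description; `parse_schema_ref` is a hypothetical helper, and the real parser may accept or reject inputs differently.

```python
import re

# Patterns inferred from the two documented reference formats (assumptions).
_NEW = re.compile(r"^atdata://local/sampleSchema/(?P<name>[^@/]+)@(?P<version>.+)$")
_LEGACY = re.compile(r"^local://schemas/(?P<name>[^@/]+)@(?P<version>.+)$")


def parse_schema_ref(ref: str) -> tuple[str, str]:
    """Split a schema reference into (name, version) or raise ValueError."""
    for pattern in (_NEW, _LEGACY):
        match = pattern.match(ref)
        if match:
            return match["name"], match["version"]
    raise ValueError(f"invalid schema reference: {ref!r}")


new_style = parse_schema_ref("atdata://local/sampleSchema/MySample@1.0.0")
legacy = parse_schema_ref("local://schemas/pkg.mod.MySample@1.0.0")
```

In the legacy form the name part is a dotted `module.Class` path, while the new form carries only the schema name.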

get_schema_record

local.Index.get_schema_record(ref)

Get a schema record as LocalSchemaRecord object.

Use this when you need the full LocalSchemaRecord with typed properties. For Protocol-compliant dict access, use get_schema() instead.

Parameters

Name	Type	Description	Default
ref	str	Schema reference string.	required

Returns

Name	Type	Description
	LocalSchemaRecord	LocalSchemaRecord with schema details.

Raises

Name	Type	Description
	KeyError	If schema not found.
	ValueError	If reference format is invalid.

insert_dataset

local.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)

Insert a dataset into the index (AbstractIndex protocol).

If a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.

Parameters

Name	Type	Description	Default
ds	Dataset	The Dataset to register.	required
name	str	Human-readable name for the dataset.	required
schema_ref	str | None	Optional schema reference.	`None`
**kwargs		Additional options: metadata (optional metadata dict), prefix (storage prefix; default: dataset name), cache_local (if True, cache writes locally first).	`{}`

Returns

Name	Type	Description
	LocalDatasetEntry	IndexEntry for the inserted dataset.
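
The two code paths (write then index, or index only) can be modeled with small in-memory stand-ins. `ToyStore` and `ToyIndex` are hypothetical classes written for this illustration, and the `store://` URL scheme is invented; they mimic the described control flow and are not the library's types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyStore:
    """Hypothetical stand-in for a data store; records what was written."""

    written: list = field(default_factory=list)

    def write(self, url: str, prefix: str) -> str:
        # Pretend to copy the shard and return its new location.
        new_url = f"store://{prefix}/{url.rsplit('/', 1)[-1]}"
        self.written.append(new_url)
        return new_url


@dataclass
class ToyIndex:
    """Hypothetical stand-in for an index with an optional data store."""

    data_store: Optional[ToyStore] = None
    entries: dict = field(default_factory=dict)

    def insert(self, url: str, *, name: str, prefix: Optional[str] = None) -> str:
        if self.data_store is not None:
            # With a store: write first, then index the new URL.
            url = self.data_store.write(url, prefix or name)
        # Without a store: index the existing URL as-is.
        self.entries[name] = url
        return url


plain = ToyIndex()
plain_url = plain.insert("file:///data/a.tar", name="a")

store = ToyStore()
backed = ToyIndex(data_store=store)
backed_url = backed.insert("file:///data/a.tar", name="a")
```

The `prefix or name` fallback reflects the documented default, where the storage prefix is the dataset name unless one is given.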

list_datasets

local.Index.list_datasets()

Get all dataset entries as a materialized list (AbstractIndex protocol).

Returns

Name	Type	Description
	list[LocalDatasetEntry]	List of IndexEntry for each dataset.

list_entries

local.Index.list_entries()

Get all index entries as a materialized list.

Returns

Name	Type	Description
	list[LocalDatasetEntry]	List of all LocalDatasetEntry objects in the index.

list_schemas

local.Index.list_schemas()

Get all schema records as a materialized list (AbstractIndex protocol).

Returns

Name	Type	Description
	list[dict]	List of schema records as dictionaries.

load_schema

local.Index.load_schema(ref)

Load a schema and make it available in the types namespace.

This method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the `types` namespace for easy access.

Parameters

Name	Type	Description	Default
ref	str	Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).	required

Returns

Name	Type	Description
	Type[Packable]	The decoded PackableSample subclass. Also available via `index.types.<ClassName>` after this call.

Raises

Name	Type	Description
	KeyError	If schema not found.
	ValueError	If schema cannot be decoded.

Examples

>>> # Load and use immediately
>>> MyType = index.load_schema("atdata://local/sampleSchema/MySample@1.0.0")
>>> sample = MyType(name="hello", value=42)
>>>
>>> # Or access later via namespace
>>> index.load_schema("atdata://local/sampleSchema/OtherType@1.0.0")
>>> other = index.types.OtherType(data="test")
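
Attribute-style access such as `index.types.OtherType` can be approximated with a `SimpleNamespace`. This is a sketch under assumptions: `ToyRegistry` is hypothetical, it builds plain dataclasses from a field list instead of decoding a stored schema, and the library's actual namespace object may be a different type.

```python
from dataclasses import make_dataclass
from types import SimpleNamespace


class ToyRegistry:
    """Hypothetical registry that exposes loaded classes as attributes."""

    def __init__(self) -> None:
        self.types = SimpleNamespace()

    def load(self, name: str, field_spec: list) -> type:
        cls = make_dataclass(name, field_spec)
        # Register under the class name so it is reachable as types.<name>.
        setattr(self.types, name, cls)
        return cls


registry = ToyRegistry()
OtherType = registry.load("OtherType", [("data", str)])
other = registry.types.OtherType(data="test")
```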

publish_schema

local.Index.publish_schema(sample_type, *, version=None, description=None)

Publish a schema for a sample type to Redis.

Parameters

Name	Type	Description	Default
sample_type	type	A Packable type (@packable-decorated or PackableSample subclass).	required
version	str | None	Semantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.	`None`
description	str | None	Optional human-readable description. If None, uses the class docstring.	`None`

Returns

Name	Type	Description
	str	Schema reference string: ‘atdata://local/sampleSchema/{name}@version’.

Raises

Name	Type	Description
	ValueError	If sample_type is not a dataclass.
	TypeError	If sample_type doesn’t satisfy the Packable protocol, or if a field type is not supported.
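
The auto-increment rule for `version=None` (patch bump from the latest version, or ‘1.0.0’ when nothing is published) can be sketched as follows. `next_version` is a hypothetical helper that assumes plain numeric MAJOR.MINOR.PATCH strings; how the library handles pre-release tags or malformed versions is not stated in the source.

```python
def next_version(published: list[str]) -> str:
    """Return the patch-bumped successor of the highest published version."""
    if not published:
        return "1.0.0"
    # Compare numerically so that 1.0.10 sorts after 1.0.9.
    latest = max(tuple(int(part) for part in v.split(".")) for v in published)
    major, minor, patch = latest
    return f"{major}.{minor}.{patch + 1}"


first = next_version([])
bumped = next_version(["1.0.0", "1.0.9", "1.0.10"])
```

Parsing the parts as integers matters here: a plain string comparison would rank ‘1.0.9’ above ‘1.0.10’ and produce a duplicate version.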