API Reference

Core

Core types, decorators, and dataset classes

packable Decorator to convert a regular class into a PackableSample.
PackableSample Base class for samples that can be serialized with msgpack.
DictSample Dynamic sample type providing dict-like access to raw msgpack data.
Dataset A typed dataset built on WebDataset with lens transformations.
SampleBatch A batch of samples with automatic attribute aggregation.
Lens A bidirectional transformation between two sample types.
lens Lens-based type transformations for datasets.
load_dataset Load a dataset from local files, remote URLs, or an index.
DatasetDict A dictionary of split names to Dataset instances.
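
The idea of a lens as a bidirectional transformation between two sample types can be illustrated with a small standalone sketch. It uses only the standard library and does not call this package's `Lens` class or `lens` decorator; `FullSample`, `LabelView`, and `SimpleLens` are hypothetical names invented for the illustration, and the real signatures may differ.

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # source sample type
V = TypeVar("V")  # view sample type


@dataclass(frozen=True)
class FullSample:
    """Hypothetical source type with three fields."""
    name: str
    label: int
    notes: str


@dataclass(frozen=True)
class LabelView:
    """Hypothetical narrower view exposing only two of the fields."""
    name: str
    label: int


@dataclass(frozen=True)
class SimpleLens(Generic[S, V]):
    """Illustrative lens (an assumption, not the library's class).

    `get` projects a source into a view; `put` writes an edited view
    back into the source while preserving the fields the view hides.
    """
    get: Callable[[S], V]
    put: Callable[[V, S], S]


label_lens = SimpleLens(
    get=lambda s: LabelView(name=s.name, label=s.label),
    put=lambda v, s: replace(s, name=v.name, label=v.label),
)

full = FullSample(name="img_001", label=3, notes="blurry")
view = label_lens.get(full)                             # forward direction
updated = label_lens.put(replace(view, label=7), full)  # backward direction
```

In this sketch the backward direction keeps `notes` untouched, which is what makes the transformation usable in both directions rather than a one-way projection.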

Protocols

Abstract protocols for storage backends

Packable Structural protocol for packable sample types.
IndexEntry Common interface for index entries (local or atmosphere).
AbstractIndex Protocol for index operations, implemented by LocalIndex and AtmosphereIndex.
AbstractDataStore Protocol for data storage operations.
DataSource Protocol for data sources that provide streams to Dataset.
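
As a rough illustration of what "structural protocol" means for the entries above, the following sketch uses `typing.Protocol` from the standard library. `StreamSource`, `InMemorySource`, and `total_bytes` are hypothetical names; the method set of the real `DataSource` protocol is not shown on this page and is not assumed here.

```python
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class StreamSource(Protocol):
    """Hypothetical protocol: anything that yields (name, payload) pairs."""

    def streams(self) -> Iterator[tuple[str, bytes]]: ...


class InMemorySource:
    """Satisfies StreamSource by shape alone; it does not inherit from it."""

    def __init__(self, shards: dict[str, bytes]) -> None:
        self._shards = shards

    def streams(self) -> Iterator[tuple[str, bytes]]:
        yield from self._shards.items()


def total_bytes(source: StreamSource) -> int:
    """Consume any conforming source without knowing its concrete class."""
    return sum(len(payload) for _, payload in source.streams())


source = InMemorySource({"shard-0.tar": b"abc", "shard-1.tar": b"de"})
size = total_bytes(source)
```

The point of the pattern is that a new backend only has to provide the right methods; no base class or registration step is required for code written against the protocol to accept it.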

Data Sources

Data source implementations for streaming

URLSource Data source for WebDataset-compatible URLs.
S3Source Data source for S3-compatible storage with explicit credentials.
BlobSource Data source for ATProto PDS blob storage.

Local Storage

Local Redis/S3 storage backend

local.Index Redis-backed index for tracking datasets in a repository.
local.LocalDatasetEntry Index entry for a dataset stored in the local repository.
local.S3DataStore S3-compatible data store implementing AbstractDataStore protocol.

Atmosphere

ATProto federation

AtmosphereClient ATProto client wrapper for atdata operations.
AtmosphereIndex ATProto index implementing AbstractIndex protocol.
AtmosphereIndexEntry Entry wrapper for ATProto dataset records implementing IndexEntry protocol.
PDSBlobStore PDS blob store implementing AbstractDataStore protocol.
SchemaPublisher Publishes PackableSample schemas to ATProto.
SchemaLoader Loads PackableSample schemas from ATProto.
DatasetPublisher Publishes dataset index records to ATProto.
DatasetLoader Loads dataset records from ATProto.
LensPublisher Publishes Lens transformation records to ATProto.
LensLoader Loads lens records from ATProto.
AtUri Parsed AT Protocol URI.
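
For orientation, an AT Protocol URI has the general shape `at://<authority>/<collection>/<rkey>`, where the collection and record key are optional. The sketch below parses that shape with the standard library only. `ParsedAtUri` and `parse_at_uri` are hypothetical names rather than the `AtUri` class listed above, the example DID and collection are made up, and query strings and fragments are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedAtUri:
    """Hypothetical container for the three path-like parts of an AT URI."""
    authority: str                    # a DID or a handle
    collection: Optional[str] = None  # record collection (an NSID), if present
    rkey: Optional[str] = None        # record key, if present


def parse_at_uri(uri: str) -> ParsedAtUri:
    """Split an `at://` URI into authority, collection, and record key."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Reject empty segments (e.g. a trailing slash) and anything past rkey.
    if len(parts) > 3 or any(part == "" for part in parts):
        raise ValueError(f"malformed AT URI: {uri!r}")
    return ParsedAtUri(*parts)


# Example values are invented for illustration.
record = parse_at_uri("at://did:plc:abc123/com.example.dataset/3kabc")
repo_only = parse_at_uri("at://example.com")
```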

Promotion

Local-to-atmosphere migration

promote_to_atmosphere Promote a local dataset to the atmosphere network.