PDSBlobStore

atmosphere.PDSBlobStore(client)

PDS blob store implementing the AbstractDataStore protocol.

Stores dataset shards as ATProto blobs, enabling decentralized dataset storage on the AT Protocol network.

Each shard is written to a temporary tar file, then uploaded as a blob to the user’s PDS. The returned URLs are AT URIs that can be resolved to HTTP URLs for streaming.
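The "temporary tar file" step can be pictured with the standard library alone. The following is a minimal sketch, not the store's actual implementation: `write_shard_to_temp_tar` and `list_members` are hypothetical helper names, the shard file name is made up, and the upload to the PDS is left out entirely.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def write_shard_to_temp_tar(samples: dict[str, bytes]) -> bytes:
    """Write named byte payloads into a temporary tar file and return its bytes.

    Hypothetical helper: the real store delegates shard writing to
    wds.ShardWriter and then uploads the result as a blob.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        tar_path = Path(tmpdir) / "shard-000000.tar"  # illustrative name
        with tarfile.open(tar_path, "w") as tar:
            for name, payload in samples.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
        # The tar is closed here; read it back before the directory is removed.
        # These bytes are what an upload step would send to the PDS.
        return tar_path.read_bytes()


def list_members(tar_bytes: bytes) -> list[str]:
    """Return member names of an in-memory tar archive, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return tar.getnames()
```

Because the temporary directory is removed when the function returns, nothing is left on local disk once the bytes have been handed to the uploader.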

Attributes

Name Type Description
client 'AtmosphereClient' Authenticated AtmosphereClient instance.

Examples

>>> store = PDSBlobStore(client)
>>> urls = store.write_shards(dataset, prefix="training/v1")
>>> # Returns AT URIs like:
>>> # ['at://did:plc:abc/blob/bafyrei...', ...]

Methods

Name Description
create_source Create a BlobSource for reading these AT URIs.
read_url Resolve an AT URI blob reference to an HTTP URL.
supports_streaming PDS blobs support streaming via HTTP.
write_shards Write dataset shards as PDS blobs.

create_source

atmosphere.PDSBlobStore.create_source(urls)

Create a BlobSource for reading these AT URIs.

This is a convenience method for creating a DataSource that can stream the blobs written by this store.

Parameters

Name Type Description Default
urls list[str] List of AT URIs from write_shards(). required

Returns

Name Type Description
'BlobSource' BlobSource configured for the given URLs.

Raises

Name Type Description
ValueError If URLs are not valid AT URIs.
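To illustrate how a source built from AT URIs might relate to read_url, here is a rough sketch. `SketchBlobSource` is a hypothetical stand-in, not the library's BlobSource; the prefix-only validation and the `resolve` callable are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class SketchBlobSource:
    """Hypothetical stand-in for BlobSource: AT URIs plus a resolver callable."""

    urls: list[str]
    resolve: Callable[[str], str]  # e.g. a bound read_url-style function

    def __post_init__(self) -> None:
        # Assumed validation: reject anything that is not an at:// URI.
        bad = [u for u in self.urls if not u.startswith("at://")]
        if bad:
            raise ValueError(f"not valid AT URIs: {bad}")

    def http_urls(self) -> Iterator[str]:
        # Resolution is deferred until iteration, one URI at a time.
        for url in self.urls:
            yield self.resolve(url)


def fake_resolve(url: str) -> str:
    """Toy resolver for the sketch; a real one would look up the user's PDS."""
    return "https://pds.example/" + url.rsplit("/", 1)[-1]
```

In this sketch, validation happens eagerly at construction time so that a bad URI fails fast, while the per-URI HTTP resolution stays lazy.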

read_url

atmosphere.PDSBlobStore.read_url(url)

Resolve an AT URI blob reference to an HTTP URL.

Transforms at://{did}/blob/{cid} URIs into HTTP URLs that WebDataset can stream.

Parameters

Name Type Description Default
url str AT URI in format at://{did}/blob/{cid}. required

Returns

Name Type Description
str HTTP URL for fetching the blob via PDS API.

Raises

Name Type Description
ValueError If URL format is invalid or PDS cannot be resolved.
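The transformation can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: `parse_blob_uri` and `blob_http_url` are hypothetical names, the PDS endpoint is passed in rather than resolved from the DID, and the target is assumed to be the AT Protocol `com.atproto.sync.getBlob` XRPC endpoint.

```python
import re
from urllib.parse import urlencode

# Matches at://{did}/blob/{cid}; the character classes are a loose approximation.
_AT_BLOB = re.compile(
    r"^at://(?P<did>did:[a-z]+:[A-Za-z0-9._:%-]+)/blob/(?P<cid>[A-Za-z0-9]+)$"
)


def parse_blob_uri(url: str) -> tuple[str, str]:
    """Split an AT blob URI into (did, cid), raising ValueError if malformed."""
    match = _AT_BLOB.match(url)
    if match is None:
        raise ValueError(f"not an AT blob URI: {url!r}")
    return match["did"], match["cid"]


def blob_http_url(url: str, pds_endpoint: str) -> str:
    """Build an HTTP URL for the blob on the given PDS.

    Assumption: the blob is served by com.atproto.sync.getBlob. How the real
    read_url discovers the PDS endpoint for a DID is not shown here.
    """
    did, cid = parse_blob_uri(url)
    query = urlencode({"did": did, "cid": cid})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"
```

Note that `urlencode` percent-encodes the colons in the DID, which is valid in a query string.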

supports_streaming

atmosphere.PDSBlobStore.supports_streaming()

PDS blobs support streaming via HTTP.

Returns

Name Type Description
bool True.

write_shards

atmosphere.PDSBlobStore.write_shards(
    ds,
    *,
    prefix,
    maxcount=10000,
    maxsize=3000000000.0,
    **kwargs,
)

Write dataset shards as PDS blobs.

Creates tar archives from the dataset and uploads each as a blob to the authenticated user’s PDS.

Parameters

Name Type Description Default
ds 'Dataset' The Dataset to write. required
prefix str Logical path prefix for naming (used in shard names only). required
maxcount int Maximum samples per shard. 10000
maxsize float Maximum shard size in bytes (3 GB by default; actual PDS blob limits vary, see Note). 3000000000.0
**kwargs Any Additional args passed to wds.ShardWriter. {}
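How maxcount and maxsize interact can be seen in a small simulation. This sketch assumes the rollover check happens before each sample is written, which is how wds.ShardWriter is understood to behave; `plan_shards` is a hypothetical helper, and tar header overhead is ignored.

```python
def plan_shards(
    sample_sizes: list[int],
    maxcount: int = 10000,
    maxsize: float = 3e9,
) -> list[list[int]]:
    """Group sample sizes (in bytes) into shards under an assumed rollover rule.

    A new shard starts when the current one already holds maxcount samples
    or has already reached maxsize bytes. The check runs before the write,
    so a shard can end up larger than maxsize by up to one sample.
    """
    shards: list[list[int]] = []
    count = 0
    size = 0
    for nbytes in sample_sizes:
        if not shards or count >= maxcount or size >= maxsize:
            shards.append([])
            count = 0
            size = 0
        shards[-1].append(nbytes)
        count += 1
        size += nbytes
    return shards
```

Under this rule, five 40-byte samples with maxsize=100 produce a first shard of 120 bytes, so maxsize behaves as a rollover threshold rather than a hard cap.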

Returns

Name Type Description
list[str] List of AT URIs for the written blobs, in the format at://{did}/blob/{cid}.

Raises

Name Type Description
ValueError If not authenticated.
RuntimeError If no shards were written.

Note

PDS blobs have size limits (typically 50 MB to 5 GB, depending on the PDS). Adjust maxcount and maxsize to stay within them.
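One way to choose maxsize for a given PDS is to leave room for a final sample plus some headroom. The helper below is a hypothetical sketch: `safe_maxsize` is not part of the library, the blob limit must come from your PDS operator, and it assumes a shard may exceed maxsize by up to one sample before rolling over.

```python
def safe_maxsize(
    blob_limit_bytes: int,
    largest_sample_bytes: int,
    headroom_percent: int = 5,
) -> int:
    """Return a maxsize that keeps shards under a PDS blob limit.

    Assumption: a shard can overshoot maxsize by at most one sample, so the
    largest sample is subtracted from the limit after reserving headroom
    for tar overhead.
    """
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding surprises.
    budget = blob_limit_bytes * (100 - headroom_percent) // 100
    budget -= largest_sample_bytes
    if budget <= 0:
        raise ValueError("largest sample does not fit under the blob limit")
    return budget


# Example: a 50 MB blob limit with samples of at most 2 MB each.
limit = safe_maxsize(50_000_000, 2_000_000)
# The result could then be passed along, e.g.
# store.write_shards(ds, prefix="training/v1", maxsize=limit)
```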