BlobSource

BlobSource(blob_refs, pds_endpoint=None, _endpoint_cache=dict())

Data source for ATProto PDS blob storage.

Streams dataset shards stored as blobs on an ATProto Personal Data Server. Each shard is identified by a blob reference containing the DID and CID.

This source resolves blob references to HTTP URLs and streams the content directly, supporting efficient iteration over shards without downloading everything upfront.
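As a rough illustration of what resolving a blob reference to an HTTP URL can look like, the sketch below builds a URL for the standard ATProto XRPC method com.atproto.sync.getBlob. Whether BlobSource uses exactly this endpoint internally is an assumption, and the helper name blob_url and the host pds.example.com are hypothetical. Resolving a DID to its PDS endpoint is not shown.

```python
from urllib.parse import urlencode


def blob_url(pds_endpoint: str, did: str, cid: str) -> str:
    """Build a com.atproto.sync.getBlob URL for one blob reference.

    Hypothetical helper: it only illustrates the DID + CID -> URL mapping
    and is not part of the documented BlobSource API.
    """
    # urlencode percent-encodes the colons in the DID.
    query = urlencode({"did": did, "cid": cid})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"


url = blob_url("https://pds.example.com/", "did:plc:abc123", "bafyexamplecid")
print(url)
```

The resulting URL can be opened with any streaming HTTP client, so a shard is read incrementally rather than downloaded in full first.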

Attributes

Name Type Description
blob_refs list[dict[str, str]] List of blob reference dicts with 'did' and 'cid' keys.
pds_endpoint str | None Optional PDS endpoint URL. If not provided, the endpoint is resolved from each blob's DID.

Examples

>>> source = BlobSource(
...     blob_refs=[
...         {"did": "did:plc:abc123", "cid": "bafyrei..."},
...         {"did": "did:plc:abc123", "cid": "bafyrei..."},
...     ],
... )
>>> for shard_id, stream in source.shards:
...     process(stream)

Methods

Name Description
from_refs Create BlobSource from blob reference dicts.
list_shards Return list of AT URI-style shard identifiers.
open_shard Open a single shard by its AT URI.

from_refs

BlobSource.from_refs(refs, *, pds_endpoint=None)

Create BlobSource from blob reference dicts.

Accepts blob references in the format returned by upload_blob: {"$type": "blob", "ref": {"$link": "cid"}, ...}

Also accepts a simplified format: {"did": "...", "cid": "..."}

Parameters

Name Type Description Default
refs list[dict] List of blob reference dicts. required
pds_endpoint str | None Optional PDS endpoint to use for all blobs. None

Returns

Name Type Description
BlobSource A configured BlobSource instance.

Raises

Name Type Description
ValueError If refs is empty or format is invalid.
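The two accepted formats can be told apart by their keys. The sketch below is a hypothetical normalizer (extract_cid and extract_cids are illustrative names, not part of the library) that pulls the CID out of either shape and raises ValueError for empty or unrecognized input, mirroring the documented error behavior. How from_refs determines the DID for upload_blob-style references is not stated here, so the sketch covers the CID only.

```python
def extract_cid(ref: dict) -> str:
    """Return the CID from either blob reference format (hypothetical helper)."""
    # Simplified format: {"did": "...", "cid": "..."}
    if "cid" in ref:
        return ref["cid"]
    # upload_blob format: {"$type": "blob", "ref": {"$link": "<cid>"}, ...}
    inner = ref.get("ref")
    if ref.get("$type") == "blob" and isinstance(inner, dict) and inner.get("$link"):
        return inner["$link"]
    raise ValueError(f"unrecognized blob reference format: {ref!r}")


def extract_cids(refs: list[dict]) -> list[str]:
    """Normalize a list of references, rejecting an empty list."""
    if not refs:
        raise ValueError("refs must not be empty")
    return [extract_cid(ref) for ref in refs]


cids = extract_cids([
    {"did": "did:plc:abc123", "cid": "cid-a"},
    {"$type": "blob", "ref": {"$link": "cid-b"}},
])
print(cids)
```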

list_shards

BlobSource.list_shards()

Return list of AT URI-style shard identifiers.

open_shard

BlobSource.open_shard(shard_id)

Open a single shard by its AT URI.

Parameters

Name Type Description Default
shard_id str AT URI of the shard (at://did/blob/cid). required

Returns

Name Type Description
IO[bytes] Streaming response body for reading the blob.

Raises

Name Type Description
KeyError If shard_id is not in list_shards().
ValueError If shard_id format is invalid.
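To make the at://did/blob/cid identifier format concrete, here is a minimal sketch of building and parsing such a shard ID. The function names make_shard_id and parse_shard_id are hypothetical, and the validation rules are an assumption about what an invalid format means. The KeyError case (a well-formed ID that is not in list_shards()) is not covered.

```python
def make_shard_id(did: str, cid: str) -> str:
    """Format a DID and CID as an AT URI-style shard identifier."""
    return f"at://{did}/blob/{cid}"


def parse_shard_id(shard_id: str) -> tuple[str, str]:
    """Split an at://did/blob/cid identifier into (did, cid).

    Raises ValueError when the identifier does not match that shape.
    """
    prefix = "at://"
    if not shard_id.startswith(prefix):
        raise ValueError(f"not an AT URI: {shard_id!r}")
    parts = shard_id[len(prefix):].split("/")
    # Expect exactly three segments: the DID, the literal "blob", and the CID.
    if len(parts) != 3 or parts[1] != "blob" or not parts[0] or not parts[2]:
        raise ValueError(f"expected at://did/blob/cid, got {shard_id!r}")
    return parts[0], parts[2]


shard_id = make_shard_id("did:plc:abc123", "bafyexamplecid")
did, cid = parse_shard_id(shard_id)
print(shard_id, did, cid)
```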