BlobSource
BlobSource(blob_refs, pds_endpoint=None, _endpoint_cache=dict())
Data source for ATProto PDS blob storage.
Streams dataset shards stored as blobs on an ATProto Personal Data Server. Each shard is identified by a blob reference containing the DID and CID.
This source resolves blob references to HTTP URLs and streams the content directly, supporting efficient iteration over shards without downloading everything upfront.
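As a rough illustration of what resolving a blob reference to an HTTP URL can look like, the sketch below builds a URL for the standard ATProto `com.atproto.sync.getBlob` XRPC endpoint. The helper name `blob_url` and the host `pds.example.com` are hypothetical, and whether BlobSource uses exactly this endpoint internally is an assumption.

```python
from urllib.parse import urlencode


def blob_url(pds_endpoint: str, did: str, cid: str) -> str:
    """Build a getBlob URL for one blob (hypothetical helper, not part of BlobSource)."""
    # The DID and CID are passed as query parameters; urlencode escapes the colons.
    query = urlencode({"did": did, "cid": cid})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"


# Placeholder endpoint and CID, for illustration only.
print(blob_url("https://pds.example.com", "did:plc:abc123", "bafyreiexamplecid"))
```

A URL of this shape can be opened with any streaming HTTP client, so a shard can be read incrementally rather than downloaded in full first.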
Attributes
| Name | Type | Description |
|---|---|---|
| blob_refs | list[dict[str, str]] | List of blob reference dicts with 'did' and 'cid' keys. |
| pds_endpoint | str \| None | Optional PDS endpoint URL. If not provided, it is resolved from the DID. |
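When `pds_endpoint` is omitted, the endpoint has to come from the DID. In ATProto, a DID document lists the account's PDS as a service entry whose id ends in `#atproto_pds`. The sketch below shows only that extraction step on an already-fetched document. The function name and sample document are hypothetical, and how BlobSource fetches and caches DID documents is not stated here.

```python
def pds_endpoint_from_did_doc(did_doc: dict) -> str:
    """Return the PDS URL from a DID document (hypothetical helper)."""
    for service in did_doc.get("service", []):
        # ATProto DID documents mark the PDS with the "#atproto_pds" service id.
        if str(service.get("id", "")).endswith("#atproto_pds"):
            return service["serviceEndpoint"]
    raise ValueError("DID document has no #atproto_pds service entry")


# Sample document with placeholder values, for illustration only.
sample_doc = {
    "id": "did:plc:abc123",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://pds.example.com",
        }
    ],
}
print(pds_endpoint_from_did_doc(sample_doc))
```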
Examples
>>> source = BlobSource(
...     blob_refs=[
...         {"did": "did:plc:abc123", "cid": "bafyrei..."},
...         {"did": "did:plc:abc123", "cid": "bafyrei..."},
...     ],
... )
>>> for shard_id, stream in source.shards:
...     process(stream)
Methods
| Name | Description |
|---|---|
| from_refs | Create BlobSource from blob reference dicts. |
| list_shards | Return list of AT URI-style shard identifiers. |
| open_shard | Open a single shard by its AT URI. |
from_refs
BlobSource.from_refs(refs, *, pds_endpoint=None)
Create BlobSource from blob reference dicts.
Accepts blob references in the format returned by upload_blob: {"$type": "blob", "ref": {"$link": "cid"}, ...}
Also accepts the simplified format: {"did": "...", "cid": "..."}
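To make the two accepted shapes concrete, here is a minimal sketch of pulling the CID out of either one. `extract_cid` is a hypothetical helper, not part of the documented API. It does not show how a DID is associated with an upload_blob-style reference, since that is not described here.

```python
def extract_cid(ref: dict) -> str:
    """Return the CID from either accepted reference shape (hypothetical helper)."""
    # Simplified format: {"did": "...", "cid": "..."}
    if isinstance(ref.get("cid"), str):
        return ref["cid"]
    # upload_blob format: {"$type": "blob", "ref": {"$link": "<cid>"}, ...}
    inner = ref.get("ref")
    if ref.get("$type") == "blob" and isinstance(inner, dict) and "$link" in inner:
        return inner["$link"]
    raise ValueError(f"unrecognized blob reference: {ref!r}")


# Placeholder CIDs, for illustration only.
print(extract_cid({"did": "did:plc:abc123", "cid": "bafyreiexamplecid"}))
print(extract_cid({"$type": "blob", "ref": {"$link": "bafyreiexamplecid"}}))
```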
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| refs | list[dict] | List of blob reference dicts. | required |
| pds_endpoint | str \| None | Optional PDS endpoint to use for all blobs. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | BlobSource | Configured BlobSource. |
Raises
| Name | Type | Description |
|---|---|---|
| | ValueError | If refs is empty or the format is invalid. |
list_shards
BlobSource.list_shards()
Return list of AT URI-style shard identifiers.
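The open_shard documentation gives the identifier shape as `at://did/blob/cid`. Assuming list_shards derives one identifier per blob reference in that shape, the mapping could be sketched as follows. `shard_ids` is a hypothetical stand-in, not the library's implementation.

```python
def shard_ids(blob_refs: list) -> list:
    """Map blob reference dicts to AT URI-style identifiers (hypothetical helper)."""
    # One identifier per reference, in the same order as the input list.
    return [f"at://{ref['did']}/blob/{ref['cid']}" for ref in blob_refs]


# Placeholder DIDs and CIDs, for illustration only.
refs = [
    {"did": "did:plc:abc123", "cid": "bafyreiexample1"},
    {"did": "did:plc:abc123", "cid": "bafyreiexample2"},
]
print(shard_ids(refs))
```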
open_shard
BlobSource.open_shard(shard_id)
Open a single shard by its AT URI.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| shard_id | str | AT URI of the shard (at://did/blob/cid). | required |
Returns
| Name | Type | Description |
|---|---|---|
| | IO[bytes] | Streaming response body for reading the blob. |
Raises
| Name | Type | Description |
|---|---|---|
| | KeyError | If shard_id is not in list_shards(). |
| | ValueError | If shard_id format is invalid. |
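The two error cases can be told apart as follows: a malformed identifier fails parsing (ValueError), while a well-formed identifier that is not among the known shards fails lookup (KeyError). The sketch below illustrates that split. Both helper names are hypothetical, and the order of the two checks in the real method is an assumption.

```python
def parse_shard_id(shard_id: str) -> tuple:
    """Split 'at://did/blob/cid' into (did, cid); raise ValueError if malformed."""
    prefix = "at://"
    if not shard_id.startswith(prefix):
        raise ValueError(f"not an AT URI: {shard_id!r}")
    parts = shard_id[len(prefix):].split("/")
    # Expect exactly three segments: the DID, the literal "blob", and the CID.
    if len(parts) != 3 or parts[1] != "blob" or not parts[0] or not parts[2]:
        raise ValueError(f"expected at://did/blob/cid, got {shard_id!r}")
    return parts[0], parts[2]


def check_shard(shard_id: str, known: list) -> tuple:
    """Validate format first, then membership (assumed order of checks)."""
    did_and_cid = parse_shard_id(shard_id)  # may raise ValueError
    if shard_id not in known:
        raise KeyError(shard_id)
    return did_and_cid


# Placeholder identifier, for illustration only.
known = ["at://did:plc:abc123/blob/bafyreiexamplecid"]
print(check_shard(known[0], known))
```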