local.S3DataStore

local.S3DataStore(credentials, *, bucket)

S3-compatible data store implementing the AbstractDataStore protocol.

Handles writing dataset shards to S3-compatible object storage and resolving URLs for reading.

Attributes

Name Type Description
credentials S3 credentials dictionary.
bucket Target bucket name.
_fs S3FileSystem instance.

Methods

Name Description
read_url Resolve an S3 URL for reading/streaming.
supports_streaming S3 supports streaming reads.
write_shards Write dataset shards to S3.
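
As a rough illustration of what satisfying a data-store protocol with these three methods could look like, the sketch below defines a stand-in protocol and a toy store. `DataStoreLike` and `EchoStore` are hypothetical names invented for this example. The real AbstractDataStore definition is not shown in this reference and may differ.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class DataStoreLike(Protocol):
    """Hypothetical protocol mirroring the three methods listed above."""

    def read_url(self, url: str) -> str: ...

    def supports_streaming(self) -> bool: ...

    def write_shards(self, ds: Any, *, prefix: str, **kwargs: Any) -> List[str]: ...


class EchoStore:
    """Toy store (not part of the library) that matches the protocol structurally."""

    def read_url(self, url: str) -> str:
        # Nothing to resolve: hand the URL back unchanged.
        return url

    def supports_streaming(self) -> bool:
        return False

    def write_shards(self, ds: Any, *, prefix: str, **kwargs: Any) -> List[str]:
        # Pretend a single shard was written under the given prefix.
        return [f"memory://{prefix}/shard-000000.tar"]


# runtime_checkable only verifies that the methods exist, not their signatures.
is_store = isinstance(EchoStore(), DataStoreLike)
```

Because the check is structural, a class does not need to inherit from the protocol to be accepted wherever a store is expected.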

read_url

local.S3DataStore.read_url(url)

Resolve an S3 URL for reading/streaming.

For S3-compatible stores with custom endpoints (such as Cloudflare R2 or MinIO), converts s3:// URLs to HTTPS URLs that WebDataset can stream directly.

For standard AWS S3 (no custom endpoint), URLs are returned unchanged since WebDataset’s built-in s3fs integration handles them.

Parameters

Name Type Description Default
url str S3 URL to resolve (e.g., ‘s3://bucket/path/file.tar’). required

Returns

Name Type Description
str HTTPS URL if a custom endpoint is configured, otherwise the input URL unchanged. Example: ‘s3://bucket/path’ -> ‘https://endpoint.com/bucket/path’
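
The resolution rule described above can be sketched as a standalone function. This is a minimal approximation, not the library's implementation: `resolve_s3_url` and its `endpoint` parameter are hypothetical, and path-style addressing (bucket as the first path segment) is assumed because it matches the example mapping in the Returns description.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_s3_url(url: str, endpoint: Optional[str] = None) -> str:
    """Hypothetical stand-in for the documented read_url behavior."""
    if not endpoint:
        # Standard AWS S3: leave the s3:// URL untouched.
        return url
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        # Not an S3 URL, so there is nothing to convert.
        return url
    # Path-style addressing (assumed): <endpoint>/<bucket>/<key>
    return f"{endpoint.rstrip('/')}/{parsed.netloc}{parsed.path}"


custom = resolve_s3_url("s3://bucket/path/file.tar", "https://endpoint.com")
plain = resolve_s3_url("s3://bucket/path/file.tar")
```

Here `custom` becomes an HTTPS URL under the endpoint, while `plain` is returned as given because no endpoint was supplied.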

supports_streaming

local.S3DataStore.supports_streaming()

S3 supports streaming reads.

Returns

Name Type Description
bool Always True.

write_shards

local.S3DataStore.write_shards(ds, *, prefix, cache_local=False, **kwargs)

Write dataset shards to S3.

Parameters

Name Type Description Default
ds Dataset The Dataset to write. required
prefix str Path prefix within bucket (e.g., ‘datasets/mnist/v1’). required
cache_local bool If True, write shards to local disk first, then copy them to S3. False
**kwargs Additional keyword arguments passed to wds.ShardWriter (e.g., maxcount). {}

Returns

Name Type Description
list[str] List of S3 URLs for the written shards.

Raises

Name Type Description
RuntimeError If no shards were written.
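
The return and error contract above can be illustrated with a small helper that builds S3 URLs from a bucket, a prefix, and the names of the shards that were written. This is only a sketch under assumptions: `shard_urls` and the shard file names are hypothetical, and the way the library actually names or uploads shards is not specified in this reference.

```python
from typing import Iterable, List


def shard_urls(bucket: str, prefix: str, shard_names: Iterable[str]) -> List[str]:
    """Hypothetical helper mirroring the documented return/raise contract."""
    clean_prefix = prefix.strip("/")
    urls = [f"s3://{bucket}/{clean_prefix}/{name}" for name in shard_names]
    if not urls:
        # Matches the documented failure mode: nothing was written.
        raise RuntimeError("no shards were written")
    return urls


# Shard names below are illustrative only.
urls = shard_urls("my-bucket", "datasets/mnist/v1", ["data-000000.tar", "data-000001.tar"])
```

Raising on an empty result surfaces a silently empty dataset at write time rather than later, when a reader tries to open the shards.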