local.S3DataStore
local.S3DataStore(credentials, *, bucket)
S3-compatible data store implementing the AbstractDataStore protocol.
Handles writing dataset shards to S3-compatible object storage and resolving URLs for reading.
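AbstractDataStore itself is not shown on this page. As a rough sketch, assuming the protocol consists only of the three methods documented below, a structural typing.Protocol could look like the following. DataStoreLike and InMemoryStore are hypothetical names used for illustration, and the real protocol may define more or differ in signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataStoreLike(Protocol):
    """Hypothetical stand-in for AbstractDataStore (assumed method set)."""

    def write_shards(self, ds: Any, *, prefix: str, **kwargs: Any) -> list[str]: ...

    def read_url(self, url: str) -> str: ...

    def supports_streaming(self) -> bool: ...


class InMemoryStore:
    """Toy store that only illustrates the shape of the interface."""

    def __init__(self) -> None:
        self.objects: dict[str, Any] = {}

    def write_shards(self, ds: Any, *, prefix: str, **kwargs: Any) -> list[str]:
        # Store everything as a single "shard" keyed by a made-up URL.
        url = f"mem://{prefix}/shard-000000.tar"
        self.objects[url] = list(ds)
        return [url]

    def read_url(self, url: str) -> str:
        return url  # nothing to resolve for an in-memory store

    def supports_streaming(self) -> bool:
        return False
```

With @runtime_checkable, isinstance(InMemoryStore(), DataStoreLike) is True because Python only checks that the methods exist, not their signatures.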
Attributes
credentials: S3 credentials dictionary.
bucket: Target bucket name.
_fs: S3FileSystem instance.
Methods
read_url
local.S3DataStore.read_url(url)
Resolve an S3 URL for reading/streaming.
For S3-compatible stores with a custom endpoint (e.g. Cloudflare R2 or MinIO), converts s3:// URLs to HTTPS URLs that WebDataset can stream directly.
For standard AWS S3 (no custom endpoint), URLs are returned unchanged, since WebDataset's built-in s3fs integration handles them.
Parameters
url (str, required): S3 URL to resolve, e.g. 's3://bucket/path/file.tar'.
Returns
str: The HTTPS URL if a custom endpoint is configured; otherwise the input URL unchanged.
Example (with a custom endpoint configured)
's3://bucket/path' -> 'https://endpoint.com/bucket/path'
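A minimal stdlib sketch of the conversion read_url describes, assuming path-style addressing (https://<endpoint>/<bucket>/<key>) as in the example. The function resolve_s3_url and its endpoint_url argument are hypothetical; the real method presumably takes the endpoint from the store's credentials. Passing non-s3:// URLs through untouched is also an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_s3_url(url: str, endpoint_url: Optional[str] = None) -> str:
    """Sketch of read_url's documented behaviour, not the library code."""
    if not endpoint_url:
        # Standard AWS S3: leave s3:// URLs for WebDataset's own S3 handling.
        return url
    parts = urlsplit(url)
    if parts.scheme != "s3":
        return url  # assumption: non-S3 URLs pass through unchanged
    bucket, key = parts.netloc, parts.path.lstrip("/")
    # Path-style addressing: https://<endpoint>/<bucket>/<key>
    return f"{endpoint_url.rstrip('/')}/{bucket}/{key}"
```

For example, resolve_s3_url("s3://bucket/path", "https://endpoint.com") gives "https://endpoint.com/bucket/path", while omitting the endpoint returns the s3:// URL as-is.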
supports_streaming
local.S3DataStore.supports_streaming()
S3 supports streaming reads.
write_shards
local.S3DataStore.write_shards(ds, *, prefix, cache_local=False, **kwargs)
Write dataset shards to S3.
Parameters
ds (Dataset, required): The Dataset to write.
prefix (str, required): Path prefix within the bucket, e.g. 'datasets/mnist/v1'.
cache_local (bool, default False): If True, write shards to local disk first, then copy them to S3.
**kwargs: Additional keyword arguments passed to wds.ShardWriter (e.g. maxcount).
Returns
list[str]: S3 URLs of the written shards.
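A stdlib-only sketch of what cache_local=True describes: samples are grouped into tar shards of at most maxcount members in a temporary local directory, each finished shard is copied to the bucket, and the resulting s3:// URLs are returned. The function write_shards_cached, the upload(local_path, key) hook, the (name, payload) sample format, and the shard-NNNNNN.tar naming are all hypothetical. The real method delegates shard writing to wds.ShardWriter and presumably copies through its S3 filesystem.

```python
import io
import os
import tarfile
import tempfile
from typing import Callable, Iterable


def write_shards_cached(
    samples: Iterable[tuple[str, bytes]],
    *,
    bucket: str,
    prefix: str,
    upload: Callable[[str, str], None],
    maxcount: int = 1000,
) -> list[str]:
    """Hypothetical sketch of write_shards(..., cache_local=True).

    Each sample is a (member_name, payload) pair. `upload(local_path, key)`
    stands in for copying one finished shard into the bucket.
    """
    if maxcount < 1:
        raise ValueError("maxcount must be at least 1")
    urls: list[str] = []
    batch: list[tuple[str, bytes]] = []

    with tempfile.TemporaryDirectory() as tmpdir:

        def flush() -> None:
            # Write the current batch as one tar shard on local disk.
            name = f"shard-{len(urls):06d}.tar"  # hypothetical naming scheme
            local_path = os.path.join(tmpdir, name)
            with tarfile.open(local_path, "w") as tar:
                for member_name, payload in batch:
                    info = tarfile.TarInfo(name=member_name)
                    info.size = len(payload)
                    tar.addfile(info, io.BytesIO(payload))
            # Copy the finished shard to the bucket, then record its URL.
            key = f"{prefix.strip('/')}/{name}"
            upload(local_path, key)
            urls.append(f"s3://{bucket}/{key}")
            batch.clear()

        for sample in samples:
            batch.append(sample)
            if len(batch) == maxcount:
                flush()
        if batch:
            flush()  # final, partially filled shard

    return urls
```

Keeping upload as a parameter lets the sketch run without network access; a real store would replace it with an actual S3 copy.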