URLSource
URLSource(url)
Data source for WebDataset-compatible URLs.
Wraps WebDataset’s gopen, which opens URLs using built-in handlers for schemes such as http, https, pipe, gs, hf, and sftp. Supports brace expansion for shard patterns such as “data-{000..099}.tar”.
This is the default source type when a string URL is passed to Dataset.
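To illustrate how scheme-based dispatch of this kind can work, here is a minimal sketch. The `HANDLERS` table and the `pick_handler` function are hypothetical and only return a label; WebDataset's real gopen returns an open stream, and its actual handler set and names may differ.

```python
from urllib.parse import urlparse

# Hypothetical scheme -> handler-label table (illustration only, not gopen's table).
HANDLERS = {
    "http": "http",
    "https": "http",  # http and https share one stream handler in this sketch
    "gs": "gcs",
    "hf": "huggingface",
    "sftp": "sftp",
    "file": "local",
}


def pick_handler(url: str) -> str:
    """Return the handler label for a URL (simplified sketch, not gopen itself)."""
    # "pipe:cmd" means: run cmd in a shell and read its stdout as the stream.
    if url.startswith("pipe:"):
        return "shell-pipe"
    scheme = urlparse(url).scheme
    # A bare path (no scheme) or a one-letter scheme such as a Windows drive
    # letter ("C:") is treated as a local file.
    if scheme == "" or len(scheme) == 1:
        return "local"
    try:
        return HANDLERS[scheme]
    except KeyError:
        raise ValueError(f"no handler for scheme {scheme!r}") from None
```

For example, `pick_handler("gs://bucket/train-000.tar")` returns `"gcs"`, while an unlisted scheme such as `ftp://` raises `ValueError`.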
Attributes
| Name | Type | Description |
|---|---|---|
| url | str | URL or brace pattern for the shards. |
Examples
>>> source = URLSource("https://example.com/train-{000..009}.tar")
>>> for shard_id, stream in source.shards:
...     print(f"Streaming {shard_id}")
Methods
| Name | Description |
|---|---|
| list_shards | Expand brace pattern and return list of shard URLs. |
| open_shard | Open a single shard by URL. |
list_shards
URLSource.list_shards()
Expand brace pattern and return list of shard URLs.
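The expansion step can be sketched for the common numeric-range case. `expand_numeric_braces` is a hypothetical helper, not the library's implementation: it handles only `{start..end}` ranges (zero-padded when an endpoint is written with a leading zero), whereas a full implementation such as the `braceexpand` package also supports comma alternatives like `{a,b}` and nesting.

```python
import re

# Matches a numeric range such as {000..099}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_braces(pattern: str) -> list[str]:
    """Expand every {start..end} numeric range in pattern (simplified sketch)."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # nothing left to expand
    start_s, end_s = match.group(1), match.group(2)
    start, end = int(start_s), int(end_s)
    # Zero-pad to the wider endpoint when either endpoint has a leading zero.
    padded = any(len(s) > 1 and s[0] == "0" for s in (start_s, end_s))
    width = max(len(start_s), len(end_s)) if padded else 0
    step = 1 if end >= start else -1
    head, tail = pattern[: match.start()], pattern[match.end():]
    # Expand the remainder once; multiple ranges combine as a product.
    tails = expand_numeric_braces(tail)
    return [
        f"{head}{str(n).zfill(width)}{rest}"
        for n in range(start, end + step, step)
        for rest in tails
    ]
```

For instance, `expand_numeric_braces("https://example.com/train-{000..002}.tar")` yields the three URLs ending in `train-000.tar`, `train-001.tar`, and `train-002.tar`.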
open_shard
URLSource.open_shard(shard_id)
Open a single shard by URL.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| shard_id | str | URL of the shard to open. | required |
Returns
| Name | Type | Description |
|---|---|---|
|  | IO[bytes] | File-like stream from gopen. |
Raises
| Name | Type | Description |
|---|---|---|
|  | KeyError | If shard_id is not in list_shards(). |
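The documented open_shard contract (accept only IDs returned by list_shards(), otherwise raise KeyError) can be sketched with a hypothetical `ListSource` class over local files. It is not URLSource: it takes an explicit list of paths instead of a brace pattern and uses the built-in `open` in place of gopen.

```python
import os
import tempfile
from typing import IO


class ListSource:
    """Hypothetical source over an explicit list of local shard paths.

    Mirrors the documented list_shards/open_shard contract; it is not URLSource.
    """

    def __init__(self, paths: list[str]) -> None:
        self.paths = list(paths)

    def list_shards(self) -> list[str]:
        return list(self.paths)

    def open_shard(self, shard_id: str) -> IO[bytes]:
        # Reject IDs that list_shards() would not return, as documented.
        if shard_id not in self.list_shards():
            raise KeyError(shard_id)
        # Local stand-in for gopen: open the file in binary read mode.
        return open(shard_id, "rb")


# Usage: write one small shard to a temp dir, read it back, then try a bad ID.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shard-000.tar")
    with open(path, "wb") as f:
        f.write(b"demo")
    source = ListSource([path])
    with source.open_shard(path) as stream:
        data = stream.read()
    missing_rejected = False
    try:
        source.open_shard(os.path.join(tmp, "missing.tar"))
    except KeyError:
        missing_rejected = True
```

Checking membership before opening means an unknown ID fails fast with KeyError instead of surfacing as a backend-specific I/O error.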