Deployment Guide
This guide covers deploying atdata in production environments, including Redis setup for LocalIndex, S3 storage configuration, and ATProto publishing considerations.
Local Storage Deployment
The local storage backend uses Redis for metadata indexing and S3-compatible storage for dataset files.
Redis Setup
Requirements
- Redis 6.0+ (for Redis-OM compatibility)
- Sufficient memory for index metadata (typically < 100MB for most deployments)
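Before wiring up LocalIndex it is worth confirming that the server actually meets the version requirement. A small check using redis-py (a sketch; `version_ok` and `check_redis` are helper names introduced here, not part of atdata):

```python
def version_ok(version: str, min_major: int = 6) -> bool:
    """True when the reported redis_version satisfies the Redis-OM minimum."""
    return int(version.split(".")[0]) >= min_major

def check_redis(host: str = "localhost", port: int = 6379) -> str:
    """Connect, verify the server version, and return it."""
    from redis import Redis  # redis-py

    version = Redis(host=host, port=port).info("server")["redis_version"]
    if not version_ok(version):
        raise RuntimeError(f"Redis {version} is too old for Redis-OM (need 6.0+)")
    return version

# Usage against the container below: print(check_redis())
```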
Docker Deployment
# Basic Redis
docker run -d \
--name atdata-redis \
-p 6379:6379 \
-v redis-data:/data \
redis:7-alpine \
redis-server --appendonly yes
# With password
docker run -d \
--name atdata-redis \
-p 6379:6379 \
-v redis-data:/data \
redis:7-alpine \
  redis-server --appendonly yes --requirepass yourpassword
Configuration
from redis import Redis
from atdata.local import LocalIndex

# Basic connection
redis = Redis(host="localhost", port=6379)
index = LocalIndex(redis=redis)

# With authentication
redis = Redis(
    host="redis.example.com",
    port=6379,
    password="yourpassword",
    ssl=True,  # For production
)
index = LocalIndex(redis=redis)
Redis Clustering
For high-availability deployments:
from redis.cluster import RedisCluster

# Redis Cluster connection
redis = RedisCluster(
    host="redis-cluster.example.com",
    port=6379,
    password="yourpassword",
)
index = LocalIndex(redis=redis)
Redis-OM (used internally) supports Redis Cluster mode. Ensure all nodes have the same configuration.
S3 Storage Setup
AWS S3
from atdata.local import S3DataStore

# Using environment credentials (recommended for AWS)
# Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
store = S3DataStore(
    bucket="my-atdata-bucket",
    prefix="datasets/",
)

# Explicit credentials
store = S3DataStore(
    bucket="my-atdata-bucket",
    prefix="datasets/",
    credentials={
        "AWS_ACCESS_KEY_ID": "...",
        "AWS_SECRET_ACCESS_KEY": "...",
        "AWS_DEFAULT_REGION": "us-west-2",
    },
)
S3-Compatible Storage (MinIO, Cloudflare R2, etc.)
store = S3DataStore(
    bucket="my-bucket",
    prefix="datasets/",
    endpoint_url="https://s3.example.com",
    credentials={
        "AWS_ACCESS_KEY_ID": "...",
        "AWS_SECRET_ACCESS_KEY": "...",
    },
)
MinIO Deployment
# Docker deployment
docker run -d \
--name minio \
-p 9000:9000 \
-p 9001:9001 \
-v minio-data:/data \
-e MINIO_ROOT_USER=minioadmin \
-e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data --console-address ":9001"

store = S3DataStore(
    bucket="atdata",
    endpoint_url="http://localhost:9000",
    credentials={
        "AWS_ACCESS_KEY_ID": "minioadmin",
        "AWS_SECRET_ACCESS_KEY": "minioadmin",
    },
)
Production Checklist
ATProto Deployment
Account Setup
- Create a Bluesky account or use your existing account
- Generate an app-specific password at bsky.app/settings/app-passwords
- Never use your main account password in code
Security: Always use app passwords, never your main password. App passwords can be revoked without affecting your account.
Authentication Patterns
Environment Variables (Recommended)
import os
from atdata.atmosphere import AtmosphereClient
client = AtmosphereClient()
client.login(
    os.environ["ATPROTO_HANDLE"],
    os.environ["ATPROTO_APP_PASSWORD"],
)
Session Persistence
For long-running services, persist and reuse sessions:
import os
from pathlib import Path

from atdata.atmosphere import AtmosphereClient

SESSION_FILE = Path("~/.atdata/session").expanduser()
handle = os.environ["ATPROTO_HANDLE"]
app_password = os.environ["ATPROTO_APP_PASSWORD"]

client = AtmosphereClient()
if SESSION_FILE.exists():
    # Restore the existing session
    session_string = SESSION_FILE.read_text()
    try:
        client.login_with_session(session_string)
    except Exception:
        # Session expired; re-authenticate and persist the new session
        client.login(handle, app_password)
        SESSION_FILE.parent.mkdir(parents=True, exist_ok=True)
        SESSION_FILE.write_text(client.export_session())
else:
    # Initial login
    client.login(handle, app_password)
    SESSION_FILE.parent.mkdir(parents=True, exist_ok=True)
    SESSION_FILE.write_text(client.export_session())
Custom PDS Deployment
For self-hosted ATProto infrastructure:
client = AtmosphereClient(base_url="https://pds.example.com")
client.login("handle.example.com", "app-password")
See the ATProto PDS documentation for self-hosting setup.
Rate Limiting Considerations
ATProto has rate limits. For bulk operations:
- Space out record creation (1-2 per second for bulk uploads)
- Use batch operations where available
- Implement exponential backoff for retries
- Consider blob storage limits (~50MB per blob)
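The throttling and backoff advice above can be wrapped in a small retry helper. This is a generic sketch, not an atdata API: `with_backoff` is a name introduced here, and the callable you pass stands in for `index.insert_dataset` or any other rate-limited call. The `sleep` parameter is injectable so the helper can be exercised without real delays.

```python
import time

def with_backoff(fn, *, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying with exponential backoff after each failure."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

For simple bulk uploads a fixed delay between calls, as in the loop that follows, is often sufficient on its own.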
import time

for i, dataset in enumerate(datasets_to_publish):
    index.insert_dataset(dataset, name=f"dataset-{i}", ...)
    time.sleep(1)  # Rate limiting
Docker Compose Example
Complete local deployment with Redis and MinIO:
# docker-compose.yml
version: '3.8'

services:
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_USER}
      MINIO_ROOT_PASSWORD: ${MINIO_PASSWORD}
    volumes:
      - minio-data:/data

volumes:
  redis-data:
  minio-data:

# .env
REDIS_PASSWORD=your-redis-password
MINIO_USER=minioadmin
MINIO_PASSWORD=your-minio-password
Monitoring
Redis Metrics
Key metrics to monitor:
- used_memory: Memory usage
- connected_clients: Active connections
- keyspace_hits / keyspace_misses: Cache efficiency
- aof_last_write_status: Persistence health
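All of these fields are exposed by redis-py's `Redis.info()`, so they can be scraped into whatever monitoring system you use. A minimal sketch (`summarize` is a helper name introduced here):

```python
def summarize(info: dict) -> dict:
    """Pull the key health fields out of a Redis INFO payload."""
    hits, misses = info["keyspace_hits"], info["keyspace_misses"]
    total = hits + misses
    return {
        "used_memory": info["used_memory"],
        "connected_clients": info["connected_clients"],
        "hit_rate": hits / total if total else 0.0,
        "aof_last_write_status": info.get("aof_last_write_status", "n/a"),
    }

# Usage with redis-py:
#   from redis import Redis
#   print(summarize(Redis(host="localhost", port=6379).info()))
```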
redis-cli INFO | grep -E "used_memory|connected_clients|keyspace"
S3 Metrics
- Request counts and latency
- Error rates (4xx, 5xx)
- Storage usage by prefix
- Data transfer costs
Security Best Practices
- Network Isolation: Run Redis and S3 in private networks
- TLS Everywhere: Encrypt connections to Redis and S3
- Credential Rotation: Rotate API keys and passwords regularly
- Access Logging: Enable S3 access logging for audit trails
- Least Privilege: Use minimal IAM permissions for S3 access
S3 IAM Policy Example
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-atdata-bucket",
        "arn:aws:s3:::my-atdata-bucket/*"
      ]
    }
  ]
}
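If you manage several environments, generating this policy programmatically keeps bucket names consistent. A plain-Python sketch (`atdata_s3_policy` is a helper name introduced here; no AWS SDK required):

```python
import json

def atdata_s3_policy(bucket: str) -> dict:
    """Least-privilege policy for an atdata bucket: get, put, and list only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",       # ListBucket targets the bucket
                    f"arn:aws:s3:::{bucket}/*",     # Get/Put target the objects
                ],
            }
        ],
    }

print(json.dumps(atdata_s3_policy("my-atdata-bucket"), indent=2))
```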