Cloud GPU Services for AI/ML Workloads
When a local GPU is not available, cloud GPU services offer on-demand access to powerful hardware for training and inference workloads.
Services
- Modal — Serverless GPU compute with per-second billing. Good for inference endpoints and batch jobs. modal.com/pricing
- RunPod — On-demand and spot GPU instances with persistent storage. Supports custom Docker images. runpod.io
Choosing a Service
- Use Modal for event-driven inference (pay per call, no idle costs).
- Use RunPod for long-running training jobs or when you need a persistent GPU environment.
- For AWS-native workloads, consider EC2 p3/g4dn instances or Amazon SageMaker.
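The trade-off between per-call billing and a persistent instance largely comes down to utilization. The sketch below estimates where the two break even. All rates are hypothetical placeholders, not quotes from Modal, RunPod, or AWS, and the function names are made up for illustration; substitute current prices from each provider's pricing page.

```python
def serverless_cost(calls: int, seconds_per_call: float, rate_per_second: float) -> float:
    """Cost when billed only for the GPU seconds actually used."""
    return calls * seconds_per_call * rate_per_second


def always_on_cost(hours: float, rate_per_hour: float) -> float:
    """Cost of keeping an instance running, whether or not it is busy."""
    return hours * rate_per_hour


def break_even_utilization(rate_per_second: float, rate_per_hour: float) -> float:
    """Fraction of wall-clock time the GPU must be busy before the
    always-on instance becomes cheaper than per-second billing."""
    return rate_per_hour / (rate_per_second * 3600)


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the arithmetic.
    per_second = 0.001   # serverless rate in $/s (assumption)
    per_hour = 1.80      # persistent instance rate in $/h (assumption)

    monthly_serverless = serverless_cost(10_000, 2.0, per_second)
    monthly_instance = always_on_cost(720, per_hour)
    threshold = break_even_utilization(per_second, per_hour)

    print(f"serverless: ${monthly_serverless:.2f}")
    print(f"always-on:  ${monthly_instance:.2f}")
    print(f"break-even utilization: {threshold:.0%}")
```

Below the break-even utilization, per-second billing is likely cheaper; above it, a persistent instance usually wins. The model ignores cold-start time, storage, and data transfer charges.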
Notes
- Compare spot/interruptible pricing for large training runs; the savings can be significant, provided the job checkpoints regularly and can resume after an interruption.
- Ensure the CUDA version in your Docker image or environment is supported by the NVIDIA driver on the chosen GPU host.
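The CUDA check can be scripted before launching a job. The sketch below assumes the usual rule that a driver supports CUDA toolkit versions up to the maximum it reports (for example, the "CUDA Version" field in the nvidia-smi header). The helper names and version strings are hypothetical, and the check is deliberately conservative: it does not model NVIDIA's minor-version or forward-compatibility packages.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, int]:
    """Turn a string such as '12.2' or '12.2.1' into (major, minor)."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def image_fits_driver(image_cuda: str, driver_max_cuda: str) -> bool:
    """Return True if the image's CUDA toolkit version does not exceed
    the highest CUDA version the host driver reports supporting."""
    return parse_version(image_cuda) <= parse_version(driver_max_cuda)


if __name__ == "__main__":
    # Example values only; read the real ones from your image tag
    # and from the provider's host (e.g. the nvidia-smi header).
    image_cuda = "11.8"
    driver_max_cuda = "12.2"

    if image_fits_driver(image_cuda, driver_max_cuda):
        print("Image CUDA version is within the driver's supported range.")
    else:
        print("Image needs a newer driver; pick another host or rebuild the image.")
```

Running a check like this as the first step of a job avoids paying for GPU time on a container that fails at the first CUDA call.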