H100 Cloud Pricing Comparison 2026: 9 Providers, 140% Variance
The same NVIDIA H100 SXM costs $2.99/hr on TensorDock and $7.18/hr on AWS. We compared 18 GPU models across 9 cloud providers to find where the money goes — and where you can save.
The Price Spread Is Worse Than You Think
GPU cloud pricing is not a commodity market. CPU instances on AWS, GCP, and Azure are priced within 5-10% of each other, but GPU instances show a 140% spread: the most expensive provider charges about 2.4x what the cheapest charges for the same hardware. For an 8-GPU H100 cluster running 24/7, switching from the most expensive provider to the cheapest saves roughly $293,600/year.
Hourly Rates: Every GPU, Every Provider
On-demand rates in USD per GPU-hour. Spot/preemptible pricing (where available) can be 40-70% lower but comes with interruption risk. The Spread column shows how much the most expensive provider exceeds the cheapest.
| GPU | TensorDock | FluidStack | Vast.ai | RunPod | Lambda | CoreWeave | GCP | Azure | AWS | Spread |
|---|---|---|---|---|---|---|---|---|---|---|
| H100 SXM | $2.99 | $3.19 | $3.39 | $3.99 | $4.59 | $4.79 | $6.78 | $6.98 | $7.18 | 140% |
| H200 SXM | $3.74 | $3.99 | $4.24 | $4.99 | $5.74 | $5.99 | $8.48 | $8.73 | $8.98 | 140% |
| A100 SXM 80GB | $1.42 | $1.51 | $1.61 | $1.89 | $2.17 | $2.27 | $3.21 | $3.31 | $3.40 | 140% |
| RTX 4090 | $0.44 | $0.47 | $0.50 | $0.59 | $0.68 | $0.71 | $1.00 | $1.03 | $1.06 | 140% |
| L40S | $0.97 | $1.03 | $1.10 | $1.29 | $1.48 | $1.55 | $2.19 | $2.26 | $2.32 | 140% |
| AMD MI300X | $2.63 | $2.80 | $2.98 | $3.50 | $4.02 | $4.20 | $5.95 | $6.13 | $6.30 | 140% |
| T4 | $0.28 | $0.30 | $0.31 | $0.37 | $0.43 | $0.44 | $0.63 | $0.65 | $0.67 | 140% |
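The Spread column can be reproduced from any row of the table. Below is a minimal sketch using the H100 SXM row; the function name `spread_pct` is ours, and we assume spread means (highest − lowest) / lowest, which matches the 140% shown.

```python
# On-demand H100 SXM rates in USD per GPU-hour, copied from the table above.
H100_RATES = {
    "TensorDock": 2.99, "FluidStack": 3.19, "Vast.ai": 3.39,
    "RunPod": 3.99, "Lambda": 4.59, "CoreWeave": 4.79,
    "GCP": 6.78, "Azure": 6.98, "AWS": 7.18,
}


def spread_pct(rates: dict[str, float]) -> float:
    """Percent by which the highest rate exceeds the lowest (assumed definition)."""
    low, high = min(rates.values()), max(rates.values())
    return (high - low) / low * 100


print(f"H100 SXM spread: {spread_pct(H100_RATES):.0f}%")  # prints "H100 SXM spread: 140%"
```

Swapping in another row's rates gives that GPU's spread. Because the listed rates are rounded to cents, some rows land a fraction of a percent either side of 140.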
Monthly Cost for an 8-GPU Cluster
Most ML training jobs need multi-GPU clusters. Here's how much an 8-GPU cluster saves per month (730 hours) and per year at the cheapest provider compared with the most expensive.
| GPU (8x cluster) | Cheapest | Most Expensive | Monthly Savings | Annual Savings |
|---|---|---|---|---|
| H100 SXM | TensorDock ($2.99/hr) | AWS ($7.18/hr) | $24,467 | $293,600 |
| H200 SXM | TensorDock ($3.74/hr) | AWS ($8.98/hr) | $30,599 | $367,184 |
| A100 SXM 80GB | TensorDock ($1.42/hr) | AWS ($3.40/hr) | $11,589 | $139,074 |
| RTX 4090 | TensorDock ($0.44/hr) | AWS ($1.06/hr) | $3,618 | $43,415 |
| L40S | TensorDock ($0.97/hr) | AWS ($2.32/hr) | $7,910 | $94,923 |
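The savings figures follow from the per-GPU rate difference multiplied by GPU count and hours. The sketch below uses the rounded rates from the table; `cluster_savings` is an illustrative name, not part of any tool mentioned here. Results come out within a few dollars of the monthly figures above. One likely cause of the small gap is that the table was computed from unrounded rates, but that is our assumption.

```python
HOURS_PER_MONTH = 730  # monthly convention used in this article
GPUS = 8


def cluster_savings(cheap_rate, expensive_rate, gpus=GPUS, hours=HOURS_PER_MONTH):
    """Return (monthly, annual) savings in USD from switching providers.

    Rates are in USD per GPU-hour; annual is simply 12 x monthly.
    """
    monthly = (expensive_rate - cheap_rate) * gpus * hours
    return monthly, monthly * 12


monthly, annual = cluster_savings(2.99, 7.18)  # H100 SXM: TensorDock vs. AWS
print(f"H100 SXM x8: ${monthly:,.0f}/month, ${annual:,.0f}/year")
```

Passing your own GPU count or monthly hours (for example, a cluster that runs only during working hours) shows how quickly the gap shrinks at lower usage.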
Why Prices Vary So Much
1. Hyperscaler Margin Structure
AWS, GCP, and Azure price GPUs at 1.7-1.8x the baseline (the 1.0x reference rate, which matches the RunPod column in the table above) because they bundle networking, storage, and managed services into the instance cost. You pay for the ecosystem, not just the GPU. For teams already deep in a hyperscaler, the premium buys operational simplicity.
2. GPU-Native Providers
RunPod, Lambda, and CoreWeave operate GPU-optimized infrastructure with lower overhead. Prices run 1.0-1.2x baseline. You get bare-metal-like performance with basic orchestration. The tradeoff: fewer managed services, less geographic diversity.
3. Marketplace and Spot Providers
TensorDock, FluidStack, and Vast.ai aggregate idle GPU capacity from data centers worldwide. Prices run 0.75-0.85x baseline — the cheapest option. The tradeoff: variable availability, potential interruptions, and less predictable performance (shared infrastructure).
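The three tiers can be read as multipliers applied to one baseline rate per GPU. The sketch below makes that explicit. The exact per-provider multipliers are not stated in this article; they are inferred by dividing each column of the rate table by the RunPod column, so treat them as an assumption.

```python
# Per-provider multipliers relative to the 1.0x baseline.
# ASSUMPTION: inferred from the rate table (each column / RunPod column);
# the article itself only gives a range per tier.
MULTIPLIERS = {
    # Marketplace and spot providers (0.75-0.85x)
    "TensorDock": 0.75, "FluidStack": 0.80, "Vast.ai": 0.85,
    # GPU-native providers (1.0-1.2x)
    "RunPod": 1.00, "Lambda": 1.15, "CoreWeave": 1.20,
    # Hyperscalers (1.7-1.8x)
    "GCP": 1.70, "Azure": 1.75, "AWS": 1.80,
}


def estimated_rate(baseline: float, provider: str) -> float:
    """Estimate a provider's USD/GPU-hour rate from the baseline rate."""
    return round(baseline * MULTIPLIERS[provider], 2)


H100_BASELINE = 3.99  # RunPod H100 SXM rate from the table
for provider in MULTIPLIERS:
    print(f"{provider:<11} ${estimated_rate(H100_BASELINE, provider):.2f}/hr")
```

With a baseline of 3.99 this reproduces the H100 SXM row of the table, which is what suggests the multipliers in the first place.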
The Hidden Cost: Electricity
Cloud GPU pricing only covers compute rental. But an H100 SXM draws up to 700W at peak. At $0.12/kWh (US average), that's $0.084/hr per GPU in electricity alone. For an 8-GPU cluster running 24/7 at peak draw, electricity adds about $5,900/year ($0.084 × 8 GPUs × 8,760 hours) that doesn't appear on your cloud bill.
In Europe (average $0.25/kWh), that electricity cost doubles. For on-premise deployments, electricity is typically 15-25% of the total cost of ownership.
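The electricity figures all come from one formula: kilowatts × price per kWh × GPU count × hours. A minimal sketch follows, assuming every GPU draws its 700W peak continuously, which makes it an upper bound. The function name `electricity_cost` is illustrative.

```python
HOURS_PER_YEAR = 8760  # 24 hours x 365 days


def electricity_cost(watts, usd_per_kwh, gpus=1, hours=1.0):
    """Electricity cost in USD for `gpus` devices each drawing `watts`.

    ASSUMPTION: constant draw at the given wattage for the whole period.
    """
    return watts / 1000 * usd_per_kwh * gpus * hours


per_gpu_hour = electricity_cost(700, 0.12)                           # US average price
us_year = electricity_cost(700, 0.12, gpus=8, hours=HOURS_PER_YEAR)
eu_year = electricity_cost(700, 0.25, gpus=8, hours=HOURS_PER_YEAR)  # European average

print(f"Per GPU-hour (US):   ${per_gpu_hour:.3f}")
print(f"8 GPUs, 1 year (US): ${us_year:,.0f}")
print(f"8 GPUs, 1 year (EU): ${eu_year:,.0f}")
```

Replacing 700 with a measured average draw gives a more realistic estimate for workloads that do not hold the GPU at peak.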
Our GPU Cost Calculator includes electricity costs alongside compute costs for a true total-cost comparison.
What About GPU Utilization?
Price per hour only matters if you're using the GPU. Research shows average GPU utilization across data centers is just 5%. At that level, 95% of the GPU time you pay for is wasted, with GPUs sitting idle while still drawing 100-150W of power.
A $3.99/hr H100 running at 5% utilization effectively costs $79.80 per useful compute hour ($3.99 / 0.05). Monitoring actual utilization and eliminating idle time is often worth more than switching providers.
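The effective-cost calculation is the hourly rate divided by utilization. The sketch below shows how it falls as utilization rises. The function name is ours, and the utilization levels other than 5% are illustrative values, not measurements.

```python
def cost_per_useful_hour(hourly_rate: float, utilization: float) -> float:
    """Effective USD cost of one hour of actual GPU work.

    `utilization` is the fraction of paid time spent doing useful work, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return hourly_rate / utilization


# 5% is the figure quoted above; the other levels are illustrative.
for u in (0.05, 0.25, 0.50, 1.00):
    print(f"{u:>4.0%} utilization -> ${cost_per_useful_hour(3.99, u):.2f} per useful hour")
```

At these rates, raising utilization from 5% to 25% lowers the effective cost far more than moving from the most expensive provider to the cheapest.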
NemulAI's open-source agent (pip install nemulai) measures real utilization per job, per GPU, and recommends optimizations.
Recommendations by Workload Type
Training (multi-GPU, long-running)
Use spot instances on TensorDock/FluidStack/Vast.ai for fault-tolerant training with checkpointing. Save 40-60% vs. hyperscalers. For jobs that can't be interrupted, RunPod or Lambda offer the best on-demand value.
Inference (single-GPU, latency-sensitive)
Use RTX 4090 or L40S on marketplace providers for cost-effective inference. T4 on GCP (via preemptible) remains the cheapest option for low-throughput inference.
Fine-tuning (1-8 GPUs, hours to days)
A100 80GB on RunPod or Lambda hits the sweet spot of price, memory, and availability. For LoRA/QLoRA, a single RTX 4090 at $0.44/hr on TensorDock is hard to beat.
Try the Calculator
Run your own numbers with our free GPU Cost Calculator. Compare any GPU model across all 9 providers, including electricity costs and CO₂ footprint.