
H100 Cloud Pricing Comparison 2026: 9 Providers, 140% Price Spread

The same NVIDIA H100 SXM costs $2.99/hr on TensorDock and $7.18/hr on AWS. We compared 18 GPU models across 9 cloud providers to find where the money goes — and where you can save.

Updated June 2026 · 8 min read

The Price Spread Is Worse Than You Think

GPU cloud pricing is not a commodity market. Unlike CPU instances, where AWS, GCP, and Azure price within 5-10% of each other, GPU instances show a 140% spread: the most expensive provider charges about 2.4x what the cheapest one charges for the same GPU. For an 8-GPU H100 cluster running 24/7, moving from the most expensive provider to the cheapest saves about $293,600/year.

Hourly Rates: Every GPU, Every Provider

Per-GPU hourly rates (USD) for on-demand instances. Spread is how much more the most expensive provider charges than the cheapest. Spot/preemptible pricing, where available, can be 40-70% lower but carries interruption risk.

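| GPU | TensorDock | FluidStack | Vast.ai | RunPod | Lambda | CoreWeave | GCP | Azure | AWS | Spread |
|---|---|---|---|---|---|---|---|---|---|---|
| H100 SXM | $2.99 | $3.19 | $3.39 | $3.99 | $4.59 | $4.79 | $6.78 | $6.98 | $7.18 | 140% |
| H200 SXM | $3.74 | $3.99 | $4.24 | $4.99 | $5.74 | $5.99 | $8.48 | $8.73 | $8.98 | 140% |
| A100 SXM 80GB | $1.42 | $1.51 | $1.61 | $1.89 | $2.17 | $2.27 | $3.21 | $3.31 | $3.40 | 140% |
| RTX 4090 | $0.44 | $0.47 | $0.50 | $0.59 | $0.68 | $0.71 | $1.00 | $1.03 | $1.06 | 140% |
| L40S | $0.97 | $1.03 | $1.10 | $1.29 | $1.48 | $1.55 | $2.19 | $2.26 | $2.32 | 140% |
| AMD MI300X | $2.63 | $2.80 | $2.98 | $3.50 | $4.02 | $4.20 | $5.95 | $6.13 | $6.30 | 140% |
| T4 | $0.28 | $0.30 | $0.31 | $0.37 | $0.43 | $0.44 | $0.63 | $0.65 | $0.67 | 140% |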
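The Spread column reads as (most expensive rate minus cheapest rate) divided by the cheapest rate. A quick check against the H100 SXM row, assuming that definition (the function and variable names are illustrative):

```python
# H100 SXM on-demand rates in $/GPU-hour, copied from the table above.
H100_SXM_RATES = {
    "TensorDock": 2.99, "FluidStack": 3.19, "Vast.ai": 3.39,
    "RunPod": 3.99, "Lambda": 4.59, "CoreWeave": 4.79,
    "GCP": 6.78, "Azure": 6.98, "AWS": 7.18,
}

def price_spread(rates: dict[str, float]) -> float:
    """Percent by which the most expensive provider exceeds the cheapest."""
    cheapest = min(rates.values())
    priciest = max(rates.values())
    return (priciest - cheapest) / cheapest * 100

print(f"H100 SXM spread: {price_spread(H100_SXM_RATES):.0f}%")
```

Applied to the other rows, the same formula gives values between roughly 139% and 141%, because the per-GPU rates are rounded to the cent.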

Monthly Cost for an 8-GPU Cluster

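Large ML training jobs typically run on multi-GPU clusters. Here's how much an 8-GPU cluster running 24/7 (730 hours/month) saves at the cheapest provider compared with the most expensive one. Rates are per GPU; savings are the cost difference for the whole cluster.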

| GPU (8x cluster) | Cheapest (per GPU) | Most expensive (per GPU) | Monthly savings | Annual savings |
|---|---|---|---|---|
| H100 SXM | TensorDock ($2.99/hr) | AWS ($7.18/hr) | $24,467 | $293,600 |
| H200 SXM | TensorDock ($3.74/hr) | AWS ($8.98/hr) | $30,599 | $367,184 |
| A100 SXM 80GB | TensorDock ($1.42/hr) | AWS ($3.40/hr) | $11,589 | $139,074 |
| RTX 4090 | TensorDock ($0.44/hr) | AWS ($1.06/hr) | $3,618 | $43,415 |
| L40S | TensorDock ($0.97/hr) | AWS ($2.32/hr) | $7,910 | $94,923 |
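The savings columns come from the per-GPU rate gap multiplied by 8 GPUs and 730 hours per month. A minimal sketch of that arithmetic (the function name is illustrative):

```python
GPUS = 8
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12 months

def cluster_cost_gap(cheap_rate: float, pricey_rate: float,
                     gpus: int = GPUS) -> tuple[float, float]:
    """Monthly and annual cost difference between two per-GPU hourly rates, running 24/7."""
    monthly = (pricey_rate - cheap_rate) * gpus * HOURS_PER_MONTH
    return monthly, monthly * 12

# H100 SXM: TensorDock vs. AWS, using the rounded rates from the table
monthly, annual = cluster_cost_gap(2.99, 7.18)
print(f"monthly ${monthly:,.0f}, annual ${annual:,.0f}")
```

With the rounded table rates this gives about $24,470/month and $293,635/year. The table's figures differ by a few dollars, presumably because they were computed from unrounded rates (an assumption, not stated in the source).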

Why Prices Vary So Much

1. Hyperscaler Margin Structure

AWS, GCP, and Azure price GPUs at 1.7-1.8x the baseline (RunPod's on-demand rate in the table above) because they bundle networking, storage, and managed services into the instance cost. You pay for the ecosystem, not just the GPU. For teams already deep in a hyperscaler's ecosystem, the premium buys operational simplicity.

2. GPU-Native Providers

RunPod, Lambda, and CoreWeave operate GPU-optimized infrastructure with lower overhead. Prices run 1.0-1.2x baseline. You get bare-metal-like performance with basic orchestration. The tradeoff: fewer managed services, less geographic diversity.

3. Marketplace and Spot Providers

TensorDock, FluidStack, and Vast.ai aggregate idle GPU capacity from data centers worldwide. At 0.75-0.85x baseline, they are the cheapest option. The tradeoff: variable availability, potential interruptions, and less predictable performance on shared infrastructure.
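The tier multiples can be reproduced from the H100 SXM row. This sketch assumes the baseline is RunPod's $3.99 rate, which is inferred from the table rather than stated as a definition; the tier groupings follow the three sections above.

```python
# H100 SXM on-demand rates ($/GPU-hour) from the pricing table.
RATES = {
    "TensorDock": 2.99, "FluidStack": 3.19, "Vast.ai": 3.39,
    "RunPod": 3.99, "Lambda": 4.59, "CoreWeave": 4.79,
    "GCP": 6.78, "Azure": 6.98, "AWS": 7.18,
}
# Assumption: RunPod's rate is the 1.0x baseline (inferred from the table).
BASELINE = RATES["RunPod"]

TIERS = {
    "Hyperscaler": ["AWS", "GCP", "Azure"],
    "GPU-native": ["RunPod", "Lambda", "CoreWeave"],
    "Marketplace": ["TensorDock", "FluidStack", "Vast.ai"],
}

def tier_range(tier: str) -> tuple[float, float]:
    """Lowest and highest price multiple vs. the baseline within one tier."""
    multiples = [RATES[p] / BASELINE for p in TIERS[tier]]
    return round(min(multiples), 2), round(max(multiples), 2)

for name in TIERS:
    low, high = tier_range(name)
    print(f"{name:<12} {low:.2f}x to {high:.2f}x")
```

The three ranges come out as 1.70x to 1.80x, 1.00x to 1.20x, and 0.75x to 0.85x, matching the multiples quoted for each tier.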

The Hidden Cost: Electricity

Cloud hourly rates fold power into the price, so electricity never appears as a separate line on your cloud bill. It still matters when you compare options: an H100 SXM draws up to 700W, which at $0.12/kWh (US average) is $0.084/hr per GPU. An 8-GPU cluster running 24/7 at full draw uses about $5,900/year of electricity, a cost you pay directly if you run the hardware yourself.

In Europe (average $0.25/kWh), that electricity cost doubles. For on-premise deployments, electricity is typically 15-25% of the total cost of ownership.
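The electricity figures as a short sketch. It assumes each GPU draws its full 700W nonstop (an upper bound, since draw falls with load) and leaves out cooling, CPUs, and networking; the function name is illustrative.

```python
def annual_energy_cost(watts: float, usd_per_kwh: float,
                       gpus: int = 8, hours: float = 8760) -> float:
    """Yearly electricity cost for `gpus` GPUs, each drawing `watts` continuously."""
    kwh = watts / 1000 * gpus * hours
    return kwh * usd_per_kwh

per_gpu_hour = 700 / 1000 * 0.12     # ≈ $0.084 per H100 per hour
us = annual_energy_cost(700, 0.12)   # ≈ $5,887 for 8 GPUs at the US average rate
eu = annual_energy_cost(700, 0.25)   # ≈ $12,264 at the European average rate
print(f"US ${us:,.0f}/yr, EU ${eu:,.0f}/yr")
```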

Our GPU Cost Calculator includes electricity costs alongside compute costs for a true total-cost comparison.

What About GPU Utilization?

Price per hour only matters if you're using the GPU. Some estimates put average GPU utilization across data centers as low as 5%. At that level, 95% of paid GPU time is wasted, and the idle GPUs still draw 100-150W each.

A $3.99/hr H100 running at 5% utilization effectively costs $79.80 per useful compute hour ($3.99 ÷ 0.05). Monitoring actual utilization and eliminating idle time is often worth more than switching providers.
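A minimal sketch of that calculation, fed by readings from `nvidia-smi`. The query flags are standard `nvidia-smi` options; the function names are hypothetical, and this is not how the NemulAI agent works internally.

```python
import subprocess

def parse_utilization(csv_text: str) -> list[float]:
    """Convert nvidia-smi's per-GPU utilization percentages (one per line) to fractions."""
    return [int(line) / 100 for line in csv_text.splitlines() if line.strip()]

def sample_utilization() -> list[float]:
    """Take one utilization reading per GPU; needs an NVIDIA driver with nvidia-smi on PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)

def effective_hourly_cost(rate: float, utilization: float) -> float:
    """Price per useful compute hour: the hourly rate divided by the fraction of time in use."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return rate / utilization

print(f"${effective_hourly_cost(3.99, 0.05):.2f} per useful hour")  # ≈ $79.80
```

In practice you would average many samples over a job before calling `effective_hourly_cost`. `utilization.gpu` reports the share of time at least one kernel was running during the sample period, so it is a coarse signal that can overstate how busy the GPU really is.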

NemulAI's open-source agent (pip install nemulai) measures real utilization per job, per GPU, and recommends optimizations. Try it free.

Recommendations by Workload Type

Training (multi-GPU, long-running)

Use spot instances on TensorDock/FluidStack/Vast.ai for fault-tolerant training with checkpointing. Save 40-60% vs. hyperscalers. For jobs that can't be interrupted, RunPod or Lambda offer the best on-demand value.
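Checkpointing is what makes interruptible capacity usable for training. Below is a minimal sketch of the save-and-resume pattern. The JSON file, state fields, and `train` loop are hypothetical stand-ins; a real job would save model and optimizer state using its framework's own serialization.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical path

def save_checkpoint(state: dict, path: str = CHECKPOINT) -> None:
    """Write to a temp file, then rename, so a preemption mid-write can't corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX filesystems

def load_checkpoint(path: str = CHECKPOINT) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}

def train(total_steps: int, every: int = 100, path: str = CHECKPOINT) -> dict:
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        state["loss"] = 1.0 / (step + 1)  # placeholder for a real training step
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

If a spot instance is reclaimed, rerunning `train` on a new instance picks up from the last saved step, so at most `every` steps of work are lost. A smaller interval trades more checkpoint I/O for less rework.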

Inference (single-GPU, latency-sensitive)

Use RTX 4090 or L40S on marketplace providers for cost-effective inference. For low-throughput inference, a T4 on GCP via preemptible (Spot) VMs can be the cheapest option.

Fine-tuning (1-8 GPUs, hours to days)

A100 80GB on RunPod or Lambda hits the sweet spot of price, memory, and availability. For LoRA/QLoRA, a single RTX 4090 at $0.44/hr on TensorDock is hard to beat.

Try the Calculator

Run your own numbers with our free GPU Cost Calculator. Compare any GPU model across all 9 providers, including electricity costs and CO₂ footprint.

Stop overpaying for GPUs

Install the NemulAI agent to measure real utilization and find savings beyond provider switching.