H100 Cloud Pricing Comparison 2026: 9 Providers, 140% Variance

The same NVIDIA H100 SXM costs $2.99/hr on TensorDock and $7.18/hr on AWS. We compared 18 GPU models across 9 cloud providers to find where the money goes — and where you can save.

Updated June 2026

The Price Spread Is Worse Than You Think

GPU cloud pricing is not a commodity market. Unlike CPU instances, where AWS, GCP, and Azure trade within 5-10% of each other, GPU instances show a 140% spread between the cheapest and most expensive providers: AWS's $7.18/hr for an H100 SXM is 2.4x TensorDock's $2.99/hr. For an 8-GPU H100 cluster running 24/7, that gap translates to about $293,600/year in savings just by switching providers.

Hourly Rates: Every GPU, Every Provider

Baseline rates for on-demand GPU instances, in USD per GPU per hour. Spot/preemptible pricing (where available) can be 40-70% lower but comes with interruption risk.

GPU	TensorDock	FluidStack	Vast.ai	RunPod	Lambda	CoreWeave	GCP	Azure	AWS	Spread
H100 SXM	$2.99	$3.19	$3.39	$3.99	$4.59	$4.79	$6.78	$6.98	$7.18	140%
H200 SXM	$3.74	$3.99	$4.24	$4.99	$5.74	$5.99	$8.48	$8.73	$8.98	140%
A100 SXM 80GB	$1.42	$1.51	$1.61	$1.89	$2.17	$2.27	$3.21	$3.31	$3.40	140%
RTX 4090	$0.44	$0.47	$0.50	$0.59	$0.68	$0.71	$1.00	$1.03	$1.06	140%
L40S	$0.97	$1.03	$1.10	$1.29	$1.48	$1.55	$2.19	$2.26	$2.32	140%
AMD MI300X	$2.63	$2.80	$2.98	$3.50	$4.02	$4.20	$5.95	$6.13	$6.30	140%
T4	$0.28	$0.30	$0.31	$0.37	$0.43	$0.44	$0.63	$0.65	$0.67	140%
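
The Spread column can be reproduced from any row of the table. Here is a minimal sketch, assuming "spread" means how much more the most expensive provider charges than the cheapest, as a percentage of the cheapest rate. The function name `spread_percent` is ours, not part of any library, and only three rows are copied in for brevity.

```python
# Hourly on-demand rates (USD per GPU-hour), copied from the table above.
PROVIDERS = ["TensorDock", "FluidStack", "Vast.ai", "RunPod", "Lambda",
             "CoreWeave", "GCP", "Azure", "AWS"]

RATES = {
    "H100 SXM":      [2.99, 3.19, 3.39, 3.99, 4.59, 4.79, 6.78, 6.98, 7.18],
    "A100 SXM 80GB": [1.42, 1.51, 1.61, 1.89, 2.17, 2.27, 3.21, 3.31, 3.40],
    "T4":            [0.28, 0.30, 0.31, 0.37, 0.43, 0.44, 0.63, 0.65, 0.67],
}


def spread_percent(prices):
    """Percent by which the highest price exceeds the lowest price."""
    lowest, highest = min(prices), max(prices)
    return (highest - lowest) / lowest * 100


for gpu, prices in RATES.items():
    cheapest = PROVIDERS[prices.index(min(prices))]
    priciest = PROVIDERS[prices.index(max(prices))]
    print(f"{gpu}: {cheapest} -> {priciest}, "
          f"spread {spread_percent(prices):.1f}%")
```

Because the listed rates are rounded to the cent, the computed spread for these rows lands between roughly 139% and 140% rather than at exactly 140%.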

Monthly Cost for an 8-GPU Cluster

Most ML training jobs need multi-GPU clusters. Here's how much an 8-GPU cluster saves per month (730 hours) and per year when you rent from the cheapest provider instead of the most expensive.

GPU (8x cluster)	Cheapest	Most Expensive	Monthly Savings	Annual Savings
H100 SXM	TensorDock ($2.99/hr)	AWS ($7.18/hr)	$24,467	$293,600
H200 SXM	TensorDock ($3.74/hr)	AWS ($8.98/hr)	$30,599	$367,184
A100 SXM 80GB	TensorDock ($1.42/hr)	AWS ($3.40/hr)	$11,589	$139,074
RTX 4090	TensorDock ($0.44/hr)	AWS ($1.06/hr)	$3,618	$43,415
L40S	TensorDock ($0.97/hr)	AWS ($2.32/hr)	$7,910	$94,923
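
The savings figures follow from simple arithmetic: rate difference, times GPU count, times hours. The sketch below assumes the 730-hour month stated above and a year of 12 such months. The function name `cluster_savings` is illustrative.

```python
HOURS_PER_MONTH = 730  # the article's convention for one month


def cluster_savings(cheap_rate, expensive_rate, gpus=8,
                    hours_per_month=HOURS_PER_MONTH):
    """Return (monthly, annual) savings in USD from using the cheaper rate.

    Rates are in USD per GPU-hour; the cluster is assumed to run 24/7.
    """
    monthly = (expensive_rate - cheap_rate) * gpus * hours_per_month
    return monthly, monthly * 12


# H100 SXM: TensorDock ($2.99/hr) vs. AWS ($7.18/hr)
monthly, annual = cluster_savings(2.99, 7.18)
print(f"monthly: ${monthly:,.0f}  annual: ${annual:,.0f}")
```

With the rounded rates shown in the table, the results come out within a fraction of a percent of the table's figures. The small differences presumably come from the table being computed on unrounded rates, which is an assumption on our part.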

Why Prices Vary So Much

1. Hyperscaler Margin Structure

AWS, GCP, and Azure price GPUs at 1.7-1.8x the baseline because they bundle networking, storage, and managed services into the instance cost. (The multipliers in this section are consistent with taking RunPod's on-demand rate as 1.0x.) You pay for the ecosystem, not just the GPU. For teams already deep in a hyperscaler, the premium buys operational simplicity.

2. GPU-Native Providers

RunPod, Lambda, and CoreWeave operate GPU-optimized infrastructure with lower overhead. Prices run 1.0-1.2x baseline. You get bare-metal-like performance with basic orchestration. The tradeoff: fewer managed services, less geographic diversity.

3. Marketplace and Spot Providers

TensorDock, FluidStack, and Vast.ai aggregate idle GPU capacity from data centers worldwide. Prices run 0.75-0.85x baseline — the cheapest option. The tradeoff: variable availability, potential interruptions, and less predictable performance (shared infrastructure).
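
The three tiers can be checked against the H100 SXM row of the rate table. The sketch below assumes the baseline is RunPod's on-demand rate, which is our reading of the multipliers and is not stated explicitly in the text. The tier groupings follow the three headings above.

```python
# H100 SXM on-demand rates (USD per GPU-hour) from the rate table.
H100_RATES = {
    "TensorDock": 2.99, "FluidStack": 3.19, "Vast.ai": 3.39,
    "RunPod": 3.99, "Lambda": 4.59, "CoreWeave": 4.79,
    "GCP": 6.78, "Azure": 6.98, "AWS": 7.18,
}

# Tier membership as described in the three subsections above.
TIERS = {
    "hyperscaler": ["AWS", "GCP", "Azure"],
    "gpu-native": ["RunPod", "Lambda", "CoreWeave"],
    "marketplace": ["TensorDock", "FluidStack", "Vast.ai"],
}

BASELINE_PROVIDER = "RunPod"  # assumption: treated as the 1.0x baseline


def multipliers(rates, baseline_provider):
    """Express each provider's rate as a multiple of the baseline rate."""
    base = rates[baseline_provider]
    return {provider: rate / base for provider, rate in rates.items()}


mult = multipliers(H100_RATES, BASELINE_PROVIDER)
for tier, providers in TIERS.items():
    values = [mult[p] for p in providers]
    print(f"{tier}: {min(values):.2f}x to {max(values):.2f}x baseline")
```

Under that assumption, the computed ranges line up with the 1.7-1.8x, 1.0-1.2x, and 0.75-0.85x bands quoted for each tier.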

The Hidden Cost: Electricity

Cloud GPU pricing only covers compute rental. But an H100 SXM draws 700W at peak. At $0.12/kWh (US average), that's $0.084/hr per GPU in electricity alone. For an 8-GPU cluster running 24/7 at peak draw (8 × $0.084 × 8,760 hours), electricity adds about $5,900/year that doesn't appear on your cloud bill.

In Europe (average $0.25/kWh), that electricity cost roughly doubles, to about $0.175/hr per GPU. For on-premise deployments, electricity is typically 15-25% of the total cost of ownership.
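
The electricity arithmetic is easy to rerun for your own power price. This is a minimal sketch that assumes every GPU draws its peak 700W continuously, which is an upper bound: a cluster that averages below peak will come in lower. The function names are illustrative, and the prices are the averages quoted above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of 24/7 operation


def electricity_cost_per_hour(watts, usd_per_kwh):
    """Electricity cost in USD for one device running for one hour."""
    return watts / 1000 * usd_per_kwh


def annual_electricity_cost(watts, usd_per_kwh, gpus=8,
                            hours_per_year=HOURS_PER_YEAR):
    """Yearly electricity cost in USD for a cluster at constant draw."""
    return electricity_cost_per_hour(watts, usd_per_kwh) * gpus * hours_per_year


H100_PEAK_WATTS = 700  # peak draw quoted in the text

for region, price in [("US", 0.12), ("Europe", 0.25)]:
    hourly = electricity_cost_per_hour(H100_PEAK_WATTS, price)
    yearly = annual_electricity_cost(H100_PEAK_WATTS, price)
    print(f"{region}: ${hourly:.3f}/hr per GPU, "
          f"${yearly:,.0f}/yr for 8 GPUs")
```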

Our GPU Cost Calculator includes electricity costs alongside compute costs for a true total-cost comparison.

What About GPU Utilization?

Price per hour only matters if you're actually using the GPU. Research shows average GPU utilization across data centers is just 5%. At that level, 95% of the GPU time you pay for is wasted: the GPUs sit idle while still drawing 100-150W of power.

A $3.99/hr H100 running at 5% utilization effectively costs $79.80 per useful compute hour ($3.99 ÷ 0.05). Monitoring actual utilization and eliminating idle time is often worth more than switching providers.
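
The effective-cost figure is the hourly rate divided by the fraction of time the GPU does useful work. A small sketch, with an illustrative function name, makes it easy to compare a cheap but idle GPU against a pricier but busy one:

```python
def effective_cost_per_useful_hour(hourly_rate, utilization):
    """Cost in USD of one hour of useful compute.

    hourly_rate: rental price in USD per GPU-hour.
    utilization: fraction of rented time doing useful work, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return hourly_rate / utilization


# RunPod H100 at the 5% utilization figure quoted above.
print(f"${effective_cost_per_useful_hour(3.99, 0.05):.2f}")

# The most expensive H100 in the table (AWS, $7.18/hr) at 50% utilization
# still costs less per useful hour than the case above.
print(f"${effective_cost_per_useful_hour(7.18, 0.50):.2f}")
```

The second call illustrates the point of this section: at these numbers, raising utilization moves the effective cost far more than changing provider does.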

NemulAI's open-source agent (pip install nemulai) measures real utilization per job and per GPU, and recommends optimizations.

Recommendations by Workload Type

Training (multi-GPU, long-running)

Use spot instances on TensorDock/FluidStack/Vast.ai for fault-tolerant training with checkpointing. Save 40-60% vs. hyperscalers. For jobs that can't be interrupted, RunPod or Lambda offer the best on-demand value.
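
Checkpointing is what makes spot interruptions tolerable: the job saves its progress periodically and resumes from the last save after a restart. The sketch below shows the pattern with the standard library only. The file name, the `{"step": ...}` state layout, and the function names are all hypothetical, and a real job would save model and optimizer state through its framework's own checkpoint API instead of JSON.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical location


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write state atomically so an interruption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step: readers see old or new, never half.
    os.replace(tmp_path, path)


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"step": 0}
    with open(path) as f:
        return json.load(f)


def train(total_steps, checkpoint_every=100, path=CHECKPOINT_PATH):
    """Run (or resume) a training loop, saving progress periodically."""
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        # ... one real training step would go here ...
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # final save at the end of the run
    return state
```

If the instance is reclaimed, rerunning `train` on a new instance with access to the same checkpoint file picks up from the last saved step, so at most `checkpoint_every` steps of work are lost per interruption.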

Inference (single-GPU, latency-sensitive)

Use RTX 4090 or L40S on marketplace providers for cost-effective inference. A preemptible T4 on GCP remains the cheapest option for low-throughput inference.

Fine-tuning (1-8 GPUs, hours to days)

A100 80GB on RunPod or Lambda hits the sweet spot of price, memory, and availability. For LoRA/QLoRA, a single RTX 4090 at $0.44/hr on TensorDock is hard to beat.

Try the Calculator

Run your own numbers with our free GPU Cost Calculator. Compare any GPU model across all 9 providers, including electricity costs and CO₂ footprint.

Stop overpaying for GPUs

Install the NemulAI agent to measure real utilization and find savings beyond provider switching.
