REVIEW

The $0.20/hr GPU is lying to you

We ran the same Llama-3.1-8B LoRA fine-tune on Vast.ai, Modal, and RunPod — three serverless GPU providers we already review. The cheapest hourly rate did not win the job. Here are the wall-clock numbers and the one we kept.

10 MIN READ · UPDATED 2026-04-23 · BY PINTOED AI STUDIO

The job, exactly

Llama-3.1-8B base, LoRA rank-16, ~120K instruction pairs, 3 epochs, sequence length 4096, bf16, gradient checkpointing on. One A100 80GB. The dataset is a real client's support-ticket corpus that we cleaned and packed; total tokens trained ≈ 410M. We ran the same script, same Hugging Face stack, same seed, on each provider.
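For back-of-envelope planning, the spec above reduces to a few numbers. A minimal sketch of the arithmetic we run before asking for a quote; the effective batch size is our own assumption for illustration, not part of the job spec:

```python
# Hedged sketch: the job spec as numbers, plus derived quantities.
# "effective_batch" is an assumption, not from the job description.
job = {
    "pairs": 120_000,          # instruction pairs after cleaning
    "epochs": 3,
    "seq_len": 4096,
    "tokens_trained": 410_000_000,  # total across all 3 epochs
}

tokens_per_epoch = job["tokens_trained"] / job["epochs"]
avg_tokens_per_pair = tokens_per_epoch / job["pairs"]   # ~1139 after packing

effective_batch = 8  # assumption: 8 packed sequences per optimizer step
steps = job["tokens_trained"] / (job["seq_len"] * effective_batch)

print(f"{avg_tokens_per_pair:.0f} tokens/pair, ~{steps:,.0f} optimizer steps")
```

Numbers like these are what make wall-clock quotes comparable across providers: the step count is fixed by the job, so any large time difference is the hardware or the host, not the workload.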

Why this job? It's the smallest non-toy fine-tune we still run for paying clients in 2026. Big enough that the gotchas show up, small enough that the bill doesn't ruin the experiment.

The headline numbers

We compared sticker price per A100-80GB hour with what we actually paid for the job end-to-end. Vast came out cheapest by ~38%. So why didn't we keep using it?

What the bill doesn't show

Engineer time. The Vast run took us four attempts to land — the first three either failed to pull the image, lost the host before training started, or hit a peer host with a bad CUDA driver pin. We lost ~3 hours of engineering time before the run that actually finished. At a real billable rate, that erases the savings twice over.

The pattern: Vast wins when you're running sweeps you can retry cheaply, and loses when a single canonical run has to land reliably. Research compute, not production fine-tunes.

What we use Modal for, what we use RunPod for

Modal is the most expensive per hour and we still default to it for anything we ship to a client. The reasons aren't subtle: runs land on the first attempt, cold starts are short enough that iteration stays cheap, and the workflow is the easiest of the three.

RunPod is what we recommend when the client's procurement team won't sign off on a Modal contract but still wants reliability. Same class of result, harder workflow, lower price. For a one-shot fine-tune that won't recur, RunPod's Secure Cloud is hard to beat.

Vast we keep around for hyperparameter sweeps where we'll do 30 cheap runs and only one needs to finish. Burning $20 on flaky runs to find the right LR is a fine trade.

The three numbers we wish people would compute first

We've watched too many teams pick a GPU provider on hourly rate alone. Before you do, run these three numbers on your job:

  1. Effective $/job, not $/hr. Provider B at 2x the hourly rate but 30% faster wall-clock and 90% fewer retries usually wins.
  2. Engineer-hours wasted on reliability. If you're babysitting a run, multiply that by your fully-loaded rate and add it to the bill.
  3. Cold-start tax over the engagement. Add up every "let me try one more thing" you'll do across the project, multiplied by cold-start time. This is where Modal's pricing pays for itself.

Our GPU cost calculator lets you plug in wall-clock time and hourly rate to compare $/job directly. The reliability multiplier you'll have to calibrate yourself: we use 1.4x on Vast, 1.05x on RunPod, and 1.0x on Modal, derived from the last 18 months of internal logs.
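The three checks above collapse into one function. A minimal sketch; the rates, hours, and retry figures below are illustrative placeholders, not our actual bills, and the fully-loaded engineer rate is an assumption you should replace with your own:

```python
def effective_cost_per_job(
    hourly_rate: float,        # sticker $/hr for the GPU
    wall_clock_hours: float,   # time for one successful run
    reliability_mult: float,   # expected paid-hours inflation from retries
    engineer_hours: float,     # babysitting time per successful run
    loaded_rate: float = 150.0,  # fully-loaded $/hr for the engineer (assumption)
) -> float:
    gpu_cost = hourly_rate * wall_clock_hours * reliability_mult
    human_cost = engineer_hours * loaded_rate
    return gpu_cost + human_cost

# Illustrative numbers only -- plug in your own quotes and logs.
cheap = effective_cost_per_job(0.20, 10.0, 1.4, 3.0)    # flaky marketplace box
pricey = effective_cost_per_job(3.00, 7.0, 1.0, 0.25)   # managed serverless
print(f"cheap box: ${cheap:.2f}  managed: ${pricey:.2f}")
```

With placeholder numbers like these, the 15x-cheaper GPU loses badly once three hours of babysitting are on the bill; that is the whole argument of this piece in one function call.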

The one-line summary

Modal for production. RunPod when budget says no to Modal. Vast for sweeps where flakiness is fine. The $0.20/hr A100 headline is real but mostly applies to people who can absorb retries — which is fewer teams than you'd think.

Full reviews of each: Modal · RunPod · Vast.ai. And if you'd rather we run the fine-tune for you, book a scoping call.

Need a fine-tune to land on time and on budget? We've shipped a few.

BOOK A SCOPING CALL → SEE SERVICES →