The job, exactly
Llama-3.1-8B base, LoRA rank-16, ~120K instruction pairs, 3 epochs, sequence length 4096, bf16, gradient checkpointing on. One A100 80GB. The dataset is a real client's support-ticket corpus that we cleaned and packed; total tokens trained ≈ 410M. We ran the same script, same Hugging Face stack, same seed, on each provider.
Why this job? It's the smallest non-toy fine-tune we still run for paying clients in 2026. Big enough that the gotchas show up, small enough that the bill doesn't ruin the experiment.
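For readers who want to reproduce the setup, here is a minimal sketch of the config in `peft` + `transformers`. The rank, epochs, precision, and gradient checkpointing come from the job spec above; `lora_alpha`, dropout, `target_modules`, batch size, and the output path are illustrative assumptions, not the exact values from our script.

```python
# Sketch only: rank/epochs/bf16/checkpointing from the job spec;
# everything marked "assumed" is an illustrative choice of ours.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                       # LoRA rank from the job spec
    lora_alpha=32,              # assumed; a common 2x-rank default
    lora_dropout=0.05,          # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama31-8b-support-lora",  # hypothetical path
    num_train_epochs=3,
    bf16=True,
    gradient_checkpointing=True,
    per_device_train_batch_size=1,   # assumed for 4096-token sequences on one A100 80GB
    gradient_accumulation_steps=16,  # assumed
)
```

The trainer wiring (tokenization, packing, the training loop) is the standard Hugging Face stack and is omitted here.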
The headline numbers
Sticker price per A100-80GB hour, then what we actually paid for the job end-to-end:
- Vast.ai — sticker $0.79/hr (community A100). Job took 8h 41m on the host we got. Cold-start: 22 min (image pull + dataset upload to peer host). Total bill: $7.18. One mid-job preemption ate ~12 min on resume. Reliability: rough.
- RunPod — sticker $1.89/hr (Secure Cloud A100). Job took 6h 04m. Cold-start: 4 min. Total bill: $11.62. Zero preemptions. Reliability: clean.
- Modal — sticker $2.78/hr (A100-80GB). Job took 6h 12m. Cold-start: 38 seconds (image cached after first run). Total bill: $17.54. Zero preemptions. Reliability: clean.
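The bills above can be sanity-checked with one line of arithmetic: billed time is roughly wall-clock plus cold-start, times the sticker rate. The small residuals against the actual bills are presumably rounding and billing-granularity details on each provider's side.

```python
# Rough reconstruction of each bill: (wall-clock + cold-start) x sticker rate.
def job_cost(rate_per_hr, wall_clock_min, cold_start_min):
    return rate_per_hr * (wall_clock_min + cold_start_min) / 60

vast   = job_cost(0.79, 8 * 60 + 41, 22)  # ~7.15  (actual bill: $7.18)
runpod = job_cost(1.89, 6 * 60 + 4, 4)    # ~11.59 (actual bill: $11.62)
modal  = job_cost(2.78, 6 * 60 + 12, 1)   # ~17.28 (actual bill: $17.54)
print(round(vast, 2), round(runpod, 2), round(modal, 2))
```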
Vast was ~38% cheaper than RunPod, the next-cheapest option. So why didn't we keep using it?
What the bill doesn't show
Engineer time. The Vast run took us four attempts to land — the first three either failed to pull the image, lost the host before training started, or hit a peer host with a bad CUDA driver pin. We lost ~3 hours of engineering time before the run that actually finished. At a real billable rate, that erases the savings twice over.
The pattern: Vast wins when you're running sweeps you can retry cheaply, and loses when a single canonical run has to land reliably. Research compute, not production fine-tunes.
What we use Modal for, what we use RunPod for
Modal is the most expensive per hour and we still default to it for anything we ship to a client. The reasons aren't subtle:
- 38-second cold-start (after first run) versus 4–22 minutes elsewhere. When the client wants to re-run with a tweak, that delta compounds across the engagement.
- The Python-decorator deployment model means a fine-tune script and the inference endpoint that serves it live in one repo and ship together. We've never lost a Friday afternoon to "wait, which version of the training script produced this checkpoint?"
- Image build is cached at the layer level. RunPod and Vast both make you re-pull a 12GB CUDA image more often than you'd expect.
RunPod is what we recommend when the client's procurement team won't sign off on a Modal contract but still wants reliability. Same class of result, harder workflow, lower price. For a one-shot fine-tune that won't recur, RunPod's Secure Cloud is hard to beat.
Vast we keep around for hyperparameter sweeps where we'll do 30 cheap runs and only one needs to finish. Burning $20 on flaky runs to find the right LR is a fine trade.
The three numbers we wish people would compute first
We've watched too many teams pick a GPU provider on hourly rate alone. Before you do, run these three numbers on your job:
- Effective $/job, not $/hr. Provider B at 2x the hourly rate but 30% faster wall-clock and 90% fewer retries usually wins.
- Engineer-hours wasted on reliability. If you're babysitting a run, multiply that by your fully-loaded rate and add it to the bill.
- Cold-start tax over the engagement. Add up every "let me try one more thing" you'll do across the project, multiplied by cold-start time. This is where Modal's pricing pays for itself.
Our GPU cost calculator lets you plug in wall-clock and hourly rate to compare $/job directly. The reliability multiplier you'll have to calibrate yourself — we use 1.4x on Vast, 1.05x on RunPod, and 1.0x on Modal, derived from the last 18 months of internal logs.
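The checklist math, sketched in code. The reliability multipliers are the ones we quote from our logs; the $150/hr fully-loaded engineer rate is a hypothetical placeholder, not a number from our books.

```python
# Effective $/job: expected spend once retries and babysitting are priced in.
def effective_cost(bill, reliability_mult, engineer_hours=0.0, engineer_rate=0.0):
    return bill * reliability_mult + engineer_hours * engineer_rate

# This job's bills, with the ~3 engineering hours the Vast run cost us,
# at a hypothetical $150/hr fully-loaded rate:
print(effective_cost(7.18, 1.4, engineer_hours=3, engineer_rate=150))  # Vast: ~460
print(effective_cost(11.62, 1.05))                                     # RunPod: ~12.20
print(effective_cost(17.54, 1.0))                                      # Modal: 17.54
```

Run the numbers this way and the cheapest sticker rate on this particular job was the most expensive option by a wide margin.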
The one-line summary
Modal for production. RunPod when budget says no to Modal. Vast for sweeps where flakiness is fine. The $0.20/hr A100 headline is real but mostly applies to people who can absorb retries — which is fewer teams than you'd think.
Full reviews of each: Modal · RunPod · Vast.ai. And if you'd rather we run the fine-tune for you, book a scoping call.