The two patterns, briefly
Structured output (JSON mode, schema-constrained generation) tells the model: "your reply must conform to this schema." The model returns one valid object and stops. Control stays with you — your code parses the object and decides what happens next.
Tool-use tells the model: "here are functions you can call. Decide which to call, with what arguments, and I'll give you the result." The model can call multiple tools across multiple turns. The control flow is shared between you and the model.
Both produce JSON. Both feel like "structured outputs." But the similarity ends at the wire format; the semantics are not the same.
Use structured output when there is one answer
If the question can be answered by extracting fields from input that's already in front of the model, structured output wins. Examples we ship constantly:
- Classification: "what category is this support ticket?" → {category, confidence, reasoning}.
- Extraction: "pull the dates and dollar amounts out of this contract." → array of objects.
- Reformatting: "rewrite this dump as the schema below."
- Single-step decisioning: "based on this profile, what tier?" → {tier, score, reason}.
The pattern: input is fully present, output is one shape, no external action required. Structured output handles this faster, cheaper, with fewer moving parts.
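A minimal sketch of the structured-output side of this pattern, assuming a hypothetical client that returns one schema-conforming JSON string (the model call itself is stubbed; only the parse-and-validate step is shown):

```python
import json

# The {category, confidence, reasoning} shape from the classification example.
TICKET_SCHEMA = {
    "category": str,
    "confidence": float,
    "reasoning": str,
}

def parse_ticket_classification(raw: str) -> dict:
    """Parse one model reply and enforce the expected shape."""
    obj = json.loads(raw)
    for field, expected in TICKET_SCHEMA.items():
        if not isinstance(obj.get(field), expected):
            raise ValueError(f"bad or missing field: {field}")
    return obj

# Stubbed reply standing in for the model's single, final response:
reply = '{"category": "billing", "confidence": 0.92, "reasoning": "mentions an invoice"}'
result = parse_ticket_classification(reply)
```

One round-trip, one parse, done — there is no loop anywhere in the application code.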
Use tool-use when the model needs to do things
If answering correctly requires the model to fetch information it doesn't have, take an action, or chain multiple decisions, tool-use is the right primitive. Examples:
- Lookups: "answer this question, calling `search_docs(query)` as needed."
- Multi-step transactions: "schedule the meeting" — calls calendar tool, then email tool, then confirms.
- Conditional branches: "diagnose this error" — runs `get_logs`, decides if more lookups are needed, eventually returns a verdict.
- Agents: anything that loops until done.
The pattern: the model needs to decide what information or actions are required, and that decision depends on intermediate results. Forcing this through structured output means hand-coding the orchestration in your application. Tool-use lets the model own the orchestration.
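To make "the model owns the orchestration" concrete, here is the shape of the tool-use loop with the model stubbed out. `fake_model`, `search_docs`, and the message format are illustrative stand-ins, not any real API — the point is that your code only dispatches and feeds results back:

```python
def search_docs(query: str) -> str:
    # Stand-in for a real retrieval call.
    return f"doc snippet about {query}"

TOOLS = {"search_docs": search_docs}

def fake_model(messages):
    # A real model chooses tools based on the conversation; this stub
    # calls search_docs once, then produces a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_docs", "args": {"query": "rate limits"}}
    return {"final": "Rate limits are documented in the snippet above."}

def run_agent(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        decision = fake_model(messages)
        if "final" in decision:
            return decision["final"]
        # The model picked the tool and the arguments; we just dispatch.
        result = TOOLS[decision["tool"]](**decision["args"])
        messages.append({"role": "tool", "content": result})

answer = run_agent("What are the API rate limits?")
```

Note what is absent: there is no application logic deciding *whether* to search. That decision lives in the model.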
The trap: doing structured output through tool-use
The most common mistake we see: teams use tool-use for what is really a single-extraction problem because tool-use was the first thing they learned. Symptoms:
- You declared one tool, named `extract_fields`, and the model is "supposed to" call it once.
- Your code has logic to handle the case where the model decides not to call the tool.
- You're paying for a multi-turn conversation when you needed one response.
Switch to structured output. It's faster, cheaper, has no risk of the model deciding to chat instead of returning JSON, and the schema-conformance guarantees are stronger.
The other trap: doing tool-use through structured output
Less common, more painful: teams build a "structured output that describes what to do next," parse it in their app, dispatch, then send the result back as another structured output call. They're reinventing tool-use, badly, by hand.
Symptoms: you have a state machine in your app code that mirrors the model's decisions. You're parsing JSON, calling a function, formatting the result, calling the model again with the result and a "what next?" prompt. Multi-turn glue everywhere.
Use tool-use. The model is already running this loop natively. Your glue code is just a worse version of what the API already does.
The decision matrix
The two questions we ask first on any new feature:
- Does answering require information the model doesn't already have? Yes → tool-use. No → structured output.
- Are there multiple steps where each step's input depends on the previous step's output? Yes → tool-use. No → structured output.
That's it. There's a third question for edge cases — "do I want the model to be able to ask the user for clarification" — which pushes toward tool-use as well. But the first two cover ~95% of the decisions we make.
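The two (or three) questions above are simple enough to encode directly. A sketch, with the question names as hypothetical parameters:

```python
def choose_pattern(needs_external_info: bool,
                   multi_step_dependent: bool,
                   may_need_clarification: bool = False) -> str:
    """Encode the decision matrix: any 'yes' pushes toward tool-use."""
    if needs_external_info or multi_step_dependent or may_need_clarification:
        return "tool-use"
    return "structured output"

# Classification/extraction: everything is in the prompt, one shape out.
classification = choose_pattern(False, False)
# Doc lookup: the model must fetch information it doesn't have.
doc_qa = choose_pattern(True, False)
```

The asymmetry is deliberate: structured output is the default, and tool-use must earn its way in with a "yes."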
Cost and latency, briefly
Structured output is one round-trip. Tool-use is N round-trips where N depends on how many tool calls the model decides to make. For a "search the docs and answer" agent, N is typically 2–4.
That means tool-use roughly multiplies your latency by N, and because each round-trip re-sends the growing conversation, cost can grow even faster than N. On classification at volume, this is the difference between $200/mo and $2,000/mo. Don't reach for tool-use when structured output answers the question — and don't reach for structured output when the question genuinely needs N steps.
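A back-of-envelope token model makes the multiplier visible. The numbers here are illustrative, not real pricing; the key assumption is that every turn of a tool-use loop re-sends the conversation so far:

```python
def single_call_tokens(prompt: int, output: int) -> int:
    # Structured output: one prompt in, one object out.
    return prompt + output

def tool_loop_tokens(prompt: int, per_turn_output: int, n_turns: int) -> int:
    # Tool-use: each round-trip re-sends the growing context.
    total, context = 0, prompt
    for _ in range(n_turns):
        total += context + per_turn_output  # resend everything so far
        context += per_turn_output          # tool result/reply grows context
    return total

one_shot = single_call_tokens(1000, 200)
loop = tool_loop_tokens(1000, 200, n_turns=3)
```

With these numbers the 3-turn loop costs 3.5x the single call, not 3x — and the gap widens as tool results get larger.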
Where they overlap (and how we resolve it)
Some workloads sit in the middle. The classic: "extract fields, but if you're unsure, call `lookup(value)` to verify." The right answer is usually tool-use with one tool — let the model call it conditionally — but force a final response in a structured schema as the terminating message.
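Sketched with stubs (no real API assumed): one tool the model may call conditionally, and a terminating message that must conform to the output schema:

```python
import json

def lookup(value: str) -> str:
    # Stand-in for a verification service.
    return "verified" if value == "ACME-123" else "unknown"

def fake_model(messages):
    last = messages[-1]
    if last["role"] == "user" and "ACME-123" in last["content"]:
        # The stub is "unsure" about this field, so it calls the tool first.
        return {"tool": "lookup", "args": {"value": "ACME-123"}}
    # Terminating turn: must conform to the output schema.
    return {"final": json.dumps({"customer_id": "ACME-123", "verified": True})}

def extract_with_verification(text: str) -> dict:
    messages = [{"role": "user", "content": text}]
    while True:
        decision = fake_model(messages)
        if "final" in decision:
            return json.loads(decision["final"])  # schema-enforced last turn
        result = lookup(**decision["args"])
        messages.append({"role": "tool", "content": result})

out = extract_with_verification("Order placed by ACME-123 on 2024-05-01")
```

On inputs where the model is confident, the loop terminates in one turn and you pay structured-output prices; the tool call only happens when it earns its keep.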
Both Claude and ChatGPT support this pattern; Claude's implementation is cleaner today (single API call, the schema is enforced on the final assistant turn). For the longer breakdown see our Claude vs ChatGPT for production agents piece.
The one-line rule
If the model has to do something, tool-use. If the model has to say something in a specific shape, structured output. The hardest cases are the ones where you're not sure whether "answering" requires "doing." Sit with that question for ten minutes before you start coding. It's worth it.