Session 7 · 04 · 10 min

Performance & Cost Comparison

What you'll learn
  • Compare LLM call counts across multi-agent patterns for different scenarios
  • Understand how call count relates to cost and latency
  • Know which pattern is best-in-class for each scenario

Architecture diagrams look great on a whiteboard, but what actually matters is how many LLM calls a pattern makes and how many tokens it burns. More calls mean more latency and more cost. Let's put numbers on it.

Why call count matters
Each LLM call adds 0.5-3 seconds of latency (depending on the model and input size) plus token costs for both input and output. A pattern that makes 8 calls instead of 3 can easily be 3x slower and 2-3x more expensive. These numbers compound quickly in production with thousands of requests per day.
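To make the compounding concrete, here is a minimal sketch comparing a 3-call and an 8-call pattern at production volume. The latency and price figures are assumptions for illustration, not benchmarks:

```python
# Illustrative only: latency and price figures below are assumptions.
LATENCY_PER_CALL = 1.5    # seconds, assumed mid-range
COST_PER_CALL = 0.002     # dollars, assumed average per call
REQUESTS_PER_DAY = 1_000

def daily_profile(calls_per_request: int) -> tuple[float, float]:
    """Return (latency per request in seconds, total daily cost in dollars)."""
    latency = calls_per_request * LATENCY_PER_CALL
    cost = calls_per_request * COST_PER_CALL * REQUESTS_PER_DAY
    return latency, cost

lat_3, cost_3 = daily_profile(3)   # 4.5 s/request, $6/day
lat_8, cost_8 = daily_profile(8)   # 12.0 s/request, $16/day
print(f"3-call pattern: {lat_3:.1f}s per request, ${cost_3:.0f}/day")
print(f"8-call pattern: {lat_8:.1f}s per request, ${cost_8:.0f}/day")
```

Even with these modest assumed numbers, the 8-call pattern is roughly 2.7x slower and 2.7x more expensive at the same request volume.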

Scenario 1: One-shot task

A single, focused request — like "What is the weather in Paris?" — that requires one tool call and one response. This is the simplest benchmark.

              Subagents   Handoffs   Skills   Router
LLM calls         4          3         3        3

Subagents require an extra call because the supervisor must first decide which subagent to invoke before the subagent itself runs. The other patterns all settle at 3 calls for a one-shot task.

Scenario 2: Repeat query (same domain)

Two consecutive questions that hit the same domain — like asking about weather in Paris, then weather in London. Can the pattern reuse context from the first query?

              Subagents   Handoffs   Skills   Router
LLM calls         8          5         5        6

Handoffs and Skills shine here because they can maintain context within the active agent. Subagents pay the supervisor tax twice. Router has a slight overhead because it re-classifies the second query.

Scenario 3: Multi-domain task

A complex request that spans multiple domains — "Book a flight to Paris and find the weather there." This tests how patterns handle cross-domain coordination.

              Subagents   Handoffs   Skills   Router
LLM calls         5          7+        3        5
Tokens          ~9K        14K+      ~15K     ~9K

Here the trade-offs get interesting. Skills use the fewest calls (the parent agent calls both skill agents in one turn) but consume more tokens because each skill carries its own context. Handoffs are the most expensive because they must transfer conversational context sequentially across agents.
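A quick back-of-the-envelope, using the table's figures and an assumed flat token price, shows why fewer calls does not automatically win:

```python
# Call and token figures come from the multi-domain table above;
# the flat per-token price is an assumption for illustration.
PRICE_PER_1K_TOKENS = 0.01  # dollars, illustrative

patterns = {
    # name: (llm_calls, approx_total_tokens)
    "Subagents": (5, 9_000),
    "Handoffs":  (7, 14_000),  # lower bounds; the table lists 7+ calls, 14K+ tokens
    "Skills":    (3, 15_000),
    "Router":    (5, 9_000),
}

for name, (calls, tokens) in patterns.items():
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    print(f"{name:9s} {calls} calls, ~{tokens:,} tokens, ~${cost:.2f}")
# Skills makes the fewest calls (best latency) but carries the largest
# token bill; Subagents and Router are cheapest on tokens here.
```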

Best-in-class by scenario

Handoffs — best for repeat queries: maintains conversational context efficiently.
Skills — best for one-shot multi-domain tasks: fewest calls when tasks are independent.
Router — best for simple dispatch: lightweight classification with low overhead.

Token cost vs call count
Fewer calls does not always mean lower cost. Skills may make fewer calls but send larger payloads, because each skill agent carries its own system prompt. Always benchmark both call count and total tokens for your specific use case.

The cost formula

For any pattern, the total cost is roughly:

Total cost = (number of LLM calls) × (avg input tokens × input price + avg output tokens × output price)

Input tokens are usually 3-10x cheaper than output tokens, so patterns that generate long outputs at every step (like handoffs passing full conversation history) pay a higher tax per call.
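The formula above can be sketched directly. The prices here are assumptions chosen to show a 5x input/output asymmetry, not quotes for any specific model:

```python
def pattern_cost(calls: int, avg_in: int, avg_out: int,
                 in_price: float, out_price: float) -> float:
    """Total cost = calls x (avg input tokens x input price
                             + avg output tokens x output price).
    Prices are per token."""
    return calls * (avg_in * in_price + avg_out * out_price)

# Assumed prices: input $3 per million tokens, output $15 per million (5x gap).
IN_PRICE, OUT_PRICE = 3e-6, 15e-6

# Same call count and input size; only the output length differs.
terse = pattern_cost(calls=5, avg_in=2_000, avg_out=200,
                     in_price=IN_PRICE, out_price=OUT_PRICE)
verbose = pattern_cost(calls=5, avg_in=2_000, avg_out=1_500,
                       in_price=IN_PRICE, out_price=OUT_PRICE)
print(f"terse outputs:   ${terse:.4f} per request")    # $0.0450
print(f"verbose outputs: ${verbose:.4f} per request")  # $0.1425
```

Note how the verbose variant costs over 3x more despite identical call counts and inputs: long per-step outputs dominate because of the output-price multiplier.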

Check your understanding

Knowledge Check
A chatbot handles 1,000 conversations/day, averaging 4 turns each. Switching from Subagents (8 calls/repeat) to Handoffs (5 calls/repeat) saves how many LLM calls per day?

Knowledge Check
Why might the Skills pattern use more tokens than Subagents despite making fewer LLM calls?

Up next
You now have the numbers. The next lesson gives you a decision framework — a flowchart that maps your constraints to the right pattern.