Session 5 · 07 · 10 min

Incomplete Output Handling

What you'll learn
  • Detect truncated output via finish_reason
  • Set max_tokens intentionally low to trigger truncation
  • Build safe parsing that does not crash on partial JSON

If max_tokens is too low, the model gets cut off mid-sentence. finish_reason will be "length" instead of "stop", and the content will be partial. This lesson deliberately triggers that and shows defensive parsing.

07_incomplete_output_handling.py
import json

response = client.chat.completions.create(
    model=model,
    messages=[...],
    max_tokens=30,                                              ①
)

choice = response.choices[0]
if choice.finish_reason == "length":                            ②
    print("⚠️ Output was truncated — retry with higher max_tokens")
else:
    data = json.loads(choice.message.content)
① Deliberately low cap to force truncation.
② finish_reason tells you WHY the model stopped: "stop" = natural, "length" = hit the cap, "tool_calls" = paused to call a tool.
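The guard above can be wrapped into a small helper that only parses when the output is complete. A minimal sketch — `parse_if_complete` is a hypothetical name, not part of the lesson script:

```python
import json

def parse_if_complete(choice):
    # Hypothetical helper: return parsed JSON only when the model
    # finished naturally; never json.loads() a truncated payload.
    if choice.finish_reason == "length":
        return None  # truncated — partial JSON would crash json.loads
    try:
        return json.loads(choice.message.content)
    except json.JSONDecodeError:
        return None  # finished, but the content still wasn't valid JSON
```

Returning None (rather than raising) lets the caller decide whether to retry or fall back.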
$ python 07_incomplete_output_handling.py
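The "retry with higher max_tokens" advice in the warning can itself be automated. A sketch of one possible retry loop — `complete_with_retry` and the doubling strategy are illustrative choices, not part of the lesson script:

```python
def complete_with_retry(client, model, messages, max_tokens=30, attempts=3):
    # Hypothetical retry sketch: double max_tokens until the model
    # finishes naturally ("stop") or we run out of attempts.
    for _ in range(attempts):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
        )
        choice = response.choices[0]
        if choice.finish_reason != "length":
            return choice.message.content
        max_tokens *= 2  # truncated — try again with more room
    raise RuntimeError("output still truncated after retries")
```

Doubling is a simple heuristic; in production you might instead estimate the needed budget from the prompt, or cap the maximum to control cost.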
Code Check
Which finish_reason indicates the output was cut off by your max_tokens cap?
Recap — what you just learned
  • Check finish_reason on every response in production
  • "length" = truncated, "stop" = complete, "tool_calls" = wants to call a tool
  • Never blindly json.loads() partial JSON — it will crash
Next up: 08 — Schema Mistakes