Session 5 · 07 · 10 min

Incomplete Output Handling

What you'll learn
  • Detect truncated output via finish_reason
  • Set max_tokens intentionally low to trigger truncation
  • Build safe parsing that does not crash on partial JSON

If max_tokens is too low, the model gets cut off mid-sentence. finish_reason will be "length" instead of "stop", and the content will be partial. This lesson deliberately triggers that and shows defensive parsing.

07_incomplete_output_handling.py
import json

response = client.chat.completions.create(
    model=model,
    messages=[...],
    max_tokens=30,                                              ①
)

choice = response.choices[0]
if choice.finish_reason == "length":                            ②
    print("⚠️ Output was truncated — retry with higher max_tokens")
else:
    data = json.loads(choice.message.content)
① Deliberately low cap to force truncation.
② finish_reason tells you WHY the model stopped: "stop" = natural, "length" = hit the cap, "tool_calls" = paused to call a tool.
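The guard above can be wrapped into a small helper that only parses when the output is complete. A minimal sketch — `parse_if_complete` is a hypothetical name, not part of the lesson script:

```python
import json

def parse_if_complete(choice):
    # Hypothetical helper: return parsed JSON only when the model
    # finished naturally; never json.loads() a truncated payload.
    if choice.finish_reason == "length":
        return None  # truncated — partial JSON would crash json.loads
    try:
        return json.loads(choice.message.content)
    except json.JSONDecodeError:
        return None  # finished, but the content still wasn't valid JSON
```

Returning None (rather than raising) lets the caller decide whether to retry or fall back.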
$ python 07_incomplete_output_handling.py
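The "retry with higher max_tokens" advice in the warning can itself be automated. A sketch of one possible retry loop — `complete_with_retry` and the doubling strategy are illustrative choices, not part of the lesson script:

```python
def complete_with_retry(client, model, messages, max_tokens=30, attempts=3):
    # Hypothetical retry sketch: double max_tokens until the model
    # finishes naturally ("stop") or we run out of attempts.
    for _ in range(attempts):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
        )
        choice = response.choices[0]
        if choice.finish_reason != "length":
            return choice.message.content
        max_tokens *= 2  # truncated — try again with more room
    raise RuntimeError("output still truncated after retries")
```

Doubling is a simple heuristic; in production you might instead estimate the needed budget from the prompt, or cap the maximum to control cost.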
Code Check
Which finish_reason indicates the output was cut off by your max_tokens cap?
Recap — what you just learned
  • Check finish_reason on every response in production
  • "length" = truncated, "stop" = complete, "tool_calls" = wants to call a tool
  • Never blindly json.loads() partial JSON — it will crash
Next up: 08 — Schema Mistakes