Session 5 · 07 · 10 min
Incomplete Output Handling
What you'll learn
- Detect truncated output via finish_reason
- Set max_tokens intentionally low to trigger truncation
- Build safe parsing that does not crash on partial JSON
If max_tokens is too low, the model is cut off mid-sentence: finish_reason comes back as "length" instead of "stop", and the content is partial. This lesson deliberately triggers that truncation and shows how to parse the result defensively.
07_incomplete_output_handling.py
import json

response = client.chat.completions.create(
    model=model,
    messages=[...],
    max_tokens=30,  ①
)

choice = response.choices[0]
if choice.finish_reason == "length":  ②
    print("⚠️ Output was truncated — retry with higher max_tokens")
else:
    data = json.loads(choice.message.content)

① Deliberately low cap to force truncation.
② finish_reason tells you WHY the model stopped: "stop" = natural end, "length" = hit the cap, "tool_calls" = paused to call a tool.
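The defensive branch can be factored into a small helper that never raises on partial output. A minimal sketch, assuming the caller has the content string and finish_reason in hand; the truncated payload below is a hand-made example of what a 30-token cap might produce, not real API output:

```python
import json

def parse_if_complete(content, finish_reason):
    """Parse JSON only when the model stopped naturally.

    Returns None instead of raising, so callers can decide to
    retry with a higher max_tokens rather than crash.
    """
    if finish_reason != "stop":
        # "length" or "tool_calls": content is not a finished answer.
        return None
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        # Even a "stop" finish can carry malformed JSON (e.g. prose around it).
        return None

# Hand-made example payloads:
truncated = '{"title": "Incomplete Output Handli'
complete = '{"title": "Incomplete Output Handling", "minutes": 10}'

print(parse_if_complete(truncated, "length"))  # None, no crash
print(parse_if_complete(complete, "stop"))     # parsed dict
```

Returning None keeps the crash-or-not decision at the call site, where you know whether a retry is possible.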
$ python 07_incomplete_output_handling.py
Code Check
Which finish_reason indicates the output was cut off by your max_tokens cap?
Recap — what you just learned
- ✓ Check finish_reason on every response in production
- ✓ "length" = truncated, "stop" = complete, "tool_calls" = wants to call a tool
- ✓ Never blindly json.loads() partial JSON — it will crash
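Putting the recap into practice: one common response to a "length" finish is to retry with a larger cap. A sketch of that loop, using a hypothetical call_model(max_tokens) stand-in for the real client.chat.completions.create(...) call so the logic is self-contained:

```python
import json

def call_with_retry(call_model, max_tokens=30, retries=3):
    """Double the token cap and retry whenever output is truncated.

    call_model(max_tokens) must return (content, finish_reason);
    substitute your real API call in production.
    """
    for _ in range(retries):
        content, finish_reason = call_model(max_tokens)
        if finish_reason == "stop":
            return json.loads(content)
        # finish_reason == "length": raise the cap and try again.
        max_tokens *= 2
    raise RuntimeError("output still truncated after retries")

# Fake model for demonstration: completes only once the cap is big enough.
def fake_model(max_tokens):
    full = '{"status": "ok", "items": [1, 2, 3]}'
    if max_tokens < 60:
        return full[:20], "length"
    return full, "stop"

print(call_with_retry(fake_model))  # {'status': 'ok', 'items': [1, 2, 3]}
```

Doubling rather than incrementing keeps the retry count logarithmic in the needed budget; cap the retries so a pathological prompt cannot loop forever.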