Session 6 · 03 · 10 min

Streaming Basics

What you'll learn
  • Use model.stream() to get tokens as they arrive
  • Print chunks live for a typewriter effect
  • Understand the difference between invoke() and stream()

invoke() blocks until the model has finished and returns the full reply as one message. stream() returns an iterator of AIMessageChunk objects, so you receive the reply piece by piece as the model produces it. Print each chunk.content as it arrives to build the typewriter effect chatbots use.
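To make the difference concrete without an API key, here is a minimal sketch that uses a hypothetical stand-in model. FakeChatModel and FakeChunk are invented for illustration and are not part of LangChain; they only mimic the shape described above (one full reply from invoke(), a series of objects with a .content delta from stream()).

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for AIMessageChunk: only the .content field."""
    content: str


class FakeChatModel:
    """Hypothetical stand-in for a chat model (not LangChain's implementation)."""

    def __init__(self, pieces: List[str]) -> None:
        self._pieces = pieces

    def invoke(self, prompt: str) -> FakeChunk:
        # Returns only once the whole reply exists, as a single object.
        return FakeChunk("".join(self._pieces))

    def stream(self, prompt: str) -> Iterator[FakeChunk]:
        # Yields one piece at a time; each piece is a delta, not a running total.
        for piece in self._pieces:
            yield FakeChunk(piece)


model = FakeChatModel(["Space ", "is ", "mostly ", "empty."])
prompt = "Tell me a fun fact about space."

full = model.invoke(prompt)
deltas = [chunk.content for chunk in model.stream(prompt)]

print(full.content)  # the whole reply in one piece
print(deltas)        # the same reply, split into deltas
```

Joining the deltas in order gives back the same text that invoke() returns; streaming changes when you see the text, not what the text is.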

Diagram: stream() (start request) → chunk (token 1) → chunk (token 2) → chunk (…more…) → done (iterator ends)
scripts/03_stream_basics.py
# `model` is assumed to be an already-initialized chat model (setup not shown here).
for chunk in model.stream("Tell me a fun fact about space."):  # ①
    print(chunk.content, end="", flush=True)                  # ②
print()                                                        # ③
① stream() returns an iterator. Loop over it.
② end="" prevents newlines between chunks; flush=True forces output to appear immediately.
③ Final newline once streaming finishes.
$ python scripts/03_stream_basics.py
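The loop above prints the reply but does not keep it. If you also need the full text afterwards (for example to store it in a chat history), one option is to collect the deltas while printing. The sketch below assumes each chunk's content is a plain string; stream_and_collect and fake_stream are hypothetical helpers, and fake_stream only stands in for model.stream().

```python
from types import SimpleNamespace
from typing import Iterable, Optional, TextIO


def stream_and_collect(chunks: Iterable, out: Optional[TextIO] = None) -> str:
    """Print each chunk's content as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        # Same print settings as the lesson: no newline, flush immediately.
        print(chunk.content, end="", flush=True, file=out)
        parts.append(chunk.content)
    print(file=out)  # final newline once the iterator ends
    return "".join(parts)


def fake_stream():
    """Hypothetical stand-in for model.stream(): yields objects with .content."""
    for piece in ["Hello", ", ", "world", "!"]:
        yield SimpleNamespace(content=piece)


reply = stream_and_collect(fake_stream())
```

With a real model you would pass model.stream(prompt) in place of fake_stream().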
Why streaming matters for UX
Users perceive a streamed response as faster even when the total time is identical, because what they notice is time-to-first-token: how long they wait before anything appears on screen. Stream whenever a human is waiting on the reply.
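A rough way to see this is to time a stream yourself. The sketch below is an illustration under stated assumptions: slow_stream is a hypothetical stand-in that sleeps briefly before each chunk to simulate generation delay, and measure is a hypothetical helper. It records when the first non-empty chunk arrives and when the iterator ends.

```python
import time
from types import SimpleNamespace


def slow_stream(pieces, delay=0.01):
    """Hypothetical stand-in for model.stream(): waits `delay` seconds per chunk."""
    for piece in pieces:
        time.sleep(delay)
        yield SimpleNamespace(content=piece)


def measure(chunks):
    """Return (time_to_first_token, total_time, text) for an iterable of chunks."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in chunks:
        # Record the moment the first non-empty piece of text shows up.
        if first is None and chunk.content:
            first = time.perf_counter() - start
        parts.append(chunk.content)
    total = time.perf_counter() - start
    return first, total, "".join(parts)


ttft, total, text = measure(slow_stream(["Stars ", "are ", "far ", "away."]))
print(f"first token after {ttft:.3f}s, full reply after {total:.3f}s")
```

With invoke(), the user would see nothing until roughly the total time has passed; with stream(), text starts appearing after the time-to-first-token.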
Code Check
What does chunk.content hold on each iteration of model.stream()?
Recap — what you just learned
  • stream() returns an iterator of AIMessageChunk objects
  • Each chunk.content is a delta (new tokens only)
  • Use for chatbots, live UIs, or anywhere a human is waiting
Next up: 04 — Batch