Session 6 · 03 · 10 min
Streaming Basics
What you'll learn
- Use model.stream() to get tokens as they arrive
- Print chunks live for a typewriter effect
- Understand the difference between invoke() and stream()
invoke() waits for the full reply. stream() returns an iterator of AIMessageChunk objects — you get tokens as the model produces them. Print each chunk.content to build the typewriter effect chatbots use.
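To make the contrast concrete without calling a real model, here is a minimal sketch. FakeModel and FakeChunk are hypothetical stand-ins written for this example (they are not LangChain classes); they only imitate the shape described above, where invoke() hands back one complete reply and stream() yields small pieces.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for AIMessageChunk: only the .content field."""
    content: str


class FakeModel:
    """Hypothetical stand-in for a chat model; not a LangChain class."""

    def __init__(self, reply: str) -> None:
        self._pieces = reply.split(" ")

    def stream(self, prompt: str) -> Iterator[FakeChunk]:
        # Yield the reply one piece at a time, as a streaming model would.
        for i, piece in enumerate(self._pieces):
            yield FakeChunk(piece if i == 0 else " " + piece)

    def invoke(self, prompt: str) -> FakeChunk:
        # Return the whole reply in one object, only once it is complete.
        return FakeChunk("".join(c.content for c in self.stream(prompt)))


model = FakeModel("Space is almost completely silent.")

full = model.invoke("Tell me a fun fact about space.")
pieces = [c.content for c in model.stream("Tell me a fun fact about space.")]

print(full.content)  # the complete reply as one string
print(pieces)        # the same reply as a list of small deltas
```

Joining the streamed pieces gives the same text that invoke() returns; the only difference is when each part becomes available to your code.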
Figure: stream() timeline. start request → chunk (token 1) → chunk (token 2) → … more chunks … → done (iterator ends)
scripts/03_stream_basics.py
for chunk in model.stream("Tell me a fun fact about space."):  # ①
    print(chunk.content, end="", flush=True)  # ②
print()  # ③

① stream() returns an iterator. Loop over it.
② end="" prevents newlines between chunks; flush=True forces output to appear immediately.
③ Final newline once streaming finishes.
$ python scripts/03_stream_basics.py
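The loop in the script prints each delta and then discards it. If you also need the full reply afterwards (for example, to store it in chat history), one option is to collect the deltas while printing. The sketch below uses a hypothetical fake_stream() generator in place of a real model, and the helper name print_and_collect is made up for this example; it also assumes each chunk.content is a plain string.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for the chunk.content values of model.stream()."""
    yield from ["Hello", ", ", "world", "!"]


def print_and_collect(deltas: Iterable[str]) -> str:
    """Print each delta as it arrives and return the assembled reply."""
    parts = []
    for delta in deltas:
        print(delta, end="", flush=True)  # typewriter effect
        parts.append(delta)               # keep the delta for later
    print()                               # final newline once the stream ends
    return "".join(parts)


reply = print_and_collect(fake_stream())
```

With a real model, the same helper could be fed a generator expression such as (chunk.content for chunk in model.stream(prompt)), under the same string-content assumption.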
Why streaming matters for UX
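Users perceive a streamed response as faster even when the total time is identical, because text starts appearing as soon as the first token arrives. For interactive use, time-to-first-token is therefore the latency metric to watch. Prefer streaming whenever a human is waiting on the reply.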
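Time-to-first-token is easy to measure yourself. The following is a rough sketch, not a benchmark harness: slow_stream() is a hypothetical generator that sleeps briefly before each delta to imitate model latency, and time_stream() is a made-up helper that records when the first delta arrives and when the stream ends.

```python
import time
from typing import Iterator, Tuple


def slow_stream(delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a model stream: one delta per `delay` seconds."""
    for delta in ["Streaming ", "feels ", "faster."]:
        time.sleep(delay)
        yield delta


def time_stream(deltas: Iterator[str]) -> Tuple[float, float, str]:
    """Return (time to first delta, total time, full text) for a stream."""
    start = time.perf_counter()
    first = None
    parts = []
    for delta in deltas:
        if first is None:
            # Record the moment the first delta shows up.
            first = time.perf_counter() - start
        parts.append(delta)
    total = time.perf_counter() - start
    # An empty stream has no first delta; fall back to the total time.
    return (first if first is not None else total), total, "".join(parts)


ttft, total, text = time_stream(slow_stream())
print(f"first token after {ttft:.3f}s, full reply after {total:.3f}s")
```

With invoke(), the user sees nothing until roughly the total time has passed; with stream(), output can begin at the time-to-first-token.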
Code Check
What does chunk.content hold on each iteration of model.stream()?
Recap — what you just learned
- stream() returns an iterator of AIMessageChunk objects
- Each chunk.content is a delta (new tokens only)
- Use for chatbots, live UIs, or anywhere a human is waiting