Session 6 · 03 · 10 min

Streaming Basics

What you'll learn

▸ Use model.stream() to get tokens as they arrive
▸ Print chunks live for a typewriter effect
▸ Understand the difference between invoke() and stream()

invoke() waits for the full reply and returns it as a single message. stream() returns an iterator of AIMessageChunk objects, so you get tokens as the model produces them. Print each chunk.content as it arrives to build the typewriter effect chatbots use.
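The difference can be sketched without calling a real model. In the example below, FakeModel and Chunk are hypothetical stand-ins written for illustration (they are not LangChain classes); they only mimic the shape of invoke() and stream(), assuming each streamed chunk exposes its text on .content.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Chunk:
    """Hypothetical stand-in for AIMessageChunk: only the .content field."""
    content: str


class FakeModel:
    """Hypothetical stand-in for a chat model; not a LangChain class."""

    def __init__(self, pieces: list[str]) -> None:
        self.pieces = pieces

    def invoke(self, prompt: str) -> Chunk:
        # Returns only once the whole reply exists, as one object.
        return Chunk("".join(self.pieces))

    def stream(self, prompt: str) -> Iterator[Chunk]:
        # Yields each piece separately, in the order it is "generated".
        for piece in self.pieces:
            yield Chunk(piece)


model = FakeModel(["Space ", "is ", "big."])

# invoke(): one call, one complete reply.
full_reply = model.invoke("Tell me about space.").content

# stream(): one call, many small pieces.
streamed_pieces = [chunk.content for chunk in model.stream("Tell me about space.")]

print(full_reply)
print(streamed_pieces)
```

Joining the streamed pieces gives the same text as the invoke() reply; the only difference is when each part becomes available to your code.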

[Diagram] stream(): start request → chunk: token 1 → chunk: token 2 → chunk: …more… → done: iterator ends

scripts/03_stream_basics.py

# Assumes `model` is a chat model instance created earlier in the course.
for chunk in model.stream("Tell me a fun fact about space."):  # ①
    print(chunk.content, end="", flush=True)                   # ②
print()                                                        # ③

① stream() returns an iterator. Loop over it.

② end="" prevents newlines between chunks; flush=True forces output to appear immediately.

③ Final newline once streaming finishes.
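Often you want the full reply as well as the live output, for example to save it to chat history. One way to do this, sketched below, is to collect each delta while printing it. fake_stream is a hypothetical stand-in that yields the text a real chunk.content would carry; the empty string in the middle reflects the assumption that some chunks may arrive with no text.

```python
from typing import Iterator


def fake_stream(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in: yields the content of successive chunks."""
    yield from ["Stream", "ing ", "", "works."]


parts: list[str] = []
for content in fake_stream("Say something."):
    print(content, end="", flush=True)  # live typewriter output
    parts.append(content)               # keep each delta for later
print()  # final newline once the stream ends

# Each piece is a delta, so joining them rebuilds the full reply.
full_text = "".join(parts)
```

With a real model the loop would iterate over model.stream(...) and append chunk.content instead.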

$ python scripts/03_stream_basics.py

Why streaming matters for UX

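Users perceive a streamed response as faster even when the total generation time is identical, because text starts appearing right away. The metric that matters is time-to-first-token: the delay between sending the request and seeing the first chunk. Always stream when a human is waiting.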
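To make time-to-first-token concrete, the sketch below times a simulated stream. slow_stream is a hypothetical stand-in that sleeps before each piece to imitate generation latency; with a real model you would time the loop over model.stream(...) the same way.

```python
import time
from typing import Iterator


def slow_stream(pieces: list[str], delay: float) -> Iterator[str]:
    """Hypothetical stand-in: yields one piece every `delay` seconds."""
    for piece in pieces:
        time.sleep(delay)
        yield piece


start = time.perf_counter()
first_token_at = None

for piece in slow_stream(["a", "b", "c", "d"], delay=0.05):
    if first_token_at is None:
        # Elapsed time until the first piece arrived.
        first_token_at = time.perf_counter() - start

# Elapsed time until the iterator was exhausted.
total = time.perf_counter() - start

print(f"time to first token: {first_token_at:.3f}s")
print(f"total time:          {total:.3f}s")
```

The total time is what a user would wait with invoke(); the time to first token is how long they wait before seeing anything when streaming.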

Code Check

What does chunk.content hold on each iteration of model.stream()?

Recap — what you just learned

✓ stream() returns an iterator of AIMessageChunk objects
✓ Each chunk.content is a delta (new tokens only)
✓ Use for chatbots, live UIs, or anywhere a human is waiting

Next up: 04 — Batch