Session 8 · 05 · 15 min
Managing Long Conversations
What you'll learn
- ▸Add SummarizationMiddleware to auto-summarize old messages
- ▸Understand the token-budget tradeoff of long histories
- ▸Compare trimming and summarization as history-control strategies
Memory is great until the conversation grows past the model's context window. SummarizationMiddleware watches the running token count and, once a threshold is crossed, replaces the oldest messages with a generated summary while keeping the most recent ones intact.
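To see why the token budget matters, here is a rough cost sketch in plain Python (the 200-tokens-per-turn figure is an illustrative assumption, not measured data): when the full history is resent on every turn, total tokens processed grow quadratically with the number of turns, while capping the history at a budget keeps growth linear.

```python
# Illustrative only: assume each turn adds ~200 tokens of new messages,
# and the whole history is resent to the model every turn.
TOKENS_PER_TURN = 200

def tokens_sent(turns, cap=None):
    """Total tokens sent across `turns`; `cap` limits history size (like a summary budget)."""
    total = 0
    history = 0
    for _ in range(turns):
        history += TOKENS_PER_TURN
        if cap is not None:
            history = min(history, cap)  # summarization/trimming keeps history near the budget
        total += history
    return total

full = tokens_sent(50)              # keep everything: grows quadratically
capped = tokens_sent(50, cap=2000)  # hold history near a 2000-token budget: grows linearly
print(full, capped)
```

Over 50 turns the uncapped history costs 255,000 tokens versus 91,000 with a 2000-token cap, and the gap widens every turn.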
05_long_conversations.py
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware ①
from langgraph.checkpoint.memory import InMemorySaver

# model_name and search_docs are defined in earlier sessions
agent = create_agent(
    model=f"openai:{model_name}",
    tools=[search_docs],
    middleware=[
        SummarizationMiddleware(
            model=f"openai:{model_name}", ②
            trigger=("tokens", 2000), ③
            keep=("messages", 10), ④
        ),
    ],
    checkpointer=InMemorySaver(),
)
①SummarizationMiddleware monitors and manages conversation length.
②Uses a (possibly cheaper) model to generate the summary.
③When total tokens exceed 2000, trigger summarization.
④Always keep the last 10 messages — summarize the rest.
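As a mental model for ③ and ④, here is a pure-Python sketch of "trigger at a token threshold, keep the last N messages, summarize the rest." This is not the middleware's real implementation; the `summarize` helper and the token count are stand-ins for the LLM call and the middleware's own counting.

```python
def summarize(messages):
    # Stand-in for the LLM summarization call (② in the listing above).
    return ("summary", f"[summary of {len(messages)} messages]")

def maybe_summarize(history, token_count, max_tokens=2000, keep_last=10):
    # ③ Do nothing until the running token count crosses the threshold.
    if token_count <= max_tokens:
        return history
    # ④ Keep the most recent messages verbatim; fold the rest into one summary.
    old, recent = history[:-keep_last], history[-keep_last:]
    return [summarize(old)] + recent

history = [("human" if i % 2 == 0 else "ai", f"msg {i}") for i in range(30)]
print(len(maybe_summarize(history, token_count=1500)))  # 30: under budget, untouched
print(len(maybe_summarize(history, token_count=2500)))  # 11: one summary + last 10
```

Note that the summary message sits where the old messages used to be, so the model still sees the conversation in order: compacted past first, verbatim recent turns after it.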
$ python 05_long_conversations.py
Knowledge Check
Why not just keep the entire conversation history forever?
Recap — what you just learned
- ✓SummarizationMiddleware triggers at a token threshold
- ✓Old messages get replaced with a summary; recent ones stay
- ✓Use a cheaper model for the summarization call to save cost