Session 8 · 05 · 15 min

Managing Long Conversations

What you'll learn
  • Add SummarizationMiddleware to auto-summarize old messages
  • Understand the token-budget tradeoff of long histories
  • See trimming and summarization strategies

Memory is great until the conversation grows past the model's context window. Long histories also cost more tokens (and money) on every call. SummarizationMiddleware watches the running token count and replaces older messages with a generated summary once a threshold is crossed, keeping the history within budget.
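Before reaching for middleware, it helps to see the simplest strategy from the list above: trimming. A minimal plain-Python sketch (the whitespace word count is a stand-in for a real tokenizer):

```python
def trim_to_budget(messages, max_tokens):
    """Drop the oldest messages until the history fits the token budget.

    Naive sketch: counts whitespace-separated words as "tokens";
    a real implementation would use the model's tokenizer.
    """
    def count(msg):
        return len(msg["content"].split())

    trimmed = list(messages)
    while trimmed and sum(count(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # the oldest message is dropped first
    return trimmed


history = [
    {"role": "user", "content": "one two three four five"},
    {"role": "assistant", "content": "six seven eight"},
    {"role": "user", "content": "nine ten"},
]
print(trim_to_budget(history, max_tokens=6))  # → keeps only the last two messages
```

Trimming is cheap but lossy: dropped messages are gone for good. Summarization instead compresses the old messages into a short recap, which is what the middleware below does.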

05_long_conversations.py
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware  ①
from langgraph.checkpoint.memory import InMemorySaver

# search_docs and model_name are carried over from earlier sessions

agent = create_agent(
    model=f"openai:{model_name}",
    tools=[search_docs],
    middleware=[
        SummarizationMiddleware(
            model=f"openai:{model_name}",                        ②
            trigger=("tokens", 2000),                             ③
            keep=("messages", 10),                                ④
        ),
    ],
    checkpointer=InMemorySaver(),
)
① SummarizationMiddleware monitors and manages conversation length.
② Uses a (possibly cheaper) model to generate the summary.
③ When total tokens exceed 2000, trigger summarization.
④ Always keep the last 10 messages; summarize the rest.
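Conceptually, what the middleware does on each trigger can be sketched in plain Python. Everything here is illustrative: the stub summarizer stands in for the real model call, and the word count stands in for a tokenizer; `trigger_tokens` and `keep_messages` mirror the `trigger` and `keep` settings above.

```python
def summarize_history(messages, trigger_tokens, keep_messages, summarize):
    """If the history exceeds the token budget, replace all but the
    most recent messages with a single summary message."""
    def count(msg):
        return len(msg["content"].split())  # stand-in tokenizer

    if sum(count(m) for m in messages) <= trigger_tokens:
        return messages  # under budget: leave the history untouched

    old, recent = messages[:-keep_messages], messages[-keep_messages:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent


# Stub in place of the summarization model call.
stub = lambda msgs: f"Summary of {len(msgs)} earlier messages."

history = [{"role": "user", "content": f"message number {i}"} for i in range(20)]
compressed = summarize_history(history, trigger_tokens=30,
                               keep_messages=10, summarize=stub)
print(len(compressed))             # → 11: one summary + the last 10 messages
print(compressed[0]["content"])    # → Summary of 10 earlier messages.
```

The real middleware does this inside the agent loop, so your calling code never sees the compression happen.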
$ python 05_long_conversations.py
Knowledge Check
Why not just keep the entire conversation history forever?
Recap — what you just learned
  • SummarizationMiddleware triggers at a token threshold
  • Old messages get replaced with a summary; recent ones stay
  • Use a cheaper model for the summarization call to save cost
Next up: 06 — Streaming