Session 8 · 05 · 15 min

Managing Long Conversations

What you'll learn
  • Add SummarizationMiddleware to auto-summarize old messages
  • Understand the token-budget tradeoff of long histories
  • See trimming and summarization strategies

Memory is great until the conversation grows past the model's context window. Long histories also cost more tokens (and money) on every call. SummarizationMiddleware watches the running token count and replaces older messages with a generated summary once a threshold is crossed, keeping the history within budget.
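Before reaching for middleware, it helps to see the simplest strategy from the list above: trimming. A minimal plain-Python sketch (the whitespace word count is a stand-in for a real tokenizer):

```python
def trim_to_budget(messages, max_tokens):
    """Drop the oldest messages until the history fits the token budget.

    Naive sketch: counts whitespace-separated words as "tokens";
    a real implementation would use the model's tokenizer.
    """
    def count(msg):
        return len(msg["content"].split())

    trimmed = list(messages)
    while trimmed and sum(count(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # the oldest message is dropped first
    return trimmed


history = [
    {"role": "user", "content": "one two three four five"},
    {"role": "assistant", "content": "six seven eight"},
    {"role": "user", "content": "nine ten"},
]
print(trim_to_budget(history, max_tokens=6))  # → keeps only the last two messages
```

Trimming is cheap but lossy: dropped messages are gone for good. Summarization instead compresses the old messages into a short recap, which is what the middleware below does.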

05_long_conversations.py
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware  ①
from langgraph.checkpoint.memory import InMemorySaver

# search_docs and model_name are carried over from earlier sessions

agent = create_agent(
    model=f"openai:{model_name}",
    tools=[search_docs],
    middleware=[
        SummarizationMiddleware(
            model=f"openai:{model_name}",                        ②
            trigger=("tokens", 2000),                             ③
            keep=("messages", 10),                                ④
        ),
    ],
    checkpointer=InMemorySaver(),
)
① SummarizationMiddleware monitors and manages conversation length.
② Uses a (possibly cheaper) model to generate the summary.
③ When total tokens exceed 2000, trigger summarization.
④ Always keep the last 10 messages; summarize the rest.
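Conceptually, what the middleware does on each trigger can be sketched in plain Python. Everything here is illustrative: the stub summarizer stands in for the real model call, and the word count stands in for a tokenizer; `trigger_tokens` and `keep_messages` mirror the `trigger` and `keep` settings above.

```python
def summarize_history(messages, trigger_tokens, keep_messages, summarize):
    """If the history exceeds the token budget, replace all but the
    most recent messages with a single summary message."""
    def count(msg):
        return len(msg["content"].split())  # stand-in tokenizer

    if sum(count(m) for m in messages) <= trigger_tokens:
        return messages  # under budget: leave the history untouched

    old, recent = messages[:-keep_messages], messages[-keep_messages:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent


# Stub in place of the summarization model call.
stub = lambda msgs: f"Summary of {len(msgs)} earlier messages."

history = [{"role": "user", "content": f"message number {i}"} for i in range(20)]
compressed = summarize_history(history, trigger_tokens=30,
                               keep_messages=10, summarize=stub)
print(len(compressed))             # → 11: one summary + the last 10 messages
print(compressed[0]["content"])    # → Summary of 10 earlier messages.
```

The real middleware does this inside the agent loop, so your calling code never sees the compression happen.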
$ python 05_long_conversations.py
Knowledge Check
Why not just keep the entire conversation history forever?
Recap — what you just learned
  • SummarizationMiddleware triggers at a token threshold
  • Old messages get replaced with a summary; recent ones stay
  • Use a cheaper model for the summarization call to save cost
Next up: 06 — Streaming