Session 1 · H5 · 15 min
Simulate Memory by Sending Context
What you'll learn
- Maintain a growing list of messages across turns
- Replay prior turns so the model "remembers" them
- See why long conversations cost more tokens
What you will build
A working chatbot loop. You keep a Python list called history. Every time the user types, you append their message, call the API with the full history, then append the model's reply. The model "remembers" because you keep feeding it everything.
The mental model
Every chatbot works this way
ChatGPT, Claude, every assistant you have ever used — under the hood is a growing messages list that gets resent in full on every turn. Memory is an illusion YOU maintain on the client side.
The code
src/05_context_memory_demo.py (excerpt)
history = [  ①
    {"role": "system", "content": "You are a friendly assistant."},
]

def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})  ②
    r = client.responses.create(
        model=model,
        input=history,  ③
    )
    reply = r.output_text.strip()
    history.append({"role": "assistant", "content": reply})  ④
    return reply

print(turn("My name is Arya."))
print(turn("What is my name?"))  ⑤

① The history list starts with just the system message.
② Append the new user turn BEFORE calling the API.
③ Send the ENTIRE history on every call, not just the latest message.
④ Append the assistant reply so it becomes part of the next call.
⑤ This time the model replies "Your name is Arya." because turn 1 is in the history.
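The excerpt assumes `client` and `model` are already set up. To watch the mechanics without spending any tokens, you can run the identical loop against a stand-in client that simply reports how many messages it was sent. The stub below is purely illustrative, not part of the OpenAI SDK; it just mimics the `responses.create` / `output_text` shape the excerpt relies on:

```python
class StubResponse:
    def __init__(self, text):
        self.output_text = text

class StubClient:
    """Fake client: replies with the number of messages it received."""
    class responses:
        @staticmethod
        def create(model, input):
            return StubResponse(f"I received {len(input)} messages.")

client = StubClient()
model = "stub-model"  # placeholder name; a real script would use an actual model

history = [
    {"role": "system", "content": "You are a friendly assistant."},
]

def turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    r = client.responses.create(model=model, input=history)
    reply = r.output_text.strip()
    history.append({"role": "assistant", "content": reply})
    return reply

print(turn("My name is Arya."))   # stub sees 2 messages (system + user)
print(turn("What is my name?"))   # stub sees 4 messages -- the history grew
```

Each call sees a longer input than the last, which is exactly the "memory" the real demo produces: the model only knows your name on turn 2 because turn 1 is still in the list you send.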
Run it
$ python src/05_context_memory_demo.py
The hidden cost
Long chats grow linearly
Turn 10 resends all 10 prior turns; turn 50 resends all 50. Each call's input grows linearly with conversation length, so your cumulative token bill grows quadratically. Production chatbots summarise or truncate old turns to keep costs down.
- Turn 1 cost: ~100 input tokens
- Turn 10 cost: 1,000+ tokens
- Turn 50 cost: 10,000+ tokens
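You can make the growth concrete with a back-of-the-envelope estimate. The figure of ~100 tokens per turn below is an illustrative assumption (real conversations vary, and replies themselves often lengthen over time), but it shows the shape of the curve: per-call input grows linearly, so the running total grows quadratically.

```python
TOKENS_PER_TURN = 100  # assumption: rough average tokens per user/assistant pair

def input_tokens(turn_number: int) -> int:
    # Turn n resends every prior turn plus the new user message.
    return turn_number * TOKENS_PER_TURN

for n in (1, 10, 50):
    print(f"Turn {n}: ~{input_tokens(n):,} input tokens")

# Cumulative input tokens billed across turns 1..50.
total = sum(input_tokens(n) for n in range(1, 51))
print(f"Total after 50 turns: ~{total:,} input tokens")
```

Even with this flat per-turn estimate, the 50-turn conversation bills over a hundred thousand input tokens in total, which is why trimming old history matters in production.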
Try it yourself
- Add a third turn that asks "what did I tell you about myself?" and watch the model pull from turn 1.
- Remove the history.append(assistant) line. How does the behaviour break?
- Add a print(len(history)) after every turn to see the list grow.
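As a further experiment, you could bolt a truncation step onto the loop, the simplest version of what production chatbots do to cap costs. This sketch keeps the system message plus only the most recent messages; the window size of 6 is an arbitrary assumption, and real systems often summarise the dropped turns instead of discarding them outright:

```python
def truncate_history(history: list, max_messages: int = 6) -> list:
    """Keep the system message plus the last `max_messages` messages."""
    system, rest = history[:1], history[1:]
    return system + rest[-max_messages:]

# Example: build a 21-message history (1 system + 10 user/assistant pairs).
long_history = [{"role": "system", "content": "You are a friendly assistant."}]
for i in range(10):
    long_history.append({"role": "user", "content": f"question {i}"})
    long_history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = truncate_history(long_history)
print(len(trimmed))  # 7: the system message plus the last three exchanges
```

Calling `truncate_history(history)` just before each API call bounds the input size, at the cost of the model "forgetting" anything older than the window, such as a name given on turn 1.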
Knowledge check
Knowledge Check
In a chatbot with memory, what do you send on turn 10?
Code Check
Why does history need BOTH the user and the assistant message appended each turn?
Recap — what you just learned
- ✓ You simulate memory by keeping a growing messages list on your side
- ✓ Append BOTH the user turn and the assistant reply each round
- ✓ Every API call resends the full history — there is no "session"
- ✓ Long chats cost more tokens linearly — production apps summarise old turns