Session 1 · H4 · 10 min

No Memory Between API Calls

What you'll learn
  • Prove that the API is stateless by design
  • Understand why the model "forgets" between calls
  • Motivate why H5 has to resend context for multi-turn chats

What you will build

A deliberately broken chatbot. You tell the model your name in call 1, ask your name back in call 2, and watch it not know you. This 10-line exercise teaches the single most misunderstood thing about LLM APIs.

The concept

Statelessness in one line
The OpenAI API is stateless. Every call is an independent HTTPS request. The model has zero knowledge of any previous call you made. It does not "remember" — it cannot.
What students expect
  Call 1: "my name is Arya"   →  stored in a "brain" somewhere
  Call 2: "what is my name?"  →  reply: "Arya"

What actually happens
  Call 1: "my name is Arya"   →  processed, returned, gone
  Call 2: "what is my name?"  →  reply: "I don't know"

The code

src/04_no_memory_demo.py (excerpt)
# Call 1 — tell the model a fact
client.responses.create(
    model=model,
    input="My name is Arya. Remember it.",                    ①
)

# Call 2 — brand new HTTPS request, new everything
r = client.responses.create(
    model=model,
    input="What is my name?",                                  ②
)
print(r.output_text)                                           ③
① This call lands on some OpenAI server. It returns. Done. Gone.
② This call lands on a different (or the same) server, with ZERO memory of call 1.
③ The model will say something like "I don't have access to your name."

Run it

$ python src/04_no_memory_demo.py

Why is it built this way?

Stateless servers scale horizontally. OpenAI runs millions of requests per minute on a fleet of machines. If each request had to look up "what did user 47 say yesterday?", the entire architecture would break. Instead, YOU keep the history on your side and send it with every request. That is what H5 shows.
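As a preview of H5, here is a minimal sketch of client-side memory (the assistant reply and the commented-out call are illustrative assumptions): you rebuild the conversation as a list of role/content messages and pass the whole thing as `input`, so the model sees the earlier turns inside this one request.

```python
# Client-side memory: resend ALL prior turns with every new request.
# The assistant turn below is a made-up example of an earlier reply.
history = [
    {"role": "user", "content": "My name is Arya. Remember it."},
    {"role": "assistant", "content": "Got it, Arya!"},
    {"role": "user", "content": "What is my name?"},
]

# With a configured client (see the excerpt above), the call would be:
# r = client.responses.create(model=model, input=history)
# print(r.output_text)  # the model can now answer, because "Arya" is in the request
```

Note that the "memory" lives entirely in your `history` list; the server still forgets everything the moment the request completes.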

Knowledge check

Knowledge Check
Why does the OpenAI API not remember previous calls automatically?
Code Check
What would make call 2 reply "Your name is Arya"?
Recap — what you just learned
  • LLM APIs are stateless — each call is independent
  • The model cannot "remember" anything between calls
  • If you want memory, YOU have to send the prior conversation in the next request
  • This is by design, for scalability
Next up: H5 — Simulate Memory by Sending Context