LLM API Basics — request in, text out
- Understand the request/response shape of an LLM API call
- Know what the system, user, and assistant roles mean
- Understand tokens, temperature, and the context window
An LLM API is remarkably simple: you send a JSON request over HTTPS, the provider runs the model, and you get JSON back with the generated text. No databases, no GPUs, no infrastructure. Just one HTTP call.
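To make "JSON over HTTPS" concrete, here is a sketch of that single call as a plain POST using only Python's standard library. The endpoint and field names follow OpenAI's Responses API, and the API key is a placeholder; any provider's API has the same basic shape.

```python
import json
import urllib.request

# The request body is just a JSON document describing the call.
payload = {
    "model": "gpt-4o-mini",
    "input": [
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user", "content": "Explain HTTP in 2 lines."},
    ],
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "https://api.openai.com/v1/responses",
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder, not a real key
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; the reply body is JSON too.
```

The SDK shown next wraps exactly this request; nothing more is happening on the wire.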
The round trip
A minimal request
This is the entire shape of an LLM call. Everything else is a variation.
from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user", "content": "Explain HTTP in 2 lines."},
    ],
    temperature=0.7,
    max_output_tokens=200,
)
print(response.output_text)
The three message roles
- system — sets persona, rules, and output format. "You are a JSON-only assistant."
- user — the human's question or request.
- assistant — the model's previous replies (only when simulating multi-turn memory).
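The assistant role is how multi-turn memory works: the API is stateless, so each call must resend the whole conversation, including the model's own prior replies. A minimal sketch (the assistant text here is illustrative):

```python
# Each call is stateless: to continue a conversation, resend the history,
# with the model's previous reply included under the "assistant" role.
history = [
    {"role": "system", "content": "You are a helpful tutor."},
    {"role": "user", "content": "Explain HTTP in 2 lines."},
    {"role": "assistant", "content": "HTTP is a request-response protocol..."},
    {"role": "user", "content": "Now compare it to HTTPS."},  # the new turn
]
# Passing `history` as `input` gives the model the full conversation so far.
```

Note that the growing history is re-billed as input tokens on every turn, which is why long conversations get more expensive.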
The three parameters that matter most
Tokens — the units you pay for
You pay for tokens, not characters or words. Both the input (prompt) and the output (reply) are metered, so a long prompt costs money even before the model writes a single word.
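The exact count comes from the model's tokenizer, but a common rule of thumb for English text is roughly four characters per token. This sketch uses that approximation only; it is not the real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters of English per token.
    The real count comes from the model's tokenizer."""
    return max(1, len(text) // 4)

prompt = "You are a helpful tutor. Explain HTTP in 2 lines."
print(estimate_tokens(prompt))  # 12, in the ballpark of the real count
```

A heuristic like this is good enough for budgeting; for billing-accurate counts, use the provider's tokenizer library.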
The context window
Every model has a maximum number of tokens it can process in a single call — the context window. It includes BOTH your input AND the reply the model generates. Exceed it and the API rejects the request.
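Because input and output share one budget, a pre-flight check is just an addition. A minimal sketch, with an illustrative window size of 128,000 tokens:

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """Input and requested output share one budget.
    The default window size here is illustrative, not a spec."""
    return input_tokens + max_output_tokens <= context_window

print(fits_context(14, 200))       # True: tiny request, plenty of room
print(fits_context(127_900, 200))  # False: the reply would overflow the window
```

In the second case, shrinking the prompt or lowering max_output_tokens is the only fix; the API would otherwise reject the call.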
The response object
The API returns a JSON object with the generated text plus metadata: which model ran, how many tokens were used (input + output), and why generation stopped ("stop", "length", or "tool_calls").
{
  "id": "resp_abc123",
  "model": "gpt-4o-mini",
  "output_text": "HTTP is a request-response protocol used by the web...",
  "usage": {
    "input_tokens": 14,
    "output_tokens": 27,
    "total_tokens": 41
  },
  "finish_reason": "stop"
}
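In practice you want to read two things from this object: the usage numbers (what you are billed for) and the finish reason. A sketch of handling the response above, parsed as plain JSON:

```python
import json

raw = """{
  "id": "resp_abc123",
  "model": "gpt-4o-mini",
  "output_text": "HTTP is a request-response protocol used by the web...",
  "usage": {"input_tokens": 14, "output_tokens": 27, "total_tokens": 41},
  "finish_reason": "stop"
}"""

resp = json.loads(raw)
if resp["finish_reason"] == "length":
    # The reply was cut off by max_output_tokens; consider raising it.
    print("warning: truncated reply")
print(resp["output_text"])
print(resp["usage"]["total_tokens"])  # 41: what you are billed for
```

Checking for "length" matters: a truncated reply looks like a complete answer unless you inspect the finish reason.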