Session 0 · 03 · 10 min

LLM API Basics — request in, text out

What you'll learn
  • Understand the request/response shape of an LLM API call
  • Know what system, user, and assistant roles mean
  • Understand tokens, temperature, and the context window

An LLM API is remarkably simple: you send a JSON request over HTTPS, the provider runs the model, and you get JSON back with the generated text. No databases, no GPUs, no infrastructure. Just one HTTP call.

The round trip

What happens when you call client.responses.create():

  1. Your code (Python / JS / curl) sends an HTTPS request: JSON body + API key.
  2. The provider runs the model.
  3. An HTTPS response comes back: JSON with the output.
  4. Your code reads the text.
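The same round trip can be sketched without the SDK, because it really is just an HTTPS POST. A minimal sketch, assuming the standard OpenAI endpoint and an OPENAI_API_KEY environment variable; the body is built and printed but not sent, so it runs offline:

```python
import json
import os

# The JSON body the SDK builds for you. The endpoint and header names
# are the standard OpenAI ones; the actual send is left out so this
# sketch needs no network connection or real API key.
url = "https://api.openai.com/v1/responses"
headers = {
    "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', 'sk-...')}",
    "Content-Type": "application/json",
}
body = {
    "model": "gpt-4o-mini",
    "input": [
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user", "content": "Explain HTTP in 2 lines."},
    ],
    "temperature": 0.7,
    "max_output_tokens": 200,
}

print(json.dumps(body, indent=2))
# To actually send it: requests.post(url, headers=headers, json=body)
```

Seeing the raw body makes the point of the section concrete: the SDK is a thin convenience over one JSON-over-HTTPS call.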

A minimal request

This is the entire shape of an LLM call. Everything else is a variation.

minimal_call.py
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user",   "content": "Explain HTTP in 2 lines."},
    ],
    temperature=0.7,
    max_output_tokens=200,
)

print(response.output_text)

The three message roles

  • system — sets persona, rules, and output format. "You are a JSON-only assistant."
  • user — the human's question or request.
  • assistant — the model's previous replies (only when simulating multi-turn memory).

The system message is powerful
The model weighs system instructions heavily. Put persona, rules, format, and constraints there — never in the user message. It makes prompts reusable across different questions.
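That reusability is easy to see in code. A small sketch (the helper name and prompt text are my own) that pairs one fixed system message with any user question:

```python
SYSTEM = "You are a helpful tutor. Answer in at most 2 lines."

def make_input(question: str) -> list[dict]:
    """Pair the fixed system message with any user question."""
    return [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": question},
    ]

# The same persona and rules apply to every request:
for q in ["Explain HTTP.", "Explain DNS."]:
    messages = make_input(q)
    # client.responses.create(model="gpt-4o-mini", input=messages)
    print(messages[0]["role"], "|", messages[1]["content"])
```

Because the rules live in the system message, only the user message changes per call.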

The three parameters that matter most

  • model — which model runs, e.g. gpt-4o-mini, claude-sonnet-4-5
  • temperature — creativity: 0 = deterministic · 1 = default · 1.5 = wild
  • max_output_tokens — cap on response length
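In code, all three are keyword arguments on the same call. A sketch (the helper is hypothetical; the 0–2 temperature range matches OpenAI's documented bounds) that bundles and sanity-checks them before calling:

```python
def call_kwargs(model: str, temperature: float = 1.0,
                max_output_tokens: int = 200) -> dict:
    """Bundle the three knobs; reject a temperature outside the API's 0-2 range early."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "model": model,
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }

# client.responses.create(input=messages, **call_kwargs("gpt-4o-mini", temperature=0))
print(call_kwargs("gpt-4o-mini", temperature=0))
```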

Tokens — the units you pay for

You pay for tokens, not characters or words. Both input (prompt) and output (reply) are metered. Here is what the request above looks like as tokens:

You are a helpful tutor. Explain HTTP in 2 lines.
Input prompt · 12 tokens
Rule of thumb
1 token ≈ 0.75 English words · 1 page of text ≈ 500 tokens · Today's frontier models bill roughly $0.15 – $5 per MILLION tokens, depending on model and direction (input vs output).
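The rule of thumb translates directly into a back-of-the-envelope estimator. A sketch using the approximate ratios above, not a real tokenizer, with illustrative per-million prices as defaults:

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb: 1 token ~= 0.75 English words, so tokens ~= words / 0.75.
    return round(len(text.split()) / 0.75)

def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_in: float = 0.15, usd_per_m_out: float = 0.60) -> float:
    # Prices are illustrative, billed per MILLION tokens;
    # check your provider's pricing page for real numbers.
    return input_tokens * usd_per_m_in / 1e6 + output_tokens * usd_per_m_out / 1e6

prompt = "You are a helpful tutor. Explain HTTP in 2 lines."
print(estimate_tokens(prompt))          # roughly matches the 12-token prompt above
print(estimate_cost(1_000_000, 0))      # 1M input tokens at $0.15/M
```

For exact counts you would use the provider's tokenizer; the estimate is only for quick budgeting.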

The context window

Every model has a maximum number of tokens it can process in a single call — the context window. It includes BOTH your input AND the reply the model generates. Exceed it and the API rejects the request.

  • 8k — small model — ~16 pages of text
  • 128k — modern standard — ~300 pages of text
  • 2M — Gemini long-context — ~4,000 pages

Longer context is not free
Even when a model supports 2 million tokens, every token costs money AND slows down the response. Send only what the model actually needs.
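Because the window covers input plus the reply you allow for, you can check the budget before calling. A small sketch (the window sizes are the round numbers above, not exact model limits):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 window: int = 128_000) -> bool:
    """The prompt AND the reply budget must both fit inside the window."""
    return input_tokens + max_output_tokens <= window

print(fits_context(7_900, 200, window=8_000))    # False: reply budget overflows an 8k window
print(fits_context(7_900, 200, window=128_000))  # True: plenty of room in a 128k window
```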

The response object

The API returns a JSON object with the generated text plus metadata: which model ran, how many tokens were used (input + output), and why generation stopped ("stop", "length", or "tool_calls").

response.json (trimmed)
{
  "id": "resp_abc123",
  "model": "gpt-4o-mini",
  "output_text": "HTTP is a request-response protocol used by the web...",
  "usage": {
    "input_tokens": 14,
    "output_tokens": 27,
    "total_tokens": 41
  },
  "finish_reason": "stop"
}
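Reading that metadata in code is a couple of dictionary lookups. A sketch that parses the trimmed JSON above as a literal (with the SDK you would read the fields off the response object directly):

```python
import json

# The trimmed response body from above, as a raw JSON string.
raw = """{
  "id": "resp_abc123",
  "model": "gpt-4o-mini",
  "output_text": "HTTP is a request-response protocol used by the web...",
  "usage": {"input_tokens": 14, "output_tokens": 27, "total_tokens": 41},
  "finish_reason": "stop"
}"""

response = json.loads(raw)
print(response["output_text"])
print(response["usage"]["total_tokens"])  # 41

# A finish_reason of "length" means the reply hit max_output_tokens:
if response["finish_reason"] == "length":
    print("Reply was cut off: raise max_output_tokens or shorten the prompt.")
```

Checking usage and finish_reason on every call is the simplest way to catch truncated replies and runaway token spend.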
That is the whole API
Send messages. Pick a model. Tune temperature and max_output_tokens. Read the response text. Everything else you will meet in this course — tools, structured output, streaming — is a layer on top of this one pattern.