Session 1 · H2 · 20 min

Parameter Lab — temperature, top_p, max_tokens

What you'll learn
  • Understand how temperature and top_p shape output creativity
  • See the effect of max_output_tokens on response length
  • Run the same prompt under multiple settings and compare

What you will build

A script that runs the same prompt under three "creativity profiles" — low, balanced, high — and prints all three outputs so you can compare them side by side.

The three parameters that matter most

temperature
randomness dial
  • 0.0 = near-deterministic, (almost) the same answer every time
  • 0.7 = default, a bit of variation
  • 1.2+ = wild, creative, riskier
  • Think of it as "how willing is the model to pick unusual words"
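To build intuition, here is a toy sketch (not the API's actual implementation) of the standard idea behind temperature: the model's raw scores are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution and a high one flattens it.

```python
import math

def apply_temperature(logits, temperature):
    """Toy model: softmax over logits divided by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate next tokens with made-up raw scores (logits)
logits = [2.0, 1.0, 0.1]                 # e.g. "the", "a", "banana"
print(apply_temperature(logits, 0.1))    # top token takes almost all the mass
print(apply_temperature(logits, 1.2))    # flatter: unusual tokens get real odds
```

At temperature 0.1 the first token ends up with essentially all the probability, which is why low-temperature runs barely change between calls.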
top_p
token pool size
  • 1.0 = consider every possible token
  • 0.5 = consider only the smallest set of top tokens whose probabilities add up to 50%
  • Lower = safer wording
  • Usually leave at 1.0 and tune temperature instead
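A toy sketch of the nucleus ("top_p") idea, assuming the usual definition: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, and sample only from that pool.

```python
def top_p_pool(probs, p):
    """probs: {token: probability}. Returns the pool kept under top_p = p."""
    pool, cumulative = {}, 0.0
    # Walk tokens from most to least likely, stopping once we reach p
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        pool[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return pool

probs = {"the": 0.5, "a": 0.3, "this": 0.15, "banana": 0.05}
print(top_p_pool(probs, 1.0))   # all four tokens stay in the pool
print(top_p_pool(probs, 0.5))   # only "the" survives: {'the': 0.5}
```

Lowering top_p never reorders tokens; it only shrinks the pool, which is why it reads as "safer wording" rather than "different wording".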
max_output_tokens
length cap
  • Hard limit on reply length
  • Output is cut off at the cap
  • Great for comparisons and for saving money
  • 1 token ≈ 0.75 English words
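That rule of thumb makes a rough budget helper easy to sketch (the 0.75 ratio is an approximation for English prose, not an exact conversion):

```python
def approx_words(max_output_tokens):
    """Rough English word count for a given token cap (~0.75 words/token)."""
    return round(max_output_tokens * 0.75)

def tokens_for_words(words):
    """Rough token cap needed to fit a target word count."""
    return round(words / 0.75)

print(approx_words(100))      # about 75 words
print(tokens_for_words(100))  # about 133 tokens
```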

The core loop

src/02_text_params_lab.py (excerpt)
configs = [                                               ①
    {"label": "Low creativity",  "temperature": 0.1, "top_p": 1.0},
    {"label": "Balanced",        "temperature": 0.7, "top_p": 1.0},
    {"label": "High creativity", "temperature": 1.2, "top_p": 1.0},
]

for cfg in configs:                                           ②
    print(f"\n--- {cfg['label']} ---")
    result = client.responses.create(                          ③
        model=model,
        input=args.prompt,
        temperature=cfg["temperature"],                         ④
        top_p=cfg["top_p"],
        max_output_tokens=args.max_output_tokens,
    )
    print(result.output_text.strip())
① Three configs in a list. Each is a dict with a label and parameter values.
② Loop through each config and call the API with it.
③ Same script, same prompt, different parameters. This is the whole experiment.
④ The parameter is passed as a keyword argument to responses.create.
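The excerpt above omits the setup around the loop. Below is one self-contained sketch of what the full script might look like; the StubClient, the model name, and the flag defaults are assumptions for illustration, with the stub standing in for the real client so the loop shape runs without an API key (swap in `from openai import OpenAI; client = OpenAI()` to call the real API).

```python
import argparse
from types import SimpleNamespace

class StubClient:
    """Illustrative stand-in: echoes its settings instead of calling OpenAI."""
    class _Responses:
        def create(self, model, input, temperature, top_p, max_output_tokens):
            text = f"[{model} T={temperature} top_p={top_p} cap={max_output_tokens}] {input}"
            return SimpleNamespace(output_text=text)
    def __init__(self):
        self.responses = self._Responses()

def run_lab(client, model, prompt, max_output_tokens):
    configs = [
        {"label": "Low creativity",  "temperature": 0.1, "top_p": 1.0},
        {"label": "Balanced",        "temperature": 0.7, "top_p": 1.0},
        {"label": "High creativity", "temperature": 1.2, "top_p": 1.0},
    ]
    outputs = []
    for cfg in configs:
        print(f"\n--- {cfg['label']} ---")
        result = client.responses.create(
            model=model,
            input=prompt,
            temperature=cfg["temperature"],
            top_p=cfg["top_p"],
            max_output_tokens=max_output_tokens,
        )
        outputs.append(result.output_text.strip())
        print(outputs[-1])
    return outputs

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--max-output-tokens", type=int, default=120,
                        dest="max_output_tokens")
    args = parser.parse_args()
    # "gpt-4o-mini" is a placeholder model name; use whichever model you have
    run_lab(StubClient(), "gpt-4o-mini", args.prompt, args.max_output_tokens)
```

Keeping the loop in a function like `run_lab` also makes it easy to test the comparison logic without spending API credits.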

Run it

$ python src/02_text_params_lab.py --prompt "Write a tagline for a tea brand"
Run it twice
Run the SAME command a second time. The low-creativity output barely changes. The high-creativity output is noticeably different. That is temperature in action.

Try it yourself

  • Add a fourth config with temperature=2.0. What do you see?
  • Set max_output_tokens to 20. How is the reply cut off?
  • Swap the prompt to "Explain gravity to a 5-year-old" and rerun. Does creativity still matter for factual answers?

Knowledge check

Knowledge Check
Which parameter should you set to 0 for maximum reproducibility?
Code Check
If you set max_output_tokens=5 and ask for a 100-word essay, how will the response object tell you the output was cut short?
Recap — what you just learned
  • temperature controls randomness — low for facts, higher for creativity
  • top_p is a related knob — usually leave it at 1.0
  • max_output_tokens caps reply length (useful for comparisons AND cost)
  • The SAME prompt under different parameters produces genuinely different text
Next up: H3 — Prompt Roles