Session 1 · H2 · 20 min

Parameter Lab — temperature, top_p, max_tokens

What you'll learn
  • Understand how temperature and top_p shape output creativity
  • See the effect of max_output_tokens on response length
  • Run the same prompt under multiple settings and compare

What you will build

A script that runs the same prompt under three "creativity profiles" — low, balanced, high — and prints all three outputs so you can compare them side by side.

The three parameters that matter most

temperature
randomness dial
  • 0.0 = near-deterministic, (almost) the same answer every time
  • 0.7 = default, a bit of variation
  • 1.2+ = wild, creative, riskier
  • Think of it as "how willing is the model to pick unusual words"
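To build intuition, here is a toy sketch (not the API's actual implementation) of the standard idea behind temperature: the model's raw scores are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution and a high one flattens it.

```python
import math

def apply_temperature(logits, temperature):
    """Toy model: softmax over logits divided by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Three candidate next tokens with made-up raw scores (logits)
logits = [2.0, 1.0, 0.1]                 # e.g. "the", "a", "banana"
print(apply_temperature(logits, 0.1))    # top token takes almost all the mass
print(apply_temperature(logits, 1.2))    # flatter: unusual tokens get real odds
```

At temperature 0.1 the first token ends up with essentially all the probability, which is why low-temperature runs barely change between calls.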
top_p
token pool size
  • 1.0 = consider every possible token
  • 0.5 = consider only the smallest set of top tokens whose probabilities add up to 50%
  • Lower = safer wording
  • Usually leave at 1.0 and tune temperature instead
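A toy sketch of the nucleus ("top_p") idea, assuming the usual definition: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, and sample only from that pool.

```python
def top_p_pool(probs, p):
    """probs: {token: probability}. Returns the pool kept under top_p = p."""
    pool, cumulative = {}, 0.0
    # Walk tokens from most to least likely, stopping once we reach p
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        pool[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return pool

probs = {"the": 0.5, "a": 0.3, "this": 0.15, "banana": 0.05}
print(top_p_pool(probs, 1.0))   # all four tokens stay in the pool
print(top_p_pool(probs, 0.5))   # only "the" survives: {'the': 0.5}
```

Lowering top_p never reorders tokens; it only shrinks the pool, which is why it reads as "safer wording" rather than "different wording".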
max_output_tokens
length cap
  • Hard limit on reply length
  • Output is cut off at the cap
  • Great for comparisons and for saving money
  • 1 token ≈ 0.75 English words
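That rule of thumb makes a rough budget helper easy to sketch (the 0.75 ratio is an approximation for English prose, not an exact conversion):

```python
def approx_words(max_output_tokens):
    """Rough English word count for a given token cap (~0.75 words/token)."""
    return round(max_output_tokens * 0.75)

def tokens_for_words(words):
    """Rough token cap needed to fit a target word count."""
    return round(words / 0.75)

print(approx_words(100))      # about 75 words
print(tokens_for_words(100))  # about 133 tokens
```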

The core loop

src/02_text_params_lab.py (excerpt)
configs = [                                               ①
    {"label": "Low creativity",  "temperature": 0.1, "top_p": 1.0},
    {"label": "Balanced",        "temperature": 0.7, "top_p": 1.0},
    {"label": "High creativity", "temperature": 1.2, "top_p": 1.0},
]

for cfg in configs:                                           ②
    print(f"\n--- {cfg['label']} ---")
    result = client.responses.create(                          ③
        model=model,
        input=args.prompt,
        temperature=cfg["temperature"],                         ④
        top_p=cfg["top_p"],
        max_output_tokens=args.max_output_tokens,
    )
    print(result.output_text.strip())
① Three configs in a list. Each is a dict with a label and parameter values.
② Loop through each config and call the API with it.
③ Same script, same prompt, different parameters. This is the whole experiment.
④ The parameter is passed as a keyword argument to responses.create.
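The excerpt above omits the setup around the loop. Below is one self-contained sketch of what the full script might look like; the StubClient, the model name, and the flag defaults are assumptions for illustration, with the stub standing in for the real client so the loop shape runs without an API key (swap in `from openai import OpenAI; client = OpenAI()` to call the real API).

```python
import argparse
from types import SimpleNamespace

class StubClient:
    """Illustrative stand-in: echoes its settings instead of calling OpenAI."""
    class _Responses:
        def create(self, model, input, temperature, top_p, max_output_tokens):
            text = f"[{model} T={temperature} top_p={top_p} cap={max_output_tokens}] {input}"
            return SimpleNamespace(output_text=text)
    def __init__(self):
        self.responses = self._Responses()

def run_lab(client, model, prompt, max_output_tokens):
    configs = [
        {"label": "Low creativity",  "temperature": 0.1, "top_p": 1.0},
        {"label": "Balanced",        "temperature": 0.7, "top_p": 1.0},
        {"label": "High creativity", "temperature": 1.2, "top_p": 1.0},
    ]
    outputs = []
    for cfg in configs:
        print(f"\n--- {cfg['label']} ---")
        result = client.responses.create(
            model=model,
            input=prompt,
            temperature=cfg["temperature"],
            top_p=cfg["top_p"],
            max_output_tokens=max_output_tokens,
        )
        outputs.append(result.output_text.strip())
        print(outputs[-1])
    return outputs

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--max-output-tokens", type=int, default=120,
                        dest="max_output_tokens")
    args = parser.parse_args()
    # "gpt-4o-mini" is a placeholder model name; use whichever model you have
    run_lab(StubClient(), "gpt-4o-mini", args.prompt, args.max_output_tokens)
```

Keeping the loop in a function like `run_lab` also makes it easy to test the comparison logic without spending API credits.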

Run it

$ python src/02_text_params_lab.py --prompt "Write a tagline for a tea brand"
Run it twice
Run the SAME command a second time. The low-creativity output barely changes. The high-creativity output is noticeably different. That is temperature in action.

Try it yourself

  • Add a fourth config with temperature=2.0. What do you see?
  • Set max_output_tokens to 20. How is the reply cut off?
  • Swap the prompt to "Explain gravity to a 5-year-old" and rerun. Does creativity still matter for factual answers?

Knowledge check

Knowledge Check
Which parameter should you set to 0 for maximum reproducibility?
Code Check
If you set max_output_tokens=5 and ask for a 100-word essay, how will the response object tell you the output was cut short?
Recap — what you just learned
  • temperature controls randomness — low for facts, higher for creativity
  • top_p is a related knob — usually leave it at 1.0
  • max_output_tokens caps reply length (useful for comparisons AND cost)
  • The SAME prompt under different parameters produces genuinely different text
Next up: H3 — Prompt Roles