Session 1 · 20 min
Parameter Lab — temperature, top_p, max_tokens
What you'll learn
- Understand how temperature and top_p shape output creativity
- See the effect of max_output_tokens on response length
- Run the same prompt under multiple settings and compare
What you will build
A script that runs the same prompt under three "creativity profiles" — low, balanced, high — and prints all three outputs so you can compare them side by side.
The three parameters that matter most
temperature
randomness dial
- 0.0 = near-deterministic, usually the same answer every time
- 0.7 = a common balanced choice, a bit of variation
- 1.2+ = wild, creative, riskier
- Think of it as "how willing is the model to pick unusual words"
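Under the hood, temperature divides the model's raw next-token scores (logits) before they are turned into probabilities. A minimal sketch of that math with toy logits (not a real model's scores):

```python
import math

def token_probs(logits, temperature):
    # Scale logits by 1/temperature, then apply softmax.
    # Low temperature sharpens the distribution toward the top
    # token; high temperature flattens it across the candidates.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]          # toy scores for three candidate tokens
cold = token_probs(logits, 0.1)   # top token gets almost all the mass
hot = token_probs(logits, 1.2)    # probability spreads across tokens
```

At temperature 0.1 the top token ends up with essentially all of the probability, which is why low-temperature runs repeat themselves; at 1.2 the runner-up tokens get real chances of being picked.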
top_p
token pool size
- 1.0 = consider every possible token
- 0.5 = consider only the tokens that make up the top 50% of probability mass
- Lower = safer, more conventional wording
- Usually leave it at 1.0 and tune temperature instead
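top_p (nucleus sampling) works differently from temperature: it keeps the smallest set of most-likely tokens whose probabilities add up to top_p, then samples only from that pool. A toy sketch, assuming a four-token vocabulary:

```python
def nucleus_pool(probs, top_p):
    # Sort token indices by probability, then keep adding tokens
    # until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    pool, cumulative = [], 0.0
    for i in order:
        pool.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return pool

probs = [0.6, 0.25, 0.1, 0.05]  # toy next-token probabilities
nucleus_pool(probs, 1.0)        # every token stays in play
nucleus_pool(probs, 0.5)        # only the most likely token survives
```

This is why a low top_p gives "safer wording": unusual tokens never make it into the pool, no matter what temperature then does within it.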
max_output_tokens
length cap
- Hard limit on reply length
- Output is truncated mid-sentence once it hits the cap
- Great for comparisons and for saving money
- 1 token ≈ 0.75 English words
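The ≈0.75 words-per-token rule of thumb gives a quick way to turn a target word count into a max_output_tokens budget. A rough helper (the ratio varies by language and vocabulary, so treat the result as an estimate, not a guarantee):

```python
import math

def token_budget(word_count, words_per_token=0.75):
    # 1 token ~ 0.75 English words, so divide the word target
    # by the ratio and round up. Pad the result if the reply
    # will contain punctuation-heavy text or markup.
    return math.ceil(word_count / words_per_token)

token_budget(100)  # about 134 tokens for a 100-word reply
token_budget(30)   # about 40 tokens for a tagline-length answer
```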
The core loop
src/02_text_params_lab.py (excerpt)
configs = [  ①
    {"label": "Low creativity", "temperature": 0.1, "top_p": 1.0},
    {"label": "Balanced", "temperature": 0.7, "top_p": 1.0},
    {"label": "High creativity", "temperature": 1.2, "top_p": 1.0},
]
for cfg in configs:  ②
    print(f"\n--- {cfg['label']} ---")
    result = client.responses.create(  ③
        model=model,
        input=args.prompt,
        temperature=cfg["temperature"],  ④
        top_p=cfg["top_p"],
        max_output_tokens=args.max_output_tokens,
    )
    print(result.output_text.strip())
① Three configs in a list. Each is a dict with a label and parameter values.
② Loop through each config and call the API with it.
③ Same script, same prompt, different parameters. This is the whole experiment.
④ Each parameter is passed as a keyword argument to responses.create.
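The excerpt assumes client, model, and args already exist. Here is one way that scaffolding could look — the flag names and defaults below are illustrative guesses, not taken from the course repo:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical CLI setup producing the args.prompt and
    # args.max_output_tokens values used in the excerpt.
    parser = argparse.ArgumentParser(description="Parameter lab")
    parser.add_argument("--prompt", required=True,
                        help="Prompt to run under every config")
    parser.add_argument("--max-output-tokens", dest="max_output_tokens",
                        type=int, default=120,
                        help="Hard cap on each reply's length")
    return parser.parse_args(argv)

# client and model would come from the OpenAI SDK, e.g.:
#   from openai import OpenAI
#   client = OpenAI()      # reads OPENAI_API_KEY from the environment
#   model = "gpt-4o-mini"  # any model that supports the Responses API
args = parse_args(["--prompt", "Write a tagline for a tea brand"])
```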
Run it
$ python src/02_text_params_lab.py --prompt "Write a tagline for a tea brand"
Run it twice
Run the SAME command a second time. The low-creativity output barely changes. The high-creativity output is noticeably different. That is temperature in action.
Try it yourself
- Add a fourth config with temperature=2.0. What do you see?
- Set max_output_tokens to 20. How is the reply cut off?
- Swap the prompt to "Explain gravity to a 5-year-old" and rerun. Does creativity still matter for factual answers?
Knowledge Check
Which parameter should you set to 0 for maximum reproducibility?
Code Check
If you set max_output_tokens=5 and ask for a 100-word essay, what status will the response report, and what reason will its incomplete_details show?
Recap — what you just learned
- temperature controls randomness — low for facts, higher for creativity
- top_p is a related knob — usually leave it at 1.0
- max_output_tokens caps reply length (useful for comparisons AND cost)
- The SAME prompt under different parameters produces genuinely different text