Session 9 · 9 questions

End-of-session Quiz

What you'll learn

Nine questions covering eval fundamentals, dataset generation, and dual grading. Aim for at least 7 correct before moving to Session 10.

Phase 1 — Eval Fundamentals

Knowledge Check

Why is eyeballing LLM outputs insufficient for evaluating prompt changes?

Knowledge Check

What three fields should every eval test case have?

Code Check

What is the prefill + stop_sequences trick used for?

Knowledge Check

When should you generate eval datasets with Claude instead of hand-writing them?

Knowledge Check

Why use temperature=1.0 for dataset generation but temperature=0 for eval runs?

Knowledge Check

What is the purpose of adversarial test cases?

Code Check

What does grade_syntax() check that grade_by_model() cannot?

Code Check

How is the blended score calculated in dual grading?

Knowledge Check

What does the HTML eval report show that raw JSON results do not?

Scored 7+ out of 9?

You are ready for Session 10 — Prompt Engineering. If you missed more than 2, revisit the matching lessons before moving on.