Session 1 · H8 · 15 min

Image Generation (DALL·E / gpt-image-1)

What you'll learn
  • Call the image generation endpoint from Python
  • Decode the base64 response and save a PNG to disk
  • Experiment with prompt phrasing for better results

What you will build

A script that takes a text prompt and saves a 1024×1024 PNG to output/generated.png. It uses a different endpoint from text generation (client.images.generate), but the same client object and the same .env key.

New concept — base64 payloads

Why base64?
JSON is a text format, so it cannot carry raw binary image bytes directly. Base64 encodes any bytes as a plain ASCII string that travels safely inside the JSON response. You decode that string back to bytes and write them to a .png file on disk.
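
To make the round trip concrete, here is a minimal sketch using only the standard library. The sample bytes are the 8-byte signature that starts every PNG file, chosen for illustration; they are not an API response.

```python
import base64

# The first 8 bytes of every PNG file. Several of them are not printable text.
raw = b"\x89PNG\r\n\x1a\n"

# bytes -> ASCII string that is safe to embed in JSON
encoded = base64.b64encode(raw).decode("ascii")

# ASCII string -> the original bytes, unchanged
decoded = base64.b64decode(encoded)

print(encoded)         # iVBORw0KGgo=
print(decoded == raw)  # True
```

This is why base64-encoded PNG data always begins with "iVBORw0KGgo".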

The code

src/08_image_generation.py (excerpt)
import base64
import pathlib

result = client.images.generate(                               # ①
    model=env_model("MODEL_IMAGE", "gpt-image-1"),
    prompt=args.prompt,
    size="1024x1024",                                           # ②
)

b64 = result.data[0].b64_json                                  # ③
out_path = pathlib.Path("output/generated.png")
out_path.parent.mkdir(parents=True, exist_ok=True)             # create output/ if it is missing
out_path.write_bytes(base64.b64decode(b64))                    # ④
print(f"Saved -> {out_path}")
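① Different endpoint: client.images.generate() instead of responses.create().
② size must be one of a fixed set of strings. gpt-image-1 accepts 1024x1024, 1536x1024, 1024x1536, or auto; 1024x1792 and 1792x1024 are DALL·E 3 sizes.
③ The image bytes come back as a base64 string in result.data[0].b64_json. gpt-image-1 always returns base64; DALL·E models return a URL unless you pass response_format="b64_json".
④ Decode with base64.b64decode() and write_bytes() to save a real .png file.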
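
If you want the decode-and-save step to be reusable, it can be wrapped in a small helper. This is a sketch, not part of the course script: save_b64_png and PNG_SIGNATURE are hypothetical names, and the signature check assumes the API was asked for PNG output.

```python
import base64
import pathlib

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_b64_png(b64: str, path: str) -> pathlib.Path:
    """Decode a base64 string and write it to `path` as a PNG file."""
    image_bytes = base64.b64decode(b64)
    # Fail early if the payload is not a PNG (for example, an error message).
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG")
    out_path = pathlib.Path(path)
    # Create the parent folder (e.g. output/) if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(image_bytes)
    return out_path
```

In the script above it would be called as save_b64_png(result.data[0].b64_json, "output/generated.png").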

Run it

$ python src/08_image_generation.py --prompt "A mascot robot teaching Python in a classroom, flat illustration"
Prompt like a photographer
Include subject, style, lighting, and angle. "A cheerful robot teaching Python, flat illustration, soft classroom light, wide shot" gives the model far more to work with than "robot teacher".
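
One way to keep those four ingredients in view is to assemble the prompt from named parts. The build_prompt helper below is a hypothetical convenience for experimenting, not something the API or the course script requires.

```python
def build_prompt(subject: str, style: str = "", lighting: str = "", angle: str = "") -> str:
    """Join the non-empty prompt parts into one comma-separated string."""
    parts = [subject, style, lighting, angle]
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    subject="A cheerful robot teaching Python",
    style="flat illustration",
    lighting="soft classroom light",
    angle="wide shot",
)
print(prompt)
# A cheerful robot teaching Python, flat illustration, soft classroom light, wide shot
```

Changing one argument at a time (for example, only the lighting) makes it easier to see which phrase caused a difference in the output.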

Knowledge check

Knowledge Check
Why does the API return the image as a base64 string instead of raw bytes?
Code Check
What is the type of result.data?
Recap — what you just learned
  • Image generation uses client.images.generate() (different endpoint, same client)
  • Results come back as base64 strings — you decode and save them yourself
  • Prompt phrasing (style, lighting, angle) strongly affects output quality
Next up: H9 — Text to Speech