Session 1 · H6 · 15 min
Image Analysis (Vision)
What you'll learn
- Send an image plus a question to a multimodal model
- Convert a local image file into a base64 data URL
- Read free-form answers about image contents
What you will build
A script that takes a local image file and a question, then asks the model to describe what it sees. It uses the same responses.create() call as H1; the only change is that you pass a multi-part content list instead of a plain string.
New concept — data URLs
What is a data URL?
A way to embed a file's bytes directly inside a URL string, encoded as base64. It looks like "data:image/png;base64,iVBORw0KGgo...". The API accepts it wherever it accepts a normal image URL, but the bytes come from your disk, not the internet.
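To make the format concrete, here is a minimal sketch that builds a data URL from a few bytes in memory and splits it back into its parts. The function name split_data_url is hypothetical (it is not part of the course code), and it only handles the base64 form shown above.

```python
import base64


def split_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (media type, decoded bytes).

    Hypothetical helper for illustration only; it handles just the
    "data:<media type>;base64,<payload>" form.
    """
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


# The eight-byte signature that starts every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"

# prefix + base64 text = data URL
url = "data:image/png;base64," + base64.b64encode(png_signature).decode("ascii")
print(url)  # data:image/png;base64,iVBORw0KGgo=

media_type, raw = split_data_url(url)
```

This is why PNG data URLs start with "iVBORw0KGgo": those characters are the base64 form of the PNG signature bytes.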
Local file → API call
sample1.png (on disk) → Read bytes (pathlib) → base64 encode (string) → data:image URL (prepended prefix) → API call (with URL)
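The pipeline above can be sketched in a few lines of standard-library Python. This is an illustrative guess at what a helper like to_data_url might do; the real implementation in common.py is not shown in this lesson and may differ (for example, in how it picks the media type).

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL.

    Illustrative sketch only; the course's own helper may differ.
    """
    file_path = Path(path)

    # Guess the media type from the extension, e.g. ".png" -> "image/png".
    media_type, _ = mimetypes.guess_type(file_path.name)
    if media_type is None or not media_type.startswith("image/"):
        raise ValueError(f"unsupported image type: {file_path.suffix!r}")

    raw = file_path.read_bytes()                      # read bytes
    encoded = base64.b64encode(raw).decode("ascii")   # base64 encode -> string
    return f"data:{media_type};base64,{encoded}"      # prepend the prefix
```

Note that base64 makes the payload roughly a third larger than the file on disk, so large images produce long strings.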
The code
src/06_image_analysis_basic.py (excerpt)
from common import to_data_url  # ①

image_url = to_data_url(args.image)  # ②

response = client.responses.create(
    model=env_model("MODEL_VISION", "gpt-4.1-mini"),  # ③
    input=[
        {"role": "user", "content": [  # ④
            {"type": "input_text", "text": args.question},
            {"type": "input_image", "image_url": image_url},
        ]},
    ],
)
print(response.output_text)

① to_data_url is a helper in common.py that handles file → base64 → data URL.
② Convert your local image path into a data URL string.
③ Use a vision-capable model (gpt-4.1-mini supports images).
④ Content is now a LIST: a text item and an image item mixed in the same user message.
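The excerpt calls env_model("MODEL_VISION", "gpt-4.1-mini") without showing the helper. A plausible reading is "use the MODEL_VISION environment variable if it is set, otherwise fall back to the default". The sketch below implements that assumed behavior; it is not the course's actual code.

```python
import os


def env_model(var_name: str, default: str) -> str:
    """Return the model name from an environment variable, or a default.

    Assumed behavior for illustration; the real env_model may differ.
    """
    value = os.environ.get(var_name, "").strip()
    # An unset or blank variable falls back to the default model.
    return value or default
```

Under this assumption, setting MODEL_VISION in your shell switches the model without editing the script.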
Run it
$ python src/06_image_analysis_basic.py --image assets/sample1.png --question "What do you see?"
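The command above implies that the script parses an --image and a --question flag into args.image and args.question. A minimal sketch of that parsing with argparse follows; the function name parse_args, the help texts, and the choice to make both flags required are assumptions, not taken from the actual script.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two flags used in the run command (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Ask a vision model a question about a local image."
    )
    parser.add_argument("--image", required=True, help="path to a local image file")
    parser.add_argument("--question", required=True, help="question to ask about the image")
    # argv=None makes argparse read sys.argv[1:], as in a normal script run.
    return parser.parse_args(argv)
```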
Knowledge check
Knowledge Check
Why must local images be converted to data URLs before sending them?
Code Check
In the content list, what would you add to send TWO images plus a question?
Recap — what you just learned
- Vision uses the same responses.create() call; only the content shape changes
- content becomes a LIST with input_text and input_image items
- Local files need to be converted to base64 data URLs before sending
- You can mix text and multiple images in a single message (within the model's input limits)
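The last point can be sketched as a small builder that puts one input_text item first and then one input_image item per image. The function name build_content is hypothetical, and the data URLs below are short placeholders rather than real images.

```python
def build_content(question: str, image_urls: list[str]) -> list[dict]:
    """Build a multi-part content list: one text item, then one item per image.

    Hypothetical helper for illustration; item shapes follow the excerpt above.
    """
    content = [{"type": "input_text", "text": question}]
    for url in image_urls:
        content.append({"type": "input_image", "image_url": url})
    return content


# Placeholder data URLs (not real images), standing in for to_data_url(...) results.
first_url = "data:image/png;base64,AAAA"
second_url = "data:image/png;base64,BBBB"

message = {
    "role": "user",
    "content": build_content("What differs between these two images?", [first_url, second_url]),
}
```

The resulting message has the same shape as the single-image example, so it can be passed in the input list of the same responses.create() call.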