What is a Large Language Model?
- Define LLMs in one sentence
- Understand how LLMs turn text into tokens and predict the next one
- See the scale numbers that separate LLMs from smaller models
A Large Language Model is a neural network trained to predict the next word (more precisely, the next token) in a sequence of text. That sounds simple — and it is — but when you scale it to trillions of examples and hundreds of billions of parameters, "predict the next word" becomes "write essays, answer questions, debug code, translate languages, and reason about problems."
Step 1 — Text becomes tokens
LLMs do not see letters or words. They see tokens — sub-word chunks produced by a tokenizer. Common words often map to a single token, while rarer or longer words get split into several pieces.
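As a toy illustration of how a sentence might be chunked, here is a greedy longest-match tokenizer over a hand-picked vocabulary. Real tokenizers (BPE, SentencePiece) learn their vocabularies from data; this vocabulary is invented purely for this example, so the exact splits are illustrative, not what any production model produces.

```python
# Invented toy vocabulary — real tokenizers learn ~50K-100K entries from data.
VOCAB = ["Understand", "ing", " LL", "Ms", " is", " fun", "!"]

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization against VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i.
        match = max((v for v in VOCAB if text.startswith(v, i)),
                    key=len, default=None)
        if match is None:
            match = text[i]  # fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("Understanding LLMs is fun!"))
# -> ['Understand', 'ing', ' LL', 'Ms', ' is', ' fun', '!']
```

Note that "LLMs" lands awkwardly across two tokens — real tokenizers split unusual strings (acronyms, code, rare words) in similarly non-obvious ways.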
On average 1 token ≈ 0.75 English words. Tokens matter because APIs charge per token and every model has a maximum token window (the "context length").
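The 0.75 words-per-token rule of thumb makes quick budget math easy. A minimal sketch, where the per-1K-token price is a made-up example rate, not any real API's pricing:

```python
# Back-of-envelope token math using the ~0.75 English-words-per-token rule.
def estimate_tokens(word_count: int) -> int:
    return round(word_count / 0.75)

def estimate_cost(word_count: int, price_per_1k_tokens: float) -> float:
    # price_per_1k_tokens is a hypothetical rate for illustration only.
    return estimate_tokens(word_count) / 1000 * price_per_1k_tokens

words = 3000  # e.g., a ~3,000-word document
print(estimate_tokens(words))      # -> 4000 tokens
print(estimate_cost(words, 0.01))  # -> 0.04 (dollars, at the example rate)
```

The same arithmetic tells you whether a document fits in a model's context window before you send it.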
Step 2 — The model predicts the next token
Given the tokens so far, the model assigns a probability to every possible next token and picks one. Then it takes the updated sequence and predicts the token after that. It loops until it decides to stop (or hits your max_tokens limit).
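The loop above can be sketched in a few lines. The "model" here is a hard-coded toy lookup table that assigns a probability to each candidate next token; a real LLM computes these probabilities with a neural network over its entire vocabulary.

```python
# Sketch of the autoregressive generation loop with a hard-coded toy "model".
def toy_model(tokens):
    """Return a probability for every candidate next token (invented numbers)."""
    table = {
        (): {"The": 0.9, "<stop>": 0.1},
        ("The",): {" cat": 0.6, " dog": 0.4},
        ("The", " cat"): {" sat": 0.7, "<stop>": 0.3},
        ("The", " cat", " sat"): {"<stop>": 1.0},
    }
    return table.get(tuple(tokens), {"<stop>": 1.0})

def generate(max_tokens=10):
    tokens = []
    while len(tokens) < max_tokens:
        probs = toy_model(tokens)
        # Greedy decoding: always take the most probable token.
        # (Real systems usually *sample*, which is why outputs vary.)
        next_token = max(probs, key=probs.get)
        if next_token == "<stop>":
            break
        tokens.append(next_token)
    return "".join(tokens)

print(generate())  # -> "The cat sat"
```

Swapping the `max(...)` for a weighted random draw turns this into sampling, which is the subject of the temperature point below.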
Step 3 — Scale is what makes it magical
The prediction mechanism has existed for decades. What changed is the scale of training data, model size, and compute: modern LLMs are trained on trillions of tokens, have tens to hundreds of billions of parameters, and consume compute budgets measured in millions of GPU-hours — orders of magnitude beyond earlier language models.
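To make the scale concrete, here is the arithmetic for just storing the weights, using GPT-3's published size of 175 billion parameters and a common 16-bit inference format (other sizes and formats are analogous):

```python
# Rough arithmetic: memory required just to *store* the weights.
# 175B is GPT-3's published parameter count; fp16/bf16 uses 2 bytes each.
# Activations, KV cache, and optimizer state all add more on top.
params = 175e9
bytes_per_param = 2  # fp16 / bf16
gigabytes = params * bytes_per_param / 1e9
print(f"{gigabytes:.0f} GB")  # -> 350 GB of weights alone
```

That is why serving a model of this size requires sharding it across many accelerators — it does not fit on any single GPU.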
What LLMs are NOT
- Not a database. They cannot reliably recall facts or citations.
- Not deterministic. The same prompt can give different answers; setting temperature to 0 makes outputs much more repeatable, though minor nondeterminism (e.g. from floating-point and batching effects) can remain.
- Not reasoning engines. They pattern-match on reasoning, which works most of the time — but can fail in surprising ways.
- Not aware of anything after their training cutoff. For fresh information you have to feed it to them in the prompt (RAG) or give them a search tool.
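The temperature knob mentioned above controls how the model's raw scores (logits) become a probability distribution. A minimal sketch of temperature-scaled softmax, with invented logits for three candidate tokens: low temperature sharpens the distribution toward the top token (near-deterministic), high temperature flattens it (more varied output).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities; temperature rescales the logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.2 nearly all probability mass lands on the first token; at 2.0 the three tokens are much closer to equally likely, so sampling produces visibly different outputs from run to run.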