Session 0 · 01 · 10 min

What is a Large Language Model?

What you'll learn
  • Define LLMs in one sentence
  • Understand how LLMs turn text into tokens and predict the next one
  • See the scale numbers that separate LLMs from smaller models

A Large Language Model is a neural network trained to predict the next word (more precisely, the next token) in a sequence of text. That sounds simple — and it is — but when you scale it to trillions of training tokens and hundreds of billions of parameters, "predict the next word" becomes "write essays, answer questions, debug code, translate languages, and reason about problems."

One-sentence definition
An LLM is a statistical text-prediction engine that got so good at completion that it started to look like understanding.

Step 1 — Text becomes tokens

LLMs do not see letters or words. They see tokens — sub-word chunks produced by a tokenizer. Here is how the sentence "Understanding LLMs is fun!" gets split:

"Understanding LLMs is fun!" splits into 7 tokens (in the original visual, each coloured chunk is one token).

On average 1 token ≈ 0.75 English words. Tokens matter because APIs charge per token and every model has a maximum token window (the "context length").
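The 0.75-words-per-token rule of thumb is easy to turn into a quick estimator. This is a minimal sketch: the ratio is only an average, the per-token price is a made-up placeholder (real API prices vary by model), and real tokenizers can give different counts — the example sentence above is 7 actual tokens, but this estimate gives 5.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from the ~0.75 words-per-token rule of thumb."""
    words = len(text.split())
    return round(words / words_per_token)

def estimate_cost(n_tokens: int, usd_per_1k_tokens: float = 0.002) -> float:
    """Illustrative cost estimate; the price here is a placeholder, not a real rate."""
    return n_tokens / 1000 * usd_per_1k_tokens

prompt = "Understanding LLMs is fun!"
n = estimate_tokens(prompt)   # 4 words / 0.75 ≈ 5 (a real tokenizer reports 7)
print(n, estimate_cost(n))
```

For exact counts you would use the model's own tokenizer; estimates like this are only good enough for rough budgeting.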

Step 2 — The model predicts the next token

Given the tokens so far, the model assigns a probability to every possible next token and picks one. Then it takes the updated sequence and predicts the token after that. It loops until it decides to stop (or hits your max_tokens limit).

Single generation step:
  1. Input text: your prompt
  2. Tokenize: split into tokens
  3. Predict: next-token probabilities
  4. Sample: pick one token
  5. Append & loop: repeat until a stop token (or your max_tokens limit)
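The loop above can be sketched in a few lines of Python. The `next_token_probs` function here is a toy stand-in: a real LLM would return a probability for every token in its vocabulary, computed by the network, not a hard-coded table.

```python
import random

def next_token_probs(tokens):
    # Stand-in for a real model: returns {token: probability} for the
    # next position. A real LLM computes this over its whole vocabulary.
    table = {
        0: {"Hello": 0.9, "Hi": 0.1},
        1: {",": 0.8, "!": 0.2},
        2: {" world": 0.7, " there": 0.3},
    }
    return table.get(len(tokens), {"<stop>": 1.0})

def generate(max_tokens=10):
    tokens = []
    while len(tokens) < max_tokens:      # respect the max_tokens limit
        probs = next_token_probs(tokens)            # 3. Predict
        choices, weights = zip(*probs.items())
        tok = random.choices(choices, weights)[0]   # 4. Sample
        if tok == "<stop>":                         # model decides to stop
            break
        tokens.append(tok)                          # 5. Append & loop
    return "".join(tokens)

print(generate())
```

The structure is the whole point: generation is just this predict-sample-append loop run over and over, one token at a time.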

Step 3 — Scale is what makes it magical

The prediction mechanism has existed for decades. What changed is the scale of training data, model size, and compute. Here are the numbers that separate a "language model" from a large language model:

  • 100B+ parameters: the internal weights the model learned
  • 15T training tokens: the text shown during training
  • 200k context window: the tokens the model can read at once

Parameters = what the model learned
Think of parameters as knobs the model tunes during training. A 100-billion-parameter model has 100 billion knobs — enough to encode a huge amount of world knowledge. You never tune them yourself; you just send prompts and read replies.
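The parameter count translates directly into memory. A quick back-of-the-envelope calculation, assuming 2 bytes per parameter (fp16/bf16 storage, a common but not universal choice):

```python
params = 100e9          # 100 billion parameters
bytes_per_param = 2     # fp16 / bf16 weights (assumption; fp32 would double this)
gb = params * bytes_per_param / 1e9
print(f"{gb:.0f} GB just to hold the weights")  # 200 GB
```

That is weights alone, before any activations or caching — one reason these models run in data centres rather than on your laptop.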

What LLMs are NOT

  • Not a database. They cannot reliably recall facts or citations.
  • Not deterministic. The same prompt can give different answers unless you set temperature to 0 (and even then, some serving stacks are not perfectly reproducible).
  • Not reasoning engines. They pattern-match on reasoning, which works most of the time — but can fail in surprising ways.
  • Not aware of anything after their training cutoff. For fresh information you have to feed it to them in the prompt (RAG) or give them a search tool.
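The temperature point deserves a quick illustration. Temperature divides the model's raw scores (logits) before they are turned into probabilities: low temperature sharpens the distribution toward the top token, high temperature flattens it. The logits below are toy numbers, not from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                  # toy next-token scores
print(softmax_with_temperature(logits, 1.0))   # moderately peaked
print(softmax_with_temperature(logits, 0.1))   # near-deterministic: top token ~1.0
print(softmax_with_temperature(logits, 2.0))   # flatter: more randomness
```

This is why temperature 0 behaves (almost) deterministically: the top token's probability is pushed so close to 1 that sampling nearly always picks it.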

You are ready for the next topic
Now that you know what an LLM is, the next page shows exactly how it differs from the "classical" machine learning you may have seen before.