What is a Large Language Model?
- Define LLMs in one sentence
- Understand how LLMs turn text into tokens and predict the next one
- See the scale numbers that separate LLMs from smaller models
A Large Language Model is a neural network trained to predict the next word (more precisely, the next token) in a sequence of text. That sounds simple — and it is — but when you scale it to trillions of examples and hundreds of billions of parameters, "predict the next word" becomes "write essays, answer questions, debug code, translate languages, and reason about problems."
Step 1 — Text becomes tokens
LLMs do not see letters or words. They see tokens — sub-word chunks produced by a tokenizer. Common words often map to a single token, while rarer or longer words get split into several pieces.
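As a toy illustration of how a sentence might be chunked, here is a greedy longest-match tokenizer over a hand-picked vocabulary. Real tokenizers (BPE, SentencePiece) learn their vocabularies from data; this vocabulary is invented purely for this example, so the exact splits are illustrative, not what any production model produces.

```python
# Invented toy vocabulary — real tokenizers learn ~50K-100K entries from data.
VOCAB = ["Understand", "ing", " LL", "Ms", " is", " fun", "!"]

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenization against VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i.
        match = max((v for v in VOCAB if text.startswith(v, i)),
                    key=len, default=None)
        if match is None:
            match = text[i]  # fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("Understanding LLMs is fun!"))
# -> ['Understand', 'ing', ' LL', 'Ms', ' is', ' fun', '!']
```

Note that "LLMs" lands awkwardly across two tokens — real tokenizers split unusual strings (acronyms, code, rare words) in similarly non-obvious ways.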
On average 1 token ≈ 0.75 English words. Tokens matter because APIs charge per token and every model has a maximum token window (the "context length").
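The 0.75 words-per-token rule of thumb makes quick budget math easy. A minimal sketch, where the per-1K-token price is a made-up example rate, not any real API's pricing:

```python
# Back-of-envelope token math using the ~0.75 English-words-per-token rule.
def estimate_tokens(word_count: int) -> int:
    return round(word_count / 0.75)

def estimate_cost(word_count: int, price_per_1k_tokens: float) -> float:
    # price_per_1k_tokens is a hypothetical rate for illustration only.
    return estimate_tokens(word_count) / 1000 * price_per_1k_tokens

words = 3000  # e.g., a ~3,000-word document
print(estimate_tokens(words))      # -> 4000 tokens
print(estimate_cost(words, 0.01))  # -> 0.04 (dollars, at the example rate)
```

The same arithmetic tells you whether a document fits in a model's context window before you send it.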
Step 2 — The model predicts the next token
Given the tokens so far, the model assigns a probability to every possible next token and picks one. Then it takes the updated sequence and predicts the token after that. It loops until it decides to stop (or hits your max_tokens limit).
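The loop above can be sketched in a few lines. The "model" here is a hard-coded toy lookup table that assigns a probability to each candidate next token; a real LLM computes these probabilities with a neural network over its entire vocabulary.

```python
# Sketch of the autoregressive generation loop with a hard-coded toy "model".
def toy_model(tokens):
    """Return a probability for every candidate next token (invented numbers)."""
    table = {
        (): {"The": 0.9, "<stop>": 0.1},
        ("The",): {" cat": 0.6, " dog": 0.4},
        ("The", " cat"): {" sat": 0.7, "<stop>": 0.3},
        ("The", " cat", " sat"): {"<stop>": 1.0},
    }
    return table.get(tuple(tokens), {"<stop>": 1.0})

def generate(max_tokens=10):
    tokens = []
    while len(tokens) < max_tokens:
        probs = toy_model(tokens)
        # Greedy decoding: always take the most probable token.
        # (Real systems usually *sample*, which is why outputs vary.)
        next_token = max(probs, key=probs.get)
        if next_token == "<stop>":
            break
        tokens.append(next_token)
    return "".join(tokens)

print(generate())  # -> "The cat sat"
```

Swapping the `max(...)` for a weighted random draw turns this into sampling, which is the subject of the temperature point below.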
Step 3 — Scale is what makes it magical
The prediction mechanism has existed for decades. What changed is the scale of training data, model size, and compute: modern LLMs are trained on trillions of tokens, have tens to hundreds of billions of parameters, and consume compute budgets measured in millions of GPU-hours — orders of magnitude beyond earlier language models.
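To make the scale concrete, here is the arithmetic for just storing the weights, using GPT-3's published size of 175 billion parameters and a common 16-bit inference format (other sizes and formats are analogous):

```python
# Rough arithmetic: memory required just to *store* the weights.
# 175B is GPT-3's published parameter count; fp16/bf16 uses 2 bytes each.
# Activations, KV cache, and optimizer state all add more on top.
params = 175e9
bytes_per_param = 2  # fp16 / bf16
gigabytes = params * bytes_per_param / 1e9
print(f"{gigabytes:.0f} GB")  # -> 350 GB of weights alone
```

That is why serving a model of this size requires sharding it across many accelerators — it does not fit on any single GPU.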
What LLMs are NOT
- Not a database. They cannot reliably recall facts or citations.
- Not deterministic. The same prompt can give different answers; setting temperature to 0 makes outputs much more repeatable, though minor nondeterminism (e.g. from floating-point and batching effects) can remain.
- Not reasoning engines. They pattern-match on reasoning, which works most of the time — but can fail in surprising ways.
- Not aware of anything after their training cutoff. For fresh information you have to feed it to them in the prompt (RAG) or give them a search tool.
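The temperature knob mentioned above controls how the model's raw scores (logits) become a probability distribution. A minimal sketch of temperature-scaled softmax, with invented logits for three candidate tokens: low temperature sharpens the distribution toward the top token (near-deterministic), high temperature flattens it (more varied output).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities; temperature rescales the logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # invented scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.2 nearly all probability mass lands on the first token; at 2.0 the three tokens are much closer to equally likely, so sampling produces visibly different outputs from run to run.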