The short version
When you talk to ChatGPT, Claude, or Gemini, you're talking to an LLM. These systems have read enormous amounts of text (books, websites, code, conversations) and learned patterns in how language works. They use those patterns to generate responses that are coherent, contextual, and often surprisingly useful.
They don't "understand" the way humans do. They predict the most likely next token, one prediction at a time, thousands of times per response, each one involving billions of arithmetic operations. But the result is powerful enough to write code, explain concepts, translate languages, and reason through problems.
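"Predict the next word from patterns" sounds abstract, so here is a deliberately tiny sketch of the idea: count which word follows which in a toy corpus, then predict the most common follower. Real LLMs learn these statistics with neural networks over tokens rather than literal word counts, but the core objective is the same. The corpus and function names here are invented for illustration.

```python
from collections import Counter

# Toy corpus, purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word (a bigram model).
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, Counter())[nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" — it follows "the" twice, vs. once each for "mat" and "fish"
```

An LLM does something analogous, except the "counts" are encoded in billions of learned parameters, and the prediction conditions on the entire context so far, not just the previous word.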
How it works
The key concepts:
- Training: the model reads billions of words and adjusts millions (or billions) of internal parameters to predict what comes next. This takes weeks on thousands of specialised chips.
- Tokens: LLMs don't read words; they read tokens (roughly ¾ of a word on average). Depending on the tokenizer, "understanding" might be split into pieces like "under", "stand", and "ing".
- Context window: how much text the model can consider at once. Early models handled ~4,000 tokens. Modern ones handle 100,000 to 1,000,000+.
- Inference: when you send a prompt and get a response, that's inference. The model generates one token at a time, each informed by everything before it.
- Temperature: a setting that controls randomness. Low temperature = more predictable responses. High temperature = more creative, more varied.
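The temperature setting above has a precise meaning: the model's raw scores for each candidate token are divided by the temperature before being converted into probabilities. A minimal sketch, using a hypothetical set of scores for three candidate tokens:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw model scores (logits) into probabilities.
    Dividing by temperature first sharpens the distribution (<1)
    or flattens it (>1)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature):
    """Pick one token index according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return random.choices(range(len(probs)), weights=probs)[0]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.2))  # top token takes almost all the probability
print(softmax_with_temperature(logits, 2.0))  # probability spreads across all three
```

At low temperature the top-scoring token is chosen almost every time (predictable output); at high temperature lower-scoring tokens get real chances (varied, sometimes erratic output).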
The "large" in LLM refers to the number of parameters, the internal dials the model adjusts during training. GPT-4 and Claude are widely estimated to have hundreds of billions (their makers haven't published exact figures). More parameters generally means more capability, but also more cost to run.
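The cost of "more parameters" is easy to see with back-of-the-envelope arithmetic. Using GPT-3's published size of 175 billion parameters and assuming 2 bytes per parameter (16-bit precision, a common choice):

```python
# Memory needed just to store the weights of a 175-billion-parameter model
# at 2 bytes per parameter (fp16) — before any actual computation happens.
params = 175e9
bytes_per_param = 2
gigabytes = params * bytes_per_param / 1e9
print(f"{gigabytes:.0f} GB")  # 350 GB of weights
```

That's far more than any single consumer GPU holds, which is why large models run sharded across many specialised chips, and why every extra parameter adds to serving cost.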
Why it matters
LLMs are the foundation of the current AI wave. They power chatbots, coding assistants, search engines, and content tools. Understanding what they can and can't do (that they're pattern matchers, not thinkers) helps you use them more effectively and spot their limitations.