What are tokens?

A token is a chunk of text, roughly three-quarters of a word, that an LLM processes as a single unit.

The short version

LLMs don't read words. They read tokens. A token might be a whole word ("hello"), part of a word ("un" + "believe" + "able"), or a single character. On average, one token is about 0.75 words, or roughly 4 characters in English.

This matters because tokens determine three things: how much a request costs, how long a response takes, and how much text the model can handle at once.

How it works

When you send a prompt to an LLM, the text is split into tokens before the model processes it. The sentence "What is an API?" becomes roughly 5 tokens. A 1,000-word article is about 1,300 tokens.

Key concepts:

  • Input tokens: the text you send (your prompt, system instructions, any context). You pay for these.
  • Output tokens: the text the model generates in response. You pay for these too, usually at a higher rate.
  • Context window: the maximum number of tokens the model can handle in a single conversation (input + output combined). Claude's context window is up to 200,000 tokens. GPT-4 Turbo handles 128,000.
  • Token limits on output: even within a large context window, models have a maximum output length per response. This is separate from the context window.
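The last two limits interact: a request must respect both the shared context window and the separate output cap. A minimal sketch of that check, using illustrative limit values rather than any specific model's real numbers:

```python
# Illustrative limits only -- not tied to a real model.
CONTEXT_WINDOW = 200_000   # max tokens, input + output combined
MAX_OUTPUT = 4_096         # per-response output cap (a separate limit)

def fits(input_tokens: int, requested_output: int) -> bool:
    """Return True if a request respects both limits."""
    if requested_output > MAX_OUTPUT:
        return False  # output cap applies even with window space to spare
    return input_tokens + requested_output <= CONTEXT_WINDOW

print(fits(2_000, 500))      # a typical small request fits
print(fits(199_000, 2_000))  # overflows the shared window
```

Note that a long prompt can fail even when the requested output is tiny, because input and output draw from the same window.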

Pricing is per token. For example, Claude Sonnet might cost $3 per million input tokens and $15 per million output tokens. A typical API call might use 2,000 input tokens and 500 output tokens, costing about a cent and a half.
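That cost estimate is straightforward arithmetic; here it is spelled out, using the illustrative prices from the example above:

```python
# Illustrative per-token prices ($3 / $15 per million tokens, from the text).
INPUT_PRICE = 3.00 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # dollars per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call in dollars."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

print(f"${call_cost(2_000, 500):.4f}")  # prints $0.0135
```

Output tokens dominate the bill in generation-heavy workloads, which is why trimming verbose responses often saves more than trimming prompts.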

You can estimate token counts before sending a request. Most API providers offer a tokeniser tool, and a rough rule of thumb works for planning: 1 token ≈ 4 characters ≈ 0.75 words.
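That rule of thumb is easy to turn into a planning-grade estimator. This is a rough heuristic only; a provider's tokeniser gives exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the rule of thumb:
    1 token ~ 4 characters ~ 0.75 words."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    # Average the two heuristics to smooth out short or punctuation-heavy text.
    return round((by_chars + by_words) / 2)

print(estimate_tokens("What is an API?"))  # prints 5
```

For English prose this lands close enough for budgeting; code, non-English text, and heavy punctuation tokenise differently, so treat the estimate as loose.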

Why it matters

Understanding tokens helps you estimate costs, design prompts that fit within limits, and debug why a response got cut off. If you're building anything that calls an LLM API, token counts are the unit of measurement for everything: pricing, speed, and capacity.
