A token is a unit of text that a language model processes, such as a word, part of a word, punctuation mark, or special symbol.
1. What it is
- Language models do not usually read raw text character by character or word by word the way humans do.
- Instead, text is broken into tokens.
- A token might be:
  - a full word
  - part of a word
  - punctuation
  - a special marker
In simple terms:
- humans see text
- the model sees tokens
2. What problem tokens solve
Raw text needs to be split into manageable units before a language model can process it.
Tokenization solves this by turning text into smaller pieces that balance:
- flexibility
- storage efficiency
- coverage across many words and languages
This helps with:
- handling rare or unseen words
- representing punctuation and numbers cleanly
- making text manageable for the model
In practice, this means one human word is not always one model token.
3. Where you see it
Tokens matter in:
- prompt length limits
- API pricing based on token usage
- chunking for retrieval systems
- model input and output processing
- context window limits in chatbots
How they show up in LLM behavior:
- the model predicts the next token, not literally the next human word
- longer prompts use more of the context window
- unusual words may split into several smaller pieces
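A common rule of thumb is that one token is roughly four characters of English text, which is why prompt length is estimated rather than counted exactly without a tokenizer. The sketch below uses that rule of thumb and an invented, deliberately tiny context window; real limits and real token counts come from the model's own tokenizer.

```python
# Rough sketch of why token counts matter for context limits.
# The 4-characters-per-token figure is a common rule of thumb for
# English, not an exact rule, and CONTEXT_WINDOW is invented here.

CONTEXT_WINDOW = 8  # tiny limit, for illustration only

def rough_token_count(text):
    """Estimate token count as ~1 token per 4 characters."""
    return max(1, len(text) // 4)

prompt = "Summarize this document."
count = rough_token_count(prompt)
print(count)                      # estimated tokens in the prompt
print(count <= CONTEXT_WINDOW)    # does it fit in the toy window?
```

In practice you would use the actual tokenizer shipped with the model to count tokens before hitting a length limit or estimating API cost.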
4. How it works internally
Intuition version
- Start with raw text.
- Break it into token pieces.
- Convert each token into an integer id.
- Feed those ids into the model.
So tokenization is basically:
text → token pieces → token ids
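The three-step pipeline above can be sketched in a few lines. The vocabulary and the whitespace-plus-punctuation splitting rule here are invented for demonstration; real tokenizers use learned subword vocabularies with tens of thousands of entries.

```python
# Toy illustration of: text -> token pieces -> token ids.
# TOY_VOCAB and the splitting rule are made up for this sketch.

TOY_VOCAB = {"the": 0, "cat": 1, "sat": 2, ".": 3, "<unk>": 4}

def toy_tokenize(text):
    """Split on whitespace, peeling trailing punctuation into its own piece."""
    pieces = []
    for chunk in text.lower().split():
        if chunk and chunk[-1] in ".,!?":
            pieces.extend([chunk[:-1], chunk[-1]])
        else:
            pieces.append(chunk)
    return pieces

def to_ids(pieces):
    """Map each piece to its integer id, falling back to <unk>."""
    return [TOY_VOCAB.get(p, TOY_VOCAB["<unk>"]) for p in pieces]

pieces = toy_tokenize("The cat sat.")
print(pieces)          # ['the', 'cat', 'sat', '.']
print(to_ids(pieces))  # [0, 1, 2, 3]
```

The integer ids, not the text pieces, are what the model actually consumes.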
Tokenizer version
Modern models use tokenization algorithms that try to reuse common subword pieces.
This allows the system to handle:
- common words efficiently
- rare words by splitting them into smaller parts
- punctuation and symbols
- multiple languages
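The subword reuse described above can be sketched as greedy longest-match splitting, similar in spirit to WordPiece-style tokenizers. The vocabulary below is hand-picked for illustration; real systems learn their subword inventory from large text corpora.

```python
# Sketch of greedy longest-match subword splitting.
# SUBWORDS is an invented mini-vocabulary, not a real model's.

SUBWORDS = {"un", "believ", "able", "play", "ing", "ed", "token"}

def subword_split(word):
    """Repeatedly take the longest vocabulary match; fall back to characters."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORDS:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No subword matches here: emit a single character and move on.
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_split("unbelievable"))  # ['un', 'believ', 'able']
print(subword_split("playing"))       # ['play', 'ing']
```

The character fallback is what lets a subword tokenizer cover rare or unseen words: nothing is ever out of vocabulary, it just splits into more pieces.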
What happens next
After tokenization:
- each token gets an integer id
- each id is mapped to an embedding vector
- the transformer processes those embeddings
- the model predicts the next token id
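The first two of those steps can be sketched as a table lookup. The vectors below are made-up two-dimensional numbers; real models learn embedding vectors with hundreds or thousands of dimensions, and the transformer that turns them into next-token predictions is omitted entirely here.

```python
# Minimal sketch of the id -> embedding step.
# EMBEDDINGS holds invented values; real embeddings are learned in training.

EMBEDDINGS = [
    [0.1, 0.3],  # vector for token id 0
    [0.7, 0.2],  # vector for token id 1
    [0.4, 0.9],  # vector for token id 2
]

def embed(ids):
    """Each token id indexes one row of the embedding table."""
    return [EMBEDDINGS[i] for i in ids]

print(embed([2, 0, 1]))  # [[0.4, 0.9], [0.1, 0.3], [0.7, 0.2]]
```

Everything downstream of this lookup (attention, feed-forward layers, the output softmax) operates on these vectors, never on the raw text.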
Concrete example
The word “unbelievable” might be split into multiple tokens rather than treated as one whole word.
So when people say a model predicts the “next word,” that is a useful simplification, but more precisely it predicts the next token.
5. Background
- Modern language models use tokenization algorithms to split text into pieces that balance flexibility and efficiency.