A token is a unit of text that a language model processes, such as a word, part of a word, punctuation mark, or special symbol.
1. What it is
- Language models do not usually read raw text character by character or word by word the way humans do.
- Instead, text is broken into tokens.
- A token might be:
  - a full word
  - part of a word
  - punctuation
  - a special marker
In simple terms:
- humans see text
- the model sees tokens
2. What problem tokens solve
Raw text needs to be split into manageable units before a language model can process it.
Tokenization solves this by turning text into smaller pieces that balance:
- flexibility
- storage efficiency
- coverage across many words and languages
This helps with:
- handling rare or unseen words
- representing punctuation and numbers cleanly
- making text manageable for the model
In practice, this means one human word is not always one model token.
3. Where you see it
Tokens matter in:
- prompt length limits
- API pricing based on token usage
- chunking for retrieval systems
- model input and output processing
- context window limits in chatbots
How they show up in LLM behavior:
- the model predicts the next token, not literally the next human word
- longer prompts use more of the context window
- unusual words may split into several smaller pieces
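A common rule of thumb is that one token is roughly four characters of English text, which is why prompt length is estimated rather than counted exactly without a tokenizer. The sketch below uses that rule of thumb and an invented, deliberately tiny context window; real limits and real token counts come from the model's own tokenizer.

```python
# Rough sketch of why token counts matter for context limits.
# The 4-characters-per-token figure is a common rule of thumb for
# English, not an exact rule, and CONTEXT_WINDOW is invented here.

CONTEXT_WINDOW = 8  # tiny limit, for illustration only

def rough_token_count(text):
    """Estimate token count as ~1 token per 4 characters."""
    return max(1, len(text) // 4)

prompt = "Summarize this document."
count = rough_token_count(prompt)
print(count)                      # estimated tokens in the prompt
print(count <= CONTEXT_WINDOW)    # does it fit in the toy window?
```

In practice you would use the actual tokenizer shipped with the model to count tokens before hitting a length limit or estimating API cost.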
4. How it works internally
Intuition version
- Start with raw text.
- Break it into token pieces.
- Convert each token into an integer id.
- Feed those ids into the model.
So tokenization is basically:
text → token pieces → token ids
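The three-step pipeline above can be sketched in a few lines. The vocabulary and the whitespace-plus-punctuation splitting rule here are invented for demonstration; real tokenizers use learned subword vocabularies with tens of thousands of entries.

```python
# Toy illustration of: text -> token pieces -> token ids.
# TOY_VOCAB and the splitting rule are made up for this sketch.

TOY_VOCAB = {"the": 0, "cat": 1, "sat": 2, ".": 3, "<unk>": 4}

def toy_tokenize(text):
    """Split on whitespace, peeling trailing punctuation into its own piece."""
    pieces = []
    for chunk in text.lower().split():
        if chunk and chunk[-1] in ".,!?":
            pieces.extend([chunk[:-1], chunk[-1]])
        else:
            pieces.append(chunk)
    return pieces

def to_ids(pieces):
    """Map each piece to its integer id, falling back to <unk>."""
    return [TOY_VOCAB.get(p, TOY_VOCAB["<unk>"]) for p in pieces]

pieces = toy_tokenize("The cat sat.")
print(pieces)          # ['the', 'cat', 'sat', '.']
print(to_ids(pieces))  # [0, 1, 2, 3]
```

The integer ids, not the text pieces, are what the model actually consumes.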
Tokenizer version
Modern models use tokenization algorithms that try to reuse common subword pieces.
This allows the system to handle:
- common words efficiently
- rare words by splitting them into smaller parts
- punctuation and symbols
- multiple languages
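The subword reuse described above can be sketched as greedy longest-match splitting, similar in spirit to WordPiece-style tokenizers. The vocabulary below is hand-picked for illustration; real systems learn their subword inventory from large text corpora.

```python
# Sketch of greedy longest-match subword splitting.
# SUBWORDS is an invented mini-vocabulary, not a real model's.

SUBWORDS = {"un", "believ", "able", "play", "ing", "ed", "token"}

def subword_split(word):
    """Repeatedly take the longest vocabulary match; fall back to characters."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in SUBWORDS:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No subword matches here: emit a single character and move on.
            pieces.append(word[i])
            i += 1
    return pieces

print(subword_split("unbelievable"))  # ['un', 'believ', 'able']
print(subword_split("playing"))       # ['play', 'ing']
```

The character fallback is what lets a subword tokenizer cover rare or unseen words: nothing is ever out of vocabulary, it just splits into more pieces.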
What happens next
After tokenization:
- each token gets an integer id
- each id is mapped to an embedding vector
- the transformer processes those embeddings
- the model predicts the next token id
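The first two of those steps can be sketched as a table lookup. The vectors below are made-up two-dimensional numbers; real models learn embedding vectors with hundreds or thousands of dimensions, and the transformer that turns them into next-token predictions is omitted entirely here.

```python
# Minimal sketch of the id -> embedding step.
# EMBEDDINGS holds invented values; real embeddings are learned in training.

EMBEDDINGS = [
    [0.1, 0.3],  # vector for token id 0
    [0.7, 0.2],  # vector for token id 1
    [0.4, 0.9],  # vector for token id 2
]

def embed(ids):
    """Each token id indexes one row of the embedding table."""
    return [EMBEDDINGS[i] for i in ids]

print(embed([2, 0, 1]))  # [[0.4, 0.9], [0.1, 0.3], [0.7, 0.2]]
```

Everything downstream of this lookup (attention, feed-forward layers, the output softmax) operates on these vectors, never on the raw text.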
Concrete example
The word “unbelievable” might be split into multiple tokens rather than treated as one whole word.
So when people say a model predicts the “next word,” that is a useful simplification, but more precisely it predicts the next token.
5. Background
- Modern language models use tokenization algorithms to split text into pieces that balance flexibility and efficiency.