A token is a unit of text that a language model processes, such as a word, part of a word, punctuation mark, or special symbol.

1. What it is

  • Language models do not usually read raw text character by character or word by word in the ordinary human sense.
  • Instead, text is broken into tokens.
  • A token might be:
    • a full word
    • part of a word
    • punctuation
    • a special marker

In simple terms:

  • humans see text
  • the model sees tokens
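To make this concrete, here is a minimal, hypothetical sketch in Python. It splits text into words and punctuation with a regular expression; real model tokenizers instead use learned subword vocabularies, so this is an illustration of the idea, not the actual algorithm:

```python
import re

def toy_tokenize(text):
    # Hypothetical toy tokenizer: words and punctuation become separate tokens.
    # Real model tokenizers use learned subword vocabularies instead.
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

Even this toy version shows the key point: the punctuation marks become their own tokens rather than sticking to the words.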

2. What problem tokens solve

Raw text needs to be split into manageable units before a language model can process it.

Tokenization solves this by turning text into smaller pieces that balance:

  • flexibility
  • storage efficiency
  • coverage across many words and languages

This helps with:

  • handling rare or unseen words
  • representing punctuation and numbers cleanly
  • making text manageable for the model

In practice, this means one human word is not always one model token.
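A quick comparison makes the mismatch visible. The token split below is invented for illustration; a real tokenizer would produce different pieces:

```python
text = "Tokenization handles unbelievable words"
words = text.split()
# Invented token split for illustration only; a real tokenizer's pieces differ.
tokens = ["Token", "ization", " handles", " un", "believ", "able", " words"]
print(len(words), len(tokens))  # 4 7
```

Four human words, seven model tokens: rare or long words tend to cost more than one token each.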

3. Where you see it

Tokens matter in:

  • prompt length limits
  • API pricing based on token usage
  • chunking for retrieval systems
  • model input and output processing
  • context window limits in chatbots

How they show up in LLM behavior:

  • the model predicts the next token, not literally the next human word
  • longer prompts use more of the context window
  • unusual words may split into several smaller pieces
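Because billing and context budgets are measured in tokens, back-of-the-envelope math is done over token counts. A minimal sketch, assuming a hypothetical per-1,000-token price (the rate below is invented, not any real provider's):

```python
def estimate_cost(num_tokens, price_per_1k):
    # Hypothetical pricing math: many APIs bill per 1,000 tokens.
    # The rate passed in below is invented, not a real provider's price.
    return num_tokens / 1000 * price_per_1k

print(estimate_cost(2500, 0.01))
```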

4. How it works internally

Intuition version

  • Start with raw text.
  • Break it into token pieces.
  • Convert each token into an integer id.
  • Feed those ids into the model.

So tokenization is basically:

text → token pieces → token ids
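The last step of that pipeline is a simple lookup. A sketch with a hypothetical four-entry vocabulary (real vocabularies hold tens of thousands of entries):

```python
# Hypothetical 4-entry vocabulary; real vocabularies hold tens of thousands of tokens.
vocab = {"the": 0, "cat": 1, "sat": 2, ".": 3}

def encode(tokens):
    # Map each token piece to its integer id.
    return [vocab[t] for t in tokens]

print(encode(["the", "cat", "sat", "."]))  # [0, 1, 2, 3]
```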

Tokenizer version

Modern models use tokenization algorithms that try to reuse common subword pieces.

This allows the system to handle:

  • common words efficiently
  • rare words by splitting them into smaller parts
  • punctuation and symbols
  • multiple languages
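One well-known family of such algorithms is byte-pair encoding (BPE), which builds its vocabulary by repeatedly merging the most frequent adjacent pair of symbols in a training corpus. A simplified sketch of just the pair-counting step, on a toy corpus:

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across a toy corpus; BPE would merge the winner
    # into a new vocabulary entry and repeat. Simplified sketch, not a full BPE.
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

corpus = [list("low"), list("lot"), list("log")]
print(most_frequent_pair(corpus))  # ('l', 'o')
```

Because "lo" appears in every corpus word, it is the first pair a BPE-style learner would merge into a reusable subword piece.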

What happens next

After tokenization:

  1. each token gets an integer id
  2. each id is mapped to an embedding vector
  3. the transformer processes those embeddings
  4. the model predicts the next token id

Concrete example

The word: unbelievable

might be split into multiple tokens rather than treated as one whole word.
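Many subword tokenizers resolve a word by greedy longest-match against their vocabulary. A toy sketch, with an invented vocabulary chosen so the split works out (a real tokenizer's vocabulary and resulting pieces would differ):

```python
def split_with_vocab(word, vocab):
    # Greedy longest-match split against a hypothetical subword vocabulary.
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # fall back to a single character
            i += 1
    return pieces

vocab = {"un", "believ", "able"}
print(split_with_vocab("unbelievable", vocab))  # ['un', 'believ', 'able']
```

One word in, three tokens out, with the fallback guaranteeing that even a word containing no known subwords can still be encoded character by character.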

So when people say a model predicts the “next word,” that is a useful simplification, but more precisely it predicts the next token.


5. Background

  • Modern language models use tokenization algorithms to split text into pieces that balance flexibility and efficiency.