Tokens and Tokenisation

A practical guide to understanding how AI models process text and why tokens matter for product teams.

What tokens are, how tokenisation affects AI behaviour and cost, and what designers and product teams need to know when building AI features.

22 May 2026 · 4 min read

What it is

Tokens are the units of text that AI language models work with. Rather than processing whole words, models break text down into smaller pieces called tokens before processing it.

A token is roughly equivalent to three or four characters of text. Common words like "the" or "is" are typically a single token. Longer or unusual words may be split into multiple tokens. Spaces, punctuation, and formatting also consume tokens.
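The rule of thumb above can be turned into a rough estimator. This is a minimal sketch, assuming about four characters per token for English text. The names `estimate_tokens` and `CHARS_PER_TOKEN` are illustrative, and the real tokeniser for any given model will produce different counts.

```python
import math

# Assumption: roughly four characters per token, per the rule of thumb above.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate based on character count alone."""
    if not text:
        return 0
    # Spaces and punctuation are included in len(text), since they also consume tokens.
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    sample = "Tokens are the units of text that AI language models work with."
    print(estimate_tokens(sample))
```

An estimate like this is good enough for early sizing conversations, but it should be replaced with real token counts before any cost figures are committed to.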

Tokenisation is the process of converting text into this sequence of tokens before it is passed to the model.

Every part of an interaction (the system prompt, the conversation history, the user message, and the model's response) is counted in tokens. This affects both the cost of using AI and the amount of content that fits within a context window.

Understanding tokens helps you reason about cost, context limits, and why AI sometimes handles unusual words or languages differently.

When to use it

Understand when token awareness is practically useful. It matters most when:

You are estimating or managing AI running costs

You are designing features that process large amounts of text

Context window limits are a constraint for your use case

You are working with multilingual content where token counts vary significantly

You are building products where response length needs to be controlled

It matters less when:

You are using AI for short, infrequent tasks where cost and limits are not a concern

Key takeaway

Tokens are the currency of AI usage. Understanding them helps you manage cost and design features that behave reliably at scale.

How it works

Understand the basic mechanism. Before processing any text, a language model passes it through a tokeniser, a component that splits the text into the token sequences the model understands.

Different models use different tokenisers, so the same text can produce different token counts depending on which model is used.
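To see why counts differ, it can help to compare two deliberately simple splitting schemes. Both functions below are toy stand-ins invented for illustration, not the tokenisers any real model uses, but they show how the same sentence yields different token counts under different rules.

```python
import re


def word_level_tokens(text: str) -> list[str]:
    """Toy scheme A: each word and each punctuation mark is one token."""
    return re.findall(r"\w+|[^\w\s]", text)


def chunk_level_tokens(text: str, size: int = 4) -> list[str]:
    """Toy scheme B: fixed-size character chunks, spaces included."""
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    sentence = "Tokenisation matters."
    print(len(word_level_tokens(sentence)), word_level_tokens(sentence))
    print(len(chunk_level_tokens(sentence)), chunk_level_tokens(sentence))
```

The practical consequence is that token counts measured against one model should not be reused as-is when estimating for another.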

When AI providers charge for usage, they typically charge per token, both for the input (everything sent to the model) and the output (the response generated). Understanding this helps you estimate and control costs.
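Per-token pricing makes a back-of-envelope cost model straightforward. The sketch below assumes prices are quoted per million tokens and that input and output are priced separately. The function names and every number in the example are placeholders, not any provider's actual rates.

```python
def estimate_request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one request, with input and output priced separately."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


def estimate_monthly_cost(cost_per_request: float, requests_per_month: int) -> float:
    """Scale a single-request cost up to a monthly volume."""
    return cost_per_request * requests_per_month


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    per_request = estimate_request_cost(
        input_tokens=2_000,
        output_tokens=500,
        input_price_per_million=3.0,
        output_price_per_million=15.0,
    )
    print(f"Per request: {per_request:.4f}")
    print(f"Per month:   {estimate_monthly_cost(per_request, 100_000):.2f}")
```

Running the numbers at the expected monthly volume, rather than per request, is what makes the cost visible.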

What this means for designers and product teams. Long system prompts, large documents, and verbose responses all cost more and consume more of the context window. Concise, well-structured prompts are both cheaper and more effective.

Multilingual content tokenises differently. Languages with more complex character sets, such as Chinese or Arabic, often produce more tokens per word than English, which affects both cost and context usage.
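One contributing factor, for tokenisers that build tokens up from UTF-8 bytes (an assumption, since not every tokeniser works this way), is that non-Latin scripts need more bytes per character. The sketch below counts bytes, not tokens, so treat it as a hint about why counts diverge rather than a measurement. The function name and sample words are illustrative.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes needed per character of text."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


if __name__ == "__main__":
    # The same greeting in three scripts.
    samples = {"English": "hello", "Arabic": "مرحبا", "Chinese": "你好"}
    for language, word in samples.items():
        print(language, utf8_bytes_per_char(word))
```

For real planning, measure token counts on representative content in each language you intend to support.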

What to look for

Focus on:

Prompt length — whether system prompts and instructions are as concise as they can be

Response length — whether the model is generating longer responses than necessary

Document size — how much context space large inputs consume

Cost estimation — whether token counts have been factored into unit economics

Multilingual variation — whether token counts have been considered for non-English content

Where it goes wrong

Token costs at small scale look trivial; at production scale, they are not. Most issues come from:

Verbose system prompts that consume large amounts of context unnecessarily

No cost modelling until after launch, when usage is already significant

Assuming token counts are consistent across languages

Not setting output length limits and receiving unexpectedly long responses

Underestimating how quickly token costs accumulate at scale

What you get from it

Understanding tokens gives you:

A foundation for estimating and managing AI running costs

A clearer picture of how context window limits work in practice

Better decisions about prompt length and response design

More informed conversations with engineers about cost and performance

Key takeaway

Tokens are not just a technical detail — they are a cost and design constraint that should be factored in from the start.

FAQ

Common questions

A few practical answers to the questions that usually come up around this topic.

What is a token in AI?

A token is the basic unit of text that an AI model processes. Rather than reading whole words, models work with tokens of roughly three or four characters each. A sentence of ten words might contain around twelve to fifteen tokens, depending on the words used.

Why do AI companies charge per token?

Because processing tokens is the main computational cost of running a language model. The more tokens in an input and output, the more computation required. Charging per token is a direct proxy for the cost of generating a response.

How many tokens is a typical page of text?

A standard page of English text contains roughly 500 words, which translates to approximately 600 to 750 tokens. This varies depending on the complexity of the vocabulary, punctuation, and formatting.
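The page figure above gives a quick way to check whether a document is likely to fit in a context window. This is a rough sketch that reuses the 600 to 750 tokens-per-page range from the answer. The function names, the context window size, and the output reserve in the example are assumptions for illustration.

```python
# From the estimate above: one page of English is roughly 600 to 750 tokens.
TOKENS_PER_PAGE_LOW = 600
TOKENS_PER_PAGE_HIGH = 750


def estimate_document_tokens(pages: int) -> tuple[int, int]:
    """Return a (low, high) token estimate for a document of the given length."""
    return pages * TOKENS_PER_PAGE_LOW, pages * TOKENS_PER_PAGE_HIGH


def likely_fits(pages: int, context_window_tokens: int, reserved_for_output: int = 0) -> bool:
    """Check the pessimistic estimate against the window, leaving room for the reply."""
    _, high = estimate_document_tokens(pages)
    return high + reserved_for_output <= context_window_tokens


if __name__ == "__main__":
    # Hypothetical window size and output reserve.
    print(estimate_document_tokens(40))
    print(likely_fits(40, context_window_tokens=32_000, reserved_for_output=1_000))
```

Checking against the high end of the range is a deliberate choice: it is safer to assume a document will not fit and be wrong than the reverse.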

Does it cost more to process long conversations?

Yes. Every part of the conversation, including the full history, is passed to the model with each new message. As a conversation grows, the token count for each interaction increases, meaning each subsequent message costs more to process than the one before it.
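The growth is easy to see with a small simulation. The sketch below assumes the full history (system prompt, all earlier user messages, and all earlier replies) is resent on every turn, as described above. The function name and token figures are illustrative.

```python
def input_tokens_per_turn(system_tokens: int, turns: list[tuple[int, int]]) -> list[int]:
    """Return the input token count sent to the model on each turn.

    Each item in turns is (user_message_tokens, reply_tokens).
    """
    history = system_tokens
    sent_each_turn = []
    for user_tokens, reply_tokens in turns:
        history += user_tokens          # the new user message joins the history
        sent_each_turn.append(history)  # everything so far is sent as input
        history += reply_tokens         # the reply is carried into the next turn
    return sent_each_turn


if __name__ == "__main__":
    # Hypothetical sizes: a 200-token system prompt, then three similar turns.
    per_turn = input_tokens_per_turn(200, [(50, 150), (50, 150), (50, 150)])
    print(per_turn)
    print(sum(per_turn))
```

Even though every turn here is the same size, the input sent grows with each one, so the total input billed rises faster than the number of messages.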

Can I control how many tokens a model uses?

To a degree. You can set maximum output length limits to prevent the model generating unnecessarily long responses. You can also write concise prompts and manage what context is included. Beyond that, the model determines how many tokens it needs to generate a response.
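Managing what context is included often means trimming older conversation history to a token budget. The sketch below is one simple approach, assuming the rough four-characters-per-token estimate and a keep-the-most-recent policy. The function names and the budget are illustrative, and output length limits are set through whatever parameter your provider offers, which is not shown here.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about four characters per token."""
    return math.ceil(len(text) / 4)


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within max_tokens, in original order."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return kept


if __name__ == "__main__":
    history = ["first message " * 10, "second message " * 10, "latest question?"]
    print(trim_history(history, max_tokens=50))
```

Dropping the oldest messages is the bluntest option. Summarising earlier turns is a common alternative when older context still matters.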

Quick take

Understanding tokens helps you understand AI costs, context limits, and occasionally why AI behaves unexpectedly with certain inputs.
