Tokens and Tokenisation

A practical guide to understanding how AI models process text and why tokens matter for product teams.

What tokens are, how tokenisation affects AI behaviour and cost, and what designers and product teams need to know when building AI features.

22 May 2026 · 4 min read

What it is

Tokens are the units of text that AI language models work with. Rather than processing whole words, models break text down into smaller pieces called tokens before processing it.

A token is roughly equivalent to three or four characters of text. Common words like "the" or "is" are typically a single token. Longer or unusual words may be split into multiple tokens. Spaces, punctuation, and formatting also consume tokens.

Tokenisation is the process of converting text into this sequence of tokens before it is passed to the model.

Every part of an interaction (the system prompt, the conversation history, the user message, and the model's response) is counted in tokens. This affects both the cost of using AI and the amount of content that fits within a context window.

Understanding tokens helps you reason about cost, limits, and why AI sometimes handles unusual words or languages differently.

When to use it

Understand when token awareness is practically useful.

It matters most when:

You are estimating or managing AI running costs
You are designing features that process large amounts of text
Context window limits are a constraint for your use case
You are working with multilingual content where token counts vary significantly
You are building products where response length needs to be controlled

It matters less when:

You are using AI for short, infrequent tasks where cost and limits are not a concern

Key takeaway

Tokens are the currency of AI usage. Understanding them helps you manage cost and design features that behave reliably at scale.

How it works

Understand the basic mechanism. Before processing any text, a language model passes it through a tokeniser, a component that splits the text into the sequence of tokens the model works with.
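As a rough picture of what a tokeniser does, the sketch below splits text by greedy longest-match against a tiny vocabulary. The vocabulary and the `toy_tokenise` function are invented for this example. Real tokenisers learn much larger vocabularies from data and use other algorithms, so this shows the idea only, not how any particular model works.

```python
# Hypothetical mini-vocabulary, invented purely for illustration.
VOCAB = {"the", " the", " token", "is", " is", "ation"}


def toy_tokenise(text, vocab=VOCAB):
    """Split text into pieces using greedy longest-match.

    Characters that match no vocabulary piece become single-character tokens.
    """
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


print(toy_tokenise("the tokenisation"))  # familiar pieces: few tokens
print(toy_tokenise("the xyz"))           # unfamiliar word: one token per character
```

In this toy setup, "the tokenisation" splits into four pieces, while the unfamiliar "xyz" falls back to one token per character. This is the same effect that makes unusual words cost more tokens.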

Different models use different tokenisers, so the same text can produce different token counts depending on which model is used.

When AI providers charge for usage, they typically charge per token, both for the input (everything sent to the model) and the output (the response generated). Understanding this helps you estimate and control costs.
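Per-token pricing can be turned into a simple estimate. The sketch below assumes input and output tokens are priced separately per million tokens. The prices, token counts, and request volume are made-up placeholders, not any provider's real rates.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the estimated cost of one request, in the currency of the prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical figures for illustration only.
INPUT_PRICE = 2.00           # assumed price per million input tokens
OUTPUT_PRICE = 8.00          # assumed price per million output tokens
REQUESTS_PER_MONTH = 100_000  # assumed usage

per_request = estimate_cost(1_500, 500, INPUT_PRICE, OUTPUT_PRICE)
per_month = per_request * REQUESTS_PER_MONTH

print(f"Per request: {per_request:.4f}")
print(f"Per month:   {per_month:.2f}")
```

With these assumed numbers a single request costs 0.007, which looks negligible, but 100,000 requests a month come to 700. Swap in your provider's published prices and your own traffic estimates.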

What this means for designers and product teams. Long prompts, large documents, and verbose responses all cost more and consume more of the context window. Concise, well-structured prompts are both cheaper and more effective.
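A quick budget check makes the context window constraint concrete. This is a minimal sketch: the function name, the window size, and all the token counts are assumptions chosen for illustration, and real limits vary by model.

```python
def fits_in_context(system_tokens, history_tokens, document_tokens,
                    reserved_output_tokens, context_window):
    """Return (fits, remaining) for a planned request.

    `remaining` is negative when the request would exceed the window.
    """
    used = (system_tokens + history_tokens
            + document_tokens + reserved_output_tokens)
    return used <= context_window, context_window - used


# Hypothetical numbers: an 8,000-token window and a large pasted document.
fits, remaining = fits_in_context(
    system_tokens=1_200,
    history_tokens=2_500,
    document_tokens=3_800,
    reserved_output_tokens=1_000,
    context_window=8_000,
)
print(fits, remaining)
```

Here the request is 500 tokens over budget, so something has to give: a shorter system prompt, a trimmed history, a smaller document excerpt, or a shorter response allowance.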

Multilingual content tokenises differently. Languages with more complex character sets, such as Chinese or Arabic, often produce more tokens per word than English, which affects both cost and context window usage.
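One contributing factor, for tokenisers that start from UTF-8 bytes (an assumption, since not every tokeniser does), is that non-Latin scripts need more bytes per character. The sketch below counts characters and bytes only. It does not count tokens, which depend on the specific tokeniser and its vocabulary.

```python
# Short greetings chosen as sample text; any phrases would do.
samples = {
    "English": "hello",
    "Arabic": "مرحبا",
    "Chinese": "你好",
}


def utf8_profile(text):
    """Return (character count, UTF-8 byte count) for a piece of text."""
    return len(text), len(text.encode("utf-8"))


for language, text in samples.items():
    chars, size = utf8_profile(text)
    print(f"{language}: {chars} characters, {size} bytes")
```

Byte counts are only a hint. To get real numbers, run representative content in each language through the tokeniser for the model you plan to use.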

What to look for

Focus on:

Prompt length — whether system prompts and instructions are as concise as they can be
Response length — whether the model is generating longer responses than necessary
Document size — how much context space large inputs consume
Cost estimation — whether token counts have been factored into unit economics
Multilingual variation — whether token counts have been considered for non-English content

Where it goes wrong

Token costs at small scale look trivial; at production scale, they are not. Most issues come from:

Verbose system prompts that consume large amounts of context unnecessarily
No cost modelling until after launch, when usage is already significant
Assuming token counts are consistent across languages
Not setting output length limits and receiving unexpectedly long responses
Underestimating how quickly token costs accumulate at scale

What you get from it

Understanding tokens gives you:

A foundation for estimating and managing AI running costs
A clearer picture of how context window limits work in practice
Better decisions about prompt length and response design
More informed conversations with engineers about cost and performance

Key takeaway

Tokens are not just a technical detail — they are a cost and design constraint that should be factored in from the start.

FAQ

Common questions

A few practical answers to the questions that usually come up around this topic.

What is a token in AI?

A token is the basic unit of text that an AI model processes. Rather than reading whole words, models work with tokens, roughly three or four characters each. A sentence of ten words might contain twelve to fifteen tokens depending on the words used.

Why do AI companies charge per token?

Because processing tokens is the main computational cost of running a language model. The more tokens in an input and output, the more computation required. Charging per token is a direct proxy for the cost of generating a response.

How many tokens is a typical page of text?

A standard page of English text contains roughly 500 words, which translates to approximately 600 to 750 tokens. This varies depending on the complexity of the vocabulary, punctuation, and formatting.
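The page figure above implies roughly 120 to 150 tokens per 100 English words. The sketch below applies that ratio to other word counts. The function name is made up for this example, and the ratio is a rule of thumb, so measure with the model's own tokeniser when accuracy matters.

```python
def estimate_token_range(word_count,
                         low_per_100_words=120,
                         high_per_100_words=150):
    """Return a (low, high) token estimate for English text.

    Uses integer arithmetic so the results are whole token counts.
    """
    low = word_count * low_per_100_words // 100
    high = word_count * high_per_100_words // 100
    return low, high


print(estimate_token_range(500))    # about one page
print(estimate_token_range(5_000))  # about ten pages
```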

Does it cost more to process long conversations?

Yes. Every part of the conversation, including the full history, is passed to the model with each new message. As a conversation grows, the token count for each request increases, meaning each subsequent message costs more to process than the one before it.
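The growth is easy to see with a small simulation. This sketch assumes the full history is resent on every turn and that every message has the same length. The function name and all the token counts are invented for illustration.

```python
def conversation_input_tokens(turns, system_tokens,
                              tokens_per_user_message, tokens_per_reply):
    """Return the input tokens sent on each turn when the full history is resent."""
    per_turn = []
    history = 0
    for _ in range(turns):
        # Each request carries the system prompt, all earlier messages,
        # and the new user message.
        request_tokens = system_tokens + history + tokens_per_user_message
        per_turn.append(request_tokens)
        # The new message and the model's reply join the history for next time.
        history += tokens_per_user_message + tokens_per_reply
    return per_turn


per_turn = conversation_input_tokens(
    turns=5,
    system_tokens=500,
    tokens_per_user_message=50,
    tokens_per_reply=200,
)
print(per_turn)       # input tokens for each of the five requests
print(sum(per_turn))  # total input tokens across the conversation
```

With these assumed sizes, the fifth request sends 1,550 input tokens compared with 550 for the first, and the five requests together send 5,250. Trimming or summarising older history is one way to flatten that curve.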

Can I control how many tokens a model uses?

To a degree. You can set maximum output length limits to prevent the model generating unnecessarily long responses. You can also write concise prompts and manage what context is included. Beyond that, the model determines how many tokens it needs to generate a response.

Quick take

Understanding tokens helps you understand AI costs, context limits, and occasionally why AI behaves unexpectedly with certain inputs.
