What Is a Token?
A token is the smallest unit of text that a large language model reads, processes, and generates. LLMs do not operate on raw characters or whole words; they operate on tokens, which are chunks of text produced by a tokenizer whose vocabulary was learned from a training corpus.
A rough rule of thumb for English text:
- 1 token ≈ 4 characters ≈ ¾ of a word
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words
- 1 page of text ≈ 500–600 tokens
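The rule of thumb above can be turned into a quick helper. This is a rough heuristic only, not a real tokenizer; use an actual tokenizer library when you need exact counts:

```typescript
// Rough token estimate from the ~4-characters-per-token heuristic.
// Approximation only — real counts depend on the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const page = "a".repeat(2400); // ~2,400 characters, roughly one page
console.log(estimateTokens(page)); // → 600
```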
Why Tokens Instead of Words or Characters?
Characters are too granular — the vocabulary becomes enormous and learning meaningful patterns from individual letters is inefficient.
Words are better, but natural language has millions of words including rare words, names, code identifiers, and morphological variants. A fixed word vocabulary cannot handle unseen words.
Tokens (via algorithms like Byte Pair Encoding) strike a balance: common words become single tokens, rare words are split into subword pieces. This gives the model a manageable vocabulary (~50,000–150,000 entries) that can still represent any text.
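To make the idea concrete, here is a toy sketch of BPE's core loop: repeatedly merge the most frequent adjacent pair of symbols. Production tokenizers use optimized byte-level variants with pretrained merge tables, but the principle is the same:

```typescript
// Toy BPE sketch (illustrative only). Find the most frequent adjacent
// symbol pair across the corpus, then merge every occurrence of it.
function mostFrequentPair(words: string[][]): [string, string] | null {
  const counts = new Map<string, number>();
  for (const w of words) {
    for (let i = 0; i < w.length - 1; i++) {
      const key = w[i] + "\u0000" + w[i + 1];
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const [k, c] of counts) {
    if (c > bestCount) { best = k; bestCount = c; }
  }
  return best ? (best.split("\u0000") as [string, string]) : null;
}

function mergePair(words: string[][], pair: [string, string]): string[][] {
  return words.map((w) => {
    const out: string[] = [];
    for (let i = 0; i < w.length; i++) {
      if (i < w.length - 1 && w[i] === pair[0] && w[i + 1] === pair[1]) {
        out.push(w[i] + w[i + 1]);
        i++; // skip the symbol we just merged
      } else {
        out.push(w[i]);
      }
    }
    return out;
  });
}

// Start from individual characters and apply three merge steps.
let corpus = ["lower", "lowest", "low"].map((w) => w.split(""));
for (let step = 0; step < 3; step++) {
  const pair = mostFrequentPair(corpus);
  if (!pair) break;
  corpus = mergePair(corpus, pair);
}
console.log(corpus.map((w) => w.join("|"))); // → ["lowe|r", "lowe|s|t", "low"]
```

Notice how the shared stem "low" emerges as a single symbol after a few merges: common substrings become single tokens while rare suffixes stay split, which is exactly the balance described above.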
Token Examples
"Hello, world!" → ["Hello", ",", " world", "!"] = 4 tokens
"unbelievable" → ["un", "believ", "able"] = 3 tokens
"https://example.com" → ["https", "://", "example", ".", "com"] = 5 tokens
"2024-03-15" → ["2024", "-", "03", "-", "15"] = 5 tokens
Exact splits vary by tokenizer; these examples are illustrative.
Code tends to tokenize less efficiently than prose — a line of code may use 2–3x more tokens than an equivalent amount of English text.
Tokens and Pricing
Every major LLM API charges by tokens — both input (prompt) tokens and output (completion) tokens. Understanding token counts is essential for cost estimation:
| Model | Input price | Output price |
|---|---|---|
| GPT-4o | $2.50 / 1M tokens | $10.00 / 1M tokens |
| Claude Opus 4 | $15.00 / 1M tokens | $75.00 / 1M tokens |
| Gemini 1.5 Pro | $1.25 / 1M tokens | $5.00 / 1M tokens |
Prices are approximate and change frequently — check provider documentation.
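A back-of-envelope cost calculator built on the figures above. The model names and prices here are placeholders copied from the table, not live pricing:

```typescript
// Per-1M-token prices (approximate — check provider documentation).
const PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "claude-opus-4": { input: 15.0, output: 75.0 },
  "gemini-1.5-pro": { input: 1.25, output: 5.0 },
};

function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// 10,000 prompt tokens + 1,000 completion tokens on GPT-4o:
console.log(estimateCostUSD("gpt-4o", 10_000, 1_000)); // → 0.035
```

Note that output tokens are typically several times more expensive than input tokens, so long completions dominate the bill faster than long prompts.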
Counting Tokens
```typescript
import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4o");
const tokens = enc.encode("Hello, how many tokens is this?");
console.log(tokens.length); // → 8
enc.free(); // release the WASM-backed encoder when done
```
Tokens and Context Windows
Every LLM has a context window: a maximum number of tokens it can process in a single call (input + output combined). Exceeding this limit causes an error, which makes token awareness critical when building RAG pipelines that inject potentially large documents into the prompt.
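One practical pattern for staying under the limit is to greedily pack retrieved documents into the prompt until the token budget runs out. The context-window size, output reserve, and 4-characters-per-token estimate below are all assumptions; substitute your model's real limit and a real tokenizer:

```typescript
const CONTEXT_WINDOW = 128_000; // assumed limit — check your model's docs
const RESERVED_OUTPUT = 4_000;  // leave headroom for the completion

// Rough heuristic; swap in a real tokenizer (e.g. tiktoken) for accuracy.
const approxTokens = (text: string) => Math.ceil(text.length / 4);

// Include retrieved documents in order until the budget is exhausted.
function packDocuments(docs: string[], basePromptTokens: number): string[] {
  let budget = CONTEXT_WINDOW - RESERVED_OUTPUT - basePromptTokens;
  const included: string[] = [];
  for (const doc of docs) {
    const cost = approxTokens(doc);
    if (cost > budget) break; // stop at the first doc that doesn't fit
    included.push(doc);
    budget -= cost;
  }
  return included;
}
```

Greedy packing in retrieval order is the simplest policy; since documents usually arrive ranked by relevance, it keeps the best matches and drops the tail.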
KnowledgeSDK tracks token usage across your account, so you always know how many tokens your extraction and search operations consume. The /v1/account endpoint returns a live usage summary including tokens_used for the current billing period.
```typescript
const account = await sdk.account();
console.log(account.usage.tokens); // total tokens used this month
```
Tokens Are Not Words — Practical Implications
- The same information often costs more tokens in languages other than English, because most tokenizers are trained on English-heavy corpora and split other languages into more pieces.
- Code, JSON, and URLs use more tokens per unit of information than prose.
- Whitespace and punctuation consume tokens.
- Repeated whitespace, excessive formatting, and HTML noise all inflate token counts — a key reason to clean web content before sending it to an LLM.
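A minimal cleaning pass along these lines might collapse whitespace and strip HTML tags before sending scraped text to an LLM. The regex tag-strip here is a rough heuristic for illustration, not a substitute for a real HTML parser:

```typescript
// Cheap token-count hygiene for scraped web content.
function cleanForLLM(raw: string): string {
  return raw
    .replace(/<[^>]+>/g, " ")   // drop HTML tags (heuristic, not a parser)
    .replace(/[ \t]+/g, " ")    // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n")   // trim spaces hugging newlines
    .replace(/\n{3,}/g, "\n\n") // cap consecutive blank lines
    .trim();
}

const noisy = "<div>  Hello,\t\tworld!  </div>\n\n\n\n<p>More   text.</p>";
console.log(cleanForLLM(noisy)); // → "Hello, world!\n\nMore text."
```

On HTML-heavy pages a pass like this can cut token counts substantially before the text ever reaches the tokenizer.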