What Is a Token?
A token is the smallest unit of text that a large language model reads, processes, and generates. LLMs do not operate on raw characters or whole words; they operate on tokens, which are chunks of text produced by a tokenizer whose vocabulary was learned from a training corpus.
A rough rule of thumb for English text:
- 1 token ≈ 4 characters ≈ ¾ of a word
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words
- 1 page of text ≈ 500–600 tokens
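The rule of thumb above can be turned into a quick helper. This is a rough heuristic only, not a real tokenizer; use an actual tokenizer library when you need exact counts:

```typescript
// Rough token estimate from the ~4-characters-per-token heuristic.
// Approximation only — real counts depend on the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const page = "a".repeat(2400); // ~2,400 characters, roughly one page
console.log(estimateTokens(page)); // → 600
```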
Why Tokens Instead of Words or Characters?
Characters are too granular — the vocabulary becomes enormous and learning meaningful patterns from individual letters is inefficient.
Words are better, but natural language has millions of words including rare words, names, code identifiers, and morphological variants. A fixed word vocabulary cannot handle unseen words.
Tokens (via algorithms like Byte Pair Encoding) strike a balance: common words become single tokens, rare words are split into subword pieces. This gives the model a manageable vocabulary (~50,000–150,000 entries) that can still represent any text.
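To make the idea concrete, here is a toy sketch of BPE's core loop: repeatedly merge the most frequent adjacent pair of symbols. Production tokenizers use optimized byte-level variants with pretrained merge tables, but the principle is the same:

```typescript
// Toy BPE sketch (illustrative only). Find the most frequent adjacent
// symbol pair across the corpus, then merge every occurrence of it.
function mostFrequentPair(words: string[][]): [string, string] | null {
  const counts = new Map<string, number>();
  for (const w of words) {
    for (let i = 0; i < w.length - 1; i++) {
      const key = w[i] + "\u0000" + w[i + 1];
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const [k, c] of counts) {
    if (c > bestCount) { best = k; bestCount = c; }
  }
  return best ? (best.split("\u0000") as [string, string]) : null;
}

function mergePair(words: string[][], pair: [string, string]): string[][] {
  return words.map((w) => {
    const out: string[] = [];
    for (let i = 0; i < w.length; i++) {
      if (i < w.length - 1 && w[i] === pair[0] && w[i + 1] === pair[1]) {
        out.push(w[i] + w[i + 1]);
        i++; // skip the symbol we just merged
      } else {
        out.push(w[i]);
      }
    }
    return out;
  });
}

// Start from individual characters and apply three merge steps.
let corpus = ["lower", "lowest", "low"].map((w) => w.split(""));
for (let step = 0; step < 3; step++) {
  const pair = mostFrequentPair(corpus);
  if (!pair) break;
  corpus = mergePair(corpus, pair);
}
console.log(corpus.map((w) => w.join("|"))); // → ["lowe|r", "lowe|s|t", "low"]
```

Notice how the shared stem "low" emerges as a single symbol after a few merges: common substrings become single tokens while rare suffixes stay split, which is exactly the balance described above.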
Token Examples
"Hello, world!" → ["Hello", ",", " world", "!"] = 4 tokens
"unbelievable" → ["un", "believ", "able"] = 3 tokens
"https://example.com" → ["https", "://", "example", ".", "com"] = 5 tokens
"2024-03-15" → ["2024", "-", "03", "-", "15"] = 5 tokens
Exact splits vary by tokenizer; these examples are illustrative.
Code tends to tokenize less efficiently than prose — a line of code may use 2–3x more tokens than an equivalent amount of English text.
Tokens and Pricing
Every major LLM API charges by tokens — both input (prompt) tokens and output (completion) tokens. Understanding token counts is essential for cost estimation:
| Model | Input price | Output price |
|---|---|---|
| GPT-4o | $2.50 / 1M tokens | $10.00 / 1M tokens |
| Claude Opus 4 | $15.00 / 1M tokens | $75.00 / 1M tokens |
| Gemini 1.5 Pro | $1.25 / 1M tokens | $5.00 / 1M tokens |
Prices are approximate and change frequently — check provider documentation.
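A back-of-envelope cost calculator built on the figures above. The model names and prices here are placeholders copied from the table, not live pricing:

```typescript
// Per-1M-token prices (approximate — check provider documentation).
const PRICES: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "claude-opus-4": { input: 15.0, output: 75.0 },
  "gemini-1.5-pro": { input: 1.25, output: 5.0 },
};

function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// 10,000 prompt tokens + 1,000 completion tokens on GPT-4o:
console.log(estimateCostUSD("gpt-4o", 10_000, 1_000)); // → 0.035
```

Note that output tokens are typically several times more expensive than input tokens, so long completions dominate the bill faster than long prompts.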
Counting Tokens
```typescript
import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4o");
const tokens = enc.encode("Hello, how many tokens is this?");
console.log(tokens.length); // → 8
enc.free(); // release the WASM-backed encoder when done
```
Tokens and Context Windows
Every LLM has a context window: a maximum number of tokens it can process in a single call (input + output combined). Exceeding this limit causes an error, which makes token awareness critical when building RAG pipelines that inject potentially large documents into the prompt.
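One practical pattern for staying under the limit is to greedily pack retrieved documents into the prompt until the token budget runs out. The context-window size, output reserve, and 4-characters-per-token estimate below are all assumptions; substitute your model's real limit and a real tokenizer:

```typescript
const CONTEXT_WINDOW = 128_000; // assumed limit — check your model's docs
const RESERVED_OUTPUT = 4_000;  // leave headroom for the completion

// Rough heuristic; swap in a real tokenizer (e.g. tiktoken) for accuracy.
const approxTokens = (text: string) => Math.ceil(text.length / 4);

// Include retrieved documents in order until the budget is exhausted.
function packDocuments(docs: string[], basePromptTokens: number): string[] {
  let budget = CONTEXT_WINDOW - RESERVED_OUTPUT - basePromptTokens;
  const included: string[] = [];
  for (const doc of docs) {
    const cost = approxTokens(doc);
    if (cost > budget) break; // stop at the first doc that doesn't fit
    included.push(doc);
    budget -= cost;
  }
  return included;
}
```

Greedy packing in retrieval order is the simplest policy; since documents usually arrive ranked by relevance, it keeps the best matches and drops the tail.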
KnowledgeSDK tracks token usage across your account, so you always know how many tokens your extraction and search operations consume. The /v1/account endpoint returns a live usage summary including tokens_used for the current billing period.
```typescript
const account = await sdk.account();
console.log(account.usage.tokens); // total tokens used this month
```
Tokens Are Not Words — Practical Implications
- The same information often costs more tokens in languages other than English, because most tokenizers are trained on English-heavy corpora and split other languages into more pieces.
- Code, JSON, and URLs use more tokens per unit of information than prose.
- Whitespace and punctuation consume tokens.
- Repeated whitespace, excessive formatting, and HTML noise all inflate token counts — a key reason to clean web content before sending it to an LLM.
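A minimal cleaning pass along these lines might collapse whitespace and strip HTML tags before sending scraped text to an LLM. The regex tag-strip here is a rough heuristic for illustration, not a substitute for a real HTML parser:

```typescript
// Cheap token-count hygiene for scraped web content.
function cleanForLLM(raw: string): string {
  return raw
    .replace(/<[^>]+>/g, " ")   // drop HTML tags (heuristic, not a parser)
    .replace(/[ \t]+/g, " ")    // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n")   // trim spaces hugging newlines
    .replace(/\n{3,}/g, "\n\n") // cap consecutive blank lines
    .trim();
}

const noisy = "<div>  Hello,\t\tworld!  </div>\n\n\n\n<p>More   text.</p>";
console.log(cleanForLLM(noisy)); // → "Hello, world!\n\nMore text."
```

On HTML-heavy pages a pass like this can cut token counts substantially before the text ever reaches the tokenizer.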