A token is the smallest unit of text that AI models process. Instead of reading words like humans do, AI models like GPT-4o, Claude, and Gemini break text into tokens — which can be whole words, word fragments, or individual characters. Understanding tokens is essential because API pricing is based on token count, not word count.
⚡ Quick facts about tokens:
- 1 token ≈ 4 characters or ≈ 0.75 words in English
- The word “hello” is 1 token, but “tokenization” is 2-3 tokens
- The same text produces different token counts on different models
- GPT-4o costs $2.50 per 1 million input tokens
- Count your tokens for free →
What is Tokenization?
Tokenization is the process of breaking text into smaller pieces called tokens that AI models can understand and process. Think of it like breaking a sentence into puzzle pieces — but instead of splitting at word boundaries, the model splits at boundaries that are most efficient for its vocabulary.
What is a Token in ChatGPT?
If you use ChatGPT or the OpenAI API with GPT-4o or GPT-4o mini, your text is processed by the `o200k_base` tokenizer. In ChatGPT, a token usually represents about 4 characters of text, or roughly 0.75 of an average English word.
For example, the sentence “I love programming” might tokenize as three tokens: `I` | `·love` | `·programming`
The centerdot (·) represents a space character — in most tokenizers, the space is attached to the following word as a single token. This is more efficient than treating spaces as separate tokens.
But a less common word like “cryptocurrency” might be split into multiple sub-word tokens.
This happens because “cryptocurrency” isn't common enough to merit its own single-token entry in the vocabulary. The tokenizer finds the most efficient way to represent it using existing sub-word pieces.
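The leading space comes from a step that runs before any sub-word splitting: GPT-style tokenizers first cut text into chunks with a regular expression, then apply BPE inside each chunk. The sketch below uses a deliberately simplified, ASCII-only pattern (an assumption for illustration; the real `o200k_base` pattern is longer and Unicode-aware), and `pretokenize` and `show` are hypothetical helper names. It shows each word keeping the space in front of it, rendered with a centerdot the way a token visualizer would.

```python
import re

# Simplified, ASCII-only approximation of a GPT-style pre-tokenization pattern
# (hypothetical): an optional leading space stays attached to the word,
# number, or punctuation run that follows it.
PRETOKENIZE = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+")


def pretokenize(text):
    """Split text into chunks; BPE would then run inside each chunk."""
    return PRETOKENIZE.findall(text)


def show(pieces):
    """Render spaces as a centerdot, like a token visualizer."""
    return " | ".join(piece.replace(" ", "·") for piece in pieces)


print(show(pretokenize("I love programming")))  # I | ·love | ·programming
print(pretokenize("Hello, world!"))             # ['Hello', ',', ' world', '!']
```

A word like “cryptocurrency” comes out of this step as a single chunk; it is the BPE merges applied afterward that break it into sub-word pieces.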
How Does BPE Tokenization Work?
Most modern AI models use a technique called Byte Pair Encoding (BPE) for tokenization. Here's how it works in simple terms:
- Start with characters — The tokenizer begins by treating every character as its own token
- Find the most common pair — It looks at the training data and finds which pair of adjacent tokens appears most frequently
- Merge them — That pair becomes a new token in the vocabulary
- Repeat — This process repeats until the vocabulary reaches a target size (e.g., 100,000 or 200,000 tokens)
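The four steps above can be sketched as a toy trainer in plain Python. This is a minimal illustration, not how any production tokenizer is built: it works on characters inside whitespace-separated words, whereas GPT tokenizers start from raw bytes and learn their merges from vastly larger corpora. The function names are hypothetical.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def merge_pair(words, pair):
    """Rewrite every word so each occurrence of `pair` becomes one symbol."""
    merged = Counter()
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] += freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Step 1: every word starts as a sequence of single characters.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break  # every word is already a single symbol
        # Step 2: pick the most frequent adjacent pair (ties go to the pair seen first).
        best = max(pairs, key=pairs.get)
        # Step 3: merge it everywhere; the joined string is a new vocabulary entry.
        words = merge_pair(words, best)
        merges.append(best)
    # Step 4: the loop repeats until the merge budget (the vocabulary target) is spent.
    return merges, words


merges, words = train_bpe("the the the then this", num_merges=2)
print(merges)       # [('t', 'h'), ('th', 'e')]
print(dict(words))  # {('the',): 3, ('the', 'n'): 1, ('th', 'i', 's'): 1}
```

On this tiny corpus the first merge joins `t` and `h` (5 occurrences across all words) and the second joins `th` and `e`, so the frequent word “the” becomes a single symbol while the rarer “this” is still split.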
The result is a vocabulary where common words like “the”, “is”, and “a” are single tokens, while rare or technical words are split into smaller sub-word pieces. This gives the model flexibility to handle any input text, including words it has never seen before.
🧠 Key insight: Token vocabulary size matters
GPT-4o uses o200k_base with a 200,000-token vocabulary — double the size of GPT-3.5's cl100k_base (100,000 tokens). A larger vocabulary means more words can be represented as single tokens, which means fewer tokens for the same text and lower costs.
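A minimal sketch of how learned merges are applied to new text, and why more merges (a larger vocabulary) mean fewer tokens. The merge list below is invented for illustration, not taken from any real tokenizer, and it is applied in training order, which is the simplest textbook approach; `bpe_encode` is a hypothetical name.

```python
# Hypothetical merge rules, listed in the order they were "learned".
MERGES = [
    ("t", "o"), ("k", "e"), ("ke", "n"), ("to", "ken"),  # builds "token"
    ("a", "t"), ("i", "o"), ("io", "n"), ("at", "ion"),  # builds "ation"
    ("i", "z"), ("iz", "ation"),                         # builds "ization"
]


def bpe_encode(word, merges):
    """Split `word` into characters, then apply each merge rule in order."""
    symbols = list(word)
    for left, right in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                out.append(left + right)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


print(bpe_encode("tokenization", MERGES))  # ['token', 'ization']
print(bpe_encode("tokenize", MERGES))      # ['token', 'iz', 'e'] (unseen word, still encodable)

# A smaller vocabulary (fewer merges) needs more tokens for the same word.
for n in (0, 4, 8, 10):
    print(n, len(bpe_encode("tokenization", MERGES[:n])))  # 12, 8, 4, 2 tokens
```

With no merges the word is 12 single-character tokens; with all ten it is two. Real vocabularies apply the same idea with 100,000 to 200,000 entries, and any word they do not cover still falls back to smaller pieces.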
Why Do Different AI Models Give Different Token Counts?
Each AI provider trains its own tokenizer on its own data, resulting in different vocabularies. As a result, the exact same text produces different token counts depending on which model you use.
For simple English text, the differences are usually small (5-15%). But for non-English languages, code, or technical content, the differences can be much larger. Use our token calculator to compare exact counts across models.
Tokens vs. Words vs. Characters: What's the Difference?
These three measurements are related but not interchangeable.
The key rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words for English text. This means 1,000 words is roughly 1,300-1,500 tokens. But this ratio varies significantly for non-English languages — Chinese, Japanese, and Hindi text typically uses 2-3x more tokens per word.
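The rule of thumb translates directly into a quick estimator. This is only a heuristic sketch for English text (the function name is hypothetical); for exact counts, run the text through the model's actual tokenizer.

```python
def estimate_tokens(text):
    """Rough English-only token estimates from two rules of thumb.

    Returns (estimate from characters, estimate from words), using
    1 token ~ 4 characters and 1 token ~ 0.75 words.
    """
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round(by_chars), round(by_words)


sample = "Tokens are the units that language models read and write."
print(estimate_tokens(sample))
```

For 1,000 words the word-based estimate is about 1,333 tokens, inside the 1,300-1,500 range quoted above. For Chinese, Japanese, or Hindi text these ratios underestimate badly.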
How Do Tokens Affect API Pricing?
Every major LLM API charges based on token count, with separate rates for input tokens (your prompt) and output tokens (the model's response). Output tokens are typically priced higher than input tokens, because the model generates them one at a time, which takes more computation than reading the prompt.
The price difference is dramatic: processing 1,000 words (about 1,400 input tokens) costs roughly $0.0001 with Gemini 1.5 Flash but about $0.014 with GPT-4 Turbo, a gap of more than 130x in input pricing. For full pricing details, see our LLM Pricing Comparison.
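Turning a token count into a dollar figure is simple arithmetic: tokens ÷ 1,000,000 × price per million tokens. The sketch below hard-codes the input prices quoted in this guide; the dictionary keys are just labels, `input_cost` is a hypothetical name, and output-token prices are left out because this guide does not list them.

```python
# Input prices in USD per 1 million tokens, as quoted in this guide.
INPUT_USD_PER_MILLION = {
    "gpt-4o": 2.50,
    "gemini-1.5-flash": 0.075,
    "gpt-4-turbo": 10.00,
}


def input_cost(num_tokens, model):
    """Cost in USD of sending `num_tokens` prompt tokens to `model`."""
    return num_tokens * INPUT_USD_PER_MILLION[model] / 1_000_000


tokens = 1_400  # roughly 1,000 English words
for model in INPUT_USD_PER_MILLION:
    print(f"{model}: ${input_cost(tokens, model):.6f}")

ratio = INPUT_USD_PER_MILLION["gpt-4-turbo"] / INPUT_USD_PER_MILLION["gemini-1.5-flash"]
print(f"GPT-4 Turbo input costs {ratio:.0f}x more per token than Gemini 1.5 Flash")
```

With 1,400 input tokens this gives about $0.0001 for Gemini 1.5 Flash, $0.0035 for GPT-4o, and $0.014 for GPT-4 Turbo.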
How to Count Tokens for Free
The easiest way to count tokens is to use our free Token Calculator. It runs entirely in your browser using the same tiktoken library that OpenAI uses — your text never leaves your device.
Method 1: Use Our Web Calculator (Recommended)
- Go to the Token Calculator homepage
- Select your AI model (GPT-4o, Claude, Gemini, etc.)
- Type or paste your text — token count updates in real time
- Toggle the Token Visualizer to see each individual token
Method 2: Use Our Free API
```bash
curl "/api/count-tokens?text=your+text+here&model=gpt-4o"
# Response:
# { "counts": { "tokens": 5, "words": 4, "characters": 19 } }
```
Method 3: Python with tiktoken
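```python
# Requires: pip install tiktoken (the encoding file is downloaded on first use)
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # GPT-4o
tokens = enc.encode("Hello, world!")
print(f"Token count: {len(tokens)}")  # Output: Token count: 4
```
Frequently Asked Questions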
How many tokens is 1,000 words?
In English, 1,000 words is approximately 1,300-1,500 tokens with GPT-4o's o200k_base tokenizer. The exact count depends on word complexity — simple words like “the” are 1 token, while technical terms may be 2-4 tokens. Use our calculator for exact counts.
Is a token the same as a word?
No. A token can be a whole word, part of a word, or even a single character. Common words like “the” are typically 1 token, while less common words like “tokenization” might be split into 2-3 tokens. On average, 1 token ≈ 0.75 English words.
Do spaces and punctuation count as tokens?
Yes. Spaces are often merged with the following word as a single token (e.g., “ hello” with a leading space is 1 token). Punctuation marks like periods and commas are usually individual tokens. Line breaks may count as 1-2 tokens.
How much does 1 million tokens cost?
Costs vary widely by model. Among the models in our comparison, Gemini 1.5 Flash is the cheapest at $0.075 per 1M input tokens, while GPT-4 Turbo is the most expensive at $10.00 per 1M input tokens. See the full breakdown on our pricing comparison page.
Why does GPT-4o produce fewer tokens than GPT-3.5?
GPT-4o uses the o200k_base encoding with a 200,000-token vocabulary — double the size of GPT-3.5's cl100k_base (100,000 tokens). More vocabulary entries mean more words can be represented as single tokens, resulting in fewer tokens overall and lower costs per character.
📚 Related articles:
- LLM API Pricing Comparison 2026 — Compare costs across all major models
- GPT-4o Token Calculator — Count GPT-4o tokens with exact o200k_base encoding
- Claude Token Calculator — Count Anthropic Claude tokens
- Token Calculator — Free real-time token counter for all AI models