BERT Tokenizer
Count tokens and visualize how the BERT tokenizer splits your text. Paste anything below — tokenization runs in your browser.
How it works
Tokenizers turn text into the integer token IDs a language model works with. Different models use different tokenizers: text that counts as one token for GPT-4 may split into three for Llama 3, and vice versa. That affects your API bill (most providers charge per token) and your context window (more tokens means fewer messages fit).
Tokenization happens entirely in your browser using Transformers.js. No text is sent to any server.
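BERT's tokenizer works in two stages. A basic pass lowercases the text (for uncased models) and separates words from punctuation. Then WordPiece breaks each word into the longest vocabulary pieces it can find, greedily from left to right, and marks word-internal pieces with a "##" prefix. The sketch below is a simplified illustration, not the Transformers.js implementation. The vocabulary and IDs are a hypothetical toy set (real BERT vocabularies hold roughly 30,000 entries), and the basic pass skips details such as CJK character splitting, control-character cleanup, ASCII symbols treated as punctuation, and a maximum word length.

```python
import unicodedata

# Hypothetical toy vocabulary; real BERT vocabularies (e.g. bert-base-uncased)
# are far larger and assign different IDs.
TOY_VOCAB = {
    "[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
    "token": 3, "##ization": 4, "##izer": 5, "is": 6, "fun": 7,
    "un": 8, "##believ": 9, "##able": 10, "!": 11, "the": 12,
}


def basic_tokenize(text: str) -> list[str]:
    """Simplified uncased pre-tokenization: lowercase, strip accents,
    split on whitespace, and split off Unicode punctuation characters."""
    text = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks left over from NFD (e.g. the accent in "é").
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    words: list[str] = []
    for chunk in text.split():
        current = ""
        for ch in chunk:
            if unicodedata.category(ch).startswith("P"):
                if current:
                    words.append(current)
                    current = ""
                words.append(ch)  # each punctuation mark is its own word
            else:
                current += ch
        if current:
            words.append(current)
    return words


def wordpiece(word: str, vocab: dict[str, int], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first split; pieces after the first get '##'."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # the whole word becomes [UNK] if any part fails
        pieces.append(match)
        start = end
    return pieces


def encode(text: str, vocab: dict[str, int]) -> tuple[list[str], list[int]]:
    """Wrap the WordPiece tokens in [CLS] ... [SEP] and look up their IDs."""
    tokens = ["[CLS]"]
    for word in basic_tokenize(text):
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[t] for t in tokens]
```

With this toy vocabulary, encode("Tokenization is unbelievable!", TOY_VOCAB) yields the tokens [CLS], token, ##ization, is, un, ##believ, ##able, !, [SEP] with IDs [1, 3, 4, 6, 8, 9, 10, 11, 2]. A word with no matching pieces, such as "xyz", collapses to a single [UNK]. A real BERT vocabulary may split the same words differently.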
Why count tokens?
- Cost: OpenAI, Anthropic, and Google all bill per input and output token. The same 1,000-character paragraph may be 200 tokens for GPT-4 but 400 for an older tokenizer.
- Context window: every model has a hard token limit. Exceed it and the request is truncated or rejected.
- Debugging: unusual tokenization can cause garbled output. Whitespace, emoji, and code blocks often tokenize in unexpected ways.
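The cost and context-window points come down to simple arithmetic on token counts. A minimal sketch, assuming per-million-token pricing with separate input and output rates (the rates below are placeholders, not any provider's actual prices) and a provider that counts the prompt plus the requested output budget against one shared window, as many do:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Dollar cost when input and output tokens are billed at separate per-million rates."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget stays within the window."""
    return prompt_tokens + max_output_tokens <= context_window


# Hypothetical placeholder rates: $1.00 per million input tokens, $4.00 per million output.
IN_RATE, OUT_RATE = 1.00, 4.00

# The same 1,000-character paragraph at 200 vs. 400 input tokens, each with a 100-token reply.
cost_a = estimate_cost(200, 100, IN_RATE, OUT_RATE)  # 0.0006
cost_b = estimate_cost(400, 100, IN_RATE, OUT_RATE)  # 0.0008
```

Because output is billed separately, doubling the input tokens here raises the total from $0.0006 to $0.0008 rather than doubling it. Likewise, a 3,500-token prompt with 1,000 tokens reserved for the reply would not fit a hypothetical 4,096-token window, while a 3,000-token prompt would.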