BERT Tokenizer

Count tokens and see how the BERT tokenizer splits your text. Tokenization runs in your browser.

How it works

Tokenizers turn text into the integer IDs a language model consumes. Different models use different tokenizers: a string that is one token for GPT-4 may split into three for Llama 3, and vice versa. That affects your API bill (most providers charge per token) and your context window (the more tokens each message uses, the fewer messages fit).

Tokenization happens entirely in your browser using Transformers.js. No text is sent to any server.
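To make the splitting step concrete, here is a minimal Python sketch of greedy longest-match WordPiece splitting, the subword scheme BERT uses. It is an illustration under stated assumptions, not the code this page runs: `TOY_VOCAB` and its IDs are invented for the example, and the sketch skips the punctuation splitting and the `[CLS]`/`[SEP]` markers that a real BERT tokenizer adds.

```python
# Hypothetical toy vocabulary; a real BERT vocabulary has roughly 30,000 entries.
TOY_VOCAB = {
    "[UNK]": 0, "token": 1, "##ization": 2, "run": 3,
    "##s": 4, "in": 5, "the": 6, "browser": 7,
}

def wordpiece(word, vocab=TOY_VOCAB, unk="[UNK]"):
    """Split one word by repeatedly taking the longest piece found in vocab."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces

def tokenize(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    return tokens

def encode(text, vocab=TOY_VOCAB):
    """Map each token to its integer ID."""
    return [vocab[token] for token in tokenize(text, vocab)]

print(tokenize("Tokenization runs in the browser"))
# ['token', '##ization', 'run', '##s', 'in', 'the', 'browser']
print(encode("Tokenization runs in the browser"))
# [1, 2, 3, 4, 5, 6, 7]
```

Five words become seven tokens here, which is why token counts and word counts differ. A word with no matching pieces in the vocabulary collapses to a single `[UNK]` token.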

Why count tokens?

Cost: OpenAI, Anthropic, and Google all bill per input and output token. A 1,000-character paragraph may be 200 tokens for GPT-4 but 400 for an older tokenizer.
Context window: every model has a hard token limit. Hit it and the request is truncated or rejected.
Debugging: odd tokenization can cause garbled output. Whitespace, emoji, and code blocks often tokenize in unexpected ways.
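The cost and context-window points above reduce to simple arithmetic once you have a token count. The Python sketch below is illustrative only: the function names are made up for this example, the prices are placeholders rather than any provider's real rates, and the 512-token default reflects the usual limit of the original BERT checkpoints, with two positions reserved for `[CLS]` and `[SEP]`.

```python
def fits_context(n_tokens, limit=512, special_tokens=2):
    """Return True if the text plus special tokens fits within the model limit."""
    return n_tokens + special_tokens <= limit

def estimate_cost(input_tokens, output_tokens,
                  price_in_per_million, price_out_per_million):
    """Estimate a bill from token counts and per-million-token prices."""
    total = (input_tokens * price_in_per_million
             + output_tokens * price_out_per_million)
    return total / 1_000_000

# Placeholder prices (currency units per million tokens), not real rates.
print(fits_context(510))   # True: 510 + 2 special tokens is exactly 512
print(fits_context(511))   # False: one token over the limit
print(estimate_cost(200, 100, price_in_per_million=1.0, price_out_per_million=2.0))
```

Swap in the limit and prices published for the model you actually use, since both vary by provider and change over time.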
