lolb.link

Tokenizer

Language models break text into small chunks called tokens. Tokens are letter combinations that occur frequently in text, ranging from word fragments to whole common words. A model learns how tokens relate to one another and uses those relationships to predict what comes next.
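To make "frequent letter combinations" concrete, here is a minimal sketch of how a byte-pair-encoding (BPE) style vocabulary can be learned: start from single characters and repeatedly merge the most frequent adjacent pair. This is a toy illustration, not the tokenizer this page uses. The function names (`most_frequent_pair`, `merge_pair`, `learn_merges`) and the sample corpus are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_merges(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated toy corpus."""
    # Each word starts as a tuple of single characters, with its frequency.
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


if __name__ == "__main__":
    merges, words = learn_merges("low low low lower lowest", 2)
    print(merges)  # the two merges that were learned
    print(words)   # "low" is now a single symbol inside every word
```

After two merges on this corpus, the frequent sequence "low" has become a single token, while the rarer endings "er" and "est" are still split into characters. Real tokenizers apply the same idea over far larger corpora and typically operate on bytes rather than characters.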

Use the tool below to see how text is broken into tokens and to count the tokens in your text. You can also upload documents to analyse their token count. As a rough guide for English text, 1 token is about 4 characters or ¾ of a word, so 100 tokens is about 75 words.
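The rule of thumb above can be turned into a quick back-of-the-envelope estimator. The sketch below assumes English text and uses only the two ratios quoted here (4 characters per token, ¾ of a word per token). The function names `estimate_tokens` and `tokens_to_words` are hypothetical, and the results are rough estimates, not real token counts.

```python
CHARS_PER_TOKEN = 4      # rule of thumb: 1 token is about 4 characters
WORDS_PER_TOKEN = 0.75   # rule of thumb: 1 token is about 3/4 of a word


def estimate_tokens(text):
    """Estimate the token count of English text from characters and from words."""
    chars = len(text)
    words = len(text.split())
    return {
        "characters": chars,
        "words": words,
        "tokens_from_chars": round(chars / CHARS_PER_TOKEN),
        "tokens_from_words": round(words / WORDS_PER_TOKEN),
    }


def tokens_to_words(tokens):
    """Convert a token budget into an approximate English word count."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(tokens_to_words(100))  # 75, matching the guide above
    print(estimate_tokens("Language models break text into small chunks."))
```

The two estimates will usually differ a little. Text with many long words, code, or non-English content tends to use more tokens than either ratio suggests.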

Each colour represents a single token. Colours cycle so you can see where one token ends and the next begins — they have no other meaning. Watch how longer words split into multiple tokens, and how spaces and punctuation become tokens of their own.
Drop a text file here, or click to browse
Supports .txt, .md, .csv, .json, .html, .js, .ts, .py, .css, .xml, .yaml, .docx, .xlsx, .xls, .pdf — select multiple files at once
Recent sessions
History is session-only — nothing is saved or stored. It clears when you close the tab.

Token counts use a BPE approximation and are indicative, not exact.