lolb.link

Tokenizer

Language models break text down into small chunks called tokens. Tokens are typically letter combinations that appear frequently in text, such as short whole words or word fragments. A model learns how tokens relate to one another and uses those relationships to predict which token comes next.
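One widely used way to find such frequent letter combinations is byte pair encoding (BPE), which repeatedly merges the most common adjacent pair of symbols into a new token. The following is a minimal toy sketch of that idea, not the tokenizer behind this tool or any particular model. The function names (`most_frequent_pair`, `merge_pair`, `learn_merges`) and the sample corpus are made up for illustration, and the sketch works on characters of whitespace-separated words, whereas production tokenizers usually work on bytes and learn many thousands of merges.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most common adjacent symbol pair, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_merges(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated corpus."""
    # Each word starts as a tuple of single characters.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:  # every word is already a single symbol
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


# Hypothetical sample corpus, chosen only to show merges forming.
merges, words = learn_merges("low low lower lowest", num_merges=2)
print(merges)
print(words)
```

After two merges on this sample, the frequent fragment "low" has become a single symbol, while rarer endings such as "er" and "est" are still split into characters.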

You can use the tool below to see how text is broken into tokens and to count the tokens in your text. You can also upload documents to analyse their token count. As a rough guide, 1 token is about 4 characters, or ¾ of a word, in English, so 100 tokens is about 75 words.
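The rule of thumb above can be turned into a quick estimate when no tokenizer is at hand. This is a rough sketch under the stated ratios (4 characters per token, ¾ of a word per token) and only makes sense for English prose. The function name `estimate_tokens` and the sample sentence are hypothetical, and the result is an approximation rather than a real token count.

```python
def estimate_tokens(text: str) -> dict:
    """Rough token estimates for English text from two rules of thumb."""
    chars = len(text)
    words = len(text.split())  # words = whitespace-separated chunks
    return {
        "characters": chars,
        "words": words,
        "tokens_from_chars": round(chars / 4),     # 1 token is about 4 characters
        "tokens_from_words": round(words / 0.75),  # 1 token is about 3/4 of a word
    }


sample = "Language models break text down into small chunks called tokens."
print(estimate_tokens(sample))
```

The two estimates will usually differ a little. Code, non-English text, and unusual words tend to use more tokens than either ratio suggests.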

To upload, drop a file onto the page or click to browse. Supported formats are .txt, .md, .csv, .json, .html, .js, .ts, .py, .css, .xml, .yaml, .docx, .xlsx, .xls, and .pdf, and you can select multiple files at once.

Recent sessions

History is session-only: nothing is stored, and it clears when you close the tab.

Token counts use a byte pair encoding (BPE) approximation, so they are indicative rather than exact.