Fine dark grains falling through a matte-black hourglass — tokens.

AtomEons / Learn / calc / tools / token-counter

::calculator · Fast token estimates across Claude, GPT, and Gemini — char-rule approximation, not exact tokenization

Token Counter

A token is the atomic unit of billing and context for every modern LLM. If you do not know how many tokens are in your input, you do not know what your prompt costs, you do not know how much context you have left, and you cannot honestly forecast spend across a fleet of agents. This calculator gives you a fast, deterministic, provider-by-provider estimate from raw character count — the back-of-envelope number you use before you reach for the real tokenizer. The math is the classic chars/4 rule, with small per-family corrections. As of June 2026, the working approximation in production code at most labs is: Claude tokens land near chars/4 on English prose, GPT (cl100k_base / o200k_base) runs about 5% leaner because of more aggressive BPE merges on common English n-grams, and Gemini's SentencePiece vocabulary runs about 5% heavier because it shards uncommon words more eagerly. All three converge on roughly 0.75 tokens per English word, which is the second sanity-check most operators carry in their head. What this is good for: pre-flight cost checks, ballpark context-window math, comparing prompt sizes before you commit to a provider, sizing system prompts during prompt engineering. Sub-10% error on English prose, sub-20% on code, sub-30% on heavy Unicode (CJK, emoji, math). What this is NOT good for: billing reconciliation, hard context-window limits, cross-provider benchmarking, anything where the exact tokenizer matters. For any of those, paste your text into the actual tokenizer — `tiktoken` for OpenAI, the Anthropic SDK's `count_tokens` for Claude, the Gemini API's `countTokens` endpoint for Google. The cost estimates use June 2026 published list pricing: Claude 3.5 Sonnet at $3/M input, GPT-4o at $2.50/M input, Gemini 1.5 Pro at $1.25/M input (tier-1 prompts under 128K). Output tokens cost 4-6× more and are not estimated here because output length is a property of the response, not your prompt. The honest move is to treat these numbers as a fast estimate, ship the prompt, and reconcile against the real provider invoice at the end of the billing cycle.

::inputs

Character countchars

Total characters in your text (whitespace included). Use Ctrl+A then a char counter.

Word countwords

Whitespace-separated word count. Used as a sanity-check cross-tally.

Content type

Adjusts the per-provider correction factor. Code and heavy-Unicode text tokenize differently.

::result

Claude tokens (est.)

1,000 tokens

GPT tokens (est.)

950 tokens

Gemini tokens (est.)

1,050 tokens

::how this calculates

Starts from raw character count and applies the chars/4 rule to get a baseline token estimate. Then applies a per-provider correction: Claude stays at the baseline (factor 1.00), GPT runs about 5% leaner due to denser BPE merges on common English (factor 0.95), and Gemini runs about 5% heavier due to SentencePiece sharding (factor 1.05). All three numbers are rounded up because you pay for partial tokens.

::worked examples

Short prompt — single paragraph user query

chars: 400words: 70contentType: english_prose

A typical 70-word user prompt lands near 100 tokens on all three providers. Cost on Claude 3.5 Sonnet input: ~$0.0003. Negligible per-call, but multiply by 10K daily calls and you are at $3/day in input alone.

Medium document — 4K char system prompt

chars: 4000words: 700contentType: english_prose

A 700-word system prompt is the sweet spot for most agent setups. 1,000 tokens on Claude, 950 on GPT, 1,050 on Gemini. At $3/M input on Claude, this is $0.003 per call. With 100K calls/month you spend $300 just shipping the system prompt — which is why prompt caching matters.

Long context — 50K char research document

chars: 50000words: 8500contentType: english_prose

Pasting a research paper into context. ~12,500 tokens on Claude. Still well inside the 200K Claude 3.5 Sonnet window, but you are now paying $0.0375 per call at $3/M. Run it 1000 times and that is $37.50 in input alone — output will be 4-6× more expensive.

Code dump — 10K char file

chars: 10000words: 1200contentType: code

Code tokenizes denser than prose because identifiers and punctuation shard more aggressively. Real tokenizer counts will run ~10-15% higher than this estimate for code. Treat 2,500 tokens here as a floor; the actual count is closer to 2,800.

::what this does NOT capture

○The chars/4 rule is a population average for English prose. Individual prompts vary by ±15% even within English. Run the real tokenizer for anything that ships to billing.
○Per-provider correction factors (1.00 / 0.95 / 1.05) are coarse. They are derived from public tokenizer comparisons on English prose corpora. They do not capture vocabulary updates Anthropic, OpenAI, or Google ship between model releases.
○Code, JSON, and structured markup tokenize 10-20% denser than this estimate predicts. The content-type selector is shown but does not currently change the math — it is a marker for the operator to mentally adjust.
○Heavy Unicode (CJK, emoji, math symbols, agglutinative languages) tokenizes 30-100% denser. A 1,000-char Chinese document can hit 500-700 tokens, not 250. Do not use this calculator for non-Latin scripts.
○Output tokens are not estimated. Output length is a property of the response, not the prompt, and output pricing runs 4-6× input pricing on all three providers. Forecast output separately.
○Prompt caching (Anthropic, Google) and batch API discounts (OpenAI, Anthropic) can cut input costs by 50-90%. Cost estimates here are list-price uncached single-call. Real production cost is usually lower.
○June 2026 pricing snapshot: Claude 3.5 Sonnet $3/M input, GPT-4o $2.50/M input, Gemini 1.5 Pro $1.25/M input under 128K. Pricing changes; verify against the provider's current rate card before forecasting spend.
○Tokens are rounded up because providers bill on the ceiling. A 0.3-token fragment costs 1 token.

← back to /learn·/tools index →