
::calculator · Fast token estimates across Claude, GPT, and Gemini — char-rule approximation, not exact tokenization
Token Counter
::inputs
Total characters in your text (whitespace included). Use Ctrl+A then a char counter.
Whitespace-separated word count. Used as a sanity-check cross-tally.
Adjusts the per-provider correction factor. Code and heavy-Unicode text tokenize differently.
::result
Claude tokens (est.)
1,000 tokens
GPT tokens (est.)
950 tokens
Gemini tokens (est.)
1,050 tokens
::how this calculates
Starts from raw character count and applies the chars/4 rule to get a baseline token estimate. Then applies a per-provider correction: Claude stays at the baseline (factor 1.00), GPT runs about 5% leaner due to denser BPE merges on common English (factor 0.95), and Gemini runs about 5% heavier due to SentencePiece sharding (factor 1.05). All three numbers are rounded up because you pay for partial tokens.
::worked examples
Short prompt — single paragraph user query
A typical 70-word user prompt lands near 100 tokens on all three providers. Cost on Claude 3.5 Sonnet input: ~$0.0003. Negligible per-call, but multiply by 10K daily calls and you are at $3/day in input alone.
Medium document — 4K char system prompt
A 700-word system prompt is the sweet spot for most agent setups. 1,000 tokens on Claude, 950 on GPT, 1,050 on Gemini. At $3/M input on Claude, this is $0.003 per call. With 100K calls/month you spend $300 just shipping the system prompt — which is why prompt caching matters.
Long context — 50K char research document
Pasting a research paper into context. ~12,500 tokens on Claude. Still well inside the 200K Claude 3.5 Sonnet window, but you are now paying $0.0375 per call at $3/M. Run it 1000 times and that is $37.50 in input alone — output will be 4-6× more expensive.
Code dump — 10K char file
Code tokenizes denser than prose because identifiers and punctuation shard more aggressively. Real tokenizer counts will run ~10-15% higher than this estimate for code. Treat 2,500 tokens here as a floor; the actual count is closer to 2,800.
::what this does NOT capture
- ○The chars/4 rule is a population average for English prose. Individual prompts vary by ±15% even within English. Run the real tokenizer for anything that ships to billing.
- ○Per-provider correction factors (1.00 / 0.95 / 1.05) are coarse. They are derived from public tokenizer comparisons on English prose corpora. They do not capture vocabulary updates Anthropic, OpenAI, or Google ship between model releases.
- ○Code, JSON, and structured markup tokenize 10-20% denser than this estimate predicts. The content-type selector is shown but does not currently change the math — it is a marker for the operator to mentally adjust.
- ○Heavy Unicode (CJK, emoji, math symbols, agglutinative languages) tokenizes 30-100% denser. A 1,000-char Chinese document can hit 500-700 tokens, not 250. Do not use this calculator for non-Latin scripts.
- ○Output tokens are not estimated. Output length is a property of the response, not the prompt, and output pricing runs 4-6× input pricing on all three providers. Forecast output separately.
- ○Prompt caching (Anthropic, Google) and batch API discounts (OpenAI, Anthropic) can cut input costs by 50-90%. Cost estimates here are list-price uncached single-call. Real production cost is usually lower.
- ○June 2026 pricing snapshot: Claude 3.5 Sonnet $3/M input, GPT-4o $2.50/M input, Gemini 1.5 Pro $1.25/M input under 128K. Pricing changes; verify against the provider's current rate card before forecasting spend.
- ○Tokens are rounded up because providers bill on the ceiling. A 0.3-token fragment costs 1 token.