L41 · Pilot · ~25 min · free · cc-by 4.0

Long-context strategy: when 200K is right, when chunking wins

Long context is a tool, not a default · know what degrades, what costs you, and when chunking beats stuffing.

::TL;DR · the whole lesson in three lines

MOVE · Long context is a tool, not a default · know what degrades, what costs you, and when chunking beats stuffing.
DRILL · You will take one real long document, run the same question against three strategies, and learn which one your real work wants.
WIN · You have real cost math for one of your long-context workflows.


::concept · what's actually happening

Modern frontier models support context windows from 200K to 2M+ tokens · enough to fit entire codebases, whole books, or hundreds of meeting transcripts in a single prompt. The capability is real. The performance ceiling at the edge of the window is also real.


Performance does not stay flat as context grows · as a rough rule of thumb, models recall material in the first ~20% and the last ~20% of a long context most reliably, with a measurable 'lost in the middle' degradation through the middle band. The size of the drop varies by model and task, but a fact buried at token 80K of a 200K prompt is generally harder to retrieve than the same fact at token 5K.
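One way to check this on your own model is a needle-in-a-haystack probe: plant one known fact at different depths of a long filler text and ask for it back. The sketch below only builds the probe texts; the function name, filler, and needle are hypothetical, and sending each probe to a model is left out.

```python
def build_probe(filler_paragraphs, needle, depth):
    """Insert `needle` at a fractional depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler_paragraphs))
    paragraphs = filler_paragraphs[:index] + [needle] + filler_paragraphs[index:]
    return "\n\n".join(paragraphs)

# Hypothetical filler and needle; real probes would use far longer filler.
filler = [f"Filler paragraph {i}." for i in range(10)]
needle = "The vault code is 4172."

# One probe per depth: start, middle, end.
probes = {depth: build_probe(filler, needle, depth) for depth in (0.0, 0.5, 1.0)}
```

Ask the same question ("What is the vault code?") against each probe. If the middle probe fails more often than the edge probes, you are seeing the effect directly.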

Cost is the other tax · input is billed per token, so a 200K-token prompt costs about 100 times what a 2K-token prompt does, two orders of magnitude. If you are sending the same long context on every turn of an interactive session, you are billing yourself for inertia.
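The arithmetic is short enough to sketch. The price below is an assumed placeholder, not a quoted rate, and the function names are illustrative; swap in your provider's real input price.

```python
PRICE_PER_MILLION = 3.00  # assumed USD per 1M input tokens; check your provider

def input_cost(tokens, price_per_million=PRICE_PER_MILLION):
    """Input-side cost in USD for one prompt of `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million

def session_cost(context_tokens, turns, price_per_million=PRICE_PER_MILLION):
    """Cost of resending the same context on every turn.

    Ignores output tokens and the growing chat history.
    """
    return turns * input_cost(context_tokens, price_per_million)

small = input_cost(2_000)
large = input_cost(200_000)
print(f"2K prompt:   ${small:.4f}")
print(f"200K prompt: ${large:.4f}")
print(f"ratio:       {large / small:.0f}x")
print(f"20-turn session resending 200K: ${session_cost(200_000, 20):.2f}")
```

The ratio does not depend on the price you plug in; only the absolute dollar figures do.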

Chunking + retrieval beats raw long-context when the question can be answered from a slice you can identify cheaply · split the source into chunks, find the relevant 5-20K tokens, send only those. The retrieval step adds latency but slashes cost and improves accuracy on middle-of-document facts.
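A minimal sketch of the split-then-select idea follows. It scores chunks by plain keyword overlap with the question, which is a stand-in for whatever retrieval you actually use (embedding search is the common choice). All function names and the sample document are hypothetical.

```python
import re

def chunk_by_paragraph(text, max_words=200):
    """Group consecutive paragraphs into chunks of at most ~max_words words."""
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = len(para.split())
        # Close the current chunk before it would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def tokenize(text):
    """Lowercased set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_chunks(chunks, question, k=3):
    """Rank chunks by number of words shared with the question; keep the best k."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

document = "\n\n".join([
    "Section 1. The agreement begins on the first of January.",
    "Section 2. Either party may terminate with thirty days written notice.",
    "Section 3. Payment is due within fifteen days of each invoice.",
])
chunks = chunk_by_paragraph(document, max_words=12)
best = top_chunks(chunks, "How much notice is required to terminate?", k=1)
print(best[0])
```

Only `best` goes into the prompt alongside the question. Keyword overlap breaks down when the question and the source use different vocabulary, which is the usual reason to move to embeddings.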

Caching is the structural answer when you genuinely need to ask many questions against the same long context · Anthropic's prompt caching (and similar features elsewhere) lets you pay roughly full price once, plus a small cache-write premium on some providers, and ~10% on each reuse that lands within the cache lifetime. If you are not using it on stable long contexts, you are paying full price for tokens the model has already seen.
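To see where caching pays off, compare the two totals. The multipliers below are assumptions to verify against your provider's current pricing: a 1.25x premium on the first (cache-write) request and the ~10% read rate mentioned above. The sketch also assumes every later query hits the cache before it expires.

```python
def uncached_cost(context_tokens, queries, price_per_million):
    """Total input cost when the full context is resent at full price each time."""
    return queries * context_tokens / 1_000_000 * price_per_million

def cached_cost(context_tokens, queries, price_per_million,
                write_multiplier=1.25, read_multiplier=0.10):
    """Total input cost when query 1 writes the cache and the rest read it.

    Both multipliers are assumed values, not quoted rates.
    """
    if queries <= 0:
        return 0.0
    base = context_tokens / 1_000_000 * price_per_million
    return base * write_multiplier + (queries - 1) * base * read_multiplier

price = 3.00  # assumed USD per 1M input tokens
for n in (1, 2, 10):
    plain = uncached_cost(200_000, n, price)
    cached = cached_cost(200_000, n, price)
    print(f"{n:>2} queries: uncached ${plain:.2f} vs cached ${cached:.2f}")
```

Under these assumed multipliers a single query is slightly more expensive with caching, and caching is already cheaper from the second query onward.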

::drill · do the thing

You will take one real long document, run the same question against three strategies, and learn which one your real work wants.

::L41 drill · copy-paste into any AI chat

I have a long document I work with: [DESCRIBE · e.g. 'a 60-page contract,' 'a 200-page technical manual,' 'six months of journal entries']. Estimated total tokens: [ESTIMATE]. I have a recurring question I ask against it: [DESCRIBE THE QUESTION]. Walk me through three strategies, with real cost math: Strategy A · dump the whole document into a single long-context prompt with the question. Strategy B · chunk the document by section, retrieve the 3 most relevant chunks, send those plus the question. Strategy C · use prompt caching on the full document and ask the question against the cached version (assume I will ask 10+ similar questions). For each strategy, give me: estimated input tokens per query, estimated cost per query, expected accuracy tradeoffs, and a verdict on which strategy I should use given my query pattern.


::steps

01 · Pick one real long document you query repeatedly.
02 · Estimate its token count (1 page ≈ 500 tokens).
03 · Run the prompt and get cost math for each strategy.
04 · Pick the winning strategy and run one real query.
05 · If caching wins, look up whether your AI client actually supports it (Claude API does).
06 · Compute your real cost-per-query and decide if it's worth optimizing further.
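If you want to sanity-check the numbers the chat gives you, the steps above can be sketched as a small calculator. Everything here is an assumption to replace with your own figures: the 500 tokens per page estimate from step 02, the input price, the 2,000-token chunk size, and the cache multipliers. It counts input tokens only and ignores the question itself and the output.

```python
def estimate_tokens(pages, tokens_per_page=500):
    """Rough token count from a page count (step 02's rule of thumb)."""
    return pages * tokens_per_page

def compare_strategies(doc_tokens, queries, price_per_million,
                       chunk_tokens=2_000, chunks_sent=3,
                       write_multiplier=1.25, read_multiplier=0.10):
    """Average input cost per query in USD for strategies A, B, and C."""
    base = doc_tokens / 1_000_000 * price_per_million
    retrieved = chunks_sent * chunk_tokens / 1_000_000 * price_per_million
    cached_total = base * write_multiplier + (queries - 1) * base * read_multiplier
    return {
        "A_full_context": base,
        "B_chunk_and_retrieve": retrieved,
        "C_cached_full_context": cached_total / queries,
    }

doc_tokens = estimate_tokens(200)  # e.g. a 200-page technical manual
costs = compare_strategies(doc_tokens, queries=10, price_per_million=3.00)
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.4f} per query")
cheapest = min(costs, key=costs.get)
```

Cheapest is not the same as best: Strategy B only wins if retrieval reliably finds the right chunks, so weigh the cost figure against the accuracy you saw in step 04.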

::outcome · what should be true

You have real cost math for one of your long-context workflows.
You picked a strategy and ran one real query under it.
You can name the 'lost in the middle' phenomenon and what it implies.
You know whether prompt caching is available to you and how to use it.

::trap · the most common failure

Operators stuff entire codebases into every prompt because 'the model can handle it,' then wonder why their bill tripled and answers got vaguer. Long context is a load-bearing tool · use it when needed, cache it when reused, chunk it when slicing is cheaper.

::end of the curriculum

You're at Pilot level. There's no Level 6.

The next move is doing the work, not another lesson. If you want operator-grade infrastructure, that's /orangebox. If you want the lab's working journal, /founders-view. If you want to collaborate on the curriculum itself, the source is public on GitHub.
