RAG vs long context: when to retrieve, when to dump
RAG fetches the right slice of your data at query time · long context stuffs everything in · know which problem you actually have.
::TL;DR · the whole lesson in three lines
- MOVE · RAG fetches the right slice of your data at query time · long context stuffs everything in · know which problem you actually have.
- DRILL · You will take one real corpus of yours and decide honestly whether it wants RAG, long context, or hybrid · then prove the choice with a small test.
- WIN · You can estimate the token size of one of your real corpora.
::concept · what's actually happening
RAG (Retrieval-Augmented Generation) is a pattern, not a product · at query time, you search your data for relevant chunks, then put those chunks plus the user question into the prompt. The model answers using both the retrieved context and its trained knowledge.
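A minimal sketch of that pattern, assuming a toy word-overlap scorer in place of a real search index or embedding model. The names `tokenize`, `retrieve`, and `build_prompt` and the sample chunks are hypothetical, chosen only to show the shape: search first, then assemble the prompt.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set. A stand-in for a real index or embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Put the retrieved chunks plus the user question into one prompt."""
    context = "\n---\n".join(retrieve(question, chunks, k))
    return (
        "Answer using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical corpus, already split into chunks.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2 percent monthly fee.",
]
prompt = build_prompt("When are invoices due?", chunks, k=2)
```

The string in `prompt` is what you would send to the model. Only the two invoice chunks travel with the question, and the office-hours chunk stays behind.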
Long context is the opposite strategy · stuff all the relevant material directly into the prompt and let the model find what it needs. Modern frontier models support 200K, 1M, even 2M tokens of context · which means many problems that needed RAG a few years ago, when windows held only a few thousand tokens, no longer do.
The honest decision tree: if your data fits comfortably in the context window, with room to spare for the question and the answer, just dump it · long context is simpler, usually more accurate because nothing relevant gets left out, and avoids the retrieval-quality bottleneck. If your data is much larger than the window or you only need a small slice at a time, RAG earns its complexity.
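The size half of that decision is plain arithmetic. Here is a rough sketch, assuming that "comfortably" means the corpus uses at most half the window. The `headroom` value and the function name `choose_strategy` are assumptions for illustration, not a standard.

```python
def choose_strategy(
    corpus_tokens: int,
    window_tokens: int = 200_000,
    headroom: float = 0.5,  # assumption: corpus may use at most half the window
) -> str:
    """Return 'long-context' when the corpus fits with room to spare, else 'rag'."""
    budget = int(window_tokens * headroom)
    if corpus_tokens <= budget:
        return "long-context"
    # Does not fit comfortably: retrieval (or a hybrid) earns its complexity.
    return "rag"


# Worked example: 400 journal entries at roughly 500 tokens each.
journal_tokens = 400 * 500  # 200,000 tokens

verdict_200k = choose_strategy(journal_tokens, window_tokens=200_000)
verdict_1m = choose_strategy(journal_tokens, window_tokens=1_000_000)
```

The same corpus gets a different verdict depending on the window you have access to, so measure before you build. The sketch covers only size. Whether you need the whole corpus or a small slice at query time is a judgment call it does not model.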
Retrieval quality is the silent killer of RAG systems · if your retrieval grabs the wrong chunks, the model answers confidently from wrong context and you cannot tell from the output. Most 'RAG hallucinations' are actually retrieval failures dressed up as model failures.
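One way to make retrieval failures visible is to grade the retriever on its own, before any model sees the chunks. The sketch below assumes a toy word-overlap retriever and a tiny hand-labelled set of questions. The names `top_k` and `hit_rate` and the sample data are hypothetical.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k(question: str, chunks: list[str], k: int) -> list[int]:
    """Indices of the k chunks with the most word overlap (toy retriever)."""
    q = words(question)
    order = sorted(
        range(len(chunks)),
        key=lambda i: len(q & words(chunks[i])),
        reverse=True,
    )
    return order[:k]


def hit_rate(labelled: list[tuple[str, int]], chunks: list[str], k: int = 1) -> float:
    """Fraction of questions whose expected chunk appears in the top k."""
    hits = sum(1 for question, expected in labelled if expected in top_k(question, chunks, k))
    return hits / len(labelled)


chunks = [
    "Refunds are issued within 14 days of a return.",
    "Shipping takes 3 to 5 business days.",
    "Gift cards cannot be refunded or exchanged.",
]
# Each pair is (question, index of the chunk that should be retrieved).
labelled = [
    ("How long does shipping take?", 1),
    ("Can I get my money back for a gift card?", 2),
    ("When are refunds issued?", 0),
]

rate_at_1 = hit_rate(labelled, chunks, k=1)
rate_at_2 = hit_rate(labelled, chunks, k=2)
```

On this toy data the gift-card question misses at `k=1`. The refund chunk ties with the gift-card chunk on a single shared word and wins the tie by position, so the model would answer from the wrong chunk and nothing in its output would tell you. A handful of labelled questions like these is usually enough to catch that class of failure early.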
Hybrid approaches win in production · use RAG to narrow to the right neighborhood, then put that neighborhood (5-50K tokens) into long context and let the model synthesize. A pure version of either approach is rarely the answer at scale.
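The "neighborhood" step can be sketched as filling a token budget with retrieved chunks in rank order. This assumes the chunks arrive already ranked best-first by some retriever, and it estimates tokens at about four characters each, which is a rough heuristic for English text. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (assumption, English text)."""
    return max(1, len(text) // 4)


def pack_neighborhood(ranked_chunks: list[str], budget_tokens: int = 50_000) -> list[str]:
    """Take chunks in rank order until the next one would exceed the budget."""
    packed: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break
        packed.append(chunk)
        used += cost
    return packed


# Three hypothetical chunks of about 100 tokens each, best match first.
ranked = ["a" * 400, "b" * 400, "c" * 400]
neighborhood = pack_neighborhood(ranked, budget_tokens=250)
```

With a 250-token budget only the first two chunks fit. In a real system the budget would sit somewhere in the 5-50K range, and the packed chunks would go into the prompt together for the model to synthesize.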
::drill · do the thing
You will take one real corpus of yours and decide honestly whether it wants RAG, long context, or hybrid · then prove the choice with a small test.
::L36 drill · copy-paste into any AI chat
I am deciding between RAG and long context for this real corpus of mine: [DESCRIBE · e.g. '400 of my journal entries,' '12 PDFs of legal docs,' 'all my Slack messages from one channel for the year']. Help me decide: 1) what is the approximate total token size of this corpus (rule-of-thumb me an estimate), 2) does it fit in a 200K context window? a 1M? 3) at query time, do I typically need the whole thing or a small slice? 4) what is the worst-case if retrieval grabs the wrong chunk · would I notice? Based on those answers, give me a verdict: pure long-context, pure RAG, or hybrid · and the simplest possible first implementation for the winning strategy.
::steps
- 01 · Pick one real corpus you have on disk or in a tool.
- 02 · Estimate or measure its token count (1 page ≈ 500 tokens roughly).
- 03 · Run the prompt and get a verdict.
- 04 · If long-context wins, build a simple prompt that dumps it all in and asks one question.
- 05 · If RAG wins, sketch the retrieval step (even pseudocode is fine).
- 06 · Run one real query end-to-end and grade the answer.
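Step 02 can be done in a few lines if the corpus is plain text on disk. This is a rough sketch, assuming about four characters per token for English text and the 500-tokens-per-page rule of thumb from the steps above. The folder path and function names are hypothetical, and a real tokenizer will give different numbers.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4    # assumption: rough average for English text
TOKENS_PER_PAGE = 500  # rule of thumb from step 02


def tokens_from_text(text: str) -> int:
    """Estimate tokens from character count."""
    return len(text) // CHARS_PER_TOKEN


def tokens_from_pages(pages: int) -> int:
    """Estimate tokens when you only know the page count (e.g. PDFs)."""
    return pages * TOKENS_PER_PAGE


def tokens_in_folder(folder: str, pattern: str = "*.txt") -> int:
    """Sum the estimate over every matching file under a folder."""
    total = 0
    for path in Path(folder).rglob(pattern):
        if path.is_file():
            total += tokens_from_text(path.read_text(encoding="utf-8", errors="ignore"))
    return total


# Example: 12 PDFs of about 40 pages each (hypothetical numbers).
legal_docs_tokens = tokens_from_pages(12 * 40)
```

For files on disk, `tokens_in_folder("path/to/your/corpus")` gives a first number to compare against a 200K or 1M window. Treat it as an order-of-magnitude estimate, not a measurement.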
::outcome · what should be true
- You can estimate the token size of one of your real corpora.
- You have a defensible verdict (RAG / long-context / hybrid) for that corpus.
- You ran one real query against the winning approach.
- You can name the retrieval-failure mode that would bite you.
::trap · the most common failure
Operators reach for RAG because it sounds advanced, then build a vector database for a corpus that would have fit in a single Claude prompt. Long context is usually the simpler right answer · use RAG when the data genuinely will not fit, not because it sounds sophisticated.
::end of the curriculum
You're at Pilot level. There's no Level 6.
The next move is doing the work, not another lesson. If you want operator-grade infrastructure, that's /orangebox. If you want the lab's working journal, /founders-view. If you want to collaborate on the curriculum itself, the source is public on GitHub.
::part of the AtomEons /learn curriculum · 45 lessons · 5 levels · cc-by 4.0