L33 · Operator · ~30 min · free · CC BY 4.0

Local models with Ollama

Run Llama, Qwen, or Mistral on your own laptop · no API, no logs, no monthly bill for the work that should stay home.

::TL;DR · the whole lesson in three lines

MOVE · Run Llama, Qwen, or Mistral on your own laptop · no API, no logs, no monthly bill for the work that should stay home.
DRILL · You will install Ollama, pull one model, and run it on a real privacy-sensitive task you would otherwise have sent to the cloud.
WIN · Ollama is running on your machine and a model file is downloaded.

::concept · what's actually happening

Ollama is a runtime that lets you download an open-weight model and run it locally with one command. You get a chat or API surface on localhost, your data never leaves the machine, and the model file lives in a folder you own. That is the whole pitch.
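To make the "API surface on localhost" concrete, here is a minimal sketch that posts a prompt to a local Ollama server using only the Python standard library. It assumes the server is listening on Ollama's default port (11434) and uses the non-streaming `/api/generate` endpoint. The helper names `build_generate_request` and `ask_local` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if you changed the port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    # With "stream": False the server replies with a single JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

Calling `ask_local("llama3.2:3b", "your prompt")` should return the model's reply as a string once a model is pulled. The tag `llama3.2:3b` is only a placeholder; use whichever model you actually downloaded. The request goes to localhost and nowhere else, which is the point of the exercise.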

Local models trade capability for sovereignty · a 7B-parameter model running on your laptop will not match GPT-5 or Claude on hard tasks. It will, however, match them on the simple, repetitive 80% of work, where the cost of round-tripping to a cloud is the actual bottleneck.

The hardware math matters: a 7B model needs roughly 5-8GB of RAM, a 13B needs 12-16GB, a 70B needs 40GB+ or aggressive quantization. Apple Silicon with unified memory punches above its weight here. Modest hardware is fine for modest models.
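The hardware math can be sketched as simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by eight. The snippet below is a rough estimate under stated assumptions, not a measurement. The 1.2 overhead factor (for context cache and runtime buffers) is an illustrative guess, and real usage varies with context length and quantization format.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (taking 1 GB as 1e9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def estimated_ram_gb(
    params_billion: float, bits_per_weight: int, overhead: float = 1.2
) -> float:
    """Weights plus an assumed overhead factor; 1.2 is a guess, not a spec."""
    return weight_memory_gb(params_billion, bits_per_weight) * overhead


# Print a small table for the model sizes discussed above.
for size in (7, 13, 70):
    for bits in (4, 8):
        estimate = estimated_ram_gb(size, bits)
        print(f"{size}B at {bits}-bit: about {estimate:.1f} GB")
```

A 7B model at 4 bits per weight works out to 3.5 GB of weights before overhead, and a 70B model at 4 bits to 35 GB, which is why the larger sizes need either a lot of memory or aggressive quantization.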

Privacy-critical work has an obvious home here · medical questions, legal drafts, NDA-covered code, journal entries, anything you would not want logged by a cloud provider. The legal blast radius of 'my therapist's chat became training data' is too high to justify sending sensitive content to hosted models.

The honest limitation: open-weight models in 2026 are still behind frontier closed-weight models on hard reasoning, long-context coherence, and agentic tool use. The gap has narrowed but it has not closed. Use local for what local is good at.

::drill · do the thing

You will install Ollama, pull one model, and run it on a real privacy-sensitive task you would otherwise have sent to the cloud.

::L33 drill · copy-paste into any AI chat

Walk me through installing Ollama on [YOUR OS · macOS / Windows / Linux] and pulling one mid-tier model suitable for my hardware: I have [RAM AMOUNT] of RAM and [APPLE SILICON / NVIDIA GPU / CPU ONLY]. Recommend one specific model name to start with (pin the version), give me the exact pull command, and the exact run command to start a chat. Then give me one privacy-sensitive prompt I should try first to feel the difference · something I would not want logged to a cloud API. Skip the marketing about open source · just the install steps and the first real use.

::steps

01 · Run the prompt to get install steps tailored to your machine.
02 · Install Ollama and pull the recommended model (this takes 5-15 min on broadband).
03 · Run the model with `ollama run <model>` and confirm you get a prompt.
04 · Try the suggested privacy-sensitive task · feel the latency and quality.
05 · Compare the same prompt to a cloud model to calibrate the gap.
06 · Decide which of your recurring tasks will move local.
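Steps 02 to 04 can also be scripted. The sketch below wraps the `ollama pull` and `ollama run` commands with Python's `subprocess`, assuming the `ollama` CLI is already installed and on your PATH. The model tag `llama3.2:3b` is a placeholder, and the helper names are hypothetical; substitute the model your chat recommended.

```python
import shutil
import subprocess
from typing import List

MODEL = "llama3.2:3b"  # placeholder tag; swap in the recommended model


def ollama_command(*args: str) -> List[str]:
    """Build an argument list for the ollama CLI, e.g. ["ollama", "pull", ...]."""
    return ["ollama", *args]


def run_first_task(model: str, prompt: str) -> str:
    """Pull the model if needed, run one prompt, and return the reply text."""
    if shutil.which("ollama") is None:
        raise RuntimeError("ollama not found on PATH; install it first")
    # Step 02: download the model (returns quickly if it is already present).
    subprocess.run(ollama_command("pull", model), check=True)
    # Steps 03-04: passing the prompt as an argument runs it once and exits
    # instead of opening the interactive chat.
    result = subprocess.run(
        ollama_command("run", model, prompt),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `run_first_task(MODEL, "your privacy-sensitive prompt")` gives you the reply as a string, which makes it easy to paste next to a cloud model's answer for step 05.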

::outcome · what should be true

Ollama is running on your machine and a model file is downloaded.
You have completed one real task end-to-end with the local model.
You can articulate the capability gap versus the cloud honestly.
You have a written list of which tasks you will route local going forward.
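The first outcome can be checked programmatically. This is a minimal sketch, assuming the server is on Ollama's default port and that the `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: default local address for the Ollama server.
TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_body: dict) -> list:
    """Extract model names from a parsed /api/tags response body."""
    return [entry["name"] for entry in tags_body.get("models", [])]


def local_models(url: str = TAGS_URL, timeout: float = 5.0):
    """Return downloaded model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            return model_names(json.loads(reply.read().decode("utf-8")))
    except OSError:
        # urllib.error.URLError and socket timeouts are both OSError subclasses.
        return None
```

If `local_models()` returns `None`, the server is not running; an empty list means it is running but no model has been pulled yet; a non-empty list means the first outcome holds.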

::trap · the most common failure

Operators install Ollama, run 'hello world,' get bored, and never use it again. The win only shows up when you route a real task through it · usually a privacy-critical one. Without that first real use, local stays a demo.

::end of the curriculum

You're at Pilot level. There's no Level 6.

The next move is doing the work, not another lesson. If you want operator-grade infrastructure, that's /orangebox. If you want the lab's working journal, /founders-view. If you want to collaborate on the curriculum itself, the source is public on GitHub.
