Agentic AI · the atlas
The most overused word in AI in 2024-2026, and the most undertheorized. This page is the anti-hype version: foundational papers, the workflow-vs-agent distinction that finally clarifies the space, the commercial platforms actually shipping, the failure modes nobody markets, and what to watch in the next 12-24 months.
Public sources only. Vendor blogs, arXiv papers, conference talks, reputable journalism.
Foundational papers + posts
2022-10
ReAct: Synergizing Reasoning and Acting in Language Models
Yao, Zhao, et al. (Princeton + Google)
The foundational paper for modern LLM agents. Established the Thought → Action → Observation loop.
2023-02
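Toolformer: Language Models Can Teach Themselves to Use Tools
Schick et al. (Meta AI)
Showed that LLMs can teach themselves to call APIs (calculator, search, translation) through self-supervised fine-tuning bootstrapped from a handful of demonstrations.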
2023-04
Generative Agents: Interactive Simulacra of Human Behavior
Park et al. (Stanford + Google)
The 'Smallville' paper: 25 LLM-driven agents living in a simulated town. Established the agent-as-character research direction.
2023-06
Voyager: An Open-Ended Embodied Agent with Large Language Models
Wang et al. (NVIDIA)
A Minecraft agent that autonomously builds a library of reusable code skills. Influential on agent-with-memory architectures.
2023-08
Auto-GPT
Toran Bruce Richards (Significant Gravitas)
Not a peer-reviewed paper, but the first viral consumer-facing agent. Spawned the agent craze of 2023-2024.
2024-03
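SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Jimenez, Yang, et al. (Princeton + UChicago)
Established the canonical agent benchmark for software engineering: resolving real GitHub issues in Python repositories. SWE-bench Verified, a human-validated 500-task subset, is the active leaderboard.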
2024-10
Building effective agents
Erik Schluntz + Barry Zhang (Anthropic)
Hugely influential practitioner write-up distinguishing 'workflows' (predefined orchestration) from 'agents' (LLM-directed loops). Required reading.
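Here is a minimal sketch of the Thought → Action → Observation loop that ReAct established. It is not the paper's code. `scripted_model` stands in for an LLM, `lookup` is a toy tool, and the `Action: tool[arg]` / `Finish: answer` reply format is an assumption loosely modeled on the paper's examples. A real agent would call a model and parse its reply.

```python
"""Minimal ReAct-style loop: Thought -> Action -> Observation, repeated.

The scripted model, the tool catalog, and the reply format are hypothetical.
"""
import re
from typing import Callable

# Hypothetical tool catalog: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "no result"),
}

# Assumed reply format: "Thought: ...\nAction: tool[arg]" or "Thought: ...\nFinish: answer"
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")
FINISH_RE = re.compile(r"Finish: (.*)")


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: looks something up once, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: The observation answers the question.\nFinish: {observation}"


def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = model(transcript)          # Thought (+ Action or Finish)
        transcript += "\n" + reply
        if m := FINISH_RE.search(reply):
            return m.group(1)
        m = ACTION_RE.search(reply)
        if m is None:
            observation = "error: could not parse an action"
        else:
            tool, arg = m.group(1), m.group(2)
            fn = TOOLS.get(tool)
            observation = fn(arg) if fn else f"error: unknown tool {tool!r}"
        # Append the Observation so the next Thought can condition on it.
        transcript += f"\nObservation: {observation}"
    raise RuntimeError("step budget exhausted")


print(react("What is the capital of France?", scripted_model))  # Paris
```

The `max_steps` budget matters because the loop has no other guaranteed exit. This is the infinite-loop failure mode listed under Failure modes.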
What's actually shipping
Claude Code
Anthropic's terminal coding agent. Reads codebases, edits files, runs tests, and follows multi-step instructions. It is the agent the site's operator uses to build atomeons.com itself, and it powers many of the workflows referenced across this site.
Cursor
A VS Code fork with deeply integrated LLM agents. Its Composer mode (multi-file edits with planning) is a leading agentic coding pattern. Reported valuation of $9B+ as of mid-2025.
Devin (Cognition)
Marketed as 'the first AI software engineer.' Runs long sessions with browser, shell, and editor access. Cognition publicly raised at $4B+ in 2024. A regular on the SWE-bench Verified leaderboard.
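Operator (OpenAI)
A browser agent that drives its own cloud-hosted browser to complete web tasks (booking, shopping, research). Launched in January 2025 as a research preview for ChatGPT Pro users. Powered by OpenAI's Computer-Using Agent (CUA), a multimodal model trained for computer use.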
Claude computer use
A Claude 3.5 Sonnet API capability. The model reads screenshots and issues click and type actions, which lets it drive desktop GUI applications. Released in public beta in October 2024. Foundational for desktop-automation agents.
Manus
A Chinese-origin general agent platform that went viral in early 2025. Combines multi-tool browsing and computation with an asynchronous model: you hand it a task and it completes the work in the background.
Replit Agent
Builds full-stack apps from a prompt inside Replit's IDE, with a strong code-generation pipeline and built-in deployment. Targets non-developers learning to ship.
Salesforce Agentforce
An enterprise CRM agent platform announced in 2024 and priced per conversation. A major enterprise distribution channel for agentic AI.
Architectural patterns
Workflows vs agents. This is the most useful distinction in 2026, popularized by Anthropic's 'Building effective agents' post. A workflow has predefined orchestration: you (the human) write the control flow, and the LLM fills in the slots. An agent is LLM-directed: the model decides at runtime which tool to call, how to interpret the results, and when to stop. Workflows are easier to debug, cheaper, and more reliable. Agents have a higher raw capability ceiling. Most production 'AI agents' are actually workflows with a single LLM-loop step inside.
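The sketch below contrasts the two designs by showing who owns the control flow. It is illustrative only. `fake_llm` and `fake_policy` are hypothetical stand-ins for model calls, and the two string "tools" are toys. In `workflow` the step order is fixed in code. In `agent` a policy picks the next tool, or decides to stop, at runtime.

```python
"""Workflow vs agent: who owns the control flow. All model calls are stand-ins."""
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "reverse": lambda s: s[::-1],
}


def workflow(text: str, llm: Callable[[str], str]) -> str:
    # Workflow: a human wrote these steps and their order; the LLM fills one slot.
    cleaned = text.strip()
    summary = llm(f"Summarize: {cleaned}")
    return TOOLS["upper"](summary)


def agent(text: str, policy: Callable[[str, list[str]], str], max_steps: int = 5) -> str:
    # Agent: the model chooses the next tool (or "stop") at runtime.
    state, history = text, []
    for _ in range(max_steps):
        choice = policy(state, history)
        if choice == "stop":
            return state
        state = TOOLS[choice](state)
        history.append(choice)
    return state  # iteration cap reached; a real system would flag this


def fake_llm(prompt: str) -> str:
    # Hypothetical "summary": the first five characters of the payload.
    return prompt.split(": ", 1)[1][:5]


def fake_policy(state: str, history: list[str]) -> str:
    # Hypothetical policy: reverse once, then stop.
    return "reverse" if not history else "stop"


print(workflow("  hello world ", fake_llm))  # HELLO
print(agent("abc", fake_policy))             # cba
```

Every path through `workflow` can be enumerated from the source code. What `agent` does depends on what the policy returns, which is why it needs the iteration cap.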
Single-loop vs orchestrator-worker. A single-loop agent is one LLM running a tool-use loop, called repeatedly until the task is done. Orchestrator-worker patterns have one 'planner' LLM that dispatches subtasks to other LLM 'workers' (sometimes specialized, sometimes identical). Anthropic's workflow Tool is a deterministic variant of the pattern: a JS script, rather than an LLM, acts as the orchestrator and routes work to multiple parallel subagents.
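The sketch below shows the orchestrator-worker shape in Python. The JS script mentioned above is not reproduced here. `plan` is pluggable: in the LLM-planner variant it would be a model call, and in the deterministic variant it is ordinary code. `split_by_file` and `count_chars` are hypothetical stand-ins for a planner and a sub-agent.

```python
"""Orchestrator-worker sketch: plan -> parallel workers -> synthesize."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def orchestrate(
    task: str,
    plan: Callable[[str], list[str]],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> str:
    subtasks = plan(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with subtasks.
        results = list(pool.map(worker, subtasks))
    # Synthesis step; an LLM orchestrator would usually summarize here instead.
    return "\n".join(f"{s}: {r}" for s, r in zip(subtasks, results))


def split_by_file(task: str) -> list[str]:
    # Deterministic planner (hypothetical): one subtask per comma-separated file name.
    return [f.strip() for f in task.split(",") if f.strip()]


def count_chars(subtask: str) -> str:
    # Stand-in worker; a real worker would be a sub-agent call.
    return str(len(subtask))


print(orchestrate("a.py, bb.py", split_by_file, count_chars))
```

Swapping `split_by_file` for a model call turns this from the deterministic variant into the LLM-planner variant. The worker fan-out and the ordered merge stay the same.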
Tool use vs computer use. Tool use means the LLM has a structured catalog of functions it can call (web search, code interpreter, database query), each with well-typed inputs and outputs. Computer use means the LLM has unstructured GUI access: it sees screenshots, decides where to click, and types into fields. Computer use is more general but much more error-prone. The frontier in 2026 is hybrid systems that use structured tools when available and fall back to computer use only when necessary.
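Here is a minimal sketch of the hybrid pattern, assuming a small in-process catalog. The tool names, the `Tool` record, and `computer_use` (a placeholder for a screenshot/click/type loop) are hypothetical. Real LLM APIs describe tools with their own schema formats, and those formats are not reproduced here.

```python
"""Structured tool catalog with a computer-use fallback (hybrid pattern)."""
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    params: dict[str, type]  # declared, typed inputs
    fn: Callable[..., Any]


CATALOG: dict[str, Tool] = {
    "add": Tool("add", "Add two integers", {"a": int, "b": int}, lambda a, b: a + b),
    "search": Tool("search", "Web search", {"query": str}, lambda query: f"results for {query!r}"),
}


def signature(t: Tool) -> str:
    # How a tool might be rendered into the model's prompt as a typed catalog entry.
    args = ", ".join(f"{k}: {v.__name__}" for k, v in t.params.items())
    return f"{t.name}({args}): {t.description}"


def computer_use(intent: str) -> str:
    # Hypothetical fallback: a real system would hand off to a GUI-driving loop.
    return f"GUI fallback: {intent}"


def act(intent: str, tool_name: Optional[str] = None, **args: Any) -> Any:
    """Prefer a structured tool; fall back to computer use only if none fits."""
    tool = CATALOG.get(tool_name) if tool_name else None
    if tool is None:
        return computer_use(intent)
    return tool.fn(**args)


print(signature(CATALOG["add"]))             # add(a: int, b: int): Add two integers
print(act("add numbers", "add", a=2, b=3))   # 5
print(act("rename a file in a desktop app")) # GUI fallback: rename a file in a desktop app
```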
Context vs memory. The context window is the model's working memory, typically 200k-2M tokens in 2026 frontier models. Memory is persistent state that survives context-window resets: vector databases (Pinecone, Weaviate, Turbopuffer), structured stores (Mem0, Letta, formerly MemGPT), or the filesystem (the simplest and most underused pattern). All non-trivial agents need both.
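The filesystem option can be as small as an append-only notes file that each fresh context window reads from. This is a minimal sketch. The JSON-lines layout, the `FileMemory` class, and the keyword retrieval are assumptions. A vector store would rank notes by embedding similarity instead of substring match.

```python
"""Filesystem memory: persistent notes that survive context-window resets."""
import json
import tempfile
from pathlib import Path


class FileMemory:
    def __init__(self, path: Path) -> None:
        self.path = path

    def remember(self, note: str) -> None:
        # Append-only JSON lines: one note per line.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"note": note}) + "\n")

    def recall(self, keyword: str) -> list[str]:
        # Naive case-insensitive keyword retrieval.
        if not self.path.exists():
            return []
        lines = self.path.read_text(encoding="utf-8").splitlines()
        notes = [json.loads(line)["note"] for line in lines if line]
        return [n for n in notes if keyword.lower() in n.lower()]


def build_context(task: str, memory: FileMemory, keyword: str) -> str:
    # Each new context window starts from the task plus recalled notes.
    recalled = "\n".join(f"- {n}" for n in memory.recall(keyword))
    return f"Task: {task}\nRelevant notes:\n{recalled}"


with tempfile.TemporaryDirectory() as d:
    mem = FileMemory(Path(d) / "memory.jsonl")
    mem.remember("Tests run with `make test`, not pytest directly.")
    mem.remember("The user prefers tabs.")
    # ...the context window resets here; the file persists...
    print(build_context("fix failing tests", FileMemory(Path(d) / "memory.jsonl"), "test"))
```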
Determinism vs autonomy. The deepest tradeoff. More deterministic agents (workflows) are easier to verify and cheaper to run. Less deterministic agents (free-roaming LLM loops) can solve more open-ended problems but need more guardrails. Production-quality agent systems lean heavily on determinism for the parts that can be deterministic and reserve LLM autonomy for the steps that genuinely require it.
Failure modes
Infinite loops. LLMs get stuck repeating the same approach without recognizing that it isn't working. A hard cap on loop iterations is required.
Tool-call hallucination. The LLM invents parameters or calls nonexistent functions. Validating every call against a declared schema mitigates this and is the safest production pattern.
Context window collapse. Long-running agents accumulate so much history that the actual task gets lost in the middle. Aggressive summarization of older history is the usual workaround.
Prompt injection from tool output. Web pages, emails, file contents the agent reads can contain attacker-supplied instructions. The agent has no inherent way to distinguish 'system prompt' from 'data.' Massive open problem.
Over-trust by users. Agents producing confident-sounding output trick humans into skipping verification. Worst in domains where the human can't easily check (medical, legal, financial).
Catastrophic action with no rollback. An agent deletes a file, sends an email, or places an order. Constraining the action surface is the practical mitigation.
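Three of the mitigations above can be sketched as a single guarded loop: a hard iteration cap, a repeated-call detector, schema validation before execution, and a human-approval gate on irreversible tools. Everything here is hypothetical. The tool names, the `{arg: type}` schema format, and the `propose`/`execute`/`confirm` callbacks are illustrative and do not belong to any framework's API.

```python
"""Guardrails for runaway loops, hallucinated tool calls, and irreversible actions."""
from typing import Any, Callable, Optional

SCHEMAS: dict[str, dict[str, type]] = {
    "read_file": {"path": str},
    "delete_file": {"path": str},
}
IRREVERSIBLE = {"delete_file"}


class ToolCallError(ValueError):
    pass


def validate(name: str, args: dict[str, Any]) -> None:
    # Reject unknown tools and missing, extra, or mistyped arguments.
    if name not in SCHEMAS:
        raise ToolCallError(f"unknown tool {name!r}")
    schema = SCHEMAS[name]
    if set(args) != set(schema):
        raise ToolCallError(f"{name}: expected args {sorted(schema)}, got {sorted(args)}")
    for key, typ in schema.items():
        if not isinstance(args[key], typ):
            raise ToolCallError(f"{name}: {key} must be {typ.__name__}")


def run_guarded(
    propose: Callable[[list[str]], Optional[tuple[str, dict[str, Any]]]],
    execute: Callable[[str, dict[str, Any]], str],
    confirm: Callable[[str, dict[str, Any]], bool],
    max_steps: int = 10,
    max_repeats: int = 3,
) -> list[str]:
    observations: list[str] = []
    seen: dict[str, int] = {}
    for _ in range(max_steps):  # hard cap on iterations
        call = propose(observations)
        if call is None:  # the model says it is done
            return observations
        name, args = call
        key = f"{name}:{sorted(args.items())}"
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:  # identical call over and over: likely stuck
            raise RuntimeError(f"repeated call {key} {seen[key]} times")
        try:
            validate(name, args)
        except ToolCallError as e:
            observations.append(f"error: {e}")  # feed the error back; do not execute
            continue
        if name in IRREVERSIBLE and not confirm(name, args):
            observations.append(f"blocked: {name} needs human approval")
            continue
        observations.append(execute(name, args))
    raise RuntimeError("step budget exhausted")


script = iter([
    ("read_file", {"path": "a.txt"}),
    ("delete_file", {"path": "a.txt"}),
    ("send_email", {"to": "x"}),  # hallucinated tool
    None,
])
print(run_guarded(lambda _obs: next(script), lambda n, a: f"ok {n}", lambda n, a: False))
```

Validation errors go back to the model as observations instead of crashing the loop, so the model gets a chance to correct the call. The repeat counter catches the infinite-loop case before the overall step budget runs out.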
What to watch · next 12-24 months
Multi-agent systems where coordination cost stays below the value gain. Most multi-agent demos fail this test.
Agentic evals beyond SWE-bench. The field is hungry for richer agent benchmarks.
Persistent-memory agents that genuinely learn across sessions. Most current 'memory' systems are simple retrieval.
Safety in computer use. OpenAI's Operator and Anthropic's computer-use API ship without robust guardrails; this is the area where catastrophic mistakes will scale.
Cost-per-task economics. Many 2024-2025 'agent' demos hide LLM costs of $5-$50 per task. Sustainable agent products need this under $1 per task for most use cases.
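Here is a back-of-envelope model of why those per-task costs add up. It assumes an agent loop that resends the growing transcript on every step. Under that assumption, input tokens grow roughly quadratically with step count. The prices ($3 and $15 per million input and output tokens) and the token counts are placeholders, not any vendor's quoted rates, and the model ignores prompt caching.

```python
"""Back-of-envelope cost per agent task (all numbers are hypothetical placeholders)."""


def task_cost(
    steps: int,
    base_prompt: int,        # tokens in the system prompt + task description
    obs_per_step: int,       # tool-output tokens appended per step
    out_per_step: int,       # model output tokens per step
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    total_in = 0
    for i in range(steps):
        # Step i re-reads the base prompt plus everything added in earlier steps.
        total_in += base_prompt + i * (obs_per_step + out_per_step)
    total_out = steps * out_per_step
    return total_in / 1e6 * usd_per_m_input + total_out / 1e6 * usd_per_m_output


cost = task_cost(30, 5_000, 2_000, 500, 3.0, 15.0)
print(f"${cost:.2f} per task")  # $3.94 per task
```

With these placeholder numbers, a 30-step task already costs nearly four times the $1 target. Most of that cost comes from re-reading history, which is why shorter loops and summarization of older context matter for cost as well as for accuracy.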