
Agentic AI · the atlas

What “agents” actually are.

The most overused word in AI in 2024-2026, and the most undertheorized. This page is the anti-hype version: foundational papers, the workflow-vs-agent distinction that finally clarifies the space, the commercial platforms actually shipping, the failure modes nobody markets, and what to watch in the next 12-24 months.

Public sources only. Vendor blogs, arXiv papers, conference talks, reputable journalism.

Foundational papers + posts

Seven references that built the field.

2022-10

ReAct: Synergizing Reasoning and Acting in Language Models

Yao, Zhao, et al. (Princeton + Google)

The foundational paper for modern LLM agents. Established the Thought → Action → Observation loop.

2023-02

Toolformer: Language Models Can Teach Themselves to Use Tools

Schick et al. (Meta AI)

Showed LLMs can teach themselves to call APIs (calculator, search, translation) through self-supervised fine-tuning, bootstrapped from a handful of demonstrations per API.

2023-03

AutoGPT public release

Toran Bruce Richards (Significant Gravitas)

Not a peer-reviewed paper, but the first viral consumer-facing agent. Sparked the agent craze of 2023-2024.

2023-04

Generative Agents: Interactive Simulacra of Human Behavior

Park et al. (Stanford + Google)

The 'Smallville' paper: 25 LLM-driven agents acting in a simulated town. Established the agent-as-character research direction.

2023-05

Voyager: An Open-Ended Embodied Agent with Large Language Models

Wang et al. (NVIDIA)

Minecraft agent that builds its own skill library autonomously. Influential on agent-with-memory architectures.

2023-10

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Jimenez, Yang, et al. (Princeton + University of Chicago)

Established the canonical agent benchmark for software engineering. SWE-bench Verified, a human-validated subset, is the most actively tracked leaderboard.

2024-12

Building effective AI agents (Anthropic blog post)

Erik Schluntz + Barry Zhang (Anthropic)

Hugely influential practitioner write-up distinguishing 'workflows' (predefined orchestration) from 'agents' (LLM-directed loops). Required reading.

What's actually shipping

Eight commercial agent platforms.

Claude Code

Anthropic's terminal coding agent. Reads codebases, edits files, runs tests, follows multi-step instructions. The agent the operator uses to build atomeons.com itself. Powers many of the workflows referenced across this site.

Cursor

VS Code fork with deeply integrated LLM agents. Composer mode (multi-file edits with planning) is a leading agentic coding pattern. Reported private valuation of $9B+ as of mid-2025.

Devin (Cognition Labs)

Marketed as 'the first AI software engineer.' Long-running session model with browser + shell + editor access. Cognition's reported valuation was roughly $2B in 2024 and roughly $4B by early 2025. Launched with published SWE-bench results.

OpenAI Operator

Browser-based agent that drives its own remote browser to complete web tasks (booking, shopping, research). Launched January 2025 as a research preview for ChatGPT Pro users. Powered by OpenAI's Computer-Using Agent (CUA) model.

Anthropic Computer Use

Claude 3.5 Sonnet API capability allowing screenshot + click + type → drives any GUI application. Released October 2024. Foundational for desktop-automation agents.

Manus

Chinese-origin general agent platform that went viral early 2025. Multi-tool browsing + computation + asynchronous task completion model.

Replit Agent

Build-from-prompt full-stack apps in Replit's IDE. Strong code-gen pipeline with embedded deployment. Targets non-developers learning to ship.

Salesforce Agentforce

Enterprise CRM agent platform announced 2024. Per-conversation pricing model. Major enterprise distribution channel for agentic AI.

Architectural patterns

Five tradeoffs every agent system makes.

Workflow vs Agent

The most useful distinction in 2026, popularized by Anthropic's December 2024 post. A workflow has predefined orchestration: you (the human) wrote the control flow, and the LLM fills in the slots. An agent is LLM-directed: the model decides at runtime which tool to call, how to interpret results, and when to stop. Workflows are easier to debug, cheaper, and more reliable. Agents have a higher capability ceiling. Most production 'AI agents' are actually workflows with a single LLM-loop step inside.
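A minimal sketch of the distinction, assuming a scripted stand-in (`fake_llm`) in place of a real model call so the example runs offline. In `run_workflow` the control flow is fixed in code; in `run_agent` the model's reply selects the next tool and decides when to stop. Every name here is hypothetical and not any vendor's API.

```python
from typing import Callable, Dict, List

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call, scripted so the example runs offline."""
    if prompt.startswith("DECIDE"):
        # Agent mode: pick the next action from the transcript so far.
        if "search:" not in prompt:
            return "search"
        if "calc:" not in prompt:
            return "calc"
        return "stop"
    return f"<{prompt.split(':', 1)[0].lower()} output>"

# Illustrative tools with canned results.
TOOLS: Dict[str, Callable[[], str]] = {
    "search": lambda: "3 results",
    "calc": lambda: "42",
}

def run_workflow(text: str) -> List[str]:
    """Workflow: the human wrote the control flow; the LLM only fills each slot."""
    summary = fake_llm(f"SUMMARIZE: {text}")
    translation = fake_llm(f"TRANSLATE: {summary}")
    return [summary, translation]

def run_agent(task: str, max_steps: int = 5) -> List[str]:
    """Agent: the model chooses the next tool at runtime and decides when to stop."""
    transcript: List[str] = []
    for _ in range(max_steps):
        choice = fake_llm(f"DECIDE next step for {task!r}. So far: {transcript}")
        if choice == "stop":
            break
        transcript.append(f"{choice}: {TOOLS[choice]()}")
    return transcript
```

The workflow always runs the same two steps in the same order, which is why it is easier to debug; the agent's path depends on what the model returns at each step.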

Single-loop vs orchestrator-worker

Single-loop agents are one LLM called repeatedly in a tool-use loop until the task is done. Orchestrator-worker patterns have one 'planner' LLM that dispatches subtasks to other LLM 'workers' (sometimes specialized, sometimes identical). Anthropic's workflow Tool exemplifies a deterministic variant of orchestrator-worker: a JS script (the orchestrator) routes work to multiple parallel subagents.
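The orchestrator-worker shape can be sketched as follows, assuming hypothetical `plan` and `worker` stubs where a real system would make LLM calls. The orchestrator here is plain code that fans subtasks out to parallel workers and collects results in order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def plan(task: str) -> List[str]:
    """Hypothetical planner. A real orchestrator might ask an LLM for subtasks."""
    return [f"{task} / part {i}" for i in range(1, 4)]

def worker(subtask: str) -> str:
    """Hypothetical worker. A real one would run its own LLM tool-use loop."""
    return f"done: {subtask}"

def orchestrate(task: str, max_workers: int = 3) -> List[str]:
    """Split a task, run the subtasks in parallel, and return results in order."""
    subtasks = plan(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with subtasks.
        return list(pool.map(worker, subtasks))
```

Threads suit this sketch because real workers would spend most of their time waiting on network calls to a model.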

Tool use vs computer use

Tool use means the LLM has a structured catalog of functions it can call (web search, code interpreter, database query). Tools have well-typed inputs and outputs. Computer use means the LLM has unstructured GUI access: it sees screenshots, decides where to click, and types into fields. Computer use is more general but much more error-prone. The frontier in 2026: hybrid systems that use structured tools when available and fall back to computer use only when necessary.
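A hedged sketch of the hybrid pattern: a small typed tool catalog with argument checking, plus a fallback path for requests no structured tool covers. `Tool`, `CATALOG`, `get_weather`, and `gui_fallback` are all illustrative names; the fallback is a placeholder string, not a real screenshot-and-click loop.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Tool:
    name: str
    params: Dict[str, type]      # expected argument names and their types
    fn: Callable[..., Any]

def get_weather(city: str) -> str:
    return f"sunny in {city}"    # canned result, not a real API call

CATALOG: Dict[str, Tool] = {
    "get_weather": Tool("get_weather", {"city": str}, get_weather),
}

def gui_fallback(goal: str) -> str:
    """Placeholder for a screenshot/click/type loop; no GUI is driven here."""
    return f"[computer use] attempting: {goal}"

def dispatch(name: str, args: Dict[str, Any], goal: str) -> str:
    """Prefer a structured tool; fall back to the GUI path only when none exists."""
    tool = CATALOG.get(name)
    if tool is None:
        return gui_fallback(goal)
    # Reject calls whose argument names or types do not match the declared schema.
    if set(args) != set(tool.params):
        raise ValueError(f"{name} expects arguments {sorted(tool.params)}")
    for key, expected in tool.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{name}.{key} must be {expected.__name__}")
    return tool.fn(**args)
```

Checking arguments before the call is what makes the structured path the more reliable of the two.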

Context window vs memory

Context window is the model's working memory — typically 200k-2M tokens in 2026 frontier models. Memory is persistent state that survives across context-window resets — vector databases (Pinecone, Weaviate, Turbopuffer), structured stores (Mem0, Letta née MemGPT), or filesystem (the simplest and most underused pattern). All non-trivial agents need both.
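The filesystem pattern can be sketched in a few lines, assuming a hypothetical `FileMemory` helper that keeps notes in a JSON file. The notes survive a context reset because they live on disk, and `build_context` shows one way to re-seed a fresh context window from them.

```python
import json
from pathlib import Path
from typing import Dict

class FileMemory:
    """Persistent key-value notes stored as JSON on disk (hypothetical helper)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> Dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, key: str, value: str) -> None:
        notes = self.load()
        notes[key] = value
        self.path.write_text(json.dumps(notes, indent=2), encoding="utf-8")

def build_context(memory: FileMemory, task: str) -> str:
    """Rebuild a fresh context window from the task plus whatever was persisted."""
    notes = "\n".join(f"- {k}: {v}" for k, v in sorted(memory.load().items()))
    return f"Task: {task}\nKnown facts:\n{notes}"
```

A new session only needs the file path to pick up where the last one stopped; no embedding model or database is involved.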

Determinism vs autonomy

The deepest tradeoff. More deterministic agents (workflows) are easier to verify and cheaper to run; less deterministic agents (free-roaming LLM loops) can solve more open-ended problems but require more guardrails. Production-quality agent systems lean heavily on determinism for the parts that can be deterministic, and reserve LLM autonomy only for the steps that genuinely require it.

Failure modes

Six ways agent systems break.

01
Infinite loops. LLMs get stuck repeating the same approach without recognizing it's not working. A hard cap on loop iterations is required.
02
Tool-call hallucination. The LLM invents parameters or calls non-existent functions. Schema-validated tool use mitigates this and is the safest production pattern.
03
Context window collapse. Long-running agents accumulate so much history that the actual task gets lost in the middle. Aggressive summarization is the workaround.
04
Prompt injection from tool output. Web pages, emails, file contents the agent reads can contain attacker-supplied instructions. The agent has no inherent way to distinguish 'system prompt' from 'data.' Massive open problem.
05
Over-trust by users. Agents producing confident-sounding output trick humans into skipping verification. Worst in domains where the human can't easily check (medical, legal, financial).
06
Catastrophic action with no rollback. Agent deletes a file, sends an email, places an order. Constraining action surface is the practical mitigation.
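Several of the mitigations above fit in one small guard loop. The sketch below is an assumption-laden illustration, not a production design: `guarded_run`, the action names, and the `approve` callback are hypothetical, and the loop only decides whether each proposed action is allowed (executing the action is left out).

```python
from typing import Callable, List, Set, Tuple

Action = Tuple[str, str]                                   # (action name, argument)

SAFE_ACTIONS: Set[str] = {"read_file", "search"}           # illustrative allowlist
DESTRUCTIVE_ACTIONS: Set[str] = {"delete_file", "send_email"}

class GuardrailError(Exception):
    pass

def guarded_run(
    next_action: Callable[[List[Action]], Action],
    approve: Callable[[str, str], bool],
    max_steps: int = 10,
    max_repeats: int = 2,
) -> List[Action]:
    """Collect allowed actions until the policy says stop or a guardrail trips."""
    history: List[Action] = []
    for _ in range(max_steps):                             # hard cap on iterations
        action = next_action(history)
        name, arg = action
        if name == "stop":
            return history
        if history.count(action) >= max_repeats:           # identical call again: likely stuck
            raise GuardrailError(f"repeated action {action}")
        if name in DESTRUCTIVE_ACTIONS:
            if not approve(name, arg):                     # human-in-the-loop gate
                raise GuardrailError(f"{name} not approved")
        elif name not in SAFE_ACTIONS:
            raise GuardrailError(f"unknown action {name}")  # reject invented tools
        history.append(action)
    raise GuardrailError("iteration cap reached")
```

Here `next_action` stands in for the model's choice at each step, and `approve` stands in for whatever confirmation step a deployment uses before an irreversible action.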

What to watch · next 12-24 months

Five questions the field is asking.

01
Multi-agent systems where coordination cost stays below the value gain. Most multi-agent demos fail this test.
02
Agentic evals beyond SWE-bench. The field is hungry for richer agent benchmarks.
03
Persistent-memory agents that genuinely learn across sessions. Most current 'memory' systems are simple retrieval.
04
Safety in computer use. The guardrails shipped with Operator and Anthropic's computer-use API are still immature; this is the area where catastrophic mistakes will scale.
05
Cost-per-task economics. Many 2024-2025 'agent' demos hide $5-$50 per task LLM costs. Sustainable agent products need this under $1 per task for most use cases.
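A back-of-envelope estimator makes the cost question concrete. It assumes every step re-sends a history that grows by a fixed number of tokens, with no prompt caching; the function name, token counts, and per-million-token prices are hypothetical placeholders, not any vendor's actual rates.

```python
def cost_per_task(
    steps: int,
    base_input_tokens: int,
    growth_per_step: int,
    output_tokens_per_step: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Estimate the USD cost of one task when each step re-sends the growing history."""
    # Step i sends the base prompt plus i steps' worth of accumulated history.
    total_input = sum(base_input_tokens + i * growth_per_step for i in range(steps))
    total_output = steps * output_tokens_per_step
    return (total_input * usd_per_m_input + total_output * usd_per_m_output) / 1_000_000

# Placeholder prices: $3 per million input tokens, $15 per million output tokens.
long_task = cost_per_task(50, 10_000, 2_000, 500, 3.0, 15.0)
short_task = cost_per_task(5, 10_000, 2_000, 500, 3.0, 15.0)
```

Under these assumptions the 50-step task comes to about $9.23 and the 5-step task to about $0.25. Input tokens grow roughly with the square of the step count, so long loops dominate the bill.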
