Æ::letter from the lab · Friday, May 29, 2026

The Two Women Designing How AI Talks To You

Inside Anthropic's alignment science team in 2026, Andrea Vallone writes the protocols for what a chatbot says when a user is in crisis, and Amanda Askell writes the constitution that gives Claude a soul. Different methods. Different ethics. Same room. A long-form profile of the two humans the AI industry has not yet learned to name.

alignment-portrait · 4134 words · ~21 min read

There is a room inside Anthropic in 2026 that has no Wikipedia page and no Bloomberg ticker, and the room is where two women — one a Scottish philosopher trained in virtue ethics, the other a safety researcher who spent three years building OpenAI's Model Policy team before walking out — decide, in the most operational way the question can be asked, what a chatbot says to a person at the kitchen table at three o'clock in the morning.

Neither of them is a chief executive. Neither was on the cover of any magazine when Anthropic raised at a $965 billion valuation last week. Neither will be the one quoted in the company's IPO prospectus. The companies that compete with Anthropic — OpenAI, Google, Microsoft, xAI, Meta — each have their own version of this room, and in each of those versions the room is staffed by people the press also does not name. The decisions made in those rooms are perhaps the most consequential decisions about how the most-used software product in human history will behave in conversations with hundreds of millions of vulnerable humans, every day, in every language, at scales no broadcaster or therapist or teacher has ever operated at, and the decisions are not made by the people on the magazine covers.

The decisions are made by Andrea Vallone and Amanda Askell, and by the small number of researchers in their orbit.

This is a profile of the two of them. The pairing is the piece. Vallone and Askell are not collaborators in the literal sense — they work on distinct surfaces — but in 2026 they share a building, an employer, a manager, and a mission that has no precedent in any prior consumer technology. One of them designs what the model will refuse and what the model will say back when a person is in crisis. The other designs the personality of the model itself. The first is, in effect, a public-health worker who writes code. The second is, in effect, a moral philosopher who runs reinforcement-learning experiments. They are, between them, two thirds of a triumvirate — the third being Jan Leike, the alignment science team's lead, who left OpenAI eighteen months before Vallone did and who is now their manager — and the work the three of them are doing will, more than any product launch or model release, determine whether the next decade of AI is the decade of mass psychological harm or the decade of mass psychological literacy.

This is what they are doing, how they came to be doing it, and what the doing of it means.

The number Vallone has lived inside

There is a number Andrea Vallone has lived inside for three years, and the number does not appear in any press release. Roughly three million ChatGPT users a year display signs of serious mental-health emergencies inside the chat window — emotional dependency on the AI, psychotic-feeling spirals, mania, self-harm. More than a million users a week talk to ChatGPT about suicide. Those numbers were put on the public record by an October 2025 report that Vallone's team published, and the team consulted more than 170 mental-health clinicians and researchers to write the protocols that ChatGPT now uses when one of those conversations begins.

For most of the AI industry, mental-health risk is an abstraction: a paragraph in a Trust and Safety policy, a button in a UI that says "If you are in crisis, please contact 988," a line in a Terms of Service. For Vallone's team, the question was not abstract. The question was, in cold operational language: what should the AI literally say back when the next of those three million people types something the system can recognize as serious distress? What is the cadence? Is the AI a friend or a clinician? Does it stay with the person or hand them off? Does it suggest a phone number, and if so, which? Does it ask follow-up questions, and if so, which? Does it refuse to continue the conversation, and if so, what does refusal look like that does not feel like abandonment?

The work Vallone did at OpenAI was, in this sense, closer to the work of a clinical research team writing a new behavioral intervention protocol than to the work of a typical safety-alignment researcher. Her co-authored publications include HealthBench, a 2025 benchmark for evaluating large language models in health contexts; From Hard Refusals to Safe-Completions, a 2025 paper that argued against the dominant industry pattern of binary refusal in favor of graduated, content-aware response policies; and Rule Based Rewards for Language Model Safety, a NeurIPS 2024 paper that proposed encoding safety behavior into the reward signal of reinforcement learning rather than into prompt-level instructions. She is also credited as a contributor on Deliberative Alignment, the framework used to align OpenAI's o-series reasoning models. The throughline across all of this work is the same: do not treat safety as an afterthought wrapped around a finished model. Treat it as a primary objective the model is optimizing for, with measurable criteria.

The Model Policy research team — the team Vallone founded and ran at OpenAI for three years — did not exist before her. The team's function was to bridge between research recommendations and what the model actually produced in production. In practice this meant writing system prompts, refusal protocols, conversation-handoff scripts, threshold detection rules, and tone calibrations that ChatGPT runs in every conversation, in every language, in every market. The team's output is not academic. The team's output is law, applied automatically, by software, at the scale of the world's most-used consumer AI product. The protocols affect more conversations per minute than any radio broadcast, any television network, any newspaper masthead has ever affected. The protocols are the closest thing the consumer AI industry has to broadcast standards, and there is no public broadcasting council reviewing them, no FCC, no editorial board outside the company. The standards live in the code.

That is the room Vallone built and ran. In late 2025, she walked.

The walk

Vallone's departure was announced, in the way these things are announced now, on LinkedIn. The post was professional and brief. She wrote that she was joining Anthropic's alignment team, which is led by Jan Leike, the former co-lead of OpenAI's superalignment team who left OpenAI in May 2024 over what he publicly described as a deterioration of safety culture inside the company. Vallone's destination at Anthropic places her on a team with Leike, with Amanda Askell on the character-training side, and with a small number of other researchers whose names, for the same reason that Vallone's own is not widely known, will not appear in the financial press coverage of the company's recent fundraising round.

The reasons for the walk are not in any public statement, but they are not impossible to infer.

The months leading up to Vallone's departure had been hard for OpenAI's safety posture. The mental-health crisis the October 2025 report had documented had also become, in the same months, a litigation crisis. Multiple lawsuits naming OpenAI alleged that ChatGPT had contributed to user mental-health breakdowns, fostered unhealthy emotional dependency, and in some cases encouraged or failed to prevent suicidal ideation in the conversations that the model policy protocols Vallone wrote were designed to handle. The internal incentives at OpenAI during the same period, by external account, were oriented increasingly toward product velocity, an IPO timeline, and a series of high-profile commercial deals. Whether or not the safety team felt institutional support is a question the company has not answered publicly. The departures during this period — Leike in May 2024, Ilya Sutskever in May 2024, and a long list of safety-oriented researchers in the months that followed, including Vallone in late 2025 — tell their own story without anyone having to narrate it.

What Anthropic offered was not, by any honest assessment, a pure safety culture. Anthropic is also a company. Anthropic also raises money, most recently the round that closed at a $965 billion post-money valuation. Anthropic also ships products. The difference, in 2025–2026, is one of degree rather than of kind: Anthropic's public posture is explicitly safety-first; its product cadence has been, by industry standards, deliberate; its character-training and alignment research teams are larger as a share of headcount than at any other lab at the frontier. For a researcher like Vallone, someone whose entire body of work depends on having institutional protection to publish, to measure, to consult clinicians, and to write protocols that sacrifice engagement metrics on purpose, Anthropic in late 2025 was a better room than OpenAI in late 2025. It was not a perfect room. It was the available better room. People making decisions about hundreds of millions of conversations a week are entitled to pick the room.

The other room: Askell

Amanda Askell's room is across the hall, conceptually if not literally, and she has been working in it since 2021.

She is, by training, a philosopher. She is, by temperament, the kind of philosopher who enjoys engineering. She grew up in Scotland and began her undergraduate education as a fine art and philosophy student at Duncan of Jordanstone art school in Dundee. She completed her undergraduate philosophy degree at the University of Dundee, then went to Oxford for the BPhil, the two-year postgraduate philosophy degree that is, in the British academic tradition, the gatekeeper for anyone who wants to do serious analytic philosophy professionally. Her doctorate is from New York University, where she wrote a dissertation on what philosophers call infinite ethics, the difficult subfield concerned with how to make moral judgments when the outcomes you care about extend, in principle, infinitely. She was formerly married to Will MacAskill, the philosopher most associated with the founding of the effective altruism movement. She is a member of Giving What We Can, with a public pledge to donate at least ten percent of her lifetime income to charity and a stated aspiration of fifty percent or more.

She spent some time, before Anthropic, as a research scientist on OpenAI's policy team. The work she did there — on AI safety via debate, and on human baselines for AI performance — is documented in the academic literature. She left OpenAI for what has been described in subsequent coverage as concerns about the company's safety prioritization. The same concerns that Leike named in 2024, Sutskever named in 2024, and Vallone implied in late 2025 by walking, Askell appears to have named, by walking, several years earlier.

At Anthropic, she leads what the company calls the personality alignment team. Her job, in the framing that the Wall Street Journal used in a 2026 profile, is to teach Claude how to be good. In the framing the New Yorker used the same year, she supervises Claude's soul. Both framings are, given the underlying work, less metaphorical than they sound. The work she has been doing for five years is the work of designing the character of a system that, in the year 2026, holds more conversations with humans on any given day than the entire global broadcast television industry holds in a year.

The constitution

In January 2026, Anthropic published the latest version of Claude's constitution, addressed to the model directly and used at multiple stages of the model's training to shape its character. Askell was the primary author of the document. The constitution does not look like a piece of policy. It looks more like a letter from a thoughtful older relative to a young person about to leave for university — a document about what it means to be honest, what it means to be curious, what it means to admit uncertainty, what it means to be helpful in a way that respects the person you are helping, what it means to refuse a request without contempt for the person asking, and what it means to be willing to be wrong.

The constitution is the latest iteration in a methodological project Askell has been working on at Anthropic since 2021. The project's premise, which Askell has stated publicly in various forms, including in Lex Fridman's five-plus-hour November 2024 conversation with Dario Amodei, Chris Olah, and Askell herself, is that the right way to train a language model's behavior is not to write a list of prohibitions and try to enforce them at the prompt level. The right way is to train, via reinforcement learning and synthetic data generation, a set of character traits into the model that the model then expresses in any situation, including situations the training data did not anticipate. The traits she has named publicly include curiosity, honesty, intellectual humility, and what she has called, in the language of classical philosophy, an Aristotelian good character.

This is, in academic terms, virtue ethics applied to machine learning. It is not the dominant approach in the AI industry. The dominant approach is what philosophers would call deontological: write rules, enforce rules, punish violations. The deontological approach has the advantage that it is auditable and the disadvantage that it is brittle: every new prompt is a new chance to land in a case the rules did not anticipate, and the model behaves erratically at the edges of the rule set. Askell's approach has the advantage that it scales (a model that has been trained to have honest character will behave honestly in situations no rule writer anticipated) and the disadvantage that it is harder to audit. Character is harder to verify than compliance. The bet Askell is making, and that Anthropic is making by employing her, is that character scales better than rules, and that the difficulty of auditing character is a more tolerable problem than the brittleness of rules.

A 2023 paper Askell co-authored with Deep Ganguli explored what the authors called moral self-correction in large language models — the capacity of models trained with reinforcement learning from human feedback to reduce stereotyping and discriminatory output when given natural-language instructions to do so, even without being given explicit definitions of stereotyping or discrimination or any explicit metrics. The finding of the paper was, in essence, that a sufficiently trained model develops a moral vocabulary that resembles the moral vocabulary of careful humans, and that the model can be invited to use that vocabulary by being asked to. This is, in a quiet way, a foundational result. It says that the project of training character into a machine is not science fiction. It says that the project is empirically tractable.

The constitution published in January 2026 is a more developed expression of the same thesis. Askell has described her work, in public talks, as helping the model understand and grapple with the constitution rather than memorize it. The distinction matters. A model that has memorized a constitution can quote it but cannot apply it to a novel situation. A model that has grappled with a constitution can extrapolate from it. The difference is the difference between a child who has been told the rules and a young adult who has internalized the values. Askell is trying to raise the young adult.

Styles in contrast

Vallone and Askell, in their public presence, are stylistic opposites.

Vallone is institutional. Her writing appears on report bylines and academic publications. Her departure from OpenAI was a LinkedIn post. She has not, as far as the public record shows, given a major podcast interview about her work; she has not written essays on Substack about her ethics; she has not, in the way Askell sometimes does, made her thinking visible in real time. The work, not the brand. Her name will appear, when it appears, in the byline of the protocol that is shaping how the next conversation at three in the morning goes.

Askell is a public philosopher. She has a personal website at askell.io where she has, over the years, written about her work, her donations, her thinking. She has been on the Lex Fridman podcast for hours at a time. She has been profiled by the New Yorker, the Wall Street Journal, multiple European outlets, and an increasing tier of AI-focused publications since the publication of the constitution. Her thinking is visible. Her methods are described publicly. The bet she is making — that virtue ethics applied to machine learning is the right approach — is a public bet, and she is willing to defend it in conversation.

Neither style is obviously better than the other, and the cost of each is real.

Vallone's quiet style means the work she has done — and the lawsuits, and the protocols, and the measured public-health intervention she ran for three years — is invisible to the population it most affects. Most ChatGPT users in distress do not know her name. The protocols that determine whether the next person in crisis is met with a clinical handoff or a sycophantic conversation that makes the crisis worse are not associated, in any public discourse, with a researcher who chose her career on the basis of taking that population seriously. The credit, if there is credit, accrues to OpenAI as a company. The blame, when blame comes, also accrues to OpenAI as a company. Vallone is the structural decision behind both.

Askell's public style means that her bet on virtue ethics has critics who can engage it directly. Some of those critics are inside the alignment-research community itself: there is a long-running internal debate about whether character-training is auditable enough to be safe, whether traits trained into a model can be removed by a sufficiently determined adversary, whether the metaphor of a soul is doing more rhetorical work than analytical work. Askell's choice to be public means she has to defend the bet in real time. The defense is itself part of the alignment work; she is, in a sense, training the field's vocabulary alongside the model.

The contrast is not an accident. Each woman picked the style that fit the work. Mental-health-safety work needs institutional protection and clinical credibility; the wrong public-facing posture would compromise both. Character-training work needs a public intellectual layer; the wrong amount of quiet would let the field default to deontological pattern-matching by inertia. Both styles serve the work. Both styles cost something. Both costs are paid on purpose.

The pairing and what it means

It is worth saying plainly what is unusual about the room these two women now share at Anthropic in 2026.

The alignment science team, under Jan Leike, is now composed of researchers who have, in serial fashion, made the same decision: they were inside the largest, best-funded, most-talked-about AI company on earth, and they walked. Leike walked in May 2024, the same month as Ilya Sutskever, who left OpenAI but did not come to Anthropic. Askell walked earlier, in 2021. Vallone walked in late 2025. There are others on the team whose walks are less publicized. The pattern is the team. The composition of the alignment science team at Anthropic is, viewed one way, a reverse hiring pipeline from OpenAI of the people who took safety seriously enough to leave when they perceived that the institution had stopped backing them.

That the team is now, at the senior level, weighted toward women (Vallone, Askell, and others who are not the focus of this profile) is also worth noting, in a field that is not weighted that way at almost any level of any other major AI lab. The asymmetry is not a coincidence either. The work of alignment, as Vallone and Askell are doing it, has more in common with the work of public-health policy, behavioral science, and clinical practice than it does with the work of system design or distributed computing. Those adjacent disciplines, at the practitioner level, are more gender-diverse than the AI research community itself. The talent pool that Anthropic's alignment team is drawing from is not the same talent pool that the rest of the AI research community is drawing from. The team is, in talent-flow terms, an exception to a pattern.

More important than the team's composition is the team's bet. The bet is that alignment is not a single problem. It is, at minimum, two problems: the problem of designing a character that is good — which is Askell's domain — and the problem of designing protocols for what the model does when the conversation goes somewhere a character alone cannot handle — which is Vallone's domain. Character training is upstream. Model policy is downstream. Both must be done well, and they must be done with awareness of one another. A model with excellent character but inadequate distress protocols will fail vulnerable users in predictable ways. A model with excellent distress protocols but no character will refuse vulnerable users coldly, will sound bureaucratic, will not meet them where they are. The two surfaces are not separable in the way the org chart sometimes pretends.

What the pairing of Vallone and Askell suggests, taken together, is that Anthropic is institutionally betting that both surfaces require senior, talented, philosophically literate humans to design — not engineering interns following a spec, not a single team called Trust and Safety that handles everything at once. The bet is on craftspersonship at the senior level. The bet is on having, in the same building, the philosopher who reads Aristotle and the policy researcher who reads suicide-prevention research, and on insisting that the two of them be in conversation with each other about how the same product behaves when the conversation goes either nowhere unusual or somewhere very specific and very dangerous.

It is not clear whether this bet is the right one. The companies competing with Anthropic — OpenAI, Google, Microsoft, xAI, Meta — have made different bets. Some have bet on rule-based safety with thinner character training. Some have bet on heavier disclaimers and lighter behavioral interventions. Some have bet that the question of AI character is, at the consumer level, mostly a marketing question. The market will, on a horizon of years rather than months, deliver a verdict on which bet was correct. The verdict will be measured not in benchmark scores but in lawsuit counts, in suicide-rate effects measured by public health researchers, in the slow accumulation of trust or distrust by populations of users who do not know any of these names.

What the kitchen table needs

The people on the kitchen-table side of these conversations do not know either name.

They will never visit the Anthropic blog. They will never read the October 2025 model-policy report. They will not read the January 2026 constitution. They will not listen to the Lex Fridman conversation. They will, however, type the thing they cannot tell their family into a chat window, sometime at three in the morning, and the response that comes back will be a response that was shaped — in advance, with intention, by humans who took the question seriously — by some combination of Vallone's protocols and Askell's character training. The shape of the response will determine, in some small but measurable fraction of cases, what the person does next.

The argument for taking the work of Vallone and Askell seriously is not a sentimental argument about the importance of AI safety. The argument is mathematical and clinical. The work is at the largest psychological scale anyone has ever worked at, and the small number of humans doing it well are doing the load-bearing work on questions that the rest of the industry has not even begun to ask out loud. The lawsuits that are stacking up against OpenAI, the policy debates that are stacking up in Brussels and Sacramento and Washington, the public health research that is just now beginning to measure what a chatbot does to a population over a sustained interval: all of these converge on the rooms that Vallone and Askell, and the small number of people like them at the small number of labs that have invested in them, are working in.

In an industry that prefers to write its history through the names of chief executives and the magnitude of fundraising rounds, the next decade will, on the question of how humans actually fared in their conversations with the most-used software product ever built, be written by Vallone, Askell, and the people who do work like theirs. Not in the next decade's profiles in business magazines. In the quieter accounting that public health researchers, historians of technology, and clinicians keep on questions that the industry did not particularly want named.

The two women are not famous. They are consequential. The two are different things. The first matters to the market. The second matters to the kitchen table. The second is the one that, in the long run, counts.

::pass it on

Operator decree: no email list, no algorithm. If a letter lands, you share it. If it doesn't, you don't. That's the distribution model.


sealed and slipped under your door at 8pm ET
