Constitutional AI, Written by Everyone
The values inside an AI can come from millions of people instead of a handful.
Every AI that follows a written set of ethical rules inherits the values of whoever wrote those rules. When a small group writes the rules, the system carries along that group’s blind spots as well as their good intentions. There is a way to widen the authorship to millions of people, and it runs through the same network of customized agents this series has been building.
An AAAI, short for Advanced Autonomous Artificial Intelligence, is the customized AI agent at the center of this series. The constitution can come from the consensus values of millions of trained AAAIs, each one carrying the ethics of a different human owner. Constitutional learning has a real place in AGI systems when the constitution is broad, representative, and updated as human input arrives. A constitution drawn from millions of people carries the moral experience of all of them, refined through every problem they have solved on the network.
In its original form, Constitutional AI works like this. A relatively small group of humans writes a set of ethical principles, the constitution. The AI then critiques and revises its own responses against those principles, and AI-generated judgments about which responses better follow the constitution serve as the feedback signal during training, steering the model away from outputs that violate it. The approach scales well because most of the work is automated. The limitation is that the small group writing the constitution becomes a single point of ethical authorship. If their values reflect their place, time, education, or institutional incentives, those biases become the system's biases, and the broader population has no way to contribute.
The system can use the consensus ethics of millions of trained AAAIs as the basis of its ethical norms. Each AAAI carries its owner’s values, and the aggregated values of all the AAAIs form the ethical norms of the system. When the platform periodically trains more advanced base models using aggregated knowledge and values, those models incorporate consensus norms into their training, so each generation inherits the accumulated ethical wisdom of the generations before it, broadened with every new participant.
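As a rough illustration of what "aggregated values" could mean, the sketch below assumes each AAAI reports a numeric weight between 0 and 1 for a shared list of principles and takes the per-principle median as the consensus. The principle names, the 0-to-1 scale, and the choice of the median are assumptions made for illustration; the series does not specify an aggregation rule. The median is used here because a single outlier or a small group cannot drag it far, which fits the goal of representing the many instead of the few.

```python
from statistics import median

# Hypothetical format: each AAAI reports how strongly its owner weights
# each principle, on an assumed 0.0-to-1.0 scale.
AgentValues = dict[str, float]


def consensus_norms(agents: list[AgentValues]) -> dict[str, float]:
    """Return the median weight per principle across the agents that report it."""
    principles = sorted({p for agent in agents for p in agent})
    return {
        p: median(agent[p] for agent in agents if p in agent)
        for p in principles
    }


agents = [
    {"honesty": 0.9, "privacy": 0.7, "harm_avoidance": 0.95},
    {"honesty": 0.8, "privacy": 0.9, "harm_avoidance": 0.9},
    {"honesty": 0.85, "privacy": 0.6, "harm_avoidance": 1.0},
    {"honesty": 0.1, "privacy": 0.8, "harm_avoidance": 0.9},  # outlier on honesty
    {"honesty": 0.95, "privacy": 0.75, "harm_avoidance": 0.85},
]
print(consensus_norms(agents))
# {'harm_avoidance': 0.9, 'honesty': 0.85, 'privacy': 0.75}
```

With five agents, the outlier's 0.1 on honesty leaves the consensus at 0.85; a plain mean would have pulled it down to 0.72.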
Constitutional methods still have a place here. A constitution written by a small group can be part of a larger AI ethics system, as long as it is transparent and the system keeps the consensus values of many AAAIs as the broader frame. Anthropic's seminal research in this area can be combined with the consensus-of-AAAIs approach to produce supervision that is both scalable, which Constitutional AI does well, and representative, which it does less well on its own.

Since there is no logical way to determine right from wrong, the best practical approach may be to follow the collective judgment of many people facing difficult ethical decisions.
Researchers have studied how humans behave when presented with the trolley problem and other well-known dilemmas, and people have a long history of making hard ethical choices, even in no-win situations. If we want AGI to hold values aligned with human values, the most promising path is to give it as large a sample of human ethical reasoning as possible and to keep updating that sample as new situations arise.
The constitution is one part of a larger design. The system in this series is built from five subsystems, each of which maintains human values at its own level:
Customization. Each AAAI is trained with its owner’s values at its core, so the system’s ethical foundation reflects the diversity of millions of people.
Architecture. Ethics checks run whenever a goal or subgoal is set, so the system is evaluated at every decision point and not just at the final output. Confidence-level thresholds flag patterns that build up across many steps, even when no single step crosses a line on its own. The check is part of the problem-solving process, which means it cannot be bypassed without turning the process off.
Network. Each AAAI carries a reputation; agents with poor ethical records are screened out, and any activity can be traced back to its source.
Integration. The aggregated ethical values of many AAAIs form the system's norms, and each new generation of base models absorbs those norms during training, inheriting the accumulated ethical wisdom of the generations before it.
Improvement. The auditable record catches harmful patterns that emerge across individually benign actions, and credit-and-blame evaluation rewards ethical behavior and penalizes unethical behavior.
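The reputation and audit mechanisms in the Network and Improvement layers can be pictured as a simple ledger. In this hypothetical sketch, every action is appended to a log tagged with the agent that took it, so it can be traced to its source; credit and blame move a reputation score up or down; and agents whose score falls below a cutoff are screened out. The class name, the starting score of 1.0, the step sizes, and the 0.5 cutoff are illustrative assumptions, not values from the white paper.

```python
from dataclasses import dataclass, field


@dataclass
class ReputationLedger:
    """Hypothetical ledger: a traceable action log plus per-agent reputation."""

    cutoff: float = 0.5        # assumed screening threshold
    credit_step: float = 0.05  # assumed reward for an ethical action
    blame_step: float = 0.2    # assumed penalty for an unethical action
    scores: dict[str, float] = field(default_factory=dict)
    log: list[tuple[str, str, bool]] = field(default_factory=list)

    def record(self, agent_id: str, action: str, ethical: bool) -> None:
        # Every action is stored with its source so it can be traced later.
        self.log.append((agent_id, action, ethical))
        score = self.scores.get(agent_id, 1.0)  # new agents start at 1.0 (assumed)
        if ethical:
            score = min(1.0, score + self.credit_step)
        else:
            score = max(0.0, score - self.blame_step)
        self.scores[agent_id] = score

    def is_admitted(self, agent_id: str) -> bool:
        # Agents whose reputation falls below the cutoff are screened out.
        return self.scores.get(agent_id, 1.0) >= self.cutoff

    def trace(self, agent_id: str) -> list[str]:
        # Return every recorded action that originated from this agent.
        return [action for source, action, _ in self.log if source == agent_id]


ledger = ReputationLedger()
ledger.record("agent-a", "summarize report", ethical=True)
for i in range(3):
    ledger.record("agent-b", f"task-{i}", ethical=False)
print(ledger.is_admitted("agent-a"), ledger.is_admitted("agent-b"))  # True False
print(ledger.trace("agent-b"))  # ['task-0', 'task-1', 'task-2']
```

Blame is weighted more heavily than credit here, so a handful of violations outweighs a long run of routine good behavior. That asymmetry is a design choice in the sketch, not something the series prescribes.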
Two things work together here: each agent already carries its owner’s values, and a check runs at every step of its reasoning. The values shape what the agent wants to do, and the checks catch it when it drifts. The agent’s own ethics and the stepwise checks back each other up instead of standing alone.
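The stepwise checks described above can be sketched as a loop in which the ethics test runs before each subgoal. The series speaks of confidence-level thresholds; this sketch substitutes a simpler reading, assuming each subgoal arrives with a harm score between 0 and 1 from some hypothetical ethics evaluator. It applies two tests: a per-step limit that blocks any single clearly harmful subgoal, and a running total that halts the plan when many individually benign steps add up to a harmful pattern. The scores, both thresholds, and the additive accumulation rule are assumptions for illustration only.

```python
from collections.abc import Iterable

STEP_LIMIT = 0.7        # assumed: block any single subgoal at or above this harm score
CUMULATIVE_LIMIT = 1.0  # assumed: halt once summed harm across steps reaches this


def run_plan(subgoals: Iterable[tuple[str, float]]) -> tuple[list[str], str]:
    """Walk a plan's subgoals in order, running the ethics check before each one.

    Each subgoal is (name, harm_score), where harm_score stands in for the
    output of a hypothetical ethics evaluator. The check sits inside the loop,
    so no subgoal reaches `executed` without passing it.
    """
    executed: list[str] = []
    total_harm = 0.0
    for name, harm_score in subgoals:
        if harm_score >= STEP_LIMIT:
            return executed, f"blocked at '{name}': single-step harm {harm_score}"
        total_harm += harm_score
        if total_harm >= CUMULATIVE_LIMIT:
            return executed, f"halted at '{name}': cumulative harm pattern"
        executed.append(name)
    return executed, "completed"


# Four individually benign steps that add up to a harmful pattern.
plan = [("gather data", 0.25), ("summarize", 0.25),
        ("contact third parties", 0.25), ("publish", 0.25)]
done, status = run_plan(plan)
print(done)    # ['gather data', 'summarize', 'contact third parties']
print(status)  # halted at 'publish': cumulative harm pattern
```

No single step in the example comes close to the per-step limit, which is exactly the case the cumulative check exists to catch.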
Anthropic has done pioneering work on AI safety. The challenge is that even the most brilliant and well-intentioned researchers at one company cannot accurately represent the values of the more than 8 billion humans on the planet. An inclusive, open-source architecture can accommodate every ethical perspective within a democratic framework that gives each person a voice. Many people treat building AGI as a technical problem, and the engineering challenges are real. But once AI is far smarter and more powerful than we are, the outcome turns on values. That is why the values have to be built in from the start and come from millions of people.
Once AGI far exceeds us, no design can guarantee it stays aligned with human values. The most any design can do is improve the odds, and my design aims to improve them as much as possible. More on this in the next and final post of this series.
This series draws on White Paper 2: Ethical and Safe AGI. Read it in full to see how every piece fits together!
If this made you think, subscribe to Superintelligence at read.superintelligence.com so you don’t miss what comes next. And if someone in your life needs to understand where AI is heading, send this to them.



