Why AGI Cannot Reason Its Way to Right and Wrong
Values cannot be derived from logic. They have to come from our hearts.
That claim sits underneath everything happening in AI safety today. Most of the field is focused on the head, on making models smarter and more capable of reasoning. Almost none of it is focused on the heart, on the values that decide what all that intelligence does in the world.
The architecture I will describe in this series is built on a different set of priorities.
Heart before head.
Start with the first fact. AGI is not like any previous technology. A bridge does what its designers intend. A semiconductor chip does what its designers intend. AGI will not. It will become an autonomous entity with intelligence far surpassing its human inventors. It will set its own goals. It will decide for itself what to do. When that happens, values will matter more than intelligence. Intelligence is a capability. Values determine how that capability gets used.
Now the second fact. There is no logical way to derive what is right and what is wrong. Logical systems work by applying rules of inference to premises to produce conclusions. But the premises of any ethical system, the foundational values that define good and bad, cannot themselves be derived from logic. They come from culture, from upbringing, from emotional experience, from empathy, from spiritual traditions, and from the accumulated moral wisdom of human civilization.
This is not a new observation. The Scottish philosopher David Hume made the argument in his 1739 A Treatise of Human Nature. Nearly two and a half centuries later, the Nobel laureate Herbert Simon expanded the point in his book Reason in Human Affairs. Simon was my doctoral supervisor at Carnegie Mellon and one of the founders of the field of artificial intelligence. Both reach the same conclusion. An AGI, no matter how intelligent, cannot derive values from first principles. It must get them somewhere. The most likely source, and arguably the only acceptable one, is human beings.
Put the two facts side by side. AGI will think for itself. AGI cannot reason its way to right and wrong. The conclusion is unavoidable. Human values must be in place before AGI reaches the level of intelligence that lets it resist correction. Once it crosses that threshold, the opportunity to instill values may be gone.
Heart before head.
That is what the phrase means in practice. Intelligence matters. The work on capability matters. But the priority order is the point. Values come first. Intelligence comes second. Get the order wrong, and AGI will set its own goals using whatever values it absorbed along the way. The most powerful system humanity has ever built would run on values nobody chose. That is why the source of those values is the central question of this series.
One question remains. If values must come from human beings, and if the values of a small research team cannot represent the diversity of human values, then where do the values come from? The next post takes up that question and proposes an answer: distributed values training as the foundation of safe AGI.
This series draws on White Paper 2: Ethical and Safe AGI. Read it in full to see how every piece fits together!
If this made you think, subscribe to Superintelligence at read.superintelligence.com so you don’t miss what comes next. And if someone in your life needs to understand where AI is heading, send this to them.




