I recently went to RLC and had the honor of listening to Richard Sutton’s presentation on the future of reinforcement learning and the OAK architecture for superintelligence. After the talk, I wrote him an email and continued to think about the nature of superintelligence.
If we start thinking about how to design superintelligence, we can start to think about designing something dynamic, generative, and beyond human cognitive limits. It should be able to form complex systems from simple beginnings and produce emergent properties and behaviors. Such a design must follow a kind of razor's-edge principle: it has to be simple enough to leave the degrees of freedom for the rules to play out. Simplicity is not a limitation here but the foundation for complexity to emerge.
From this perspective, I asked myself whether the reward for a superintelligence can truly be designed by humans, or whether it must come from the real world itself. If such a reward or goal existed, what might it be? I have long suspected that a simple, fixed goal or reward might be all you need. My thinking comes from an analogy with biological evolution. If we speculate, the "reward" of evolution may be for genes to persist against time and entropy, with organisms as their hosts. The goal is simple, fixed, and unchanging. It does not need to be engineered or altered. From such a simple foundation, life has developed in all its complexity, and intelligence has emerged.
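The evolutionary analogy can be sketched as a toy loop: a single fitness function that is never redesigned, with blind mutation and selection doing all the work. This is a minimal illustrative sketch, not anything from the talk; the genome, target, and fitness function are all hypothetical stand-ins for "persistence against time and entropy."

```python
# Minimal (1+1) evolutionary loop with one fixed, unchanging objective.
# Everything here is a hypothetical stand-in for illustration.
import random

random.seed(0)

TARGET = [1] * 32  # stand-in for the single, fixed evolutionary "reward"

def fitness(genome):
    """Fixed reward: how well the genome matches the unchanging target."""
    return sum(g == t for g, t in zip(genome, TARGET))

genome = [random.randint(0, 1) for _ in range(32)]
start = fitness(genome)

for _ in range(5000):
    # Blind trial and error: flip one random bit.
    child = genome[:]
    i = random.randrange(len(child))
    child[i] ^= 1
    # Selection against the same fixed goal, generation after generation.
    if fitness(child) >= fitness(genome):
        genome = child
```

The point of the sketch is that nothing about the goal is ever re-engineered; all apparent "direction" comes from trial and error against a fixed criterion.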
People often say that a human baby learns to walk in only a few years, while a reinforcement learning agent may take millions of steps. But human babies have not really learned to walk in just a few years. They are the product of millions of years of evolution and countless trials and errors. Seen in that light, perhaps reinforcement learning agents are learning remarkably quickly.
The same principle appears in the universe. The laws of physics can be extremely simple and remain fixed for unimaginably long periods, yet they give rise to worlds and life of immense complexity and diversity. If someone were to design a universe, the most effective approach would not be to create a complicated architecture, but to set the simplest possible laws and let trial and error, possibility, and uncertainty take their course. There is even the possibility that our own universe was designed or simulated by others, and one can find suggestive hints for this idea. For example, why do we have the Planck constant? Could it be the floating-point precision limit of the simulation, the finest resolution it can represent? Why is there a limit on the speed of light? Perhaps because the system cannot afford to calculate interactions between every pair of particles in the universe, so it imposes a constraint: no causal effect can propagate faster than light. Why did the universe have a Big Bang? That could be nothing more than an initial condition set at the start. The motives of such a higher civilization or species could be as simple as: "If I give this simulation an initial condition and a set of basic physical laws, what kind of world will it evolve into? Will it produce intelligent species like humans, and will those species create their own simulations to satisfy their curiosity?"
The idea is hardly new. Many people have thought along these lines, especially since the computer era began. Human beings are skilled at drawing analogies. During the Industrial Revolution, with the rise of mechanical devices, researchers in cognitive science and psychology often linked the human mind to a working machine. Once computers and programming entered our lives, more people began to consider the possibility that our world could itself be coded.
If we follow this analogy to reinforcement learning, we might conclude that the design of superintelligence should follow the rules of designing the universe. The laws should be simple, the reward should be fixed, and the system should be left to evolve. With enough computational resources, complexity and intelligence could emerge without explicit planning or engineered representations.
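As a toy illustration of this claim (simple fixed laws, no engineered representations, complexity left to emerge), an elementary cellular automaton is a convenient sketch. Rule 110 below is a standard example, and is even known to be Turing-complete; nothing in this snippet is part of OAK itself.

```python
# Rule 110: a fixed 8-entry lookup rule applied uniformly to every cell.
# From a single live cell, an intricate, aperiodic pattern emerges.
RULE = 110  # the rule never changes while the system runs

def step(cells):
    """Apply the fixed local rule to every cell simultaneously."""
    n = len(cells)
    out = []
    for i in range(n):
        # Each cell's next state depends only on itself and its two
        # neighbors (wrapping at the edges).
        neighborhood = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        out.append((RULE >> neighborhood) & 1)
    return out

# Minimal initial condition: one live cell, the rest empty.
width = 64
row = [0] * width
row[width // 2] = 1

for _ in range(32):
    print("".join("#" if c else "." for c in row))
    row = step(row)
```

The design choice mirrors the argument above: all of the structure in the output comes from a fixed law and an initial condition, not from any planning or representation engineering added at runtime.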
From this, one conclusion appears almost inevitable: the ultimate goal or reward for such a system should be simple, fixed, and free from human engineering. But what could that reward actually be? In both evolution and the universe, if there is a goal driving them, it would have to be singular and unchanging. Evolution moves toward such a goal through endless trial and error. The direction is steady, but the convergence is slow, on a scale that feels impossibly long to humans. If the goal were to change, evolution could not sustain a process of development over time.
Yet for all our searching, we may never answer the deeper question: what is the meaning of existence? If the universe has a purpose, what is it? And if a superintelligence were given only this ultimate reward, what form could it possibly take? It might be as unreachable to our understanding as the supposed "goal" of the universe itself.
In OAK, Sutton also described the need for a stronger algorithm, a planning component to be executed at runtime, and better feature representation and engineering. It was at the mention of planning that I began to wonder whether, in the analogy of the universe and evolution, such a component was necessary at all.
Planning could arise as an emergent property of runtime processes rather than as something built into the initial design. Up until that point, everything in the talk aligned with my little "design a universe" analogy. Once planning entered, however, the analogy shifted. Planning fits most naturally with human cognition, and that raises two important questions. First, is human cognition truly the best model for designing superintelligence? What about the cognition of octopuses, which evolved along a completely different path? Second, if we use human cognition as the anchor, should we not also include other components that are equally fundamental?
If human cognition is the model, planning becomes easy to justify. Human intelligence is rooted in prediction. We are constantly making predictions about the world, testing them against feedback, and trying to improve them. This could be why children tend to be more curious: with an incomplete mental map of the world, they must constantly explore to refine their predictions. By adulthood, many people come to believe, not necessarily correctly, that their mental model is sufficient to handle uncertainty or non-stationary rewards, and their drive to explore decreases. From this point of view, an off-policy planning framework is entirely reasonable. But does a superintelligence have to follow the human blueprint? Octopus intelligence may not be what humans aspire to, yet it demonstrates that intelligence can emerge outside the human cognitive framework.
If reinforcement learning is taken as an analogy for evolution and the birth of the universe, the planning component in OAK begins to look less essential. If it is included solely because the design draws on the human brain, especially in its approach to feature representation, then the architecture could arguably go further. Humans, as social beings, have two extraordinary abilities: collective wisdom and an intersubjective reality. Knowledge is shared across generations and among peers, allowing new achievements to be built on the work of the past. No individual could have created the internet, artificial intelligence, or skyscrapers without this inheritance. Our technological progress is the product of a multi-agent system in which emergent properties arise that surpass the intelligence of any single agent. If OAK is to be modeled on human intelligence, then multi-agent interaction seems to me not just beneficial, but necessary.