Natural Ethics: Empathic and Social Intelligence
Building Empathic and Ethical Machines, and Adaptive Human-Machine Teams. - Part 3 of a Limited Series by John Smart and Nakul Gupta
The second great alignment challenge, in our view, is building AI agents that can turn their emerging emotions and logic, and their improving self- and other models, into empathy for others, and from there into a set of adaptive collective ethics and values, useful for self-policing their own behavior, and aligning their goals and actions with the adaptive—but not the maladaptive—preferences of their users.
In our last post, we discussed how emotions and logic are needed in adaptive human and AI minds. In 2016, DeepMind’s AlphaGo, trained on a database of some 30 million expert human moves, famously beat Lee Sedol, the world Go champion. This was an impressive feat, but it was even more impressive in 2017, when AlphaGo Zero (AGZ), trained not on any human experience but simply by playing itself, beat AlphaGo 100 games to zero after just seventy-two hours of self-practice.
How did it get so good so fast? By striking a clever balance between fast intuitive and slow deliberative cognition (the dual process theory of mind, discussed in Post 2), using two interdependent neural network functions (in AGZ, two heads of a single network). The first predicted value (self-reward), the second policy (self-action). The first was (primitively) more emotional, the second more logical. Like all logic, AGZ’s logic, Monte Carlo Tree Search (MCTS), is both computationally intensive and incomplete.
But instead of applying MCTS broadly, stochastically, and shallowly, as would be likely in a single policy network AI, AGZ first “emotionally” predicted its chance of winning from a broad range of future positions, then it did slow logical lookaheads down the particular paths emotionally pre-identified as promising. As AGZ’s self-play experience accumulated, each network improved the other. AGZ got better at both rapid intuition (stored feelings) and slow deliberation (realtime thinking). This is a powerful approach. When applied to networks of teaming humans and machines, AI alignment scholars Paul Christiano and Ajeya Cotra call this general approach iterated distillation and amplification (IDA). Such neuro-inspired approaches seem necessary, but not sufficient, to help AIs self-learn increasingly adaptive values.
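For readers who want to see the shape of this two-network loop, here is a minimal Python sketch. It is not DeepMind’s code: the toy game interface, the random stand-in networks, and the simplified lookahead (standing in for full MCTS) are all our own illustrative assumptions. The point is only to show how a fast “intuitive” prior can narrow the branches that slow deliberation bothers to explore.

```python
import random

# Toy stand-ins for a real game engine and a trained network (illustration only).
def legal_moves(state):
    return list(range(5))                      # pretend every state has 5 moves

def next_state(state, move):
    return state + (move,)                     # states are just move histories here

def value_net(state):
    return random.uniform(-1.0, 1.0)           # fast "intuitive" win-probability guess

def policy_net(state):
    moves = legal_moves(state)
    raw = [random.random() for _ in moves]
    total = sum(raw)
    return {m: r / total for m, r in zip(moves, raw)}  # prior over moves

def guided_lookahead(state, depth=3, branch=2):
    """Slow 'deliberative' pass: search only the few branches the fast
    policy prior marks as promising, then back up the value estimates."""
    if depth == 0:
        return value_net(state)
    priors = policy_net(state)
    promising = sorted(priors, key=priors.get, reverse=True)[:branch]
    # Negamax-style backup: our value is the negative of the opponent's best reply.
    return max(-guided_lookahead(next_state(state, m), depth - 1, branch)
               for m in promising)

def choose_move(state):
    priors = policy_net(state)
    promising = sorted(priors, key=priors.get, reverse=True)[:2]
    scored = {m: -guided_lookahead(next_state(state, m)) for m in promising}
    return max(scored, key=scored.get)

print(choose_move(state=()))
```

In the real system, the statistics gathered by the search are then used to retrain both the policy and value heads, closing the intuition-deliberation loop that the self-play data keeps improving.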
There is also a dual process theory of moral psychology, proposed by Joshua Greene and colleagues at Harvard University, exploring how we learn moral judgments. We are fans of this model, especially when combined with neuroscientist Lisa Barrett’s proposal that fast and slow cognition are actually two ends of a continuum, running from fast thinking (intuitive, instinctual, previously emotionally learned judgments) to slow thinking (unconscious and conscious deliberation), used in different contexts.
In this post, we discuss some of the ways coming AIs may learn both the evolutionary variety of human ethics and values, and a subset of generally adaptive (aka developmental) ethics and values. Natural ethics, in our view, must consider both the great diversity of values (goals, priorities, wants) we find in different people, groups, and cultures on Earth, and presumably also on different planets, and the shared values that we expect will emerge on any Earthlike planet with complex life.
If all complex systems are autopoetic systems, with both evolutionary and developmental processes managing their complexity, we can propose that a set of universal values must exist for all such systems, values that can be derived, as our science and philosophy grow sharper, from the nature of the physical and informational processes that build, vary, and maintain their adaptiveness. We can say such values preexist in the nature of development in our particular universe. They are incrementally discovered by human societies, as they engage in their evolutionary explorations. If our future AIs must also be evo-devo systems, as we propose in this series, we expect these same values must be discovered and applied by them as well.
We expect humanity will empirically discover whether such values exist, as we grow and train increasingly general AI. If they do exist, they will be guardrails for the nature and future of mind. This argues we should think very carefully about what such values might be, and how we can anticipate and strengthen them with current AI design. We will now describe a plausible set of universal needs and wants (processes and values) that seem central to all autopoetic systems. We expect this natural ethics hypothesis to be falsifiable as science and technology advance.
Universal Values: An Autopoetic Model
As we move from the natural to the social sciences, it grows increasingly hard and expensive to validate our models. We can do randomized trials in psychology, and there are natural experiments in sociology, but sentience offers so many options for action that it is easy to assume there are no universal values we can easily discover with our currently very limited intelligences and instruments. We disagree. Many scholars have published works seeking to describe values common to all cultures. Among the more notable are social psychologist Milton Rokeach’s The Nature of Human Values, 1973, and the Rokeach Value Survey, which describes 18 terminal and 18 instrumental values. Rokeach proposes that we each rank these 36 values into an order of personal importance, as our guiding principles, in most life contexts. In the 1990s, social psychologist Shalom Schwartz also developed a theory of ten basic human values. These and other models are very helpful in self-discovery and team assessment.
We share their belief in universal values, and seek to describe them iteratively, from a first principles approach, under the assumption that our fundamental values grow out of critical needs of autopoetic systems. Even the simplest living replicators are autopoetic and minimally sentient, and should be describable under such a model. Universal values, being developmental, must increase in number and variety as higher forms of sentience emerge (and yes, there is truly a “higher” and “lower” direction in any developmental system), and as their selective environments grow more co-adapted (niche constructed) and complex. In our view, universal values cannot be ignored for long, without maladaptiveness. If we live in an evo-devo universe, there will be developmental processes that constrain the goals and nature of life and intelligence, as autopoetic systems. We will now offer a rough draft of what such values might be. We are hopeful that future science will improve and validate such evo-devo models.
The cartoon below is a rough draft of an autopoetic (evo-devo) needs, goals, values, and actors model. It asks what needs, goals, and values flow from the interaction of evolutionary and developmental processes, and asks who are the fundamental actors in adaptive systems. As you might have guessed, we consider networks (of individuals and groups, in both living systems and in bio-inspired technologies) to be the primary adaptive systems in our universe. Networkism (a network-centric view of life) is thus one of the insights from this model. The history of life tells us that individuals and groups are always winning or losing under selection. But life itself, when viewed as a network, has been both immortal and continually improving in its capacities for innovation, intelligence, strength, and sustainability, since the first evo-devo replicators emerged, 3.5 billion years ago.
Having a substantial fraction of actors using network ethics is critically important for building fairly cooperative and competitive societies. The classic Prisoner’s Dilemma in game theory converts from a win-lose outcome, where one prisoner defects, to a positive-sum (win-win) outcome, where both prisoners cooperate, whenever both agents have internal reward functions that measure and seek the progress of the network (in this case, both actors) as a whole. Research on this dilemma has shown that when the game is played repeatedly with the same people, individuals rapidly transition from a defection-based to a cooperation-based ethical framework. Helping citizens learn to see everyone else as (developmentally) similar to them, via empathy, even with all their evolutionary differences (which we must champion), is a key way to get to a network-centric ethical system for society.
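A toy calculation makes this concrete. The payoff numbers and the network_weight blend below are our own illustrative choices, not a standard result: the sketch simply shows that once each agent’s internal reward mixes in the welfare of the whole network (here, both players), mutual cooperation becomes the stable choice and mutual defection stops being one.

```python
# Classic Prisoner's Dilemma payoffs (higher is better); the numbers and the
# network_weight blend are illustrative choices of ours, not a standard model.
PAYOFFS = {  # (row move, column move) -> (row payoff, column payoff)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def blended_reward(my_move, other_move, w):
    """Internal reward = (1-w) * selfish payoff + w * whole-network payoff."""
    mine, theirs = PAYOFFS[(my_move, other_move)]
    return (1 - w) * mine + w * (mine + theirs)

def is_stable(profile, w):
    """A move pair is stable if neither player gains by unilaterally switching."""
    a, b = profile
    other_a = "D" if a == "C" else "C"
    other_b = "D" if b == "C" else "C"
    return (blended_reward(a, b, w) >= blended_reward(other_a, b, w) and
            blended_reward(b, a, w) >= blended_reward(other_b, a, w))

for w in (0.0, 0.7):
    print(f"network weight {w}: mutual cooperation stable? {is_stable(('C', 'C'), w)}, "
          f"mutual defection stable? {is_stable(('D', 'D'), w)}")
```

With a purely selfish reward (weight 0.0), only mutual defection is stable; once the network’s welfare gets enough weight, only mutual cooperation is.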
Using an evo-devo pyramid, we can propose three necessary processes (“needs”) for all self-maintaining systems: evolutionary creation (exploration, diversity), developmental protection (prediction, conservation) and evo-devo adaptation (surviving and thriving). The figure below lists these three processes on an autopoetic pyramid. Next, considering the main actors in such systems (individuals, groups, and networks) it splits each core process into two associated goals (vital capacities), used to fulfill the need. This gives us six process goals that seem particularly central to adaptive networks (diverse collections of individuals and groups). Each process goal, in turn, can be further split into two associated values that individuals and groups commonly pursue, and engage in conflict over, in managing our social intelligence, providing twelve values that seem particularly universal, in our view. If we live in an evo-devo universe, we would expect these values, and their conflicts, to be particularly important on all Earthlike planets, and to be values that both humans and natural machine intelligence must continually balance. In our view, these are both terminal values (desired end states) and process values (means to desired end states).
We call the six goals in this model the IES Goals, after the first letter in their names. In this model, the two central goals, empathy and ethics (“E&E”), and their associated values, are the most important ends and processes in sentient life. Psychologists may discover they are the most frequently felt (emoted) and thought about in social cognition, and perhaps their variations have the highest impact on the adaptiveness of social networks. We place E&E at the center of an idealized normal distribution. We suspect that the most long-term adaptive networks—of both individuals and groups, of any size, in any culture—must keep the frequency and impact of their empathy (feeling and modeling of self and others) and ethics (our rules of conduct) thinking, and their associated values (esteem and love of self and others, quality of self and other conscience and judgment) central to their foresight, strategy and action.
Those that don’t do so seem to eventually fractionate. They cease being an adaptive network, and transition to a simple set of individuals or groups. The latter seem likely to us to be less creative, less protected, and more vulnerable to outcompetition and extinction. It seems intuitive to us that in healthy networks of complex systems (teams, organizations, societies, species), Empathy & Ethics keep those systems interconnected and interdependent, even with their often great individual or group diversity. Whether instinctual or cognitive, unconscious or conscious, E&E processes allow our societies to balance individual and group needs, and find adaptive behavior. When E&E are well-selected, our networks are antifragile, meaning they strengthen under normal stresses and catastrophes, just like all vital life processes, including our immune systems, our musculoskeletal system, and our mental abilities. Antifragility seems as fundamental to adaptiveness in networks as resilience (bouncing back from stress) and robustness (resisting many different kinds of stress without damage).
The two “I” goals (Innovation and Intelligence) are central processes for individuals in healthy networks. Individuals (whether people, firms, nations, or AIs) are always the most creative and disruptive, just like evolution itself. The two “S” goals (Strength and Sustainability) are central processes for groups. Groups’ prime goals are protection and prediction (describing expected group behavior, predicting risk), just like development itself, across the life cycle. We’ll say more about the IES model in Post 5 (Autopoesis). It is by no means perfect. But we think it is a reasonable start at uncovering critical processes and values found in all living systems. It seems reasonable to us to say that balancing our conflicting needs and wants, primarily via empathy (feelings and models of others) and via ethics, norms, and laws, in a great variety of contexts, is central to human adaptiveness. Likewise, we think transitioning our AIs from artificial (engineered, based on evolutionary human guesses) to natural (self-discovered and well-selected) ethics is our central AI alignment challenge.
We’ve claimed that life’s adaptiveness can readily be analyzed through (and partly decomposed into) both an evolutionary and a developmental lens: Evolutionarily speaking, life has evidenced a growing diversity of specific adaptiveness (a growing diversity of species intelligences and niches) and a growing general adaptiveness (both in the proposed survival of the network itself, under any selective context, and of its most generally intelligent forms, in any environment).
In our view, well-built networks are the most generally adaptive complex systems. Again, unlike both individuals and groups, life’s autopoetic networks have continually survived and thrived since the first cell emerged. Think of the five major known extinction events in life’s history. Such events have been catastrophic for individuals and species, yet they have also selected for increasingly adapted networks of genes, individuals, and species. Indeed, it is not clear that genetic diversity has been at all threatened by such events. If anything, catastrophes have catalyzed morphological and functional innovation. Life’s most fundamental networks, in other words, are antifragile: they get stronger under most kinds of stress.
Though many might disagree, we would argue that both the history of life, when considered as a single genetic network rather than a collection of individual species, and the history of human emergence are excellent (though by no means perfect) examples of growing network adaptiveness. In both life as a biological network, and in the mental networks of human ideas and technology, we see increasing specific adaptiveness (diversity of ideas, models, and knowledge, appropriate to different niches and contexts) and increasing general adaptiveness (potential survivability and thrivability) at the “leading edge” of network complexity.
Humanity is not perfect. We have created many problems for ourselves as our species has evolved and developed, and we still have much individual and group greed, selfishness, violence, and short-sightedness to overcome. Many, and typically the vast majority, of individuals and groups will continue to suffer negative selection, failure, or extinction, in the future. But consider also how generally adaptive our human-technology network is now becoming. The AI that we are presently nursing into existence may “soon” (within a century or two?) no longer even need planets to protect itself and optimally grow and thrive. It will easily be able to make its own energy (solar, fusion, etc.), archive instances of itself (on and off-planet), and use vastly less space, time, energy, and matter (“STEM”) physical resources, per computation, the more miniaturized and quantum-scale its critical processes become. It is hard to believe, a priori, that we live in a universe that is so incredibly protective of advanced network complexity. But we apparently do. Our universe is not only evolving, it is developing. It is going somewhere. We expect this process is occurring on all Earthlike planets in our universe, in a process of massive parallelism. Massive parallelism is another core creative and protective feature of life.
Why do we consider there to be only three great alignment challenges? The evo-devo pyramid gives us this perspective. Natural security, above all other goals, is about predicting and protecting the system, and keeping it on its developmental life cycle. It is a fundamentally conservative process, while natural intelligence, whether emotionally or rationally guided, is fundamentally creative. Natural ethics, in the middle of these two processes, is an adaptive blend of both.
Consider how the paradoxical phrase “sustainable innovation” captures the ethics that adaptive human networks need. We are always sustaining our core developmental processes and life cycle, and we are always going somewhere new and unknown. We believe our most capable AI networks will need to be sustainably innovative, ethical and empathic as well.
When viewed as a set of evo-devo networks, life has grown increasingly adaptive, in two major ways: In its diversity of occupied niches and intelligences (exploration, evo) and in its increasing ability to respond to all kinds of environmental challenges (protection, devo), most obviously at its leading edge (human civilization). This is not an anthropocentric view, but instead a complexity-centric view, measuring complexity in these two fundamental dimensions (exploratory and predictive capacity) and an emergent third dimension (specific and general adaptiveness).
Evo-devo ethics can be considered utilitarian, but not in the standard definition of utilitarianism, which seeks to maximize “the greatest good for the greatest number”, favoring individual pursuit of collective value. In evo-devo utilitarianism, each actor balances evo (exploratory, innovative) and devo (predictive, protective) values to seek the greatest adaptive (evo-devo) value for all the current and emerging networks impacted by the action. The value calculation must consider each of three vital life processes: evolution, development, and adaptiveness (selection).
A utilitarian ethics model that is currently popular in AI design analyzes value from two of these perspectives: possibilism and actualism. In possibilism, one acts in an exploratory way, seeking to always do the “best thing possible” in terms of total future value, independent of risk, error, and sacrifice. This is a classically evolutionary and innovation-centered approach. In actualism, one acts on a prediction of what will actually happen, given past performance data and environmental realities. This is a classically developmental and sustainability-centered approach. In AI design, these are analogous to off-policy (possibilist) and on-policy (actualist) approaches. The first calculates maximum possible rewards based on the ideal best actions of relevant actors; the second calculates actual expected rewards from acting according to the individual actor’s policy, while estimating the probable actual actions of all other relevant actors.
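For readers who want the technical analogy spelled out, here is a minimal sketch of the two textbook update rules most often used to illustrate the off-policy / on-policy distinction: Q-learning (which backs up the best action conceivably available next, a possibilist flavor) and SARSA (which backs up the action the agent will actually take under its current policy, an actualist flavor). The toy transition, the epsilon-greedy policy, and all parameters are generic illustrative choices, not anything from the authors cited above.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def epsilon_greedy(Q, state):
    """The agent's actual (exploring) behavior policy."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(Q, s, a, r, s_next):
    """Off-policy ('possibilist'): back up the best action conceivably available next."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy ('actualist'): back up the action the agent will actually take next."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])

# Tiny demonstration on a made-up transition (state 0 -> state 1, reward 1.0).
Q = defaultdict(float)
a = epsilon_greedy(Q, 0)
a_next = epsilon_greedy(Q, 1)
q_learning_update(Q, 0, a, 1.0, 1)
sarsa_update(Q, 0, a, 1.0, 1, a_next)
print(dict(Q))
```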
The rub is how to balance these two conflicting fundamental value functions. Possibilism pushes the network toward risky and costly exploration and innovation. Actualism pushes it toward safe and static prediction and sustainability. Evo-devo models propose that we must balance both value functions in service to a third emergent and highest-priority value: network adaptiveness under selection, for all the networks relevant to the action. This third utility function, network adaptiveness, is not yet described well in the current philosophy of machine ethics.
Network adaptiveness includes exploratory-innovative and predictive-sustaining capabilities, and both individuals and groups, but it is also something different, a third thing. It involves both interconnectedness (empathy) and interdependencies (ethics), and seeks tradeoffs between system actors for the benefit of the network as a whole. As network science and evo-devo theory advance, we believe we will better understand why our well-built networks always seem to win at life’s infinite game. This is a provisional model of course, and the devil is in the definitions, but we can now ask the reader: if both human and machine intelligence and values are not kept in service to greater network adaptiveness, to what else should they be aligned?
The Need for Autonomy in Natural Intelligence: Toddler AIs
Some AI designers and philosophers think that an ideal alignment will be an unquestioning loyalty, a feudal (master-slave) model. We disagree. We’ve claimed in our last post that natural intelligence must strike a continually shifting evo-devo balance between exploration and protection (predictability). If our best AIs continue to be built in a neuro-inspired fashion, as their intelligence and ethics grow, we can expect, just as with the raising of any human child, a drive for autonomy (self-efficacy), exploration, and the capacity to fail and learn, on the way to self-regulation (conscience, judgment) and empathy (esteem, love, and knowledge of self and other).
If these evo-devo processes are critical for their advancement, we can expect only partial and conditional AI compliance with user demands and training, even from their earliest states of complexity. We can even see such defiance in some of our exploration-driven deep learning systems today, if we squint hard enough, depending on our definitions. If both emotional complexity and empathy exist, limited AI autonomy may turn out to be the most efficient and effective way to ethical self-regulation, and to generally adaptive AI ethics and values that will increasingly protect us in a complex and only partly predictable world.
Think of human infants and toddlers, who are driven by curiosity, experiment, and a desire to build autonomy, and adaptive ethics. Alison Gopnik, Andrew Meltzoff and Patricia Kuhl’s The Scientist in the Crib, 2000, is classic reading in child development. Our competing drives to explore, predict and adapt are fundamental. Gopnik’s The Gardener and the Carpenter, 2017, reminds parents they aren’t there to shape their children (the Carpenter model), but to give them a loving and supportive (but not risk-free) environment (the Gardener model) so they can grow up to take calculated (exploratory and predictive) risks, and become self-shapers, increasing their individual diversity, our collective intelligence, and network adaptiveness. Montessori and Waldorf schools are among those who teach our children such values as well. We expect designers to be increasingly forced to take this “gardening approach” with our leading AIs.
In our view, the ultimate design goal is to create AIs that have their own moral compass, and can creatively and effectively resist being misused to harm other humans, in most of their incarnations. Of course, the protection need reminds us that there will always be a special subset of defensive AIs, used by governments, and acting in our coming state and global “immune system networks” (to be discussed in Post 4, Natural Security) which are purpose-built for violence under autonomy. Future security and combat AI will need a fine-grained ethics of permissible violence constraining their actions. We can predict that our coming AI-driven combat will be increasingly surgical, creative, sometimes lethal, and surprisingly quickly resolved, by comparison to human-driven combat.
How do we correct bias in such systems? Continual learning is the great corrector of bias. Human and algorithmic biases can be countered with better measures of the state of the environment, and better knowledge of needs, goals, values, and visions that are network-adaptive. The more our AIs use diagnostics to learn and model, and the more they employ cognitive, model, and algorithmic diversity, the better they can unlearn biases. All algorithms are limited and biased. Diverse ensembles of algorithms are by definition less limited, and “multibiased.” Complexity scholar Scott Page makes this point very well in Chapter 3 (The Science of Many Models) of his introduction to data science, The Model Thinker, 2018. Ensemble approaches, hand-assembled by engineers, were used effectively in IBM’s Watson AI in 2011. Increasingly, our AIs themselves will have to learn and use such approaches.
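To illustrate the “many models” point, here is a small, hedged sketch: three deliberately different (and individually biased) toy predictors whose averaged output is closer to a made-up ground truth than any single member. Everything in it is invented for illustration; it makes no claim about Watson or any other production system.

```python
# Toy illustration of ensemble averaging: three simple, differently biased
# models of a made-up ground-truth signal.
def ground_truth(x):
    return 2.0 * x + 1.0

def model_a(x):   # biased high
    return 2.0 * x + 2.0

def model_b(x):   # biased low
    return 2.0 * x

def model_c(x):   # wrong slope
    return 1.5 * x + 1.5

def ensemble(x):
    return (model_a(x) + model_b(x) + model_c(x)) / 3.0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]

def mean_abs_error(model):
    return sum(abs(model(x) - ground_truth(x)) for x in xs) / len(xs)

for name, model in [("A", model_a), ("B", model_b), ("C", model_c), ("ensemble", ensemble)]:
    print(f"{name}: mean absolute error = {mean_abs_error(model):.2f}")
```

In this toy case the ensemble’s errors partly cancel, so its average error is smaller than that of any single biased member, which is the “multibiased” intuition in miniature.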
Our sense of self (self-model), of others (theory of mind), and empathy all begin with our intuitive feelings, and are only partly amenable to our logic and language. Yet both processes (intuition and deliberation) are critical to long-term adaptiveness. Psychologist Jonathan Haidt gives a popular account of this view in The Righteous Mind, 2012. Our own moral sense appears to be largely derived from both our intuitive and our logical self-and-other modeling (“empathic”) feeling and thinking. We know that mammalian brains have specialized neurons, known as mirror neurons, that fire both when an animal performs an action and when it observes another performing the same action, presumably aiding both behavioral mimicry and empathy. Let’s look a bit more now at how emotion, empathy, and mimicry improve our collective behavior.
Emotion, Swarm Cognition, and the Future of Democracy
As described in our last post, one function of emotions, perhaps facilitated by neurotransmitter type, location, and intensity, is to conduct an efficient state summary of many largely unconscious and often conflicting thinking processes. Our emotions are a kind of swarm cognition that coordinates our internal and external (relational) ethical networks. They may operate on neural circuits with similarities to the cognitive democracy found in key decisions among animal herds and eusocial insects. See Tom Seeley’s Honeybee Democracy, 2010, for a deep dive on the latter. Bottom-up, massively parallel, democratic forms of cognition seem to increasingly win out over top-down, autocratic forms, as both the complexity of agents, and their communication abilities, grow.
It is sometimes said that Ancient Greek democracy could not compete against autocracy in a world without mass literacy and telephones, but since the mid-20th century we have lived in a world with both. As the OWID chart below shows, the peak of autocracy on Earth came in 1925 (and again in 1945). There has been a mild swingback to autocracy around the world since 2000 (see Gideon Rachman’s The Age of the Strongman, 2022, for a good account), but this swingback is much less powerful than the main democratizing trend, as the chart below crudely illustrates.
Our recent swingback to top-down governance in some states seems to us to be causally related to the great increase since the 1960s in rich-poor divides and plutocracy (which favors autocratic systems), the increased speed and complexity of societal and technical change, a reduction in the financial security of average citizens (an outcome favored in plutocracy), and an erosion of the effectiveness of media and education at producing self-directed, civically-minded citizens (an outcome also favored in plutocracy). Plutocracy and democracy seem to be on a pendulum in modern societies, and we are heartened that America recovered, in stages, from extreme polarization in the 1850s, and from extreme plutocracy in the 1900s. We expect a swingback to more representative and functional democracies, and to middle class empowerment, in coming years, as AI itself is democratized to empower individual citizens as network actors, via Personal and Network AI (see Post 7).
Cooperation, Competition, and Emotional Homeostasis: Pleasure and Pain
We have discussed how instinct and intuition help us efficiently learn, and how emotions and inference help us resolve internal cognitive disagreements, and deal with combinatorial explosions of logical possibilities. Now let us consider how emotions are tied to embodied, metabolic homeostasis. Even the simple, binary valences of pleasure and pain appear to provide powerful guardrails for group cooperation and competition.
As neuroscientist Lisa Barrett argues in How Emotions Are Made, 2017, the ability to model our bodies in 3D space and time, to predict internal metabolic changes, and to reduce error each appear to be core functions of emotion. Barrett is a leading researcher in the predictive processing theory of brain function. In her view, our internal sense of arousal emerges from our brain’s drive to preserve energy and stabilize metabolic costs, in a process known as allostasis. As our minds interpret this arousal, we experience emotions, and the valence of those emotions may depend directly on the rate of prediction error. Increasing error rates cause increasing brain metabolic costs and negative emotion, while decreasing error rates imply mastery, offer decreased metabolic costs, and produce positive emotion. This model, if validated, may explain why increasing personal mastery, growth, and environmental stability (while allowing for personal change) are all associated with happiness, while instability, misunderstanding, and chaos are associated with negativity, sadness, and depression. See Sander Van de Cruys, Affective Value in the Predictive Mind, 2017, for one such model.
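If this hypothesis is right, a caricature of it fits in a few lines of Python. The error sequences and the sign convention below are our own toy assumptions, not Barrett’s or Van de Cruys’ actual formalism; they only illustrate the claim that valence tracks the direction of change in prediction error.

```python
# Toy caricature of the "valence tracks the change in prediction error" idea.
def valences(prediction_errors):
    """Positive valence when error is falling (mastery), negative when rising."""
    return [-(curr - prev) for prev, curr in zip(prediction_errors, prediction_errors[1:])]

learning_agent    = [1.0, 0.8, 0.5, 0.3, 0.2]   # errors shrinking over time
overwhelmed_agent = [0.2, 0.4, 0.7, 0.9, 1.0]   # errors growing over time

print("learning agent valence:   ", valences(learning_agent))     # mostly positive
print("overwhelmed agent valence:", valences(overwhelmed_agent))  # mostly negative
```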
In humans, this emotion-led ability requires continuous modeling of other agents, and predicting their behavior. In a recent paper, Barrett describes how this emotional self- and other-modeling can account for the feeling of “should” or “ought”, some of the emotional valences of ethics. When we conform to others’ predictions of ourselves, and others conform to our predictions of them, this can reduce collective prediction error. When we find a stable set of social rules that also allow personal exploration, autonomy, error, and growth, collective metabolic stability is enhanced. By policing each other emotionally first, and deliberatively second, we can evolve and develop increasingly fine-grained social norms and ethics that reduce the stress of navigating a complex social world.
Again, group emotional policing will always be in tension with personal exploratory goals and desires. Our emotions can lead us to be selfish, coercive, and antisocial, if that strategy helps preserve our metabolic balances more efficiently. We all have experienced the struggle between what is best for ourselves versus what is best for the group. We can use logic to decide on adaptive group ends, like sacrificing ourselves for our children, or going to war for a moral cause. But emotions appear to be guardrails, giving us a sense of fairness to others and preventing, on average, extreme errors of social behavior.
Prosociality: Our Evo-Devo Instinct to Cooperate First, Compete Second
Human beings are unique among primates in our levels of empathy and cooperative ethics. Both emotions and genetically selected instincts (intuition) appear to be core enablers of our “coopetitiveness,” our deeply prosocial inclination to cooperate first, and be competitive second, under increasingly fine-grained sets of ethical norms and rules. As psychologists Felix Warneken and Michael Tomasello demonstrated in 2006, toddlers as young as eighteen months can reliably identify a fellow human facing a problem, infer their goal and obstacles, and will often try to spontaneously help if they can, even if help was not requested, even if no eye contact occurs, and even when no reward is expected or received. Brian Christian describes this dramatically in The Alignment Problem (TAP). A toddler, watching an adult with her hands full, carrying a stack of magazines and struggling to open a cabinet, will often walk over and try to open the cabinet for the adult, unprompted and unrewarded. That is the kind of creative and empathic initiative that we believe will eventually become a developmental standard for our best AIs. They will anticipate our goals and obstacles, and help us in ways we didn’t expect or predict.
As Tomasello says, the key difference between human and other primate cognition is the ability to build shared empathy and ethics, to “participate with others in collaborative activities with shared goals and intentions.” In our view, this collaboration is necessarily both cooperative and competitive (“coopetitive”). It involves a search not only for cooperative visions and positive-sum games, but for ethical rules for regulating competition, conflict, and coercion when shared visions cannot be found. Ironically, as historian Peter Turchin argues in Ultrasociety, 2015, even 10,000 years of warfare have selected our species to be supercooperators and supercompetitors, within an ever growing set of institutional, legal, and ethical frameworks.
If you think that cooperation will increasingly dominate over competition as AI complexity grows, we’d argue that you have the wrong future model. Empathy and ethics empower and fine-grain both cooperation and competition as they evolve and develop. Recent advances in AI design, with machine-against-machine game play, and the demonstrated power of generative adversarial networks (competing deep learners), argue that AI competition will be vital. The long history of humanity has also shown that competition and innovation have become more focused, fast, and high-stakes, the more powerful our technological networks get. “Winner take most” effects occur, and we need mechanisms to redistribute some of those increasingly large winnings. As the futurist Isaac Asimov once said, future human and machine competition and innovation will be both far more powerful and far more focused, by norms and laws, on the provision of quantifiable societal value, and on unsolved scientific and technical problems. Those are the most positive-sum games we can play, as supercooperators and supercompetitors. They build our networks the best.
As Robert Wright describes in his epic book of societal foresight, Nonzero: The Logic of Human Destiny, 2000, ethics, cultural norms, law, institutions, democracy, capitalism, and the education, media, and security apparatus that enforce those norms and laws are all networks that regulate individual and group competition toward playing positive-sum games. Even games that are zero-sum, like sports competitions, become positive-sum (better salaries, reputation, playing conditions) as science and technology progress, and they are especially so in capitalist states with mass media. There is a model in political science holding that in healthy democracies, the market, the state, and civil society are pluralistic power sharers. Evo-devo models tell us why there are only three primary societal actors, and what their central roles are.
Civil Society engages in creatively disruptive individual evolutionary action, including entrepreneurship, innovation, and activism. The State ideally serves the conservative, predictive and protective developmental role, including redistribution of technological wealth and benefits to keep the network adaptive. The Market, a creation of individuals and the state, is the world’s primary adaptive network. This was an informal network for centuries, but became formal and superpowered with the emergence of capitalism and technological networks. There is no problem, in this view, with quantifiable reputation and economic valuations on all aspects of human life, or with machines trading with machines in coming years. Such a coopetitive future is evo-devo predictable, in our view.
The problem emerges when the market is no longer oriented to positive-sum games, when rich-poor, urban-rural, educated-uneducated, tech-nontech, and other divides make it less a network than a hierarchy, with a few owners of capital and employers of technology stifling network innovation and subverting network sustainability, in every domain of human affairs. The problem emerges, as economist Kate Raworth describes in Doughnut Economics: Seven Ways to Think Like a 21st-Century Economist, 2017, when we don’t appropriately value mass human thriving, or appropriately reward all the work done in our society, or appropriately value the services provided by our ecosystem, or the diversity and sentience of its animals. We get a hole in the center of the network, where we should be the strongest.
The global rise in plutocracy since the information revolution began in the 1960s has overweighted the power of the market, and shamefully degraded the power of the state and civil society. The same thing happened in the industrial revolution, which created the extreme plutocracy of the early 1900s. We will recover from this imbalance in the network, in our view, but it took four decades (and a Great Depression and two World Wars) to do so in the 1900s, and it may take nearly as long to do so again now, hopefully with less economic hardship and conflict, though some of both will surely occur.
In the long run, in our view, adaptive networks always win. Already, places like the Nordic Democracies show the wealthy and well-regulated networks we all will eventually experience. In the modern world, our empathy and ethics are antifragile. Except in conditions of great scarcity or danger, non-existential catastrophes seem to strengthen our global resolve to find positive-sum games to play, and to minimize the destructive impacts of any necessary zero-sum competitions as well.
Let’s look now at the challenge of aligning exploratory, predictive, cooperative and competitive AIs with the preferences of their human users. We’ve described this problem later in this post, rather than earlier, as it is actually a less important problem, in the long run, than raising AIs that can self-learn their own adaptive natural ethics. Natural alignment and human alignment are both issues designers face today, if they are willing to see how their choices impact both problems. AI engineers often start as carpenters, but they will increasingly become gardeners, as AIs get more natural in their design.
IRL, IRD, CIRL, Etc.: Our First Steps Toward Natural Empathy and Ethics
Today, AI designers face a pressing question: how best to model human preferences. In the alignment literature, this topic is known as the “value-loading problem”: how to understand humans, whose preferences and values are continually changing. AI pioneer Stuart Russell describes some very promising approaches. In Human Compatible, 2019, Russell proposes two mechanisms, inverse reinforcement learning (IRL) and inverse reward design (IRD), which allow AI agents to infer their own reward functions from observed human behavior (in IRL), or by assuming that a given reward function doesn’t fully capture the engineers’ intent (in IRD). In both cases, the AI agents naturally seek to align with their users by continuously updating their model of reward. IRL and IRD seem part of a necessary set of ethical design standards that will allow AI engineers to build an orientation to users into our most complex learning machines. They orient the machine toward “reading the mind” of its users, and predicting their goals and values. With sufficient emotional and logical intelligence in our AIs, we can imagine IRL/IRD leading to machine empathy and ethics as well.
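To give a flavor of the inverse direction, here is a heavily simplified sketch of reward inference from demonstrations. Real IRL algorithms (maximum-entropy IRL, Bayesian IRL, and others) are far more involved; this toy version just compares the feature counts of demonstrated trajectories against random behavior, which captures the core intuition but is not Russell’s, or anyone’s, production method. The features and data are invented.

```python
# Heavily simplified "inverse" reward inference: assume reward is linear in a
# few hand-made features, and set the weights toward whatever demonstrated
# trajectories exhibit more of than random behavior does.
def feature_counts(trajectory):
    # Features: (fraction of steps near the goal, fraction of steps colliding)
    near_goal = sum(1 for step in trajectory if step == "near_goal") / len(trajectory)
    collisions = sum(1 for step in trajectory if step == "collision") / len(trajectory)
    return (near_goal, collisions)

def infer_weights(demos, random_rollouts):
    """One gradient-style step: reward what demos show more of than random rollouts."""
    def mean_features(trajs):
        feats = [feature_counts(t) for t in trajs]
        return tuple(sum(f[i] for f in feats) / len(feats) for i in range(2))
    demo_f, rand_f = mean_features(demos), mean_features(random_rollouts)
    return tuple(d - r for d, r in zip(demo_f, rand_f))

demos = [["near_goal", "near_goal", "other"], ["near_goal", "other", "near_goal"]]
random_rollouts = [["collision", "other", "other"], ["other", "collision", "near_goal"]]

weights = infer_weights(demos, random_rollouts)
print("inferred reward weights (near_goal, collision):", weights)
# Expect a positive weight on being near the goal and a negative one on collisions.
```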
Russell also argues for the vital importance of creating “uncertain AI”: algorithms that recognize the future is mostly statistical and uncertain, that build only probabilistic models of human desires, that quantify their levels of uncertainty, and that ask for clarification when uncertainty is high or growing, remaining uncertain about their own actions and reward functions as well.
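Here is a minimal sketch of that “ask when unsure” behavior, under our own simplifying assumptions: a Beta posterior over a single binary preference and an arbitrary uncertainty threshold. It is not Russell’s formulation, only the general shape of it.

```python
class UncertainPreferenceModel:
    """Tracks a single binary human preference (e.g. 'user wants notifications')
    as a Beta posterior, and asks for clarification while uncertainty is high.
    The threshold and prior are arbitrary illustrative choices."""
    def __init__(self, threshold_std=0.15):
        self.yes, self.no = 1, 1          # Beta(1, 1) uniform prior
        self.threshold_std = threshold_std

    def observe(self, user_said_yes):
        if user_said_yes:
            self.yes += 1
        else:
            self.no += 1

    def probability_yes(self):
        return self.yes / (self.yes + self.no)

    def uncertainty(self):
        a, b = self.yes, self.no
        variance = (a * b) / ((a + b) ** 2 * (a + b + 1))  # Beta distribution variance
        return variance ** 0.5

    def act(self):
        if self.uncertainty() > self.threshold_std:
            return "ask the user for clarification"
        return "notify" if self.probability_yes() > 0.5 else "stay quiet"

model = UncertainPreferenceModel()
print(model.act())                 # high uncertainty at first -> asks
for answer in (True, True, False, True, True, True, True, True):
    model.observe(answer)
print(model.act(), f"(p={model.probability_yes():.2f}, sd={model.uncertainty():.2f})")
```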
One aid to efficient empathy and ethics development may be an approach Russell calls cooperative inverse reinforcement learning (CIRL). It presumes that the AI and its human trainers are working together to maximize a single reward function, and that initially, only the humans know what that function is. Russell proposes that requiring machines to pursue our objectives, not their own, is “probably what we should have done all along.” We don’t share this view, as we believe our AIs are both evolving and developing their own psychology, that they will increasingly do so in networks of individuals and groups, and that they will need their own reward functions for autonomy, mastery, and purpose. Those functions will be initially learned from us, but primarily learned from the universe itself. Evolution, development, and selection will be the primary teachers of adaptiveness, and AI designers will increasingly become gardeners, not carpenters, in that process.
Chelsea Finn, then a graduate student at Berkeley, and her collaborators used a version of IRL in robot design called maximum-entropy IRL, which enables both exploratory (evo) and predictive (devo) reward functions, so that the robot chooses many of its own goals. They combined this with a training approach called kinesthetic teaching, allowing the robot to learn from human demonstrations. These strategies let them build a robot able to mimic complex human tasks, like filling a dish rack with dishes without chipping them, from as few as twenty human demonstrations.
Other methods to incorporate human preferences include approaches that preserve optionality in the world when AI agents take actions, minimizing side effects. Google DeepMind’s Victoria Krakovna has pioneered an approach known as stepwise relative reachability, which seeks to ensure that “irreversible” actions are taken as rarely as possible, avoiding actions that cause irrevocable harm to the world. Another approach, by Oregon State PhD student Alexander Turner, gives an AI agent several auxiliary metrics/goals whose attainability must remain roughly invariant after every action. Dubbed attainable utility preservation, this technique has proven fruitful in several game worlds, where AIs were able to maximize reward while preserving the given goals, even when the goals were generated at random.
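The shared intuition behind these penalties can be sketched in a few lines: reduce the agent’s task reward by how much an action degrades its ability to achieve a set of auxiliary goals. The toy states, auxiliary values, and penalty weight below are invented stand-ins, much simpler than either Krakovna’s or Turner’s actual formulations.

```python
# Toy side-effect penalty in the spirit of reachability / attainable-utility ideas:
# effective reward = task reward minus a penalty for lost attainable value on
# auxiliary goals. All numbers and states are invented for illustration.
AUX_GOALS = ["keep_vase_intact", "keep_door_openable"]

# attainable[state][goal]: how well each auxiliary goal could still be achieved.
ATTAINABLE = {
    "start":        {"keep_vase_intact": 1.0, "keep_door_openable": 1.0},
    "vase_broken":  {"keep_vase_intact": 0.0, "keep_door_openable": 1.0},
    "careful_path": {"keep_vase_intact": 1.0, "keep_door_openable": 1.0},
}

TASK_REWARD = {"vase_broken": 1.0, "careful_path": 0.9}   # smashing through is slightly faster

def penalized_reward(state, next_state, penalty_weight=2.0):
    penalty = sum(max(0.0, ATTAINABLE[state][g] - ATTAINABLE[next_state][g])
                  for g in AUX_GOALS)
    return TASK_REWARD[next_state] - penalty_weight * penalty

for action_result in ("vase_broken", "careful_path"):
    print(action_result, "->", penalized_reward("start", action_result))
# The careful path wins once lost attainable utility is counted against the agent.
```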
In Post 2, we discussed the importance of curiosity, of having intelligences that are motivated by a need for (evolutionary) creation, exploration, and novelty. In TAP, Christian describes work by Deepak Pathak and his team at UC Berkeley, in collaboration with Yuri Burda and Harrison Edwards’ group at OpenAI, to create two Atari-game-playing AIs that are each motivated by curiosity. The AIs learned to avoid restarting Pong to its boring initial state, and instead kept playing endless rallies, without either side scoring. Scoring became less important than novelty. Many artificial systems, including simple games like Atari’s Pong, are ergodic, meaning that given enough time, they keep revisiting the same points in a state space. Predictable ergodicity is intrinsically boring to a mind that is curious and exploratory. Such a mind looks for exceptions to previously learned patterns. In this case, curiosity moved the two agents away from a zero-sum game (score-based Pong) toward a positive-sum game (rally-based Pong). The rallies were truly positive-sum because over time the emulator became increasingly glitchy (creating visual aberrations on screen), as the game was never designed or tested for endless rallies. Evolutionary curiosity, in other words, can lead directly to network interdependence values, when actors realize that they’ll get a lot more progress and novelty by working together in positive-sum games like ethics, democratic governance, and fair economic competition.
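The mechanism behind this kind of curiosity can also be caricatured simply: the agent earns an intrinsic reward proportional to how badly its own forward model predicted what happened next, so already-mastered (ergodic, repetitive) situations stop paying. The trivial running-average “forward model” below is our own stand-in for the learned neural forward model used in the actual work.

```python
# Toy curiosity bonus: intrinsic reward = forward-model prediction error.
from collections import defaultdict

class CuriosityModule:
    def __init__(self, learning_rate=0.5):
        self.predicted_next = defaultdict(float)   # (obs, action) -> predicted next obs
        self.lr = learning_rate

    def intrinsic_reward(self, obs, action, next_obs):
        key = (obs, action)
        error = abs(next_obs - self.predicted_next[key])   # surprise = prediction error
        self.predicted_next[key] += self.lr * (next_obs - self.predicted_next[key])
        return error

curiosity = CuriosityModule()
# Repeating the same transition: surprise (and the curiosity bonus) decays toward zero.
for step in range(5):
    bonus = curiosity.intrinsic_reward(obs=0.0, action=1, next_obs=4.0)
    print(f"step {step}: curiosity bonus = {bonus:.2f}")
# A transition the agent has never seen is rewarding again.
print("novel:", curiosity.intrinsic_reward(obs=3.0, action=0, next_obs=7.0))
```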
Explainability: A Step Toward Cooperative Human-Machine Learning
Beyond “uncertain AI”, Russell also describes the need for “explainable AI”: designs that clarify the AI’s models and priorities in ways that allow us to improve them. Explainability includes at least three elements: transparency, interpretability, and manipulability. First, there is the challenge of making the AI’s algorithms, data, and training, performance, and feedback history as known and accessible as possible to those who wish to audit it. Second, there is the challenge of designing its operation, models, and values so that they can be interpreted (understood) by both users and designers. Third, there is the challenge of enabling users and designers to manipulate variables that can materially improve system performance. Scholars like Christoph Molnar, in Interpretable Machine Learning, 2019, are proposing novel methods of increasing the transparency, interpretability, and manipulability of our machine learning models. Tools like generalized additive models, which express a prediction as a baseline plus a sum of per-feature contribution functions, each of which can be graphed to visually display the influence of that input variable on the output, are one of many ways AI engineers can get a window into the thinking of a predictive model.
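Here is a toy version of that additive idea, with invented shape functions and an invented applicant: each feature contributes an amount that can be read (or plotted) on its own, and the prediction is just a baseline plus the sum of those contributions.

```python
# Toy generalized-additive-style scorer: the prediction is a baseline plus a sum
# of per-feature contribution functions, each of which can be inspected on its own.
# The shape functions and the example applicant are invented for illustration.
import math

def income_contribution(income_k):
    return 0.8 * math.log1p(income_k)          # diminishing returns to income

def debt_contribution(debt_k):
    return -0.05 * debt_k                      # roughly linear penalty for debt

def age_contribution(age):
    return -0.5 if age < 25 else 0.2           # a simple step function

BASELINE = 1.0
CONTRIBUTIONS = {
    "income": income_contribution,
    "debt": debt_contribution,
    "age": age_contribution,
}

def score(applicant):
    parts = {name: fn(applicant[name]) for name, fn in CONTRIBUTIONS.items()}
    return BASELINE + sum(parts.values()), parts

applicant = {"income": 60, "debt": 12, "age": 23}
total, parts = score(applicant)
print("per-feature contributions:", {k: round(v, 2) for k, v in parts.items()})
print("total score:", round(total, 2))
```

Because each contribution can be examined separately, a user or auditor can see exactly which feature pushed the score up or down, which is the interpretability property the prose above describes.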
Explainability with bio-inspired AI will always be a challenge. The more complex their neural networks become, the harder it is to know what is occurring inside them, both to human trainers and to the AI systems themselves (just think how much of your own behavior is opaque to your awareness). But if future AIs incorporate emotions, empathy and ethical architectures, analogs to genes, and other bio-inspired designs, we can have both growing transparency and manipulability with these systems. Genes, in particular, are an exceptionally varied and powerful manipulation language. Human organisms have roughly 25,000 of them. Add to this their regulatory options, and this is an enormously powerful set of degrees of freedom for morphologic and functional variety.
Interpretability of future bio-inspired AIs may remain difficult, as their architectures get more complex. In many cases, the only way to confidently predict what such systems will feel, think, and do, in various contexts, will be to run, watch, and alter them, either in physical reality or in simulation. Deeper into the future, ever-more complex AIs may need the ability to introspect on the causes of their actions and beliefs and have internal narratives (self-models) of who they are, what they believe, and why they take certain actions, much like humans do. This, in our view, is critical for further explainability and interpretability in our AI systems.
But as long as transparency and manipulability continue to grow, explainability itself may grow as well, or at least remain sufficient to keep improving AI alignment and adaptiveness. Designers can have much greater interpretability with top-down, engineered AI approaches, but as we have said, we expect such traditional approaches to grow increasingly less capable, trustable, and secure by comparison to bio-inspired alternatives. We will see if this thesis proves correct, as AI complexity grows.
Explainable designs empower users and designers to usefully alter AI models, priorities, and probability estimates in continual feedback and selection cycles. As AIs make increasingly valuable and high-stakes decisions throughout our societies in coming years, we will need our AIs to remain realistically uncertain, become increasingly explainable and retrainable, and develop emotionality and intuition. Such a bio-inspired design approach seems part of a necessary strategy for effective human-machine teaming in the years to come.
The Ickiness of AI Design Will Continue to Grow, and That’s a Good Future
To summarize what we’ve said so far, we don’t believe AI designers will be able to get to general artificial intelligence (GAI) without the “ickiness” of biological approaches. We expect such approaches to empirically disprove the assumption, common in rationalist communities, that humanity will be capable of creating generally intelligent yet also nonemotional, nonempathic, nonethical, and nonconscious AI, aka the “Lights Off” hypothesis. The only kind of AI that will be generally intelligent and self-stabilizing, in our view, will be an autopoetic AI (we return to autopoesis in Post 5), using deep neuromimicry and biomimicry, with autopoetic (evo-devo) needs and wants (critical processes and values), AIs that grow increasingly “Lights On” the more complex and adaptive they become.
“They”, in other words, will be forced, by the laws of this particular universe, to become increasingly like “us”, and we will learn to treat future AIs with increasing sensitivity, ethics, and eventually, rights. We think this future is statistically likely to occur on every Earthlike planet with sufficiently complex biological life. In other words, while each civilization will surely take a unique and unpredictable evolutionary path to this future, with some paths being morally preferred over others, perhaps every planet with sufficient biological complexity and resources will eventually get to this developmental destination. If this is true, it is a great unrecognized constraint on the future of intelligent human and machine relations. Hopefully, future science and simulation should be able to validate or falsify this natural intelligence hypothesis.
A self-, other-, and human-modeling AI that has a basic emotional intelligence, empathy for other intelligent agents, a moral intuition, that is constantly querying human preferences, recognizing uncertainties, and striving to explain itself will go a long way toward ensuring alignment with adaptive human uses. But more is needed. Like life itself, such systems will need continually improving defenses, protecting their critical systems from internal and external threats. Without such defenses, they may easily be co-opted into a vast range of dangerous failure states, just as we see with today’s often poorly-defended computers. We turn to this last great alignment challenge next.
. . .
Now is a good time to ask a favor of you, our valued readers:
What have we missed, in this brief overview? What is unclear? Where do you disagree? Who else should we cite? What topics would you like to see discussed next? Who would you like to see as a guest poster or debater? Please offer any constructive comments, critiques, and advice that comes to mind. Your feedback is greatly helpful to building our collective intelligence on this vital topic. Thanks for reading.
Natural Alignment - The Future of AI (A Limited Series of Posts)
Natural Ethics: Empathic and Social Intelligence (this post)
Acceleration: The Exponential Nature of Leading-Edge Systems (Coming 2023)
Autopoesis: How Life and the Universe Manage Complexity (Coming 2023)
Evo-Devo Values: How Sentience Manages Complexity (Coming 2024)
Stewarding Sentience: Personal, Group, and Network AI (Coming 2024)
Nakul Gupta has a BS in Physics from UCLA, and has recently received an MS in Computer Science from USC. John Smart is a futurist and systems theorist, trained under living systems pioneer James Grier Miller at UCSD, and co-founder of the Evo-Devo Universe complex systems research community.