Putting Ourselves Back in the Equation, page 20
Both are right, although the higher-level, psychological explanation makes the connection between cause and effect clearer. For one thing, it explains variations more readily. If you ask for St. Pancras Station instead or make the request of a different driver, the specifics of brain activity will differ, yet the psychology will be basically the same. All you have to do is say the word, and off you go. As hard as it can be to persuade people to do something, it’s still easier than attempting to bypass their senses and alter their brain activity directly. You would have to zap billions of neurons, each in just the right way. “If I want you to do something for me, I don’t do this by manipulating your brain state,” List said. “Of course, it’s totally unethical to do this, but secondly, it would also be totally infeasible. The most systematic and reliable way in which I can get you to do something is by, for instance, asking you to do it.”
FOUR QUARTERS OR TEN DIMES
Neural networks make excellent sandboxes for studying causation on multiple scales. Using them, you can test claims that are otherwise beyond you. For instance, it would be hopelessly ambitious to show directly that particle physics gives rise to human psychology—you’d have to bootstrap all of chemistry and biology first. The gulf is just too wide. We have to take it on faith that chemistry and biology do derive from physics; this has never been strictly proved. But neural networks embody in microcosm the same principles of collective organization that occur in nature. “What’s cool about machine learning, for me, is that the distance between the microscopic model and the output is much, much smaller,” said the physicist and neural network researcher Dan Roberts. “There’s a bunch of steps, but it seems much more approachable than going through chemistry and biology in the middle.”
Hoel runs with this idea. He considers some very simple cases to explicate the sometimes mysterious process of emergence. He starts with a network like the others we’ve been talking about and takes it as the fundamental description of a system. Then he studies how the units turn on and off in response to one another, seeing whether the units might be grouped together in ways that create simpler but equivalent networks. He borrows from physicists’ standard method for navigating multiple levels of description, known as renormalization, while adding distinctive concepts.22
Hoel gave the example of a miniature network: basically, a pair of lightbulbs screwed into a two-socket table lamp. The network has four possible states: both on, both off, one on and the other off, and vice versa. If the bulbs have separate switches, they don’t really form a network; they are just two independent lights. But often these lamps are wired together so that, by turning a single switch, you cycle through the four states. Then you have a true network that you can describe either at the level of the individual bulbs or at the level of their combined effect. Hoel zeroes in on three ways that the two levels might be distinct.
First, the higher level can dispense with irrelevant details. Suppose, for example, the bulbs are wired so that only one is on at any given time. This little network always provides one bulb’s worth of illumination. If all you care about is having enough light to read by, it doesn’t matter which bulb is on. The irrelevance of details is known to philosophers as multiple realizability and to physicists as universality or substrate-independence. One example is that a dollar bill, four quarters, and eight dimes and four nickels are multiple ways to realize the same amount of money. A dollar is a dollar, whether made of copper, paper, or digital bits—its value is independent of its material incarnation.
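The bulb and coin examples can be made concrete in a few lines of Python. This is a minimal sketch, not anything drawn from Hoel's own work; the function names and the table of denominations in cents are assumptions introduced for illustration. It shows several distinct low-level configurations collapsing onto a single high-level value.

```python
from itertools import product

# Micro level: each bulb is either off (0) or on (1).
micro_states = list(product([0, 1], repeat=2))

def illumination(state):
    """Macro-level description: how many bulbs' worth of light the lamp gives."""
    return sum(state)

# Wiring described in the text: exactly one bulb is on at any given time.
allowed = [s for s in micro_states if illumination(s) == 1]

# Group every micro state under the macro state it realizes.
realizations = {}
for s in micro_states:
    realizations.setdefault(illumination(s), []).append(s)

# The same idea with money (values in cents; table is illustrative).
DENOMINATIONS = {"dollar bill": 100, "quarter": 25, "dime": 10, "nickel": 5}

def value_in_cents(holding):
    return sum(DENOMINATIONS[name] * count for name, count in holding.items())

holdings = [
    {"dollar bill": 1},
    {"quarter": 4},
    {"dime": 8, "nickel": 4},
]

print(allowed)                                   # two micro states, one macro state
print([value_in_cents(h) for h in holdings])     # three holdings, one value
```

Both allowed configurations give one bulb's worth of light, and all three holdings come to the same 100 cents. The high-level description keeps what matters and discards which particular realization happens to be in play.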
Such compositional flexibility is ubiquitous in nature. A glass of water looks placid, yet we know it is a vast, heaving ocean of molecules—more of them than all the humans who have ever lived, colliding trillions of times per second. Their complexity is hidden from us not just because molecules are small, but because their zillions of arrangements, from here at our scale, all appear basically the same. If you move one molecule a bit to the left or swap it with another, the molecule might notice, but no one peering at the glass from the outside will be any the wiser.
By neglecting these molecular machinations, you greatly simplify your description. Here, simpler means more deterministic, which means tighter causal control. If you shake, stir, or squeeze the water in bulk, you can reliably predict the outcome, which is very hard to do if you act on the H2O molecules one by one. To be sure, any high-level description has its limits; if you heat the water to a boil, you’ll have to switch to a new high-level description. But over its range of validity, each of these descriptions illuminates the essential physics that would otherwise get lost in the molecular weeds.
A second way that wholes can be more than the sum of their parts is redundancy. A system can start off looking very complicated, but settle into one of only a few states. The other states never recur, and you gain in explanatory clarity by neglecting them. In the two-bulb network example, suppose one of the bulbs blows out as soon as you turn it on. From then on, you can forget about it and treat the system as a single bulb. This kind of attractor dynamics is common. We saw it in chapter 2 with the Hopfield network, which has multiple stable patterns of neural activity and will transition to one of them. These patterns are usually all you need to describe the system.
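Attractor dynamics of this kind can be sketched with a toy Hopfield-style network. The version below assumes units that take the values +1 and -1, the standard Hebbian rule for setting the couplings, and two made-up eight-unit patterns; none of those specifics come from the text. Starting from a corrupted pattern, the network settles back into the stored one.

```python
def train(patterns):
    """Hebbian rule: units that agree across a stored pattern get a positive coupling."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:          # no self-coupling
                    w[i][j] += p[i] * p[j]
    return w

def settle(w, state, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(w[i][j] * s[j] for j in range(n))
            # A zero field leaves the unit as it is.
            new = s[i] if field == 0 else (1 if field > 0 else -1)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

# Two illustrative stored patterns (chosen to be orthogonal).
pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([pattern_a, pattern_b])

noisy_a = [-1, 1, 1, 1, -1, -1, -1, -1]   # pattern_a with unit 0 flipped
noisy_b = [1, -1, 1, 1, 1, -1, 1, -1]     # pattern_b with unit 3 flipped

print(settle(w, noisy_a) == pattern_a)
print(settle(w, noisy_b) == pattern_b)
```

The network has 256 possible states, but once it has settled, naming the stored pattern it landed in tells you what you need to know.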
Finally, the higher level can take advantage of modularity. When a group of components performs some specialized function, you can treat it as a single unit and forget its inner workings, much as you can regard a living being as a collection of cells or computer software as a series of standardized subroutines. Hoel and his colleagues thus invite physicists to think like biologists or software engineers. George Ellis, a theoretical physicist and mathematician at the University of Cape Town, told me that this insight is an important addition to standard renormalization theory: “They are taking seriously the modular hierarchical nature of complex structures, which is the key to complexity.”
Whereas stripping out irrelevant or redundant details makes a system more deterministic and predictable, modularity can make it less so, because sometimes the “function” of a module is to create noise. For instance, a coin toss is completely deterministic if you think in terms of the basic physics: the air currents in the room, the precise flick of your fingers, and so on. But those details are hidden from you, so they don’t help you to predict the outcome. A higher-level description treats the coin toss as truly random. (This is separate from any randomness that quantum effects might add.) Life is filled with situations that are so complex that they are effectively random, and you save yourself a lot of frustration if you treat them as such from the get-go, rather than pretend you have control.
Having streamlined your description of a system, you can repeat the process, looking for additional structure and moving to an even higher scale. Hoel showed that you gain explanatory traction by going to a higher-level description. Appealingly for physicists, he puts a number to the gain. For the two-bulb network, if the bulbs cycle through all four of their possible states in succession, you have perfect knowledge of what they will do when you turn the lamp’s switch knob. For two bulbs, that’s 2 computer bits of information. But other situations are less certain. Suppose a wire is loose, so that if both bulbs are off, they stay off, but otherwise they flicker randomly among the three states in which at least one bulb is lit. Work through the math, and you find that knowing the system’s current state gives you only 0.81 bit of information about its future. The connection between cause and effect is weakened.
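For readers who want to see where the 0.81 comes from, here is one way to reproduce it. The sketch assumes the quantity in question is Hoel's effective information, evaluated by setting the network to each of its four states with equal probability, and that a flickering lamp jumps to each of the three lit states with probability 1/3. Those assumptions are not spelled out in the text. The symbols $T_{ij}$ (the probability of going from state $i$ to state $j$) and $\bar{T}_j$ (the average of column $j$) are introduced here for illustration, and terms with $T_{ij} = 0$ count as zero.

```latex
\begin{align*}
\bar{T}_j &= \frac{1}{4}\sum_{i=1}^{4} T_{ij} = \frac{1}{4}
  \quad \text{for every } j, \\
\mathrm{EI} &= \frac{1}{4}\sum_{i=1}^{4}\sum_{j=1}^{4}
  T_{ij}\,\log_2\frac{T_{ij}}{\bar{T}_j} \\
 &= \frac{1}{4}\,\log_2\frac{1}{1/4}
  \;+\; \frac{3}{4}\,\log_2\frac{1/3}{1/4} \\
 &= 0.5 + 0.75\,\log_2\frac{4}{3}
  \approx 0.5 + 0.311 = 0.811 \text{ bit}.
\end{align*}
```

The first term is the dark state, which is perfectly predictable and contributes its full share. The second is the three flickering states, each of which narrows the future only slightly. For the lamp that cycles cleanly through all four states, every state contributes $\log_2 4 = 2$, which recovers the 2 bits quoted for that case.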
To restore predictability, you collapse the three randomly cycling states into one. The new network is smaller, just two states—“off” and “flickering”—or a single computer bit. But now it is fully deterministic. So knowing its present state gives you 1 bit of information about the successor state, for a gain of 0.19 bit from the original description. “The higher scale is not just a compressed description,” Hoel said. “Rather, it’s that by getting rid of noise, either in the case of increasing determinism or by reducing redundancy, you get a more informative description.” Something similar happens by the zillions with molecular motions in water.
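The numbers in this example can be checked with a short Python sketch. It assumes that the measure is Hoel's effective information with every state set by intervention equally often, and that a flickering lamp lands on each of the three lit states with probability 1/3. The function names and the ordering of states are illustrative choices, not taken from Hoel's code.

```python
from math import log2

def effective_information(tpm):
    """Average KL divergence, in bits, between each state's effect distribution
    and the average effect distribution, with all states weighted equally."""
    n = len(tpm)
    avg = [sum(row[j] for row in tpm) / n for j in range(n)]
    total = 0.0
    for row in tpm:
        total += sum(p * log2(p / avg[j]) for j, p in enumerate(row) if p > 0)
    return total / n

def coarse_grain(tpm, groups):
    """Merge micro states into macro states, weighting the micro states
    inside each group equally."""
    return [
        [sum(tpm[i][j] for i in src for j in dst) / len(src) for dst in groups]
        for src in groups
    ]

# State order: off-off, off-on, on-off, on-on.
cycling = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
third = 1 / 3
loose_wire = [
    [1, 0, 0, 0],                 # dark stays dark
    [0, third, third, third],     # any lit state flickers among the lit states
    [0, third, third, third],
    [0, third, third, third],
]

macro = coarse_grain(loose_wire, [[0], [1, 2, 3]])   # "off" and "flickering"

ei_cycling = effective_information(cycling)
ei_micro = effective_information(loose_wire)
ei_macro = effective_information(macro)
gain = ei_macro - ei_micro

print(round(ei_cycling, 2), round(ei_micro, 2), round(ei_macro, 2), round(gain, 2))
```

Under these assumptions the sketch gives 2 bits for the cleanly cycling lamp, about 0.81 bit for the loose-wire lamp at the level of individual bulbs, and 1 bit once the three flickering states are merged, for a gain of about 0.19 bit.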
HOW IS CAUSATION LIKE TEXT-MESSAGING?
This IIT-based approach to causation challenges many intuitions that physicists have about emergence. First, it dissolves the intuition that causation must occur either at the base level or at the high level. It can happen at both. Hoel’s mathematical method apportions causation among multiple scales. In the flickering two-bulb network, you could say very loosely that 81 percent of the causal oomph of the network lies at its base level and 19 percent at the higher level.
Figure 7.1. CAUSAL EMERGENCE. Consider a rudimentary network consisting of a pair of lightbulbs controlled by a glitchy switch. If both are off, they stay off, but if either is on, they flicker randomly without ever going entirely dark. So at a higher level the network really has just two states: “permanently off” and “perpetually flickering.” This setup illustrates how neglecting randomness can reveal a system’s essential dynamics, a simple principle of emergence that also operates in much more complex systems.
A second common intuition is that higher levels must contain less information since, by definition, they gloss over details. In Hoel’s analysis, a higher level does lose information by being simpler, but it also gains information by being truer to the network’s structure. Less is more. Joseph Halpern of Cornell University, a computer scientist who worked with Pearl to develop the interventionist theory of causation, said: “Hoel is essentially pointing out that a ‘small’ model may be more than just an approximation of a larger model. It may actually in some sense of the word have more information than the larger model.”
To buttress this point, Hoel has drawn on theorems from an unexpected quarter: communications engineering. Signaling is a type of causation. You tap a key on your phone and cause a letter to appear on your friend’s screen. Making that happen reliably requires sophistication: our messages punch through electrical interference only because our phones and devices encode data in a form that resists degradation. “Code” just means a way to represent information. Morse code, for instance, translates letters and numbers to dits and dahs and then to electrical pulses. It takes advantage of the structure of the English language, encoding the most common letters, E and T, with the shortest sequences to speed up the transmission. Predictive coding is a more sophisticated version of that. Other types of codes compensate for errors in transmission, based not on the structure of the data, but on the characteristics of the medium.
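Both kinds of code can be illustrated in a few lines of Python. The Morse entries below are the standard ones for those letters. The error-correcting part uses a threefold repetition code with majority voting, which is the simplest textbook example and an assumption made here for illustration; it is not what phones actually use. The function names are likewise hypothetical.

```python
# A handful of Morse code entries: common letters get short sequences.
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "Q": "--.-", "Z": "--.."}

def encode_repetition(bits, n=3):
    """Send every bit n times in a row."""
    return [b for b in bits for _ in range(n)]

def decode_repetition(received, n=3):
    """Recover each bit by majority vote over its block of n copies."""
    return [
        1 if sum(received[i:i + n]) * 2 > n else 0
        for i in range(0, len(received), n)
    ]

def flip(bits, positions):
    """Toy noisy channel: invert the bits at the given positions."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(bits)]

message = [1, 0, 1, 1]
sent = encode_repetition(message)
noisy = flip(sent, {1, 4, 11})          # one error in three different blocks
decoded = decode_repetition(noisy)

print(len(MORSE["E"]), len(MORSE["Q"]))  # short code for E, long code for Q
print(decoded == message)                # the errors are voted away
```

The decoder never looks at individual transmitted symbols in isolation. It works at the level of blocks, and at that level isolated errors vanish. The protection has limits: two errors inside the same block would outvote the correct copy.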
Codes create levels of abstraction. We don’t have to transmit our messages using raw physical states, but can cleverly combine those states to squeeze the maximum performance out of a system. Hoel showed that levels of causation are entirely analogous. Higher levels of causation scrape away the noise in a system—the irrelevant details—to let the essential dynamics shine through. Using them, you maximize the control you exert. “Higher scales offer error correction by acting in a similar manner to codes, which means there is room for the higher scales to do extra work and be more informative,” Hoel said.
A real-world example that closely resembles Hoel’s simple two-bulb network is computer flash memory.23 At the microscopic level, it looks like an egg carton, with rows and rows of units. Each unit has four different voltage levels and, in principle, could hold two computer bits. But only one of these voltage levels is reliable, while the other three tend to cycle among themselves. So instead of trying to cram in a pair of bits, engineers store just one in each unit, encoding 0 as the reliable voltage and 1 as any of the three flaky ones. Such a code halves the storage capacity, but what good is storage capacity if you lose your data? By analogy, a higher-level causal description may gloss over details, but those details often don’t really matter.
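The encoding scheme just described can be mimicked with a toy model in Python. Everything specific here is an assumption for illustration: the numeric labels for the voltage levels, the drift rule (a flaky level may jump to any flaky level, while the reliable one stays put), and the function names. Real flash memory is considerably more involved.

```python
import random

RELIABLE = 0          # hypothetical label for the one stable voltage level
FLAKY = (1, 2, 3)     # hypothetical labels for the three levels that drift

def write_bit(bit):
    """Store 0 as the reliable level and 1 as one of the flaky levels."""
    return RELIABLE if bit == 0 else FLAKY[0]

def drift(level, rng):
    """Toy noise: flaky levels wander among themselves; the reliable level holds."""
    return level if level == RELIABLE else rng.choice(FLAKY)

def read_bit(level):
    """High-level read: all three flaky levels mean the same thing."""
    return 0 if level == RELIABLE else 1

rng = random.Random(0)
data = [1, 0, 0, 1, 1, 0, 1, 0]
cells = [write_bit(b) for b in data]

for _ in range(100):                       # let the cells drift for a while
    cells = [drift(c, rng) for c in cells]

recovered = [read_bit(c) for c in cells]
print(recovered == data)
```

The exact voltage in a cell holding a 1 is unpredictable after the drift, yet the stored bit survives every time, because the read operation ignores the distinction among the flaky levels.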
A third common intuition that Hoel, List, and others have revisited is that determinism is an either/or situation. People often say the world has to be either intrinsically random or regular and predictable. Much of the debate over quantum physics hinges on which it is. But in fact the world doesn’t have to be one or the other. It can be both. “The world could be deterministic at one level of description and it could also simultaneously be indeterministic at another level of description,” List said. This is because each level reworks the laws of nature. The laws governing liquid water, for example, are a product not just of the laws governing the individual H2O molecules, but also of the way those molecules are arranged. Even if the molecular laws are fully regular and predictable, higher levels can become randomized. Within raging river rapids, the molecules may be entirely orderly. Conversely, a smoothly flowing fluid may belie erratically waggling molecules.
This concept of the level dependence of determinism opens up a remarkable possibility. Might there be no fundamental laws of physics at all? In Hoel’s model, a deterministic higher-level description can emerge from a base level that is thoroughly anarchic. “Effective information can be unbounded at the macroscale while approaching the limit of zero at the microscale,” Hoel said. His work thus lends credence to the physicist John Wheeler’s idea of “law without law.” Wheeler speculated that chaotic microscopic events “flaunting their freedom from formula” are nonetheless collectively law-abiding, with “billions upon billions of such acts giving rise, via an overpowering statistics, to the regularities of physical law.”24 If so, physicists could dig to the foundations and find that reality is built on quicksand.
As helpful as Hoel’s scheme is in clarifying the concepts of emergence, it doesn’t get us very far in figuring out how emergence works in most real-world situations. Hoel considers very simple networks and even then has to resort to a brute-force computer analysis to identify the multiple scales on which they operate. Finding the structure within a system is fundamentally hard, because there are so many possibilities to consider.
Irina Higgins, a neuroscientist and an AI researcher at Google DeepMind in London, sees connections between Hoel’s work and her own research, which aims to help artificial neural networks pick out the right structure in images. If you train a network to identify cats, it will dutifully spit out a label for any animal you show it, but that doesn’t mean it has identified catlike structures within the images. It might instead be creating bizarre combinations of pixels that happen to be correlated with the type of pet in those images, rather than conceiving of cats as creatures with tails, fur, and whiskers. Higgins is able to force a network to create realistic representations of cats. Her techniques, like Hoel’s, work by eliminating redundancy, on the assumption that a parsimonious description is truer to reality.25
But these techniques don’t yet work for multilayered images. “I am not aware of any model that can do it properly right now,” she told me in 2021. She gave me the example of an image showing sheep in a field surrounded by forest. “Do you represent each sheep or a flock as a whole? Do you represent the background as a single item, or do you split it into field plus forest plus sky, or do you represent it at the level of individual trees or even blades of grass?” she said. The machine has no reason to parse the scene one way or the other. Our brains would handle the task almost effortlessly, but even they build in a lot of presuppositions about what the structure is likely to be. For the same reason, it is inherently difficult to separate causation by layer. We take it for granted there is one way to make the world, when really there are countless ways.
IS IT POSSIBLE TO SAY ANYTHING NEW ABOUT FREE WILL?
A more sophisticated understanding of emergence could also help to unstick debates over free will. Traditionally the domain of philosophy, free will is a concern of physics, too. If we are to achieve a full understanding of causation, we can hardly leave out the most intricate causal actors known to science: ourselves.
Free will is the rare philosophical concept that is useful as well as fascinating. For example, our justice system and democratic processes are built on assumptions about individual volition. And debating whether we have it used to be almost as good a way to pass an evening with friends as playing Cards Against Humanity. But free will has lost some of its fun for me; these debates can bring out the worst in people. Everyone seems so sure of themselves.
But if everyone just chills, they will see that progress is possible. Consider how the debate has evolved historically. The question of whether we are the authors of our choices or merely cogs in a clockwork universe—or whether those two options are truly in opposition—used to hinge on determinism: If everything that happens, happens for a reason, then whether I’ll choose coffee or tea tomorrow morning is preordained. Eons before I or coffee or tea or Earth existed, the atoms that filled the early universe were subtly imprinted with the imperative, Get this man some coffee. But if people debating free will have come to agree on anything, it’s that determinism is a red herring. The laws of physics may well be indeterministic, in which case my choice of tea or coffee is due to a random atomic swerve. From my point of view, that’s no different from deterministic preordination. The choice is still being made for me.
Another point of general agreement on free will is that humans aren’t exempt from the laws of physics. The eminent philosopher who taught my college metaphysics class in the ’80s tried to convince us that we stand outside physics—that we have free will because human agency is, like God, an unmoved mover.26 Few think that anymore. Most accept that our decisions don’t break physics. They aren’t bolts out of the blue. They have antecedents.
Today the debate has shifted to the nature of causation. If causation lies entirely at the fundamental physical level, then we’re just puppets, or not even that—just big puppet-shaped blobs of atoms. But if Russell’s critique of causation is right, physics at its roots has no directedness or sense of compulsion; the category of cause just doesn’t apply there. Some physicists do think that causation is fundamental after all.27 But even if they’re right, the causes and effects that are relevant to free will arise at a higher level of description, just as human beings (as opposed to blobs of atoms) arise at a higher level of description. We can’t talk about your making a decision until we can talk about you. For Hoel, List, and others, higher levels do have causal power, and so, potentially, do you.
