  The Alaska Airlines 261 accident, the topic of Chapter 2, was, as far as is known, not preceded by or correlated with a noticeable number of relevant occupational accidents. In fact, what is remarkable about this accident is that everybody was pretty much following the rules. Their local rules. The airline and its maintenance arm were abiding by recommendations from the manufacturer and the rules set by the regulator; the regulator was approving successive maintenance intervals on the basis of the evidence that was presented and deemed appropriate and sufficient at the time. The people doing the maintenance work followed the rules and procedures (however underspecified these might have been, that too was normal) given to them for the execution of the lubrication task. Whether more or fewer maintenance workers would have suffered occupational incidents doesn't matter. It has no predictive value; it can carry no explanatory load relative to the eventual accident, which emerged precisely because everybody was following their local rules. Just like the ants building their hill.

  In Alaska 261, whether fewer or more technicians' hands got stuck and injured in the lubrication of the jack screw may bear no relationship to the eventual organizational accident. That is emergence. The behavior of the whole cannot be explained by, and is not mirrored in, the behavior of its constituent components. Instead, the behavior of the whole is the result – the emergent, cumulative result – of all these local components following their local rules and interacting with each other in innumerable ways as they do so. Of course, unlike ants and slime mold, people across various organizations are not all the same. They are, in fact, quite different. But, like ants, all of them still respond to, and help shape, their local environment, and follow or help shape the rules (written or otherwise, formal or informal) that apply there. So the properties of emergence would seem to hold. Behavior that is locally rational, that responds to local conditions and makes sense given the various rules that govern it locally, can add up to profoundly irrational behavior at the system level. Like an accident that takes scores, or hundreds, of lives – in a system where everybody locally works precisely to prevent that from happening.
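
  To see this mechanism in miniature, consider a small sketch (an illustration only, not drawn from the accident analysis): Conway's Game of Life, in Python. Every cell follows one trivial local rule about its neighbors, yet the grid as a whole produces a "glider" that travels across it – a property that appears in no cell's rule and only emerges from their interaction.

```python
from collections import Counter

def life_step(live):
    """One generation of Conway's Game of Life; `live` is a set of (x, y)."""
    # Tally the live neighbors of every cell that sits next to a live cell.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1) for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # The entire local rule book: a cell is alive next turn if it has exactly
    # three live neighbors, or two live neighbors while already alive.
    return {cell for cell, n in neighbor_counts.items()
            if n == 3 or (n == 2 and cell in live)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
cells = glider
for _ in range(8):
    cells = life_step(cells)

# The five cells reappear intact, shifted diagonally by (2, 2): the glider
# "travels," although travel appears nowhere in any cell's local rule.
assert cells == {(x + 2, y + 2) for (x, y) in glider}
```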

  A recent example from a major energy company operating in Louisiana near the Gulf of Mexico illustrates the problematic connection between the safety of individual components and system safety.47 An employee of this company injured himself when his swivel chair collapsed. The injury was severe enough for him to have to take a day off work. People from the Occupational Safety and Health Administration (OSHA) came in, inspected the situation, and issued a citation to the company for "not properly instructing employees how to sit in their chairs."

  What was the company to do? Under pressure to show managerial action in response to the OSHA finding and citation, its health and safety department sent out a company-wide PowerPoint presentation that demonstrated how to properly and safely sit in a swivel chair. In summary, it told employees to inspect their chair at least once every month, and to remove defective chairs from the workplace. Also, employees were to use the chair only for the purpose for which it was designed, and never to stand on a chair to retrieve an object out of reach. The company president himself got personally involved in enforcing the OSHA sitting recommendations, citing employees who violated the new policy. His rationale was that "Permitting the smallest exceptions to our health, safety and environmental program is unacceptable and results in catastrophes such as the BP disaster."

  Chairs tipping over because people don't sit in them in the ways they were designed for may have a relationship with a disaster the size of Deepwater Horizon, but it may not. Complexity doesn't allow us to draw such straight lines from the behavior of individual components to large events at the system level. In fact, complexity understands tipping to mean something different.

  Phase Shifts and Tipping Points

  The emergent behavior of a complex system can be very different, from one situation or moment to the next, from what the collection of its parts would suggest. One dynamic that is responsible for this is called the tipping point, the phase shift, or phase transition. A phase shift means that a bit more (or less) of the same leads to something very different. Of course, if the relationship between parts and system were as straightforward as Newton proposed, then this could never happen. A little bit more behavior by the parts would lead to the same little bit more behavior by the system as a whole. But in complexity, that is not the case. A tiny little bit more (or a tiny little bit less) of the same in terms of the parts can lead to something completely different for the whole system. It can lead to something qualitatively different.

  The original idea of phase transitions comes from physics (or, more specifically, thermodynamics). It describes the shift of a thermodynamic system from one phase to another. As solid material is heated, it will transition to a liquid at some temperature, shifting phases and changing system-level properties. Heat it more, and it will change to a gas. Heat it even more, and it may, in rare cases, become plasma. The behavior of the parts (molecules moving among each other) is not all that different on either side of a phase transition point, yet the system properties undergo an abrupt change. For example, the volume taken up by steam is vastly different – at atmospheric pressure, roughly 1,600 times larger – from the volume that boiling water needs (yet the difference in temperature between these two only needs to be infinitesimal).

  The idea gained currency in sociology in the 1960s. The term tipping point, or angle of repose, was introduced to describe how a previously rare social phenomenon could become rapidly and dramatically more common. Morton Grodzins was studying racially diverse neighborhoods in the U.S.A. when he discovered that most of the white families remained in the neighborhood as long as the comparative number of black families remained small. Very small. At a certain point, however, when "one too many" black families arrived, the remaining white families would move out en masse, in a process that became known as "white flight." The phrase tipping point was of course borrowed from physics itself, analogous to the adding of a small weight to a balanced object that could cause it to suddenly and completely (and irreversibly) topple over.
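
  The dynamic Grodzins described can be captured in a toy threshold cascade (a sketch for illustration only; the numbers are invented, not Grodzins's data). Each family stays as long as the number of newcomers does not exceed its personal tolerance; when a family leaves, its house is assumed to sell to a newcomer, which can push the next family past its own threshold.

```python
def families_leaving(arrivals, tolerances):
    """How many original families leave, given `arrivals` newcomers and a
    list of per-family tolerance levels (max newcomers each will accept)."""
    newcomers = arrivals
    remaining = sorted(tolerances)
    while remaining and remaining[0] < newcomers:
        remaining.pop(0)   # the least tolerant family moves out...
        newcomers += 1     # ...and its house sells to another newcomer
    return len(tolerances) - len(remaining)

# 100 families whose tolerances are spread evenly from 1 to 100.
tolerances = list(range(1, 101))
for arrivals in (1, 2):
    print(arrivals, families_leaving(arrivals, tolerances))
# 1 arrival  -> 0 families leave: the neighborhood absorbs the change.
# 2 arrivals -> all 100 leave: "one too many" tips the whole system.
```

One extra arrival is a negligible change at the component level, yet it flips the system from stable to empty: a qualitative change from a quantitative nudge.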

  On September 16, 2007, in Nisour Square, Baghdad, 17 people were killed and 24 injured when security teams from the private firm Blackwater U.S.A. unleashed a barrage of machine-gun fire. A bomb had gone off nearby just before. The first victim was Ahmed Haithem Ahmed, who was driving his mother, Mohassin, to pick up his father from the hospital where he worked as a pathologist. A Blackwater bullet tore through Ahmed's head, but the car kept rolling toward the Blackwater convoy, and not much later 17 people were dead in a hail of gunfire, including Iraqis trying to escape to safety. No shot that could have provoked the Blackwater response had been heard.48

  The event blew the extent of the privatization of warfare in Iraq into the open, with some in the U.S. Congress arguing that if there was a profit motive in war, then peace could be hard to achieve. Blackwater lost its lucrative State Department contract to provide diplomatic security for the U.S. embassy in Baghdad after the incident.

  Yet it would take two more years for the subtle drift towards the deadly incident to become apparent.

  Blackwater U.S.A., a private security firm, was originally contracted to provide security for State Department and CIA employees after the September 11 attacks in 2001. In the spring of 2002, Erik Prince, the founder of Blackwater, offered to help the CIA guard its makeshift Afghan station in the Ariana Hotel in Kabul. Not long after signing that contract, dozens of Blackwater personnel, many of them former Navy Seals and Army Delta Force soldiers, were sent out into the surrounding streets to provide perimeter security for the CIA station. From there, Blackwater operatives began accompanying CIA case officers on missions beyond the perimeter.

  A similar progression happened in Iraq, where Blackwater was first hired to provide static security for the CIA's Baghdad station. Blackwater employees were also hired to provide personal security for CIA officers whenever they traveled in either Iraq or Afghanistan. This meant that Blackwater personnel began to accompany CIA officers even on offensive operations, sometimes launched in conjunction with Delta Force or Navy Seals teams (that is, former colleagues). It will never be possible to find out who fired the first offensive, rather than defensive, shot from a Blackwater gun in these operations, but lines soon began blurring.

  Blackwater employees started to play central roles in so-called "snatch-and-grab" operations, intended to capture or kill militants in Iraq and Afghanistan, particularly during the 2004–2006 height of the Iraqi insurgency. Blackwater exercised a strong influence on such clandestine CIA operations, under the banner of being able to decide what the safest ways were to conduct such missions. They filled all roles, from "drivers to gunslingers."49

  The House Oversight and Government Reform Committee found that Blackwater had been involved in at least 195 shootings over the previous two years, many of which involved cover-ups of fatal shootings by its staff.

  The incident on the 16th of September, 2007, was perhaps not a large departure from where things had been drifting. The Blackwater convoy in question was in the square to control traffic for a second convoy that was approaching from the south. The second convoy was bringing diplomats who had been evacuated from a meeting after a bomb went off near the compound where the meeting was taking place. That convoy had not arrived at the square by the time the shooting started.

  As the gunfire continued, at least one of the Blackwater guards began screaming, "No! No! No!" and gestured to his colleagues to stop shooting, according to an Iraqi lawyer who was stuck in traffic and was shot in the back as he tried to flee.50

  In the blur between CIA, military and contractor roles that grew during the worst part of the 2004–2006 Iraqi insurgency, not much would have distinguished one "snatch-and-grab" raid from the next. Blackwater guards would have fired shots in defense of the CIA and military, consistent with their assignment. Until a shot was fired that was not in defense, or not entirely in defense, or not at all in defense, but rather a contribution to the offensive raid. It is not that hard to imagine. You know the guys by name, you remember them from the time in the Seals or in Delta, you go on raids side-by-side for weeks, you see them lift a gun and fire in the same direction that everybody else is shooting in, you even get a shout perhaps, or a taunt, or a question. You shoot too. What's one offensive shot between friends? The next raid, you may even be expected to help out that way. One more shot, a bit more of the same, and the system began to display vastly different properties. Blackwater started to go in ahead of the others, started to help plan operations and take the lead, started to play an offensive role in the missions. Did the killing, even when unprovoked. A first offensive shot may have been a tipping point.

  Despite original design requirements that the External Tank not shed debris, and the corresponding design requirement that the Shuttle not receive debris hits exceeding a trivial amount of force, debris impacted the Shuttle on each flight. Debris strikes were normal, in other words. Just like a lot of other technical problems – NASA engineers were, and always had been, working in an environment where technical problems proliferated. Flying with flaws was the norm. Over the course of 113 missions, foam-shedding and other debris impacts came to be regarded less and less as a hazard to the vehicle and crew. With each successful landing, it appears that NASA engineers and managers increasingly regarded the foam-shedding as inevitable, and as either unlikely to jeopardize safety or simply an acceptable risk.

  The distinction between foam loss and debris events also appears to have become blurred. NASA and contractor personnel came to view foam strikes not as a safety-of-flight issue, but rather as a maintenance, or "turnaround," issue. In Flight Readiness Review documentation, Mission Management Team minutes, In-Flight Anomaly disposition reports, and elsewhere, what was originally considered a serious threat to the Orbiter came to be treated as "in-family": a reportable problem that was within the known experience base, was believed to be understood, and was not regarded as a safety-of-flight issue. The reason this problem was in the known experience base was that its result, heat damage, had occurred on previous occasions (in fact, was very normal) and had occurred for a variety of reasons. Here was just one more.

  The foam-loss issue was considered insignificant enough that Flight Readiness Review documents included no discussion of it. There was no paper trail of concerns about the foam-debris damage that preceded the accident. This even fit the rules. According to Program rules, such discussion was not a requirement, because the STS-112 incident had been identified only as an "action," not an In-Flight Anomaly. Official definitions were assigned to each in-flight anomaly and ratified by the Flight Readiness Reviews. This limited the actions taken and the resources available for dealing with these problems.51

  In the evaluation of damage caused by debris falling off the external tank prior to the 2003 Space Shuttle Columbia flight, you can see a similar phase shift. Under pressure to accommodate tight launch schedules and budget cuts (in part because of a diversion of funds to the International Space Station), it became more sensible to see certain problems as maintenance issues rather than as flight safety risks. What were known as "debris events" now became "foam loss," a more innocuous label. Maintenance issues like foam loss could be cleared through a nominally simpler bureaucratic process, which allowed quicker turnarounds. In the enormous mass of assessments to be made between flights, foam debris strikes became just one more issue. Gradually converting this issue from safety to maintenance was no different from a lot of other risk assessments and decisions that NASA had to make as one Shuttle landed and the next was prepared for flight. It was quite normal. It may, however, have represented the kind of phase shift, or phase transition – one more decision, just like tens of thousands of other decisions, that produced fundamentally different system behavior in the end.

  With the benefit of hindsight, of course, it is easy to point to the flaws in these logics and priorities, for example those that converted a flight safety problem into a maintenance problem. But what we really need to understand is how these conversions of language made sense to decision-makers at the time. After all, their objective cannot have been to burn up a Space Shuttle on re-entry. And the important question to ask ourselves is how organizations can be made aware early on that such shifts in language can have far-reaching consequences, even if those are hard to foresee. In complex systems, after all, it is very hard to foresee or predict the consequences of presumed causes. So it is not the consequences that we should be afraid of (we might not even foresee them, or believe them if we could). Rather, we should be wary of renamings that negotiate a thing's perceived risk down from what it was before.

  Optimized at the Edge of Chaos

  A common notion is that the functioning of complex systems is optimized at the edge of chaos. Originally a mathematical concept, the edge of chaos in socio-technical settings denotes a region of maximal complexity where systems are perched between order and disorder; between optimally organized complexity and chaos. Maximal functionality can be extracted from the system at this edge. This is where the system is tweaked to achieve its extreme diversity and variety, where complexity reaches its optimum. Here, the system can maximally, exhaustively and swiftly adapt and respond to changes in the environment (4WDs to speed boats to airplanes, day to night, the Caribbean to Guinea). Indeed, this optimum, or maximum, is determined very much in the relationship between the complex system and its environment.

  In a sense, the edge of chaos is where the ecological arms race plays out in its purest form – where competitors, or predator and prey, are constantly trying to stay one step ahead of each other. The use of aerial surveillance will make trafficking over open ground less attractive, perhaps deflecting smugglers' routes into forests. The use of infrared or other dark-penetrating optics makes trafficking at night less protected, thus putting a greater premium on quiet daytime travel, when the sun is hot and high and policing might be less effective. The various actions and counteractions constantly affect each other, undermining what was previously a good strategy while at the same time forcing the creation of new strategies. Complex systems tend to settle at the edge where their responses are just good enough to stay ahead of the others, but where generating those responses does not cost so much that it puts them out of business. That is the edge of chaos.

  There is a fundamental trade-off that interdependent agents in a complex system must make, a trade-off that lies behind much of the adaptation the system can display, and also behind why such adaptation sometimes becomes less successful. This is the trade-off between exploitation and exploration. Exploration is a necessary activity for survival inside a complex, adaptive, living system. It means searching for a solution that is optimal relative to the landscape of opportunities and constraints as you now know it. Flying cocaine via West Africa may be a great opportunity resulting from exploration if the Caribbean has gone solid with narcotics countermeasures, policing, interdiction and arrests. Exploration is what generates new smuggling routes (for example, through Africa). But making those routes work takes investment, of time and money. Local politicians may need to be bought, local strongmen need to be found and patronized, and the local geography and ecology of inlets, ports, airports, roads, forests, hiding places, mules and so forth need to be mapped. So if there are still plenty of holes to wiggle cocaine through in the Caribbean, then West Africa may not be optimal. The Caribbean can still be exploited – until further notice.

  Exploitation means taking advantage of what you already know; it means reaping the benefits of past searches.52 If you stop exploring, however, and just keep exploiting, you might miss out on something much better, and your system may become brittle, or less adaptive. While you are exploiting Caribbean routes, competitors may have already set up shop in Guinea. By the time the Caribbean holes close, you are left with no alternatives other than declaring war on rival smugglers. At the edge of chaos, then, the system has reached an optimum point – not just in either exploration or exploitation, but in their complement. It gets enough return from exploiting that which was explored earlier, yet retains adaptive capacity by continued exploration of better options. In a complex system, however, it is difficult to say in advance what the returns of exploration are going to be. A new route may get discovered. But a key smuggler on a scouting mission may get caught, and give up his or her collaborators. Exploration can thus produce big events. This is because the optimum balance between exploration and exploitation puts a system near the edge of chaos.
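
  This trade-off has a standard formalization in what is known as the multi-armed bandit problem, and a minimal sketch makes the brittleness of pure exploitation concrete (the framing of routes as "arms" and the payoff numbers are invented for illustration). An epsilon-greedy agent exploits its best-known option most of the time, but with a small probability epsilon explores a random one:

```python
import random

def epsilon_greedy(payoff_probs, steps=10_000, epsilon=0.1):
    """Play a multi-armed bandit; each arm pays 1 with its given probability."""
    n_arms = len(payoff_probs)
    pulls = [0] * n_arms    # how often each arm has been tried
    value = [0.0] * n_arms  # running estimate of each arm's payoff
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)   # explore: try any arm
        else:
            arm = value.index(max(value))    # exploit: the best-known arm
        reward = 1.0 if random.random() < payoff_probs[arm] else 0.0
        pulls[arm] += 1
        value[arm] += (reward - value[arm]) / pulls[arm]  # incremental mean
        total += reward
    return total, pulls

random.seed(1)
# Three "routes" with hidden success rates of 0.3, 0.5 and 0.8.
print(epsilon_greedy([0.3, 0.5, 0.8], epsilon=0.0)[1])  # never explores
print(epsilon_greedy([0.3, 0.5, 0.8], epsilon=0.1)[1])  # explores a little
```

With epsilon at zero, the agent locks onto the first arm that ever looks good and never discovers the 0.8 route; a little continued exploration finds it, at the price of some deliberately suboptimal tries. The optimum lies in the complement of the two, which is exactly the balance described above.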

 
