Posts Tagged Systems Engineering
The use of weighted-sum value matrices is a core component of many system-procurement and organizational decisions including risk assessments. In recent years the USAF has eliminated weighted-sum evaluations from most procurement decisions. They’ve done this on the basis that system requirements should set accurate performance levels that, once met, reduce procurement decisions to simple competition on price. This probably oversimplifies things. For example, the acquisition cost for an aircraft system might be easy to establish. But life cycle cost of systems that includes wear-out or limited-fatigue-life components requires forecasting and engineering judgments. In other areas of systems engineering, such as trade studies, maintenance planning, spares allocation, and especially risk analysis, multi-attribute or multi-criterion decisions are common.
Weighted-sum criterion matrices (and their relatives, e.g., weighted-product, AHP, etc.) are often criticized in engineering decision analysis for some valid reasons. These include non-independence of criteria, difficulties in normalizing and converting measurements and expert opinions into scores, and logical/philosophical concerns about decomposing subjective decisions into constituents.
Years ago, a team of systems engineers and I, while working through the issues of using weighted-sum matrices to select subcontractors for aircraft systems, experimented with comparing the problems we encountered in vendor selection to the unrelated multi-attribute decision process of mate selection. We met the same issues in attempting to create criteria, weight those criteria, and establish criteria scores in both decision processes, despite the fact that one process seems highly technical, the other one completely non-technical. This exercise emphasized the degree to which aircraft system vendor selection involves subjective decisions. It also revealed that despite the weaknesses of using weighted sums to make decisions, the process of identifying, weighting, and scoring the criteria for a decision greatly enhanced the engineers’ ability to give an expert opinion. But this final expert opinion was often at odds with that derived from weighted-sum scoring, even after attempts to adjust the weightings of the criteria.
Weighted-sum and related numerical approaches to decision-making interest me because I encounter them in my work with clients. They are central to most risk-analysis methodologies, and, therefore, central to risk management. The topic is inherently multidisciplinary, since it entails engineering, psychology, economics, and, in cases where weighted sums derive from multiple participants, social psychology.
This post is an introduction-after-the-fact, to my previous post, How to Pick a Spouse. I’m writing this brief prequel to address the fact that blog excerpting tools tend to use only the first few lines of a post, and on that basis, my post appeared to be on mate selection rather than decision analysis, it’s main point.
If you’re interested in multi-attribute decision-making in the engineering of systems, please continue now to How to Pick a Spouse.
Katz’s Law: Humans will act rationally when all other possibilities have been exhausted.
Bekhap’s Law asserts that brains times beauty equals a constant. Can this be true? Are intellect and beauty quantifiable? Is beauty a property of the subject of investigation, or a quality of the mind of the beholder? Are any other relevant variables (attributes) intimately tied to brains or beauty? Assuming brains and beauty both are desirable, Backhap’s Law implies an optimization exercise – picking a point on the reciprocal function representing the best compromise between brains and beauty. Presumably, this point differs for all evaluators. It raises questions about the marginal utility of brains and beauty. Is it possible that too much brain or too much beauty could be a liability? (Engineers would call this an edge-case check of Beckhap’s validity.) Is Beckhap’s Law of any use without a cost axis? Other axes? In practice, if taken seriously, Backhap’s Law might be merely one constraint in a multi-attribute decision process for selecting a spouse. It also sheds light on the problems of Air Force procurement of the components of a weapons system and a lot of other decisions. I’ll explain why.
I’ll start with an overview of how the Air Force oversees contract awards for aircraft subsystems – at least how it worked through most of USAF history, before recent changes in procurement methods. Historically, after awarding a contract to an aircraft maker, the aircraft maker’s engineers wrote specs for its systems. Vendors bid on the systems by creating designs described in proposals submitted for competition. The engineers who wrote the specs also created a list of a few dozen criteria, with weightings for each, on which they graded the vendors’ proposals. The USAF approved this criteria list and their weightings before vendors submitted their proposals to ensure the fairness deserved by taxpayers. Pricing and life-cycle cost were similarly scored by the aircraft maker. The bidder with the best total score got the contract.
A while back I headed a team of four engineers, all single men, designing and spec’ing out systems for a military jet. It took most of a year to write these specs. Six months later we received proposals hundreds of pages long. We graded the proposals according to our pre-determined list of criteria. After computing the weighted sums (sums of score times weight for each criteria) I asked the engineers if the results agreed with their subjective judgments. That is, did the scores agree with the subjective judgment of best bidder made by these engineers independent of the scoring process. Only about half of them were. I asked the team why they thought the score results differed from their subjective judgments.
They proposed several theories. A systems engineer, viewing the system from the perspective of its interactions and interfaces with the entire aircraft may not be familiar with all the internal details of the system while writings specs. You learn a lot of these details by reading the vendors’ proposals. So you’re better suited to create the criteria list after reading proposals. But the criteria and their weightings are fixed at that point because of the fairness concern. Anonymized proposals might preserve fairness and allow better criteria lists, one engineer offered.
But there was more to the disconnect between their subjective judgments of “best candidate” and the computed results. Someone immediately cited the problem of normalization. Converting weight in pounds, for example, to a dimensionless score (e.g., a grade of 0 to 100) was problematic. If minimum product weight is the goal, how you do you convert three vendors’ product weights into grades on the 100 scale. Giving the lowest weight 100 points and subtracting the percentage weight delta of the others feels arbitrary – because it is. Doing so compresses the scores excessively – making you want to assign a higher weighting to product-weight to compensate for the clustering of the product-weight scores. Since you’re not allowed to do that, you invent some other ad hoc means of increasing the difference between scores. In other words, you work around the weighted-sum concept to try to comply with the spirit of the rules without actually breaking the rules. But you still end up with a method in which you’re not terribly confident.
A bright young engineer named Hui then hit on a major problem of the weighted-sum scoring approach. He offered that the criteria in our lists were not truly independent; they interacted with each other. Further, he noted, it would be impossible to create a list of criteria that were truly independent. Nature, physics and engineering design just don’t work like that. On that thought, another engineer said that even if the criteria represented truly independent attributes of the vendors’ proposed systems, they might not be independent in a mental model of quality judgment. For example, there may be a logical quality composed of a nonlinear relationship between reliability, spares cost, support equipment, and maintainability. Engineering meets philosophy.
We spent lunch critiquing and philosophizing about multi-attribute decision-making. Where else is this relevant, I asked. Hui said, “Hmmm, everywhere?” “Dating!” said Eric. “Dating, or marriage?”, I asked. They agreed that while their immediate dating interests might suggest otherwise, all four were in fact interested in finding a spouse at some point. I suggested we test multi-attribute decision matrices on this particular decision. They accepted the challenge. Each agreed to make a list of past and potential future candidates to wed, without regard for the likelihood of any mutual interest the candidate might have. Each also would independently prepare a list of criteria on which they would rate the candidates. To clarify, each engineer would develop their own criteria, weightings, and scores for their own candidates only. No multi-party (participatory) decisions were involved; these involve other complex issues beyond our scope here (e.g., differing degrees of over/under-confidence in participants, doctrinal paradox, etc.). Sharing the list would be optional.
Nevertheless, on completing their criteria lists, everyone was happy to share criteria and weightings. There were quite a few non-independent attributes related to appearance, grooming and dress, even within a single engineer’s list. Likewise with intelligence. Then there was sense of humor, quirkiness, religious compatibility, moral virtues, education, type A/B personality, all the characteristics of Myers-Briggs, Eysenck, MMPI, and assorted personality tests. Each engineer rated a handful of candidates and calculated the weighted sum for each.
I asked everyone if their winning candidate matched their subjective judgment of who the winner should have been. A resounding no, across the board.
Some adherents of rigid multi-attribute decision processes address such disconnects between intuition and weighted-sum decision scores by suggesting that in this case we merely adjust the weightings. For example, MindTools suggests:
“If your intuition tells you that the top scoring option isn’t the best one, then reflect on the scores and weightings that you’ve applied. This may be a sign that certain factors are more important to you than you initially thought.”
To some, this sounds like an admission that subjective judgment is more reliable than the results of the numerical exercise. Regardless, no amount of adjusting scores and weights left the engineers confident that the method worked. No adjustment to the weight coefficients seemed to properly express tradeoffs between some of the attributes. I.e., no tweaking of the system ordered the candidates (from high to low) in a way that made sense to each evaluator. This meant the redesigned formula still wasn’t trustworthy. Again, the matter of complex interactions of non-independent criteria came up. The relative importance of attributes seems to change as one contemplates different aspects of a thing. A philosopher’s perspective would be that normative statements cannot be made descriptive by decomposition. Analytic methods don’t answer normative questions.
Interestingly, all the engineers felt that listing criteria and scoring them helped them make better judgments about the ideal spouse, but not the judgments resulting directly from the weighted-sum analysis.
Fact is, picking which supplier should get the contract and picking the best spouse candidate are normative, subjective decisions. No amount of dividing a subjective decision into components makes it objective. Nor does any amount of ranking or scoring. A quantified opinion is still an opinion. This doesn’t mean we shouldn’t use decision matrices or quantify our sentiments, but it does mean we should not hide behind such quantifications.
From the perspective of psychology, decomposing the decision into parts seems to make sense. Expert opinion is known to be sometimes marvelous, sometimes terribly flawed. Daniel Kahneman writes extensively on associative coherence, finding that our natural, untrained tendency is to reach conclusions first, and justify them second. Kahneman and Gary Klein looked in detail at expert opinions in “Conditions for Intuitive Expertise: a Failure to Disagree” (American Psychologist, 2009). They found that short-answer expert opinion can be very poor. But they found that the subjective judgments of experts forced to examine details and contemplate alternatives – particularly when they have sufficient experience to close the intuition feedback loop – are greatly improved.
Their findings seem to support the aircraft engineers’ views of the weight-sum analysis process. Despite the risk of confusing reasons with causes, enumerating the evaluation criteria and formally assessing them aids the subjective decision process. Doing so left them more confident about their decisions, for spouse and for aircraft system, though those decision differed from the ones produced by weighted sums. In the case of the aircraft systems, the engineers had to live with the results of the weighted-sum scoring.
I was one of the engineers who disagreed with the results of the aircraft system decisions. The weighted-sum process awarded a very large contract to the firm whose design I judged inferior. Ten years later, service problems were severe enough that the Air Force agreed to switch to the vendor I had subjectively judged best. As for the engineer-spouse decisions, those of my old engineering team are all successful so far. It may not be a coincidence that the divorce rates of engineers are among the lowest of all professions.
Hedy Lamarr was granted a patent for spread-spectrum communication technology, paving the way for modern wireless networking.
Last time I started with my friend Willie’s bold claim that he doesn’t believe in probability; then I gave a short history of probability. I observed that defining probability is a controversial matter, split between objective and subjective interpretations. About the only thing these interpretations agree on is that probability values range from zero to one, where P = 1 means certainty. When you learn probability and statistics in school, you are getting the frequentist interpretation, which is considered objective. Frequentism relies on directly equating observed frequencies with probabilities. In this model, the probability of an event exactly equals the limit of the relative frequency of that outcome in an infinitely large number of trials.
The problem with this interpretation in practice – in medicine, engineering, and gambling machines – isn’t merely the impossibility of an infinite number of trials. A few million trials might be enough. Running trials works for dice but not for earthquakes and space shuttles. It also has problems with things like cancer, where plenty of frequency data exists. Frequentism requires placing an individual specimen into a relevant population or reference class. Doing this is easy for dice, harder for humans. A study says that as a white males of my age I face a 7% probability of having a stroke in the next 10 years. That’s based on my membership in the reference class of white males. If I restrict that set to white men who don’t smoke, it drops to 4%. If I account for good systolic blood pressure, no family history of atrial fibrillation or ventricular hypertrophy, it drops another percent or so.
Ultimately, if I limit my population to a set of one (just me) and apply the belief that every effect has a cause (i.e., some real-world chunk of blockage causes an artery to rupture), you can conclude that my probability of having a stroke can only be one of two values – zero or one.
Frequentism, as seen by its opponents, too closely ties probabilities to observed frequencies. They note that the limit-of-relative-frequency concept relies on induction, which might mean it’s not so objective after all. Further, those frequencies are unknowable in many real-world cases. Still further, finding an individual’s correct reference class is messy, possibly downright subjective. Finally, no frequency data exists for earthquakes that haven’t happened yet. All that seems to do some real damage to frequentism’s utility score.
The subjective interpretations of probability propose fixes to some of frequentism’s problems. The most common subjective interpretation is Bayesianism, which itself comes in several flavors. All subjective interpretations see probability as a degree of belief in a specific outcome, as held by a rational person. Think of it as a fair bet with odds. The odds you’re willing to accept for a bet on your race horse exactly equals your degree of belief in that horse’s ability to win. If your filly were in the same race an infinite number of times, you’d expect to break even, based on those odds, whether you bet on her or against her.
Subjective interpretations rely on logical coherence and belief. The core of Bayesianism, for example, is that beliefs must 1) originate with a numerical probability estimate, 2) adhere to the rules of probability calculation, and 3) follow an exact rule for updating belief estimates based on new evidence. The second rule deals with the common core of probability math used in all interpretations. These include things like how to add and multiply probabilities and Bayes theorem, not to be confused with Bayesianism, the belief system. Bayes theorem is an uncontroversial equation relating the probability of A given B to the probability of A and the probability of B. The third rule of Bayesianism is similarly computational, addressing how belief is updated after new evidence. The details aren’t needed here. Note that while Bayesianism is generally considered subjective, it is still computationally exacting.
The obvious problem with all subjective interpretations, particularly as applied to engineering problems, is that they rely, at least initially, on expert opinion. Life and death rides on the choice of experts and the value of their opinions. As Richard Feynman noted in his minority report on the Challenger, official rank plays too large a part in the choice of experts, and the higher (and less technical) the rank, the more optimistic the probability estimates.
The engineering risk analysis technique most consistent with the frequentist (objective) interpretation of probability is fault tree analysis. Other risk analysis techniques, some embodied in mature software products, are based on Bayesian (subjective) philosophy.
When Willie said he didn’t believe in probability, he may have meant several things. I’ll try to track him down and ask him, but I doubt the incident stuck in his mind as it did mine. If he meant that he doesn’t believe that probability was useful in system design, he had a rational belief; but I disagree with it. I doubt he meant that though.
Willie may have been leaning toward the ties between probability and redundancy in system design. Probability is the calculus by which redundancy is allocated to redundant systems. Willie may think that redundancy doesn’t yield the expected increase in safety because having more equipment means more things than can fail. This argument fails to face that, ideally speaking, a redundant path does double the chance having a component failure, but squares the probability of system failure. That’s a good thing, since squaring a number less than one makes it smaller. In other words, the benefit in reducing the chance of system failure vastly exceeds the deficit of having more components to repair. If that was his point, I disagree in principle, but accept that redundancy is no excuse for lack of component design excellence.
He may also think system designers can be overly confident of the exponential increase in modeled probability of system reliability that stems from redundancy. That increase in reliability is only valid if the redundancy creates no common mode failures and no latent (undetected for unknown time intervals) failures of redundant paths that aren’t currently operating. If that’s his point, then we agree completely. This is an area where pairing the experience and design expertise of someone like Willie with rigorous risk analysis using fault trees yields great systems.
Unlike Willie, Challenger-era NASA gave no official statement on its belief in probability. Feynman’s report points to NASA’s use of numeric probabilities for specific component failure modes. The Rogers Commission report says that NASA management talked about degrees of probability. From this we might guess that NASA believed in probability and its use in measuring risk. On the other hand, the Rogers Commission report also gives examples of NASA’s disbelief in probability’s usefulness. For example, the report’s Technical Management section states that, “NASA has rejected the use of probability on the basis that such techniques are insufficient to assure that adequate safety margins can be applied to protect the lives of the crew.”
Regardless of what NASA’s beliefs about porbability, it’s clear that NASA didn’t use fault tree analysis for the space shuttle program prior to the Challenger disaster. Nor did it use Bayesian inference methods, any hybrid probability model, or any consideration of probability beyond opinions about failures of critical items. Feynman was livid about this. A Bayesian (subjective, but computational) approach would have at least forced NASA to make it subjective judgments explicit and would have produced a rational model of its judgments. Post-Challenger Bayesian analyses, including one by NASA, varied widely, but all indicated unacceptable risk. NASA has since adopted risk management approaches more consistent with those used in commercial and military aircraft design.
An obvious question arises when you think about using a frequentist model on nearly one-of-a-kind vehicles. How accurate can any frequency data be for something as infrequent as a shuttle flight? Accurate enough, in my view. If you see the shuttle as monolithic and indivisible, the data is too sparse; but not if you view it as a system of components, most of which, like o-ring seals, have close analogs in common use, with known failure rates.
The FAA mandated probabilistic risk analyses of the frequentist variety (effectively mandating fault trees) in 1968. Since then flying has become safe, by any measure. In no other endeavor has mankind made such an inherently dangerous activity so safe. Aviation safety progressed through many innovations, redundant systems being high on the list. Probability is the means by which you allocate redundancy. You can’t get great aircraft systems without designers like Willie. Nor can you get them without probability. Believe it or not.
Years ago in a meeting on design of a complex, redundant system for a commercial jet, I referred to probabilities of various component failures. In front of this group of seasoned engineers, a highly respected, senior member of the team interjected, “I don’t believe in probability.” His proclamation stopped me cold. My first thought was what kind a backward brute would say something like that, especially in the context of aircraft design. But Willie was no brute. In fact he is a legend in electro-hydro-mechanical system design circles; and he deserves that status. For decades, millions of fearless fliers have touched down on the runway, unaware that Willie’s expertise played a large part in their safe arrival. So what can we make of Willie’s stated disbelief in probability?
Friends and I have been discussing risk science a lot lately – diverse aspects of it including the Challenger disaster, pharmaceutical manufacture in China, and black swans in financial markets. I want to write a few posts on risk science, as a personal log, and for whomever else might be interested. Risk science relies on several different understandings of risk, which in turn rely on the concept of probability. So before getting to risk, I’m going to jot down some thoughts on probability. These thoughts involve no computation or equations, but they do shed some light on Willie’s mindset. First a bit of background.
Oddly, the meaning of the word probability involves philosophy much more than it does math, so Willie’s use of belief might be justified. People mean very different things when they say probability. The chance of rolling a 7 is conceptually very different from the chance of an earthquake in Missouri this year. Probability is hard to define accurately. A look at its history shows why.
Mathematical theories of probability only first appeared in the late 17th century. This is puzzling, since gambling had existed for thousands of years. Gambling was enough of a problem in the ancient world that the Egyptian pharaohs, Roman emperors and Achaemenid satraps outlawed it. Such legislation had little effect on the urge to deal the cards or roll the dice. Enforcement was sporadic and halfhearted. Yet gamblers failed to develop probability theories. Historian Ian Hacking (The Emergence of Probability) observes, “Someone with only the most modest knowledge of probability mathematics could have won himself the whole of Gaul in a week.”
Why so much interest with so little understanding? In European and middle eastern history, it seems that neither Platonism (determinism derived from ideal forms) nor the Judeo/Christian/Islamic traditions (determinism through God’s will) had much sympathy for knowledge of chance. Chance was something to which knowledge could not apply. Chance meant uncertainty, and uncertainty was the absence of knowledge. Knowledge of chance didn’t seem to make sense. Plus, chance was the tool of immoral and dishonest gamblers.
The term probability is tied to the modern understanding of evidence. In medieval times, and well into the renaissance, probability literally referred to the level of authority – typically tied to the nobility – of a witness in a court case. A probable opinion was one given by a reputable witness. So a testimony could be highly probable but very incorrect, even false.
Through empiricism, central to the scientific method, the notion of diagnosis (inference of a condition from key indicators) emerged in the 17th century. Diagnosis allowed nature to be the reputable authority, rather than a person of status. For example, the symptom of skin spots could testify, with various degrees of probability, that measles had caused it. This goes back to the notion of induction and inference from the best explanation of evidence, which I discussed in past posts. Pascal, Fermat and Huygens brought probability into the respectable world of science.
But outside of science, probability and statistics still remained second class citizens right up to the 20th century. You used these tools when you didn’t have an exact set of accurate facts. Recognition of the predictive value of probability and statistics finally emerged when governments realized that death records had uses beyond preserving history, and when insurance companies figured out how to price premiums competitively.
Also around the turn of the 20th century, it became clear that in many realms – thermodynamics and quantum mechanics for example – probability would take center stage against determinism. Scientists began to see that some – perhaps most – aspects of reality were fundamentally probabilistic in nature, not deterministic. This was a tough pill for many to swallow, even Albert Einstein. Einstein famously argued with Niels Bohr, saying, “God does not play dice.” Einstein believed that some hidden variable would eventually emerge to explain why one of two identical atoms would decay while the other did not. A century later, Bohr is still winning that argument.
What we mean when we say probability today may seem uncontroversial – until you stake lives on it. Then it gets weird, and definitions become important. Defining probability is a wickedly contentious matter, because wildly conflicting conceptions of probability exist. They can be roughly divided into the objective and subjective interpretations. In the next post I’ll focus on the frequentist interpretation, which is objective, and the subjectivist interpretations as a group. I’ll look at the impact of accepting – or believing in – each of these on the design of things like airliners and space shuttles from the perspectives of Willie, Richard Feynman, and NASA. Then I’ll defend my own views on when and where to hold various beliefs about probability.
An odd myth persists in systems engineering and risk analysis circles. Fault tree analysis (FTA), and sometimes fault trees themselves, are said to be deductive. FMEAs are called inductive. How can this be?
By fault trees I mean Boolean logic modeling of unwanted system states by logical decomposition of equipment fault states into combinations of failure states of more basic components. You can read more on fault tree analysis and its deductive nature at Wikipedia. By FMEA (Failure Mode & Effects Analysis) I mean recording all the things that can go wrong with the components of a system. Writers who find fault trees deductive also find FMEAs, their complement, to be inductive. I’ll argue here that building fault trees is not a deductive process, and that there is possible harm in saying so. Secondarily, I’ll offer that while FMEA creation involves inductive reasoning, the point carries little weight, since the rest of engineering is inductive reasoning too.
Word meanings can vary with context; but use of the term deductive is consistent across math, science, law, and philosophy. Deduction is the process of drawing a logically certain conclusion about a particular instance from a rule or premise about the general. Assuming all men are mortal, if Socrates is a man, then he is mortal. This is true regardless of the meaning of the word mortal. It’s truth is certain, even if Socrates never existed, and even if you take mortal to mean living forever.
Example from a software development website:
FMECA is an inductive analysis of system failure, starting with the presumed failure of a component and analyzing its effect on system stability: “What will happen if valve A sticks open?” In contrast, FTA is a deductive analysis, starting with potential or actual failures and deducing what might have caused them: “What could cause a deadlock in the application?”
The well-intended writer says we deduce the causes of the effects in question. Deduction is not up to that task. When we infer causes from observed effects, we are using induction, not deduction.
How did the odd claims that fault trees and FTAs are deductive arise? It might trace to William Vesely, NASA’s original fault tree proponent. Vesely sometimes used the term deductive in his introductions to fault trees. If he meant that the process of reducing fault trees into cut sets (sets of basic events or initiators) is deductive, he was obviously correct. But calculation isn’t the critical aspect of fault trees; constructing them is where the effort and need for diligence lie. Fault tree software does the math. If Vesely saw the critical process of constructing fault trees and supplying them with numerical data (often arduous, regardless of software) as deductive – which I doubt – he was certainly wrong.
Inductive reasoning, as used in science, logic and philosophy, means inferring general rules or laws from observations of particular instances. The special use of the term math induction actually refers to deduction, as mathematicians are well aware. Math induction is deductive reasoning with a confusing title. Induction in science and engineering stems from our need to predict future events. We form theories about how things will behave in the future based on observations of how similar things behaved in the past. As I discussed regarding Bacon vs. Descartes, science is forced into the realm of induction because deduction never makes contact with the physical world – it lives in the mind.
Inductive reasoning is exactly what goes on when you construct a fault tree. You are making inferences about future conditions based on modeling and historical data – a purely inductive process. The fact that you use math to solve fault trees does not make fault trees any more deductive than the presence of math in lab experiments makes empirical science deductive.
Does this matter?
It’s easy enough to fix this technical point in descriptions fault tree analysis. We should do so, if merely to avoid confusing students. But more importantly, quantitative risk analysis – including FTA – has its enemies. They range from several top consultancies selling subjective, risk-score matrix methodologies dressed up in fancy clothes (see Tony Cox’s SIRA presentation on this topic) to some of NASA’s top management – those flogged by Richard Feynman in his minority report on the Challenger disaster. The various criticisms of fault tree analysis say it is too analytical and correlates poorly with the real world. Sound familiar? It echoes a feud between the heirs of Bacon (induction) and the heirs of Descartes (deduction). Some of fault trees’ foes find them overly deductive. They then imply that errors found in past quantitative analyses impugn objectivity itself, preferring subjective analyses based on expert opinion. This curious conclusion would not follow, even if fault tree analyses were deductive, which they are not.
Science is the belief in the ignorance of experts. – Richard Feynman
On reading my praise of Richard Feynman, a fellow systems engineer and INCOSE member (International Council on Systems Engineering) suggested that I read Feynman’s Minority Report to the Space Shuttle Challenger Enquiry. He said I might not like it. I read it, and I don’t like it, not from the perspective of a systems engineer.
Challenger explosion, Jan. 28, 1986
I should be clear on what I mean by systems engineering. I know of three uses of the term: first, the engineering of embedded systems, i.e., firmware (not relevant here); second, an organizational management approach (relevant, but secondary); third, a discipline aimed at design of assemblies of components to achieve a function that is greater than those of its constituents (bingo). Definitions given by others are useful toward examining Feynman’s minority report on the Challenger.
Simon Ramo, the “R” in TRW and inventor of the ICBM, put it like this: “Systems engineering is a discipline that concentrates on the design and application of the whole (system) as distinct from the parts. It involves looking at a problem in its entirety, taking into account all the facets and all the variables and relating the social to the technical aspect.”
Howard Eisner of GWU says, “Systems engineering is an iterative process of top-down synthesis, development, and operation of a real-world system that satisfies, in a near optimal manner, the full range of requirements for the system.”
INCOSE’s definition is pragmatic (pleasantly, as their guide tends a bit toward strategic-management jargon): “Systems engineering is an interdisciplinary approach and means to enable the realization of successful systems.”
Feynman reaches several sound conclusions about root causes of the flight 51-L Challenger disaster. He observes that NASA’s safety culture had critical flaws and that its management seemed to indulge in fantasy, ignoring the conclusions, advice and warnings of diligent systems and component engineers. He gives specific examples of how NASA management grossly exaggerated the reliability of many systems and components in the shuttle. On this point he concludes, “reality must take precedence over public relations, for nature cannot be fooled.” He describes a belief by management that because an anomaly was without consequence in a previous mission, it is therefore safe. Most importantly, he cites the erroneous use of the concept of factor of safety around the O-ring seals between the two lower segments of the solid rocket motors by NASA management (the Rogers Commission also agrees that failure of these O-rings was the root cause of the disaster). An NASA report on seal erosion in an earlier mission (flight 51-C) had assigned a safety factor of three, based on the seals having eroded only one third of the amount thought to be critical. Feynman replies that the O-rings were not designed to erode, and hence the factor-of-safety concept did not apply. Seal erosion was a failure of the design, catastrophic or not; there was no safety factor at all. “Erosion was a clue that something was wrong; not something from which safety could be inferred.”
But later Feynman incorrectly states that establishing a hypothetical propulsion system failure rate of 1 in 100,000 missions would require an inordinate number of tests to determine with confidence. Here he seems not to grasp both the exponential impact of redundancy on reliability, and that fault tree analysis could confidently calculate low system failure rates based on historical failure rates of large populations of constituent components, combined with the output of FMEAs (failure mode effects analyses) on those components in the relevant systems. This error does not impact Feynman’s conclusions about the root cause of the Challenger disaster. I mention it here because Feynman might be viewed as an authoritative source on systems engineering, but is here doing a poor job of systems engineering.
Discussing the liquid fuel engines, Feynman then introduces the concept of top-down design, which he criticizes. It isn’t clear exactly what he means by top-down. The most charitable reading would be a critique of NASA top management’s overruling the judgments of engineering management and engineers; but, on closer reading, it’s clear this cannot be his meaning:
The usual way that such engines are designed (for military or civilian aircraft) may be called the component system, or bottom-up design. First it is necessary to thoroughly understand the properties and limitations of the materials to be used (for turbine blades, for example), and tests are begun in experimental rigs to determine those. With this knowledge larger component parts (such as bearings) are designed and tested individually…
The Space Shuttle Main Engine was handled in a different manner, top down, we might say. The engine was designed and put together all at once with relatively little detailed preliminary study of the material and components. Then when troubles are found in the bearings, turbine blades, coolant pipes, etc., it is more expensive and difficult to discover the causes and make changes.
All mechanical-system design is necessarily top-down, in the sense of top-down used by Eisner, above. This use of the term is metaphor for progressive functional decomposition from mission requirements down to component requirements. Engineers cannot, for example, size a shuttle’s fuel pumps based on the functional requirement of having five men and two women orbit the earth to deploy a communications satellite. The fuel pump’s performance requirements ultimately emerge from successive derivations of requirements for subsystem design candidates. This design process is top-down, whether the various layers of subsystem design candidates are themselves newly conceived systems or ones that are already mature products (“off the shelf”). Wikipedia’s article and several software methodology sites incorrectly refer to design using off-the-shelf components as bottom-up – not involving functional decomposition. They err by failing to consider that piecing together existing subsystems toward a grander purpose still first requires functional decomposition of that grander purpose into lower-level requirements that serve as a basis for selecting existing subsystems. Simply put, you’ve got to know what you want a thing to do, even if you build that thing from available parts – software or hardware – in order to select those parts. Using off-the-shelf software subsystems still requires functional decomposition of the desired grander system.
F-117 frontal view
Off-the-shelf is a common strategy in aerospace, primarily for cost and schedule reasons. The Lockheed F-117, despite its unique design, used avionics taken from the C-130 and the F-16, brakes from the F-15, landing gear from the T-38, and other parts from commercial and military aircraft. This was for expediency. For the F-117, these off-the-shelf components still had to go through the necessary requirements validation, functional and stress testing, certification, and approval by all of the “ilities” (reliability, maintainability, supportability, durability, etc) required to justify their use in the vehicle – just as if they were newly designed. Likewise for the Challenger, the choice of new design vs. off-the-shelf should have had no impact on safety or reliability if proper systems engineering occurred. Whether its constituents were new designs or off-the-shelf, the shuttle’s propulsion system is necessarily – and desirably – the result of top-down design. Feynman may simply mean that the design and testing phases were rushed, that omissions were made, and that testing was incomplete. Other evidence suggests this; but these omissions are not a negative consequence of top-down design, which is the only sound process for the design of aircraft and other systems of systems.
It is difficult to imagine any sound basis for Feynman’s use of – and defense of – bottom-up design other than the selection of off-the-shelf components, which, as mentioned above, still entails functional decomposition (top-down design). Other uses of the term appear in discussions of software methodologies. I also found a handful of academic papers that incorrectly – incoherently, in my view – equate top-down with analysis and deduction, and bottom-up with synthesis and induction. The erroneous equation of analysis with deductive reasoning pops up in Design Thinking and social science literature (e.g., at socialresearchmethods.net). It fails to realize that analysis as a means of inferring cause from observed result (i.e., what made this happen?) always entails inductive reasoning. Geometry is deduction; science and engineering are inherently inductive.
The use of bottom-up shows up in software circles in a disparaging sense. It describes a state of system growth that happens with no conscious design beyond that of an original seed. It is non-design, in a sense. Such “organic growth” happens in enterprise software when new features, not envisioned during the original design, are later bolted-on. This can stem from naïve mismanagement by those unaware of the damage done to maintainability and further extensibility of the software system, or through necessity in a merger/acquisition scenario where the system’s owners are aware of the consequences but have no other alternatives. This scenario obviously does not apply to the hardware or software of the Challenger; and if it did, such bottom-up “design” would be a defect of the system, not a virtue.
Hydro-mechanical system components in 737 gear bay
Aerospace has in its legacy an attitude – as opposed to a design method – sometimes called a bottom-up mindset. I’ve encountered this as a form of resistance to methodological system-design-for-safety and the application of redundancy. In my experience it came from expert designers of electro-hydro-mechanical subsystems. A legendary aerospace systems designer once told me with a straight face, “I don’t believe in probability.” You can trace this type of thinking back to the rough and ready pioneers of manned flight. Charles Lindbergh, for example, said something along the lines of, “give me one good engine and one good pilot.” Implicit in this mentality is the notion that safety emerges from component quality rather than from system design. The failure rates of the best aerospace components tend to vary from those of average components by factors of two or ten, whereas redundancy has an exponential effect. Feynman’s criticism of top-down and endorsement of bottom-up – whatever he meant by it – could unfortunately be seen as support for this harmful and oddly persistent notion of bottom-up.
Toward the end of Feynman’s report, he reveals another misunderstanding about design of life-critical systems. In the section on avionics, he faults NASA for using 15-year-old software and hardware designs, concluding that the electronics are obsolete. He claims that modern chip sets are more reliable and of higher quality. This criticism runs contrary to his complaint about top-down design of the main engines, and it misses a key point. The improvements in reliability of newer chips would contribute only negligibly toward improved availability of the quad-redundant system containing them. More importantly, older designs of electronic components are often used in avionics precisely because they are old, mature designs. Accelerated-life testing of electronics is known to be tricky business. We use old-design chips because there is enough historical usage data to determine their failure rates without relying on accelerated-life testing. Long ago at McDonnell Douglas I oversaw use of the Intel 87C196 chip for a system on the C-17 aircraft. The Intel rep told me that this was the first use of the Intel 8086-derivative chip in a military aircraft. We defended its use, over the traditional but less capable Motorola chips, on the basis that the then 10+ year history of 8086’s in similar environments was finally sufficient to establish a statistical failure rate usable in our system availability calculations. Interestingly, at that time NASA had already been using 8086 chips in the shuttle for years.
Feynman’s minority report on the Challenger contains misunderstandings and technical errors from the perspective of a systems engineer. While these errors may have little impact on his findings, they should be called out because of the possible influence they may have on future generations of engineers. The tyranny of pedigree, as we saw with Galileo, can extend a wrong idea’s life for generations.
That said, Feynman makes several key points about the psychology of engineering management that deserve much more attention than they get in engineering circles. First among these in my mind is the fallacy of induction from near-misses viewed as successes, thereby producing undue confidence about future missions.
“His legs were weary, but his mind was at ease, free from the presentiment of change. The sense of security more frequently springs from habit than from conviction, and for this reason it often subsists after such a change in the conditions as might have been expected to suggest alarm. The lapse of time during which a given event has not happened is, in the logic of habit, constantly alleged as a reason why the event should never happen, even when the lapse of time is precisely the added condition which makes the event imminent. A man will tell you that he has worked in a mine for forty years unhurt by an accident, as a reason why he should apprehend no danger, though the roof is beginning to sink; and it is often observable that the older a man gets, the more difficult it is to retain a believing conception of his own death.”
– from Silas Marner, by George Eliot (Mary Ann Evans Cross), 1861
Text and aircraft photos copyright 2013 by William Storage. NASA shuttle photos public domain.
In two previous posts I looked at the established definition of wicked problem and tested whether a rough statement of the clean energy problem met the 10 (adjusted to 11 by me) points of that definition. I found that clean energy met about half the requirements to qualify as wicked. Next I want to look at whether characterizing the problem of clean energy as wicked is productive.
Outside the usual hyperbole of climate journalism, there are a number of serious, credible authors who use the term. The Hartwell Paper (London School of Economics, 2010), referenced in yesterday’s post, features it rather centrally. Its authors sought a means of putting climate policy on track after failure of the Copenhagen climate conference. They made some excellent points and recommendations, noting that climate policy and energy policy are not the same thing. They suggested that reframing the climate issue around matters of human dignity will likely be more effective than framing it around human sin and atonement. They also asserted that the UNFCCC/Kyoto model was doomed to failure from the start because it approached climate change as a tame problem when in fact it is a wicked one. I believe The Hartwell Paper errs considerably in concluding that mischaracterization of a wicked problem as a tame one was the main reason for failure of Kyoto. Doing so implies much too sharp a distinction between tame and wicked and overstates the value of that distinction in determining how to attack a problem. Kyoto’s failure can be understood by simple economics; some parties saw insufficient benefit for the cost.
The Hartwell Paper says that presence of open, complex and/or nonlinear systems make a problem wicked. Hartwell does not address nonlinearity by name, though one of its authors, Gwyn Prins, does in related discussions. Though I agree with most of the conclusions reached by Hartwell and separately by Prins, I think Prins’ work might benefit from a better understanding of systems engineering and design and less reliance on the notion of wickedness. To clarify, my only quibble with Prins is terminology, not intent or conclusion. The terminology wouldn’t matter except that it becomes fuel for trumpery and creates an air of unsolvability.
For example, Prins contrasts the wicked problems of climate and energy with the tame problem of aircraft carrier design (The Wicked Problem of Climate Change on YouTube). He offers that in the case of an aircraft carrier, after a certain amount of study into metallurgy and propulsion systems, you can know that it’s time to quit studying and start building, but the lack of definitive formulation of the climate problem prevents us from identifying a similar point in the problem solving sequence for climate.
But this comparison – fix climate change versus build aircraft carrier – is inaccurate. The goal in the case of an aircraft carrier is not an armored boat with 40 fighter jets on it. The carrier is a system, itself a component within a larger weapons system having the objective of national defense. National defense might further be elaborated something like the capacity to defend the US and allies against various military threats, to operate efficiently with minimum risk to its occupants while being reliable, maintainable and fuel-efficient.
In other words, a better comparison would be national defense versus climate change. These problems probably have similar wickedness. If national defense were a tame problem, we could, with a finite amount of analysis and calculation, derive the horsepower requirements of an aircraft carrier’s nuclear-driven turbines and the BTU requirements for its cooling system, through some complex but finite analytical process, from the requirement for national security. But translating peace-keeping and defense-readiness into horsepower first requires making a bunch of subjective and qualitative decisions using an arbitrarily large number of very human judgments. These judgments have no stopping rule; the design has an infinite number of potential solutions, and is close to a one-shot solution that is prone to unintended consequences (case in point, the French carrier Charles de Gaulle). Once implemented, products like the aircraft carrier have no ultimate test of efficacy. Weapons system design – and almost all engineering design problems – are wicked problems using Rittel’s criteria. So how useful is the characterization of wickedness?
One potential value of calling a problem wicked is to convince management and government that study is needed before quantitative requirements can be set, but I think that point is now firmly established. Many engineers would see this as the usual need for requirements analysis, which has always been a subjective and social process involving operations analysis, identification of stakeholders, ethnography, focus groups, scenario and persona modeling, interviews with subject matter experts, consensus tools, fall-back methods, and possibly a dictator or tie breaker.
Steve Rayner of Oxford is another fan of wicked problems. He’s done great work in bringing rationality and pragmatism to climate policy, but his application of wickedness (e.g., Wicked Problems: Clumsy Solutions) can easily be read (erroneously) as an admission of insolvability. If the category wicked once had value, it now seems a liability – an immobilizing one at that. We have work to do; roll up your sleeves.
Rittel and Webber concluded their paper with no advice on how to deal with wickedness; but they imply early on a need for the social professions to advance beyond the view that “instruments of perfectability can be perfected.” I take that to mean they see limits to the utility of science and flaws in viewing organizations, governments and societies as mechanisms. I agree; the mid 20th century was rife with such flawed thinking. However, governments, managers and product design teams have always had to deal with deciding what to tell the engineers to build. If this is the reason climate and energy writers find their topic to be wicked, the term is useless.
A related problem revealed by press covering climate and energy wickedness is that many journalists confuse the difficulty of reaching consensus with the difficulty of making calculations. An open system in physics is merely a means of modeling a physical process; we model problems as open or closed as a convenience for analysis. Social scientists use open system to discuss adaptive agents, co-evolution and social or political interactions. They’re both good definitions in the their contexts, but confusing them leads to the bad conclusion that physical open systems are unanalyzable by the tools of science. The same applies for the term, nonlinearity. In engineering, it means a second- or higher-order system – standard engineering stuff. In new age literature, it sometimes (at its worst) implies a style of thinking that refutes logic and rationality. We can’t blame equivocation of the terms open system and nonlinearity on the use of the term wicked problem, but we can recognize that choice of language has a dramatic effect on popular uptake of science (see post Toward a New Misunderstanding of Science).
Assigning wickedness to the problems of climate/energy or national defense adds little value toward dealing with them. Nor does calling them super-wicked as do Levin et al in “Playing it Forward: Path Dependency, Progressive Incrementalism, and the ‘Super Wicked’ Problem of Global Climate Change,” which does, thankfully, take pains to avoid a lost-cause position. But wicked and super-wicked do have the power to bewilder and demoralize because of our inability to divorce wicked from its more traditional context. Characterizing the problem as wicked is a self-fulfilling prophecy; it convinces that if some of the questions are unanswerable then no action can be taken. We don’t have to know how the global climate works in order to know how to avoid interfering with it any more than we currently do. We know that China is booming and will accept no external constraints that hamper its economic growth. But we also know that China’s air pollution kills half a million people a year, that the US is good at inventing things, and that China is good at manufacturing them. We also know how to calculate the extent to which solar and wind can contribute to US and global clean energy. We know that governments can stimulate demand as well as supply. That’s something to work with, despite the lack of consensus or transcendent authority.
Further, we can know that solar-powered cell phone chargers, biodegradable phones, eco-beer, and gloves heated with USB-power are truly wicked, in the old-fashioned sense of the word. They’re wicked because of the point made by Rittel, Webber and Churchman in their original papers on wicked problems. Taming a small part of a wicked problem is morally wrong, as is outright faking it – surely the case with much of the greenwash. But even where there’s no fraud, minor taming with major fanfare is still reprehensible. It creates an illusion of progress and distracts us from the task at hand.
Next I want to look at whether our major clean energy efforts – wind and solar power, biomass, hybrid cars and the like – are wicked and morally wrong for these same reasons.
The price of metaphor is eternal vigilance – Arturo Rosenblueth and Norbert Wiener