Bill Storage


Great Philosophers Damned to Hell

April 1, 2015

My neighbor asked me if I thought anything new ever happened in philosophy, or whether, 2500 years after Socrates, all that could be worked out in philosophy had been wrapped up and shipped. Alfred North Whitehead came to mind; in Process and Reality he characterized the entire European philosophical tradition as merely a series of footnotes to Plato. I don’t know what Whitehead meant by this, or for that matter, by the majority of his metaphysical ramblings. I’m no expert, but for my money most of what’s great in philosophy has happened in the last few centuries – including some real gems in the last few decades.

For me, ancient, eastern, and medieval philosophy is merely a preface to Hume. OK, a few of his predecessors deserve a nod – Peter Abelard, Adelard of Bath, and Francis Bacon. But really, David Hume was the first human honest enough to admit that we can’t really know much about anything worth knowing and that our actions are born of custom, not reason. Hume threw a wrench into the works of causation and induction and stopped them cold. Hume could write clearly and concisely. Try his Treatise some time.

Immanuel Kant, in an attempt to reconcile empiricism with rationalism, fought to rescue us from Hume’s skepticism and failed miserably. Kant, often a tad difficult to grasp (“transcendental idealism” actually can make sense once you get his vocabulary), succeeded in opposing every one of his own positions while paving the way for the great steaming heap of German philosophy that reeks to this day.

The core of that heap is, of course, the domain of GWF Hegel, which the more economical Schopenhauer called “pseudo-philosophy paralyzing all mental powers, stifling all real thinking.”

Don’t take my word (or Schopenhauer‘s) for it. Read Karl Marx’s Critique of Hegel’s Philosophy. On second thought, don’t. Just read Imre Lakatos’s critique of Marx’s critique of Hegel. Better yet, read Paul Feyerabend’s critique of Lakatos’s critique of Marx’s critique. Of Hegel. Now you’re getting the spirit of philosophy. For every philosopher there is an equal and opposite philosopher. For Kant, they were the same person. For Hegel, the opposite and its referent are both all substance and non-being. Or something like that.

Hegel set out to “make philosophy speak German” and succeeded in making German speak gibberish. Through great effort and remapping your vocabulary you can eventually understand Hegel, at which point you realize what an existential waste that effort has been. But not all of what Hegel wrote was gibberish; some of it was facile politics.

Hegel writes – in the most charitable of translations – that reason “is Substance, as well as Infinite Power; its own Infinite Material underlying all the natural and spiritual life which it originates, as also the Infinite Form, – that which sets this Material in motion.”

I side with the logical positivists, who, despite ultimately crashing into Karl Popper’s brick wall, had the noble cause of making philosophy work like science. The positivists, as seen in writings by AJ Ayer and Hans Reichenbach, thought the words of Hegel simply did no intellectual work. Rudolf Carnap relentlessly mocked Heidegger’s “the nothing itself nothings.” It sounds better in the Nazi philosopher’s own German: “Das Nichts nichtet,” and reveals that Carnap could have been more sympathetic in his translation by using nihilates instead of nothings. The removal of a sentence from its context was unfair, as you can plainly see when it is returned to its native habitat:

In anxiety occurs a shrinking back before … which is surely not any sort of flight but rather a kind of bewildered calm. This “back before” takes its departure from the nothing. The nothing itself does not attract; it is essentially repelling. But this repulsion is itself as such a parting gesture toward beings that are submerging as a whole. This wholly repelling gesture toward beings that are in retreat as a whole, which is the action of the nothing that oppresses Dasein in anxiety, is the essence of the nothing: nihilation. It is neither an annihilation of beings nor does it spring from a negation. Nihilation will not submit to calculation in terms of annihilation and negation. The nothing itself nihilates.

Heidegger goes on like that for 150 pages.

The positivists found fault with philosophers who argued from their armchairs that Einstein could not have been right. Yes, they really did this; and not all of them opposed Einstein’s science just because it was Jewish. The philosophy of the positivists had some real intellectual heft, despite being wrong, more or less. They were consumed not only by causality and determinism, but by the quest for demarcation – the fine line between science and nonsense. They failed. Popper burst their bubble by pointing out that scientific theory selection relied more on absence of disconfirming evidence than on the presence of confirming evidence. Positivism fell victim mainly to its own honest efforts. The insider Willard Van Orman Quine (like Popper) put a nail in positivism’s coffin by showing the distinction between analytic and synthetic statements to be false. Hilary Putnam, killing the now-dead horse, then showed the distinction between “observational” and “theoretical” to be meaningless. Finally, in 1960, Thomas Kuhn showed up in Berkeley with the bomb that the truth conditions for science do not stand independent of their paradigms. I think often and write occasionally on the highly misappropriated Kuhn. He was wrong in all his details and overall one of the rightest men who ever lived.

Before leaving logical positivism, I must mention another hero from its ranks, Carl Hempel. Hempel is best known, at least in scientific circles, for his wonderful illustration of Hume’s problem of induction known as the Raven Paradox.

But I digress. I mainly intended to say that philosophy for me really starts with Hume and some of his contemporaries, like Adam Smith, William Blackstone, Voltaire, Diderot, Moses Mendelssohn, d’Alembert, and Montesquieu.

And to say that 20th century philosophers have still been busy, and have broken new ground. As favorites I’ll cite Quine, Kuhn and Hempel, mentioned above, along with Ludwig Wittgenstein, Richard Rorty (late works in particular), Hannah Arendt, John Rawls (read about, don’t read – great thinker, tedious writer), Michel Foucault (despite his Hegelian tendencies), Charles Peirce, William James (writes better than his brother), Paul Feyerabend, 7th Circuit Judge Richard Posner, and the distinguished Simon Blackburn, with whom I’ll finish.

One of Thomas Kuhn’s more controversial concepts is that of incommensurability. He maintained that cross-paradigm argument is futile because members of opposing paradigms do not share a sufficiently common language in which to argue. At best, they lob their words across each other’s bows. This brings to mind a story told by Simon Blackburn at a talk I attended a few years back. It recalls Theodoras and Protagoras against Socrates on truth being absolute vs. relative – if you’re into that sort of thing. If not, it’s still good.

Blackburn said that Lord Jeremy Waldron was attending a think tank session on ethics at Princeton, out of obligation, not fondness for such sessions. As Blackburn recounted Waldron’s experience, Waldron sat on a forum in which representatives of the great religions gave presentations.

First the Buddhist talked of the corruption of life by desire, the eight-fold way, and the path of enlightenment, to which all the panelists said “Wow, terrific. If that works for you that’s great” and things of the like.

Then the Hindu holy man talked of the cycles of suffering and birth and rebirth, the teachings of Krishna and the way to release. And the panelists praised his conviction, applauded and cried “Wow, terrific – if it works for you that’s fabulous” and so on.

A Catholic priest then came to the podium, detailing the message of Christ, the promise of salvation, and the path to eternal life. The panel cheered at his great passion, applauded and cried, “Wow, terrific, if that works for you, great.”

And the priest pounded his fist on the podium and shouted, “No! Not a question of whether it works for me! This is the true word of the living God; and if you don’t believe it you’re all damned to Hell!”

The panel cheered and gave a standing ovation, saying: “Wow! Terrific! If that works for you that’s great!”



Pure Green Sense

With some sadness I recently received a Notice of Assignment for the Benefit of Creditors signaling the demise of PureSense Environmental, Inc. PureSense was real green – not green paint.

It’s ironic that PureSense was so little known. Environmental charlatans and quacks continue to get venture capital and government grants for businesses built around absurd “green” products debunkable by anyone with knowledge of high school physics. PureSense was nothing like that. Their down-to-earth (literally) concept provides real-time irrigation and agricultural field management with inexpensive hardware and sophisticated software. Their matrix of sensors records soil moisture, salinity, soil temperature and climate data from crop fields every 15 minutes. Doing this eliminates guesswork, optimizing use of electricity, water, and pesticides. Avoiding over- and under-watering maximizes crop yield while minimizing use of resources. It’s a win-win.

But innovation and farming are strange bedfellows. Apparently, farmers didn’t all jump at the opportunity. I did some crop disease modelling work for PureSense a few years back. Their employees told me that a common response to showing farmers that their neighbors had substantially increased yield using PureSense was along the lines of, “we’re doing ok with what we’ve got…” Perhaps we shouldn’t be surprised. Not too long ago, farmers who experimented too wildly left no progeny.

The ever fascinating Jethro Tull, inventor of the modern seed drill and many other revolutionary farming gadgets in the early 1700s, was flabbergasted at the reluctance of farmers to adopt his tools and methods. Tull wrote on Soil and Civilization, predicting that future people would have easier lives, since “the Produce of Land Will be Increased, and the Usual Expence Lessened” through a scientific (though that word is an anachronism) approach to agriculture.

The editor of the 2nd edition of Tull’s Horse-hoeing Husbandry, Or, An Essay on the Principles of Vegetation and Tillage echoed Tull’s astonishment at farmers’ behavior.

 How it has happened that a Method of Culture which proposes such advantages to those who shall duly prosecute it, hath been so long neglected in this Country, may be matter of Surprize to such as are not acquainted with the Characters of the Men on whom the Practice thereof depends; but to those who know them thoroughly it can be none. For it is certain that very few of them can be prevailed on to alter their usual Methods upon any consideration; though they are convinced that their continuing therein disables them from paying their Rents, and maintaining their Families.

 And, what is still more to be lamented, these People are so much attached to their old Customs, that they are not only averse to alter them themselves, but are moreover industrious to prevent others from succeeding, who attempt to introduce anything new; and indeed have it too generally in their Power, to defeat any Scheme which is not agreeable to their own Notions; seeing it must be executed by the same sort of Hands.

Tull could have predicted PureSense’s demise. I think its employees could have as well. GlassDoor comments suggested that PureSense needed “a more devoted sales staff.” That is likely an understatement given the market. A more creative sales model might be more on the mark. Knowing that farmers, even while wincing at ever-shrinking margins, will cling to their established methods for better or worse, PureSense should perhaps have gotten closer to the culture of farming.

PureSense’s possible failure to tap into farmers’ psyche aside, America’s vulnerability to futuristic technobabble is no doubt a major funding hurdle. You’d think that USDA REAP loan providers and NRCS Conservation Innovation Grants programs would be lining up at their door. But I suspect crop efficiency pales in wow factor compared to a cylindrical tower of solar cells that somehow magically increases the area of sun-facing photovoltaics (hint: Solyndra’s actual efficiency was about 8.5%, a far cry from their claims that got them half a billion from the Obama administration).

Ozzie Zehner nailed this problem in Green Illusions. In his chapter on the alternative-energy fetish, he discusses energy pornographers, the enviro-techno-enthusiasts who jump to spend billions on dubious green tech that yields less benefit than home insulation and proper tire inflation would. Insulation, light rail, and LED lighting aren’t sexy; biofuels, advanced solar, and stratospheric wind turbines are. Jethro Tull would not have been surprised that modern farmers are as resistant to change as those of 18th-century Berkshire. But I think he’d be appalled to learn the extent to which modern tech press, business and government line up for physics-defying snake oil while ignoring something as fundamental as agriculture.

As I finished writing this I learned that Jain Irrigation has just acquired the assets of PureSense and has pledged a long-term commitment to the PureSense platform.

Jethro Tull smiles.


More Philosophy for Engineers

In a post on Richard Feynman and philosophy of science, I suggested that engineers would benefit from a class in philosophy of science. A student recently asked if I meant to say that a course in philosophy would make engineers better at engineering – or better philosophers. Better engineers, I said.

Here’s an example from my recent work as an engineer that drives the point home.

I was reviewing an FMEA (Failure Mode Effects Analysis) prepared by a high-priced consultancy and encountered many cases where a critical failure mode had been deemed highly improbable on the basis that the FMEA was for a mature system with no known failures.

How many hours of operation has this system actually seen, I asked. The response indicated about 10 thousand hours total.

I said on that basis we could assume a failure rate of about one per 10,001 hours. The direct cost of the failure was about $1.5 million. Thus the “expected value” (or “mathematical expectation” – the probabilistic cost of the loss) of this failure mode in a 160-hour mission is $24,000, or about $300,000 per year (excluding any secondary effects such as damaged reputation). With that number in mind, I asked the client if they wanted to consider further mitigation by adding monitoring circuitry.
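To make the arithmetic concrete, here is a minimal Python sketch of that expected-value calculation. The failure rate and cost come from the example above; the 2,000 operating hours per year is an assumed utilization chosen only to reproduce the roughly $300,000 annual figure.

```python
# Expected-value sketch for a single FMEA failure mode.
# The rate is the conservative "one failure per 10,001 hours" assumption
# taken from the example; annual utilization is assumed.

failure_rate = 1 / 10_001          # failures per operating hour (assumed)
cost_of_failure = 1_500_000        # direct cost in dollars, per the example
mission_hours = 160
annual_hours = 2_000               # assumed annual utilization

expected_loss_per_mission = failure_rate * mission_hours * cost_of_failure
expected_loss_per_year = failure_rate * annual_hours * cost_of_failure

print(f"Expected loss per 160-hour mission: ${expected_loss_per_mission:,.0f}")
print(f"Expected loss per year:             ${expected_loss_per_year:,.0f}")
# roughly $24,000 per mission and $300,000 per year
```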

I was challenged on the failure rate I used. It was, after all, a mature, ten year old system with no recorded failures of this type.

Here’s where the analytic philosophy course those consultants never took would have been useful.

You simply cannot justify calling a failure mode extremely rare based on evidence that it is at least somewhat rare. All unique events – like the massive rotor failure that took out all three hydraulic systems of a DC-10 in Sioux City – were very rare before they happened.

The authors of the FMEA I was reviewing were using unjustifiable inductive reasoning. Philosopher David Hume debugged this thoroughly in his 1739 A Treatise of Human Nature.

Hume concluded that there simply is no rational or deductive basis for induction, the belief that the future will be like the past.

Hume understood that, despite the lack of justification for induction, betting against the sun rising tomorrow was not a good strategy either. But this is a matter of pragmatism, not of rationality. A bet against the sunrise would mean getting behind counter-induction; and there’s no rational justification for that either.

In the case of the failure mode not yet observed, however, there is ample justification for counter-induction. All mechanical parts and all human operations necessarily have nonzero failure or error rates. In the world of failure modeling, the knowledge “known pretty good” does not support the proposition “probably extremely good”, no matter how natural the step between them feels.
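If a number is truly needed, there are standard, if crude, conventions for bounding a rate from zero-failure data, and none of them support “extremely improbable.” Below is a minimal sketch of one such convention, the so-called rule of three, assuming a constant failure rate. This is an illustration of the point, not the method the FMEA authors used.

```python
# "Rule of three" sketch: an upper confidence bound on a failure rate when zero
# failures have been observed, assuming an exponential (constant-rate) model.
# Illustrative only.

import math

def rate_upper_bound(operating_hours: float, confidence: float = 0.95) -> float:
    """Upper bound on the failure rate given zero observed failures."""
    return -math.log(1.0 - confidence) / operating_hours  # about 3/T at 95% confidence

hours_seen = 10_000
bound = rate_upper_bound(hours_seen)
print(f"95% upper bound on failure rate: {bound:.1e} per hour")
# about 3.0e-04 per hour -- "no failures yet" is not "extremely improbable"
```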

Hume’s problem of induction, despite the efforts of Immanuel Kant and the McKinsey consulting firm, has not been solved.

A fabulously entertaining – in my view – expression of the problem of induction was given by philosopher Carl Hempel in 1965.

Hempel observed that we tend to take each new observation of a black crow as incrementally supporting the inductive conclusion that all crows are black. Deductive logic tells us that if a conditional statement is true, its contrapositive is also true, since the statement and its contrapositive are logically equivalent. Thus if all crows are black then all non-black things are non-crow.
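The equivalence is easy to verify mechanically. A small truth-table check (my illustration, not Hempel’s) confirms it:

```python
# Truth-table check that a conditional and its contrapositive are equivalent:
# (P -> Q) holds exactly when (not Q -> not P) holds.

from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material conditional: 'if p then q'."""
    return (not p) or q

for is_crow, is_black in product([True, False], repeat=2):
    conditional = implies(is_crow, is_black)             # "all crows are black"
    contrapositive = implies(not is_black, not is_crow)  # "all non-black things are non-crows"
    assert conditional == contrapositive

print("The conditional and its contrapositive agree in every case.")
```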

It then follows that if each observation of black crows is evidence that all crows are black (compare: each observation of no failure is evidence that no failure will occur), then each observation of a non-black non-crow is also evidence that all crows are black.

Following this line, my red shirt is confirming evidence for the proposition that all crows are black. It’s a hard argument to oppose, but it simply does not “feel” right to most people.

Many try to salvage the situation by suggesting that observing that my shirt is red is in fact evidence that all crows are black, but provides only unimaginably small support to that proposition.

But pushing the thing just a bit further destroys even this attempt at rescuing induction from the clutches of analysis.

If my red shirt gives a tiny bit of evidence that all crows are black, it then also gives equal support to the proposition that all crows are white. After all, my red shirt is a non-white non-crow.



The Onagawa Reactor Non-Meltdown

On March 11, 2011, the strongest earthquake in Japanese recorded history hit Tohoku, leaving about 15,000 dead. The closest nuclear reactor to the quake’s epicenter was the Onagawa Nuclear Power Station operated by Tohoku Electric Power Company. Despite the earthquake and the subsequent tsunami that destroyed the town of Onagawa, the Onagawa nuclear facility remained intact and shut itself down safely, without incident. The Onagawa nuclear facility was the vicinity’s only safe evacuation destination. Residents of Onagawa left homeless by the natural disasters sought refuge in the facility, where its workers provided food.

The more famous Fukushima nuclear facility was about twice as far from the earthquake’s epicenter. The tsunami at Fukushima was slightly less severe. Fukushima experienced three core meltdowns, resulting in the evacuation of 300,000 people. The findings of the Fukushima Nuclear Accident Independent Investigation Commission have been widely published. They conclude that Fukushima failed to meet the most basic safety requirements, had conducted no valid probabilistic risk assessment, had no provisions for containing damage, and that its regulators operated in a network of corruption, collusion, and nepotism. Kiyoshi Kurokawa, Chairman of the commission, stated:

THE EARTHQUAKE AND TSUNAMI of March 11, 2011 were natural disasters of a magnitude that shocked the entire world. Although triggered by these cataclysmic events, the subsequent accident at the Fukushima Daiichi Nuclear Power Plant cannot be regarded as a natural disaster. It was a profoundly manmade disaster – that could and should have been foreseen and prevented.

Only by grasping [the mindset of Japanese bureaucracy] can one understand how Japan’s nuclear industry managed to avoid absorbing the critical lessons learned from Three Mile Island and Chernobyl. It was this mindset that led to the disaster at the Fukushima Daiichi Nuclear Plant.

The consequences of negligence at Fukushima stand out as catastrophic, but the mindset that supported it can be found across Japan.

Despite these findings, the world’s response to Fukushima has been much more focused on opposition to nuclear power than on opposition to corrupt regulatory government bodies and the cultures that foster them.

Two scholars from USC, Airi Ryu and Najmedin Meshkati, recently published “Why You Haven’t Heard About Onagawa Nuclear Power Station after the Earthquake and Tsunami of March 11, 2011,” their examination of the contrasting safety mindsets of TEPCO, the firm operating the Fukushima nuclear plant, and Tohoku Electric Power, the firm operating Onagawa.

Ryu and Meshkati reported vast differences in personal accountability, leadership values, work environments, and approaches to decision-making. Interestingly, they found even Tohoku Electric to be weak in setting up an environment where concerns could be raised and where an attitude of questioning authority was encouraged. Nevertheless, TEPCO was far inferior to Tohoku Electric in all other safety culture traits.

Their report is worth a read for anyone interested in the value of creating a culture of risk management and the need for regulatory bodies to develop non-adversarial relationships with the industries they oversee, something I discussed in a recent post on risk management.


Incommensurability and the Design-Engineering Gap

Those who conceptualize products – particularly software – often have the unpleasant task of explaining their conceptual gems to unimaginative, sanctimonious engineers entrenched in the analytic mire of in-the-box thinking. This communication directs the engineers to do some plumbing and flip a few switches that get the concept to its intended audience or market… Or, at least, this is how many engineers think they are viewed by designers.

Truth is, engineers and creative designers really don’t speak the same language. This is more than just a joke. Many posts here involve philosopher of science Thomas Kuhn. Kuhn’s idea of incommensurability between scientific paradigms also fits the design-engineering gap well. Those who claim the label “designer” believe design to be a highly creative, open-ended process with no right answer. Many engineers, conversely, understand design – at least within their discipline – to mean a systematic selection of components progressively integrated into an overall system, guided by business constraints and the laws of nature and reason. Disagreement on the meaning of design is just the start of the conflict.

Kuhn concluded that the lexicon of a discipline constrains the problem space and conceptual universe of that discipline. I.e., there is no fundamental theory of meaning that applies across paradigms. The meaning of expressions inside a paradigm comply only with the rules of that paradigm.  Says Kuhn, “Conceptually, the world is our representation of our niche, the residence of the particular human community with whose members we are currently interacting” (The Road Since Structure, 1993, p. 103). Kuhn was criticized for exaggerating the extent to which a community’s vocabulary and word usage constrains the thoughts they are able to think. Kuhn saw this condition as self-perpetuating, since the discipline’s constrained thoughts then eliminate any need for expansion of its lexicon. Kuhn may have overplayed his hand on incommensurability, but you wouldn’t know it from some software-project kickoff meetings I’ve attended.

This short sketch, The Expert, written and directed by Lauris Beinerts, portrays design-engineering incommensurability from the perspective of the sole engineer in a preliminary design meeting.

See also: Debbie Downer Doesn’t Do Design



Arianna Huffington, Wisdom, and Stoicism 1.0

Arianna Huffington spoke at The Commonwealth Club in San Francisco last week. Interviewed by Facebook COO Sheryl Sandberg, Huffington spoke mainly on topics in her recently published Thrive: The Third Metric to Redefining Success and Creating a Life of Well-Being, Wisdom, and Wonder. 2500 attendees packed Davies Symphony Hall. Several of us were men.

Huffington began with the story of her wake-up call to the idea that success is killing us. She told of collapsing from exhaustion, hitting the corner of her desk on the way down, gashing her forehead and breaking her cheek bone.

She later realized that “by any sane definition of success, if you are lying in a pool of blood on the floor of your office you’re not a success.”

After this epiphany Huffington began an inquiry into the meaning of success. The first big change was realizing that she needed much more sleep. She joked that she now advises women to sleep their way to the top. Sleep is a wonder drug.

Her reexamination of success also included personal values. She referred to ancient philosophers who asked what makes a good life. She explicitly identified her current doctrine with that of the Stoics (not to be confused with modern use of the term stoic). “Put joy back in our everyday lives,” she says. She finds that we have shrunk the definition of success down to money and power, and now we need to expand it again. Each of us needs to define success by our own criteria, hence the name of her latest book. The third metric in her book’s title includes focus on well-being, wisdom, wonder, and giving.

Refreshingly (for me at least) Huffington drew repeatedly on ancient western philosophy, mostly that of the Stoics. In keeping with the Stoic style, her pearls often seem self-evident only after the fact:

“The essence of what we are is greater than whatever we are in the world.” 

Take risk. See failure as part of the journey, not the opposite of success. (paraphrased) 

I do not try to dance better than anyone else. I only try to dance better than myself. 

“We may not be able to witness our own eulogy, but we’re actually writing it all the time, every day.” 

“It’s not ‘What do I want to do?’, it’s ‘What kind of life do I want to have?’”

“Being connected in a shallow way to the entire world can prevent us from being deeply connected to those closest to us, including ourselves.” 

“‘My life has been full of terrible misfortunes, most of which never happened.'” (citing Montaigne)

As you’d expect, Huffington and Sandberg suggested that male-dominated corporate culture betrays a dearth of several of the qualities embodied in Huffington’s third metric. Huffington said the most popular book among CEOs is the Chinese military treatise, The Art of War. She said CEOs might do better to read children’s books like Silverstein’s The Giving Tree or maybe Make Way for Ducklings. Fair enough; there are no female Bernie Madoffs.

I was pleasantly surprised by Huffington. I found her earlier environmental pronouncements to be poorly conceived. But in this talk on success, wisdom, and values, she shone. Huffington plays the part of a Stoic well, though some of the audience seemed to judge her more of a sophist. One attendee asked her if she really believed that living the life she identified in Thrive could have possibly led to her current success. Huffington replied yes, of course, adding that she, like Bill Clinton, had made all her biggest mistakes while tired.

Huffington’s quotes above align well with the ancients. Consider these from Marcus Aurelius, one of the last of the great Stoics:

Everything we hear is an opinion, not a fact. Everything we see is a perspective, not the truth. 

Very little is needed to make a happy life; it is all within yourself, in your way of thinking. 

Confine yourself to the present.

 Be content to seem what you really are. 

The object of life is not to be on the side of the majority, but to escape finding oneself in the ranks of the insane.

I particularly enjoyed Huffington’s association of sense-of-now, inner calm, and wisdom with Stoicism, rather than, as is common in Silicon Valley, with a misinformed and fetishized understanding of Buddhism. Further, her fare was free of the intellectualization of mysticism that’s starting to plague Wisdom 2.0. It was a great performance.

 

————————


 

Preach not to others what they should eat, but eat as becomes you, and be silent. – Epictetus



Multiple-Criteria Decision Analysis in the Engineering and Procurement of Systems

The use of weighted-sum value matrices is a core component of many system-procurement and organizational decisions, including risk assessments. In recent years the USAF has eliminated weighted-sum evaluations from most procurement decisions. They’ve done this on the basis that system requirements should set accurate performance levels that, once met, reduce procurement decisions to simple competition on price. This probably oversimplifies things. For example, the acquisition cost for an aircraft system might be easy to establish. But life cycle cost of systems that include wear-out or limited-fatigue-life components requires forecasting and engineering judgments. In other areas of systems engineering, such as trade studies, maintenance planning, spares allocation, and especially risk analysis, multi-attribute or multi-criterion decisions are common.

Weighted-sum criterion matrices (and their relatives, e.g., weighted-product, AHP, etc.) are often criticized in engineering decision analysis for some valid reasons. These include non-independence of criteria, difficulties in normalizing and converting measurements and expert opinions into scores, and logical/philosophical concerns about decomposing subjective decisions into constituents.

Years ago, a team of systems engineers and I, while working through the issues of using weighted-sum matrices to select subcontractors for aircraft systems, experimented with comparing the problems we encountered in vendor selection to the unrelated multi-attribute decision process of mate selection. We met the same issues in attempting to create criteria, weight those criteria, and establish criteria scores in both decision processes, despite the fact that one process seems highly technical, the other one completely non-technical. This exercise emphasized the degree to which aircraft system vendor selection involves subjective decisions. It also revealed that despite the weaknesses of using weighted sums to make decisions, the process of identifying, weighting, and scoring the criteria for a decision greatly enhanced the engineers’ ability to give an expert opinion. But this final expert opinion was often at odds with that derived from weighted-sum scoring, even after attempts to adjust the weightings of the criteria.

Weighted-sum and related numerical approaches to decision-making interest me because I encounter them in my work with clients. They are central to most risk-analysis methodologies, and, therefore, central to risk management. The topic is inherently multidisciplinary, since it entails engineering, psychology, economics, and, in cases where weighted sums derive from multiple participants, social psychology.

This post is an introduction-after-the-fact to my previous post, How to Pick a Spouse. I’m writing this brief prequel to address the fact that blog excerpting tools tend to use only the first few lines of a post, and on that basis my post appeared to be about mate selection rather than decision analysis, its main point.

If you’re interested in multi-attribute decision-making in the engineering of systems, please continue now to How to Pick a Spouse.


————-

Katz’s Law: Humans will act rationally when all other possibilities have been exhausted.



How to Pick a Spouse

Beckhap’s Law asserts that brains times beauty equals a constant. Can this be true? Are intellect and beauty quantifiable? Is beauty a property of the subject of investigation, or a quality of the mind of the beholder? Are any other relevant variables (attributes) intimately tied to brains or beauty? Assuming brains and beauty both are desirable, Beckhap’s Law implies an optimization exercise – picking a point on the reciprocal function representing the best compromise between brains and beauty. Presumably, this point differs for all evaluators. It raises questions about the marginal utility of brains and beauty. Is it possible that too much brain or too much beauty could be a liability? (Engineers would call this an edge-case check of Beckhap’s validity.) Is Beckhap’s Law of any use without a cost axis? Other axes? In practice, if taken seriously, Beckhap’s Law might be merely one constraint in a multi-attribute decision process for selecting a spouse. It also sheds light on the problems of Air Force procurement of the components of a weapons system and a lot of other decisions. I’ll explain why.

[C-17 aircraft photo]

I’ll start with an overview of how the Air Force oversees contract awards for aircraft subsystems – at least how it worked through most of USAF history, before recent changes in procurement methods.  Historically, after awarding a contract to an aircraft maker, the aircraft maker’s engineers wrote specs for its systems. Vendors bid on the systems by creating designs described in proposals submitted for competition. The engineers who wrote the specs also created a list of a few dozen criteria, with weightings for each, on which they graded the vendors’ proposals. The USAF approved this criteria list and their weightings before vendors submitted their proposals to ensure the fairness deserved by taxpayers. Pricing and life-cycle cost were similarly scored by the aircraft maker. The bidder with the best total score got the contract.

A while back I headed a team of four engineers, all single men, designing and spec’ing out systems for a military jet. It took most of a year to write these specs. Six months later we received proposals hundreds of pages long. We graded the proposals according to our pre-determined list of criteria. After computing the weighted sums (sums of score times weight for each criterion) I asked the engineers if the results agreed with their subjective judgments. That is, did the scores agree with the subjective judgment of best bidder that these engineers had made independent of the scoring process? For only about half of them did the two agree. I asked the team why they thought the score results differed from their subjective judgments.
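As an aside, the mechanics of the scoring itself are trivial, which is part of its appeal. Here’s a minimal sketch; the criteria, weights, and scores are invented for illustration and have nothing to do with the actual proposals.

```python
# Weighted-sum proposal scoring: total = sum(weight_i * score_i).
# Criteria, weights, and vendor scores below are invented for illustration.

weights = {"reliability": 0.30, "weight": 0.20, "life_cycle_cost": 0.35, "maintainability": 0.15}

proposals = {
    "Vendor A": {"reliability": 82, "weight": 95, "life_cycle_cost": 70, "maintainability": 88},
    "Vendor B": {"reliability": 90, "weight": 78, "life_cycle_cost": 85, "maintainability": 75},
}

def weighted_sum(scores: dict, weights: dict) -> float:
    """Aggregate criterion scores into a single figure of merit."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

for vendor, scores in proposals.items():
    print(f"{vendor}: {weighted_sum(scores, weights):.1f}")
```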

They proposed several theories. A systems engineer, viewing the system from the perspective of its interactions and interfaces with the entire aircraft, may not be familiar with all the internal details of the system while writing specs. You learn a lot of these details by reading the vendors’ proposals. So you’re better suited to create the criteria list after reading proposals. But the criteria and their weightings are fixed at that point because of the fairness concern. Anonymized proposals might preserve fairness and allow better criteria lists, one engineer offered.

But there was more to the disconnect between their subjective judgments of “best candidate” and the computed results. Someone immediately cited the problem of normalization. Converting weight in pounds, for example, to a dimensionless score (e.g., a grade of 0 to 100) was problematic. If minimum product weight is the goal, how do you convert three vendors’ product weights into grades on the 100-point scale? Giving the lowest weight 100 points and subtracting the percentage weight delta of the others feels arbitrary – because it is. Doing so compresses the scores excessively – making you want to assign a higher weighting to product weight to compensate for the clustering of the product-weight scores. Since you’re not allowed to do that, you invent some other ad hoc means of increasing the difference between scores. In other words, you work around the weighted-sum concept to try to comply with the spirit of the rules without actually breaking the rules. But you still end up with a method in which you’re not terribly confident.
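The compression is easy to see with numbers. In the hypothetical sketch below (product weights are invented), a roughly ten percent spread in weight collapses into a narrow band of scores under the percentage-delta rule.

```python
# Normalizing raw product weight (pounds) to a 0-100 score by percentage delta
# from the lightest product. Values are hypothetical.

product_weights = {"Vendor A": 41.0, "Vendor B": 43.5, "Vendor C": 45.2}  # pounds

lightest = min(product_weights.values())
scores = {vendor: 100 - 100 * (w - lightest) / lightest
          for vendor, w in product_weights.items()}

for vendor, score in scores.items():
    print(f"{vendor}: {score:.1f}")
# A: 100.0, B: 93.9, C: 89.8 -- quite different products land in a narrow
# scoring band, tempting you to inflate the criterion's weighting to compensate.
```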

A bright young engineer named Hui then hit on a major problem of the weighted-sum scoring approach. He offered that the criteria in our lists were not truly independent; they interacted with each other. Further, he noted, it would be impossible to create a list of criteria that were truly independent. Nature, physics and engineering design just don’t work like that. On that thought, another engineer said that even if the criteria represented truly independent attributes of the vendors’ proposed systems, they might not be independent in a mental model of quality judgment. For example, there may be a logical quality composed of a nonlinear relationship between reliability, spares cost, support equipment, and maintainability. Engineering meets philosophy.

We spent lunch critiquing and philosophizing about multi-attribute decision-making. Where else is this relevant, I asked. Hui said, “Hmmm, everywhere?” “Dating!” said Eric. “Dating, or marriage?”, I asked. They agreed that while their immediate dating interests might suggest otherwise, all four were in fact interested in finding a spouse at some point. I suggested we test multi-attribute decision matrices on this particular decision. They accepted the challenge. Each agreed to make a list of past and potential future candidates to wed, without regard for the likelihood of any mutual interest the candidate might have. Each also would independently prepare a list of criteria on which they would rate the candidates. To clarify, each engineer would develop their own criteria, weightings, and scores for their own candidates only. No multi-party (participatory) decisions were involved; these involve other complex issues beyond our scope here (e.g., differing degrees of over/under-confidence in participants, doctrinal paradox, etc.). Sharing the list would be optional.

Nevertheless, on completing their criteria lists, everyone was happy to share criteria and weightings. There were quite a few non-independent attributes related to appearance, grooming and dress, even within a single engineer’s list. Likewise with intelligence. Then there was sense of humor, quirkiness, religious compatibility, moral virtues, education, type A/B personality, all the characteristics of Myers-Briggs, Eysenck, MMPI, and assorted personality tests. Each engineer rated a handful of candidates and calculated the weighted sum for each.

I asked everyone if their winning candidate matched their subjective judgment of who the winner should have been. A resounding no, across the board.

Some adherents of rigid multi-attribute decision processes address such disconnects between intuition and weighted-sum decision scores by suggesting that in this case we merely adjust the weightings. For example, MindTools suggests:

“If your intuition tells you that the top scoring option isn’t the best one, then reflect on the scores and weightings that you’ve applied. This may be a sign that certain factors are more important to you than you initially thought.”

To some, this sounds like an admission that subjective judgment is more reliable than the results of the numerical exercise. Regardless, no amount of adjusting scores and weights left the engineers confident that the method worked. No adjustment to the weight coefficients seemed to properly express tradeoffs between some of the attributes. I.e., no tweaking of the system ordered the candidates (from high to low) in a way that made sense to each evaluator. This meant the redesigned formula still wasn’t trustworthy. Again, the matter of complex interactions of non-independent criteria came up. The relative importance of attributes seems to change as one contemplates different aspects of a thing. A philosopher’s perspective would be that normative statements cannot be made descriptive by decomposition. Analytic methods don’t answer normative questions.

Interestingly, all the engineers felt that listing criteria and scoring them helped them make better judgments about the ideal spouse, but not the judgments resulting directly from the weighted-sum analysis.

Fact is, picking which supplier should get the contract and picking the best spouse candidate are normative, subjective decisions. No amount of dividing a subjective decision into components makes it objective. Nor does any amount of ranking or scoring. A quantified opinion is still an opinion. This doesn’t mean we shouldn’t use decision matrices or quantify our sentiments, but it does mean we should not hide behind such quantifications.

From the perspective of psychology, decomposing the decision into parts seems to make sense. Expert opinion is known to be sometimes marvelous, sometimes terribly flawed. Daniel Kahneman writes extensively on associative coherence, finding that our natural, untrained tendency is to reach conclusions first, and justify them second. Kahneman and Gary Klein looked in detail at expert opinions in “Conditions for Intuitive Expertise: A Failure to Disagree” (American Psychologist, 2009). They found that short-answer expert opinion can be very poor. But they found that the subjective judgments of experts forced to examine details and contemplate alternatives – particularly when they have sufficient experience to close the intuition feedback loop – are greatly improved.

Their findings seem to support the aircraft engineers’ views of the weighted-sum analysis process. Despite the risk of confusing reasons with causes, enumerating the evaluation criteria and formally assessing them aids the subjective decision process. Doing so left them more confident about their decisions, for spouse and for aircraft system, though those decisions differed from the ones produced by weighted sums. In the case of the aircraft systems, the engineers had to live with the results of the weighted-sum scoring.

I was one of the engineers who disagreed with the results of the aircraft system decisions.  The weighted-sum process awarded a very large contract to the firm whose design I judged inferior. Ten years later, service problems were severe enough that the Air Force agreed to switch to the vendor I had subjectively judged best. As for the engineer-spouse decisions, those of my old engineering team are all successful so far. It may not be a coincidence that the divorce rates of engineers are among the lowest of all professions.

——————-

Hedy Lamarr was granted a patent for spread-spectrum communication technology, paving the way for modern wireless networking.




A New Era of Risk Management?

The quality of risk management has mostly fallen for the past few decades. There are signs of change for the better.

Risk management is a broad field; many kinds of risk must be managed. Risk is usually defined in terms of probability and cost of a potential loss. Risk management, then, is the identification, assessment and prioritization of risks and the application of resources to reduce the probability and/or cost of the loss.
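In its simplest quantified form, that definition reduces each risk to an expected loss, probability times cost, and prioritization to sorting. A minimal sketch with invented numbers:

```python
# Risk prioritization by expected loss = probability * cost.
# Risk names, probabilities, and costs are invented for illustration.

risks = [
    {"name": "supplier bankruptcy", "probability": 0.05, "cost": 2_000_000},
    {"name": "data-center outage",  "probability": 0.20, "cost": 300_000},
    {"name": "key-person loss",     "probability": 0.10, "cost": 450_000},
]

for risk in risks:
    risk["expected_loss"] = risk["probability"] * risk["cost"]

# Highest expected loss first: the usual starting point for allocating resources.
for risk in sorted(risks, key=lambda r: r["expected_loss"], reverse=True):
    print(f'{risk["name"]:22s} expected loss ${risk["expected_loss"]:>10,.0f}')
```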

The earliest and most accessible example of risk management is insurance, first documented in about 1770 BC in the Code of Hammurabi (e.g., rules 23, 24, and 48). The Code addresses both risk mitigation, through threats and penalties, and minimizing loss to victims, through risk pooling and insurance payouts.

Insurance was the first example of risk management getting serious about risk assessment. Both the frequentist and quantified subjective risk measurement approaches (see recent posts on belief in probability) emerged from actuarial science developed by the insurance industry.

Risk assessment, through its close relatives, decision analysis and operations research, got another boost from World War II. Big names like Alan Turing, John Von Neumann, Ian Fleming (later James Bond author) and teams at MIT, Columbia University and Bletchley Park put quantitative risk analyses of several flavors on the map.

Today, “risk management” applies to security guard services, portfolio management, terrorism and more. Oddly, much of what is called risk management involves no risk assessment at all, and is therefore inconsistent with the above definition of risk management, paraphrased from Wikipedia.

Most risk assessment involves quantification of some sort. Actuarial science and the probabilistic risk analyses used in aircraft design are probably the “hardest” of the hard risk measurement approaches. Here, “hard” means the numbers used in the analyses come from measurements of real-world values like auto accidents, lightning strikes, cancer rates, and the historical failure rates of computer chips, valves and motors. “Softer” analyses, still mathematically rigorous, involve quantified subjective judgments in tools like Monte Carlo analyses and Bayesian belief networks. As the code breakers and submarine hunters of WWII found, trained experts using calibrated expert opinions can surprise everyone, even themselves.
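To show what the softer-but-still-rigorous category looks like in practice, here is a minimal Monte Carlo sketch. The triangular distributions stand in for elicited (ideally calibrated) expert judgments; every parameter is invented for illustration.

```python
# Monte Carlo sketch: annual loss = event count * cost per event, with both
# inputs drawn from subjective triangular distributions. Parameters are invented.

import random

def simulate_annual_loss(trials: int = 100_000) -> list[float]:
    losses = []
    for _ in range(trials):
        events = random.triangular(0, 6, 2)                       # low=0, high=6, mode=2 events/year
        cost_per_event = random.triangular(50_000, 400_000, 120_000)  # low, high, mode in dollars
        losses.append(events * cost_per_event)
    return losses

losses = sorted(simulate_annual_loss())
mean = sum(losses) / len(losses)
p95 = losses[int(0.95 * len(losses))]
print(f"Mean annual loss: ${mean:,.0f}   95th percentile: ${p95:,.0f}")
```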

A much softer, yet still quantified (barely), approach to risk management using expert opinion is the risk matrix familiar to most people: on a scale of 1 to 4, rate the following risks…, etc. It’s been shown to be truly worse than useless in many cases, for a variety of reasons by many researchers. Yet it remains the core of risk analysis in many areas of business and government, across many types of risk (reputation, credit, project, financial and safety). Finally, some of what is called risk management involves no quantification, ordering, or classifying. Call it expert intuition or qualitative audit.

These soft categories of risk management most arouse the ire of independent and small-firm risk analysts. Common criticisms by these analysts include:

1. “Risk management” has become jargonized and often involves no real risk analysis.
2. Quantification of risk in some spheres is plagued by garbage-in-garbage-out. Frequency-based models are taken as gospel, and believed merely because they look scientific (e.g., Fukushima).
3. Quantified/frequentist risk analyses are not used in cases where historical data and a sound basis for them actually exists (e.g., pharmaceutical manufacture).
4. Big consultancies used their existing relationships to sell unsound (fluff) risk methods, squeezing out analysts with sound methods (a charge leveled at Arthur Andersen, McKinsey, Bain, and KPMG).
5. Quantitative risk analyses of subjective type commonly don’t involve training or calibration of those giving expert opinions, thereby resulting in incoherent (in the Bayesian sense) belief systems.
6. Groupthink and bad management override rational input into risk assessment (subprime mortgage, space shuttle Challenger).
7. Risk management is equated with regulatory compliance (banking operations, hospital medicine, pharmaceuticals, side-effect of Sarbanes-Oxley).
8. Some professionals refuse to accept any formal approach to risk management (medical practitioners and hospitals).

While these criticisms may involve some degree of sour grapes, they have considerable merit in my view, and partially explain the decline in quality of risk management. I’ve worked in risk analysis involving uranium processing, nuclear weapons handling, commercial and military aviation, pharmaceutical manufacture, closed-circuit scuba design, and mountaineering. If the above complaints are valid in these circles – and they are –  it’s easy to believe they plague areas where softer risk methods reign.

Several books and scores of papers specifically address the problems of simple risk-score matrices, often dressed up in fancy clothes to look rigorous. The approach has been shown to have dangerous flaws by many analysts and scholars, e.g., Tony Cox, Sam Savage, Douglas Hubbard, and Laura-Diana Radu. Cox shows examples where risk matrices assign higher qualitative ratings to quantitatively smaller risks. He shows that risks with negatively correlated frequencies and severities can result in risk-matrix decisions that are worse than random decisions. Also, such methods are obviously very prone to range compression errors. Most interestingly, in my experience, the stratification (highly likely, somewhat likely, moderately likely, etc.) inherent in risk matrices assumes a common interpretation of terms across a group. Many tests (e.g., Kahneman and Tversky; Budescu, Broomell, and Por) show that large differences in the way people understand such phrases dramatically affect their judgments of risk. Thus risk matrices create the illusion of communication and agreement where neither is present.
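Cox-style ranking inversions are easy to construct. In the hypothetical sketch below (my bins and numbers, not Cox’s), a coarse likelihood-times-severity matrix ranks Risk A above Risk B even though B’s expected loss is several times larger.

```python
# A coarse likelihood x severity matrix can rank a quantitatively smaller risk
# above a larger one. Bins and risk values are invented for illustration.

def bin_likelihood(p: float) -> int:
    """Map annual probability to a 1-4 likelihood category."""
    return 1 if p < 0.01 else 2 if p < 0.1 else 3 if p < 0.5 else 4

def bin_severity(cost: float) -> int:
    """Map loss in dollars to a 1-4 severity category."""
    return 1 if cost < 1e4 else 2 if cost < 1e5 else 3 if cost < 1e6 else 4

risks = {"A": (0.12, 110_000), "B": (0.09, 900_000)}  # (annual probability, cost)

for name, (p, cost) in risks.items():
    matrix_score = bin_likelihood(p) * bin_severity(cost)
    expected_loss = p * cost
    print(f"Risk {name}: matrix score {matrix_score}, expected loss ${expected_loss:,.0f}")
# Risk A: matrix score 9, expected loss $13,200
# Risk B: matrix score 6, expected loss $81,000  -- the matrix ranks them backwards
```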

Nevertheless, the risk matrix has been institutionalized. It is embraced by government (MIL-STD-882), standards bodies (ISO 31000), and professional societies (Project Management Institute (PMI), ISACA/COBIT). Hubbard’s opponents argue that if risk matrices are so bad, why do so many people use them – an odd argument, to say the least. ISO 31000, in my view, isn’t a complete write-off. In places, it rationally addresses risk as something that can be managed through reduction of likelihood, reduction of consequences, risk sharing, and risk transfer. But elsewhere it redefines risk as mere uncertainty, thereby reintroducing the positive/negative risk mess created by economist Frank Knight a century ago. Worse, from my perspective, like the guidelines of PMI and ISACA, it gives credence to structure in the guise of knowledge and to process posing as strategy. In short, it sets up a lot of wickets which, once navigated, give a sense that risk has been managed when in fact it may have been merely discussed.

A small benefit of the subprime mortgage meltdown of 2008 was that it became obvious that the financial risk management revolution of the 1990s was a farce, exposing a need for deep structural changes. I don’t follow financial risk analysis closely enough to know whether that’s happened. But the negative example made public by the housing collapse has created enough anxiety in other disciplines to cause some welcome reappraisals.

There is surprising and welcome activity in nuclear energy. Several organizations involved in nuclear power generation have acknowledged that we’ve lost competency in this area, and have recently identified paths to address the challenges. The Nuclear Energy Institute recently noted that while Fukushima is seen as evidence that probabilistic risk analysis (PRA) doesn’t work, if Japan had actually embraced PRA, the high risk of tsunami-induced disaster would have been immediately apparent. Late last year the Nuclear Energy Institute submitted two drafts to the U.S. Nuclear Regulatory Commission addressing lost ground in PRA and identifying a substantive path forward: Reclaiming the Promise of Risk-Informed Decision-Making and Restoring Risk-Informed Regulation. These documents acknowledge that the promise of PRA has been stunted by distrust of the method, focus on compliance instead of science, external audits by unqualified teams, and the above-mentioned Fukushima fallacy.

Likewise, the FDA, often criticized for over-regulating and over-reach – confusing efficacy with safety – has shown improvement in recent years. It has revised its decades-old process validation guidance to focus more on verification, scientific evidence and risk analysis tools rather than validation and documentation. The FDA’s ICH Q9 (Quality Risk Management) guidelines discuss risk, risk analysis and risk management in terms familiar to practitioners of “hard” risk analysis, even covering fault tree analysis (the “hardest” form of PRA) in some detail. The ASTM E2500 standard moves these concepts further forward. Similarly, the FDA’s recent guidelines on mobile health devices seem to accept that the FDA’s reach should not exceed its grasp in the domain of smart phones loaded with health apps. Reading between the lines, I take it that after years of fostering the notion that risk management equals regulatory compliance, the FDA realized that it must push drug safety far down into the ranks of the drug makers in the same way the FAA did with aircraft makers (with obvious success) in the late 1960s. Fostering a culture of safety rather than one of compliance distributes the work of providing safety and reduces the need for regulators to anticipate every possible failure of every step of every process in every drug firm.

This is real progress. There may yet be hope for financial risk management.



Common-Mode Failure Driven Home

In a recent post I mentioned that probabilistic failure models are highly vulnerable to wrong assumptions of independence of failures, especially in redundant system designs. Common-mode failures in multiple channels defeat the purpose of redundancy in fault-tolerant designs. Likewise, if probability of non-function is modeled (roughly) as the historical rate of a specific component failure times the length of time we’re exposed to the failure, we need to establish that exposure time with great care. If only one channel is in control at a time, failure of the other channel can go undetected. Monitoring systems can detect such latent failures. But then failures of the monitoring system tend to be latent.

For example, your car’s dashboard has an engine oil warning light. That light ties to a monitor that detects oil leaks from worn gaskets or loose connections before the oil level drops enough to cause engine damage. Without that dashboard warning light, the exposure time to an undetected slow leak is months – the time between oil changes. The oil warning light alerts you to the condition, giving you time to deal with it before your engine seizes.

But what if the light is burned out? This failure mode is why the warning lights flash on for a short time when you start your car. In theory, you’d notice a burnt-out warning light during the startup monitor test. If you don’t notice it, the exposure time for an oil leak becomes the exposure time for failure of the warning light. Assuming you change your engine oil every 9 months, loss of the monitor potentially increases the exposure time from minutes to months, multiplying the probability of an engine problem by several orders of magnitude. Aircraft and nuclear reactors contain many such monitoring systems. They need periodic maintenance to ensure they’re able to detect failures. The monitoring systems rarely show problems in the check-ups; and this fact often lures operations managers, perceiving that inspections aren’t productive, into increasing maintenance intervals. Oops. Those maintenance intervals were actually part of the system design, derived from some quantified level of acceptable risk.
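A rough sketch of the arithmetic behind that claim, using the usual approximation that the probability of an undetected leak over an exposure interval T at constant rate lambda is 1 - exp(-lambda * T). The leak rate and driving hours are invented numbers; the point is the ratio of exposure times.

```python
# Effect of exposure time on the probability of an undetected oil leak.
# Probability over exposure T at constant rate lam: 1 - exp(-lam * T).
# The leak rate and utilization are assumed; only the ratio matters here.

import math

def prob_of_failure(rate_per_hour: float, exposure_hours: float) -> float:
    return 1 - math.exp(-rate_per_hour * exposure_hours)

leak_rate = 1e-6                   # leaks per driving hour (assumed)
detected_exposure = 0.25           # ~15 minutes: light comes on, you pull over
latent_exposure = 9 * 30 * 1.5     # ~9 months at an assumed 1.5 driving hours/day

p_detected = prob_of_failure(leak_rate, detected_exposure)
p_latent = prob_of_failure(leak_rate, latent_exposure)
print(f"With working light:   {p_detected:.2e}")
print(f"With burnt-out light: {p_latent:.2e}  ({p_latent / p_detected:,.0f}x higher)")
```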

Common-mode failures get a lot of press when they’re dramatic. They’re often used by risk managers as evidence that quantitative risk analysis of all types doesn’t work. Fukushima is the current poster child of bad quantitative risk analysis. Despite everyone’s agreement that any frequencies or probabilities used in Fukushima analyses prior to the tsunami were complete garbage, the result for many was to conclude that probability theory failed us. Opponents of risk analysis also regularly cite the Tacoma Narrows Bridge collapse, the Chicago DC-10 engine-loss disaster, and the Mount Osutaka 747 crash as examples. But none of the affected systems in these disasters had been justified by probabilistic risk modeling. Finally, common-mode failure is often cited in cases where it isn’t the whole story, as with the Sioux City DC-10 crash. More on Sioux City later.

On the lighter side, I’d like to relate two incidents – one personal experience, one from a neighbor – that exemplify common-mode failure and erroneous assumptions of exposure time in everyday life, to drive the point home with no mathematical rigor.

I often ride my bicycle through affluent Marin County. Last year I stopped at the Molly Stone grocery in Sausalito, a popular biker stop, to grab some junk food. I locked my bike to the bike rack, entered the store, grabbed a bag of chips and checked out through the fast lane with no waiting. Ninety seconds at most. I emerged to find no bike, no lock and no thief.

I suspect that, as a risk man, I unconsciously model all risk as the combination of some numerical rate (occurrence per hour) times some exposure time. In this mental model, the exposure time to bike theft was 90 seconds. I likely judged the rate to be more than zero but still pretty low, given broad daylight, the busy location with lots of witnesses, and the affluent community. Not that I built such a mental model explicitly of course, but I must have used some unconscious process of that sort. Thinking like a crook would have served me better.

If you were planning to steal an expensive bike, where would you go to do it? Probably a place with a lot of expensive bikes. You might go there and sit in your pickup truck with a friend waiting for a good opportunity. You’d bring a 3-foot long set of chain link cutters to make quick work of the 10 mm diameter stem of a bike lock. Your friend might follow the victim into the store to ensure you were done cutting the lock and throwing the bike into the bed of your pickup to speed away before the victim bought his snacks.

After the fact, I had much different thoughts about this specific failure rate. More important, what is the exposure time when the thief is already there waiting for me, or when I’m being stalked?

My neighbor just experienced a nerve-racking common-mode failure. He lives in a San Francisco high-rise and drives a Range Rover. His wife drives a Mercedes. He takes the Range Rover to work, using the same valet parking-lot service every day. He’s known the attendant for years. He takes his house key from the ring of vehicle keys, leaving the rest on the visor for the attendant. He waves to the attendant as he leaves the lot on his way to the office.

One day last year he erred in thinking the attendant had seen him. Someone else, now quite familiar with his arrival time and habits, got to his Range Rover while the attendant was moving another car. The thief drove out of the lot without the attendant noticing. Neither my neighbor nor the attendant had reason for concern. This gave the enterprising thief plenty of time. He explored the glove box, finding the registration, which includes my neighbor’s address. He also noticed the electronic keys for the Mercedes.

The thief enlisted a trusted colleague, and drove the stolen car to my neighbor’s home, where they used the electronic garage entry key tucked neatly into its slot in the visor to open the gate. They methodically spiraled through the garage, periodically clicking the button on the Mercedes key. Eventually they saw the car lights flash and they split up, each driving one vehicle out of the garage using the provided electronic key fobs. My neighbor lost two cars through common-mode failures. Fortunately, the whole thing was on tape and the lawmen were effective; no vehicle damage.

Should I hide my vehicle registration, or move to Michigan?

—————–

In theory, there’s no difference between theory and practice. In practice, there is.
