Archive for category Management Science

Use and Abuse of Failure Mode & Effects Analysis in Business

On investigating about 80 deaths associated with the drug heparin in 2008, the FDA found that over-sulfated chondroitin sulfate, a toxic adulterant, had been intentionally substituted for a legitimate ingredient for economic reasons. That is, an unscrupulous supplier sold a counterfeit chemical costing 1% as much as the real thing, and it killed people.

This wasn’t unprecedented. Gentamicin, in the late 1980s, was a similar case. Likewise Cefaclor in 1996, and again with diethylene glycol sold as glycerin in 2006.

Adulteration is an obvious failure mode of supply chains and operations for drug makers. Drug firms buying adulterated raw material had presumably conducted failure mode and effects analyses (FMEAs) at several levels. An early-stage FMEA should have seen the failure mode and assessed its effects, thereby triggering the creation of controls to prevent the process failure. So what went wrong?

The FDA’s reports on the heparin incident didn’t make public any analyses done by the drug makers. But based on the “best practices” specified by standards bodies, consulting firms, and many risk managers, we can make a good guess. Their risk assessments were likely misguided, poorly executed, gutless, and ineffective.

[Image: Abuse of FMEA. Photo by Bill Storage]

Promoters of FMEAs as a means of risk analysis often cite aerospace as a guiding light in matters of risk. Commercial aviation should be the exemplar of risk management. In no other endeavor has mankind made such an inherently dangerous activity so safe as commercial jet flight.

While those in pharmaceutical risk and compliance extol aviation, they mostly stray far from its methods, mindset, and values. This is certainly the case with the FMEA, a tool poorly understood, misapplied, badly executed, and then blamed for failing to prevent catastrophe.

In the case of heparin, a properly performed FMEA exercise would certainly have identified the failure mode. But FMEA wasn’t even the right tool for identifying that hazard in the first place. A functional hazard analysis (FHA) or business impact analysis (BIA) would have highlighted chemical contamination leading to death of patients, supply disruption, and reputation damage as a top hazard in minutes. I know this for a fact, because I use drug manufacture as an example when teaching classes on FHA. First-day students identify that hazard without being coached.

FHAs can be done very early in the conceptual phase of a project or system design. They need no implementation details. They’re short and sweet, and they yield concerns to address with high priority. Early writers on the topic of FMEA explicitly identified it as something like the opposite of an FHA, the former being “bottom-up,” the latter “top-down.” NASA’s response to the USGS on the suitability of FMEAs for their needs, for example, stressed this point. FMEAs rely strongly on implementation details. Once there is an actual device or process design, they produce a lot of essential but lower-value content (essential because FMEAs help confirm which failure modes can be de-prioritized).
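To make that concrete, here’s a minimal sketch of what early FHA output looks like, using the drug-manufacture example and hypothetical entries of my own construction – functions, hazards, effects, and a severity class, with no implementation details required:

```python
# Minimal, hypothetical FHA-style rows for drug manufacture:
# (function, hazard, effect, severity class). No implementation details needed.
functional_hazards = [
    ("supply raw material", "chemical contamination of an ingredient",
     "patient deaths, recall, reputation damage", "catastrophic"),
    ("supply raw material", "supply disruption",
     "production halt, drug shortage", "critical"),
    ("label product", "wrong dosage on label",
     "patient overdose", "catastrophic"),
]

# Early triage: every catastrophic hazard gets a control or design response.
for function, hazard, effect, severity in functional_hazards:
    if severity == "catastrophic":
        print(f"High priority: {hazard} ({function}) -> {effect}")
```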

So a failure mode of risk management is using FMEAs for purposes other than those for which they were designed. Equating FMEA with risk analysis and risk management is a gross failure mode of management.

If industry somehow stops misusing FMEAs, it then faces the hurdle of doing them well. This is a challenge, as the quality of training, guidance, and facilitation of FMEAs has degraded badly over the past twenty years.

FMEAs, as promoted by the Project Management Institute, ISO 31000, and APM PRAM, to name a few, bear little resemblance to those in aviation. I know this from three decades of risk work in diverse industries, half of it in aerospace. You can see the differences by studying sample FMEAs on the web.

It’s anyone’s guess how FMEAs went so far astray. Some blame the explosion of enterprise risk management (ERM) suppliers in the 1990s. ERM, though partly rooted in the sound discipline of actuarial science, generally lacks rigor. It was up-sold by consultancies to their existing corporate clients, who assumed those consultancies actually had a background in risk science, which they did not. Studies a decade later by Protiviti and the EIU failed to show any impact on profit or other benefit of ERM initiatives, apart from positive self-assessments by executives of the firms.

But bad FMEAs predated the ERM era. Adopted by the US automotive industry in the 1970s, sloppy FMEAs justified optimistic warranty-claims estimates for accounting purposes. While Toyota was implementing statistical process control to precisely predict the warranty cost of adverse tolerance accumulation, Detroit was pretending that multiplying ordinal scales of probability, severity, and detectability was mathematically or scientifically valid.

Citing an inability to quantify failure rates of basic components and assemblies (an odd claim given the abundance of warranty and repair data), auto firms began to assign scores or ranks to failure modes rather than giving probability values between zero and one. This practice first appears in automotive conference proceedings around 1971. Lacking hard failure rates – if in fact they did – reliability workers could have estimated numeric probability values based on subjective experience or derived them from the reliability handbooks then available. Instead they began to assign ranks or scores on a 1-to-10 scale.

In principle there is no difference between guessing a probability of 0.001 (a numerical probability value) and guessing a value of “1” on a 10 scale (either an ordinal number or a probability value mapped to a limited-range score). But in practice there is a big difference.

One difference is that people estimating probability scores in facilitated FMEA sessions usually use grossly different mental mapping processes to get from labels like “extremely likely” or “moderately unlikely” to numerical probabilities. A physicist takes “likely,” for a failure mode, to mean more often than once per million; a drug trial manager interprets it to mean more than 5%. Neither is wrong; but if those two specialists aren’t alert to the difference, then when they each judge a failure likely, there will be a dangerous illusion of communication and agreement where none exists.
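A toy illustration, using hypothetical mappings for those two specialists, shows the scale of the disagreement hiding behind the shared label:

```python
# Hypothetical label-to-probability mappings for two disciplines.
physicist = {"likely": 1e-6}      # "likely": more often than once per million
trial_manager = {"likely": 0.05}  # "likely": more than 5% of patients

ratio = trial_manager["likely"] / physicist["likely"]
print(f'Both say "likely"; their implied probabilities differ by a factor of {ratio:,.0f}')
```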

Further, FMEA participants don’t agree – and often don’t know they don’t agree – on the mapping of their probability estimates into 1-10 scores.

The resultant probability scores or ranks (as opposed to probability values between zero and one) are used to generate Risk Priority Numbers (RPNs), which first appeared in the American automotive industry. You won’t find RPNs or anything like them in aviation FMEAs, or even in the modern automotive industry. Detroit abandoned them long ago.

RPNs are defined as the arithmetic product of a probability score, a severity score, and a detection (more precisely, the inverse of detectability) score. The explicit thinking here is that risks can be prioritized on the basis of the product of three numbers, each ranging from 1 to 10.

An implicit – but critical, and never addressed by users of RPN – assumption here is that engineers, businesses, regulators and consumers are risk-neutral. Risk neutrality, as conceived in portfolio choice theory, would in this context mean that everyone would be indifferent between two risks of the same RPN, even ones comprising very different probability and severity values. That is, an RPN formed from the scores {2,8,4} would dictate the same risk response as failure modes with RPN scores {8,4,2} and {4,4,4}, since the RPN values (products of the scores) are equal. In the real world this is never true. It is usually very far from true. Most of us are not risk-neutral; we’re risk-averse. That changes things. As a trivial example, banks might have valid reasons for caring more about a single $100M loss than one hundred $1M losses.
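A short sketch makes the point. The score-to-value mappings below are hypothetical (though typical published scoring tables look similar); what matters is that the three score sets above share an RPN while their expected losses differ wildly:

```python
# Hypothetical score-to-value mappings (real FMEA guides publish similar tables).
prob_map = {2: 1e-6, 4: 1e-4, 8: 1e-2}   # probability score -> probability
cost_map = {2: 1e3, 4: 1e5, 8: 1e7}      # severity score -> loss in dollars

modes = {
    "A": (2, 8, 4),  # (probability, severity, detection) scores
    "B": (8, 4, 2),
    "C": (4, 4, 4),
}

for name, (p, s, d) in modes.items():
    rpn = p * s * d                            # classic RPN: product of the scores
    expected_loss = prob_map[p] * cost_map[s]  # detection set aside; it isn't a probability
    print(f"{name}: RPN = {rpn}, expected loss = ${expected_loss:,.2f}")

# All three RPNs equal 64, yet the expected losses span two orders of
# magnitude. Equal RPN does not mean equal risk, even before risk aversion.
```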

Beyond the implicit assumption of risk-neutrality, RPN has other problems. As mentioned above, both cognitive and group-dynamics problems arise when FMEA teams attempt to model probabilities as ranks or scores. Similar difficulties arise with scoring the cost of a loss, i.e., the severity component of RPN. Again there is the question of why, if you know the cost of a failure (in dollars, lives lost, or patients not cured), you would convert a valid measurement into a subjective score (granting, for the sake of argument, that risk-neutrality is justified). Again the answer is to enter that score into the RPN calculation.

Still more problematic is the detectability value used in RPNs. In a non-trivial system or process, detectability and probability are not independent variables. And there is vagueness around the meaning of detectability. Is it the means by which you know the failure mode has happened, after the fact? Or is there an indication that the failure is about to happen, such that something can be observed, thereby preventing the failure? If the former, detection is irrelevant to the risk of failure; if the latter, the detection should be operationalized in the model of the system. That is, if a monitor (e.g., a brake fluid level check) is in a system, the monitor is a component with its own failure modes and exposure times, which impact its probability of failure. This is how aviation risk analysis models such things. But not the Project Management Institute.
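Here’s a minimal sketch of that aviation-style treatment, with assumed numbers: the monitor is modeled as a component with its own failure rate and exposure time, and the quantity of interest is the probability of an undetected failure – no 1-to-10 “detectability” score anywhere:

```python
import math

# All numbers are assumed, for illustration only.
p_primary = 1e-4             # probability of the monitored failure per hour
monitor_failure_rate = 1e-5  # the monitor's own failure rate per hour
exposure_hours = 500         # interval over which a monitor fault can sit latent

# Probability the monitor is in a failed (latent) state when needed,
# assuming an exponential failure model over the exposure interval.
p_monitor_down = 1 - math.exp(-monitor_failure_rate * exposure_hours)

# The unannunciated hazard requires the primary failure AND a dead monitor
# (independence assumed), so no separate "detectability" score is needed.
p_undetected = p_primary * p_monitor_down
print(f"P(monitor latently failed) = {p_monitor_down:.4f}")
print(f"P(undetected primary failure per hour) = {p_undetected:.2e}")
```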

A simple summary of the problems with scoring, ranking and RPN is that adding ambiguity to a calculation necessarily reduces precision.

I’ve identified several major differences between the approach to FMEAs used in aviation and that of those who claim to be behaving like aerospace. They are not. Aviation risk analysis has reduced risk by a factor of roughly a thousand, based on fatal accident rates since aviation risk methods were developed. I don’t think the PMI can see similar results from its adherents.

A partial summary of failure modes of common FMEA processes includes the following, based on the above:

  • Equating FMEA with risk assessment
  • Confusing FMEA with Hazard Analysis
  • Viewing the FMEA as a Quality (QC) function
  • Insufficient rigor in establishing probability and severity values
  • Unwarranted (and implicit) assumption of risk-neutrality
  • Unsound quantification of risk (RPN)
  • Confusion about the role of detection

The corrective action for most of these should be obvious, including operationalizing a system’s detection methods, using numeric (non-ordinal) probability and cost values (even if estimated) instead of masking ignorance and uncertainty with ranking and scoring, and steering clear of Risk Priority Numbers and the Project Management Institute.
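As a sketch of the second of those corrective actions, with assumed probabilities and costs: prioritize failure modes by expected loss rather than by products of ordinal scores:

```python
# Assumed probabilities and costs, for illustration only.
failure_modes = [
    # (failure mode, probability per year, cost of loss in dollars)
    ("contaminated raw material", 1e-3, 5e8),
    ("labeling error",            1e-2, 1e6),
    ("packaging defect",          5e-2, 1e4),
]

# Rank by expected annual loss: probability times cost, no ordinal scores,
# no information thrown away. Risk aversion can then be layered on top.
for name, p, cost in sorted(failure_modes, key=lambda m: m[1] * m[2], reverse=True):
    print(f"{name}: expected annual loss = ${p * cost:,.0f}")
```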


Which Is To Be Master? – Humpty Dumpty’s Research Agenda

Should economics, sociology or management count as science?

2500 years ago, Plato, in The Sophist, described a battle between the gods and the earth giants. The fight was over the foundations of knowledge. The gods thought knowledge came from innate concepts and deductive reasoning only. Euclid’s geometry was a perfect example – self-evident axioms plus deduced theorems. In this model, no experiments are needed. Plato explained that the earth giants, however, sought knowledge through earthly experience. Plato sided with the gods; and his opponents, the Sophists, sided with the giants. Roughly speaking, this battle corresponds to the modern tension between rationalism (the gods) and empiricism (the giants). For the gods, the articles of knowledge must be timeless, universal and certain. For the giants, knowledge is contingent, experiential, and merely probable.


Plato’s approach led the Greeks – Aristotle, most notably – to hold that rocks fall with speeds proportional to their weights, a belief that persisted for 2000 years until Galileo and his insolent ilk had the gall to test it. Science was born.

Enlightenment era physics aside, Plato and the gods are alive and well. Scientists and social reformers of the Enlightenment tried to secularize knowledge. They held that common folk could overturn beliefs with the right evidence. Empirical evidence, in their view, could trump any theory or authority. Math was good for deduction; but what’s good for math is not good for physics, government, and business management.

Euclidean geometry was still regarded as true – a perfect example of knowledge fit for the gods –  throughout the Enlightenment era. But cracks began to emerge in the 1800s through the work of mathematicians like Lobachevsky and Riemann. By considering alternatives to Euclid’s 5th postulate, which never quite seemed to fit with the rest, they invented other valid (internally consistent) geometries, incompatible with Euclid’s. On the surface, Euclid’s geometry seemed correct, by being consistent with our experience. I.e., angle sums of triangles seem to equal 180 degrees. But geometry, being pure and of the gods, should not need validation by experience, nor should it be capable of such validation.

Non-Euclidean Geometry rocked Victorian society and entered the domain of philosophers, just as Special Relativity later did. Hotly debated, its impact on the teaching of geometry became the subject of an entire book by conservative mathematician and logician Charles Dodgson. Before writing that book, Dodgson published a more famous one, Alice in Wonderland.

The mathematical and philosophical content of Alice has been analyzed at length. Alice’s dialogue with Humpty Dumpty is a staple of semantics and semiotics, particularly Humpty’s use of stipulative definition. Humpty first reasons that “unbirthdays” are better than birthdays, there being so many more of them, and then proclaims glory. Picking up that dialogue, Humpty announces,

‘And only one [day of the year] for birthday presents, you know. There’s glory for you!’

‘I don’t know what you mean by “glory”,’ Alice said.

Humpty Dumpty smiled contemptuously. ‘Of course you don’t — till I tell you. I meant “there’s a nice knock-down argument for you!”‘

‘But “glory” doesn’t mean “a nice knock-down argument”,’ Alice objected.

‘When I use a word,’ Humpty Dumpty said, in rather a scornful tone, ‘it means just what I choose it to mean — neither more nor less.’

‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’

‘The question is,’ said Humpty Dumpty, ‘which is to be master — that’s all.’

Humpty is right that one can redefine terms at will, provided a definition is given. But the exchange hints at a deeper notion. While having a private language is possible, it is also futile, if the purpose of language is communication.

Another aspect of this exchange gets little coverage by analysts. Dodgson has Humpty emphasize the concept of argument (knock-down), nudging us in the direction of formal logic. Humpty is surely a stand-in for the proponents of non-Euclidean geometry, against whom Dodgson is strongly (though wrongly – more below) opposed. Dodgson was also versed in Greek philosophy and Platonic idealism. Humpty is firmly aligned with Plato and the gods. Alice sides with Plato’s earth giants, the sophists. Humpty’s question, which is to be master?, points strongly at the battle between the gods and the giants. Was this Dodgson’s main intent?

When Alice first chases the rabbit down the hole, she says that she fell for a long time, and reasons that the hole must be either very deep or that she fell very slowly. Dodgson, schooled in Newtonian mechanics, knew, unlike the ancient Greeks, that all objects fall at the same speed. So the possibility that Alice fell slowly suggests that even the laws of nature are up for grabs. In science, we accept that new evidence might reverse what we think are the laws of nature, yielding a scientific revolution (paradigm shift).

In trying to vindicate “Euclid’s masterpiece,” as Dodgson called it, he is trying to free himself from an unpleasant logical truth: within the realm of math, we have no basis to think the world is Euclidean rather than Lobachevskian. He’s trying to rescue conservative mathematics (Euclidean geometry) by empirical means. Logicians would say Dodgson is confusing a synthetic, a posteriori proposition with one that is analytic and a priori. That is, justification of the 5th postulate can’t rely on human experience, observations, or measurements. Math and reasoning feed science; but science can’t help math at all. Dodgson should have known better. In the battle between the gods and the earth giants, experience can only aid the giants, not the gods. As historian of science Steven Goldman put it, “the connection between the products of deductive reasoning and reality is not a logical connection.” If mathematical claims could be validated empirically, they wouldn’t be timeless, universal and certain.

While Dodgson was treating math as a science, some sciences today have the opposite problem. They side with Plato. This may be true even in physics. String theory, by some accounts, has hijacked academic physics, especially its funding. Wolfgang Lerche of CERN called string theory the Stanford propaganda machine working at its fullest. String theory at present isn’t testable. But its explanatory power is huge; and some think physicists pursue it with good reason. It satisfies at least one of the criteria Richard Dawid lists as reasons scientists follow unfalsifiable theories:

  1. the theory is the only game in town; there are no other viable options
  2. the theoretical research program has produced successes in the past
  3. the theory turns out to have even more explanatory power than originally thought

Dawid’s criteria may not apply to the social and dismal sciences. Far from the only game in town, too many theories – as untestable as strings, all plausible but mutually incompatible – vie for our Nobel honors.

Privileging innate knowledge and reason – as Plato did – requires denying natural human skepticism. Believing that intuition alone is axiomatic for some types of knowledge of the world requires suppressing skepticism about theorems built on those axioms. Philosophers call this epistemic foundationalism. A behavioral economist might see it as confirmation bias and denialism.

Physicists accuse social scientists of continually modifying their theories to accommodate falsifying evidence while still clinging to a central belief or interpretation. Such rescues recall the Marxists’ fancy footwork to rationalize their revolution not first occurring in a developed country, as had been predicted. A harsher criticism is that social sciences design theories from the outset to be explanatory but not testable. In the 1970s, Clifford D. Shearing facetiously wrote in The American Sociologist that “a cursory glance at the development of sociological theory should suggest… that any theorist who seeks sociological fame must insure that his theories are essentially untestable.”

The Antipositivist school is serious about the issue Shearing joked about. Jurgen Habermas argues that sociology cannot explain by appeal to natural law. Deirdre (Donald) McCloskey mocked the empiricist leanings of Milton Friedman as being invalid in principle. Presumably, antipositivists are content that theories only explain, not predict.

In business management, the co-occurrence of the terms theory and practice, and the usage of the phrase “theory and practice” as opposed to “theory and evidence” or “theory and testing,” suggests that Plato reigns in management science. “Practice” seems to mean interacting with the world under the assumption that the theory is true.

The theory and practice model is missing the notion of testing those beliefs against the world or, more importantly, seeking cases in the world that conflict with the theory. Further, it has no notion of theory selection; theories do not compete for success.

Can a research agenda with no concept of theory testing, falsification effort, or theory competition and theory choice be scientific? If so, it seems creationism and astrology should be called science. Several courts (e.g., McLean v. Arkansas) have ruled against creationism on the grounds that its research program fails to reference natural law, is untestable by evidence, and is certain rather than tentative. Creationism isn’t concerned with details. Intelligent Design (old-earth creationism), for example, is far more concerned with showing Darwinism wrong than with establishing an age of the earth. There is no scholarly debate between old-earth and young-earth creationism on specifics.

Critics say the fields of economics and business management are likewise free of scholarly debate. They seem to have similarly thin research agendas. Competition between theories in these fields is lacking; incompatible management theories coexist without challenge. Many theorist-practitioners seem happy to give their model priority over reality.

Dodgson appears also to have been wise to the problem of a model having priority over the thing it models – believing the model is more real than the world. In Sylvie and Bruno Concluded, he has Mein Herr brag about his country’s map-making progress. They advanced their mapping skill from rendering at 6 inches per mile to 6 yards per mile, and then to 100 yards per mile. Ultimately, they built a map with scale 1:1. The farmers protested its use, saying it would cover the country and shut out the light. Finally, forgetting what models what, Mein Herr explains, “so we now use the country itself, as its own map, and I assure you it does nearly as well.”

Humpty Dumpty had bold theories that he furiously proselytized. Happy to construct his own logical framework and dwell therein, free from empirical testing, his research agenda was as thin as his skin. Perhaps a Nobel Prize and a high post in a management consultancy are in order. Empiricism be damned, there’s glory for you.

 

There appears to be a sort of war of Giants and Gods going on amongst them; they are fighting with one another about the nature of essence…

Some of them are dragging down all things from heaven and from the unseen to earth, and they literally grasp in their hands rocks and oaks; of these they lay hold, and obstinately maintain, that the things only which can be touched or handled have being or essence…

And that is the reason why their opponents cautiously defend themselves from above, out of an unseen world, mightily contending that true essence consists of certain intelligible and incorporeal ideas…  –  Plato, Sophist

An untestable theory cannot be improved upon by experience. – David Deutsch

An economist is an expert who will know tomorrow why the things he predicted yesterday didn’t happen. – Earl Wilson

 

 


Frederick Taylor Must Die

If management thinker Frederick Winslow Taylor (died 1915) were alive today he would certainly resent the straw man we have stood in his place. Taylor tried to inject science into the discipline of management. Innocent of much of the dehumanization of workers pinned on him, Taylor still failed in several big ways, even by the standards of his own time. For example, he failed at science.

What Taylor called science was mostly mere measurement – no explanatory or predictive theories. And he certainly didn’t welcome criticism or court refutation. Not only did he turn workers into machines, he turned managers into machines that did little more than take measurements. And as Paul Zak notes in Trust Factor, Taylor failed to recognize that organizations are people embedded in a culture.

Taylor is long dead, but Taylorism is alive and well. Before I left Goodyear Aerospace in the late 1980s, I recall the head of Human Resources at a State of the Company address reporting trends in terms of “personnel units.” Did these units include androids and work animals, I wondered.

Heavy-handed management can turn any of Douglas McGregor’s Theory Y workers (internally motivated) into Theory X workers (lazy, needing to be prodded with extrinsic rewards) using tried-and-true industrial-era management methodologies. That is, one can turn TPS, the Toyota Production System, originally aimed at developing people, into just another demoralizing bureaucratic procedure wearing lipstick.

In Silicon Valley, software creation is modeled as a manufacturing process. Scrum team members often have no authority for schedule, backlog, communications or anything else; and teams “do agile” with none of the self-direction, direct communications, or other principles laid out in the agile manifesto. Yet sprint velocity is computed to three decimal places by steady Taylorist hands. Across the country, micromanagement and Taylorism are two sides of the same coin, committed to eliminating employees’ control over their own futures and any sense of ownership in their work product. As Daniel Pink says in Drive, we are meant to be autonomous individuals, not individual automatons. This is particularly true for developers, who are inherently self-directed and intrinsically motivated. Scrum is allegedly based on Theory Y, but like Matrix Management a generation earlier, too many cases of Scrum are Theory X at core with a veneer of Theory Y.

Management is utterly broken, especially at the lowest levels. It is shaped to fill two forgotten needs – the deskilling of labor, and communication within fragmented networks.

Henry Ford is quoted as saying, “Why is it every time I ask for a pair of hands, they come with a brain attached?” Likely a misattribution derived from Wedgwood (below), the quote reflects generations of self-destructive management sentiment. The intentional de-skilling of the workforce accompanied industrialization in 18th century England. Division of labor yielded efficient operations on a large scale; and it reduced the risk of unwanted knowledge transfer.

When pottery maker Josiah Wedgwood built his factory, he not only provided for segmentation of work by tool and process type. He also built separate entries to each factory segment, with walls to restrict communications between workers having different skills and knowledge. Wedgwood didn’t think his workers were brain-dead hands; but he would have preferred that they were.

He worried that he might be empowering potential competitors. He was concerned that workers possessed drive and an innovative spirit, not that they lacked these qualities. Wedgwood pioneered intensive division of labor, isolating mixing, firing, painting and glazing. He ditched the apprentice-journeyman-master system for fear of spawning a rival, as actually became the case with employee John Voyez. Wedgwood wanted hands – skilled hands – without brains. “We have stepped beyond the other manufactur[er]s and we must be content to train up hands to suit our purpose” (Wedgwood to Bentley, Sep 7, 1769).

When textile magnate Francis Lowell built factories including dormitories, chaperones, and access to culture and education, he was trying to compensate for the drudgery of long hours of repetitive work and low wages. When Lowell cut wages the young female workers went on strike, published magazines critical of Lowell (“… just as though we were so many living machines” – Ellen Collins, Lowell Offering, 1845) and petitioned Massachusetts for legislation to limit work hours. Lowell wanted hands but got brains, drive, and ingenuity.

To respond to market dynamics and fluctuations in demand for product and in supply of raw materials, a business must have efficient and reliable communication channels. Commercial telephone networks only began to emerge in the late 1800s. Long distance calling was a luxury well into the 20th century. When the Swift Meat Packing Company pioneered the vertically integrated production system in the late 1800s, G.F. Swift faced the then-unique challenge of needing to coordinate sales, supply chain, marketing, and operations people from coast to coast. He set up central administration and a hierarchical, military-style organizational structure for the same reason Julius Caesar’s army used that structure – to quickly move timely knowledge and instructions up, down, and laterally.

So our management hierarchies address a long-extinct communication need, and our command-and-control management methods reflect an industrial-age wish for mindless carrot-and-stick employees – a model the industrialists themselves knew to be inaccurate. But we’ve made this wish come true; treat people badly long enough and they’ll conform to your Theory X expectations. Business schools tout best-practice management theories that have never been subjected to testing or disconfirmation. In their view, it is theory, and therefore it is science.

Much of modern management theory pretends that today’s knowledge workers are “so many living machines,” human resources, human capital, assets, and personnel units.

Unlike in the industrial era, modern business has no reason to de-skill its labor, blue-collar or white. Yet in many ways McKinsey and other management consultancies like it seem dedicated to propping up and fine-tuning Theory X, as evidenced by the priority of structure in the 7S, Weisbord, and Galbraith organizational models, for example.

This is an agency problem with a trillion-dollar price tag. When asked which they would prefer, a company of self-motivated, self-organizing, creative problem solvers or a flock of compliant drones, most CEOs would choose the former. Yet the systems we cultivate yield the latter. We’re managing 21st-century organizations with 19th-century tools.

For almost all companies, a high-performing workforce is the most important source of competitive advantage. Most studies of employee performance, particularly of white-collar knowledge workers, find performance to hinge on engagement and trust (employees’ level of trust in managers and the firm). Engagement and trust are closely tied to intrinsic motivation, autonomy, and sense of purpose. That is, performance is maximized when employees are able to tap into their skills, knowledge, experience, creativity, discipline, passion, agility and internal motivation. Studies by Deloitte, Towers Watson, Gallup, Aon Hewitt, John P. Kotter, and Beer and Eisenstat over the past 25 years reach the same conclusions.

All this means Taylorism and embedding Theory X in organizational structure and management methodologies simply shackle the main source of high performance in most firms. As Pink says, command and control lead to compliance; autonomy leads to engagement. Peter Drucker fought for this point in the 1950s; America didn’t want to hear it. Frederick Taylor’s been dead for 100 years. Let’s let him rest in peace.

___


What actually stood between the carrot and the stick was, of course, a jackass. – Alfie Kohn, Punished by Rewards

Never tell people how to do things. Tell them what to do and they will surprise you with their ingenuity. – General George Patton

Control leads to compliance; autonomy leads to engagement. – Daniel H. Pink, Drive

The knowledge obtained from accurate time study, for example, is a powerful implement, and can be used, in one case to promote harmony between workmen and the management, by gradually educating, training, and leading the workmen into new and better methods of doing the work, or in the other case, it may be used more or less as a club to drive the workmen into doing a larger day’s work for approximately the same pay that they received in the past. – Frederick Taylor, The Principles of Scientific Management, 1913

That’s my real motivation – not to be hassled. That and the fear of losing my job, but y’know, Bob, that will only make someone work just hard enough not to get fired. – Peter Gibbons, Office Space, 1999

___


Bill Storage is a scholar in the history of science and technology who in his corporate days survived encounters with strategic management initiatives including Quality Circles, Natural Work Groups, McKinsey consultation, CPIP, QFD, Leadership Councils, Kaizen, Process Based Management, and TQMS.

 




McKinsey’s Behavioral Science

You might not think of McKinsey as being in the behavioral science business; but McKinsey thinks of themselves that way. They claim success in solving public sector problems, improving customer relationships, and kick-starting stalled negotiations through their mastery of neuro- and behavioral science. McKinsey’s Jennifer May et al. say their methodology is “built on an extensive review of neuroscience and behavioral literature from the past decade and is designed to distill the scientific insights most relevant for governments, not-for-profits, and business leaders.”

McKinsey is also active in the Change Management/Leadership Management realm, which usually involves organizational, occupational, and industrial psychology based on behavioral science. Like most science, all this work presumably involves a good deal of iterating over hypotheses and evidence collection, with hypotheses continually revised in light of interpretations of evidence made possible by sound use of statistics.

Given that, and McKinsey’s phenomenal success at securing consulting gigs with the world’s biggest firms, you’d think McKinsey would set out spotless epistemic values. A bit has been written about McKinsey’s ability to walk proud despite questionable ethics. In his 2013 book The Firm, Duff McDonald relates McKinsey’s role in creating Enron and sanctioning its accounting practices, its 2008 endorsement of banks funding their balance sheets with debt, and its promotion of securitizing sub-prime mortgages.

Epistemic and Scientific Values

I’m not talking about those kinds of values. I mean epistemic and scientific values. These are focused on how we acquire knowledge and what counts as data, fact, and information. They are concerned with accuracy, clarity, falsifiability, reliability, testability, and justification – all the things that separate science from pseudoscience.

McKinsey boldly employs the Myers-Briggs Type Indicator (MBTI) both internally and externally. They do this despite decades of studies by prominent universities showing MBTI to be essentially worthless from the perspective of survey methodology and statistical analysis. The studies point out that there is no evidence for the bimodal distributions inherent in MBTI type theory. They note that the standard error of measurement for MBTI’s dimensions is unacceptably large, and that its test/re-test reliability is poor: even with re-test intervals of only five weeks, over half of subjects are reclassified. Analysis of MBTI data shows that its JP and SN scales strongly correlate with each other, which is undesirable. Meanwhile MBTI’s EI scale correlates with non-MBTI behavioral near-opposites. These findings impugn the basic structure of the Myers-Briggs model. (The Big Five model does somewhat better in this realm.)
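The re-test finding is easy to reproduce in simulation. In the sketch below – with assumed values: a unit-SD latent trait, a measurement-error SD of 0.6, and a cut at the scale midpoint; none of these numbers come from MBTI data – dichotomizing a continuous trait reclassifies a large fraction of people on re-test, and across four letters the chance that at least one changes passes fifty percent:

```python
import random

random.seed(42)
N = 100_000   # simulated test-takers
noise = 0.6   # assumed measurement-error SD, relative to a unit-SD trait

def letter(score):
    # Dichotomize a continuous trait at the midpoint, as type theory requires.
    return "E" if score > 0 else "I"

# Two noisy administrations of the same test for each person.
changed = sum(
    letter(t + random.gauss(0, noise)) != letter(t + random.gauss(0, noise))
    for t in (random.gauss(0, 1) for _ in range(N))
)
r = changed / N
print(f"Per-scale reclassification rate: {r:.0%}")
print(f"Chance at least one of four letters changes: {1 - (1 - r) ** 4:.0%}")
```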

Five decades of studies show Myers-Briggs to be junk due to low evidential support. Did McKinsey mis-file those reports?

McKinsey’s Brussels director, Olivier Sibony, once expressed optimism about a nascent McKinsey collective-decision framework, saying that while preliminary results were good, it still fell “short of a standard psychometric tool such as Myers–Briggs.” Who finds Myers-Briggs to be such a standard tool? Not psychologists or statisticians. Shouldn’t attachment to a psychological test rejected by psychologists, statisticians, and experiment designers offset – if not negate – retrospective judgments by consultancies like McKinsey (Bain is in there too) that MBTI worked for them?

Epistemic values guide us to ask questions like:

  • What has been the model’s track record at predicting the outcome of future events?
  • How would you know if it were working for you?
  • What would count as evidence that it was not working?

On the first question, McKinsey may agree with Jeffrey Hayes (who says he’s an ENTP), CEO of CPP, owner of the Myers-Briggs® product, who dismisses criticism of MBTI by the many psychologists (thousands, writes Joseph Stromberg) who’ve deemed it useless. Hayes says, “It’s the world’s most popular personality assessment largely because people find it useful and empowering […] It is not, and was never intended to be predictive…”

Does Hayes’ explanation of MBTI’s popularity (people find it useful) defend its efficacy and value in business? It’s still less popular than horoscopes, which people find useful, so should McKinsey switch to the higher standards of astrology to characterize its employees and clients?

Granting Hayes, for sake of argument, that popular usage might count toward evidence of MBTI’s value (and likewise for astrology), what of his statement that MBTI never was intended to be predictive? Consider the plausibility of a model that is explanatory – perhaps merely descriptive – but not predictive. What role can such a model have in science?

Explanatory but not Predictive?

This question was pursued heavily by epistemologist Karl Popper (who also held a PhD in Psychology) in the mid 20th century. Most of us are at least vaguely familiar with his role in establishing scientific values. He is most famous for popularizing the notion of falsifiability. For Popper, a claim can’t be scientific if nothing can ever count as evidence against it. Popper is particularly relevant to the McKinsey/MBTI issue because he took great interest in the methods of psychology.

In his youth Popper followed Freud and Adler’s psychological theories, and Einstein’s physics. Popper began to see a great contrast between Einstein’s science and that of the psychologists. Einstein made bold predictions for which experiments (e.g. Eddington’s) could be designed to show the prediction wrong if the theory were wrong. In contrast, Freud and Adler were in the business of explaining things already observed. Contemporaries of Popper, Carl Hempel in particular, also noted that explanation and prediction should be two sides of the same coin. I.e., anything that can explain a phenomenon should be able to be used to predict it. This isn’t completely uncontroversial in science; but all agree prediction and explanation are closely related.

Popper observed that Freudians tended to find confirming evidence everywhere. Popper wrote:

Neither Freud nor Adler excludes any particular person’s acting in any particular way, whatever the outward circumstances. Whether a man sacrificed his life to rescue a drowning child (a case of sublimation) or whether he murdered the child by drowning him (a case of repression) could not possibly be predicted or excluded by Freud’s theory; the theory was compatible with everything that could happen. (emphasis in original – Replies to My Critics, 1974).

For Popper, Adler’s psychoanalytic theory was irrefutable, not because it was true, but because everything counted as evidence for it. On these grounds Popper thought pursuit of disconfirming evidence to be the primary goal of experimentation, not confirming evidence. Most hard science follows Popper on this value. A theory’s explanatory success is very little evidence of its worth. And combining Hempel with Popper yields the epistemic principle that even theories with predictive success have limited worth, unless those predictions are bold and can in principle be later found wrong. Horoscopes make countless correct predictions – like that we’ll encounter an old friend or narrowly escape an accident sometime in the indefinite future.

Popper brings to mind experiences where I challenged McKinsey consultants on reconciling observed behaviors and self-reported employee preferences with predictions – oh wait, explanations – given by Myers-Briggs. The invocation of a sudden strengthening of an otherwise mild J (Judging) preference in light of certain situational factors recalls Popper’s accusing Adler of being able to explain both aggression and submission as consequences of childhood repression. What has priority – the personality theory or the observed behavior? Behavior fitting the model confirms it; and opposite behavior is deemed acting out of character. Sleight of hand saves the theory from evidence.

What’s the Attraction?

Many writers see Management Science as more drawn to theory and less to evidence (or counter-evidence) than is the case with the hard sciences – say, more Aristotelian and less Newtonian, more philosophical rationalism and less scientific empiricism. Allowing this possibility, let’s try to imagine what elements of Myers-Briggs theory McKinsey leaders find so compelling. The four dimensions of MBTI were, for the record, not based on evidence but on the speculation of Carl Jung. Nothing is wrong with theories based on a wild hunch, if they are borne out by evidence and they withstand falsification attempts. Since this isn’t the case with Myers-Briggs, as shown by the testing mentioned above, there must be something else in it that attracts consultants.

I’ve struggled with this. The most charitable reading I can make of McKinsey’s use of MBTI is that they want a quick predictor (despite Hayes’ cagey caution against it) of a person’s behavior in collaborative exercises or collective-decision scenarios. They must therefore believe all of the following, since removing any of these from their web of belief renders their practice (re Myers-Briggs) arbitrary or ill-motivated:

  • that MBTI is a reliable indicator of character and personality type
  • that personality is immutable and not plastic
  • that behavior in teams is mostly dependent on personality, not on training or education, not on group mores, and not on corporate rules and behavioral guides

Now that’s a dark assessment of humanity. And it conflicts with the last decade’s neuro- and behavioral science that McKinsey claims to have incorporated in its offerings. That science suggests our brains, our minds, and our behaviors are mutable, like our bodies. Few today doubt that personality is in some sense real, but the last few decades’ work suggests that it’s not made of concrete (for insiders, read this as Mischel having regained some ground lost to Kenrick and Funder). It suggests that who we are is somewhat situational. For thousands of years we relied on personality models that explained behaviors as consequences of personalities, which were in turn only discovered through observations of behaviors. For example, we invented types (like the 16 MBTI types) based on behaviors and preferences thought to be perfectly static.

Evidence against static trait theory appears as secondary details in recent neuro- and behavioral science work. Two come to mind from the last week – Carstensen and DeLiema’s work at Stanford on the fading of positivity bias with age, and research at the Planck Institute for Human Cognitive and Brain Sciences showing the interaction of social affect, cognition and empathy.

Much attention has been given to neuroplasticity in recent years. Sifting through the associated neuro-hype, we do find some clues. Meta-studies on efforts to pair personality traits with genetic markers have come up empty. Neuroscience suggests that the ancient distinction between states and traits is far more complex and fluid than Aristotle, Jung and Adler theorized them to be – without the benefit of scientific investigation, evidence, and sound data analysis. Even if the MBTI categories could map onto reality, they can’t do the work asked of them. McKinsey’s enduring reliance on MBTI has an air of folk psychology and is at odds with its claims of embracing science. This cannot be – to use a McKinsey phrase – directionally correct.

If personality overwhelmingly governs behavior as McKinsey’s use of MBTI would suggest, then Change Management is futile. If personality does not own behavior, why base your customer and employee interactions on it? If immutable personalities control behavior, change is impossible. Why would anyone buy Change Management advice from a group that doesn’t believe in change?

 

 
