In my blog of Sept 10, I noted that my newsletter statements in "Robust Criteria for Robust Decisions" had started an interesting conversation with Ralph Keeney. In the newsletter I had stated:
Research has shown:
1. The more effort put into understanding the criteria early in the process, the better the decision
2. Too little effort is generally put into understanding criteria.
He asked for references, and in my blog I provided what weak evidence I had. He in turn sent me an interesting paper titled “Generating Objectives: Can Decision Makers Articulate What They Want?” (Management Science, Vol. 54, No. 1, Jan. 2008, pp. 56-70). In this paper, Keeney and his colleagues present the results of a series of experiments designed to address how well people can list the objectives (i.e., criteria) they used in making decisions on real problems. In summary, Keeney and company concluded that people commonly undertake important decisions without considering many of the most important criteria. It seems that they generate only the objectives that are cued by their incomplete representation of the problem. In other words, as people work to understand a problem, by reading a problem statement, talking with others, hearing a newscast, etc., they build a mental model of the situation and base their criteria on this model.
The implications of this are:
- Decision making is guided by whatever incomplete set of objectives is made salient
- Criteria generation can be improved by helping decision makers develop a broader understanding of the problem through:
- Time - One of the studies in the paper showed that addressing the problem at multiple points in time increases the number of criteria identified.
- Multiple perspectives - Although not addressed in the study, multiple perspectives (i.e., a team approach) can help develop this broader understanding.
- Templates - External aids can help in generating the broader understanding.
What follows are some thoughts on these three criteria-development crutches.
My research in the 1980s focused on how engineers design products. In these studies my students and I videotaped engineers solving simple but realistic design problems. We observed how engineers repeatedly returned to the problem statement as their mental model of the situation evolved. Before these experiments I tried to force students in my design classes to develop criteria first and then alternatives. The rationale was that, if you have a solution (or a set of solutions) in mind, the criteria will evolve to match it. More recently, I have come to believe that criteria and alternatives co-evolve as understanding develops. That is not to say that you should just dive in. I now teach Quality Function Deployment (QFD) first, but treat it as a living document. I also have the students work in teams on all projects, as that brings multiple perspectives.
One criticism of virtually all the research on decision making (including my own) is that it has been done on individual decision makers. I commented on this in an earlier blog. The reality is that at work, and to a great degree at home, we all solve problems with others. Either we bounce ideas off each other or we work on teams. In these situations the multiple perspectives help flesh out understanding and criteria. Large teams can actually reverse the situation. In many of my consulting jobs I see teams from multiple groups in an organization working to winnow down the criteria that are important to each individual group into a single, shared understanding. This is almost the antithesis of Keeney’s study.
The idea of using templates or other criteria crutches is one that we have tried to incorporate into Accord, our decision support software. Currently there are templates for about six different generic problems (e.g., concept selection, portfolio evaluation, proposal selection, vendor selection, job candidate selection). However, many decisions in business are unique, and developing a template for these problems is not possible. The newsletter that started this thread, “Robust Criteria for Robust Decisions,” was an effort to address those problems that don’t allow for templates.
The upshot of the dialog with Ralph Keeney is that I will be doing some experiments this fall that address the two points I made initially. We will see what evolves to help frame decision problems.
Labels: criteria, criteria development, Ralph Keeney
I just wrote a newsletter, which appears on my web site, on how to develop criteria, titled "Robust Criteria for Robust Decisions." In it I state:
Research has shown:
- The more effort put into understanding the criteria early in the process, the better the decision
- Too little effort is generally put into understanding criteria.
After sending this to my mailing list, I received a message from Ralph Keeney (a major thinker in decision making for the last 20+ years) asking: "I am very interested in both of these issues, and I believe that each are true. However, research addressing these issues is not so easy to come by. Hence, if it is not too inconvenient, I would be pleased to receive references of the research referred to in the statements. Thank you very much."
This is what I wrote back.
Thanks for the note. I agree that data supporting these two contentions are hard to come by. By far the best I have seen is from a German PhD dissertation that used protocol studies of mechanical engineers similar to those I did in the late 1980s: Dylla, N., Thinking Methods and Procedures in Mechanical Design, Dissertation, Technical University of Munich, 1991 (in German). From Dylla’s data, I wrote the following and developed the plot (note this is from page 142 in Making Robust Decisions; some copies have the wrong plot for the figure):

The experimenters measured the amount of time each of the six engineers spent developing criteria. This included reading the given criteria, rereading them, and refining them. Then a team of professional engineers evaluated the technical quality of each design. Part of the evaluation concerned how well the final designs met the criteria, and part was more objective, evaluating the elegance of the solution. The evaluation team scored each of the six designs on a scale of 0 to 100. As Figure 6.1 shows, there is a significant relationship between the percentage of time spent analyzing the goals of the problem and the technical quality of the result. The engineers who spent around 7% of their time understanding and developing criteria had a 60% better solution than those who spent 2-3% of their time developing criteria. I don’t mean to imply that 7% is an adequate amount of time for working on the criteria; this particular experiment involved a simple, crafted problem and just one decision maker. The engineers didn’t spend all their criteria time at the beginning of the task. In fact, the successful engineers worked hard to refine the criteria at the beginning and then revisited and refined them many times during the course of the experiment. This result should come as no surprise: a prime measure of the success of a decision is how well the results meet the criteria. In general, the time you spend up front to clarify the problem (understand the criteria) saves time and many headaches later.
Admittedly, I have taken some license in treating a better mechanical design as analogous to a better decision. I don’t think the leap is very great, however, as design is repetitive decision making.
The second point is based partially on Dylla’s finding (4 of the 6 engineers might have done better had they put in more time on criteria) and partially on the studies done for the book Why Decisions Fail, by Paul Nutt, Berrett-Koehler, 2004. One of his three decision blunders is “Decision makers base many decisions on premature commitments.” Premature commitment implies that too little time is spent on one or all of the following: 1) developing alternative courses of action, 2) developing criteria, 3) evaluating alternatives relative to criteria, or 4) managing the decision making strategy. He never breaks this down, but on page 167 he compares the success of four different evaluation tactics: analytical, bargaining, subjective, and judgment. Paraphrasing Nutt: in an analytical evaluation, data is gathered and inferences are made with analytical tools; in judgment there are no specifics. Thus, analytical methods require more effort on the measures, i.e., the criteria, than does judgment. He found a decision adoption rate of 64-75% when analytical methods were used versus 36-47% for judgment. Unfortunately, Nutt never really addresses the evaluation details and wraps criteria development in with evaluation, as many authors do.
All pretty weak stuff. To add some anecdotal “data”: I see companies do a very poor job of defining criteria for making decisions. One let out an RFP with 60+ specs. After reading the 15+ proposals, they found these specs only enabled them to separate the proposals into two piles, acceptable and not acceptable. The specs were not really what they needed to make the decision amongst the “acceptable” proposals. They then needed to spend additional time determining what their criteria were for finding the best amongst the acceptable.
Do you have any references that might add to this?
Labels: criteria, criteria development, Ralph Keeney
Many people use pairwise comparisons during their decision making efforts. First, this method may be used to determine the relative importance (the weightings) of the criteria. Second, pairwise comparisons are sometimes used to evaluate the alternatives relative to the criteria. BOTH OF THESE ARE A WASTE OF TIME!! Don’t get me wrong, I think pairwise comparisons can be helpful, just that there are faster ways of getting virtually the same results. In this brief note I will only tackle why you shouldn't bother with them for determining criteria importance.
Through his books and companies, Tom Saaty has popularized pairwise comparisons as part of his Analytic Hierarchy Process (AHP) and Analytic Network Process (ANP). In the Analytic Network Process book, on pages 26-31, Saaty gives the example of using eight criteria to help select a house (e.g., Size of House, Transportation, Neighborhood, etc.). His method requires that all the criteria be compared to each other one pair at a time to find the more important of each pair. Further, a dominance factor is given to the better of the pair, relating how much more important one factor is than the other. If working with a team, the members need to come to agreement on the dominance factors (I will come back to this point later). For the 8 criteria, there are 28 comparisons and thus 28 dominance factors to judge. In general, for N factors there are (N-1) + (N-2) + … + 1 = N(N-1)/2 comparisons.
For the example in his book (where the numbers represent the 8 criteria, i.e., factors), a matrix of the pairwise comparisons is as shown below. Note that opposite entries are just reciprocals of each other: criterion 1 is 5 times as important as criterion 2, so criterion 2 is 1/5 as important as criterion 1.
The priority vector is a reduction of these values (using an eigenvector analysis) into the relative weightings, that is, the importance of each criterion.
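To make the eigenvector step concrete, here is a minimal Python sketch (using NumPy) of how a priority vector can be extracted from a reciprocal comparison matrix. The 4x4 matrix is invented for illustration; it is not the 8-criterion house-selection matrix from Saaty's book.

```python
import numpy as np

# Hypothetical reciprocal pairwise comparison matrix for 4 criteria.
# Entry [i, j] says how many times more important criterion i is than
# criterion j; the mirror entry [j, i] is its reciprocal.
A = np.array([
    [1.0, 5.0, 3.0, 7.0],
    [1/5, 1.0, 1/2, 3.0],
    [1/3, 2.0, 1.0, 5.0],
    [1/7, 1/3, 1/5, 1.0],
])

# The priority vector is the principal eigenvector of A,
# normalized so the weights sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
principal = np.argmax(eigenvalues.real)
weights = np.abs(eigenvectors[:, principal].real)
weights /= weights.sum()

print(np.round(weights, 3))  # relative importance of each criterion
# Remember the cost: the number of judgments grows as N*(N-1)/2,
# so 8 criteria require 28 dominance factors.
```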
Compare this to a method proposed by Ward Edwards, one of the fathers of modern decision theory. He suggested that asking decision makers to weight criteria is so fraught with error that it is easier, and no less accurate, just to ask them to rank the criteria and then automatically set the weights according to the ranking. This “error” is exacerbated when there are multiple constituencies represented in the organization.
To find the rank order of the criteria, write each criterion on a sticky note, arrange them on a wall or desk, and reorder them with the most important on top. This is best done in a pairwise fashion: select the criteria two at a time and ask, “If an alternative could meet only one of these, which criterion would I choose?” Then move the chosen one toward the top and the other toward the bottom of the arrangement.
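If you prefer to see that sticky-note procedure as an algorithm, the sketch below builds the same ranking with a simple insertion sort, using a person's answer to the pairwise question as the comparator. The criteria names and the input() prompt are purely illustrative.

```python
def rank_criteria(criteria):
    """Order criteria from most to least important by asking the
    'if an alternative could meet only one of these' question."""
    ranked = []
    for c in criteria:
        placed = False
        for i, other in enumerate(ranked):
            answer = input(f"If an alternative could meet only one, "
                           f"'{c}' or '{other}'? ")
            if answer.strip().lower() == c.lower():
                ranked.insert(i, c)   # c beats other: place it above
                placed = True
                break
        if not placed:
            ranked.append(c)          # c lost every comparison so far
    return ranked

# Illustrative criteria only, not the full house-selection list.
print(rank_criteria(["Cost", "Size", "Neighborhood", "Transportation"]))
```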
You can convert the ranking to a weighting by using the table below. It shows the Rank Order Centroid (ROC) method developed by Edwards and gives the weights for up to 12 criteria.
Weights based on ROC method
If you have more than 12 criteria, you can use the equation wₖ = (1/K) × Σ(1/i), with the sum running from i = k to K, where k is the rank of the criterion (1 being the highest weighted) and K is the number of criteria. This equation was used to generate the values in the table. The values for 8 criteria are shaded in the table above and plotted below, compared to the pairwise method.
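As a sanity check on the equation, here is a short Python sketch that computes the ROC weights for any number of criteria; with K = 8 it gives roughly 0.340 for the top-ranked criterion and about 0.016 for the lowest, consistent with the formula.

```python
def roc_weights(K):
    """Rank Order Centroid weights: w_k = (1/K) * sum(1/i for i = k..K),
    where rank k = 1 is the most important of K criteria."""
    return [sum(1.0 / i for i in range(k, K + 1)) / K for k in range(1, K + 1)]

# Example: 8 criteria, as in the house-selection problem above.
for rank, w in enumerate(roc_weights(8), start=1):
    print(f"rank {rank}: weight {w:.3f}")
# Rank 1 works out to about 0.340 and rank 8 to about 0.016.
```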
The results of the two methods are shown on the bar chart below.
The mean absolute error between the two (the average of the absolute differences between the two sets of weights) is, on average, 2%. Other examples I have tried have had even less error. Considering that there is no right answer and that a single change in the pairwise comparisons can change the results, this difference is negligible. What is not negligible is the difference in effort: twenty-eight comparisons and dominance-factor assignments versus simply rank ordering the criteria.
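If you want to run the same comparison on your own problem, the sketch below computes the mean absolute error between two weight vectors. The ROC weights are the K = 8 values from the formula above; the pairwise-derived vector is made up for illustration and is not the priority vector from Saaty's example.

```python
def mean_absolute_error(w1, w2):
    """Average of the absolute differences between two weight vectors."""
    return sum(abs(a - b) for a, b in zip(w1, w2)) / len(w1)

# ROC weights for 8 criteria (computed from the formula above) and a
# hypothetical AHP priority vector standing in for the pairwise result.
roc      = [0.340, 0.215, 0.152, 0.111, 0.079, 0.054, 0.033, 0.016]
pairwise = [0.330, 0.230, 0.150, 0.100, 0.080, 0.060, 0.035, 0.015]

print(f"Mean absolute error: {mean_absolute_error(roc, pairwise):.3f}")
```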
Now, adherents of pairwise comparisons can argue that the method also computes consistency, a measure of how well the many dominance factors agree with each other. I believe this is of little importance as the dominance factors are just averages across a committee of stakeholders who are trying to quantify subjective values. In other words, the uncertainty in the pairwise scoring is so high due to averaging and quantifying subjective values that consistency in the dominance factors is just noise. Consistency analysis gives false comfort that the matrix is consistent when the numbers themselves are very uncertain.
Based on the above arguments, I believe pairwise comparisons are a waste of time. I prefer to let all the stakeholders rank order the criteria and use the resulting inconsistent weighting factors in my analysis. Thus, I honor all parties’ values and use them all in downstream analysis. More on this in another note.
Decision Making for Leaders: The Analytic Hierarchy Process for Decisions in a Complex World, Thomas L. Saaty, RWS Publications, Pittsburgh, PA, 1999.
The Analytic Network Process, Thomas L. Saaty, RWS Publications, 1996.
Edwards, Ward and F. Hutton Barron, "SMARTS and SMARTER: Improved Simple Methods for Multi-attribute Utility Measurement" and F. Hutton Barron and Bruce Barrett, "Decision Quality Using Ranked and Partially Ranked Attribute Weights."
Labels: criteria importance, criterion importance, pair-wise comparison, Pairwise comparison, Saaty
I was listening to NPR the other day and heard the word “constult.” This word is in the Oxford English Dictionary (I checked). It is a verb that means “To play the fool together.” Both examples in the OED are from the 17th century:
1630 J. TAYLOR (Water P.) World's eighth Wonder Wks. II. 67/1 Some English Gentlemen with him consulted And he as nat'rally with them constulted. 1659 GAUDEN Slight Healers (1660) 91 What do they meet, and sit, and consult (or rather constult) together?
So, this brings up the question (posed on the Car Talk web site): “Do two people who don’t know what they are talking about know more or less than one?” This is worth asking, since most hard decisions are plagued by a lack of knowledge. In fact, it was this question (not so well articulated) that led me to start thinking about decision making in the mid 1990s.
When I am designing a new product, I don’t know much about its details yet. Designing is learning. Often I will enlist others to help with the functions and features I know the least about. Are these colleagues consulting or constulting, and if the latter, why do I seek their help?
I think I know the answer. If I know absolutely nothing about something and my response is a wild guess, then the probability of my being right is 50% (assuming a simple two-way choice). If another person who also knows nothing is added to my team, then she is truly constulting; together we still know nothing and the probability is still 50%. However, if I know a little and what I know is correct, then my probability is greater than 50%. And if I add a colleague who also knows a little, then the probability of us together being correct is greater still.
As a numerical example (taken from Making Robust Decisions, page 231), say that I know enough that the probability that I am right is 68%. If another person is added to the team and she independently also has a 68% probability of being correct, then our fused probability is 82%. If there are three of us, it rises to 91%. The theory behind these numbers is in the book, but they do make intuitive sense.
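One simple fusion rule that reproduces those numbers is to treat each person as an independent estimate with a 50/50 prior and combine them in naive-Bayes fashion, as sketched below. This is only my shorthand for the calculation; the full development is in the book.

```python
def fused_probability(probs):
    """Fuse independent estimates that an answer is correct, assuming a
    50/50 prior and independence between the estimators (naive Bayes)."""
    correct, wrong = 1.0, 1.0
    for p in probs:
        correct *= p          # everyone is right
        wrong *= (1.0 - p)    # everyone is wrong
    return correct / (correct + wrong)

print(f"{fused_probability([0.68]):.0%}")              # 68% -- one person
print(f"{fused_probability([0.68, 0.68]):.0%}")        # 82% -- two people
print(f"{fused_probability([0.68, 0.68, 0.68]):.0%}")  # 91% -- three people
print(f"{fused_probability([0.5, 0.5]):.0%}")          # 50% -- two constulters
```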
This also supports the examples in the book The Wisdom of Crowds (a good read). Obviously there are some limitations. What if members of the team disagree on the answer? What about groupthink? What if the level of knowledge is less than estimated? All of these can lead to constulting.
So to answer the question:
- Two people who know nothing know as much as one person who knows nothing.
- Two people who know a little, independently (no groupthink or peer influence), and whose meager knowledge is correct know more than one person. This is how we get to the wisdom of crowds. If their knowledge is conflicting, they still know more; it is just that the answer is still in doubt.
- Two people who know a little but have influenced each other through peer pressure (think alpha males), groupthink, or some other mechanism, or where one or more of them knows less than they think they do, may together actually know less and be constulting.
The goals, then, are to know when you know nothing and to avoid the pitfalls that lead to constulting. I have been struggling with these goals for about ten years.
Labels: constulting, team decision making, team knowledge, wisdom of crowds