Discuss: The Myth of Usability Testing
by Robert Hoekman Jr.
- Editorial Comments
12 Usability Testing subjects are not the real users
The biggest question mark for me is how valid the usability data is when it is produced using subjects who are not the real users of the application.
For instance, you can tell me to buy something from a website (me as a usability test subject), but even if the direction informs me I’m buying a product, if I’m not the real customer I may not really understand what I’m looking for. There are things that a true customer considers before deciding whether to make a purchase that a usability subject may bypass, in effect inventing the truth about the flow of a real-life user. When additional funds are spent and the direction of a site is changed altogether based on the ‘data’ of non-user test subjects, how valuable has it proven to be?
posted at 06:14 pm on October 22, 2009 by Minu
13 Great reading
Thanks for a great article, Robert – highly interesting reading!
I get the feeling that conclusions from tests are often drawn too early in order to find some kind of business short-cut. This should certainly be good and educational reading for those who do so… :-)
posted at 08:21 am on October 23, 2009 by esset
14 Fish gotta swim, birds gotta fly. And bloggers go
But in this day of USA Today attention spans, and especially given our discipline’s struggle for respectability and acceptance, there is danger in the titillating but misleading article title or the carelessly arrived at, but well written, conclusion.
I write in reaction to “The Myth of Usability Testing” by Robert Hoekman Jr. (http://www.alistapart.com/articles/the-myth-of-usability-testing/).
Misleading title # 1: The article title implies either that a) all of usability testing is a myth, or b) there is only one myth associated with usability testing. As “Mashhoor” asked in his/her post to the discussion about the article, “. . . couldn’t you pick a better title? I’m getting linked to this article by people who know me (since I specialize in this field) who think that it’s against usability testing.” Indeed, Hoekman Jr. says, in discussion item #10, “I knew [the title] would probably get exactly this type of reaction, and . . . I decided the potential controversy could only draw attention to [usability testing]. . . . I wholeheartedly support running usability studies.” I hope all the readers who think the title might suggest otherwise choose to read this far.
Sloppy or misleading conclusion # 1: In discussing Molich’s CUE-2 test, Hoekman Jr. says, “Collectively, the teams reported 340 usability problems. However, only nine of these problems were reported by more than half of the teams. And a total of 205 problems—60% of all the findings reported—were identified only once. Of the 340 usability problems identified, 61 problems were classified as ‘serious’ or ‘critical’ problems.”
“Think about that for a moment.”
“For the Hotmail team to have identified all of the ‘serious’ usability problems discovered in the evaluation process, it would have to have hired all nine usability teams.”
In stark contrast to Hoekman Jr.’s conclusion that usability testing can’t possibly be cost-effective is Molich’s own conclusion: “Realize that single tests aren’t comprehensive. They’re still useful, however, and any problems detected in a single professionally conducted test should be corrected” (http://www.dialogdesign.dk/CUE-2.htm). Also, in summarizing CUE-4 (http://www.dialogdesign.dk/CUE-4.htm), Molich says: “Many of the teams obtained results that could effectively drive an iterative process in less than 25 person-hours. Teams A and L used 18 and 21 hours, respectively, to find more than half of the key problem issues, but with limited reporting requirements.”
Misleading title # 2: First major header – “Why usability evaluation is unreliable.” Even if some usability evaluation is unreliable – and given the low barriers to entry for the field of usability engineering, who would be surprised? — that doesn’t mean all usability evaluation is unreliable. Indeed, Hoekman Jr. goes on in this section to describe BAD usability evaluation (e.g., “Right Questions, Wrong People, and Vice Versa”). With this I agree totally – bad usability evaluations are unreliable, and are just generally, um, bad. I wonder if a better header for this section might have been “Some things that lead to unreliability of usability evaluations”? Or maybe “Good methods gone bad”?
Sloppy or misleading conclusion # 2: “Usability evaluations are good for a lot of things, but determining what a team’s priorities should be is not one of them.”
Allow me to observe that usability evaluations are also poor for Julienning fries – for that I’d recommend a Veg-o-Matic. For establishing your team’s priorities, I’d recommend, oh, some sorta business process. But if your goal is to identify and prioritize potential problems your users may have with your product or site design – well then, usability evaluation can kick Veg-o-Matic ass. Which brings me to the best part of the Hoekman Jr. article . . .
Great, representative illustration # 1 – the graphic at the head of the article, drawn by Kevin Cornell, showing a hammer resting against a bent and undriven screw. EXACTLY. Here, a hammer is the wrong tool for the job. There are many jobs for which usability evaluation is the wrong tool, but, as with the hammer, many for which it is the right tool.
Sloppy or misleading conclusion # 3: “It’s only natural that existing users perform tasks capably and comfortably despite poor task design. After all, the most usable application is the one you already know. But this doesn’t mean poor designs should not be revamped. Rather, to adapt to and harness the power of usability testing, current users should be brought in to test new ideas—ideas that surface from expert evaluation and collaboration with designers to create new solutions.” Yes, and non-current-but-still-representative users may be brought in, at any time, to evaluate old and new interfaces. Why the focus on only current users? If one tested only current users, it would be another example of “the wrong people for the right question.”
Wheel Rediscovery # 1: “To identify problems on which to focus, these teams, and yours, can take a variety of approaches. Consider a revised workflow that begins with an expert-level heuristic evaluation used in conjunction with informal testing methods, followed by informal and formal testing. More specifically, consider using online tools and paid services to investigate hunches, then use more formal methods to test and validate revised solutions that involve a designer’s input.” Yes, this sounds like a fairly thorough course of User-Centered Design (UCD) (see Vredenburg, Isensee, and Righi, 2002), though there are earlier steps of user-based requirements gathering that are also important. (Though it seems odd to parry “Usability evaluation may be too costly” with “Go with a heuristic evaluation and informal methods, plus some more informal and formal testing.”) Molich, in his CUE-2 summary, offers “Use an appropriate mix of methods.”
Odd, unsubstantiated claim # 1: “Here are several tools that can be used with a heuristic evaluation to identify trouble spots: Five-second tests: . . . Click stats: . . . Usability testing services: . . .Click stats on screenshots: . . . . In handling usability projects in this way, teams will identify priorities and achieve better outcomes, and can still gain all the benefits of being actively involved with usability tests.” So, heuristic evaluation plus these remote, unmoderated testing tools yield the same benefits as usability testing? I wonder. It’s an empirical question, and in my opinion it’s the next big question for our field – the empirical comparison of the value of usability engineering methods; which methods at which points in the development cycle of which types of user interfaces? (Alas, so far the National Science Foundation doesn’t agree with me, that answering this question is worthy of funding.)
Odd, but widely-shared misconception #1: “Obviously, not every team or organization can bear the expense of usability testing.” Which teams would that be? For which teams is it OK to “just get something out there and let our first users be our first test participants”? (I am NOT quoting Hoekman Jr., here – rather, it’s a snarky but too-often-deserved characterization of development teams’ approach.) Which teams are OK with the potential costs of a post-ship rework of the product, PLUS the alienating of those users who struggled to learn how to interact with that first design, given that “After all, the most usable application is the one you already know”? Which teams (and ya’ gotta be able to identify ‘em in advance, right?) are going to be those teams that happen to get the design right the first time?
So, to summarize, in my should-be-humbler opinion: – yes, usability evaluations can be pursued at the wrong time, and can be performed poorly even when the timing is good; – but that is true of any method or tool in software (or any) engineering, and no reason for criticism of the method itself; – usability evaluation, applied and conducted well, IS a tried-and-true technique for identifying potential usability problems; – but maybe not all the problems; and so – yes, we need to get better at choosing and applying usability engineering methods.
I’m workin’ on that.
posted at 05:40 pm on October 23, 2009 by Randolph Bias
15 Free usability data
Though the discussant gets no feedback on this (and the title does not appear in the preview when it is cut-and-pasted from the comment itself), there’s a limit to the length of a message title. For my previous post the intended title was: “Fish gotta swim, birds gotta fly. And bloggers gotta blog.”
posted at 05:43 pm on October 23, 2009 by Randolph Bias
16 Well put, Randolph!
But actually, I do agree that “bloggers go…” :-)
posted at 08:58 pm on October 23, 2009 by Nick Gould
17 RE Fish Gotta Swim ...
Randolph: Thanks for your comments. Unfortunately, it would take more time than I have available to address everything you brought up, so I’ll have to leave it up to readers to parse it all and form their own opinions, but there are a few points I feel I must address.
1. Regarding “Sloppy or misleading conclusion # 1”: Molich’s conclusion isn’t at all in contrast to my own conclusion. I said the Hotmail and Hotel Penn teams would’ve had to hire all 9 and 17 teams, respectively, to identify all the issues spotted during the CUE experiments. Molich, in different terms but with effectively the same message, said “single tests aren’t comprehensive.”
2. Regarding “Sloppy or misleading conclusion # 2”: Of course you can identify problems with a design through testing, but by using the method to prove out a hypothesis, not by using it as a discovery tool. You took the statement out of context. In context of the surrounding paragraphs, you can see that the statement is about the ineffectiveness of determining what a team’s priorities should be when testing is used as a discovery method.
3. Regarding “Odd, unsubstantiated claim # 1”: Again, you’ve taken this out of context. A benefit of testing I spoke of in the article is that of feeding a designer’s instincts. Informal testing methods can most definitely provide that benefit. And as Molich clearly demonstrated, full usability studies are no more predictable or consistent between teams than heuristic evaluations, so yes, teams who are trying to determine priorities can gain exactly that benefit through informal methods just the same as with formal methods. Neither is more correct than the other.
There are many myths of usability testing — I’d need a much, much longer article to cover them all. And the fact is, since human beings are involved in every last usability study performed, and no study is an exact or perfect process (because it can’t be), the results are bound to be wildly inconsistent. A widely-held belief, though, is that testing is scientific. Clearly, it’s not. It’s important that people understand myths like this one before throwing significant amounts of money and time at a method that won’t necessarily work for them.
Thanks very much for joining in the discussion. I love that this topic has sparked so much debate. It’s exactly what we need.
posted at 12:17 pm on October 24, 2009 by Robert Hoekman Jr.
18 Are you kidding?
As a certified human factors engineering professional (CHFP) with over 30 years of experience with complex usability issues, I find that your piece grossly misrepresents the intent and structure of that form of usability analysis known as “heuristics”. To those with a serious background in usability, the studies you mention are known to be grossly misleading and poorly executed. Finally, for the record, Jakob Nielsen DID NOT invent heuristics. The process was well understood and used successfully in many military applications before JN was born.
Charles L. Mauro CHFP
President/Founder
MauroNewMedia
posted at 03:31 pm on October 25, 2009 by CM2
19 Don't throw the baby out with the bathwater
Thanks for writing such a provocative article. While I agree that usability testing isn’t the right tool to identify the answer to every question about an interface, I’m not really sure I follow your logic here.
In the examples you cite, the research teams made some pretty huge recruitment gaffes. Clearly, if you test with the wrong audience, and ask them the wrong questions, your findings aren’t going to be worth shit. But that doesn’t mean the method is lacking; the implementation is.
Do you have any anecdotes illustrating the method’s shortcomings that DON’T involve research teams that made some serious newbie failures?
I also don’t understand your assertion that:
“… usability testing fails wholly to do what many people think is its most pertinent and relevant purpose—to identify problems and point a team in the right direction …”
Setting aside your examples of poorly run testing scenarios, I really don’t see how you can make this kind of assertion. It’s been my experience that usability testing is a terrific method to identify problems in an existing interface. Am I misunderstanding your point? Could you clarify what you mean?
After reading this article multiple times now, the main point I am left with is this: Don’t hire usability testers who don’t know what they’re doing. I wholeheartedly agree with this sentiment. But as far as I can tell, there’s no clear evidence given here to justify using inflammatory phrases like “the myth of usability testing” or “why usability evaluation is unreliable”.
I just don’t buy it.
posted at 10:52 am on October 26, 2009 by Angela Colter
20
Nice article. Having studied psychology, I know a lot about field studies, experiments and what to look for in them. For example, these usability tests may not have been run in the right conditions, so whilst they may be correct to a degree, some of these usability problems may not be real problems when the product is used by a normal person.
I’m not saying that the job these people do is unnecessary, I’m saying that it should always be taken with a pinch of salt and analysed further.
posted at 04:41 am on October 27, 2009 by traxor
11 Myth?
Robert, from some of your comments I understand that the “myth” you are referring to in the article title is that usability research is good for “determining what to focus on next.” But I didn’t really get that from the article itself. Are you saying that usability testing and evaluation (BTW, are you equating the two for the purposes of your thesis?) are bad methods for deriving high-level strategic guidance? I would definitely agree with that, but mainly because the method focuses on too-granular issues and is limited in terms of research participants. But then what is the significance of the Molich story and the other examples you cite of, quite frankly, faulty thinking / planning? These stories are worrisome, for sure, and worthy of further study / discussion in and of themselves. But I’m not sure that they say anything positive at all about usability methodologies – regardless of the purpose or intent. In fact, I hope none of my clients read your title and the first few paragraphs alone…and leave thinking that usability research is unreliable!
Some other questions: Doesn’t your Blink reference support the idea that a good usability expert can provide value? Does that mean that the Molich evaluators were just incompetent? Also, what about just using usability research for its intended purpose – to identify specific design problems that impede user success and / or fail to encourage behaviors that the site wants to encourage (engagement, exploration, interaction, etc.)? Is this a “good” use or a “bad” use of the methodologies?
I know there must be more to this than “use the right tool for the job” and use it properly… but I confess that I’m not seeing it. Help?
posted at 02:55 pm on October 22, 2009 by Nick Gould