UPA 2004 Presentation
A Comparison of Questionnaires for Assessing Website Usability
Thomas S. Tullis and Jacqueline N. Stetson
Human Interface Design Department, Fidelity Center for Applied Technology
Fidelity Investments
82 Devonshire St., V4A
Boston, MA 02109
Contact: tom.tullis@fidelity.com

ABSTRACT:
Five questionnaires for assessing the usability of a website were compared in a study with 123 participants. The questionnaires studied were SUS, QUIS, CSUQ, a variant of Microsoft’s Product Reaction Cards, and one that we have used in our Usability Lab for several years. Each participant performed two tasks on each of two websites: finance.yahoo.com and kiplinger.com. All five questionnaires revealed that one site was significantly preferred over the other. The data were analyzed to determine what the results would have been at different sample sizes from 6 to 14. At a sample size of 6, only 30-40% of the samples would have identified that one of the sites was significantly preferred. Most of the data reach an apparent asymptote at a sample size of 12, where two of the questionnaires (SUS and CSUQ) yielded the same conclusion as the full dataset at least 90% of the time.

Introduction
A variety of questionnaires have been used and reported in the literature for assessing the perceived usability of interactive systems, including QUIS [3], SUS [2], CSUQ [4], and Microsoft’s Product Reaction Cards [1]. (See [5] for an overview.) In our Usability Lab, we have been using our own questionnaire for the past several years for assessing subjective reactions that participants in a usability test had to a web site. However, we had concerns about the reliability of our questionnaire (and others) given the relatively small number of participants in most typical usability tests. Consequently, we decided to conduct a study to determine the effectiveness of some of the standard questionnaires, plus our own, at various sample sizes. Our focus was specifically on websites.

Method
We decided to limit ourselves to our own questionnaire plus those in the published literature that we believed could be adapted to evaluating websites. The questionnaires we used were as follows (illustrated in Appendix A):
1. SUS (System Usability Scale): This questionnaire, developed at Digital Equipment Corp., consists of ten questions. It was adapted by replacing the word “system” in every question with “website”. Each question is a statement and a rating on a five-point scale of “Strongly Disagree” to “Strongly Agree”.
2. QUIS (Questionnaire for User Interface Satisfaction): The original questionnaire, developed at the University of Maryland, was composed of 27 questions. We dropped three that did not seem to be appropriate to websites (e.g., “Remembering names and use of commands”). The term “system” was replaced by “website”, and the term “screen” was generally replaced by “web page”. Each question is a rating on a ten-point scale with appropriate anchors at each end (e.g., “Overall Reaction to the Website: Terrible … Wonderful”).
3. CSUQ (Computer System Usability Questionnaire): This questionnaire, developed at IBM, is composed of 19 questions. The term “system” or “computer system” was replaced by “website”. Each question is a statement and a rating on a seven-point scale of “Strongly Disagree” to “Strongly Agree”.
4. Words (adapted from Microsoft’s Product Reaction Cards): This questionnaire is based on the 118 words used by Microsoft on their Product Reaction Cards [1]. (We are grateful to Joey Benedek and Trish Miner of Microsoft for providing the complete list.) Each word was presented with a check-box and the user was asked to choose the words that best describe their interaction with the website. They were free to choose as many or as few words as they wished.
5. Our Questionnaire: This is one that we have been using for several years in usability tests of websites. It is composed of nine statements (e.g., “This website is visually appealing”) to which the user responds on a seven-point scale from “Strongly Disagree” to “Strongly Agree”. The points of the scale are numbered -3, -2, -1, 0, 1, 2, 3. Thus, there is an obvious neutral point at 0.

Note that other tools designed as commercial services for evaluating website usability (e.g., WAMMI [6], RelevantView [7], NetRaker [8], Vividence [9]) were not included in this study. Some of these tools use their own proprietary questionnaires and some allow for the construction of your own.

The entire study was conducted online via our company’s Intranet. A total of 123 of our employees participated in the study. Each participant was randomly assigned to one of the five questionnaire conditions. Each was asked to perform two tasks on each of two well-known personal financial information sites: finance.yahoo.com and kiplinger.com. (In the rest of this paper they will simply be referred to as Site 1 and Site 2. No relationship between the site numbers and site names should be assumed.) The two tasks were as follows:
1. Find the highest price in the past year for a share of <company name>. (Note that a different company was used in each task.)
2. Find the mutual fund with the highest 3-year return.

The order of presentation of the two sites was randomized so that approximately half of the participants received Site 1 first and half received Site 2 first. After completing (or at least attempting) the two tasks on a site, the user was presented with the questionnaire for their randomly selected condition. Thus, each user completed the same questionnaire for the two sites. (Technically, “questionnaires” was a between-subjects variable and “sites” was a within-subjects variable.)

Data Analysis
For each participant, an overall score was calculated for each website by simply averaging all of the ratings on the questionnaire that was used. (All scales had been coded internally so that the “better” end corresponded to higher numbers.) Since the various questionnaires use different scales, these were converted to percentages by dividing each score by the maximum score possible on that scale. So, for example, a rating of 3 on SUS was converted to a percentage by dividing that by 5 (the maximum score for SUS), giving a percentage of 60%.

Special treatment was required for the “Words” condition since it did not involve rating scales. Before the study, we classified each of the words as being “Positive” (e.g., “Convenient”) or “Negative” (e.g., “Unattractive”). (Note that they were not grouped or identified as such to the participants.) For each participant, an overall score was calculated by counting the total number of words that person selected and then dividing that number […]

[Figure 3. Results using CSUQ. Frequency by percentage of maximum rating, Site 1 vs. Site 2.]
[Figure 4. Results using Microsoft’s Words. Frequency by percentage of maximum score, Site 1 vs. Site 2.]
[Figure 5. Results using our questionnaire. Frequency by percentage of maximum score, Site 1 vs. Site 2.]
[Figure 6. Comparison of mean scores for each site using each questionnaire (SUS, QUIS, CSUQ, Words, Ours).]

All five questionnaires showed that Site 1 was significantly preferred over Site 2 (p < .01 via t-test for each).
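
The percentage conversion described under Data Analysis can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `percent_of_max` and `reverse_code` are hypothetical, and the reverse-coding helper assumes a scale that runs from 1 up to its maximum, which the paper does not state.

```python
def reverse_code(rating, scale_max):
    """Flip a rating so that the "better" end maps to the higher number.

    Assumption for this sketch: the scale runs from 1 to scale_max.
    """
    return scale_max + 1 - rating


def percent_of_max(ratings, scale_max):
    """Average the ratings, then divide by the maximum possible score."""
    if not ratings:
        raise ValueError("at least one rating is required")
    mean_rating = sum(ratings) / len(ratings)
    return mean_rating / scale_max


# The paper's own example: a rating of 3 on the five-point SUS scale.
sus_example = percent_of_max([3], scale_max=5)  # 0.6, i.e. 60%
```

Dividing by the scale maximum puts the five-, seven-, and ten-point questionnaires on a common 0-100% footing, which is what allows their results to be compared side by side.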

The largest mean difference (74% vs. 38%) was found using the Words questionnaire, but this was also the questionnaire that yielded the greatest variability in the responses. Both of these points are apparent from examination of Figure 4, where you can […]

Conclusions
First, some caveats need to be pointed out about the interpretation of these data. The primary one is that they really only directly apply to the analysis of the two sites that we studied. We selected two popular sites that provide financial information, finance.yahoo.com and kiplinger.com. We chose these sites because they provide similar kinds of information but in different ways. Had the two sites studied been even more similar to each other, it would have been more difficult for any of the questionnaires to yield a significant difference. Likewise, if they had been more different, it would have been easier for any of the questionnaires to yield a significant difference.

Another caveat is that the users’ assessments of these sites were undoubtedly affected by the two tasks that we asked them to do on those sites. Again, we did not choose tasks that we thought would be particularly easier or more difficult on one site vs. the other. We chose tasks that we thought were typical of the tasks people might want to do on these kinds of sites.

It’s also possible that the results could have been somewhat different if we had been able to collect data from more participants using each questionnaire. The minimum number of participants that we got for any one questionnaire was 19. Some researchers have argued that still larger numbers of participants are needed to get reliable data from some of these questionnaires. While that may be true, one of our goals was to study whether any of these questionnaires yield reliable results at the smaller sample sizes typically seen in usability tests.
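
The abstract describes checking what the results would have been at sample sizes from 6 to 14. One way such a sub-sampling analysis could be sketched is shown below. Several details are assumptions made only for illustration and are not taken from the paper: the use of a paired t-test, the two-tailed 0.05 significance level, drawing 1000 random sub-samples without replacement, and the names `paired_t` and `agreement_rate`.

```python
import math
import random
import statistics

# Two-tailed critical t values at alpha = 0.05, keyed by degrees of freedom.
# The alpha level is an assumption of this sketch.
T_CRIT_05 = {5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262,
             10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160}


def paired_t(site1, site2):
    """Paired t statistic for per-participant scores on the two sites."""
    diffs = [a - b for a, b in zip(site1, site2)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # All differences identical: treat a non-zero mean as decisive.
        return math.copysign(math.inf, mean_diff) if mean_diff else 0.0
    return mean_diff / (sd / math.sqrt(len(diffs)))


def agreement_rate(site1, site2, n, trials=1000, seed=0):
    """Fraction of random sub-samples of size n (6 to 14) in which
    Site 1 scores significantly higher than Site 2."""
    rng = random.Random(seed)
    pairs = list(zip(site1, site2))
    hits = 0
    for _ in range(trials):
        s1, s2 = zip(*rng.sample(pairs, n))
        if paired_t(s1, s2) > T_CRIT_05[n - 1]:
            hits += 1
    return hits / trials
```

Calling `agreement_rate` for each n from 6 to 14 on one questionnaire's per-participant percentage scores would give a curve of the kind the abstract summarizes, where the rate rises with sample size.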

Finally, this paper has only addressed the question of whether a given questionnaire was able to reliably distinguish between the ratings of one site vs. the other. In many usability tests, you have only one design that you are evaluating, not two or more that you are comparing. When evaluating only one design, possibly the most important information is related to the diagnostic value of the data you get from the questionnaire. In other words, how well does it help guide improvements to the design? That has not been analyzed in this study. Interestingly, on the surface at least, it appears that the Microsoft Words might provide the most diagnostic information, due to the potentially large number of descriptors involved.

Keeping all of those caveats in mind, it is interesting to note that one of the simplest questionnaires studied, SUS (with only 10 rating scales), yielded among the most reliable results across sample sizes. It is also interesting that SUS is the only questionnaire of those studied whose questions all address different aspects of the user’s reaction to the website as a whole (e.g., “I found the website unnecessarily complex”, “I felt very confident using the website”) as opposed to asking the user to assess specific features of the website (e.g., visual appearance, organization of information, etc.). These results also indicate that, for the conditions of this study, sample sizes of at least 12-14 participants are needed to get reasonably reliable results.

REFERENCES
1. Benedek, J., & Miner, T. (2002). Measuring desirability: New methods for evaluating desirability in a usability lab setting. Proceedings of UPA 2002 Conference, Orlando, FL, July 8-12, 2002.
2. Brooke, J. (1996). SUS: A Quick and Dirty Usability Scale. In: P. Jordan, B. Thomas, B. Weerdmeester & I. McClelland (Eds.), Usability Evaluation in Industry. London: Taylor & Francis. (Also see cee.hw.ac/~ph/sus)
3. Chin, J. P., Diehl, V. A., & Norman, K. (1988). Development of an instrument measuring user satisfaction of the human-computer interface. Proceedings of ACM CHI '88 (Washington, DC), pp. 213-218. (Also see acm/~perlman/question?form=QUIS and lap.umd/QUIS/index.html)
4. Lewis, J. (1995). IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. International Journal of Human-Computer Interaction, 7(1), 57-78. (Also see acm/~perlman/question?form=CSUQ)
5. Perlman, G. (Undated). Web-Based User Interface Evaluation with Questionnaires. Retrieved from acm/~perlman/question on Nov. 7, 2003.
6. WAMMI: wammi
7. RelevantView: relevantview/
8. NetRaker: netraker/
9. Vividence: vividence/

Appendix A: Screenshots of the Five Questionnaires Used
[Screenshots: SUS; CSUQ; Words (based on Microsoft’s Product Reaction Cards)]
