1 The Delta Centre, Norwegian Directorate of Health, 0130 Oslo, Norway, firstname.lastname@example.org
2 Department of Computer and Information Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway, email@example.com
The Web Content Accessibility Guidelines (WCAG) by the Web Accessibility Initiative (WAI) have become the de-facto standard for accessibility on the Web. WCAG 1.0 has become significant both as a practical tool and as an academic set of principles, and is presently the basis of Web accessibility evaluations and guidelines in many countries. WCAG 2.0 was released in 2008. The purpose of the reported study has been to validate empirically the usefulness of using the WAI accessibility guidelines as a heuristic for website accessibility. Through controlled usability tests of two websites with disabled users (N=7) and a control group (N=6), we found that only 27% of the identified website accessibility problems could have been identified through the use of WCAG 1.0. A similar analysis of conformance to WCAG 2.0 showed a marginal improvement of 5 percentage points concerning identified website accessibility problems. We conclude from this that the application of the WAI accessibility guidelines is not sufficient to guarantee website accessibility. We recommend that future versions of accessibility guidelines should be based on empirical data and validated empirically.
As a part of the World Wide Web Consortium (W3C), the Web Accessibility Initiative (WAI) produced the first version of the Web Content Accessibility Guidelines (WCAG) in 1999 (Chisholm et al. 1999).
It has since been widely recognized that special care should be taken to include web users with disabilities.
WCAG 1.0 has been widely used both as a design guideline and as a heuristic in website evaluations. A number of national evaluations of public websites have included criteria on accessibility from WCAG 1.0. WCAG 1.0 is used as a basis for policy-making and website testing when it comes to online accessibility in several countries, such as Korea (Hyun et al. 2005), Norway (Norge.no 2008), the Netherlands (Web Guidelines 2007) and Denmark (IT 2009). WCAG 1.0 is also the basis for the Unified Web Evaluation Methodology (UWEM) (Velleman et al. 2007). Since WCAG 2.0 (Caldwell 2008) was released in December 2008, it is expected that these organisations will migrate to the new W3C recommendation.
For a widely used guideline like WCAG it is reasonable to ask for empirical evidence that conformance to the guidelines will guarantee accessibility for disabled users. There have been surprisingly few attempts at validating WCAG empirically, and what has been done does not give conclusive evidence that following WCAG will result in accessible websites for all. WAI has never published any indications of what it would see as an acceptable level of match between actual problems encountered by disabled users and problems that can be identified with WCAG.
The purpose of the present study has been to validate empirically the usefulness of using the WAI accessibility guidelines as a heuristic for website accessibility. It is our belief that any discussion about improvements to WCAG must be based on empirical evidence on its usefulness for practical web design, development and evaluation.
WCAG 1.0 was developed during the late 1990s and finalised as a W3C recommendation in May 1999. It consists of 14 high-level guidelines and 65 specific checkpoints. Each checkpoint has a priority level between 1 and 3 based on the checkpoint's impact on accessibility.
WCAG 1.0 defines the three priority levels as:
Recognizing that WCAG 1.0 would become outdated, the W3C formed a working group in 2000 to develop WCAG 2.0 as the second version of the W3C Web Content Accessibility Guidelines.
Since the year 2000, the Web has changed dramatically. It is no longer an HTML-only world. It has evolved into an exciting, compelling medium for providing innovative services. One of the major goals of WCAG 2.0 was to describe the requirements for Web content accessibility in technology neutral language so that it could be applicable to any W3C or non-W3C technology, such as CSS, SMIL, SVG, XML, PDF, or Flash in addition to HTML and XHTML. A second major goal of WCAG 2.0 was to ensure that the requirements are all objectively testable so that policy makers can adopt them unchanged (Reid 2008).
WCAG 2.0 became an official W3C recommendation in December 2008. Compared to WCAG 1.0, the guidelines are no longer technology specific and the requirements are organized around four general principles of accessibility, 12 guidelines and 61 success criteria. WCAG 1.0 Priority levels 1, 2 and 3 correspond to conformance levels A (lowest), AA, and AAA (highest) in WCAG 2.0.
The four general principles of accessibility lay the foundation necessary for anyone to access and use Web content. Anyone who wants to use the Web must have content that is perceivable, operable, understandable and robust.
If any of these are not true, users with disabilities will not be able to use the Web.
Under each of the principles are guidelines and success criteria that help to address these principles for people with disabilities. There are many general usability guidelines that are designed to make content more usable by all people, including those with disabilities. However, WCAG 2.0 only includes those guidelines that address problems particular to people with disabilities. This includes issues that block access or interfere with access to the Web more severely for people with disabilities.
Additionally, in order for a web page to conform to WCAG 2.0, five specific conformance requirements must be satisfied:
In addition to the principles, guidelines and success criteria, there is a set of Sufficient and Advisory Techniques, which documents a wide variety of techniques for each of the guidelines and success criteria in the WCAG 2.0 document itself.
Because of the technology-independent nature of WCAG 2.0, a number of WCAG 1.0 checkpoints have been deemed obsolete. Most of the dropped checkpoints relate to either outdated technology (ASCII art), specific technology (mainly HTML) or clauses that have been met (W3C 2008). Instead, WCAG 2.0 sometimes refers to sufficient and advisory techniques.
The international standard ISO 9241-171, Ergonomics of human-system interaction - Part 171: Guidance on software accessibility (ISO 2008), provides guidance on the design of the software of interactive systems so that those systems achieve as high a level of accessibility as possible. Designing human-system interaction to increase accessibility promotes increased effectiveness, efficiency and satisfaction for people having a wide variety of capabilities and preferences. Accessibility is therefore strongly related to the concept of usability.
This part of ISO 9241 provides guidance for incorporating accessibility goals and features in the design as early as possible and addresses the increasing need to consider social and legislative demands for ensuring accessibility by the removal of barriers that prevent people from participating in life activities. This part of ISO 9241 is applicable to software that forms part of interactive systems used in the home, in leisure activities, in public situations and at work. For additional guidance on the accessibility of Web content, the standard refers to WCAG 2.0.
This part of ISO 9241 is based on the current understanding of the characteristics of individuals who have particular physical, sensory and/or cognitive impairments. However, accessibility is an issue that affects many groups of people. The intended users of interactive systems are consumers or professionals - people at home, at school, engineers, clerks, salespersons, Web designers etc. The individuals in such target groups vary significantly with regard to physical, sensory and cognitive abilities, and each target group will include people with different abilities. Thus, people with disabilities do not form a specific group that can be separated out and then discarded.
Accessibility for interactive systems is defined as the usability of a product, service, environment or facility by people with the widest range of capabilities. Usability is the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use. Specified users include people with very wide ranging abilities, and, very probably, some "people with disabilities". Specified users and people with disabilities are not separate groups.
The standard consists of 21 guidelines and 143 requirements, where 62 of the requirements must be met in order to claim conformance with this part of ISO 9241. The requirements deal with general issues, inputs, outputs and online documentation.
The study of 1000 websites conducted for the Disability Rights Commission (DRC 2004) found that 45 % of the problems encountered by disabled users could not be attributed to explicit violations of WCAG 1.0. A similar study of a sample of international museum websites found that the museum website with the highest conformance to WCAG 1.0 was the one that disabled users found most difficult to use (Petrie et al. 2005).
The study by Lopes and Carrico (Lopes et al. 2008) allowed them to verify that, despite being able to control several aspects of accessibility quality, template mechanisms such as those of Wikipedia cannot guarantee a high quality of user experience to the audiences covered by WCAG 1.0.
Harrison and Petrie (Harrison et al. 2006) focused on severity ratings of actual problems encountered in usability tests versus priority levels in WCAG. They did user testing of six websites with two visually impaired people, two people with dyslexia and two controls. The participants were asked to rate the severity of the problems they experienced on the websites. In addition, the researchers made independent ratings of the severity of the problems. When comparing these results to WCAG 1.0, the researchers found no significant relationship between WCAG 1.0 priority levels and either the expert ratings or the user ratings. Harrison and Petrie concluded that developers should obtain severity ratings from users or an expert rather than relying on those provided by the WAI guidelines.
Petrie and Kheir (Petrie et al. 2007) performed a study with 6 disabled (blind) and 6 non-disabled (sighted) people, where they gathered empirical data through usability testing of two commercial websites. In this study, the researchers also obtained severity ratings from the participants and the researchers. Problems encountered by the two user groups comprised two intersecting sets, with approximately 15 % overlap. There was high agreement between participants as to the severity of the problems, and agreement between participants and researchers. However, there was no significant agreement between either participants or researchers and the priority levels given by WCAG 1.0. This study thus confirmed the findings in (Harrison et al. 2006).
WCAG is an accessibility guideline, and an empirical validation must consequently exclude the usability problems from the problem set. The inclusion of a control group makes it possible to differentiate between usability problems and accessibility problems. To our knowledge, Petrie and Kheir's study is the first published empirical validation of WCAG 1.0 based on a comparison of the performance of disabled users and a control group. They defined usability problems as those experienced by both disabled users and the control group, while accessibility problems were those experienced only by the disabled users.
This fits well with the ISO definition of accessibility. ISO defines accessibility as "usability for users with disabilities". This broadens the definition of accessibility and makes it more understandable, without redefining the scope.
In (Rømen et al. 2008) we reported on a similar study where we attempted to validate WCAG 1.0 by identifying accessibility problems through a comparison of the performance of disabled users and a control group, followed by a WCAG 1.0 conformance analysis.
Petrie and Kheir's study covered only one kind of disability (blindness), while a complete validation of WCAG should include a number of other disabilities. In (Rømen et al. 2008) we included motor impairment and dyslexia in addition to visual impairment.
None of the earlier studies had attempted to give empirically-based concrete guidance on how WCAG could be improved. In addition to a quantitative analysis of WCAG conformance, we included a qualitative analysis of the most frequently encountered accessibility problems not covered by WCAG for each of the three disability groups.
Our main research questions in (Rømen et al. 2008) were related to WCAG 1.0:
In addition to answering the above main research questions, we gave descriptive data concerning the correlation between problem severity and WCAG 1.0 priority.
In the present study we have extended the analysis to also cover WCAG 2.0. This adds the following research question to the two above:
Seven disabled participants undertook the study together with six controls. Of the disabled participants, three were visually impaired (two blind, one severely weak-sighted), two were motor impaired with reduced dexterity and two were dyslexic. The visually impaired participants were all experienced users of a screen reader, either JAWS® (2 users) or Window-Eyes® (1 user), and the two blind participants used a Braille display. Both of the motor impaired participants used a standard mouse and keyboard, while neither of the dyslexic participants used any assistive technology.
The group of disabled participants comprised four men and three women, and the able-bodied group two men and four women. All participants were used to working with computers to perform tasks such as online banking, web browsing and word processing on a weekly basis.
The user groups were matched on computer literacy, but due to practical problems, the disabled users were on average older than the control group. However, we do not see this as a grave validity threat, as the role of the control group was not to compare performance to the disabled group, but to help differentiate between accessibility problems and usability problems.
The websites tested were those of two neighbouring municipalities in central Norway. Both municipalities were well known to the participants, and more importantly, both websites offered the same array of online services. The similarity in content, purpose and user-groups facilitated the design of comparative tasks.
For each of the websites, the participants were asked to locate the mayor's e-mail address, find a price list for kindergartens in the municipality, locate an application form and download a document from a council meeting. The tasks and order of the tasks were identical for both websites and all thirteen participants (disabled and controls).
The websites were evaluated through individual usability tests, where the participants were asked to "think aloud" as they went through the tasks. After the completion of all tasks, a short interview was conducted to uncover further problems experienced by the participant that had not been expressed through the "think aloud" procedure.
The use of a mobile usability lab allowed us to perform the tests with the disabled users at their workplace or in their home. The disabled participants used their own computer and assistive technology. The tests with the control group were done in a usability laboratory with standard PC equipment. All tests were video and audio recorded after asking for the participant's consent.
In the following we will define a usability or accessibility problem as a situation in which a user is hampered in performing a given task by a deficiency in the website being tested. In our statistical analysis of the problems encountered we count problems in the websites and not problem categories. E.g. if the users experience problems reading link texts in two different parts of a site, this is counted as two website problems, even though the two problems belong to the same problem category.
Our rationale for focusing on website problems and not problem categories is that we want to measure to what extent WCAG can be used to improve the user experience for disabled users, and users experience website problems, not problem categories.
In a similar fashion we distinguish between problem instances and website problems. A problem instance is a situation in which a specific user experiences a website problem. A website problem can consequently give rise to a number of problem instances as more than one user can run into the same website problem.
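The distinction between problem instances and website problems can be made concrete with a small sketch. The observation records below are hypothetical (participant labels and problem identifiers are invented for illustration and are not taken from the study data); only the counting rule follows the definitions above.

```python
# Hypothetical observation log: one record per (participant, website problem).
# All identifiers are invented for illustration only.
observations = [
    ("P1", "site-A/link-text-menu"),
    ("P2", "site-A/link-text-menu"),     # same website problem, second user
    ("P1", "site-A/link-text-footer"),   # same category, other place: separate website problem
    ("P3", "site-B/form-missing-label"),
]

# A problem instance is one user running into one website problem.
problem_instances = len(set(observations))

# A website problem is counted once, however many users encounter it.
website_problems = len({problem for _, problem in observations})

print(problem_instances, website_problems)
```

In this toy log, the two link-text problems share a category but count as two website problems, and the menu problem gives rise to two problem instances.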
The tests showed that the disabled participants on average experienced a significantly larger number of problems than the controls. The disabled participants experienced on average 17.1 problems (total=120, N=7), while the controls experienced on average 9.3 problems (total=56, N=6).
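As a quick check of the averages, the sketch below recomputes them from the reported totals and group sizes. The function name is an illustrative choice, not part of the study's tooling.

```python
def mean_problems(total_instances: int, participants: int) -> float:
    """Average number of problem instances per participant."""
    return total_instances / participants

disabled_mean = mean_problems(120, 7)   # reported totals for the disabled group
control_mean = mean_problems(56, 6)     # reported totals for the control group
ratio = disabled_mean / control_mean    # how many times more problems per person

print(f"{disabled_mean:.1f} {control_mean:.1f} {ratio:.2f}")
```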
The fact that the disabled users on average experienced close to twice as many problems as the controls tells us that despite the efforts of WAI and others, much is still to be done concerning web accessibility.
When comparing data from all tests, we found that the total of 176 problem instances experienced by the 13 users were caused by a total of 80 website problems.
Of these website problems, 18 were encountered by the control group only, 15 were encountered both by the control group and the disabled users, while 47 were encountered by the disabled users only (Figure 1). Following Petrie and Kheir's definition, the users identified 47 accessibility problems and 33 (15+18) usability problems in the websites.
Figure 1. Website problems
The distribution of website problems in our data for the three groups disabled only, disabled and control, and control only matches surprisingly well that found by Petrie and Kheir in their tests. Our distribution is (59%, 19%, 22%) while their distribution was (62%, 14%, 25%). We interpret this positively concerning the validity of our study.
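The classification into accessibility and usability problems amounts to simple set operations. The sketch below uses synthetic identifiers sized to match the reported counts (47, 15 and 18); the identifiers themselves are placeholders, not the study's problem descriptions.

```python
# Synthetic identifiers sized to match the reported counts; placeholders only.
disabled_only_ids = {f"D{i}" for i in range(47)}
shared_ids = {f"S{i}" for i in range(15)}
control_only_ids = {f"C{i}" for i in range(18)}

seen_by_disabled = disabled_only_ids | shared_ids
seen_by_controls = control_only_ids | shared_ids

# Accessibility problems: encountered by disabled users only.
# Usability problems: everything the control group encountered (15 + 18).
accessibility = seen_by_disabled - seen_by_controls
usability = seen_by_controls
all_problems = seen_by_disabled | seen_by_controls

groups = [
    ("disabled only", accessibility),
    ("disabled and control", seen_by_disabled & seen_by_controls),
    ("control only", seen_by_controls - seen_by_disabled),
]
for label, group in groups:
    share = 100 * len(group) / len(all_problems)
    print(f"{label}: {len(group)} ({share:.2f}%)")
```

The unrounded shares are 58.75%, 18.75% and 22.50%, in line with the rounded distribution quoted above.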
The severity of the 80 website problems was classified according to Molich’s criteria (Molich 2000). In short, a critical problem is one that inhibits a user from performing a task; a serious problem slows down the user significantly, but the user is able to find a way around the problem; while a cosmetic problem just makes it a bit harder for the user to perform the task.
Of the 47 website accessibility problems identified, 6 were critical, 18 were serious and 23 were cosmetic. Of the 33 website usability problems, 3 were critical, 13 were serious and 17 were cosmetic.
For each of the 80 website problems identified, we searched WCAG 1.0 for guidelines that could have identified these problems in a heuristic evaluation with WCAG 1.0.
Figure 2 shows the WCAG 1.0 conformance for all severity levels in each of the three categories: disabled only, disabled and controls, and controls only.
Figure 2. WCAG 1.0 conformance for all website problems.
Of the 47 website accessibility problems, only 13 were found to be due to violations of WCAG 1.0. This corresponds to a 27% match, i.e. more than two-thirds of the website accessibility problems identified by the disabled users would not have been identified by application of the WCAG 1.0 guidelines alone.
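The match rate is a plain proportion. The short sketch below recomputes it from the counts given above; the function name is an illustrative choice. Note that the unrounded value is about 27.7%, which the text reports as 27%.

```python
def match_rate(covered: int, total: int) -> float:
    """Share of problems that a guideline-based evaluation could have flagged."""
    return covered / total

overall = match_rate(13, 47)   # accessibility problems traced to WCAG 1.0 violations
missed = 1 - overall           # share that WCAG 1.0 alone would not have caught

print(f"{overall:.1%} matched, {missed:.1%} missed")
```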
Of the six critical website accessibility problems, only one was found to be a violation of WCAG 1.0, i.e. five out of six critical problems would not have been identified with WCAG 1.0.
Concerning the 33 usability problems, seven could have been identified by application of WCAG 1.0. We interpret the latter not as a problem with WCAG 1.0, but only as a reminder that design for all also improves usability for able-bodied users.
Table 1 shows the distribution of priority levels for the website accessibility problems. We see that of the 47 accessibility problems, only one matched a priority 1 WCAG 1.0 guideline, and this problem was serious rather than critical. We also see that of the six accessibility problems that actually were critical (inhibiting the user), the one identified by WCAG 1.0 was priority 2.
| WCAG 1.0 priority | Critical | Serious | Cosmetic |
| Not WCAG 1.0 | 5 | 12 | 17 |
This indicates that there is little or no match between WCAG 1.0 priority and problem severity. The results are in accordance with the findings of Harrison and Petrie (Harrison et al. 2006). Unfortunately, the numbers are too small to do a statistical test of independence.
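One way to see why the numbers are too small is to look at expected cell counts. The sketch below is an illustration under stated assumptions, not the authors' analysis: it collapses the priority levels into "covered by WCAG 1.0" versus "not covered", derives the covered counts from the severity totals and the "Not WCAG 1.0" row, and applies the common rule of thumb (not stated in the paper) that a chi-square test of independence wants expected counts of at least 5.

```python
# Severity totals for the 47 accessibility problems: critical, serious, cosmetic.
severity_totals = [6, 18, 23]
# "Not WCAG 1.0" row of Table 1.
not_covered = [5, 12, 17]
# Problems matched by some WCAG 1.0 checkpoint, derived by subtraction.
covered = [t - n for t, n in zip(severity_totals, not_covered)]

table = [covered, not_covered]
row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]
grand_total = sum(row_sums)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_sums] for r in row_sums]
small_cells = sum(1 for row in expected for e in row if e < 5)

print(covered, small_cells)
```

Even in this collapsed 2x3 table, half of the cells have an expected count below 5; splitting the covered row into three priority levels would make the cells smaller still.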
As with WCAG 1.0, we searched WCAG 2.0 for guidelines that could have identified these problems in a heuristic evaluation with WCAG 2.0 (Figure 3).
Figure 3. WCAG 2.0 conformance for all website problems.
Of the 47 website accessibility problems, only 15 were found to be due to violations of WCAG 2.0. This corresponds to a 32% match, an improvement of only roughly 5 percentage points compared to WCAG 1.0. Still, more than two-thirds of the website accessibility problems identified by the disabled users would not have been identified by application of the WCAG 2.0 guidelines alone.
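The size of the improvement depends on how it is expressed. The sketch below, using only the counts reported above (13 and 15 out of 47), separates the absolute gain in percentage points from the gain relative to WCAG 1.0. The reported 27% and 32% differ by 5 points; the unrounded counts give a slightly smaller difference.

```python
TOTAL = 47                       # website accessibility problems
wcag1_hits, wcag2_hits = 13, 15  # problems traceable to each version

rate1 = wcag1_hits / TOTAL
rate2 = wcag2_hits / TOTAL

# Absolute gain, in percentage points of all accessibility problems.
point_gain = (rate2 - rate1) * 100
# Relative gain, as a percentage of what WCAG 1.0 already covered.
relative_gain = (wcag2_hits - wcag1_hits) / wcag1_hits * 100

print(f"{point_gain:.1f} points, {relative_gain:.1f}% relative")
```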
Of the six critical website accessibility problems, two were found to be violations of WCAG 2.0, i.e. four out of six critical problems would not have been identified with WCAG 2.0. Seven of the 33 usability problems could have been identified.
Table 2 shows the distribution of priority levels for the website accessibility problems. We see that of the 47 accessibility problems, eleven matched a level A WCAG 2.0 guideline, but none was a critical problem (serious and cosmetic). We also see that of the six accessibility problems that were critical (inhibiting the user), the two identified by WCAG 2.0 were level AAA.
Where a problem could be solved by several success criteria, the success criterion with the lowest level (A) was counted. The lower level is easier to conform to, and thus represents a problem that is easier to solve.
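The counting rule for problems matched by several success criteria can be sketched as follows. The level ordering and function name are illustrative assumptions; the two criteria used as an example are WCAG 2.0 success criteria 2.4.4 (level A) and 2.4.9 (level AAA).

```python
# Conformance levels ordered from lowest (easiest to conform to) to highest.
LEVEL_ORDER = {"A": 1, "AA": 2, "AAA": 3}

def counted_criterion(matches):
    """Return the (criterion, level) pair with the lowest conformance level."""
    return min(matches, key=lambda match: LEVEL_ORDER[match[1]])

# A problem that two success criteria could have identified.
matches = [
    ("2.4.9 Link Purpose (Link Only)", "AAA"),
    ("2.4.4 Link Purpose (In Context)", "A"),
]

print(counted_criterion(matches))
```

Under this rule, the problem above would be tallied once, in the level A row.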
| WCAG 2.0 priority | Critical | Serious | Cosmetic |
| Not WCAG 2.0 | 4 | 11 | 17 |
This indicates that there is still little or no match between WCAG priority and problem severity.
Examples of additional problems solved by WCAG 2.0:
Combining WCAG 1.0 and WCAG 2.0, we searched for guidelines that could have identified these problems in a heuristic evaluation with both sets of WCAG (Figure 4).
Figure 4. WCAG 1.0 + 2.0 conformance for all website problems.
Of the 47 website accessibility problems, 18 were found to be due to violations of WCAG 1.0, WCAG 2.0 or both. This corresponds to a 38% match. Combining both versions of WCAG thus gives an improvement of roughly 11 percentage points compared to WCAG 1.0 alone and 6 percentage points compared to WCAG 2.0 alone.
Of the six critical website accessibility problems, three were found to be violations of at least one of the two sets of guidelines, i.e. three out of six critical problems would not have been identified. Of the 18 problems identified, ten were covered by both versions, three by WCAG 1.0 alone and five by WCAG 2.0 alone.
Concerning the 33 usability problems, 12 could have been identified by application of WCAG 1.0, WCAG 2.0 or both. As with the accessibility problems, the two versions overlapped: two usability problems were covered by both, five by WCAG 1.0 alone and five by WCAG 2.0 alone.
This means that there would be something to gain by using both WCAG 1.0 and 2.0 in combination to solve a higher number of problems than by adhering to only one set of guidelines.
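The combined figures follow from adding the problems covered by one version only to those covered by both. The sketch below assumes the reading that the per-version counts in the overlap description are problems covered by that version alone, which is the reading consistent with the per-version totals reported earlier (13 and 15 accessibility problems, seven usability problems each). The function name is illustrative.

```python
def combined(only_v1: int, only_v2: int, both: int) -> dict:
    """Per-version and combined coverage from the three disjoint parts."""
    return {
        "wcag1": only_v1 + both,            # everything WCAG 1.0 covers
        "wcag2": only_v2 + both,            # everything WCAG 2.0 covers
        "either": only_v1 + only_v2 + both, # covered by at least one version
    }

accessibility = combined(only_v1=3, only_v2=5, both=10)
usability = combined(only_v1=5, only_v2=5, both=2)

print(accessibility, usability)
```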
Table 3 shows the distribution of priority levels for the website accessibility problems.
| WCAG 1.0 + 2.0 priority | Critical | Serious | Cosmetic |
Examples of additional problems solved by WCAG 1.0:
WCAG 2.0 has replaced checkpoint 13.1 with two success criteria, namely 2.4.4 "Link Purpose (In Context): The purpose of each link can be determined from the link text alone or from the link text together with its programmatically determined link context" (A), and 2.4.9 "Link Purpose (Link Only): A mechanism is available to allow the purpose of each link to be identified from link text alone" (AAA). However, both these criteria come with the clause "Except where the purpose of the link would be ambiguous to users in general". WCAG 1.0 made no such reference to ambiguity, stating merely that the link target should be clearly identified. With this clause, there is no guarantee that links are identifiable, thus reducing usability for all users, given an extreme interpretation of the guidance provided in WCAG 2.0.
The number of accessibility problems that could have been identified through an analysis of WCAG (WCAG 1.0: 27%, WCAG 2.0: 32%, WCAG 1.0 + 2.0: 38%) answers our first research question.
For each of the three disabilities we have identified the kind of problems that were most frequently experienced by this user group. The following answers the second research question.
Figure 5. 17 phrases starting with the text "Bystyre" (City council).
Figure 6. Clickable surface; affordance (right) vs. the real world (left).
Figure 7. Website perception to users with dyslexia.
In addition to the above accessibility problems, the disabled users also experienced usability problems such as too many levels of navigation and lack of instructions for advanced functions or forms.
The current study has been an attempt to validate empirically the usefulness of using WCAG as a heuristic for website accessibility. We are aware that the low number of participants and websites poses a threat to the validity of our findings. In addition, important disability groups have not been included (e.g. screen magnifier users, people with learning difficulties, and people with hearing impairments), and the websites tested are to a large extent traditional HTML pages that fit WCAG 1.0 better than WCAG 2.0.
Through controlled usability tests of two websites with disabled users and a control group we found that only 27% of the identified website accessibility problems could have been identified through the use of WCAG 1.0. A similar analysis of conformance to WCAG 2.0 showed a marginal improvement of 5 percentage points concerning identified website accessibility problems. In our analysis of the website accessibility problems, we found no correlation between WCAG priority and problem severity.
WAI has never said anything about what it considers an acceptable match with reality, but we assume the numbers presented here are well below that level. Despite the listed threats to validity in the tests, the extreme difference between what we have found and what one should expect of an accessibility guideline gives us no reason to doubt that WCAG has a large potential for improvement.
A lot of good has been added to WCAG in version 2.0, but some things were unfortunately discarded when the new recommendation was made. The documentation is now technology independent and has become somewhat vague. The techniques add to the size of WCAG and make it more difficult for someone who is new to WCAG to find their way around.
Combining WCAG 1.0 and WCAG 2.0 gave a 38% match. To further increase accessibility, one could use a combination of WCAG 1.0, 2.0 and the ISO standard for accessibility. However, this only adds further to the complexity of the use of guidelines.
We conclude from our findings that the application of WCAG alone is not sufficient to guarantee website accessibility. However, the application of WCAG is a good start and could be regarded as minimum requirements for making accessible websites. To further improve website accessibility and usability, website developers should use a user-centered design approach and perform usability tests of their website with specified users. These specified users should include people with very wide ranging abilities.
We do not interpret our findings as a criticism of accessibility guidelines as such, but we recommend that future versions of such guidelines to a larger extent should be based on empirical data and validated empirically.
We would like to thank the test participants and the organizations representing the disabled users. Also thanks to Terje Røsand at NTNU/NSEP for highly valuable technical assistance.