Readability User Testing: Strengths, Weaknesses and Strategic Approaches for Patient Information Leaflets
11 February 2025
Mark Gibson, Health Communication Specialist, UK
Readability user testing for Patient Information Leaflets (PILs) provides real-world feedback, identifies misunderstandings, and assesses how usable and accessible a document is. It also reveals barriers for diverse audiences.
However, readability user testing is not a perfect method. It has its challenges, including resource intensity, sample bias, subjectivity, and contextual limitations. Understanding the balance between these strengths and weaknesses is important. Combining user testing with other evaluation methods provides a more comprehensive understanding of the usability and communicative value of a patient-facing document such as a PIL.
The Value of Readability User Testing
Direct feedback from users reveals real-world usability and comprehension issues. It can highlight shortcomings and failures in the document that might not have been anticipated during the design phase. By observing how users interact with the information, developers can identify areas where messages are unclear or difficult to understand. This can lead to meaningful changes.
User feedback also helps identify specific sections or terms that are commonly misunderstood. When patterns of confusion emerge, it becomes possible to make targeted revisions to improve clarity and reduce the risk of misinterpretation. This proactive approach ensures that the information is not only accurate but also easily understood by the intended audience.
‘Proactive’ user testing involves digging deeper, as some seemingly simple instructions reveal ambiguities when applied to real-life situations. ‘Take one tablet three times a day’ is an example of this, as explored in other articles. Participants can explain this in their own words, but what does it really mean for them? To uncover this, the user tester needs to work harder than merely accepting responses that demonstrate only surface-level comprehension.
In addition to readability, user feedback assesses engagement. It helps determine whether users are motivated to read and retain the information. High engagement indicates clear and compelling content, essential for effective communication. Low engagement may signal the need for adjustments in structure or style. What is the typical reaction to an identikit, Bible-paper leaflet that falls out of your medicine pack and cannot be refolded or put back into the box? Does this presentation, content and inconvenient format draw you in or put you off?
Testing the effectiveness of communication is another key benefit of gathering real-world feedback. It evaluates whether the intended messages are successfully conveyed, ensuring that critical safety and usage instructions are clear and actionable. This is especially important in contexts where misunderstandings could lead to significant risks or non-compliance. Again, this requires the user tester to work harder in their interviews, beyond the official guidance, to gain this kind of insight. A frequently overlooked aspect of readability testing of package leaflets is that it offers another avenue for exploring the Patient Voice.
Real-world feedback contributes to inclusive design by revealing whether the leaflet is accessible to diverse audiences, including those with low health literacy, non-native speakers, and older adults. By considering the needs of these groups, the content can be made more inclusive and user-friendly, enhancing its reach and impact. How much diversity is really included in a typical readability test? Is this something that assessors have their eye on?
Where the Method Falls Short
User testing is resource-intensive, requiring significant time and cost, careful planning, participant recruitment, and thorough feedback analysis.
The value of readability testing has been deflated, with sponsors increasingly expecting quicker timelines and lower costs. Ever since the EMA introduced it as a mandatory activity in 2005, it has become a checkbox exercise rather than an opportunity for research and engagement with users.
Sample bias occurs when the user group lacks diversity, leading to conclusions that may not reflect the experiences of all users. Similarly, using the same participants repeatedly can bias responses. For example, if a company has conducted user tests in one city for over 20 years using the same database of participants, some individuals may by now have taken part in 40 tests. These participants are likely to anticipate the interview process in advance. Unlike cognitive debriefing, where it is appropriate to use the same patients to test different Clinical Outcome Assessments (COAs) relating to the same illness, readability user testing involves the same kinds of questions about the same leaflet sections over and over again. ‘Repeat offenders’ will therefore know exactly what to expect from the test.
User feedback is subjective, leading to variability due to individual experiences, preferences, and cognitive abilities. It is possible to test the same leaflet on two sets of 20 participants and have radically different results.
The scope of readability user testing is often limited to assessing readability and understanding, potentially overlooking other crucial aspects. For example, emotional responses, cultural appropriateness, and the level of trust users place in the information are not typically evaluated. These elements, however, can greatly impact how users engage with and act upon the information provided.
There is also the risk of the Hawthorne Effect, where users change their behaviour or how they respond just because they are aware they are being observed. This awareness can influence their engagement level, attention to detail, or even their feedback, leading to skewed results that do not accurately reflect typical user behaviour.
Lastly, contextual limitations arise because testing is generally conducted in controlled environments, which may not accurately reflect real-world usage. Users might engage with leaflets differently when stressed or unwell, conditions that controlled settings cannot fully replicate. This means the results are based on hypothetical situations and may not fully capture the practical challenges users face in authentic contexts. I will never forget one participant living with haemophilia who, upon unfolding a long Roman scroll of a leaflet, remarked: ‘Imagine having a bleed and having to find the information you need in amongst all of this writing.’
A return to Eden?
Readability user testing is a powerful tool for evaluating the effectiveness of patient information leaflets and for providing evidence of comprehension and usability. However, it must be carefully designed to account for biases, resource constraints, and contextual limitations. Combining user testing with other methods, such as readability formulas, cognitive debriefing for aspects of leaflets not easily examined through readability testing, expert reviews, and real-world monitoring, can provide a more comprehensive evaluation.
Current practices rush towards getting data that is just-so: feedback that meets the needs of a bureaucratic, checkbox activity. This can lead to data that provides only superficial insights into the usefulness and accessibility of patient-facing materials such as a Patient Information Leaflet. This assembly-line approach has also pervaded the user testing of Instructions for Use (IFUs) for medical devices and lay summaries of clinical studies. This is a shame, because these are real opportunities to engage the Patient Voice. Companies commissioning readability tests, those performing them and those assessing them need to find a way to return to Eden: to treat readability testing as a research activity with users’ wellbeing at its heart.
Thank you for reading!
Mark Gibson, Leeds, United Kingdom, March 2025