Director of Health Outcomes, Paul O'Donohoe, joins Mapi Group's Dr. Catherine Acquadro as co-presenter of her webinar to discuss the often confusing issue of what usability testing actually is, as well as review scientific evidence of ePRO usability in a range of different situations.
(Courtesy: Mapi Group, used with permission)
Good morning and good afternoon, everyone, and welcome to Mapi’s educational webinar series. Today we will be presenting the topic of Usability Testing of Electronic Patient Reported Outcomes.
Our first presenter will be Catherine Acquadro and I’ll pass the reins on to her now. Thank you everyone.
Thank you Deirdre, and good afternoon and good morning to all of you. So I’m Catherine Acquadro, Scientific Advisor at Mapi and Mapi Research Trust. I have been working in the field of patient-centred research for more than 24 years. And now I will let Paul introduce himself.
Hello everyone. My name is Paul O’Donohoe and I’m Director of Health Outcomes for CRF Health, based in our London office, and I’m responsible for providing and coordinating our scientific support, both our internal education, but then also supporting our clients with various scientific consulting, as well as driving the thought leadership of the company.
Hello everyone. I’m Alexandre Itani. I’m a Project Manager at Mapi Language Services. I’m also a usability testing referee. And now I will let Catherine begin this webinar.
Thank you Alexandre. So here is an overview of what will be presented today. I will take care of the definition and methodology. Paul will present a practical example. Alexandre will provide some tips and tricks for the preparation of usability testing. And I will finish with a few words of conclusion. Then, I hope that we will have some room for questions.
So what is usability testing? It is part of a multi-step approach to the faithful migration of an instrument from paper to electronic format. It is usually defined as the process exploring whether respondents from the target population are able to use the software and the device appropriately. This process includes a formal documentation of respondents’ ability to navigate the electronic platform, follow instructions, and answer questions. The overall goal is to demonstrate that respondents can complete the computerized assessment as intended. This definition was provided by Coons et al. in their paper entitled “Recommendations on Evidence Needed to Support Measurement Equivalence Between Electronic and Paper-Based PRO Measures,” which was published in 2009 in Value in Health as an ISPOR ePRO Good Research Practices Task Force report. Do we have any guidance on how to perform usability testing, or UT? The FDA PRO Guidance does not provide methodological guidance. The PRO Consortium has issued several best practices documents, which give some clues. The ISPOR ePRO Task Force has published several articles, which are of great interest, and of course all entities involved in UT have their own guidelines.
How do we define a faithful migration? This is the implementation of the PRO measures on alternative modes of data collection that do not bias responses. And the electronic modalities are numerous. You have handheld devices, interactive voice response, and web. Migration must ensure that there is no change in content and that patients interpret and respond the same way regardless of the mode.
An eCOA measure that has been adapted from a paper-based measure has to produce data that are equivalent or superior to the data produced from the original paper version. Measurement equivalence is a function of the comparability of the psychometric properties of the data obtained via the original and adapted administration modes. This comparability is driven by the amount of modification to the content and format of the original paper PRO measure required during the adaptation process. And the amount of modification that occurs during migration to the electronic platform will dictate the amount of evidence necessary to demonstrate that the change did not introduce response bias or negatively affect the measure’s psychometric properties.
The magnitude of a particular modification (that is to say, is it minor, moderate, or substantial) is defined with reference to potential effects on the content, meaning, or interpretation of the measures, items, and scales. Based on the magnitude of the modification, evidence for measurement equivalence can be generated through a combination of the following: usability testing plus cognitive debriefing, equivalence testing plus usability testing, full psychometric testing and usability testing. This table here is adapted from Coons et al. and provides an idea of the role of UT in all levels of modification needed. You can see here that UT is required at all levels of modification.
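As an editorial aside, the table described above can be sketched as a small lookup. This is a minimal illustration only: the pairing of each level with its evidence follows the order in which the speaker lists the combinations, and the names `Modification`, `EVIDENCE`, and `required_evidence` are hypothetical, not taken from Coons et al. or from any Mapi tool.

```python
from enum import Enum


class Modification(Enum):
    """Magnitude of the change made during paper-to-electronic migration."""
    MINOR = "minor"
    MODERATE = "moderate"
    SUBSTANTIAL = "substantial"


# Assumed pairing, following the order given in the talk; the names are illustrative.
EVIDENCE = {
    Modification.MINOR: ("cognitive debriefing", "usability testing"),
    Modification.MODERATE: ("equivalence testing", "usability testing"),
    Modification.SUBSTANTIAL: ("full psychometric testing", "usability testing"),
}


def required_evidence(level: Modification) -> tuple[str, ...]:
    """Return the evidence-generating activities for a given modification level."""
    return EVIDENCE[level]


if __name__ == "__main__":
    for level in Modification:
        print(level.value, "->", " + ".join(required_evidence(level)))
```

Whatever the level, usability testing appears in every row, which is the point the speaker makes about the table.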
Now I would like to clarify this, because we have realized that there is some confusion about what UT is and what its goals are. This is why I have chosen to make a list of what UT is not.
Usability testing is not feasibility testing, which assesses whether the data collection mode will work in the context of a specific study design. Usability testing is not screenshot validation, which is a pre-UT step to make sure that all elements of the measure are included. Usability testing is not user acceptance testing, which is the last phase of the software testing process. UAT is also known as beta testing, and UT is more than a simple UAT. Usability testing is not cognitive debriefing, which is a qualitative research tool to determine whether concepts and items are understood by the patients in the same way, regardless of the data collection mode. It is often performed at the same time or in parallel with UT. These two are separate processes with fundamentally different roles. However, it’s sometimes extremely hard to really separate them in real life, because you can get information from UT that refers to the content of the questionnaire, and vice versa.
Here is a figure presenting another view of the whole process. I will not talk about the training steps, which represent everything between our work here at Mapi and the ePRO provider. We have represented in grey the steps which are carried out just before UT. That is to say, screenshot validation. And Alexandre will explain why this step, which is not part of UT, is key to a successful UT.
To perform usability testing, we need subjects who should be part of the intended target population of the study. Subjects are most of the time patients, and of course, if the measure is intended for a particular population, we should recruit children or adolescents depending on the age group that is targeted. In the case of rare diseases, the recruitment might be difficult, so we have to find alternative solutions, recruiting patients with similar symptoms or impacts in their lives. And this is something that we often do in linguistic validation. We always use the device which will be used in the trials, and of course we never use screenshots, since this would prevent us from testing all the navigation parts, which is one of the key elements tested during the process.
Here are all the documents that we use here at Mapi during UT. We use a patient socio-demographic form for each respondent. After signing the informed consent form, the patient will be asked to fill out a short form which contains questions related to gender, age, education, employment status. We also use a UT interview guide specifically designed to conduct the interviews, which includes some illustrative questions. I will show you an example of the questions that we use. We also develop UT session reports, one per participant, as well as a summary of all results, and of course at the end we issue a final report.
The UT interviews are conducted face to face by trained interviewers. They are audio recorded with the patient’s permission for tracking purposes. The interviewer explains to the subject the goal of the testing, and the tasks to be performed. Then, each patient uses the device. After that, the interviewer debriefs with the patient about their experience with regard to the electronic administration of the questionnaire, and asks questions about their understanding of the instructions. Here are some examples of the questions that we ask. How easy or difficult did you find it to complete the questionnaire and to select answers to questions using the response buttons? Did you need to change any of your answers? If so, did you find it difficult or easy? Were the instructions on how to answer the questions clear and precise enough?
Here is a review of the criteria applied to decide whether or not we are facing issues during the UT. As I said before, we explore the comprehension of the instructions, the ease of use of the device, and the navigation. Issues are rated on three levels, and as you can see, issues rated level 1 are the worst. In this case, patients were unable to use the device and/or could not understand the instructions.
A UT is categorized as successful, and an instrument on the device is considered ready to be used, if the subjects being tested proceeded quickly and without oral guidance through the questionnaire, the subjects had no difficulty in understanding the instructions, the text display was accurate, and no major improvements were recommended by the subjects. If the issues raised are considered severe, then the ePRO provider may need to perform additional development and changes. And then they will need to retest and redo a UT.
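As an editorial aside, the four success criteria just listed can be written down as a simple check. This is a rough sketch under stated assumptions: the `SessionObservation` fields and both function names are hypothetical, and requiring every session to meet every criterion is an assumption made for illustration, since the talk does not say how results are aggregated across subjects.

```python
from dataclasses import dataclass


@dataclass
class SessionObservation:
    """One subject's UT session, recorded against the four criteria (hypothetical fields)."""
    proceeded_without_oral_guidance: bool
    understood_instructions: bool
    text_display_accurate: bool
    major_improvements_recommended: bool


def session_passes(obs: SessionObservation) -> bool:
    """True when a single session meets all four success criteria."""
    return (
        obs.proceeded_without_oral_guidance
        and obs.understood_instructions
        and obs.text_display_accurate
        and not obs.major_improvements_recommended
    )


def ut_successful(sessions: list[SessionObservation]) -> bool:
    """Assumed aggregation rule: at least one session, and every session passes."""
    return bool(sessions) and all(session_passes(s) for s in sessions)


if __name__ == "__main__":
    smooth = SessionObservation(True, True, True, False)
    needed_help = SessionObservation(False, True, True, False)
    print(ut_successful([smooth, smooth]))       # all criteria met in every session
    print(ut_successful([smooth, needed_help]))  # one subject needed oral guidance
```

In practice the judgment is made by trained interviewers and reported per session, so a check like this would only summarize their observations, not replace them.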
At the end of the process, we write a report, which provides all the information about all the individual UT sessions, all observations made, and of course our recommendations. Either no change is needed, so it’s okay and you can move on; or changes are needed, so you have to wait; or we now have to perform a retest.
So we can say that UT, when done, is quite a straightforward process. However, there are some pending questions. For instance, do we need UT when we migrate from one device to another one? Sometimes we miss what I call the big picture. If we only do one or two questionnaires, then we have no information about the whole trial. We have no information about the training modules. We also will wonder about the organization that may be needed between all questionnaires to be used for the future clinical studies. And sometimes it would be nice to have more information, you know, so this is a call to the sponsors. Also, what happens if the patients do not understand some questions, and you cannot change the questionnaire? This is not specific to the electronic format. This may impair the use of the instrument in the targeted trial. And this is something that may happen during the UT, when it’s something that should happen during the cognitive debriefing. So that’s something that should be handled, and we need a lot of communication between the sponsor, the developer, and the ePRO provider.
Then I have a provocative last question: does all of this make sense? And of course the answer is yes, but there is still a grey area concerning the translations. And I wanted to share with you the results of a study that was done by Sonya Eremenco and her team at Evidera in their study entitled “Testing ePRO device usability during the translation process: a case study of the EXACT in 7 countries.” They concluded that, “given the consistent ease-of-use findings in these diverse, device naive subjects across 7 countries and the emphasis on subject training in clinical trials, it was determined that usability testing with future translations was unnecessary.” Of course this is the conclusion based on performing UT across seven countries. And we don’t know what it would have concluded if some big issues had been found in one or more countries. But for me the case of UT in translation is still debated, and I do not have a definite answer. And I would be very interested if some of you in the audience could share your experience, if any, on the subject.
So now this will be the end for me. I will give the floor to Paul, who will present a very interesting practical example of usability testing. Thank you very much.
Lovely. Thank you kindly Catherine. So I think Catherine did a really good job of setting the scene from a kind of theoretical point of view, outlining exactly what we’re talking about when we are talking about usability testing and exactly what we’re trying to do. But we also wanted to take the opportunity to share with you some of the things that we can hear back from participants or patients when you actually run these usability testing studies on electronic data capture systems, on ePRO systems.
Specifically a slightly unusual study that we were involved in with Mapi was a study looking at the BYOD methodology. Now, anyone who has had any exposure to ePRO or eCOA in the last couple of years will undoubtedly have heard of this term BYOD, or bring your own device. But just to ensure we’re all on the same page, traditional ePRO or eCOA studies involve providing all participants within a clinical trial the same device, so a provisioned device model where you hand out a device to every single participant within that clinical trial. BYOD methodology aims to take advantage of the devices a large number of participants have on them at all times. So we’re taking advantage of participants’ own smartphones, for example, to actually complete the study questionnaires. And there are various outstanding issues that need to be addressed from a methodological and logistical point of view around BYOD. But we were interested in exploring some questions that we had about the feasibility of BYOD, really largely around equivalence but then also the usability piece. And as Catherine touched on, cognitive interviewing, cognitive debriefing, is traditionally rolled in with usability testing, so that’s something we did within this particular study as well to really dig into this issue of equivalence. That’s not something I’m going to talk about during the next few slides. I’m going to focus purely on the usability side of things for this particular presentation.
So it was a very small scale study, 20 participants, good mix of genders but also a nice spread of ages. Mean age of 40, but a range of 21 to 69. This was from the general population, we weren’t interested in targeting any specific therapeutic area, we were just interested in general feedback on the usability of this app-based version of a vaccine symptom diary. And also, importantly, Mapi targeted a wide range of self-reported comfort with technology in the participants, all the way from hugely comfortable to not very comfortable with technology. This was a UK-based sample, and it can be quite tricky getting a large number of people who are very uncomfortable with technology or have no experience with technology. That’s just a sign of the times we live in, I think. But we did manage to get a number of people in there who rated themselves as being not particularly comfortable with technology. Unsurprisingly, that was correlated with the older age end of the spectrum. But for 20 participants, we got a nice representative sample. And what we really wanted participants to do, and what we were focused on observing, was seeing participants interacting with this app-based vaccine symptom diary across a range of smartphone devices. So we were interested in how different sized devices might impact this issue of equivalence that I mentioned but also this issue of usability across the different sized devices. So we had three different devices that we defined as small, medium, and large. An iPhone 5s as the small device. A Samsung Galaxy Note as the large device. And then kind of a generic Android phone in there as the middle sized device. But importantly we also asked participants where possible to install the app on their own devices because we really wanted to see how that experience was for the participants, and obviously the key point of BYOD is that participants will in fact be able to bring and use their own device in the study.
And so we wanted to see how that experience was for the participants getting the app onto their own device.
It’s also worth highlighting that we took a very worst case scenario approach to this particular study. We offered no training and no immediate guidance to the participant. We simply provided them a link to where they could access the app to download onto their phone and then basically told them to have at it. Obviously if they got stuck we started providing more and more verbal guidance, but the initial thing we were really trying to look at was this very pure, intuitive usability of the system without any additional guidance being offered.
So to take a look at things from these very nice requirements for a successful usability test that Catherine provided us, obviously we’re really digging into: can the participants use the hardware and use the software to provide you answers to the questions you’re asking them. And so the first topic there was: the subject being tested proceeds quickly and without oral guidance through the questionnaire. Now I mentioned the fact that we were particularly interested in how participants got on downloading and installing the app on their own device, and this for me was one of the most interesting findings of the study, because almost half of the participants in fact struggled to get the app on their own phone in the first place. Again, wanting to highlight that this was kind of a worst case scenario, all we did was provide a link to the App Store for the participants, but we ran into some really interesting cases, like participants forgetting their password for the App Store, so not being able to access it in the first place. Issues around incompatible software. We did set out inclusion/exclusion criteria around having suitable versions of Android and iOS before the participant was allowed to take part in the study, but even then very often it’s very hard to get accurate feedback from participants on exactly what version of software they might have on their device, because they simply don’t know. Network issues we ran into, which wasn’t really the fault of any of the participants. Insufficient memory, you need a certain amount of memory to be able to install apps on your own device, and if you’ve used up a lot of that memory with other apps or with photos or videos or music, that can create challenges for getting an app onto your device. And then most interesting from my point of view was that some participants were able to download the app but then couldn’t actually find it on their device once they’d downloaded it.
So I think this was a real key finding and, at least to me, highlighted the bias I maybe have, given the technologically savvy circles that I move in: I just didn’t even consider that it was going to be an issue for participants to get the app on their own phone at the first go. But this really brought home the fact that you need to offer support and training to participants. And I’ll circle back on this issue in a few slides’ time.
We also observed a bit of a learning curve across the different devices. All participants interacted with all three devices, plus their own device, assuming they were able to get the app onto it. Some participants had no issues at all, they just flew through it. That tended to be the younger participants, though, that rated themselves as very comfortable with technology. Some were noticeably slower to begin with on the first device but then picked up speed over time. Again, this tended to be the more elderly patients and tends to echo what we see with provisioned device eCOA studies, where sometimes the more elderly patients can be a bit slow to get going, but once you provide training, once you ensure they’re very comfortable with the hardware and with the software, then they have absolutely no issues using, interacting, and providing answers on the electronic systems, and the same with this app-based study. Some of them found it a bit hard. Very nice quote there: “Blimey that was hard work, very stressful,” after completing the questionnaire on the first device. But then they had absolutely no issues on the second and third time they moved through the study. But still a useful and an interesting finding.
So the second important requirement that Catherine highlighted was: the subject had no difficulty in understanding instructions. And generally, across what was quite a complicated diary, the participants had no issues, which was great to hear. Where we did run into some noticeable and interesting feedback was in regards to instructions around identifying a rash. As I said, this was a vaccine diary, there were some rashes associated with the administration of the vaccine. And the diary in fact included a visual guide to how to identify and measure a rash that actually showed kind of a very simplified picture of what a rash might look like and how you would go about measuring that rash. So to access that picture, participants had to press a specific button, you can see this on the screenshot on the screen now, underneath that heading “Rash,” the instructions to identify rashes with a little arrow. The participants had to touch that bar to get access to the picture of the rash. And this did lead to some confusion. Of the 16 participants who would have interacted with the screen (participants were asked to follow different scenarios, so they saw different screens), 7 did not in fact intuitively touch on the instructions, so were left a bit confused when they were asked to report on their rash, until that button was pointed out to them. There were also some subtleties around reporting rashes at the injection site, so at the point you would have an injection, versus on any other part of the body. They were asked to report at least two different sites at two different time points. And this particular screen is asking about any part of the body, you can see at the top of the screen there. Some participants, three in particular, didn’t actually pick up on those differences and thought they were answering the same question again.
So important feedback around making these instructions easier to get to, or at least more noticeable, as well as really highlighting what exactly you’re asking the participant, exactly what you’re asking them to report on.
The third requirement was around: text display is accurate, with regard to size and typography. And thankfully we had no issues across all devices, even the smaller iPhone, in this regard. “The writing size is okay in all three,” “the size of the writing is good,” so all positive feedback in regards to the actual font size and layout of the questionnaires.
Interestingly, and again specifically relating to BYOD and app-based studies, ten of the participants did suggest they would in fact prefer to complete the diary on a larger device, and only one of those participants actually owned a larger device. So these are participants saying they would potentially enjoy completing the app on a different device to the one they had. And some nice feedback around a larger screen: “Slightly easier to read, but I prefer my own device.” “None of the devices are difficult but I still prefer the larger screen.” “It would be fine to use on the little one but could be slightly easier with a large screen.” All really positive feedback, all of them saying that, across the range of devices, it would be fine completing the diary, but potentially they would have a preference leaning towards a larger device. But overall, no major issues in regards to text display.
And then finally: no major improvements are recommended by the subject. Thankfully again, we didn’t have any major feedback or major issues that were raised by the participants. There was some slight confusion around the “Back” and “Next” navigation, and about making that a bit clearer for the participants: they had to scroll down to the bottom of the screen, if it was a particularly large or long screen, before they were able to access the “Next” button. And some participants were suggesting that that could be made a bit clearer. And probably the biggest pain point and source of feedback was around using number scroll bars for reporting temperature and dates, and that’s an example again, on that screenshot you can see there, where participants have to touch the plus and minus buttons to increase and decrease the various numbers. I believe this is for reporting a temperature. And some participants, seven of them, had issues interacting with these particular user elements, they found it a bit “fiddly” and sometimes it was “difficult to enter correct values.” And one participant said, “Even when you’re not finished putting in your number it will pop up and tell you that you are technically dead.” What they meant by that was there’s an out of range warning that pops up if they try and put, for example, an unrealistically high temperature. So very interesting and important feedback there specifically relating to these user elements, and it highlighted for us that we need to revisit this and take a look at how we were asking participants to report dates and report temperatures.
So overall it was very positive. Beyond this very interesting issue of participants installing the app on their phone in the first place, I was very happy with the feedback that we got from participants on just the usability of the app. Once they were into the app and interacting with the app, the feedback was very, very positive. That issue around getting the app on their device, I think is very important for all of us to bear in mind as we start moving more towards this BYOD model. It’s certainly a big buzzword in the industry at the moment, there’s a lot of interest in it. And I think it just really highlights the fact that we can’t just push technical support on participants or on site staff, we can’t just assume that because a participant owns a particular smartphone, they are completely comfortable using that smartphone beyond what they use it for day to day, which might just be simply for making phone calls, controversially enough, using a phone just to make phone calls. But they might have no experience installing apps in the first place. And so we really have to ensure, if we’re looking to use a BYOD methodology in a clinical trial, that we have suitable generic training in place that can provide guidance for how to get an app onto, for example, Android phones in general or Apple phones in general. There’s a limited number of permutations and combinations for getting an app onto a device. We can develop training and guidance that will take people through that in a step-by-step way that will overcome some of these challenges. And the feedback really provided guidance for us, actionable feedback that could help us improve the usability. Thankfully none of it was too severe. But I think the key takeaway message from me, beyond this issue of ensuring the patients and participants are suitably supported with training, is that they far prefer the app to the paper-based version of the diary.
The paper diary was quite large, there was a large number of pages in it, and consistently the feedback we got from participants was that they much prefer the app, it was more user friendly, it was much simpler to use, and they would find it easier to use the app on a day-by-day basis as compared to the paper version.
So really positive stuff, really interesting feedback, as found by Mapi, and a slightly more unusual take on the usability testing that we typically do, with more permutations and combinations than we might possibly see in a typical ePRO study, but it really provided some interesting and actionable feedback.
So now I’m going to hand over to Alexandre who is going to take us through some tips and tricks for running a usability test.
Right, thank you Paul. Well as I come from the operational side, I wanted to share with you a couple of tricks of major importance actually, which save a lot of time in the initial phases of a UT project.
Catherine has previously explained the process of the usability testing. However at Mapi this process involves a validation step and a certification step. Before getting into detail I wanted to remind you of a couple of elements for a smooth, stress-free, and efficient project.
First of all, I would like to highlight the fact that all types of testing should be anticipated as much as possible. Conducting UTs last minute is definitely not recommended. Proceeding to migration of target language versions or linguistic validation work on eCOA before completion and validation of the UT shouldn’t even be considered. Planning ahead will provide sufficient time to find solutions in case of any issues and reduce impacts, especially when the UT is part of a large scale project. I think that to consider this as a waste of time is rather reckless because when you look at the big picture and assess risk, you will quickly realize that it’s worth the wait.
Now, are you confident enough in your project to rush everything and risk compromising the project if something unexpected comes up? How can you schedule the project in order to minimize impact? Would you rather be on the safe side and finish early if all goes well, or are you willing to rush everything, at the risk of realizing that the project is not meeting your expectations and in the end compromising your whole project? Well, these are the kinds of questions that you should be asking yourself beforehand. Always keep in mind that this is not a routine control and that modifications may be needed after the interviews if the outcome is negative. This of course implies additional time and budget. This should be thought out early enough.
Now, quickly jumping to the end of the project. The final report we deliver will provide you with the outcome of the testing of course, but also with details on the whole process, subject feedback, and recommendations. These reports should be taken very seriously because most of the time a small change can make a big difference in a user’s experience.
Now as you all know, communication is also key in project management, and obviously this includes UT projects. One should always remember that UTs are a complete collaboration. Communicate with your service provider and collaborator about all the necessary context regarding the study, the other questionnaires or materials involved, and any other specific requirements. Actually, all the details that usually remain in the shadows until the last week of the project, when it becomes too difficult to react. The more information you share, the more help, assistance, and flexibility you will get. Because we need to see the big picture here and, of note, the results of usability testing are more reliable when subjects get to test the whole application.
Finally, from a technical point of view, all questionnaires, diaries, and training modules should ideally be designed, checked, tested, and migrated by the same team. By team I mean the same e-provider and language service provider. This is to ensure the best harmonization possible in terms of content, but also layout, and in the end a great product. All in all, for improved efficiency, think anticipation and communication.
Now a couple of minutes ago Catherine explained the basics of a UT project. However, don’t forget about what we call the validation and certification of screenshots. This is the preliminary phase of the usability testing project at Mapi. Now, let’s take a closer look.
Everything starts with the creation, design, and generation of the eCOA screenshots. Usually, we are not involved in this process. However, we do have a couple of recommendations which are detailed in a checklist we created with all our hearts. Now, jokes aside, we have used our experience and expertise to create this checklist, and it has proven very helpful. It is, I think, the most efficient way of avoiding any oversights. You can feel free to request it in the early steps of the project, as we will gladly share it. We’re very proud of it, actually.
I also use this checklist myself when it comes to verifying the screenshots I receive. And honestly, thorough checking is the key to validation and certification. This will contribute to stacking all the odds in our favor for the UT. Generally, the feedback I provide when checking these screenshots is linked to a section of this checklist. My main advice for this creation step would be to never forget the fact that the electronic version of a questionnaire must be in line with the paper version of the same questionnaire. Check the materials before you send them over, and you will quickly realize that you are saving time. During our verifications, and in addition to this checklist, we also provide suggestions for improvement based on our experience, expertise, and also on prior respondent feedback, especially when it comes to questionnaires for which we have exclusivity.
This brings me to the mandatory validation step. Once we come to an agreement on the final version of the screenshots, we will certify them as the version that will be tested during the usability testing. When the author’s approval is needed, it is wise to get in touch with them beforehand. They will be able to provide advice on the questionnaire completion and scoring, explain their requirements, and optimize the electronic version. We have recently had a case of a project in which the author and e-provider did not come to terms, and this resulted in changes following the usability testing, leading to extra non-budgeted work and stress. In the case of a disagreement like this, I believe that any argument can be heard and understood as long as it is justified. And don’t forget: this is a collaboration.
Once certified, the final version of the eCOA can be implemented onto the device, ideally within 24 hours in order to optimize timelines. Depending on the devices, they can be shipped while screenshots are still being reviewed, provided changes can be implemented remotely. This is by far, I think, the best option, as it allows us to quickly test it and to provide further feedback if need be in due time.
And here we are, ready to move on to the actual usability testing.
Now overall there are many things all parties can do to make sure the UT goes smoothly. The most important ones would be the following. Anticipate and communicate for improved efficiency. Let us see the big picture for product harmony. Check the material thoroughly. Consider the author’s interests seriously. And finally, just tell us about you. How do you operate? What system are you using? How flexible is your software? What limitations do you have? What information do you think is relevant to us, so that we can provide you with the best service that we can?
Well I will let you meditate on that, and in the meantime, I’m letting Catherine conclude this webinar.
Okay, thank you Alexandre. So some words of conclusion. So as you have seen, UT is one of the key steps of the migration of PRO measures from paper to electronic format. It needs, as Alexandre has shown, anticipation and careful planning. And something that we did not talk about much during the session, I believe that subject training in clinical trials is of paramount importance to make sure that the patients will use the device in the right way. So now I think this is the end. And we can have some room for questions.