Criteria for assessing the quality of mHealth apps: a systematic review


Abstract

Objective

To review existing studies that include a tool or method for assessing the quality of mHealth apps; to extract their assessment criteria; and to provide a classification of the collected criteria.

Methods

In accordance with the PRISMA statement, a literature search was conducted in MEDLINE, EMBase, ISI Web of Science, and Scopus for English-language citations published from January 1, 2008 to December 22, 2016 for studies that include tools or methods for the quality assessment of mHealth apps. Two researchers screened the titles and abstracts of all retrieved citations against the inclusion and exclusion criteria. The full text of relevant papers was then individually examined by the same researchers. A senior researcher resolved any disagreements and confirmed the relevance of all included papers. The authors, date of publication, subject fields of the target mHealth apps, development method, and assessment criteria were extracted from each paper. The extracted assessment criteria were then reviewed, compared, and classified by an expert panel of two medical informatics specialists and two health information management specialists.

Results

Twenty-three papers were included in the review. Thirty-eight main classes of assessment criteria were identified. These were reorganized by the expert panel into 7 main classes (Design, Information/Content, Usability, Functionality, Ethical Issues, Security and Privacy, and User-perceived value) with 37 sub-classes of criteria.

Conclusions

There is wide heterogeneity in the assessment criteria for mHealth apps. It is necessary to define the exact meaning and degree of distinctness of each criterion. This will help improve existing tools and may lead to a more comprehensive mHealth app assessment tool.

Keywords: mobile applications, evaluation studies, mobile health

Introduction

Mobile health, or more commonly, mHealth, has been defined as “the use of wireless communication devices to support public health and clinical practice.” 1 Mobile health applications have been defined as “software that is incorporated into smartphones to improve health outcomes, health research, and health care services.” 2 For the general public, mHealth apps could empower patients to take a more active role in managing their own health. 3 , 4 They could more effectively engage patients, 5 , 6 positively influence their behavior, and potentially impact health outcomes. 7 Therefore, mHealth apps could help empower high-need and high-cost patients to self-manage their health. 8

Management and control of diabetes, mental health, cardiovascular diseases, obesity, smoking cessation, cancer, pregnancy, birth, and child care are some examples of the targets of mHealth apps for patients and the general public. 9–12 Healthcare professionals also use mHealth apps in performing important tasks including patient management, access to medical references and research, diagnosing medical conditions, access to health records, medical education and consulting, information gathering and processing, patient monitoring, and clinical decision-making. 13–19 With this diversity of use cases and the growth of the needs that could be addressed by mHealth apps, concerns arise about potential misinformation and the role of the health professional in recommending and using apps.

There is continuing worldwide growth in the number of these apps. Recent reports showed that there are more than 325 000 mHealth apps available on the primary app stores, and more than 84 000 mHealth app publishers have entered the market. 20 Despite the huge number of mHealth apps, a quarter of these apps are never used after installation. Many are of low quality, are not evidence-based, 21 and are developed without careful consideration of the characteristics of their intended user populations. 22 Users often pay little attention to the potential hazards and risks of mHealth apps. 23 This has led to an interest in better oversight and regulation of the information in these apps. Changes in mobile devices and software have been accompanied by a broadening discussion of quality and safety that has involved clinicians, policy groups, and, more recently, regulators. Compared to 2011, there is greater understanding of potential risks and more resources targeted towards medical app developers, aiming to improve the quality of medical apps. 24

The decision to recommend an app to a patient can have serious consequences if its content is inaccurate or if the app is ineffective or even harmful. Healthcare providers and healthcare organizations are in a quandary: increasingly, patients are using existing health apps, but providers and organizations have quality and validity concerns, and do not know which ones to recommend. 25 Although there are various tools to assess the quality of health-related websites, there are few methods and little information describing how to assess and evaluate mHealth apps. 26

The issues concerning mHealth apps are wide-ranging, including location and selection of appropriate apps, privacy and security issues, the lack of evaluation standards for the apps, limited quality control, and the pressure to move into the mainstream of healthcare. 27

Important reported limitations of mHealth apps include lack of evidence of clinical effectiveness, lack of integration with the health care delivery system, lack of formal evaluation and review, and potential threats to safety and privacy. 28

Creating a comprehensive set of criteria that covers every aspect of mHealth app quality seems to be a complex task. In this study, we aimed to review existing papers that include an assessment tool or method to assess the quality of mHealth apps, to extract and summarize their criteria where possible, and to provide a classification of the collected criteria.

Various stakeholders including app developers, citizens/patients, policy makers, health business owners, assessment bodies/regulators, clinicians/health professionals, authorities/public administration, funders/health insurance, and academic departments may find a consolidated view of the existing criteria helpful.

Methods

This study was conducted according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement. 29

Search Strategy

A comprehensive literature search was conducted in MEDLINE (Ovid), EMBase, ISI Web of Science, and Scopus for English-language citations published from January 1, 2008 (the first mobile phone app store opened in mid-2008) to December 22, 2016. A researcher with a library and information science degree (RN) developed and carried out a Boolean search strategy using keywords related to “mobile health applications” (eg, mHealth apps OR mobile health applications OR mobile medical applications OR medical smartphone applications) and keywords related to quality assessment or scoring of mobile health applications (eg, evaluation OR assessment OR measurement OR scaling OR scoring OR criteria). We used MeSH terms in MEDLINE and Emtree terms in EMBase, as well as truncation, wildcard, and proximity operators, to strengthen the search (Supplementary Appendix 1).
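
To make the two-block structure of such a strategy concrete, the short Python sketch below composes the app-related and quality-related keyword groups into a single Boolean query. This is a simplified, hypothetical illustration only; the exact terms, MeSH/Emtree headings, and proximity operators of the actual strategy are given in Supplementary Appendix 1.

# Illustrative only: a simplified two-block Boolean query in the spirit of
# the strategy described above. The real strategy (Supplementary Appendix 1)
# also uses MeSH/Emtree terms, wildcards, and proximity operators.
app_terms = ('"mhealth app*" OR "mobile health application*" '
             'OR "mobile medical application*" '
             'OR "medical smartphone application*"')
quality_terms = ('evaluat* OR assess* OR measur* '
                 'OR scal* OR scor* OR criteria')

query = f"({app_terms}) AND ({quality_terms})"
print(query)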

Inclusion Criteria

The inclusion criteria were: studies in English that provided a tool or method to evaluate mHealth apps and were published between January 1, 2008 and December 22, 2016.

Exclusion Criteria

Studies in a language other than English, studies on mobile apps not related to medicine or health, and studies that contained a tool without presenting a scientific method for its development were excluded. Papers that focused only on the design and development of mHealth apps were also excluded.

Study Selection

After duplicates were removed, the titles and abstracts were screened by two researchers (SRNK, RN) according to the inclusion/exclusion criteria. The full texts of potentially relevant papers were then individually assessed by the same two researchers. Disagreements were resolved in consultation with a senior researcher (MY, the lead author), who also examined and confirmed the relevance of all included papers.

Data Extraction and Classification

Data elements extracted from the selected articles included the authors, date of publication, subject fields of the target mHealth apps, method used to develop the assessment tool, and the assessment criteria. The extracted criteria were then reviewed by an expert panel of two medical informatics specialists and two health information management specialists. The panel compared similar criteria from different studies and created categories that grouped all criteria relevant to a specific concept of evaluation (eg, usability). We analyzed each criterion and tried to find or create classes and subclasses that could encompass all criteria; whenever a criterion did not match an existing class, a new class was added. The criteria found in each of the included studies, and the classes and subclasses, were listed incrementally as they were discovered, as illustrated in the sketch below.
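
As a rough illustration of this incremental grouping, the Python sketch below expresses the matching step in code. The panel performed this work manually; the synonym table and class names shown here are hypothetical examples, not the panel's actual mappings.

# Illustrative sketch of the panel's incremental grouping (the actual
# process was manual; the synonym table is a hypothetical example).
from typing import Dict, List

def classify_criteria(criteria: List[str],
                      synonyms: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign each raw criterion to a class, creating a new class
    whenever no existing class matches."""
    classes: Dict[str, List[str]] = {}
    for criterion in criteria:
        # Map a study-specific term to a shared concept where one is known;
        # otherwise the term itself opens a new class.
        concept = synonyms.get(criterion.lower(), criterion.lower())
        classes.setdefault(concept, []).append(criterion)
    return classes

synonyms = {"ease of use": "usability", "learnability": "usability"}
print(classify_criteria(["Ease of use", "Learnability", "Security"], synonyms))
# {'usability': ['Ease of use', 'Learnability'], 'security': ['Security']}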

Results

We retrieved 1057 records by searching the aforementioned databases. After removing duplicates, 851 articles remained. Based on the review of titles and abstracts, 72 articles met the initial selection criteria. After examination of the full texts, 23 articles met the inclusion criteria and were included in the final review; two of these were found by reviewing reference lists (Figure 1). Most of the 23 articles were published in 2016 (39.1%) or 2015 (34.8%).

Figure 1. Flowchart of the identification, screening, and inclusion of relevant studies in this review.

Some of the assessment criteria and tools described in the included studies 30–42 were developed based on a literature review. The reliability and validity of these tools were then evaluated in the respective studies. Others 43–52 were adapted versions of a selected array of existing tools for software or website evaluation. Many of the mobile app evaluation criteria reviewed in this investigation were originally developed as website assessment criteria; Silberg, 37 , 44 , 47 , 51 the HON code, 37 the Kim Model, 30 , 33 , 38 Brief DISCERN, 37 the HRWEF (Health-Related Website Evaluation Form), 36 and Abbott’s scale 37 are examples of these instruments and scales.

There is great variety in the mHealth app assessment criteria and their classifications in the articles reviewed. Jin and Kim 30 developed an evaluation tool for mHealth apps that contains 7 main criteria: accuracy, understandability, objectivity, consistency, suitability of design, accuracy of wording, and security. Stoyanov et al. 33 identified 5 main categories of criteria, comprising 4 objective quality scales (engagement, aesthetics, functionality, and information quality) and one subjective quality scale, with 23 sub-items. Anderson et al. 34 presented a protocol for evaluating self-management mobile apps for patients with chronic diseases, addressing 4 main groups of criteria: engagement, functionality, ease of use, and information management. Taki et al. 36 presented 9 main groups of criteria: currency, author, design, navigation, content, accessibility, security, interactivity and connectivity, and software issues. Loy et al. 41 developed a tool for quality assessment of apps that target medication-related problems; their criteria were classified into 4 main sections: reliability, usability, privacy, and appropriateness to serve the intended function. Reynoldson et al. 52 proposed yet another classification with 4 main classes of criteria: product description, development team, clinical content, and ease of use. Definitions were therefore not homogeneous across the different sets of criteria we reviewed, largely because documented definitions of the assessment criteria were unclear or absent. Within each individual study, however, the criteria were applied consistently in their own context.

Most of the tools found in the included articles 30–33 , 39 , 42–45 were not developed for any specific subject category of mHealth apps. Other tools were developed for specific subject categories, including self-management of asthma, 46 cardiovascular disease apps, 47 chronic diseases, 34 depression management and smoking cessation, 35 HIV prevention, 48 infant feeding, 36 medication-related problems, 41 mobile personal health records (PHRs), 40 self-management of pain, 52 panic disorder, 37 self-management of heart disease, 49 prevention and treatment of tuberculosis, 50 and weight loss. 51 Table 1 summarizes the main characteristics of each included paper.

Table 1. Details of included studies (for each study: authors, date of publication, subject field of the target mHealth apps, method used to develop the assessment tool, and the assessment criteria used).

There were 38 main classes of criteria in the 23 papers reviewed: accessibility, accuracy, advertising policy, aesthetics, appearance, attribution, authority, availability, complementarity, consistency, credibility, currency, design, disclosure, ease of use, engagement, ethical issues, financial disclosure, functionality, information/content, interactivity and connectivity, justifiability, learning, legal consistency, navigation, objectivity, performance, precision, privacy, reliability, safety, security, user-perceived value, transparency, understandability, usability, usefulness, and wording accuracy. The criteria, the articles in which they appeared, and the details from each article (criteria sub-classes, descriptions, and questions) are included in Supplementary Appendix 2. These criteria were reclassified by our expert panel. In this process, some classes of criteria were merged, and the sub-class criteria were rearranged under the main classes to provide a consolidated classification of evaluation criteria for mHealth apps. The consolidated classification contains 7 main classes of criteria: Design, Information/Content, Usability, Functionality, Ethical Issues, Security and Privacy, and User-perceived value. These 7 main classes contain a total of 37 sub-classes of criteria (Figure 2). More details are presented in Supplementary Appendix 3.

Figure 2. Outline of the developed classification of mHealth app evaluation criteria.

Each of the main classes in the consolidated classification was mentioned in several studies, either directly or indirectly. “Design” was found in 9 different articles, 30 , 31 , 33 , 34 , 36 , 38 , 48 , 49 , 52 “Information/Content” (under different terms) was observed in 15 studies, 30 , 33 , 34 , 36–38 , 41 , 42 , 44 , 45 , 47 , 49 , 51 , 52 “Usability” in 14 articles, 30 , 31 , 34 , 36 , 38–43 , 47–49 , 52 “Functionality” in 7 studies, 31 , 33 , 34 , 38 , 43 , 49 , 50 “Ethical Issues” in 1 study, 42 “Security and Privacy” in 8 articles, 30–32 , 36 , 41 , 42 , 49 , 52 and finally “User-perceived value” in 4 studies. 31 , 33 , 38 , 49

Discussion

We conducted a systematic review of studies that applied an evaluation method or assessment approach for mHealth apps. After reviewing these studies, we classified the extracted criteria into 7 main classes: design, information/content, usability, functionality, ethical issues, security and privacy, and user-perceived value. Each of these classes was divided into various sub-classes; in total, we identified 37 sub-classes of criteria.

The development and evaluation of tools for assessing mHealth apps is an area of active research interest. More than a third of the papers included in this review were published in 2016, our last year of coverage, and were therefore not included in previous systematic reviews on the topic. BinDhim et al. 26 conducted a systematic review that summarized the criteria used to assess the quality of mHealth apps and analyzed their strengths and limitations; their study was limited to mHealth apps designed for consumers. More recently, Grundy et al. 60 conducted a systematic review to identify emerging and common methods for evaluating the quality of mHealth apps, and also provided a framework for assessing their quality.

There are great differences in the way assessment criteria are defined and classified in the studies reviewed. For example, “usability” was treated in many different ways. Zapata et al. 39 divided “usability” into four sections: attractiveness, learnability, operability, and understandability. Brown et al. 43 used other sub-classes under “usability,” including error prevention, completeness, memorability, information needs, flexibility/customizability, learnability, performance speed, and competency. In the study of Yasini et al., 42 “usability” included ease of use, readability, information needs, operability, flexibility, user satisfaction, completeness, and user contentment with the look and feel perceived after using the app. Anderson et al. 34 viewed usability as a sub-class of ease of use, whereas Yasini et al. 42 and Loy et al. 41 placed ease of use under usability. Reynoldson et al. 52 classified ease of use and usability as 2 separate classes, each with its own sub-classes. Stoyanov et al. 33 placed usability as a part of functionality.

In the study of Cruz Zapata et al., 40 usability included style, behavior, and structure subclasses. In the study of Loy et al., 41 it included ease of use, user support, and the ability to adapt to different user needs.

Similarly, “design” was identified in some studies 30 , 36 , 52 as a separate main criterion, but in others it was placed under 4 different criteria: functionality, 33 , 34 , 38 usability, 48 consistency, 30 and engagement. 33 , 38 Design is a multi-dimensional criterion and may be considered from various viewpoints.

In some of the studies, criteria were used interchangeably or overlapped. For example, security, privacy, and safety overlapped in some sets of criteria, and there were different interpretations of these 3 concepts. 30 , 32 , 36 , 41 , 49

Mobile health apps have various functionalities. Two of the assessment tools reviewed (Yasini et al. 42 and Loy et al. 41 ) provided dynamic assessment criteria based on the use cases and features of specific mHealth apps. In these methodologies, the relevant criteria are selected for each app according to its use cases; for example, the criterion “accuracy of the calculations” is only applied to apps that provide at least one calculation. This can lead to more accurate and efficient assessment: the criteria to assess an app that geolocates the nearest pharmacy in real time would be completely different from the criteria to assess an app created to manage a chronic disease. Dynamic assessment of apps according to use cases is not in contradiction with providing a single, comprehensive set of criteria in a database. The first step would be to detect the use cases offered by the app, which could be done with a classification questionnaire. Once the use cases of an app are identified, the appropriate criteria could be selected to assess it. If the database is well designed, with relevant decision trees, the assessment criteria could even be selected automatically according to the answers given to the classifying questionnaire, as sketched below. Some criteria, of course, apply to all apps and do not need to be selected for specific functionalities.
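
A minimal Python sketch of such use-case-driven selection follows. The criteria database, feature names, and criteria shown are hypothetical illustrations, not the actual content of the reviewed tools.

# A minimal sketch of dynamic criteria selection; CRITERIA_DB, the feature
# names, and the criteria are hypothetical illustrations.
from typing import Dict, List, Set

CRITERIA_DB: Dict[str, List[str]] = {
    "core": ["privacy policy present", "authorship stated"],  # apply to every app
    "calculation": ["accuracy of the calculations"],
    "geolocation": ["accuracy of located facilities"],
    "chronic_disease_management": ["evidence-based content", "reminder reliability"],
}

def select_criteria(use_cases: Set[str]) -> List[str]:
    """Return the criteria applicable to an app, given the use cases
    detected by a classification questionnaire."""
    selected = list(CRITERIA_DB["core"])  # universal criteria, always included
    for use_case in sorted(use_cases):
        selected.extend(CRITERIA_DB.get(use_case, []))
    return selected

# Example: an app that performs dose calculations and locates pharmacies.
print(select_criteria({"calculation", "geolocation"}))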

We have classified the criteria extracted from the 23 studies reviewed to provide a consolidated and inclusive set, based on literature published through December 2016. There will never be a complete and perfect set of mHealth app assessment criteria, because the criteria must apply to apps that are continuously changing. We need decisive, accurate, and reliable criteria to assess the compliance of these apps with existing regulations and best practices; we should not add a jungle of criteria to the existing jungle of apps. Today, many public and private institutions (the French National Authority for Health, 61 the NHS in England, 62 the European Commission, 63 etc.) are working to publish guidelines concerning mobile health applications. We suggest that experts from all of these institutions could collaborate in a community that publishes common, exhaustive guidelines in this field; an open source project would help ensure adaptability and transparency. To this end, the framework of criteria and sub-criteria presented in this study can serve as a layout for further investigation. Furthermore, developing a new assessment tool for mHealth apps based on the classification presented here is one possible application of this research. A considerable number of papers were published on this topic after the cutoff for this review, including some reporting on the application of assessment criteria reviewed here (for example, uMARS 38 ). Reports of experience with existing criteria are likely to be a valuable source of input for any effort to achieve one or more sets of standard criteria; this could be the subject of further research designed around the criteria identified in this review.

Limitations: We excluded non-English articles, and we did not take into account existing guidelines and standards for mHealth app assessment that were not indexed in our search resources (EMBase, MEDLINE, Web of Science, and Scopus); for example, some European states have published related guidelines in this field. 61 Another limitation was the general lack of clear definitions for the assessment criteria, which could have led to misinterpretations by the expert panel during the construction of our consolidated set of criteria.

Conclusion

In this study, 7 main classes and 37 sub-classes of criteria were identified for health-related app assessment. There is wide heterogeneity in the assessment criteria for mHealth apps across studies, which may be due either to the various assessment approaches used by researchers or to differing definitions of each criterion. Although in some cases providing a single, scientific definition of a criterion, or defining its place in an appropriate hierarchy of criteria, may be very difficult, it seems necessary for experts in this field to reach a consensus on the related concepts. It is also necessary to provide precise and mutually exclusive definitions for assessment criteria. Our findings also indicate that, depending on their use cases, different kinds of mHealth apps may need different assessment criteria. Addressing these points may lead to improvement of existing tools and the development of better, more standard mHealth app assessment tools.

Funding

This work was supported by the Deputy of Education of Tehran University of Medical Sciences.

Contributors

All authors contributed to the study conception and design. The search procedure (screening of papers and full-text assessment) was carried out by RN and SRNK, with arbitration and confirmation by MY. Drafting of the manuscript was carried out by RN, SRNK, and MG. All authors contributed to the analytic strategy used to achieve the final classification of assessment criteria. MY critically revised the manuscript and provided insights on the review discussion. GM and MY approved the version to be published.

Competing interests

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.