About this book
Recent developments in machine learning and artificial intelligence
(AI) have produced numerous achievements across a range of industries,
including healthcare. Intelligent medical system innovations have
completely changed the way healthcare services are delivered. These
innovations cover everything from automating administrative activities
and cutting operating expenses to generating clinical diagnoses,
creating customised treatments and medications, and helping with
patient monitoring. The authors of this book highlight important uses
of AI in the broader field of health care, where it has achieved
notable success.
The chapters provide readers with a range of examples that demonstrate the breadth of application domains employing cutting-edge AI techniques, thereby corroborating the adaptability and efficacy of AI in the medical and health fields. We believe this book is well suited both to those who are unfamiliar with artificial intelligence in healthcare and to early-career scholars who want to deepen their understanding of the field. While by no means comprehensive, the list of applications presented is undoubtedly diverse.
FOREWORD
The Roundtable on Evidence-Based Medicine was established in 2024 with the goal of giving national experts in healthcare and medicine a reliable space to collaborate on their shared dedication to providing creative, efficient care that continuously adds value for both patients and society. It quickly became apparent that this ambition is captured by the definition of the "Learning Health System": "A system in which science, informatics, incentives, and culture are aligned for continuous improvement and innovation, with best practices seamlessly embedded in the delivery process and new knowledge captured as an integral by-product of the delivery experience." The Digital Health Learning Collaborative was founded to further this objective and to acknowledge the growing significance of advances in digital health data and analytics for accomplishing it. Over the course of the collaboration, the remarkable implications of rapid advances in artificial intelligence (AI) and machine learning for clinical and preventive medicine have become crucial issues for the group. The publication you are currently reading addresses the need for basic understanding, the state of the art, and the implications of the AI and machine learning revolution for doctors, nurses, and other clinicians, data scientists, healthcare administrators, public health officials, policy makers, regulators, purchasers of healthcare services, and patients. I believe that anybody looking for knowledge regarding vital definitions, concepts, applicability, pitfalls, rate-limiting steps, and future developments in this increasingly significant field will find this publication pertinent, intelligible, and helpful.
Dr. Vishwesh Prashant Joshi
M.D.S. (CONSERVATIVE DENTISTRY AND ENDODONTICS)
Healthcare applications of artificial intelligence
1.1. The need for artificial intelligence in healthcare
1.1.1. Key issues facing the healthcare systems in the EU
Before reviewing the most recent advances in medical AI in this chapter, it is important to outline the primary healthcare issues and unmet needs that the use of AI in medical care could help address in the future:
Ageing population and chronic illnesses. In 2017, on average, 37% of the ageing population of EU member states reported having two or more chronic illnesses. In the EU, on average, 47% of men and 56% of women over 80 reported having several chronic conditions (OECD/European Union, 2020).
Insufficient medical staff. Europe has shortages in both the number and calibre of medical professionals. In 2013, it was estimated that the EU lacked 1.6 million healthcare workers overall; to make up for this deficiency, annual growth of more than 2% would be required. Since this pace of growth has not materialised, it is projected that by 2030 the healthcare sector will face a shortage of 4.1 million workers (0.6 million doctors, 2.3 million nurses, and 1.3 million other healthcare workers) (WHO, 2016; Michel, 2020).
Inefficiency. The OECD (2017) has provided significant evidence of pervasive inefficiencies throughout EU healthcare systems. Even though each country's healthcare system varies in its ability to convert resources into results, there is a significant amount of resource waste associated with healthcare, which drives up costs (Medeiros, 2015).
Sustainability. In the EU, the challenge of sustaining health systems is growing quickly. The OECD publication 'Health at a glance: Europe 2020' states that the EU spends 8.3% of GDP on healthcare, with notable regional variations: 11% of GDP is spent in Germany and France, and less than 6% in Luxembourg and Romania. Health spending is anticipated to keep rising, mostly as a result of sociodemographic shifts, including the ageing of the population and the accompanying rise in chronic illnesses and long-term care requirements, as well as the influence of new technology. Apart from the above-mentioned obstacles, economic hardships have put a great deal of strain on EU healthcare systems recently (Quaglio, 2020).
Inequalities in healthcare. The people of the EU's member nations continue to experience disparities in healthcare. Among the fundamental tenets of the recently established European Pillar of Social Rights is the entitlement of every EU citizen to timely, affordable, high-quality preventive and curative care (European Commission. The European Pillar, 2021). A recent report noted numerous obstacles and disparities concerning healthcare access, including: (a) insufficient public funds allocated to the healthcare system; (b) dispersed population coverage; (c) gaps in the benefits covered; (d) exorbitant user fees, especially for prescription drugs; (e) inadequate safeguards against user fees for vulnerable groups; and (f) a lack of clarity regarding the waiting list process.
1.1.2. Principal healthcare application domains for AI
Up until now, artificial intelligence has been gradually developed and integrated into almost every facet of medicine, including public health, emergency medicine, primary care, rare diseases, and biomedical research. New AI-mediated technologies are also projected to address policy and management issues associated with health administration (e.g., increased efficiency, quality control, fraud reduction) (Gómez-González, 2020).
For the purposes of this study, we use a broader categorisation of AI applications, grouping them into four domains: (1) clinical; (2) research; (3) public health; and (4) administrative (Figure 2). The most recent advances and uses of AI in these four fields are outlined in the sections that follow.
Figure 2: Primary categories of AI technologies examined in this study
1.2. The use of AI in therapeutic settings
AI has a huge potential for use in the clinical setting, with
applications ranging from clinical research and therapeutic
decision-making to the automation of diagnostic procedures. Numerous
sources provide the information needed for diagnosis and therapy,
including genetic data, clinical notes, laboratory results,
pharmaceutical data, and medical imaging.
Automation of image analysis (e.g., radiology, ophthalmology,
dermatology, and pathology) and signal processing (e.g., ECG,
audiology, and electroencephalography) are two activities where
artificial intelligence will be crucial. AI has the potential to
improve clinical workflows by integrating test and image
interpretation, as well as by facilitating the integration of results
with other clinical data (Topol et al., 2019).
Radiology
One of the medical fields where artificial intelligence has advanced significantly in recent years is radiology. Imaging AI technologies have the potential to help radiologists with the quantification of medical images. Deep network models, for instance, have made it possible to autonomously localise and delineate the boundaries of anatomical structures or lesions, enabling segmentation with little to no human supervision (Peng & Wang, 2021). Additionally, these AI technologies allow radiologists to prioritise and track abnormalities that require immediate attention, so they can focus on the images that are most likely to be abnormal (Lee et al., 2018; Peng & Wang, 2021).
Another image processing method where AI has shown promise is radiomics. Despite the lack of a formal definition, the overall goal of radiomics is to extract quantitative data (so-called radiomic features) from diagnostic and treatment-planning images (Gillies, 2016; Mayerhoefer et al., 2020). Radiomic features capture tissue and lesion properties such as heterogeneity and shape, and they can be utilised alone or in conjunction with demographic, histologic, genomic, or proteomic data to solve clinical problems. The impact of radiomics rises when AI approaches are used to process the abundance of information it provides (Cook et al., 2019; Mayerhoefer et al., 2020).
In the field of imaging-based diagnosis, a recent meta-analysis evaluated the performance of radiologists against deep learning software (Liu, 2019). The study found that deep learning models can diagnose disease as accurately as medical practitioners. Nonetheless, a significant finding of the review is that most of the examined studies have important limitations: (i) most assessed deep learning diagnostic accuracy in isolation, and very few reported comparisons with health professionals using the same test dataset (many were excluded at screening because they did not compare the human and the machine); and (ii) very few were prospective studies conducted in real clinical environments (most were retrospective and based on previously assembled datasets).
Digital pathology
When the term "digital pathology" was first used, it referred to the digitisation of whole slide images using sophisticated slide-scanning methods. These days, it also refers to AI-based methods for the analysis and interpretation of digital images (Bera et al., 2019; Niazi et al., 2019). Histopathological analysis is intrinsically constrained by its subjective nature and by disparities in opinion across independent experts, even though the application of established criteria can aid in the harmonisation of diagnostic procedures (Chi et al., 2016; Evans et al., 2008; Bera et al., 2019). Inter-subject and inter-operator variability are two issues that AI can help pathologists and oncologists reduce.
According to a number of studies (Ehteshami Bejnordi et al., 2017), AI can be as accurate as pathologists and, more importantly, can perform better during diagnosis when combined with them (Steiner et al., 2018; Bera et al., 2019). Artificial intelligence has been used in digital pathology for a range of image processing and classification applications. These comprise lower-level tasks, like object recognition (Sornapudi et al., 2018), and higher-level tasks, like assessing the severity and outcome of diseases (Mobadersany et al., 2018), predicting disease diagnosis and prognosis (Corredor et al., 2018), and predicting therapy response using assays (Bera, 2019).
Emergency medicine
Emergency medicine can benefit from AI in different phases of patient management. For instance, AI offers potential value for improved patient prioritisation during triage, and it is versatile in analysing different elements of the patient's clinical history. Currently, patients are assessed with limited information in the emergency department (Berlyand et al., 2019; Kirubarajan et al., 2020). However, there is potential for emergency department flow metrics and resource allocation to be optimised through AI-driven decision making (Berlyand et al., 2018). Nevertheless, concerns remain regarding the use of AI for patient safety, considering the limited body of evidence to support its implementation (Challen et al., 2019; Kirubarajan et al., 2020).
A recent scoping review analysed the applications of AI in emergency medicine across 150 studies (Kirubarajan et al., 2020). According to the review, the majority of interventions centre on: (i) the predictive capabilities of AI; (ii) improving diagnosis within the emergency department; (iii) triage of emergent conditions; and (iv) demonstrating that AI can assist with organisational planning and management within the emergency department.
Surgery
In the field of surgery, judgements occasionally have to be made under pressure and with uncertainty about the diagnosis and the expected course of therapy for a given patient. A lack of high-level evidence to support crucial management choices, or the unavailability of patient data (such as external hospital records or diagnostic test results), can also create uncertainty. Faced with such time constraints and ambiguity, clinicians may instead rely on cognitive shortcuts, making quick decisions based on pattern recognition and intuition (Dijksterhuis et al., 2006; Loftus et al., 2020).
Ultimately, these factors may result in bias, error, and avoidable harm. In several situations, conventional decision-support technologies do not seem sufficiently capable of handling time constraints and uncertainty about diagnosis and treatment response.
1.2.1. Estimating risk
The main goal of risk prediction is to determine how likely it is for people to experience particular health conditions or consequences. It usually produces probabilities for a broad range of outcomes, from fatalities to unfavourable medical occurrences (such as heart attacks, strokes, and bone fractures). The procedure entails identifying those who have specific illnesses or ailments and classifying them in accordance with factors like severity, stage, and other attributes. Following that, these people might be the focus of particular medical therapies (Miotto et al., 2016; Steele et al., 2018; Fihn et al., 2019).
Risk prediction models have long been available in the medical field. Unfortunately, because they typically rely on regression analysis and use only a portion of the available clinical data, their prediction accuracy is low, which makes them less useful in a clinical environment. Notably, the emergence of vast data repositories and AI methodologies has shown encouraging signs regarding AI's applicability in tailoring traditional risk-assessment methods to individual patients (Islam, 2019). For instance, predictive AI-based models have demonstrated superior performance to statistically derived predictive risk models in cardiovascular disease risk assessments (Jamthikar et al., 2019).
Adaptive interventions can be used to enhance medical therapy in two ways: (i) directly, through patient self-assessments; or (ii) passively, through the use of specific sensors to collect physiological data. Ecological momentary assessment is the term used to describe the process of gathering self-assessments using mobile technology (De Vries et al., 2020). This approach aids individuals in self-monitoring their behaviours as they occur and within their natural context.
For instance, ecological momentary assessment improves the ability to link cravings to maladaptive behaviours, among other benefits in the context of substance-use disorders. In order to obtain location-based data, passive data collection frequently makes use of technologies that track patterns of mobility within the patient's surroundings, such as wireless local area networks (Wi-Fi) and global positioning systems (GPS) (Vijayan et al., 2021).
The ability to collect both spatial and temporal data—that is, the location and time of the subject's behaviours—makes these instruments highly targeted. Additionally, physiological data from specialised sensors (such as those measuring blood pressure, heart rate, temperature, or substance concentration levels in blood) can be combined with spatial and temporal data to obtain a more comprehensive profile of the patient's behaviour, including the monitoring of physiological responses.
Home care
Over one-fifth (20.3%) of the population of the EU-27 was 65 years of age or older in 2019. Between 2019 and 2100, the percentage of the population that is 80 years of age or older is expected to increase two-and-a-half-fold, from 5.8% to 14.6% (Eurostat. Statistical Outlook, 2020). It is noteworthy that the prevalence of dementia rises quickly as people age (Quaglio et al., 2016). An estimated 9.1 million adults over 60 in EU member states (roughly 7% of the over-60 population) were predicted to be living with dementia in 2018, compared to 5.9 million in 2000.
Indeed, during the next twenty years, the number of dementia sufferers in EU countries is predicted to increase by around 60%, reaching 14.3 million by 2040 (OECD/EU, 2018).
Moreover, AI has a considerable impact on how older patients and those with chronic illnesses manage their own health. Self-management tasks include taking prescription drugs, modifying one's diet, and using medical equipment. By monitoring the home environment and detecting falls, home monitoring offers the potential to improve ageing at home and promote independence. According to Sapci et al. (2019), devices, software, and in particular smartphones and mobile applications can help patients manage a significant portion of their own healthcare and make interactions with the healthcare system easier.
Cardiology
According to Lopez-Jimenez et al. (2020), the automated processing of cardiac imaging data is the most promising use of artificial intelligence (AI) in cardiology. This processing is essential for the assessment of heart anatomy and function. Cardiologists must put in a lot of effort and time processing complex spatiotemporal data from cardiac imaging modalities such as cardiovascular magnetic resonance imaging, cardiac ultrasound, and cardiac computed tomography. New AI-driven cardiac image processing algorithms now allow cardiologists to assess patients more quickly in their daily practice, and their development has profoundly changed cardiac clinical practice (Lopez-Jimenez et al., 2020).
The prevailing cardiac imaging modality, echocardiography, is strongly dependent on human experience; machine learning (ML) models have the potential to enhance its diagnostic capacity (Alsharqi et al., 2018). Artificial intelligence (AI) is projected to produce automated and more accurate echocardiograms that will minimise the limitations associated with human interpretation while revealing previously unidentified imaging features that will aid in the identification of cardiovascular illness.
This is already the case for electrocardiography (ECG), where big digitised ECG datasets taken from clinical records have been used to create AI models, including deep-learning convolutional neural networks (Siontis et al., 2021). Consequently, diseases like silent atrial fibrillation and asymptomatic left ventricular dysfunction, as well as phenotypic characteristics like age, race, and sex, can now be detected by AI-enabled ECGs (Adedinsewo et al., 2020; Attia et al., 2019a; Attia et al., 2019b; Noseworthy et al., 2020).
Moreover, artificial intelligence has been widely applied in nuclear cardiology, a field that investigates non-invasive imaging techniques for assessing cardiac blood flow among other things. In order to improve the diagnosis and prognosis of obstructive coronary artery disease, machine learning (ML) models have been specifically applied to two techniques: myocardial perfusion imaging (MPI) and single-photon emission computed tomography (SPECT) (Noseworthy et al., 2020). ML algorithms that can extrapolate data and identify hidden patterns in data derived from clinical records are thought to improve the accuracy of cardiac risk scores, which calculate the 10-year risk of presenting with cardiovascular disease (Quer et al., 2021).
Cardiovascular experts' skills will always be crucial, even though cardiovascular medicine seems to be at the forefront of AI in health. Consequently, in order for imaging processing techniques to realise their full potential and maybe completely transform patient care, practitioners must actively participate in this young and developing field (Quer et al., 2021).
Kidney
Compared to other medical specialties, nephrology uses artificial intelligence (AI) less frequently (Lindenmeyer et al., 2021; Chaudhuri et al., 2021). However, because of the encouraging developments over the past ten years, physicians are beginning to notice its potential more and more. For example, chronic kidney disease (CKD) is non-invasively classified using a novel deep learning model for ultrasound renal imaging (Kuo et al., 2019). Furthermore, the creation of a deep neural network that can annotate and identify human kidney samples has made the digital interpretation of histological pictures easier (Hermsen, 2019).
To improve the early treatment of acute kidney injury (AKI), researchers developed an AI model that can predict inpatient episodes of AKI up to 48 hours ahead of time using data from electronic health records (Tomašev, 2019). In a related effort, the 'Intraoperative Data Embedded Analytics' (IDEA) algorithm has been developed to integrate physiological data obtained before and during surgery in order to forecast the risk of developing postoperative AKI (Adhikari et al., 2019).
AI has the potential to be used in computer-aided kidney cancer diagnostics as well. Algorithms are getting better at detecting renal masses and differentiating between benign and malignant ones as they become more resilient and broadly applicable (Giulietti et al., 2021).
Hepatology
Hepatology is one of the numerous medical fields where AI research is rapidly advancing (Ahn et al., 2021). Machine learning models have been widely applied to ease the diagnosis of a multitude of liver diseases, many of which are fatal. Attention has mostly been directed towards automated detection of non-alcoholic fatty liver disease (NAFLD), the most common of these conditions, which remains asymptomatic until liver cirrhosis begins to manifest. A newly created artificial intelligence neural network can diagnose NAFLD with 97.2% accuracy (Okanoue et al., 2021).
Mental health
There is a serious mental health crisis in the EU. In EU Member States, neuropsychiatric illnesses account for 26% of the total burden of disease. These mental health conditions, particularly depression, account for up to 40% of years lived with disability in the EU (WHO, 2021a). Anxiety and mood disorders cost the EU approximately €170 billion annually (WHO, 2021a). Furthermore, research has demonstrated that anxiety and depression, particularly major depression, are frequently untreated conditions that significantly increase the likelihood of prolonged sick leave from the workplace.
AI may help patients with mental health issues and lessen the impact of the shortage of medical professionals who specialise in treating them. Indeed, a number of tools are currently being developed. These include digital tracking of depression and mood using interactive chatbots, speech and voice analysis, facial recognition, keyboard interaction, and other means (Fitzpatrick et al., 2017).
The computational power of AI systems may be used to better understand the intricate pathophysiology of mental illnesses and to develop therapeutic interventions (Graham, 2019; Lee, 2021). Many applications of machine learning have been investigated, including the prediction of antidepressant medication efficacy (Chekroud et al., 2016), the characterisation of depression (Wager et al., 2017), suicide prediction (Walsh et al., 2017), and the prediction of psychosis in schizophrenia (Chung et al., 2018).
According to Dwyer et al. (2018), artificial intelligence (AI) can aid in differentiating diseases that share clinical symptoms but have distinct treatment options. Examples are the distinction between unipolar and bipolar depression (Redlich et al., 2014) and between the many forms of dementia (Lee et al., 2021).
These days, a large portion of the public uses social media as a daily communication tool. Analysing social media language and content patterns can therefore shed light on the field and open up new avenues for predictive psychiatric diagnosis. Online environments may make mental health issues visible, and machine-learning analysis of social media data has already been used to forecast diagnoses and relapses (Reece et al., 2017; Birnbaum et al., 2019; Yazdavar et al., 2020; Lee et al., 2021).
1.3. AI in biomedical research
1.3.1. Medical studies
While clinical applications tend to gain more from AI-derived solutions than biomedical research does, recent developments also point to interesting uses of AI in clinical information retrieval. For instance, ML algorithms—including ones that continually learn from users' search behavior—are already being used by mainstream medical information resources to rank search results (Fiorini et al., 2018a).
One example is PubMed, a widely used search engine for biomedical literature (Fiorini et al., 2018b). The AI technologies implemented by PubMed to optimise its search function include machine learning and natural language processing algorithms that are trained on patterns found in users' activities in order to improve a user's search (Fiorini et al., 2018b). For instance, Best Match is a new search algorithm for PubMed that leverages the intelligence of PubMed users and cutting-edge ML technology as an alternative to the traditional date sort order. The Best Match algorithm is trained using past user searches with dozens of relevance-ranking signals (factors), with the most important being the past usage of an article, publication date, relevance score, and type of article.
Drug discovery
Drug designers often use ML approaches to create new medications by extracting chemical information from vast compound databases. Central to this shift is the creation of AI methods capable of innovative modelling at the scale of modern drug datasets. Consequently, recently developed AI techniques have opened novel approaches to improving the safety and efficacy assessment of potential medications through big-data modelling and analysis.
These kinds of AI models can help us understand a variety of drug classes and the potential clinical results they offer (Zhu et al., 2020). For instance, researchers recently developed a deep learning algorithm to forecast the possible antibacterial activity of compounds. After virtually testing over 107 million molecules drawn from libraries of more than one billion, the algorithm identified eight antibacterial compounds that were structurally distinct from well-known antibiotics (Stokes et al., 2020).
Both in vitro and in silico testing have the potential to significantly reduce the cost of drug development compared with conventional animal models. Drug attrition can be decreased by using in vitro and in silico methods early in the drug research and development process (Zhang et al., 2017). Artificial intelligence has enormous potential as a means of evaluating chemicals based on their biological potential and toxicity.
Existing artificial intelligence (AI) models, such as those based on quantitative structure-activity relationship (QSAR) methodologies (Golbraikh et al., 2016), can predict the activity of a multitude of novel chemicals across diverse biological endpoints.
Nevertheless, QSAR model predictions for novel compounds are subject to a number of limitations (Zhao et al., 2017; Zhu et al., 2020). Over the last ten years, new initiatives have accelerated the development of high-throughput screening (HTS) techniques (Zhu et al., 2014). HTS is a process that screens tens of thousands to millions of chemicals using a standardised technique. Modern screening programmes can generate massive volumes of biological data, made possible by the combined efforts of combinatorial chemical synthesis and high-throughput screening (Zhu et al., 2020).
Clinical trials
The most reliable technique for evaluating the risks and benefits of any medical intervention is the randomised controlled trial (RCT). But conducting an RCT is not always possible. Poor patient selection, insufficient randomisation, small sample size, and poor endpoint selection are common causes of failed RCTs (Lee et al., 2020). AI models can be trained to evaluate study endpoints in a data-driven manner and to select trial participants more accurately using sophisticated statistical techniques. According to Lee et al. (2020), the use of AI will produce higher statistical power and greater efficiency than conventional RCTs.
To identify statistically significant differences across groups, a sufficiently large sample size is essential, in addition to an efficient selection technique. Due to the possibility of a tiny treatment effect, many RCTs need a sizable sample size. AI could help choose the best patients for randomised controlled trials. Moreover, AI might make it possible to quantify important study endpoints more precisely than is currently possible. Additionally, AI will greatly enhance and supplement RCTs in the future. To fully utilise AI algorithms in RCTs, however, increased cooperation and synergy between researchers, physicians, and industry is needed (Lee et al., 2020).
Personalised medicine
A scientific understanding of how a patient's distinct qualities, such as their genetic and molecular profiles, make them susceptible to disease and responsive to therapeutic interventions is essential to the practice of personalised medicine (Strianese et al., 2020). Numerous genes have been linked to human disease, and genetic variation across individuals has also been utilised to discern unique therapy responses (Zhu et al., 2020; Strianese et al., 2020). A newer concept known as "extended personalised medicine" has been created by incorporating additional features and unique clinical traits into the initial idea of personalised medicine. These are derived from other information sources, including clinical sources, demographic and social data, and lifestyle characteristics such as sleep patterns, physical activity levels, dietary habits, and environmental factors (Gómez-González, 2020). By assessing the therapeutic impact of various research techniques and many data kinds, AI technologies may accelerate the advancement of personalised medicine (Mamoshina et al., 2018). Recent advances in this field that rely on computer modelling include drug-target prediction (Sydow et al., 2019), metabolic network modelling, and the identification of population-genetic patterns (Schrider et al., 2018; Lorkowski et al., 2021).
Public health
One of the many commonly used definitions of public health is "the science and art of preventing disease, prolonging life, and promoting health through the organised efforts and informed choices of society, organisations, public and private, communities, and individuals" (Wanless, 2004). Numerous public health domains are presently conducting experiments with pertinent AI technologies. A few of these topics are covered in the sections that follow.
AI can be used to pinpoint particular geographic areas or demographic groups that have higher rates of sickness or high-risk behaviours (Maharana & Nsoesie, 2018; Shin et al., 2018). A wide range of AI systems is available to help enhance disease surveillance. Thanks to data produced by sensors and robots, artificial intelligence has already had an impact on occupational and environmental health. AI also has the ability to better target services to patients and to increase interaction with them. Contacting a large number of patients using various automated, easily scalable methods, such as text messaging and patient portals, is a crucial part of these activities (Fihn et al., 2019).
Global health
AI may offer opportunities to solve health issues in low- and middle-income countries (LMICs). Acute shortages of health workers and inadequate public health surveillance systems are two of these issues. These issues are not specific to these nations, but they are especially important in low- and middle-income settings because of how they affect morbidity and mortality (Schwalbe & Wahl, 2020). To lessen the workload of healthcare professionals, for instance, AI-driven interventions have occasionally been used to supplement clinical decision making (Guo & Li, 2018). Additionally, advances in AI have made it possible to detect disease outbreaks earlier than with conventional methods (Lake et al., 2019). AI research in LMICs has also looked at public health from a wider angle, particularly in terms of health policy and management.
In addition to other health system concerns, these studies include AI research aimed at enhancing the efficiency of healthcare facilities, optimising resource allocation from a systems perspective, and reducing traffic-related injuries (Schwalbe & Wahl, 2020). While AI can aid in resolving a number of current and future health concerns in LMICs, many issues still need to be investigated further. These concerns include the development of particular AI-driven health solutions as well as their actual usefulness and effectiveness. Furthermore, it is imperative to establish ethical regulatory norms to safeguard the welfare and interests of local communities and to promote community-based research and involvement (Collins et al., 2019). Finally, the successful implementation of numerous AI tools in LMICs will require funding to strengthen the underlying healthcare systems.
AI in the management of healthcare
Healthcare systems are defined by a complex administrative process involving numerous actors and institutions—patients (e.g., billing management), medical professionals, healthcare facilities and organisations (e.g., patient flow), imaging facilities, laboratories (e.g., consumable supply chain), pharmacies, payers, and regulators. A study conducted in a primary care setting identified numerous potential areas of concern within this highly administrative context. These include time spent on financial reimbursement claims, data entry into many siloed practice-based information systems, processing of information from hospitals and other outside providers, and assisting patients in navigating a disjointed healthcare system. According to the study's findings, bureaucracy accounted for more than 50% of practice time, much of which could have been avoided (Clay & Stern, 2015).
AI is capable of carrying out these mundane jobs in a more impartial, accurate, and efficient manner. Errors in administrative tasks are comparatively less serious than those in clinical settings, which is one argument in favour of implementing AI in these processes. Nonetheless, the risk of cyberattacks, invasions of privacy, and security issues persist (Roski et al., 2019; OECD, 2020). Applications of AI may be vital to the planning of patient flow. For instance, according to Kaddoum et al. (2016), a major factor in surgical cancellations is a shortage of available beds; yet, this is a preventable administrative error in patient flow. This issue arises often and is linked to clinical ward release delays (Stylianou et al., 2017).
Coding
The process of gathering data from clinical records and classifying it using diagnostic-related groups (DRGs) or the International Classification of Diseases (ICD) is known as coding. Coding is intricate and labour-intensive, and precision in coding is crucial for research, administration, and reimbursement. Although computer-assisted coding has been around for over ten years, artificial intelligence has the potential to improve the precision and transparency of this administrative procedure (OECD, 2020).
Scheduling
Scheduling is another area where AI can improve the administrative process. Algorithms fed with past data can predict which patients are likely to miss their appointments, allowing clinicians to take proactive measures to manage the situation. AI can also respond to a patient's demands and inquiries, in addition to providing general or even customised reminders (OECD, 2020).
Identification of fraudulent behaviour
Additionally, algorithms are able to detect fraudulent activities in the healthcare industry, such as the use of a code for a more expensive medical procedure than the one actually performed (OECD, 2020).
Management of patient flow
Patient flow is defined as the seamless management and transfer of patients through the various stages of treatment with the least amount of delay (NHS, 2017). Notably, patient satisfaction and the quality of services provided by healthcare systems have to be preserved at all times. It has been demonstrated that limited patient flow has a detrimental impact on patients, personnel, and the standard of treatment as a whole (Tlapa et al., 2020). AI and other technological solutions are increasingly being used for patient flow-related applications (Dawoodbhoy et al., 2021).
Healthcare auditing
Healthcare auditing is the practice of reviewing patient data to find suggestions for improvement (NHS England, 2021). This procedure yields advice on how to enhance clinical outcomes, in addition to quantitative data on the current situation.
AI's potential risks to healthcare
More than 50 years ago, William B. Schwartz predicted that "computing science will probably exert its major effects by augmenting and, in some cases, largely replacing the intellectual functions of the physician" (Schwartz, 1970). Schwartz's forecast has not yet come to pass, despite encouraging instances of AI-powered healthcare solutions. It is challenging to evaluate the true impact of AI health applications because the initial results are not as solid as anticipated (Roski et al., 2019; Fihn et al., 2019).
Some commentators assert that, given the lack of evidence showing a real benefit in patient outcomes, the promise of AI medicine as a whole has been greatly overstated (Angus, 2020; Parikh, 2019; Emanuel, 2019).
Concerns concerning the potential negative effects of medical AI, such as those related to clinical, technological, and socio-ethical issues, have been voiced by several experts in recent years (Challen et al., 2019; Gerke & Cohen, 2020; Ellahham et al., 2020; Morley & Floridi, 2020; Manne & Kantheti, 2021).
The primary hazards that the research has identified as likely to result from the use of AI in healthcare will be covered in this chapter. We'll concentrate on the following seven risk and challenge categories:
1. Patient injury brought on by AI mistakes
2. Improper use of AI in medicine
3. Potential for bias in medical AI and the maintenance of injustices
4. A lack of openness
5. Security and privacy concerns
6. Gaps in AI accountability
7. Implementation challenges in the real world of healthcare
Not only could these risks result in harms for the patients and citizens, but they could also reduce the level of trust in AI algorithms on the part of clinicians and society at large. Hence, risk assessment, classification and management must be an integral part of the AI development, evaluation and deployment processes.
Patient injury brought on by AI mistakes
AI-guided clinical solutions in healthcare may be linked to failures that raise safety issues for healthcare service end-users, despite ongoing advancements in data availability and machine learning (Challen et al., 2019; Ellahham et al., 2020). These errors in AI algorithms can result in a number of issues, such as (1) false negatives, which are missed diagnoses of serious illnesses, (2) needless treatments because of false positives, which are healthy people mistakenly classified as sick by the AI system, (3) inappropriate interventions because of inaccurate diagnosis, or erroneous prioritisation of interventions in emergency rooms (Figure 3).
Figure 3: Causes of medical AI algorithm faults and failures, their effects, and suggestions for possible mitigation
Even when large-scale, high-quality datasets are available for AI developers to train their systems, there are still three main areas where AI in clinical practice goes wrong. First of all, noise in the input data during use of the AI tool can have a substantial impact on its predictions. For instance, it is well recognised that scanning errors can occur with ultrasound, the imaging modality most frequently utilised in clinical practice because of its portability and low cost (Farina et al., 2012). Scan quality depends primarily on the operator's experience, the patient's compliance, and the clinical setting (e.g., emergency ultrasonography) (Pinto et al., 2013). Such errors are expected to occur in some scans, even in high-income countries with highly trained medical staff, and hence influence the AI's subsequent predictions.
Second, dataset shift—a prevalent issue in machine learning that arises when the statistical distribution of the data used in clinical practice deviates, even somewhat, from the original distribution of the dataset used to train the AI algorithm—may lead to AI misclassifications (Subbaswamy et al., 2020). This change may be brought about by variations in the patient populations served, hospital acquisition practices, or the use of equipment made by various manufacturers.
According to a recent study (Campello et al., 2020), artificial intelligence (AI) models that were trained on cardiac magnetic resonance imaging (MRI) scans from two separate scanner vendors (for example, Siemens and Philips) lose accuracy when used with MRI data obtained from a different device (for example, General Electric or Canon).
A multi-centre study conducted in the United States that used data from two hospitals to create a highly accurate AI system for diagnosing pneumonia offers another illustration of dataset shift (Zech et al., 2018). A notable decline in accuracy was observed when testing with data from a third hospital, indicating the possibility of biases unique to each facility. In another instance, the company DeepMind created a deep learning model for automated retinal disease diagnosis using optical coherence tomography (OCT) that was trained on a sizeable dataset.
They discovered that the diagnostic error increased from 5.5% to an astounding 46% when the AI system was applied to images acquired with a different machine from the one used for data collection during the AI training stage. These examples highlight the existing difficulties in developing AI systems that retain a high degree of accuracy in the face of varied data from different machines, hospitals, and communities.
Finally, the inability of AI systems to adjust to unforeseen changes in the context and environment in which they are used might lead to inaccurate predictions. Researchers at Harvard Medical School provided a good example in the field of artificial intelligence for medical imaging to highlight the issue (Yu & Kohane, 2019).
They imagined an AI system trained to identify shadows or dense structures on chest X-ray images that are linked to lesions in serious diseases like lung cancer. They then outlined a few straightforward situations in which the AI could make erroneous predictions: for example, if the patient wears a wedding ring and rests their hand on their chest during the scan, or if the X-ray technician leaves adhesive ECG connectors on the patient's chest. Under these circumstances, the AI model may interpret these circular artefacts as one of the recognised chest lesions, producing a false positive.
Healthcare AI solutions of the future must be dynamic, meaning they should have built-in methods for continuing to learn from new situations and errors as they arise in real-world application. Even so, such systems will still need some human oversight and monitoring to spot issues as they arise, which might raise expenses and lessen the initial advantages of AI. Regular AI updates, based on both new and old training data, will also require technological and infrastructure advances, and policies ensuring the integration of these mechanisms into healthcare settings will need to be put in place.
Misuse of artificial intelligence in medicine
Medical AI is susceptible to human error and misuse, just like the majority of health technologies. Even with reliably designed and accurate AI algorithms, their effectiveness depends on how end users—physicians, other healthcare professionals, and patients—use them in real-world scenarios. Inappropriate use of AI tools may lead to inaccurate medical evaluation and decision-making, which may put the patient at risk. Thus, it is imperative that medical professionals and the general public not only have access to medical AI tools but also know when and how to use them.
Figure 4: Key elements that may cause physicians and the general public
to misuse medical AI algorithms, along with suggested countermeasures
to enhance the usability of algorithms in the future
Existing medical AI technologies are vulnerable to human mistake or improper application due to a number of reasons (Figure 4). First off, end users and clinical experts have not always been heavily involved in the design and development process, which has been left to computer/data scientists. Because of this, the user—that is, the patient, the data manager, the physician, or the nurse—must become proficient in using and adjusting to the new AI technology, which may result in strange and complicated interactions and experiences. As a result, the clinical user can find it challenging to comprehend and use the AI algorithm in day-to-day practice, which will reduce the impression of informed decision making and raise the possibility of human error.
Making matters worse, current medical training programmes are not yet designed for medical AI and typically do not provide new clinicians with the knowledge and abilities they need to tackle this issue. In a 2021 survey of 632 medical students in the fields of oncology, dermatology, and ophthalmology conducted in Australia and New Zealand, 71% of participants thought AI would advance medicine, particularly through better disease screening and the simplification of tedious tasks (Scheetz et al., 2021).
However, the majority of respondents (>80%) said they had never employed AI applications in their clinical practice, and only 5% thought they had outstanding knowledge of the subject.
A different study, conducted in the UK with 484 students from 19 medical schools, revealed that none of the students had received any compulsory teaching on AI (Sit et al., 2020). Similar findings were made regarding health professionals' understanding and use of technology-based interventions in various healthcare sectors across the European Union (Quaglio et al., 2019).
Future medical AI solutions will be actively used by citizens and patients, so the same considerations on AI education and literacy also apply to them. According to a 2021 survey with over 6,000 participants conducted in five countries (Australia, the United States, Canada, Germany, and the United Kingdom), the public typically has limited awareness and understanding of artificial intelligence (AI) and its use in daily life.
The abundance of readily available medical AI apps is another factor that raises the possibility of medical AI being misused, which could put people and patients in danger. For instance, numerous businesses have created commercial smartphone apps for detecting skin cancer, such as SpotMole, MelApp, skinScan, and Skinvision. An AI algorithm within the app analyses and evaluates pictures of the user's skin uploaded by the user. The public can easily access these tools; however, there is frequently little information available about the development and validation of the AI algorithms in question, and there isn't always proof of their clinical efficacy or reliability.
Symptomate, Achu Health, Diagnostics.ai, DDXRX Doctor Ai, and many other AI-powered web and mobile applications have surfaced in a variety of medical fields and are commercially available for medical diagnostics and health monitoring, according to a brief search. While these services can be a promising way to diagnose patients remotely and monitor their progress, their widespread availability online raises potential public health issues, much as easily accessible internet pharmacies have led to prescription drug abuse among the general public (Bandivadekar, 2020).
There are multiple ways to minimise human mistake or improper application of upcoming medical AI technologies (Figure 4). In the first place, end users—such as medical professionals, experts, technicians, or patients—should be actively involved in the creation and development of AI solutions to guarantee that their viewpoints, preferences, and environments are effectively incorporated into the finished products that will be used. In order to improve the knowledge and abilities of future AI end users and hence lower human error, education and literacy courses on AI and medical AI should be created.
Bias in medical AI and the maintenance of injustices
Figure 5 lists the most prevalent biases in medical AI, along with possible countermeasures for creating fairer and more equitable algorithms.
There are still significant disparities in access to healthcare in the majority of the world's nations, despite ongoing advancements in medical research and healthcare delivery. Sex and gender, age, ethnicity, income, education, and location are the primary causes of these disparities and inequalities. Human prejudices are a major factor in these disparities, even though some of them are systemic in nature—for example, because of socioeconomic disparities and discrimination. For instance, studies conducted in the United States have shown that physicians do not treat Black patients' complaints of pain with the same urgency or seriousness as they do White patients' (Hoffman et al., 2016).
Thus, there have been worries in recent years that future AI solutions may reinforce institutional gaps and human biases that lead to healthcare inequities if they are not adequately applied, assessed, and regulated. A few instances of algorithmic biases—some of which are listed below—have already garnered media attention in recent years.
An algorithm used in the US to aid in the referral process for patients in need of additional or specialised care was shown to discriminate against Black patients, according to a 2019 study published in Science (Obermeyer et al., 2019). According to the study's authors, at a given risk score Black patients are significantly sicker than White patients, as evidenced by signs of uncontrolled illnesses. Correcting this imbalance would raise the proportion of Black patients receiving extra assistance from 17.7% to 46.5%. In 2020, Seyyed-Kalantari et al. conducted a study in Canada to assess the fairness of state-of-the-art deep learning algorithms used to identify anomalies in chest X-ray images, such as pneumonia, lung lesions, fractures, and nodules. According to the study, patients on public health insurance for low-income individuals and households, Black patients, and young women (ages 0 to 20) had the highest rates of underdiagnosis. The highest rates of underdiagnosis were also seen in individuals with intersecting identities, such as a female Hispanic patient with low-income health insurance. The authors concluded that "models trained on large datasets do not provide equality of opportunity naturally, leading instead to potential disparities in care if deployed without modification" (Seyyed-Kalantari et al., 2020).
Many contend that bias in the data used to train machine learning models is the primary source of injustice in medical AI. In a recent talk on artificial intelligence in healthcare, Marzyeh Ghassemi of the University of Toronto noted that biases are already prevalent in the clinical environment (Ghassemi, 2021). It is not that machine learning is trying to harm humans; rather, when we train on data that humans create, label, and annotate, we may detect some of the biases that humans have introduced into the data.
For instance, Black individuals were found to make up just 4% of the National Lung Screening Trial databases, which, starting in 2002, collected data from 53,000 smokers to study strategies for early lung cancer identification (Ferryman & Pitcan, 2018). The International Skin Imaging Collaboration, one of the most popular open-access databases of skin lesions, contains images of mostly fair-skinned patients from the United States, Europe, and Australia. All too frequently, machine learning algorithms for skin cancer detection have been trained on highly biased datasets. Diagnostic algorithms developed exclusively on fair-skinned populations may hamper the diagnosis of melanoma lesions in dark-skinned people.
Geographic bias is another kind of bias that can be found in datasets. In 2020, a review of five years' worth of articles that had been utilised to train patient care-related deep learning algorithms was carried out by Stanford University biomedical researchers and radiologists (Kaushal et al., 2020). They discovered that only data from California, Massachusetts, and New York were used in 71% of US research where the geographic location was determined. Furthermore, they discovered that no data from 34 of the 50 states in the United States was included in the study. Due to unequal access to digital equipment and data availability, particularly in Eastern Europe, geographic bias can also be a significant problem throughout Europe.
Bias in the data labelling used in clinical assessment is another possible cause of unfairness in medical artificial intelligence. For instance, studies have demonstrated that, as a result of gender stereotypes, women are underdiagnosed for diseases such as cancer and overdiagnosed for conditions such as depression (Dusenberry, 2018). Additionally, a large-scale Danish study that examined hospital admission data for about 7 million individuals across 19 disease groups found that women are diagnosed with most diseases later than men (Westergaard et al., 2019). Importantly, anatomical or genetic differences cannot account for many of these medical conditions, including poisoning, infectious diseases, congenital anomalies, and injuries. If the data labels in health registries are affected by such healthcare disparities, for example in settings where given groups have been systematically misdiagnosed due to stigma or stereotypes, then AI models will likely learn to perpetuate the disparity (Rajkomar et al., 2018).
In recent years, awareness of algorithmic bias has increased, and researchers, particularly in North America, have started to investigate mitigation measures to address the risk of unfairness in medical AI. First, it is evident that AI developers, in collaboration with clinical experts and healthcare professionals, must pay close and continuous attention to the selection and labelling of the data and variables to be used during model training. These must be balanced and representative in terms of important characteristics such as sex/gender, age, socioeconomic status, ethnicity, and geography. Additionally, it is advised that development teams include social scientists, biomedical ethicists, and public health experts in addition to data scientists and biomedical specialists, as well as citizens and patients. The latter group should be as diverse as feasible, in order to guarantee that a sufficient range of backgrounds, experiences, and requirements is taken into account throughout the AI production lifecycle and that the tools generated are truly representative and grounded in community-based research.
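As a minimal illustration of what such a representativeness check might look like in practice, the following Python sketch (with hypothetical column names and an arbitrary tolerance) compares the composition of a training dataset against reference population shares and flags under-represented groups:

```python
# Minimal sketch: auditing dataset composition against reference
# population shares. Column names, data, and thresholds are illustrative.
import pandas as pd

def audit_representation(df: pd.DataFrame, column: str,
                         reference_shares: dict, tolerance: float = 0.5):
    """Flag groups whose share in the dataset falls below
    `tolerance` times their share in the reference population."""
    observed = df[column].value_counts(normalize=True)
    flags = {}
    for group, ref_share in reference_shares.items():
        obs_share = float(observed.get(group, 0.0))
        flags[group] = {
            "observed": round(obs_share, 3),
            "reference": ref_share,
            "under_represented": obs_share < tolerance * ref_share,
        }
    return flags

# Hypothetical usage: compare the sex distribution in a training
# cohort with census-like reference shares.
cohort = pd.DataFrame({"sex": ["F", "M", "M", "M", "M", "M", "F", "M"]})
print(audit_representation(cohort, "sex", {"F": 0.51, "M": 0.49}))
```

In a real project, the same audit would be repeated for each key attribute (age group, ethnicity, region) and revisited whenever the training data are updated.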
Lack of transparency
Even with ongoing advancements in medical AI, experts and non-experts alike still perceive current algorithms as complicated, esoteric technologies that are hard to fully understand, trust, and use. Recently, Google developed an AI algorithm for breast cancer screening that attracted a lot of attention due to its promising performance (McKinney, 2020). It was demonstrated to increase the robustness and speed of breast cancer screening, to generalise well to populations outside the training countries, and even to outperform radiologists in certain scenarios. But because so few specifics were provided about the algorithm's construction and key technical elements, this work was also criticised by the AI community and the media. While some critics questioned the safety and efficacy of such an AI tool (Wiggers, 2020; iNews, 2020), a group of scientists used this algorithm as their main example in a call for greater transparency in medical AI published in Nature (Haibe-Kains et al., 2020).
One major problem with the development and application of current AI tools in healthcare is the general lack of transparency (Figure 6). It is anticipated that this would lead to a significant lack of faith in AI, particularly in delicate fields like healthcare and medicine that are concerned with the welfare and health of the public. Simultaneously, it is apparent that a lack of credibility will affect how widely patients, physicians, and healthcare systems adopt new AI algorithms.
AI transparency is closely tied to the ideas of traceability and explainability, which correspond to the two different levels at which transparency is necessary: (1) transparency of the AI creation and usage processes (traceability) and (2) transparency of AI decisions (explainability).
Images are not included in the reading sample.
Figure 6: Key hazards stemming from the existing lack of transparency surrounding AI algorithms, along with potential countermeasures
As a fundamental prerequisite for reliable AI, traceability is defined as openly recording every step of the AI development process, including monitoring the model's performance in actual applications following its implementation (Mora-Cantallops et al., 2021). More precisely, traceability calls for keeping an accurate record of: (i) model details, including intended use, algorithm or neural network type, hyper-parameters, and pre- and post-processing steps; (ii) training and validation data, including data composition, acquisition protocols, and data labelling; and (iii) AI tool monitoring, including performance metrics, failures, and periodic evaluations (EU Regulation, 2017; FDA, 2019).
In practice, however, healthcare AI systems are rarely provided with complete traceability. Businesses frequently prefer to withhold some details about their algorithms, meaning they are delivered as opaque instruments that are difficult for outside parties to decipher and analyse. As a result, there is less adoption of, and trust in, real-world applications.
Whereas traceability deals with the transparency of the AI algorithm's lifecycle, AI explainability is crucial for ensuring transparency for every individual AI prediction and decision. The 'right to explanation' outlined in Article 22 of the General Data Protection Regulation (GDPR) of the European Union mandates that an explanation of automated decision-making be provided (Selbst & Powles, 2017).
However, AI solutions, and deep neural networks in particular, lack transparency and are often referred to as "black box AI" (Yang et al., 2021), a term highlighting the fact that these models learn intricate functions whose operations and decision-making processes are opaque and difficult for humans to comprehend. Even where an algorithm has the potential to increase a clinician's productivity, this lack of transparency makes it challenging for clinicians and other stakeholders to integrate AI solutions into their day-to-day work: to work with a specific AI solution, clinicians must be able to comprehend the underlying principles behind each decision and/or prediction (Lipton, 2017).
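To make the challenge more concrete, one widely used model-agnostic way of probing a black-box model is permutation feature importance; the sketch below applies it to a generic scikit-learn classifier on a public demonstration dataset, purely as an illustration rather than a clinically validated explanation method:

```python
# Minimal sketch: model-agnostic explanation of a black-box classifier
# via permutation feature importance. Dataset and model are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in accuracy:
# large drops indicate features the model relies on most heavily.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
ranking = sorted(zip(X.columns, result.importances_mean),
                 key=lambda t: t[1], reverse=True)
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Such rankings do not fully open the black box, but they give clinicians a first, inspectable account of what drives a model's outputs.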
The transparency of AI technology in healthcare can be enhanced in a number of ways. First, each AI algorithm should be required to come with an "AI passport" that contains all of the model's important data. Additionally, traceability tools must be created to monitor how AI algorithms are used after they are implemented; these systems must be able to record errors and performance degradation and support regular audits. Finally, AI developers should involve clinical end-users early in the development process to help improve the explainability of AI algorithms: this will help them choose the best explainability strategy for each application and ensure that the chosen explanations are practical and well received in the clinic.
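No standard schema for such an "AI passport" exists yet; purely as an illustration, the sketch below gathers the kinds of fields discussed above (intended use, model details, data provenance, post-deployment monitoring) into a simple Python structure with hypothetical example values:

```python
# Illustrative sketch only: a hypothetical "AI passport" capturing the
# traceability fields discussed above. No standard schema exists yet.
from dataclasses import dataclass, field

@dataclass
class AIPassport:
    intended_use: str
    model_type: str                      # e.g. neural network family
    hyperparameters: dict
    preprocessing_steps: list
    training_data_description: str       # composition, acquisition, labelling
    validation_datasets: list
    performance_metrics: dict            # metrics at approval time
    deployment_log: list = field(default_factory=list)

    def log_event(self, event: str) -> None:
        """Record post-deployment events: failures, audits, updates."""
        self.deployment_log.append(event)

# Hypothetical example values for a chest X-ray triage model.
passport = AIPassport(
    intended_use="Decision support for chest X-ray triage (illustrative)",
    model_type="Convolutional neural network",
    hyperparameters={"learning_rate": 1e-4, "epochs": 50},
    preprocessing_steps=["resize 224x224", "intensity normalisation"],
    training_data_description="Multi-centre X-rays; labels by 2 radiologists",
    validation_datasets=["internal hold-out", "external site A"],
    performance_metrics={"AUC": 0.91},
)
passport.log_event("2025-01-10: periodic audit passed")
```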
Privacy and security issues
The increasingly widespread development of AI solutions and technology in healthcare, recently highlighted by the COVID-19 pandemic, has exposed potential risks to data privacy, confidentiality and protection for patients and citizens. This could lead to serious consequences (Figure 7), such as the exposure and use of sensitive data in violation of citizens' rights, or the repurposing of patient data for non-medical gains.
Images are not included in the reading sample.
First, these difficulties are connected to informed consent, that is, giving patients enough information to make an informed choice, for instance about disclosing personal health information. As digital technology has become ingrained in our daily lives, informed consent has become an increasingly important and fundamental part of the patient's experience in healthcare, as formalised in the Helsinki Declaration (Pickering, 2021). According to Ploug and Holm (2016), informed consent relates to a number of ethical concerns, such as autonomy preservation, privacy protection, and property rights pertaining to data and/or tissue. As per Vyas et al. (2020), the incorporation of opaque AI algorithms and intricate informed consent procedures restricts patient autonomy and the effectiveness of shared decision-making between patients and physicians.
Patients are finding it more and more difficult to comprehend the decision-making process, the various uses for their data, and the precise steps involved in choosing not to share their data. In big data research, particularly digital platform-based health data research, informed consent concerns are particularly prevalent since patients may not completely comprehend the extent to which their data is shared and reused (McKeown et al., 2021).
A significant instance of this occurred in 2016, when the Google-owned AI company DeepMind, which was developing an app to implement novel methods of kidney disease detection, received from the Royal Free NHS Foundation Trust the records of 1.6 million patients in the United Kingdom without the patients' informed consent (BBC, 2017). The UK Information Commissioner's Office (ICO) declared in July 2017 that the Royal Free NHS Trust had violated data protection laws. The ICO was famously quoted as stating that 'the price of innovation does not need to be the erosion of fundamental privacy rights' (Gerke et al., 2020).
The application of AI in healthcare also carries the risk of data security breaches, whereby individuals' right to privacy may be violated and they may become targets of identity theft and other cyberattacks. A data breach that occurred in July 2020 at the New York-based AI company Cense AI exposed extremely sensitive patient information (including names, addresses, diagnostic notes, dates and types of accidents, and insurance policy numbers) for up to 2.5 million patients who had been involved in auto accidents (HIPAA Journal, 2020). Although the data was eventually secured, it was briefly available to anybody with an internet connection worldwide, highlighting the very real risk of patient privacy breaches.
Data repurposing, sometimes known as "function creep", is another ongoing worry (Koops, 2021). During the COVID-19 pandemic, the World Health Organisation warned against the risk of function creep, highlighting an instance in Singapore where information from the government's COVID-19 tracing applications was also made available for criminal investigations (WHO, 2021). This is a clear example of health-related data being used for purposes unrelated to healthcare, but repurposing can also happen within the healthcare industry itself: a 2019 study thoroughly examined the various uses of patient data in the European pharmaceutical sector, showing that registry data, health system data, and electronic health record data are all used in pharmaceutical medication development.
Beyond concerns about data security and privacy, AI tools are particularly vulnerable to cyberattacks, which, depending on the situation, can have devastating effects. When the Düsseldorf University Hospital experienced a cyberattack in September 2020 that corrupted the hospital's data and rendered its computer system unusable, a patient died after having to be sent to another hospital (Kiener, 2020). Although it was later stated that the cyberattack could not be proven to be the direct cause of death, because the patient already had a life-threatening disease, this case highlighted the physical harm that cyberattacks may inflict in the healthcare industry. Another instance of how cyberattacks can impact patients' physical health is the April 2021 ransomware attack on the Swedish oncology software company Elekta, which affected 170 US health systems and exposed private patient information in addition to delaying cancer treatment for patients nationwide (Mulcahy, 2021).
Moreover, studies have demonstrated the susceptibility of AI-controlled personal medical equipment to attack. For instance, researchers found that AI-driven insulin pumps used by patients with diabetes could be compromised, operated remotely, and even tricked into delivering too much insulin into the patient's body (Wired, 2019).
Although this hack has never been executed in real life, its creation by researchers revealed significant weaknesses in the functioning of the AI system. Because these incidents attracted considerable attention, they raised the question of how algorithmic security, or the lack thereof, might impact human lives in a high-stakes setting like healthcare. Considering AI tools in the context of the broader technical landscape, it is evident that attack and hacking threats need to be constantly monitored.
To address these vital issues, there needs to be greater attention to informed consent and cybersecurity, together with improved understanding and literacy regarding privacy and security risks. To protect citizens against data breaches and repurposing, laws and regulations must also be strengthened to address accountability in addition to privacy. Supporting federated and distributed approaches to AI is crucial for maximising the potential of massive clinical centre data without necessitating unsafe data transfers. Finally, research must be accelerated and sustained to improve cloud-based system security and prevent cyberattacks on AI algorithms.
Gaps in AI accountability
The phrase "algorithmic accountability" is becoming more and more important among academics and companies that study the legal ramifications of introducing and utilising AI algorithms in various spheres of human endeavour. Contrary to popular belief, the phrase "algorithmic accountability" actually refers to the process of trying to hold the algorithm itself accountable: In particular, since AI systems themselves cannot be held morally or legally responsible, it emphasises the fact that algorithms are created through a combination of machine learning and human design, and that errors or wrongdoings in algorithms originate from humans developing, introducing, or using the machines (Kaplan et al., 2018).
Accountability is especially crucial because it will influence medical AI's acceptability, reliability, and potential adoption in society and healthcare. For instance, physicians who believe they will routinely be held accountable for all AI-related medical errors, even when the algorithms were created by other people or businesses, are unlikely to incorporate these cutting-edge AI solutions into their regular practice. In a similar vein, citizens and patients will lose faith if they believe that no one, not even the creators or users of the AI tools, can be held responsible for potential harm. New frameworks and methods are required to guarantee proper responsibility in medical AI and to manage claims, compensation, and sanctions when needed.
Images are not included in the reading sample.
Figure 8 – Current limitations in accountability and recommendations to fill in these gaps
Due to the novelty of medical AI and the lack of legal precedent, there is now serious uncertainty in determining liability for AI-related medical errors that could cause patient harm (Figure 8). The rapidly evolving and expanding domain of medical artificial intelligence presents novel obstacles for lawmakers, regulators, and policymakers, forcing them to adapt long-standing approaches to defining accountability and liability to the realities of AI-assisted healthcare.
Identifying responsibilities among the various parties involved in the development, implementation, and use of medical AI and algorithms (e.g. AI developers, data managers, clinicians, patients, healthcare organisers, etc.) is challenging. Another challenge is pinpointing the exact cause of any AI-related medical error, which may lie in the AI algorithm, the data used to train it, or improper use and understanding in clinical practice. Lastly, the coexistence of numerous governance frameworks further complicates the assignment of responsibility.
While the interaction between the patient and the clinician has historically stood at the core of issues involving medical malpractice and negligence, the introduction of AI tools into healthcare adds a new layer with numerous actors to the patient–physician dynamic (Smith, 2020). These players may include AI developers, researchers, and manufacturers, in addition to the patient, clinician, healthcare facility, and healthcare system, all of whom are now involved in the medical decision-making process in one way or another. The introduction of all these new players, together with the ambiguity surrounding the allocation of decision-making responsibilities and the operation of the AI tools, increases the complexity of the problem.
While AI developers and technologists typically operate under ethical norms, medical practitioners are typically subject to regulatory requirements to account for their actions, a necessity that is an essential element of their professional undertaking (Whitby, 2015). As a result, failing to take responsibility for one's actions and decision-making could cost a medical professional their licence to practise, whereas under the current system a technologist's failure to take responsibility could have far less severe consequences. Because so many different developers and researchers work on any one AI system, it can be challenging to assign blame for a mistake even if an AI manufacturer is determined to be at fault. Furthermore, many private organisations employ ethical codes and accountability standards that have frequently been criticised for being imprecise and difficult to put into effect (Raji, 2020).
It is important to remember that the topics of AI accountability and liability in medicine and healthcare are intimately related to explainability and transparency. The more opaque an AI algorithm is, the more difficult it will be to determine who is accountable for an error involving a patient or a medical decision, and the burden of responsibility will likely rest more heavily on the clinician who used a non-transparent medical AI tool and is unable to explain their medical decision or the error that occurred (Maliha et al., 2021). This is particularly true for assistive AI technologies, which are designed to support clinicians' decisions and may be likened to consulting a knowledgeable clinical colleague.
There are ways to solve the existing medical AI accountability problem. When humans are harmed by AI-assisted medical judgements, protocols should be put in place to clearly define the roles of AI developers and clinical users. Establishing regulatory organisations devoted to medical AI is also necessary. These will create and implement legal frameworks to guarantee that certain medical AI players, such as AI manufacturers, can be held responsible.
1.2. Implementation barriers in real-world healthcare systems
As section 2 summarises, over the past five years a significant number of medical AI algorithms have been developed and proposed for use in a variety of medical applications. Yet even where medical AI technologies are thoroughly validated and found to be clinically robust and safe, as well as ethically sound and compliant, the road to healthcare implementation, integration, and adoption is still paved with unique real-world challenges (Shortliffe & Sepúlveda, 2018; Fihn et al., 2019; Nagendran et al., 2020). When it comes to incorporating new technologies into their daily work, healthcare professionals have historically lagged behind other professions (Quaglio, 2018).
Previous experiences in the healthcare industry demonstrate the importance of the implementation phase in the innovation process. Developing and testing new AI technology alone is not enough in practice; other factors that may impede its application in real-world healthcare settings should also be taken into account (Arora, 2020). These factors include: (1) limitations in the quality and structure of data in existing electronic health systems; (2) changes to the nature of the clinician–patient relationship; and (3) challenges associated with clinical integration and interoperability (Figure 9).
Figure 9 – Obstacles for clinical implementation and integration of new AI tools in real- world healthcare practice, together with potential mitigation measures
Images are not included in the reading sample.
First and foremost, enabling the application of medical AI requires high-quality electronic health data in clinical settings. However, the majority of current datasets cannot be used by AI algorithms due to the well-known unstructured and chaotic nature of medical data. Moreover, there is notable variation in the format and quality of clinical data among clinical centres and EU member states, as shown by Lehne et al. (2019). Extensive and expensive human editing, quality control, cleansing, and re-labelling of current data would be necessary before medical AI technologies could be fully deployed and used on a broad scale. One of the top goals of the European Commission's 2019–2025 plan is to establish a European Health Data Space in order to enhance data interoperability (European Health Data Space). This would facilitate improved cross-EU repurposing of many forms of health data by AI algorithms, including data from patient registries, genomics, and electronic health records, among other sources.
It is anticipated that AI technology will change patient–provider relationships in ways that are currently unpredictable. AI has already significantly changed some specialisations, especially those that deal with image processing (Gómez-González, 2020). Through enhanced openness and deeper doctor–patient conversations, patient-centred AI technologies have the potential to transform the traditionally paternalistic clinician–patient relationship into a collaborative partnership in the decision-making process (Aminololama & Lopez, 2019). But sharing knowledge of AI-derived illness risks (such as dementia or cancer susceptibility) will have ethical and personal ramifications that need to be clarified (Fihn et al., 2019; Cohen, 2020). Clinical guidelines and care models will need to be updated to take these AI-mediated linkages into account.
Clinicians and care providers follow clinical guidelines and technical standards, and AI technology adoption in daily practice will affect patients and doctors in practical, technological, and clinical ways. Finally, without major changes to current clinical practices, care models, and even training programmes, it is unclear how easily medical AI tools will be integrated within existing clinical and technical workflows and how consistently interoperable they will be across clinical sites and health systems (Meskó & Görög, 2020).
To ensure clinical interoperability across different clinical sites and integration across heterogeneous electronic healthcare systems, AI manufacturers will need to work with healthcare professionals and organisations to establish standard operating procedures for all new AI tools. Specifically, it is imperative to build novel AI tools while guaranteeing their future compatibility and data exchange with established technologies, such as genetic sequencing, electronic health records, and virtual consultations (Arora, 2020).
Methodology for risk assessment
The primary hazards associated with the application of AI in healthcare that have surfaced recently have been covered in earlier sections of this paper. This necessitates a methodical approach to risk assessment and management that targets the ethical, clinical, and technical issues surrounding AI in medicine and healthcare.
AI regulatory frameworks
AI hazards can be categorised and described based on the potential harm they may do, as well as the likelihood and frequency of that impact. Artificial intelligence poses a wide range of dangers in the healthcare industry, from rare and/or low risks that could cause patients and citizens only minor and manageable harm to frequent and/or high risks that could cause irreversible injury.
Artificial intelligence algorithms have the potential to negatively impact clinical outcomes and patient health. For instance, they may misdiagnose a condition that could be fatal to the patient or fail to accurately identify the boundaries of the heart in a cardiac image volume, requiring manual correction by a cardiologist.
Therefore, for any new AI algorithm and application, it is crucial to identify, assess, comprehend, and monitor the potential dangers in order to avoid risks and enhance benefits in the field of healthcare in the future. The process of risk assessment should include developing a system for grouping the hazards that have been identified into several categories representing different levels and types of risk.
To reduce and address the dangers associated with AI, a set of tests or rules must be provided for each level: higher risk classes will necessitate more testing and regulation, while lower risk classes will require fewer risk mitigation measures. A suitable risk classification system for artificial intelligence based on severity and likelihood will allow regulators, manufacturers, and healthcare providers to step in as needed to protect patients' rights and values. At the same time, it is critical that, to the greatest extent possible, these classifications do not stifle innovation in AI for healthcare.
The applicable legislation for medical AI tools in the EU comprises the Medical Devices Regulation (MDR, 2017/745) and the In Vitro Diagnostic Medical Devices Regulation (IVDR, 2017/746), both enacted in 2017. While the IVDR covers in vitro diagnostics, including AI-based ones, the MDR covers software used as a medical device, including AI-based software. These regulations introduced stricter pre-market control, more stringent clinical investigation criteria, enhanced surveillance throughout the device's lifecycle, and greater transparency through the establishment of a European database of medical devices. However, they do not take into account many AI-specific factors, such as the detection of algorithmic biases and the continuous learning of AI models; in particular, the fact that AI is a highly adaptive technology that keeps learning.
In 2018, the German Data Ethics Commission made one of the first proposals for risk assessment in the realm of artificial intelligence, suggesting that the risks associated with general decision-making algorithms be categorised based on their criticality, i.e. the system's potential for harm (German Data Ethics Commission, 2019). A 'criticality pyramid' was proposed, consisting of five risk/criticality levels: level 1 for zero or minimal potential for harm, level 2 for some potential for harm, level 3 for regular or considerable potential for harm, level 4 for serious potential for harm, and level 5 for untenable potential for harm. Depending on the risk level, the proposal suggests an adjusted testing or regulatory framework that may include supervision and correction processes, requirements for the transparency of algorithmic systems, and outcomes that are understandable and explicable.
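As a toy illustration of how such a criticality scheme could be operationalised, the following sketch maps expert estimates of severity and likelihood of harm onto the five levels of the pyramid; the scoring rule and thresholds are invented for illustration and have no basis in the Commission's proposal:

```python
# Toy sketch: mapping estimated severity and likelihood of harm onto
# the five-level criticality pyramid. The scoring rule and thresholds
# are invented for illustration and have no regulatory standing.
from enum import IntEnum

class Criticality(IntEnum):
    MINIMAL = 1       # zero or minimal potential for harm
    SOME = 2          # some potential for harm
    CONSIDERABLE = 3  # regular or considerable potential for harm
    SERIOUS = 4       # serious potential for harm
    UNTENABLE = 5     # untenable potential for harm

def classify(severity: float, likelihood: float) -> Criticality:
    """severity and likelihood are expert estimates in [0, 1]."""
    score = severity * likelihood
    if score < 0.05:
        return Criticality.MINIMAL
    if score < 0.2:
        return Criticality.SOME
    if score < 0.5:
        return Criticality.CONSIDERABLE
    if score < 0.8:
        return Criticality.SERIOUS
    return Criticality.UNTENABLE

# Hypothetical examples: a contouring aid whose output a clinician
# always reviews versus a largely autonomous high-stakes tool.
print(classify(severity=0.2, likelihood=0.1))  # Criticality.MINIMAL
print(classify(severity=0.9, likelihood=0.7))  # Criticality.SERIOUS
```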
In response to concerns about both safety and human rights, the European Commission (EC) released a long-awaited proposal for AI regulation, harmonising the laws that govern AI technologies throughout Europe (European Commission, 2021). Comparable to the German Data Ethics Commission's 2018 proposal, the draft EU framework included mandatory rules for high-risk AI systems and a risk-based classification of artificial intelligence. The document suggested categorising AI technologies into three primary risk levels: (i) unacceptable risk, (ii) high risk, and (iii) low or minimal risk. The highest level is for AI tools that should be banned because they contravene EU values.
Some examples of these AI tools are given in the document (Title II, Article 5), including social scoring, real-time biometric identification in public areas (with a few exceptions), exploiting vulnerabilities to cause physical or psychological harm, and subliminal manipulation that results in such harm.
High-risk AI falls into the intermediate group, which is of particular relevance. It can only be approved if the tools meet certain conditions. These high-risk AI tools (Title III, Chapter 1) include certain stand-alone AI systems in areas like the operation of critical infrastructure, access to private services, employment, and worker management, as well as safety components of regulated products (medical devices included). It appears that many medical AI tools, especially those that are autonomous, will be categorised as high-risk. The proposal provides concrete requirements and obligations for adequate risk management in high-risk AI, as listed in Box 1:
Box 1 – Requirements and obligations for high-risk AI tools according to the 2021 EC proposal
Images are not included in the reading sample.
The lowest category covers AI tools with minimal risk, which carry no mandatory obligations, although the EC encourages drawing up codes of conduct as well as the voluntary application of the requirements for high-risk AI systems or other requirements (Article 69).
In addition to these three categories of risk (unacceptable, high and low), the document (Article 52) discusses a further category of AI systems, such as those that interact with individuals or apply emotion recognition or biometric categorisation to them, for which there is an explicit obligation of transparency. In this case, individuals must be notified that they are interacting with an AI system (Figure 10).
Figure 10 – AI risk classification according to the 2021 EU proposal on AI legislation
Images are not included in the reading sample.
Although the draft AI regulation does not expressly address AI in healthcare, it does hint that medical equipment powered by AI will be categorised as high-risk due to related privacy and safety concerns. This means that in addition to the requirements outlined in Chapter II of the AI regulation (use of high quality and representative data, technical documentation and traceability, transparency requirement, human oversight, quality management system, conformity assessment, etc.), future medical AI tools should also meet all the requirements already set forth by the Medical Device Regulation.
One could counter that not all medical AI tools carry a uniformly high risk. For instance, numerous AI tools have been developed in radiology to speed up the contouring of organs and lesions on medical images prior to quantification and diagnosis (e.g. contouring the boundaries of the cardiac ventricles or of lung tumours). These AI-powered processing tools are valuable and are already in use in clinical settings, yet they arguably do not need to meet the strictest transparency requirements: the risks are low, because physicians can visually evaluate the outcomes of the automatic contouring and make any necessary corrections. Mechanisms to distinguish between low- and high-risk AI in healthcare may therefore be required in order to continue promoting breakthroughs and investments in this field.
Under this new regulatory framework, CE marking and regulatory approval for medical AI would proceed along the following steps: (1) determine whether the new AI regulation considers the AI technology to be high-risk; (2) verify that the design, development, and quality control of the AI system adhere to the AI regulation; (3) follow the conformity assessment procedure to evaluate and demonstrate compliance; (4) sign a declaration of conformity and affix the CE marking to the system; and (5) put the AI technology to use in real-world scenarios or release it for sale.
The EC's proposed regulation of AI is broad and applies to all spheres of society; it does not address the unique characteristics and potential hazards of AI in the healthcare industry. Additionally, the EC proposal retains several of the MDR's and IVDR's shortcomings, such as the absence of procedures to handle the dynamic nature of AI technologies. Continuous learning, which is now central to medical AI technologies, might be viewed as a significant change that calls for re-evaluating the technology.
Risk reduction by means of risk self-evaluation
Several stakeholders have proposed a structured self-assessment method consisting of predetermined checklists and questions for risk identification in AI. The European Commission, for instance, formed the independent High-Level Expert Group on Artificial Intelligence (AI HLEG), which released the ALTAI assessment checklist for trustworthy AI. The checklist is divided into seven sections: (1) human agency and oversight; (2) technical robustness and safety; (3) privacy and data governance; (4) transparency; (5) diversity, non-discrimination and fairness; (6) societal and environmental well-being; and (7) accountability (ALTAI, 2020).
Box 2 gives examples of the self-assessment questions on reliability, privacy, explainability, and fairness that were suggested as ways to identify possible limitations:
Box 2: ALTAI checklist self-assessment question examples (ALTAI, 2020)
Regarding reliability:
If the AI system has poor repeatability and/or dependability, might it have detrimental, adversarial, or catastrophic effects (for example, affecting human safety)?
Have you established a clear procedure to check whether the AI system is accomplishing its stated objectives?
Have you investigated if repeatability requires consideration of particular contexts or conditions?
Have you implemented techniques for verification and validation as well as documentation (such as logging) to assess and guarantee various facets of the repeatability and dependability of the AI system? Have you spelt out in detail the procedures you used to evaluate and confirm the AI system's repeatability and dependability?
Have you established a suitable protocol to address situations in which the AI system produces outcomes with a low level of confidence?
Does your AI system make use of continuous learning (online)?
Regarding data privacy:
Have you implemented any of the following measures, some of which are required by the General Data Protection Regulation (GDPR), or their non-European equivalents?
a Data Protection Impact Assessment (DPIA);
the appointment of a Data Protection Officer (DPO), involved early on in the AI system's development, acquisition, or use phase;
measures (such as encryption, pseudonymisation, aggregation, and anonymisation) to achieve privacy-by-design and by default.
Have you included the right to object, the right to withdraw consent, and the right to be forgotten into the AI system's development process?
Regarding explainability:
Have you informed the users of the AI system's decision or decisions?
Do you often ask users if they comprehend the AI system's decision or decisions?
For evaluating fairness:
Images are not included in the reading sample.
The complete evaluation checklist, with questions for every category, is available online from the Publications Office of the European Union (ALTAI, 2020); registered users can also access it as an online tool. It is crucial to remember that the list was created with AI in general in mind and needs to be customised for each specific application area, including healthcare.
As far as we know, the first self-assessment checklist for AI in healthcare was produced by an Australian multidisciplinary research team in 2021. Its goal was to assist medical professionals in determining whether algorithms are ready for routine care and to identify any areas requiring further development and optimisation prior to implementation (Scott et al., 2021). The list was compiled from several narrative reviews on artificial intelligence in healthcare, condensed into a series of evaluation questions grouped into ten broad categories, as shown in Box 3.
Box 3 – Questions from the assessment checklist for medical AI tools, as shown in Scott et al., 2021
Images are not included in the reading sample.
This self-assessment list is less detailed, however, than the assessment checklist for general AI created by the AI HLEG. Point 10 in Box 3, for instance, is somewhat ambiguous and does not allow one to identify the precise ethical, legal, or social issue at stake (e.g. algorithmic bias). Combining the two approaches would appear to yield a comprehensive and standardised risk assessment checklist for artificial intelligence in healthcare, produced by consensus and with a thorough set of assessment questions for every risk category.
This is what recently spurred a network of EC-funded research projects and international multidisciplinary experts to produce consensus recommendations for trustworthy AI in medicine. Titled FUTURE-AI (www.future-ai.eu), these guidelines consist of specific recommendations and a self-assessment checklist to help AI designers, developers, evaluators, and regulators create trustworthy and ethical AI solutions for healthcare and medicine. They are organised around six principles (Fairness, Universality, Traceability, Usability, Robustness, Explainability) (Lekadir et al., 2022). Examples of risk assessment questions from the FUTURE-AI self-assessment checklist are provided in Box 4.
Fairness: Did you work with a diverse group of stakeholders to create your AI algorithm? Did you gather end-user needs from a variety of sources?
Has fairness been defined for your particular AI application? Have you questioned doctors about potential hidden causes of data imbalance?
Have you assessed your AI algorithm's fairness in-depth? Did you employ appropriate measurements and datasets?
Universality: Did you annotate your dataset in a methodical, repeatable, and consistent manner?
Did you evaluate the performance of your model using measurements and criteria that are universal, transparent, comparable, and repeatable?
Did you test your model on a minimum of one publicly available benchmark dataset that reflects the task of your model and the anticipated real-world data exposure upon deployment?
Traceability: Did you compile comprehensive dataset documentation? Have you added the necessary metadata?
Did you maintain an organised record of the entire input data pre-processing pipeline? Did you describe your pre-processing and data preparation processes' input/output, nature, prerequisites, and requirements?
Have you documented every detail of the training procedure? Have you provided a thorough explanation of input predictors?
Usability: Were users involved in the creation and design of the AI tool?
After integrating your product into the clinical sites' workflows, did you assess its usability?
Robustness: Did you use diverse datasets from various healthcare centres and data processes to train and assess your tools?
Have you tested the AI tool in a variety of real-world situations?
Did you look for any possible anomalies or deviations in the input data using any quality control methods?
Explainability: Did you discuss which explainability techniques work best with the clinicians?
Did you assess the robustness and reliability of the explanations using any quantitative evaluation tests? Have you conducted any qualitative assessment tests with medical professionals?
It has also been argued that AI risk assessment needs to be further tailored to particular medical domains. For instance, a number of well-known radiological societies in North America and Europe (the American College of Radiology, European Society of Radiology, Radiological Society of North America, Society for Imaging Informatics in Medicine, European Society of Medical Imaging Informatics, Canadian Association of Radiologists, and the American Association of Physicists in Medicine) came together to release a statement on the ethical challenges of using AI in radiology, stating that 'the radiology community should start now to develop codes of ethics and practice for AI which promote any use that helps patients and the common good' (Geis et al., 2019).
The assessment checklists presented in this section have used different categories of risk, as well as different assessment questions. Standardising, adjusting and validating these approaches through consensus by professional societies and independent groups on a domain-by-domain basis (e.g. radiology vs. surgery) would result in more robust processes for risk identification and management. Furthermore, as more and more healthcare AI algorithms undergo self-assessment for ethical, legal and technical risks, these checklists should be regularly refined, with updated versions released to the community to take into account continuous developments in AI methods, processes and regulations.
Risk identification through comprehensive, multi-faceted clinical evaluation of AI solutions
To identify, anticipate and manage risks in medical AI, adequate procedures for evaluating the AI models are of central importance. The majority of AI evaluation thus far has focused on testing the robustness and accuracy of models in controlled environments. Other elements of AI are harder to assess in controlled contexts and have received less attention in the scientific literature; these include clinical safety and effectiveness, fairness and non-discrimination, transparency and traceability, and privacy and security.
The United States Food and Drug Administration (FDA) recognised these gaps and in 2021 proposed an action plan to close them by strengthening agency oversight of medical AI software and promoting 'regulatory science efforts to develop methodology for the evaluation and improvement of machine learning algorithms' (FDA, 2021). Simultaneously, a number of research teams, particularly in North America (Larson et al., 2021; Park et al., 2020), Europe, and Asia (Park & Han, 2018), as well as global organisations like the International Medical Informatics Association (Magrabi et al., 2019), have investigated and suggested fresh methods for the enhanced and thorough evaluation of medical AI algorithms. As shown in Figure 11, in this section we condense their findings into a set of five key suggestions to facilitate a thorough and multifaceted assessment of upcoming AI software in the healthcare industry.
Figure 11: Suggestions for enhancing the assessment of algorithmic risks and performance in medical artificial intelligence
Images are not included in the reading sample.
Standardised terminology for clinical tasks
Stanford University researchers have recently recommended standardising the characterization of the clinical tasks that the AI algorithms are addressing in order to facilitate the objective and comparable evaluation of medical AI solutions (Larson et al., 2021). A clinical task, like medical diagnostics, can be defined in a variety of ways in practice. For instance, various strategies have been proposed for the diagnosis and reporting of COVID-19 severity based on chest imaging scans (Larson et al., 2021). These schemes include:
Two groups: The presence or absence of the disease is labelled by the radiologist.
The Radiological Society of North America (RSNA) (Simpson et al., 2020) suggested the following four categories: (1) typical; (2) indeterminate; (3) atypical appearance; and (4) negative for pneumonia.
Six categories based on the CO-RADS scale (Prokop et al., 2020): (1) negative; (2) low; (3) uncertain; (4) high; (5) very high; and (6) PCR-positive.
There are several ways to score the severity of lung lesions: (i) a severity rating from 0 to 4 for each of the six lung zones, giving a total score between 0 and 24; (ii) a severity rating from 0 to 5 for each of the five lung lobes, giving a total score between 0 and 25; or (iii) a severity rating from 0 to 7 for each of the five lung lobes, giving a total score between 0 and 35.
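The arithmetic behind these scoring conventions is straightforward; purely as an illustration, the following Python sketch (with invented per-region ratings) computes the total severity score under each of the three schemes:

```python
# Minimal sketch: total lung-severity scores under the three
# conventions above. The per-region ratings are hypothetical.
def total_score(ratings, max_per_region, n_regions):
    assert len(ratings) == n_regions
    assert all(0 <= r <= max_per_region for r in ratings)
    return sum(ratings)

# (i) six lung zones rated 0-4: total in [0, 24]
print(total_score([2, 3, 1, 4, 0, 2], max_per_region=4, n_regions=6))  # 12
# (ii) five lung lobes rated 0-5: total in [0, 25]
print(total_score([1, 2, 3, 4, 5], max_per_region=5, n_regions=5))     # 15
# (iii) five lung lobes rated 0-7: total in [0, 35]
print(total_score([7, 6, 5, 4, 3], max_per_region=7, n_regions=5))     # 25
```

Because the three schemes use different ranges, scores produced under one convention cannot be directly compared with scores produced under another, which is precisely the comparability problem discussed next.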
An AI-based algorithm could implement any of these diagnostic schemes, which makes it more challenging to evaluate the algorithm's effectiveness and related hazards objectively. The coexistence of different definitions also makes it more difficult to directly compare AI-based algorithms created for the same clinical task. Clinical task definitions have, until now, usually been created with minimal supervision and coordination. Since these clinical tasks will increasingly be performed by AI algorithms developed by non-clinical developers, the definitions, which form part of the AI software specifications, should be created in accordance with widely accepted consensus-based standard-setting principles and maintained by non-conflicted entities committed to updating them based on new evidence and input from relevant stakeholders. Medical associations such as the European Society of Cardiology, the European Society of Radiology, or the European Society for Medical Oncology could greatly aid the standardisation of clinical tasks for medical AI in their respective domains. This strategy will help ensure that AI solutions are broadly accepted by relevant stakeholders, by limiting the developers' responsibility to improving the performance of the AI algorithms against widely agreed and used reference diagnostic task specifications.
Multifaceted performance assessment that goes beyond accuracy
In light of the various hazards and moral dilemmas associated with medical artificial intelligence, it is now widely acknowledged that the assessment of these algorithms needs to go well beyond current methods, which have mostly concentrated on model accuracy. Even though researchers continue to debate the empirical evaluation of machine learning algorithms, there is a need to establish specific AI performance domains for the healthcare industry. Examples of performance components recently suggested for AI-based radiology diagnostic algorithms are displayed in Table 2 (Larson et al., 2021); in addition to classification accuracy, these include monitorability, usability, dependability, transparency, and applicability.
Images are not included in the reading sample.
Examples of imaging AI algorithm performance elements are shown in Table 2 (from Larson et al., 2021).
Nevertheless, this list appears incomplete, since certain significant problems associated with AI in healthcare, such as algorithmic bias and inequality, are not taken into account. Worth mentioning among the few studies that have directly investigated AI fairness in medicine is a recent one that assessed state-of-the-art deep neural networks on large public chest X-ray datasets with respect to patient sex, age, race, and insurance type, the latter serving as a proxy for socioeconomic status (Seyyed-Kalantari et al., 2020). The study concluded that "models trained on large datasets do not provide equality of opportunity naturally, leading instead to potential disparities in care if deployed without modification". In this work, the authors used true positive rates (TPR) as a fairness metric; however, alternative criteria, including statistical parity, group fairness, equalised odds, and predictive equality, have also been put forth in the literature (Barocas et al., 2017).
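As a concrete illustration of this TPR-based fairness analysis, the following sketch computes the true positive rate separately for each patient subgroup and reports the largest gap; the labels, predictions, and group identifiers are synthetic and purely illustrative:

```python
# Minimal sketch: per-group true positive rate (TPR) as a fairness
# metric, in the spirit of Seyyed-Kalantari et al. Data are synthetic.
import numpy as np

def tpr_by_group(y_true, y_pred, groups):
    """TPR = share of actual positives correctly predicted, per group."""
    rates = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)   # positives in group g
        rates[g] = float(y_pred[mask].mean()) if mask.any() else float("nan")
    return rates

y_true = np.array([1, 1, 0, 1, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rates = tpr_by_group(y_true, y_pred, groups)
gap = max(rates.values()) - min(rates.values())
print(rates, f"TPR gap: {gap:.2f}")  # a large gap signals underdiagnosis
```

A persistent TPR gap between subgroups would indicate that one group is systematically underdiagnosed by the model, exactly the disparity described above.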
Clinical usability is another component of medical AI that has been suggested for validation with end users, given the existing lack of literacy and trust in AI. The AI algorithm and its visual interfaces should enable the operator to intuitively know how to use the tool with as little training as possible, to impose the least amount of cognitive workload on the user, and to enhance clinical efficiency by decreasing decision-making time in order to maximise clinical acceptance, perceived utility, and future adoption. Questionnaires can be used in usability assessments to collect both qualitative and quantitative data regarding user satisfaction with the AI product (Lewis, 2018). For example, the researchers in Tanguay-Sela et al. (2020) employed certain usability questions, as shown in Box 5, to evaluate the usability of an AI-powered system for depression therapy.
Images are not included in the reading sample.
Additional usability elements that could be assessed through a usability questionnaire are: the degree to which patients and clinicians comprehend diagnosis; the degree to which patients and clinicians comprehend treatment options; the degree to which patients and clinicians perceive the quality of communication between patient and doctor; the degree to which clinicians can interpret the AI-driven predictions; the degree to which patients and clinicians understand technical terminology; the usefulness of error messages/alerts; the overall ease of use; the effect on the productivity of clinicians; and so forth.
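As a small illustration of the quantitative side of such questionnaires, Likert-scale responses can be aggregated into per-item and overall usability scores; the items and responses below are invented:

```python
# Minimal sketch: aggregating Likert-scale (1-5) usability responses
# into per-item means. Items and responses are invented for illustration.
from statistics import mean

responses = {
    "ease_of_use":            [4, 5, 3, 4],
    "interpretability":       [3, 2, 3, 4],
    "impact_on_productivity": [4, 4, 5, 3],
}

for item, scores in responses.items():
    print(f"{item}: mean {mean(scores):.2f} / 5")
overall = mean(s for scores in responses.values() for s in scores)
print(f"overall usability: {overall:.2f} / 5")
```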
An AI tool's validation as accurate, dependable, equitable, and user-friendly does not guarantee that patients will gain from it. To verify clinical utility and facilitate the acceptance and recommendation of AI technology by clinical specialists, academic societies, or independent third-party organisations, researchers from South Korea proposed evaluating the impact on patient outcomes (Park & Han, 2018). Given the significant investments made in medical research, a systematic assessment of cost-efficiency must be carried out in addition to proving clinical effectiveness, rather than simply assuming the promised cost savings and efficiency gains. Economic evaluations employing decision-analytic modelling (Hill et al., 2020) might be used to determine whether the expenses of further AI solutions are justified given the modelled effect, for instance on health-related quality of life (QALY, or quality-adjusted life years). Crucially, the cost-effectiveness study must account for both the upfront expenditure and the ongoing expenses associated with a given AI infrastructure and service (Wolff et al., 2020). Lastly, because AI algorithms learn more over time as new data become available, it is critical to modify current validation frameworks to allow for ongoing performance monitoring of the AI tool in the clinical setting.
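A standard quantity in such decision-analytic evaluations is the incremental cost-effectiveness ratio (ICER), i.e. the extra cost per QALY gained; the sketch below computes it for entirely invented figures, including both upfront and ongoing costs:

```python
# Minimal sketch: incremental cost-effectiveness ratio (ICER) of an
# AI-assisted pathway versus standard care. All figures are invented.
def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Extra cost per quality-adjusted life year (QALY) gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical pathway costs: upfront AI infrastructure PLUS
# ongoing service costs, as stressed by Wolff et al. (2020).
ai_cost = 1_200_000 + 5 * 80_000      # upfront + 5 years of running costs
ai_qalys, std_qalys = 4_150.0, 4_000.0
std_cost = 1_000_000.0

ratio = icer(ai_cost, ai_qalys, std_cost, std_qalys)
print(f"ICER: EUR {ratio:,.0f} per QALY gained")
# Compare against a willingness-to-pay threshold, e.g. EUR 30,000/QALY.
print("cost-effective" if ratio < 30_000 else "not cost-effective")
```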
Dividing the assessment procedure into distinct stages
Rather than evaluating medical AI solutions in a single operation, a few papers have lately suggested applying a multi-stage strategy in which the algorithms undergo multiple steps of review with distinct goals and growing complexity. For instance, four stages (phases I through IV) were suggested for the validation of AI in the field of diagnostic imaging, testing respectively for feasibility, capability, effectiveness, and durability (Larson et al., 2021) (Figure 12):
Phase I: Feasibility. The objective is to carry out a first/pilot evaluation of the algorithm in a lab setting with optimal parameters, usually using one small test dataset. In this phase, results are directly compared with those obtained from skilled physicians or from algorithms that currently handle the same clinical task. Since the only objective at this point is to determine feasibility, the AI algorithms do not need to be completely robust, and the results might be published in a journal even though the algorithm has not yet been proven to work in a clinical setting.
Phase II: Capability. The objective of this phase is to replicate real-world circumstances in a lab setting and to assess, as well as adjust, the AI algorithm in order to improve its capabilities. The stage may also be known as in-silico validation (Viceconti et al., 2021) or virtual clinical trials (Abadi et al., 2020), referring to computer simulation. Reliability can be examined during this stage by simulating the input data and the potential clinical settings of use. Safety tests, which include simulating unforeseen scenarios, will assess the efficacy of the algorithm in mitigating potential harm. Furthermore, end users, especially operators and clinicians, should be involved in this phase, in order to assess their actions and choices in response to the AI algorithm's outputs under the simulated circumstances.
Phase III: Effectiveness. At this stage, validation is transferred to the clinical setting, with local validations carried out at specific clinical sites to evaluate performance in the real world. The main goal is to verify that the algorithm performs as well in the real world as it did in the test environment. The AI algorithm will be updated and optimised based on all comments and results from this phase, and re-evaluated in the controlled environment, as in earlier phases, before undergoing another round of local clinical evaluation. Local clinical evaluations can reveal site-specific quality-control issues, which AI vendors need to address in collaboration with the local healthcare sites.
Phase IV: Durability. To achieve continuous improvement, the manufacturer must now set up a system for ongoing performance assessment and monitoring. They might incorporate monitoring or auditing mechanisms within their AI solution to automatically identify, fix, and report faults, as well as to gather user and clinical feedback. Furthermore, before being re-used in the clinic, the AI algorithms should be updated and improved based on the mistakes and issues found over time, for example by using new training data, and then retested in a controlled setting.
Images are not included in the reading sample.
Figure 12 – An illustration of a multi-phase method for evaluating medical AI
As shown in Table 3, researchers from IBM Research have suggested a different partition of the assessment procedure, drawing on analogies with testing processes in the drug discovery industry (Park et al., 2020).
Table 3: Snippets of the segmented medical AI evaluation process based on procedures used in the pharmaceutical industry (Park et al., 2020)
Images are not included in the reading sample.
Even so, there are similarities between the two subdivisions of the evaluation process presented in this section (Figure 12 and Table 3). The main idea of the first subdivision (Figure 12) is to separate the environments and populations in which the algorithm is tested: small datasets to show feasibility, simulated environments to test robustness to contextual changes, and clinical settings to show real-world applicability. The second approach (Table 3) instead focuses each step on a specific risk and clinical component, such as safety, effectiveness, usability, and efficacy, but does not always separate the testing environments (medical settings are employed in both phases 2 and 3).
In both multi-stage evaluation approaches, each stage is contingent upon the successful completion of the previous one, which minimises expenses: algorithms that perform poorly in a controlled context are practically guaranteed to perform poorly in the real world. These multi-stage and multifaceted evaluation approaches are promising because they take into account the complexity of AI-guided healthcare delivery, which is amplified by user- and context-dependent applications, even though they still need to be further developed and adopted by the relevant stakeholders.
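Conceptually, this gating between stages can be expressed as a short pipeline in which each phase runs only if the previous one passed; the phase names below follow Figure 12, while the pass/fail logic is a placeholder:

```python
# Conceptual sketch of gated multi-stage evaluation: each phase runs
# only if the previous one passed. The phase checks are placeholders.
from typing import Callable

def run_staged_evaluation(phases: list[tuple[str, Callable[[], bool]]]):
    for name, check in phases:
        if not check():
            print(f"{name}: FAILED - halting; later (costlier) phases skipped")
            return False
        print(f"{name}: passed")
    return True

run_staged_evaluation([
    ("Phase I  (feasibility, small lab dataset)",    lambda: True),
    ("Phase II (capability, simulated settings)",    lambda: True),
    ("Phase III (effectiveness, clinical sites)",    lambda: False),
    ("Phase IV (durability, continuous monitoring)", lambda: True),
])
```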
Encouragement of external assessments conducted by independent assessors
Internal validation is the process of assessing an AI model's performance using datasets comparable to those used for model development and training. Owing to its simplicity of implementation, this approach to algorithm validation was the most frequently reported in the early days of medical AI. Internal validation, however, is limited in its ability to identify all risks associated with changes in the data or clinical environment, and it is likely to be biased and to overestimate performance, even when conducted by developers and manufacturers with a culture of quality and good practice in medical AI. A 2019 study that reviewed over 500 research publications in the field of radiology found that a mere 6% of the presented AI algorithms had undergone external validation (Kim et al., 2019). Thus, a number of academics and public figures have suggested in recent years that the external assessment of AI algorithms in healthcare be encouraged (Park & Han, 2018; Larson et al., 2021).
When AI technologies are evaluated using entirely different, external datasets, this is referred to as external validation. The external datasets should adequately represent the variability of the target population and of the intended use of the AI solution. To assess the generalisability of a specific AI algorithm outside the controlled context in which it was developed, such data should preferably originate from different clinical sites and geographic areas. This makes it possible to evaluate the AI system, for instance, in situations where the technical characteristics of the data acquisition differ (e.g., differences in imaging scanners and protocols between hospitals). In addition, numerous researchers have suggested that shared reference datasets, obtained from representative real-world populations, be used to externally benchmark and assess AI models. Such reference datasets allow straightforward comparison with related algorithms that have previously been assessed on the same data. For instance, the Cancer Imaging Archive (www.cancerimagingarchive.net), established by the National Cancer Institute in the United States in 2010, is a vast collection of cancer imaging datasets spanning all cancer types that is widely used for external validation and algorithm comparison.
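To make the contrast concrete, the sketch below (scikit-learn on synthetic data; the 'scanner noise' parameter is an invented stand-in for between-hospital acquisition differences) compares an internal estimate on held-out development data with an external estimate on a cohort from a different site:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def make_cohort(n, scanner_noise=0.0):
    """Synthetic cohort; 'scanner_noise' is an invented stand-in for
    acquisition differences (e.g. another scanner or protocol) that
    corrupt the most informative feature at the external site."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)
    if scanner_noise:
        X[:, 0] = X[:, 0] + rng.normal(scale=scanner_noise, size=n)
    return X, y

X_dev, y_dev = make_cohort(2000)                    # development site
X_ext, y_ext = make_cohort(500, scanner_noise=1.5)  # external site

# Internal validation: held-out data from the development site.
X_train, X_int, y_train, y_int = train_test_split(
    X_dev, y_dev, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print("internal AUC:", roc_auc_score(y_int, model.predict_proba(X_int)[:, 1]))
# External validation: a different population probes generalisability and
# typically yields a less optimistic, more realistic estimate.
print("external AUC:", roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```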
The European Commission has recently financed several research efforts, such as the EuCanImage project (https://eucanimage.eu), to create European repositories of reference cancer imaging datasets. External validation should preferably be conducted by third-party assessors, to ensure that the AI algorithm is thoroughly and objectively evaluated in accordance with the performance standards—accuracy, dependability, equity, and usability—described in the preceding section. Research labs, clinical research groups, and independent organisations that create and manage reference-standard datasets are examples of such third-party evaluators. The specialisation of these testing bodies would enable the highest standards, quality, and objectivity in the assessment and monitoring of AI solutions in healthcare, reducing hidden risks and boosting confidence in medical AI for practical use. It is noteworthy that Testing and Experimentation Facilities (TEFs) are being developed in Europe as part of new research programmes prepared under DIGITAL EUROPE. Once established, TEFs would substantially facilitate external validation of medical AI tools, particularly for enterprises.
Standardised and thorough reporting of the AI assessment process and outcomes
Transparent reporting and documentation of the validation process is crucial to further improving the usability and trustworthiness of AI tools. For developers, researchers, and other stakeholders, this kind of reporting will facilitate critical evaluation. If needed, it should also aid in replicating the AI algorithm and its outcomes. The need for comprehensive and standardised reporting guidelines for predictive models in healthcare was recognised by researchers before the widespread use of artificial intelligence. One such guideline is TRIPOD, or Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (Collins et al., 2015). Following its initial publication in 2015, the TRIPOD statement was widely embraced by the biomedical community. TRIPOD offers instructions on how to properly report the development of a prediction model so that its possible bias and utility can be evaluated (Collins et al., 2015). Specifically, the TRIPOD statement contains a checklist of 22 items considered essential for transparent reporting of a prediction model study, as shown in Box 5.
Box 5 – Essential items to be included when reporting a prediction model, according to TRIPOD
While the primary goal of TRIPOD is to enhance reporting, it also makes prediction models easier to comprehend and analyse thoroughly. This guarantees that the models can be further examined and used to guide healthcare delivery, improving clinical translation, trust, and reproducible research. The TRIPOD statement has not been widely adopted by the AI community, despite the fact that many of its features are naturally suited to prediction model studies using machine learning techniques. Potential explanations for the low adoption rate include minor differences in terminology and a perceived inapplicability, because TRIPOD, in its original form, concentrated on regression-based prediction model techniques rather than machine-learning-based ones. In response to the need for more AI-specific reporting criteria, an extension of TRIPOD dedicated to machine-learning-based health prediction models has been announced (Collins & Moons, 2019).
The work of the CONSORT consortium (Consolidated Standards of Reporting Trials) is another example of reporting and validation criteria. With the CONSORT-AI statement, the consortium has expanded its 2010 reporting guidelines to include AI-specific components. While the original guidelines recommended reporting elements such as the title, trial design, participants, interventions, outcomes, and sample size, the extended CONSORT-AI statement suggests that researchers 'provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention is integrated, the handling of inputs and outputs of the AI intervention, the human–AI interaction and provision of an analysis of error cases' (Liu et al., 2020). The AI-specific items that the CONSORT-AI extension adds to those listed in the 2010 CONSORT standards are shown in Box 6 (Liu et al., 2020).
Box 6 – Reporting components for medical AI in clinical trials, according to the CONSORT-AI guidelines
MINIMAR (MINimum Information for Medical AI Reporting) is a new set of guidelines for reporting AI solutions in healthcare, developed by researchers at Stanford University (Hernandez-Boussard et al., 2020). The MINIMAR standards outline the minimum information required to understand target populations, evaluation procedures, model architecture, intended predictions, and hidden biases. As indicated in Table 4, the MINIMAR guidelines are specifically tailored to medical AI and consist of reporting items in four primary categories.
Table 4 – Reporting components from the MINIMAR reporting standards
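Although Table 4 is not reproduced here, a MINIMAR-style report can also be captured in machine-readable form. The sketch below is a hypothetical example organised around the four primary categories of Hernandez-Boussard et al. (2020); all field names and values are invented placeholders, not a real study:

```python
# A minimal, hypothetical sketch of a machine-readable MINIMAR-style report.
# The four top-level categories follow Hernandez-Boussard et al. (2020);
# every value below is an invented placeholder.
minimar_report = {
    "study_population_and_setting": {
        "data_source": "EHR, single academic hospital (hypothetical)",
        "cohort_selection": "adults admitted 2015-2019",
        "setting": "inpatient",
    },
    "patient_demographics": {
        "age": "median 62 (IQR 51-73)",
        "sex": {"female": 0.48, "male": 0.52},
        "race_ethnicity": "reported per site registry",
        "socioeconomic_status": "insurance type used as proxy",
    },
    "model_architecture": {
        "model_type": "gradient-boosted trees",
        "features": ["labs", "vitals", "diagnoses"],
        "output": "risk of deterioration within 48h",
    },
    "model_evaluation": {
        "validation": "external, two independent sites",
        "metrics": {"AUROC": 0.81, "calibration_slope": 0.95},
    },
}
```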
Such a reporting model for medical AI evaluation will foster transparency, thoroughness, and trust: it compiles all the important information from the AI evaluation studies into a single comprehensive document and helps publishing editors, AI developers, physicians, and researchers understand, interpret, and critically evaluate the quality of the AI study design, validation, and outcomes.
Policy options
This section outlines seven policy options (Figure 13) for better creating, assessing, implementing, and using technically, clinically, and ethically sound AI solutions in future healthcare.
Figure 13 – Summary of policy options suggested in this report
Expand AI codes of conduct and regulatory frameworks to handle risks and requirements unique to the healthcare industry.
As delineated in section 4.1, medical AI devices are currently governed by the MDR and IVDR regulations, adopted in 2017. In 2021, the European Commission (EC) also proposed a new regulation on artificial intelligence (AI). It imposes additional obligations and requirements on high-risk applications, such as medical AI technologies, including the establishment and implementation of quality management systems in organisations, conformity assessment and possible reassessment of AI systems (should they undergo significant modification), and post-market monitoring.
Although the new framework has been developed for AI technologies in general, medical AI tools are deemed high risk and are therefore subject to greater scrutiny. The requirements, however, are laid out in a generic way, even though AI in healthcare comes with unique, high-risk technical, clinical, and socio-ethical issues, as this report demonstrates.
Therefore, it is critical that codes of conduct and regulatory frameworks for medical AI be expanded upon and implemented (as outlined in sections 4.2 and 4.3). The need to update the regulatory approval of AI-driven medical devices has been expressed in a number of countries, including the United States (Harvey & Gowda, 2020; Allen, 2019), Japan (Chinzei et al., 2018; Ota et al., 2020), and China (Roberts et al., 2020). In particular, the Artificial Intelligence and Machine Learning (AI/ML) Software as a Medical Device Action Plan (FDA, 2021), published by the U.S. Food & Drug Administration (FDA) in 2021, calls for patient-centred approaches, good machine learning practices, and regulations specifically designed for medical AI.
To customise current frameworks and AI techniques for the medical domain, multifaceted risk assessment, as described in section 4.2, ought to be a crucial component of the medical AI development and certification process. Additionally, risk assessment needs to be domain-specific, because different fields—such as radiology, surgery, genomics, mental health, child health, and home care—have different clinical, social, and ethical risks and restrictions. To evaluate and detect multifaceted risks and limitations, the validation of medical AI technologies should be standardised and enhanced, examining not just model accuracy and robustness, but also algorithmic fairness, clinical safety, clinical acceptance, transparency, and traceability.
A significant recommendation for enhanced medical AI validation and certification (presented in section 4.3) is the introduction and generalisation of third-party external validation by independent bodies specialised in this process. This will make it possible to validate medical AI technologies more expertly and objectively, while methodically accounting for the variations found in socio-ethical settings and actual clinical practices.
Encourage collaboration and multi-stakeholder involvement at every stage of the development of medical AI algorithms.
Many stakeholders beyond AI developers, including clinicians, patients, social scientists, healthcare managers, and AI regulators, will be crucial to the acceptance and practical application of medical AI systems in the future. New strategies are therefore required to ensure that AI tools are fully aligned with the variety of real-world needs and circumstances, and to foster inclusive, multi-stakeholder engagement in medical AI.
According to Leone et al. (2021), AI manufacturers should therefore base the development of future AI algorithms on co-creation, which entails close and ongoing cooperation between AI developers and clinical end users, as well as with other pertinent experts such as biomedical ethicists. These partnerships ought to exist from the conception and development phases onward.
AI algorithms that better reflect the requirements and cultures of healthcare professionals can be designed by integrating human- and user-centred techniques across the whole AI development process; this will also help to identify and mitigate hazards early on. The focus will thereby shift towards maximising end-users' clinical performance and the public's health benefits, while taking current social, ethical, and legal requirements into account.
Through active user involvement, future medical AI implementations will closely analyse the anticipated interactions between end-users and the algorithms (known as human–computer interaction) (Xu, 2019). To enable human-centred and clinically meaningful presentation of explanations for machine learning model predictions in healthcare, visual interfaces should be carefully designed based on requirements from the clinical end-users (Barda et al., 2020). This would enhance the explainability and acceptability of the predictions and decisions made by AI and enable a reduction in human error.
Lastly, multi-stakeholder engagement and co-creation will address particular social issues pertaining to equity, equality, and fairness. These issues are application-specific and call for knowledge of the clinical tasks, potential confounding variables, and pertinent group differences; therefore, ongoing cooperation between the domain experts, healthcare professionals, social scientists, and members of the real community—especially those from underrepresented groups—is essential.
Create an AI passport and traceability mechanisms for enhanced transparency and trust in medical AI
To improve the transparency of AI systems across their lifetime, new strategies and procedures are required. Transparency, including but not limited to documenting the entire AI development process, is essential to understanding exactly what has happened when something goes wrong in the clinical implementation of medical AI. This kind of documentation and transparency helps eliminate potential ambiguities and gaps in accountability (Felzmann et al., 2020).
An 'AI passport' for the uniform description and traceability of medical AI tools is one option that regulatory authorities for medical AI may implement (see illustration in Figure 14). Such a passport should describe and track key information about the AI technology, covering at least five areas:
Information about the model (its creators, reviewers, and owners, its intended therapeutic applications, relevant licences, algorithmic details, hyper-parameters, essential assumptions, and requirements).
Information about the data (data types, such as images; real vs. simulated datasets; training vs. testing data; and data origins).
Information about the evaluation (the model's accuracy, robustness, biases, limitations, and edge cases).
Information about utilisation (statistical distributions of inputs, disagreements with physicians, detected failures, memory usage, etc.).
Information about maintenance (latest updates, software versioning, dates, last periodic evaluation, etc.).
Standardising the AI passport will allow for uniform traceability between nations and healthcare institutions.
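As a sketch of what such standardisation could look like in practice, the following Python data structure mirrors the five information areas listed above. The field names and example values are illustrative assumptions, since no standard passport schema has been defined yet:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class AIPassport:
    """Sketch of the five information areas of the proposed 'AI passport'.
    Field names are illustrative; no standard schema exists yet."""
    model_info: Dict[str, str]        # creators/owners, intended clinical use,
                                      # licences, algorithm, hyper-parameters
    data_info: Dict[str, str]         # data types, real vs simulated,
                                      # training vs testing data, origins
    evaluation_info: Dict[str, str]   # accuracy, robustness, biases,
                                      # limitations, edge cases
    utilisation_info: Dict[str, str]  # input distributions, clinician
                                      # disagreements, detected failures
    maintenance_info: Dict[str, str]  # software versions, update dates,
                                      # last periodic evaluation

# Hypothetical example instance (all values invented).
passport = AIPassport(
    model_info={"owner": "ExampleMed Ltd (hypothetical)",
                "intended_use": "chest X-ray triage"},
    data_info={"types": "images", "origin": "3 EU hospitals (hypothetical)"},
    evaluation_info={"AUROC": "0.88 (external)",
                     "known_biases": "under-represents patients <30y"},
    utilisation_info={"detected_failures": "2 this quarter"},
    maintenance_info={"version": "1.3.0", "last_audit": "2021-11"},
)
```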
Figure 14 – Example of a potential AI passport, recording all relevant information about an AI tool (its intended use, model and data details, evaluation findings, and data from ongoing monitoring and auditing) to increase traceability and transparency in medical AI
Moreover, medical AI is a highly dynamic technology that constantly incorporates new users, devices, and data into its processes. The idea of traceability therefore needs to encompass more than just recording the stages of development and testing of an AI model; it should also cover monitoring and maintaining the system in the real world, by continuously observing its performance after implementation in a clinical setting and spotting any errors or deviations from expected behaviour (Lekadir et al., 2022). It is therefore crucial that the algorithms be created in tandem with live interfaces for ongoing monitoring and auditing of the AI tools following their implementation in the relevant clinical setting. Important components of such a monitoring tool include a human-in-the-loop mechanism to allow for human oversight and feedback; an alert system to notify clinicians of suspected deviations from previous states or performance degradation (e.g., when new equipment or a new protocol is introduced); and a periodic evaluation system that can be configured with reference test datasets and the periodicity of the evaluations (e.g., monthly vs. quarterly).
Create frameworks that help medical professionals monitor and define accountability.
Accountability continues to be a pressing issue in AI, especially in the high-stakes area of medical AI. It is particularly significant when an AI-based healthcare technology used in actual clinical settings malfunctions, makes mistakes, or has unanticipated adverse effects (Geis et al., 2019). Frameworks and procedures are required to properly assign accountability to all parties involved in the AI workflow in medical practice, including the manufacturers. This will encourage the application of best practices and of all available precautions to reduce errors and patient harm. Future medical AI products should be subject to the same standards that currently guide the development, assessment, and commercialisation of medications, vaccines, and medical equipment.
Above all, uniform legal frameworks are required in Europe and beyond to establish accountability and culpability, and to impose suitable penalties, in the field of medical AI. Among the regulations in place, the GDPR provides a two-pronged strategy for algorithmic accountability: it addresses the matter both from the standpoint of individual rights and from that of systemic regulatory frameworks (Kaminski & Malgieri, 2019). Specifically, the GDPR makes lawfulness and transparency essential components of the accountability principle (Art. 5 para. 1(a) GDPR), establishing transparency as a fundamental norm for data processing (Art. 5 para. 2 GDPR). Nevertheless, several experts in the field have emphasised that, while the GDPR is very flexible when it comes to defining the rights to data privacy and explanation, it is insufficient when it comes to defining algorithmic accountability in medical AI (Barocas, 2019). A single new regulatory agency for AI has been suggested by recognised leaders in the field as a solution to the unresolved legal gap concerning medical AI accountability (Tutt, 2017; Koene et al., 2019). The EC is expected to propose, in 2022, EU-wide measures adapting existing liability frameworks to the challenges of AI, in order to ensure that victims who suffer damage to their life, health, or property from an AI technology have access to the same compensation as victims of other technologies (Communication to EU Parliament, 2021). This may include a revision of the Product Liability Directive (Council Directive, 1985) and may require sectoral adjustments, such as for AI in healthcare.
Periodic audits and risk assessments, which can be used to determine how much regulatory monitoring a particular AI tool requires, are a crucial step towards improving the accountability of AI tools in healthcare (Kaminski & Malgieri, 2019; Reisman et al., 2018). To this end, evaluations must be carried out along the entire AI pipeline, including data gathering, development, pre-clinical phases, deployment, and tool usage. As mentioned in the preceding section, future AI systems should provide a mechanism for ongoing monitorability and traceability over time, in addition to maintaining an archive of AI-based decisions. Through audits that evaluate fairness, transparency, accuracy, and safety, AI decision-making systems could be held to the same standards as human processes. While many researchers and civil rights activists advocate external audits by independent auditing organisations, many businesses and agencies still rely primarily on internal auditing procedures.
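A minimal sketch of these monitoring and audit mechanisms is given below: a rolling check of agreement between the AI tool and clinicians that raises an alert on suspected performance degradation, while archiving every AI-based decision for later audits. The class name, thresholds, and interfaces are hypothetical:

```python
import statistics
from collections import deque

class DeploymentMonitor:
    """Rolling post-deployment check: archives every AI-based decision for
    later audits, and alerts when recent agreement with clinicians drops."""

    def __init__(self, baseline_score, window=200, tolerance=0.05):
        self.baseline_score = baseline_score  # performance at certification
        self.tolerance = tolerance            # allowed drop before alerting
        self.recent = deque(maxlen=window)    # rolling correctness indicators
        self.audit_log = []                   # archive of AI-based decisions

    def record_case(self, case_id, prediction, clinician_label):
        """Log one case with a clinician label (human-in-the-loop feedback)
        and check for performance degradation."""
        correct = int((prediction >= 0.5) == bool(clinician_label))
        self.recent.append(correct)
        self.audit_log.append({"case": case_id, "prediction": prediction,
                               "clinician_label": clinician_label})
        if len(self.recent) == self.recent.maxlen:
            score = statistics.mean(self.recent)
            if score < self.baseline_score - self.tolerance:
                # In practice: notify clinicians and trigger the periodic
                # re-evaluation, e.g. after new equipment or protocols.
                print(f"ALERT: recent agreement {score:.2f} below baseline "
                      f"{self.baseline_score:.2f}; re-evaluation advised.")
```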
Launch educational initiatives to improve public literacy and the skills of healthcare workers.
To maximise acceptance and reduce error, future medical professionals must receive sufficient training in this new technology, including its benefits for enhancing the quality of, and access to, healthcare, as well as its drawbacks and hazards (Paranjape et al., 2019). It is therefore time to modernise medical education programmes and broaden their interdisciplinary focus. Specialised lectures and hands-on workshops should be introduced that smoothly integrate the implications of medical AI for future clinical practice (McCoy et al., 2020; Rampton et al., 2020). Additionally, there is a pressing need to raise public awareness of AI, to better equip patients and citizens to take advantage of new medical AI tools while reducing the possibility of misuse, particularly in the context of remote monitoring and care management. Some countries have already invested in public AI literacy courses; in Finland, for example, the University of Helsinki offers the free 'Elements of AI' course (www.elementsofai.com).
Encourage more study on medical AI's technological, ethical, and clinical resilience.
Despite significant recent advances in AI and machine learning, and in their applications to healthcare and medicine, this report highlights several risks associated with medical AI and emphasises the need for additional research and development to fully realise its potential while addressing its current clinical, socio-ethical, and technological limitations. Future research should focus on issues such as explainability and interpretability, bias prevention and estimation, and secure and privacy-preserving artificial intelligence.
The field of explainable AI studies a new class of AI algorithms whose outputs are comprehensible to humans, including medical AI users and physicians. Interest in explainable AI has surged in recent years, leading to the development and testing of several strategies. The intricacy and unpredictability of clinical and biomedical data, however, make explainable AI in healthcare extremely difficult to implement, and current approaches have not yet made it into clinical settings. To realise their potential, it is crucial to assess and ensure that explainability approaches yield explanations that are clinically meaningful and well received by end users. Advancing explainable AI will require interdisciplinary approaches that begin with an analysis of physicians' needs.
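As one concrete illustration of current explainability tooling, the sketch below applies SHAP, the additive feature-attribution method of Lundberg and Lee (2017) cited in the references, to a toy tabular model. It is illustrative only: producing attributions does not by itself guarantee clinically meaningful explanations.

```python
# A minimal SHAP sketch on synthetic tabular data; illustrative only.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))            # e.g. age, blood pressure, two labs
y = (X[:, 0] + X[:, 2] > 0).astype(int)  # synthetic outcome
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.Explainer(model)        # dispatches to a tree explainer here
sv = explainer(X[:10])                   # additive per-feature contributions
# Each prediction decomposes into per-feature contributions, which can feed
# clinician-facing displays (Barda et al., 2020).
print(sv.values.shape)
```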
Methods to explicitly reduce the presence of undesired bias in the data have already been studied (Li & Vasconcelos, 2019; Zhang et al., 2018), and some open-source toolkits have been released, such as Microsoft's Fairlearn (Bird et al., 2020) and IBM's AI Fairness 360. Still, many questions remain unanswered around the detection of biases, particularly implicit and hidden ones. To reduce bias and improve the fairness of medical AI algorithms, and to address qualitative biases such as the cognitive biases of the doctors creating, interpreting, or annotating the data, multidisciplinary research and greater diversity in AI development, healthcare, and policy teams are necessary.
Additional research is required to create adaptation strategies that guarantee future AI products will generalise well across demographic groups, clinical settings, and geographic regions. Furthermore, it is critical to create new validation platforms that can accurately and fairly evaluate AI algorithms with respect to sex/gender, age, race, ethnicity, socioeconomic status, and other sociodemographic factors.
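As an example of what such disaggregated evaluation can look like with existing tooling, the sketch below uses Fairlearn's MetricFrame (Bird et al., 2020) to report performance separately per group of a sensitive attribute; the data and the attribute are synthetic stand-ins:

```python
import numpy as np
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)           # synthetic ground truth
y_pred = rng.integers(0, 2, size=1000)           # stand-in for model output
sex = rng.choice(["female", "male"], size=1000)  # synthetic sensitive feature

mf = MetricFrame(metrics={"accuracy": accuracy_score,
                          "sensitivity": recall_score},
                 y_true=y_true, y_pred=y_pred,
                 sensitive_features=sex)
print(mf.by_group)      # performance disaggregated per group
print(mf.difference())  # largest between-group gap for each metric
```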
Additionally, uncertainty estimation—a relatively recent field of study that attempts to give clinicians clinically valuable signals on the degree of confidence in AI predictions—should be integrated into future AI solutions for healthcare (Kompa et al., 2021). Whenever there is a significant degree of uncertainty in a particular prediction, the practitioner should ideally be alerted or warned. In the future, the AI system could advise clinicians on the best course of action to improve the AI predictions (such as adding lab tests and predictors, or rescanning the patient), as well as explain why there is so much uncertainty in the data (e.g., poor-quality image scans, insufficient evidence).
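One simple way, among many, to obtain such confidence signals is ensemble disagreement. The sketch below (synthetic data, hypothetical alert threshold) trains a small bootstrap ensemble and flags a prediction for clinician review when the members disagree:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)

# Bootstrap ensemble: ten models trained on resampled data.
ensemble = []
for seed in range(10):
    Xb, yb = resample(X, y, random_state=seed)
    ensemble.append(LogisticRegression().fit(Xb, yb))

x_new = rng.normal(size=(1, 5))                    # one incoming patient
probs = np.array([m.predict_proba(x_new)[0, 1] for m in ensemble])
mean_risk, spread = probs.mean(), probs.std()

if spread > 0.1:                                   # hypothetical threshold
    print(f"Risk {mean_risk:.2f} +/- {spread:.2f}: low confidence - "
          "alert the clinician; consider extra tests or rescanning.")
else:
    print(f"Risk {mean_risk:.2f} (confident prediction).")
```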
Last but not least, cyberattacks on medical AI technologies remain hard to identify: even though the tools themselves may appear to keep working well, the conclusions the AI system confidently offers may be false. More research is required to create, validate, and deploy medical AI tools that can defend against security and privacy threats. The result would be a new generation of AI algorithms that can be used confidently and robustly in their real-world settings.
Put in place a plan to close the medical AI gap in Europe.
Even though the EU has recently invested heavily in AI, disparities in the development of AI technology still exist among the various European nations (Caradaica, 2020). The AI gap, particularly between the Western and Eastern parts of the continent, can be explained by structural disparities in research programmes and technological capabilities, as well as by differing levels of investment from the public and private sectors (Quaglio et al., 2020B).
Since advances and discoveries in medical AI heavily rely on access to large databases of carefully curated biomedical data, as well as on technological capabilities, the differences in AI development and implementation between EU countries are especially pronounced in this area. At the same time, studies have indicated that there are differences in life expectancy, maternal mortality, and other population health indicators between Eastern and Western Europe (Forster, 2018; The World Bank, 2019). These AI disparities may therefore worsen the already existing health inequities and disparities throughout the EU.
In this regard, the member states of the European Union, especially those in Eastern Europe, should create specialised initiatives to boost AI in healthcare. These should include concrete steps to strengthen the economic, scientific, and technological capacities of developing EU nations in the area of AI for healthcare. Member states with inadequate data availability and research infrastructures should prioritise establishing infrastructure projects. As a result, the EU as a whole would gain and improve much-needed capabilities in biomedical and health data sharing, storage, curation, and security (ECRIN, 2019). Additional initiatives should be launched to improve the clinical, industrial, and technological capacity of a number of European nations for the creation, evaluation, and use of cutting-edge AI tools in healthcare and medicine, such as high-performance computing and open cloud services.
The European Commission could support the adoption of shared norms and approaches through specific coordination and support programmes for actions carried out in this area by the various Member States. Such collaboration should ensure the establishment of an inclusive European Health Data Space (EHDS) that carefully considers regional and national concerns throughout Europe (Marschang, 2021). The training capacities and human capital in medical AI, particularly in emerging EU countries, should be improved by strengthening existing education-focused programmes such as the Marie Curie training networks.
Finally, the differences in medical AI between the various European nations, particularly between Eastern and Western Europe, also mirror the more general social, economic, and health disparities throughout the continent. Reducing the disparity in medical AI throughout Europe calls for a strategy that addresses systemic inequality in European society through policy measures rather than only concentrating on the domains of medicine and/or artificial intelligence.
References
Abadi, E., Segars, W.P., Tsui, B.M., Kinahan, P.E., Bottenus, N., Frangi, A.F., Maidment, A., Lo, J. and Samei, E., 2020. 'Virtual clinical trials in medical imaging: a review', Journal of Medical Imaging, 7(4), p.042805.
Abdi J, Al-Hindawi A, Ng T, Vizcaychipi MP. 'Scoping review on the use of socially assistive robot technology in elderly care', BMJ Open. 2018;8(2):e018815.
Abràmoff, M.D., Lavin, P.T., Birch, M., Shah, N. and Folk, J.C., 'Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices', NPJ Digital Medicine, 1(1), pp.1-8, 2018.
Abràmoff, M.D., Tobey, D. and Char, D.S. 'Lessons learned about autonomous AI: finding a safe, efficacious, and ethical path through the development process,' American journal of ophthalmology, 214, pp.134-142, 2020
Adamson, A.S. and Smith, A., 'Machine learning and health care disparities in dermatology' JAMA dermatology, 154(11), pp.1247-1248, 2018.
Adedinsewo D, Carter RE, Attia Z, Johnson P, Kashou AH, Dugan JL, et al. 'Artificial Intelligence-Enabled ECG Algorithm to Identify Patients with Left Ventricular Systolic Dysfunction Presenting to the Emergency Department with Dyspnea'. Circ Arrhythmia Electrophysiol.;13(8), 2020
Adhikari L, Ozrazgat-Baslanti T, Ruppert M, Madushani RWMA, Paliwal S, Hashemighouchani H, et al. 'Improved predictive models for acute kidney injury with IDEA: Intraoperative data embedded analytics' PLoS One. 2019;14(4).
Ahn J, Connell A, Simonetto D, Hughes C and Shah VH. 'Application of Artificial Intelligence for the Diagnosis and Treatment of Liver Diseases,' Hepatology. 2021;73(6):2546-2563.
Alder, S. 'AI Company Exposed 2.5 Million Patient Records Over the Internet', HIPAA Journal, 21 August 2020.
Allen M, Pearn K, Monks T, Bray BD, Everson R, Salmon A, James M, Stein K. 'Can clinical audits be enhanced by pathway simulation and machine learning? An example from the acute stroke pathway', BMJ Open. 2019;9(9):e028296.
Allen, B., 'The role of the FDA in ensuring the safety and efficacy of artificial intelligence software and devices', Journal of the American College of Radiology, 16(2), pp.208-210, 2019.
Almirall, D., Nahum-Shani, I., Sherwood, N.E. and Murphy, S.A., 'Introduction to SMART designs for the development of adaptive interventions: with application to weight loss research', Translational behavioral medicine, 2014, 4(3), pp.260-274.
Alsharqi M, Woodward WJ, Mumith JA, Markham DC, Upton R, Leeson P. 'Artificial intelligence and echocardiography', Echo Res Pract. 2018;5(4):115–25.
Aminololama-Shakeri S, Lopez E. 'The Doctor-Patient Relationship With Artificial Intelligence', American Journal of Roentgenology. 2019;202(2)
Selbst, A.D. and Powles, J., 'Meaningful information and the right to explanation', International Data Privacy Law, 7(4), pp.233-242, 2017.
Angus DC. 'Randomized clinical trials of artificial intelligence', JAMA. 323(11):1043-1045, 2020.
Arora A. 'Conceptualising Artificial Intelligence as a Digital Healthcare Innovation: An Introductory Review', Med Devices (Auckl). 3:223-230. doi: 10.2147/MDER.S262590, 2020.
Attia ZI, Friedman PA, Noseworthy PA, Lopez-Jimenez F, Ladewig DJ, Satam G, et al. 'Age and Sex Estimation Using Artificial Intelligence from Standard 12-Lead ECGs', Circ Arrhythmia Electrophysiol. 12(9), 2019
Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. 'An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction', Lancet. 394(10201):861–7, 2019.
Azzi, S.; Gagnon, S.; Ramirez, A.; Richards, G. 'Healthcare Applications of Artificial Intelligence and Analytics: A Review and Proposed Framework', Appl. Sci., 10, 6553. https://doi.org/10.3390/app10186553, 2020.
Baetan, R., Spasova, S., Vanhercke, B., Coster, S., Inequalities in access to healthcare: A study of national policies, European Commission, 2018.
Bandivadekar, S.S., 'Online Pharmacies: Global Threats and Regulations', AAYAM: AKGIM Journal of Management, 10(1), pp.36-42, 2020.
Barda, A.J., Horvat, C.M. and Hochheiser, H., 'A qualitative research framework for the design of user- centered displays of explanations for machine learning model predictions in healthcare.', BMC medical informatics and decision making, 20(1), pp.1-16, 2020.
Barocas, S. Legal and Policy Implications of Model Interpretability. This Week in Machine Learning and AI (TWiMLAI), January 2019. https://twimlai.com/twiml-talk-219-legal-and-policy-implications-of-model-interpretability-with-solon-barocas/.
Barocas, S., Hardt, M. and Narayanan, A., 'Fairness in machine learning', NIPS tutorial, 1, p.2, 2017.
BBC, 'Google DeepMind NHS app test broke UK privacy law', BBC News, 3 July 2017.
Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A. Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nat Rev Clin Oncol. 2019;16(11):703-715.
Berlyand Y, Raja AS, Dorner SC, et al. How artificial intelligence could transform emergency department operations. Am J Emerg Med. 2018;36(8):1515-1517.
Bird, S., Dudík, M., Edgar, R., Horn, B., Lutz, R., Milan, V., Sameki, M., Wallach, H. and Walker, K., 'Fairlearn: A toolkit for assessing and improving fairness in AI,' Microsoft, Tech. Rep. MSR-TR-2020-32, 2020.
Birnbaum ML, Ernala SK, Rizvi AF, Arenare E, R Van Meter A, De Choudhury M, Kane JM. 'Detecting relapse in youth with psychotic disorders utilizing patient-generated and patient-contributed digital data from Facebook', NPJ Schizophr.;5(1):17, 2019.
Boniolo F, Dorigatti E, Ohnmacht AJ, Saur D, Schubert B, Menden MP. 'Artificial intelligence in early drug discovery enabling precision medicine', Expert Opin Drug Discov:1-17, 2021.
Campello, V. et al. 'Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation: The M&Ms Challenge.' Medical Image Computing and Computer Assisted Intervention, 2020.
The Cancer Imaging Archive. www.cancerimagingarchive.net, accessed November 2021.
Caplan, R., Donovan, J., Hanson, L. and Matthews, J., Algorithmic accountability: A primer. Data & Society, 18, 2018.
Caradaica, M., 'Artificial Intelligence and Inequality in European Union. Europolity-Continuity and Change in European Governance', 2020, 14(1), pp.5-31.
Challen, R., Denny, J., Pitt, M., Gompels, L., Edwards, T. and Tsaneva-Atanasova, K., 2019. 'Artificial intelligence, bias and clinical safety', BMJ Quality & Safety, 28(3), pp.231-237.
Chaudhuri S, Long A, Zhang H, Monaghan C, Larkin JW, Kotanko P, et al. 'Artificial intelligence enabled applications in kidney disease', Semin Dial. 34:5–16, 2021.
Chekroud AM, Zotti RJ, Shehzad Z, Gueorguieva R, Johnson MK, Trivedi MH, Cannon TD, Krystal JH, Corlett PR. 'Cross-trial prediction of treatment outcome in depression: a machine learning approach', Lancet Psychiatry.;3(3):243-50, 2016.
Chi AC, Katabi N, Chen HS, Cheng YSL. Interobserver variation among pathologists in evaluating perineural invasion for oral squamous cell carcinoma. Head Neck Pathol. 10, 451–464, 2016.
Chinzei, K., Shimizu, A., Mori, K., Harada, K., Takeda, H., Hashizume, M., Ishizuka, M., Kato, N., Kawamori, R., Kyo, S. and Nagata, K., 'Regulatory science on AI-based medical devices and systems', Advanced Biomedical Engineering, 7, pp.118-123, 2018.
Chung Y, Addington J, Bearden CE, Cadenhead K, Cornblatt B, Mathalon DH, McGlashan T, Perkins D, Seidman LJ, Tsuang M, Walker E, Woods SW, McEwen S, van Erp TGM, Cannon TD; North American Prodrome Longitudinal Study (NAPLS) Consortium and the Pediatric Imaging, Neurocognition, and Genetics (PING) Study Consortium. Use of machine learning to determine deviance in neuroanatomical maturity associated with future psychosis in youths at clinically high risk. JAMA Psychiatr,75(9):960-968, 2018.
Clay H, Stern R. 'Making time in general practice', Primary Care Foundation, 1–83, 2015.
Cohen G. 'Informed Consent and Medical Artificial Intelligence: What to Tell the Patient?' Georgetown Law Journal. (108), 2020.
Collins GS, Moons KGM. 'Reporting of artificial intelligence prediction models', Lancet 393: 1577–79, 2019.
Collins, G.S., Reitsma, J.B., Altman, D.G. and Moons, K.G. 'Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) the TRIPOD statement', Circulation, 131(2), pp.211-219, 2015.
Communication from the Commission to the European Parliament, the European Council, the Council, the European Economic and Social Committee and the Committee of the Regions Fostering a European approach to Artificial Intelligence. April 2021.
Cook C, Sheets C. 'Clinical equipoise and personal equipoise: two necessary ingredients for reducing bias in manual therapy trials', J Man Manip Ther. 19(1):55-7, 2011.
Cook GJR, Goh V. What can artificial intelligence teach us about the molecular mechanisms underlying disease? Eur J Nucl Med Mol Imaging. 46:2715–2721, 2019.
Corredor G, Wang X, Zhou Y, Lu C, Fu P, Syrigos K, Rimm DL, Yang M, Romero E, Schalper KA, Velcheti V, Madabhushi A. Spatial Architecture and Arrangement of Tumor-Infiltrating Lymphocytes for Predicting Likelihood of Recurrence in Early-Stage Non-Small Cell Lung Cancer. Clin Cancer Res.;25(5):1526-1534, 2019.
Davenport, T.H., Barth, P. and Bean, R., 'How 'big data' is different', MIT Sloan Management Review, 2012.
Dawoodbhoy FM, Delaney J, Cecula P, Yu J, Peacock I, Tan J, Cox B. AI in patient flow: applications of artificial intelligence to improve patient flow in NHS acute mental health inpatient units. Heliyon. 7(5):e06993, 2021.
De Fauw, J., Ledsam, J.R., Romera-Paredes, B., Nikolov, S., Tomasev, N., Blackwell, S., Askham, H., Glorot, X., O'Donoghue, B., Visentin, D. and van den Driessche, G., 'Clinically applicable deep learning for diagnosis and referral in retinal disease', Nature medicine, 24(9), pp.1342-1350, 2018.
De Vries L, Baselmans B, Bartels M. 'Smartphone-Based Ecological Momentary Assessment of Well-Being: A Systematic Review and Recommendations for Future Studies', Journal of Happiness Studies. 22:2361– 2408, 2021.
Dijksterhuis A, Bos MW, Nordgren LF, van Baaren RB. On making the right choice: the deliberation- without-attention effect. Science. 2006; 311:1005e1007.
Directive, C., Council Directive 85/374/EEC of 25 July 1985 on the approximation of the laws, regulations and administrative provisions of the Member States concerning liability for defective products. Official Journal L, 210(07/08), pp.0029-0033, 1985.
Du, R., Lee, V.H., Yuan, H., Lam, K.O., Pang, H.H., Chen, Y., Lam, E.Y., Khong, P.L., Lee, A.W., Kwong, D.L. and Vardhanabhuti, V. 'Radiomics model to predict early progression of non-metastatic nasopharyngeal carcinoma after intensity modulation radiation therapy: a multicenter study', Radiology: Artificial Intelligence, 1(4), e180075, 2019.
Dusenbery, M. 'Everybody was telling me there was nothing wrong', The Health Gap, BBC News, 2018. www.bbc.com/future/article/20180523-how-gender-bias-affects-your-healthcare
Dwyer DB, Falkai P, Koutsouleris N. 'Machine learning approaches for clinical psychology and psychiatry', Annu Rev Clin Psychol, 14:91–118, 2018.
ECRIN, 'EFPIA, EATRIS, ELIXIR, BBMRI, ECRIN statement on the role of research infrastructures to boost patient-centred research and innovation in Europe', https://ecrin.org/news/efpia-eatris-elixir-bbmri-ecrin-statement-role-research-infrastructures-boost-patient-centred, 24 July 2019.
EGA Consortium (European Genome-Phenome Archive), https://ega-archive.org/datasets, 2021.
Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, et al. 'Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer' JAMA.;318(22):2199-2210., 2017.
Elements of AI. www.elementsofai.com, accessed November 2021.
Ellahham, S., Ellahham, N. and Simsekler, M.C.E., 'Application of artificial intelligence in the health care safety context: opportunities and challenges', American Journal of Medical Quality, 35(4), pp.341-348, 2020.
Elliott JH, Turner T, Clavisi O, Thomas J, Higgins JP, Mavergames C, Gruen RL. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 11(2):e1001603, 2014.
Emanuel EJ, Wachter RM. 'Artificial intelligence in health care: will the value match the hype?' JAMA. 321(23):2281-2282, 2019.
EuCanImage, https://eucanimage.eu, accessed November 2021.
European Commission, 'A Proposal for Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts', April 2021.
European Commission. Digital Education Action Plan (2021-2027): Resetting Education for the Digital Age, 2020.
European Commission. Employment, Social Affairs & Inclusion Inequalities in access to healthcare. A study of national policies, 2018.
European Commission. The European Pillar of Social Rights in 20 principles. 2021.
European Genome-Phenome Archive, 'Browse datasets', https://ega-archive.org/datasets, accessed November 2021.
European Health Data Space, https://ec.europa.eu/health/ehealth/dataspace_en, last access November, 2021.
Eurostat. Statistics Explained. Population structure and ageing, 2020.
Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G., Thrun, S. and Dean, J. 'A guide to deep learning in healthcare'. Nature medicine, 25(1), 24-29, 2019.
Evans AJ, Henry PC, Van der Kwast TH, Tkachuk DC, Watson K, Lockwood GA, Fleshner NE, Cheung C, Belanger EC, Amin MB, Boccon-Gibod L, Bostwick DG, Egevad L, Epstein JI, Grignon DJ, Jones EC, Montironi R, Moussa M, Sweet JM, Trpkov K, Wheeler TM, Srigley JR. 'Interobserver variability between expert urologic pathologists for extraprostatic extension and surgical margin status in radical prostatectomy specimens', Am J Surg Pathol. 32(10):1503-12, 2008.
Farina, R. and Sparano, A. 'Errors in sonography. In Errors in radiology' (pp. 79-85). Springer, Milano, 2012.
Felzmann, H., et al., 'Towards Transparency by Design for Artificial Intelligence,', Science and Engineering Ethics, 26:3333–3361, 2020.
Fernández García, J., Spatharou, A., Hieronimus, S., Beck, J.P., Jenkins, J. Transforming healthcare with AI: the impact on the workforce and organisations. Executive summary. EIT Health & McKinsey & Company, March 2020.
Ferryman, K. and Pitcan, M., Fairness in precision medicine. Data & Society, 2018.
Fihn SD, Saria S, Mendonça E, Hain S, Matheny M, Shah N, Liu H, Auerbach, A. 'Deploying AI in clinical settings. In artificial intelligence in health care: The hope, the hype, the promise, the peril', Editors: Matheny M, Israni ST, Ahmed M, Whicher D. Washington, DC: National Academy of Medicine, 2019.
Filice, R.W. and Ratwani, R.M. 'The case for user-centered artificial intelligence in radiology', Radiology: Artificial Intelligence, 2020, Vol. 2, No. 3.
Finlayson, S.G., Bowers, J.D. Ito, J., Zittrain, J.L., Beam, A.L., Kohane, I.S., 'Adversarial attacks on medical machine learning', Science, 2019.
Fiorini N, Leaman R, Lipman DJ, Lu Z. 'How user intelligence is improving PubMed', Nat Biotechnol, 2018a.
Fiorini N, Canese K, Starchenko G, Kireev E, Kim W, Miller V, Osipov M, Kholodov M, Ismagilov R, Mohan S, Ostell J, Lu Z. 'Best Match: New relevance search for PubMed', PLoS Biol. 2018b;16(8):e2005343.
Firth J, Torous J, Nicholas J, Carney R, Pratap A, Rosenbaum S, Sarris J. 'The efficacy of smartphone-based mental health interventions for depressive symptoms: a meta-analysis of randomized controlled trials', World Psychiatry. 2017;16(3):287-298.
Fitzpatrick KK, Darcy A, Vierhile M. 'Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial', JMIR Ment Health.;4(2):e19, 2017.
Fleming N. 'How artificial intelligence is changing drug discovery', Nature: 557:S55–57, 2018.
Forster T, Kentikelenis A, Bambra, C, 'Health Inequalities in Europe: Setting the Stage for Progressive Policy Action', Foundation for European Progressive Studies. TASC: Think tank for action on social change. 2018.
Freeman, K, Dinnes, J, Chuchu, N, Takwoingi, Y, Bayliss, SE, Matin, RN, Jain, A, Walter, FM, Williams, HC and Deeks, JJ, 'Algorithm based smartphone apps to assess risk of skin cancer in adults: systematic review of diagnostic accuracy studies' BMJ, 2020, 368.
FUTURE-AI: Best practices for trustworthy AI in medical imaging, www.future-ai.eu, accessed November 2021.
Geis JR, Brady A, Wu CC, Spencer J, Ranschaert E, Jaremko JL, Langer SG, Kitts AB, Birch J, Shields WF, van den Hoven van Genderen R, Kotter E, Gichoya JW, Cook TS, Morgan MB, Tang A, Safdar NM, Kohli M. 'Ethics of artificial intelligence in radiology: summary of the joint European and North American multisociety statement', Insights Imaging, 2019 Oct 1;10(1):101. doi: 10.1186/s13244-019-0785-8. PMID: 31571015; PMCID: PMC6768929.
General Data Protection Regulation (GDPR), Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016, https://eur-lex.europa.eu/eli/reg/2016/679/oj, accessed December 2021.
Gerke, S., Minssen, T. and Cohen, G. 'Ethical and legal challenges of artificial intelligence-driven healthcare'. In Artificial intelligence in healthcare (pp. 295-336). Academic Press, 2020.
German Data Ethics Commission, Opinion of the Data Ethics Commission, July 2019, https://www.bmjv.de/DE/Themen/FokusThemen/Datenethikkommission/Datenethikkommission_EN_node.html.
Ghassemi, M. 'Exploring Healthy Models in ML for Health', AI for Healthcare Equity Conference, AI & Health at MIT, 2021. https://www.youtube.com/watch?v=5uZROGFYfcA
Gillespie, N., Lockey, S., & Curtis, C. 'Trust in Artificial Intelligence: A Five Country Study', The University of Queensland and KPMG Australia, 2021.
Gillies RJ, Kinahan PE, Hricak H. 'Radiomics: images are more than pictures, they are data,' Radiology.;278:563–577, 2016.
Giulietti M, Cecati M, Sabanovic B, Scirè A, Cimadamore A, Santoni M, et al. 'The role of artificial intelligence in the diagnosis and prognosis of renal cell tumors', Diagnostics, ;11(2):206, 2021.
Golbraikh A, Wang X, Zhu H, Tropsha A. 'Predictive QSAR modelling: methods and applications in drug discovery and chemical risk assessment', In Handbook of Computational Chemistry, ed. J Leszczynski, A Kaczmarek-Kedziera, T Puzyn, MG Papadopoulos, H Reis, MK, 2012.
Gómez-González E, Gómez E. 'Artificial Intelligence in medicine and healthcare: applications, availability and societal impact', EUR 30197 EN. Publications Office of the European Union, Luxembourg, 2020.
Goodfellow, I., Bengio, Y. and Courville, A., Deep learning. MIT Press, 2016.
Graham S, Depp C, Lee EE, Nebeker C, Tu X, Kim HC, Jeste DV. 'Artificial intelligence for mental health and mental illnesses: An overview', Curr Psychiatry Rep;21:116, 2019.
Guo J, Li B. 'The application of medical artificial intelligence technology in rural areas of developing countries', Health Equity, ; 2: 174–81, 2018.
Gupta R, Kleinjans J and Caiment F. 'Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning', BMC Cancer.;21(962), 2021.
Haibe-Kains, B., Adam, G.A., Hosny, A., Khodakarami, F., Waldron, L., Wang, B., McIntosh, C., Goldenberg, A., Kundaje, A., Greene, C.S. and Broderick, T., 'Transparency and reproducibility in artificial intelligence', Nature, 586(7829), pp.E14-E16, 2020.
Hamed S, Thapar-Björkert S, Bradby H, Ahlberg B. 'Racism in European Health Care: Structural Violence and Beyond', Sage Journals.;30(11), 2020.
Harned, Z., Lungren, M.P. and Rajpurkar, P. 'Machine vision, medical AI, and malpractice', Harv. JL & Tech. Dig, 2019.
Harvey, H.B. and Gowda, V., 'How the FDA regulates AI', Academic Radiology, 27(1), pp.58-61, 2020.
Hashimoto DA, Rosman G, Witkowski ER, et al. 'Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy', Ann Surg.;270:414e421, 2019.
Hashimoto, D.A., Rosman, G., Rus, D. and Meireles, O.R., 'Artificial intelligence in surgery: promises and perils', Annals of surgery, 268(1), p.70, 2018.
Hermsen M, Bel T, Boer M Den, Steenbergen EJ, Kers J, Florquin S, et al. 'Deep learning-based histopathologic assessment of kidney tissue', J Am Soc Nephrol.;30(10):1968–79, 2019.
Hernandez-Boussard, T., Bozkurt, S., Ioannidis, J.P. and Shah, N.H. 'MINIMAR (MINimum Information for Medical AI Reporting): developing reporting standards for artificial intelligence in health care', Journal of the American Medical Informatics Association, 27(12), pp.2011-2015, 2020.
Hill, N.R., Sandler, B., Mokgokong, R., Lister, S., Ward, T., Boyce, R., Farooqui, U. and Gordon, J., 'Cost- effectiveness of targeted screening for the identification of patients with atrial fibrillation: evaluation of a machine learning risk prediction algorithm', Journal of medical economics, 23(4), pp.386-393, 2020.
Hocking, L., Parks, S., Altenhofer, M. and Gunashekar, S.,. 'Reuse of health data by the European pharmaceutical industry', RAND Corporation, 2019.
Hoffman, K.M., Trawalter, S., Axt, J.R. and Oliver, M.N., 'Racial bias in pain assessment and treatment recommendations, and false beliefs about biological differences between blacks and whites', Proceedings of the National Academy of Sciences, 113(16), pp.4296-4301, 2016.
Human-Centred Artificial Intelligence Programme, www.dtu.dk/english/Education/msc/Programmes/human-centered-artificial-intelligence, accessed November 2021.
Islam MM, Nasrin T, Walther BA, Wu CC, Yang HC, Li YC. 'Prediction of sepsis patients using machine learning approach: a meta-analysis', Comput Methods Programs Biomed. 170:1-9, 2019.
Jamthikar AD, Gupta D, Saba L, Khanna NN, Viskovic K, Mavrogeni S, Laird JR, Sattar N, Johri AM, Pareek G, Miner M, Sfikakis PP, Protogerou A, Viswanathan V, Sharma A, Kitas GD, Nicolaides A, Kolluri R, Suri JS. 'Artificial intelligence framework for predictive cardiovascular and stroke risk assessment models: A narrative review of integrated approaches using carotid ultrasound', Comput Biol Med.;126:104043, 2020.
Jiang S, Chin KS, Tsui KL. 'A universal deep learning approach for modeling the flow of patients under different severities', Comput Methods Programs Biomed.;154:191-203, 2018.
Jin, J.M., Bai, P., He, W., Wu, F., Liu, X.F., Han, D.M., Liu, S. and Yang, J.K.,. 'Gender differences in patients with COVID-19: focus on severity and mortality', Frontiers in public health, 8, p.152, 2020.
Kaddoum R, Fadlallah R, Hitti E, El-Jardali F, El Eid G. 'Causes of cancellations on the day of surgery at a Tertiary Teaching Hospital', BMC Health Serv. Res. 16, 2016.
Kaissis, G.A., Makowski, M.R., Rückert, D. and Braren, R.F. 'Secure, privacy-preserving and federated machine learning in medical imaging', Nature Machine Intelligence, 2(6), pp.305-311, 2020.
Kamat AS, Parker A. 'Effect of perioperative inefficiency on neurosurgical theatre efficacy: a 15-year analysis', Br. J. Neurosurg. 29: 565–568, 2015.
Kaminski, M.E. and Malgieri, G. 'Algorithmic impact assessments under the GDPR: producing multi-layered explanations', U of Colorado Law Legal Studies Research Paper, (19-28), 2019.
Kaushal, A., Altman, R. and Langlotz, C. 'Geographic distribution of US cohorts used to train deep learning algorithms', Jama, 324(12), pp.1212-1213, 2020.
Kiener, M. ''You may be hacked' and other things doctors should tell you', The Conversation, 3 November 2020. https://theconversation.com/you-may-be-hacked-and-other-things-doctors-should-tell-you-148946
Kim, D.W., Jang, H.Y., Kim, K.W., Shin, Y. and Park, S.H. 'Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers', Korean Journal of Radiology, 20(3), p.405, 2019.
Kirubarajan A, Taher A, Khan S, Masood S. 'Artificial intelligence in emergency medicine: A scoping review', J Am Coll Emerg Physicians Open.;1(6):1691-1702, 2019.
Koene, A., Clifton, C., Hatada, Y., Webb, H. and Richardson, R., A governance framework for algorithmic accountability and transparency, EPRS, European Parliament, 2019.
Kompa, B., Snoek, J. and Beam, A.L. 'Second opinion needed: communicating uncertainty in medical machine learning', NPJ Digital Medicine, 4(1), pp.1-6, 2021.
Koops, B.J., 'The concept of function creep', Law, Innovation and Technology, 13(1), pp.29-56, 2021.
Krittanawong, C. 'The rise of artificial intelligence and the uncertain future for physicians', European Journal of Internal Medicine, 48, pp.e13-e14, 2018.
Kulkarni S, Seneviratne N, Baig MS, Khan AHA. 'Artificial Intelligence in Medicine: Where Are We Now?' Acad Radiol. Jan;27(1):62-70., 2020.
Kuo C-C, Chang C-M, Liu K-T, Lin W-K, Chiang H-Y, Chung C-W, et al. 'Automation of the kidney function prediction and classification through ultrasound-based kidney imaging using deep learning', NPJ Digit Med.;2(29), 2019.
Lake IR, Colón-González FJ, Barker GC, Morbey RA, Smith GE, Elliot AJ. 'Machine learning to refine decision making within a syndromic surveillance service', BMC Public Health; 19: 559, 2019.
Larson, D.B., Harvey, H., Rubin, D.L., Irani, N., Justin, R.T. and Langlotz, C.P., 'Regulatory frameworks for development and evaluation of artificial intelligence-based diagnostic imaging algorithms: Summary and recommendations', Journal of the American College of Radiology, 18(3), pp.413-424, 2021.
Leavy, S. 'Gender bias in artificial intelligence: the need for diversity and gender theory in machine learning', GE '18: Proceedings of the 1st International Workshop on Gender Equality in Software Engineering, May 2018.
Lee CS, Lee AY. 'How Artificial Intelligence Can Transform Randomized Controlled Trials', Transl Vis Sci Technol.;9(2):9, 2020.
Lee EE, Torous J, De Choudhury M, Depp CA, Graham SA, Kim HC, Paulus MP, Krystal JH, Jeste DV. 'Artificial intelligence for mental health care: Clinical applications, barriers, facilitators, and artificial wisdom', Biol Psychiatry Cogn Neurosci Neuroimaging.;6(9):856-864, 2021.
Lehne M, Sass J, Essenwanger A, Schepers J, Thun S. 'Why digital medicine depends on interoperability', NPJ Digit Med.;2:79, 2019.
Lekadir, K. et al. 'FUTURE-AI: Best practices for trustworthy AI in medicine', www.future-ai.org, 2022.
Leone, D., Schiavone, F., Appio, F.P. and Chiao, B. 'How does artificial intelligence enable and enhance value co-creation in industrial markets? An exploratory case study in the healthcare ecosystem', Journal of Business Research, 129, pp.849-859, 2021.
Lewis, J.R. 'The system usability scale: past, present, and future', International Journal of Human–Computer Interaction, 34(7), pp.577-590, 2018.
Li, Y. and Vasconcelos, N. 'Repair: Removing representation bias by dataset resampling', In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9572-9581), 2019.
Lindenmeyer MT, Alakwaa F, Rose M, Kretzler M. 'Perspectives in systems nephrology,' Cell Tissue Res, 2021.
Lipton, Zachary C. 'The doctor just won't accept that!' arXiv preprint arXiv:1711.08037, 2017.
Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas M, Kern C, Ledsam JR, Schmid MK, Balaskas K, Topol EJ, Bachmann LM, Keane PA, Denniston AK. 'A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis', Lancet Digit Health.1(6):e271-e297, 2019.
Liu, X., Rivera, S.C., Moher, D., Calvert, M.J. and Denniston, A.K. 'Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension', BMJ, 370, 2020.
Loftus TJ, Filiberto AC, Li Y, Balch J, Cook AC, Tighe PJ, Efron PA, Upchurch GR Jr, Rashidi P, Li X, Bihorac A. 'Decision analysis and reinforcement learning in surgical decision-making', Surgery, 168(2):253-266, 2020.
Loftus TJ, Upchurch GR Jr, Bihorac A. 'Use of Artificial Intelligence to Represent Emergent Systems and Augment Surgical Decision-Making', JAMA Surg. 154(9):791-792, 2019.
Lopez-Jimenez F, Attia Z, Arruda-Olson AM, Carter R, Chareonthaitawee P, Jouni H, et al. 'Artificial Intelligence in Cardiology: Present and Future', Mayo Clin Proc; 95(5):1015–39, 2020.
Lorkowski J, Kolaszyńska O, Pokorski M. 'Artificial intelligence and precision medicine: a perspective', Adv Exp Med Biol, Epub ahead of print, PMID: 34138457, 2021.
Lundberg, S.M., Lee, S.I.. 'A unified approach to interpreting model predictions', in: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA. p. 4768–4777, 2017.
Lv, Z. and Piccialli, F. 'The security of medical data on Internet based on differential privacy technology', ACM Transactions on Internet Technology, 21(3), pp.1-18, 2021.
Maddox TM, Rumsfeld JS, Payne PRO. 'Questions for artificial intelligence in health care', JAMA. 321(1):31-32, 2019.
Madine, M.M., Battah, A.A., Yaqoob, I., Salah, K., Jayaraman, R., Al-Hammadi, Y., Pesic, S. and Ellahham, S. 'Blockchain for giving patients control over their medical records', IEEE Access, 8, pp.193102-193115, 2020.
Magrabi, F., Ammenwerth, E., McNair, J.B., De Keizer, N.F., Hyppönen, H., Nykänen, P., Rigby, M., Scott, P.J., Vehko, T., Wong, Z.S.Y. and Georgiou, A. 'Artificial intelligence in clinical decision support: challenges for evaluating AI and practical implications', Yearbook of Medical Informatics, 28(1), p.128, 2019.
Maharana A, Nsoesie EO. 'Use of deep learning to examine the association of the built environment with prevalence of neighborhood adult obesity', JAMA Network Open. 1(4):e181535, 2018.
Maliha, G., Gerke, S., Cohen, I.G. and Parikh, R.B. 'Artificial Intelligence and Liability in Medicine: Balancing Safety and Innovation', The Milbank Quarterly, 2021.
Mamoshina P, Ojomoko L, Yanovich Y, Ostrovski A, Botezatu A, Prikhodko P, Izumchenko E, Aliper A, Romantsov K, Zhebrak A, Ogu IO, Zhavoronkov A. 'Converging blockchain and next-generation artificial intelligence technologies to decentralize and accelerate biomedical research and healthcare', Oncotarget. 9:5665-5690, 2017.
Manne, R. and Kantheti, S.C. 'Application of artificial intelligence in healthcare: chances and challenges', Current Journal of Applied Science and Technology, pp.78-89, 2021.
Marschang S. 'The European Health Data Space: is there room enough for all?', European Public Health Alliance, https://epha.org/the-european-health-data-space-is-there-room-enough-for-all/, 2021.
Mayerhoefer ME, Materka A, Langs G, Häggström I, Szczypiński P, Gibbs P, Cook G. 'Introduction to Radiomics', J Nucl Med. 61:488-495, 2020.
McCoy, L.G., Nagaraj, S., Morgado, F., Harish, V., Das, S. and Celi, L.A. 'What do medical students actually need to know about artificial intelligence?' NPJ Digital Medicine, 3(1), pp.1-3, 2020.
McKeown, A., Mourby, M., Harrison, P., Walker, S., Sheehan, M. and Singh, I. 'Ethical issues in consent for the reuse of data in health data platforms', Science and Engineering Ethics, 27(1), pp.1-21, 2021.
McKinney, S. M. et al. 'International evaluation of an AI system for breast cancer screening', Nature 577, 89–94, 2020.
Medeiros J, Schwierz C. 'Efficiency estimates of health care systems in the EU', Economic Papers, European Commission, Directorate-General for Economic and Financial Affairs, 2015.
Menke NB, Caputo N, Fraser R, Haber J, Shields C, Menke MN. 'A retrospective analysis of the utility of an artificial neural network to predict ED volume', Am J Emerg Med. 32:614-7, 2014.
Meskó B, Görög M. 'A short guide for medical professionals in the era of artificial intelligence', NPJ Digit Med. 3:126, 2020.
Michel JP, Ecarnot F. 'The shortage of skilled workers in Europe: its impact on geriatric medicine', Eur Geriatr Med. 11(3):345-347, 2020.
Miotto R, Li L, Kidd BA, Dudley JT. 'Deep patient: An unsupervised representation to predict the future of patients from the electronic health records', Scientific Reports. 6:26094, 2016.
Mobadersany P, Yousefi S, Amgad M, Gutman DA, Barnholtz-Sloan JS, Velázquez Vega JE, Brat DJ, Cooper LAD. 'Predicting cancer outcomes from histology and genomics using convolutional networks', Proc Natl Acad Sci U S A. 115(13):E2970-E2979, 2018.
Mohr DC, Riper H, Schueller SM. 'A Solution-Focused Research Approach to Achieve an Implementable Revolution in Digital Mental Health', JAMA Psychiatry; 75(2):113-114, 2018.
Mooney SJ, Pejaver V. 'Big data in public health: terminology, machine learning, and privacy', Annu Rev Public Health;39:95-112, 2018.
Mora-Cantallops, M.; Sánchez-Alonso, S.; García-Barriocanal, E.; Sicilia, M.-A. 'Traceability for Trustworthy AI: A Review of Models and Tools', Big Data Cogn. Comput. 5, 20, 2021.
Morley, J. and Floridi, L. 'An ethically mindful approach to AI for health care', Lancet vol. 395, pp. 254-255, 2020.
Mulcahy, N. 'Recent Cyberattack Disrupted Cancer Care Throughout U.S.', WebMD, 20 July 2021. https://www.webmd.com/cancer/news/20210720/recent-cyberattack-disrupted-cancer-care-us
Nagar A, Yew P, Fairley D, Hanrahan M, Cooke S, Thompson I, Elbaz W. 'Report of an outbreak of Clostridium difficile infection caused by ribotype 053 in a neurosurgery unit', J. Infect. Prev. 16: 126–130, 2015.
Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, Harvey H, Topol EJ, Ioannidis JPA, Collins GS, Maruthappu M. 'Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies', BMJ. 368:m689, 2020.
National Careers Service: The Skills Toolkit, https://nationalcareers.service.gov.uk/find-a-course/the-skills-toolkit, accessed November 2021.
Newman, L.H. 'These Hackers Made an App That Kills to Prove a Point', WIRED. 16 July 2019, https://www.wired.com/story/medtronic-insulin-pump-hack-app/.
NHS England. Clinical audit, https://www.england.nhs.uk/clinaudit/, 2021.
NHS Improvement, Good Practice Guide: Focus on Improving Patient Flow, 2017. https://improvement.nhs.uk/documents/1426/Patient_Flow_Guidance_201713_July_2017.pdf
Niazi MKK, Parwani AV, Gurcan MN. 'Digital pathology and artificial intelligence', Lancet Oncol. 20(5):e253-e261, 2019.
Noseworthy PA, Attia ZI, Brewer LPC, Hayes SN, Yao X, Kapa S, et al. 'Assessing and Mitigating Bias in Medical Artificial Intelligence: The Effects of Race and Ethnicity on a Deep Learning Model for ECG Analysis', Circ Arrhythmia Electrophysiol. 13(3), 2020.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S, 'Dissecting racial bias in an algorithm used to manage the health of populations', Science, vol. 366, no. 6464, pp. 447–453, Oct. 2019.
OECD/European Union, Health at a Glance: Europe 2020: State of health in the EU cycle. OECD Publishing, Paris, 2020.
OECD/European Union. Dementia prevalence. In Health at a Glance: Europe 2018: State of Health in the EU Cycle, OECD Publishing, Paris/European Union, Brussels, 2018.
Okanoue T, Shima T, Mitsumoto Y, Umemura A, Yamaguchi K, Itoh Y, Yoneda M, Nakajima A, Mizukoshi E, Kaneko S, Harada K. 'Artificial intelligence/neural network system for the screening of nonalcoholic fatty liver disease and nonalcoholic steatohepatitis', Hepatol Res 51(5):554–569, 2021.
Ota, N., Tachibana, K., Kusakabe, T., Sanada, S. and Kondoh, M. 'A Concept for a Japanese Regulatory Framework for Emerging Medical Devices with Frequently Modified Behavior', Clinical and translational science, 13(5), pp.877-879, 2020.
Panwar, H., Gupta, P. K., Siddiqui, M. K., Morales-Menendez, R., Bhardwaj, P., & Singh, V. 'A deep learning and grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images', Chaos, Solitons & Fractals, 140, 110190, 2020.
Paranjape, K., Schinkel, M., Panday, R.N., Car, J. and Nanayakkara, P. 'Introducing artificial intelligence training in medical education', JMIR Medical Education, 5(2), p.e16048, 2019.
Parikh RB, Teeple S, Navathe AS. 'Addressing bias in artificial intelligence in health care', JAMA; 322(24):2377-2378, 2019.
Park S, Park BS, Lee YJ, Kim IH, Park JH, Ko J, et al. 'Artificial intelligence with kidney disease: A scoping review with bibliometric analysis, PRISMA-ScR', Medicine (Baltimore); 100(14), 2021.
Park, S.H. and Han, K. 'Methodologic guide for evaluating clinical performance and effect of artificial intelligence technology for medical diagnosis and prediction', Radiology, 286(3), pp.800-809, 2018.
Park, Y., Jackson, G.P., Foreman, M.A., Gruen, D., Hu, J. and Das, A.K. 'Evaluating artificial intelligence in medicine: phases of clinical research', JAMIA Open, 3(3), pp.326-331, 2020.
Peng J, Wang Y. 'Medical Image Segmentation with Limited Supervision: A Review of Deep Network Models', IEEE Access, 9, 2021.
Pérez MJ, Grande RG. 'Application of artificial intelligence in the diagnosis and treatment of hepatocellular carcinoma: A review', World J Gastroenterol. 26(37):5617–5628, 2020.
Pickering B. 'Trust, but Verify: Informed Consent, AI Technologies, and Public Health Emergencies', Future Internet 13(5):132, 2021.
Pinto, A., Pinto, F., Faggian, A., Rubini, G., Caranci, F., Macarini, L., Genovese, E.A. and Brunese, L. 'Sources of error in emergency ultrasonography', Critical Ultrasound Journal, 5(1), 2013.
Ploug, T. and Holm, S. 'Meta Consent – A Flexible Solution to the Problem of Secondary Use of Health Data', Bioethics, 30(9), 2016.
Prokop M, van Everdingen W, van Rees Vellinga T, et al. 'CO-RADS — a categorical CT assessment scheme for patients with suspected COVID-19: definition and evaluation', Radiology, 2020:201473 [E-pub ahead of print, 27 April 2020].
Quaglio G, Brand H, Dario C. 'Fighting dementia in Europe: the time to act is now', Lancet Neurol. 15(5):452-4, 2016.
Quaglio GL, Boone R. 'What if we could fight drug addiction with digital technology?', EPRS, European Parliament, 2019.
Quaglio GL, Pirona A, Esposito G, Karapiperis T, Brand H, Dom G, Bertinato L, Montanari L, Kiefer F, Carrà G. 'Knowledge and utilization of technology-based interventions for substance use disorders: an exploratory study among health professionals in the European Union', Drugs: Education, Prevention and Policy; 26(5):437-446, 2018.
Quaglio GL. EU public health policy. European Parliamentary Research Service (EPRS), European Parliament, Brussels, 2020.
Quaglio GL, Millar S, Pazour M, Albrecht V, Vondrak T, Kwiek M, Schuch K. Exploring the performance gap in EU Framework Programmes between EU13 and EU15 Member States. European Parliamentary Research Service (EPRS), European Parliament, Brussels, 2020b.
Quer G, Arnaout R, Henne M, Arnaout R. 'Machine Learning and the Future of Cardiovascular Care: JACC State-of-the-Art Review', J Am Coll Cardiol. 77(3):300–13, 2021.
Raghupathi, W. and Raghupathi, V. 'Big data analytics in healthcare: promise and potential', Health Information Science and Systems, 2(1), pp.1-10, 2014.
Raji, I.D. 'Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing', arXiv preprint, arXiv:2001.00973, 2020.
Rajkomar, A., Hardt, M., Howell, M.D., Corrado, G. and Chin, M.H. 'Ensuring fairness in machine learning to advance health equity', Annals of Internal Medicine, 169(12), pp.866-872, 2018.
Ram S, Zhang W, Williams M, Pengetnze Y. 'Predicting asthma-related emergency department visits using big data', IEEE J Biomed Health Inform. 19:1216-23, 2015.
Rampton, V., Mittelman, M. and Goldhahn, J. 'Implications of artificial intelligence for medical education', The Lancet Digital Health, 2(3), pp.e111-e112, 2020.
Reardon, S. 'Rise of robot radiologists', Nature, 576(7787), p.S54, 2019.
Recht, M.P., Dewey, M., Dreyer, K., Langlotz, C., Niessen, W., Prainsack, B. and Smith, J.J. 'Integrating artificial intelligence into the clinical practice of radiology: challenges and recommendations', European Radiology, pp.1-9, 2020.
Redlich R, Almeida JJ, Grotegerd D, Opel N, Kugel H, Heindel W, et al. 'Brain morphometric biomarkers distinguishing unipolar and bipolar depression: A voxel-based morphometry—Pattern classification approach', JAMA Psychiatry; 71:1222–1230, 2014.
Reece AG, Reagan AJ, Lix KLM, Dodds PS, Danforth CM, Langer EJ. 'Forecasting the onset and course of mental illness with Twitter data', Sci Rep. 7(1):13006, 2017.
Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC, 2017.
Reisman, D., Schultz, J., Crawford, K. and Whittaker, M. 'Algorithmic impact assessments: A practical framework for public agency accountability', AI Now Institute, pp.1-22, 2018.
Roberts, H., Cowls, J., Morley, J., Taddeo, M., Wang, V. and Floridi, L. 'The Chinese approach to artificial intelligence: an analysis of policy, ethics, and regulation', AI & SOCIETY, pp.1-19, 2020.
Roski J, Chapman W, Heffner J, Trivedi R, Del Fiol G, Kukafka R, Bleicher P, Estiri H, Klann J, Pierce J. 'How artificial intelligence is changing health and health care'. In Artificial Intelligence in Health Care: The hope, the hype, the promise, the peril. Editors: Matheny M, Israni ST, Ahmed M, Whicher D. Washington, DC: National Academy of Medicine, 2019.
Samulowitz A, Gremyr I, Eriksson E, Hensing G. ''Brave Men' and 'Emotional Women': A Theory-Guided Literature Review on Gender Bias in Health Care and Gendered Norms towards Patients with Chronic Pain', Pain Res Manag. 2018:6358624, 2018.
Sapci AH, Sapci HA. 'Innovative assisted living tools, remote monitoring technologies, artificial intelligence-driven solutions, and robotic systems for aging societies: systematic review', JMIR Aging; 2(2):e15429, 2019.
Scheetz, J., Rothschild, P., McGuinness, M., Hadoux, X., Soyer, H.P., Janda, M., Condon, J.J., Oakden-Rayner, L., Palmer, L.J., Keel, S. and van Wijngaarden, P. 'A survey of clinicians on the use of artificial intelligence in ophthalmology, dermatology, radiology and radiation oncology', Scientific Reports, 11(1), pp.1-10, 2021.
Schrider DR, Kern AD. 'Supervised machine learning for population genetics: a new paradigm', Trends Genet. 34:301–12, 2018.
Schwalbe N, Wahl B. 'Artificial intelligence and the future of global health', Lancet; 395(10236):1579-1586, 2020.
Schwartz WB. 'Medicine and the computer: the promise and problems of change', N Engl J Med. 283(23):1257-1264, 1970.
Scott, I., Carter, S. and Coiera, E. 'Clinician checklist for assessing suitability of machine learning applications in healthcare', BMJ Health & Care Informatics, 28(1), 2021.
Secretary-General of the OECD. Tackling wasteful spending on health, OECD Publishing, Paris, 2017.
Secretary-General of the OECD. Trustworthy AI in health. Background paper for the G20 AI Dialogue, Digital Economy Task Force, 2020.
Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I.Y. and Ghassemi, M. 'CheXclusion: Fairness gaps in deep chest X-ray classifiers', BIOCOMPUTING 2021: Proceedings of the Pacific Symposium (pp. 232-243), 2021.
Sheller, M.J., Edwards, B., Reina, G.A., Martin, J., Pati, S., Kotrotsou, A., Milchenko, M., Xu, W., Marcus, D., Colen, R.R. and Bakas, S. 'Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data', Scientific Reports, 10(1), pp.1-12, 2020.
Shickel B, Loftus TJ, Adhikari L, Ozrazgat-Baslanti T, Bihorac A, Rashidi P. 'DeepSOFA: a continuous acuity score for critically ill patients using clinically interpretable deep learning' Sci Rep; 9:1879, 2019.
Shin EK, Mahajan R, Akbilgic O, Shaban-Nejad A. 'Sociomarkers and biomarkers: predictive modeling in identifying pediatric asthma patients at risk of hospital revisits', NPJ Digit Med; 1:50, 2018.
Shortliffe EH, Sepúlveda MJ. 'Clinical decision support in the era of artificial intelligence', JAMA. 320(21):2199-2200, 2018.
Shukla, 2016; pp. 2303–40. Dordrecht, Neth.: Springer.
Simpson S, Kay FU, Abbara S, et al. 'Radiological Society of North America expert consensus statement on reporting chest CT findings related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA', J Thorac Imaging, 2020 [E-pub ahead of print, 28 April 2020].
Siontis KC, Noseworthy PA, Attia ZI, Friedman PA. 'Artificial intelligence-enhanced electrocardiography in cardiovascular disease management', Nat Rev Cardiol. 18:465–478, 2021.
Sit, C., Srinivasan, R., Amlani, A., Muthuswamy, K., Azam, A., Monzon, L. and Poon, D.S. 'Attitudes and perceptions of UK medical students towards artificial intelligence and radiology: a multicentre survey', Insights into Imaging, 11(1), p.14, 2020.
Smith, H. 'Clinical AI: opacity, accountability, responsibility and liability', AI & SOCIETY, pp.1-11, 2020.
Sornapudi S, Stanley RJ, Stoecker WV, Almubarak H, Long R, Antani S, Thoma G, Zuna R, Frazier SR. 'Deep Learning Nuclei Detection in Digitized Histology Images by Superpixels', J Pathol Inform; 9:5, 2018.
Stanford University, Human-Centered Artificial Intelligence, https://hai.stanford.edu/, accessed November 2021.
Steele AJ, Denaxas SC, Shah AD, Hemingway H, Luscombe NM. 'Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease.', PLoS One 13(8):e0202344, 2018.
Steiner DF, MacDonald R, Liu Y, Truszkowski P, Hipp JD, Gammage C, Thng F, Peng L, Stumpe MC. 'Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer', Am J Surg Pathol. 42(12):1636-1646, 2018.
Stokes JM, Yang K, Swanson K, Jin W, Cubillos-Ruiz A, Donghia NM, MacNair CR, French S, Carfrae LA, Bloom-Ackermann Z, Tran VM, Chiappino-Pepe A, Badran AH, Andrews IW, Chory EJ, Church GM, Brown ED, Jaakkola TS, Barzilay R, Collins JJ. 'A Deep Learning Approach to Antibiotic Discovery', Cell. 180(4):688-702.e13, 2020.
Strianese O, Rizzo F, Ciccarelli M, Galasso G, D'Agostino Y, Salvati A, Del Giudice C, Tesorio P and Rusciano M. 'Precision and Personalized Medicine: How Genomic Approach Improves the Management of Cardiovascular and Neurodegenerative Disease', Genes. 11(7):747, 2020.
Stylianou N, Fackrell R, Vasilakis C. 'Are medical outliers associated with worse patient outcomes? A retrospective study within a regional NHS hospital using routine data', BMJ Open 7. e015676, 2017.
Subbaswamy, A. and Saria, S. 'From development to deployment: dataset shift, causality, and shift-stable models in health AI', Biostatistics, 21(2), pp.345-352, 2020.
Sydow D, Burggraaff L, Szengel A, van Vlijmen HWT, IJzerman AP, et al. 'Advances and challenges in computational target prediction', J. Chem. Inf. Model. 59:1728–42, 2019.
Tanguay-Sela, M., Benrimoh, D., Perlman, K., Israel, S., Mehltretter, J., Armstrong, C., Fratila, R., Parikh, S., Karp, J., Heller, K. and Vahia, I. 'Evaluating the Usability and Impact of an Artificial Intelligence-Powered Clinical Decision Support System for Depression Treatment', Biological Psychiatry, 87(9), p.S171, 2020.
The Assessment List for Trustworthy Artificial Intelligence (ALTAI) for Self-Assessment, ALTAI, European Commission, https://op.europa.eu/es/publication-detail/-/publication/73552fcd-f7c2-11ea-991b-01aa75ed71a1, 2020.
The Assessment List for Trustworthy Artificial Intelligence, https://altai.insight-centre.org, accessed November 2021.
The World Bank, 'Maternal mortality ratio (modeled estimate, per 100,000 live births) – European Union', https://data.worldbank.org/indicator/SH.STA.MMRT?locations=EU, last accessed December 2021.
Tjoa, E. and Guan, C. 'A survey on explainable artificial intelligence (XAI): Toward medical XAI', IEEE Transactions on Neural Networks and Learning Systems, 2020.
Tlapa D, Zepeda-Lugo CA, Tortorella GL, Baez-Lopez YA, Limon-Romero J, Alvarado-Iniesta A, Rodriguez-Borbon MI. 'Effects of Lean Healthcare on Patient Flow: A Systematic Review', Value Health. 23(2):260-273, 2020.
Tomašev N, Glorot X, Rae JW, Zielinski M, Askham H, Saraiva A, et al. 'A clinically applicable approach to continuous prediction of future acute kidney injury', Nature. 572(7767):116–9, 2019.
Topol, EJ, 'High-performance medicine: the convergence of human and artificial intelligence' Nature Medicine, 25(1), 44–56, 2019.
TRIPOD, www.tripod-statement.org, accessed November 2021.
Tutt, A. 'An FDA for algorithms', Admin. L. Rev., 69, p.83, 2017.
U.S. Food and Drug Administration (FDA). Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) - Discussion Paper and Request for Feedback, 2019.
U.S. Food and Drug Administration (FDA). Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, 2021.
United Nations Educational, Scientific and Cultural Organization (UNESCO). Artificial Intelligence and Gender Equality: Key Findings of UNESCO's Global Dialogue, 2020.
United Nations News. 'More women and girls needed in the sciences to solve world's biggest challenges', February 2019. https://news.un.org/en/story/2019/02/1032221
Viceconti, M., Pappalardo, F., Rodriguez, B., Horner, M., Bischoff, J. and Musuamba Tshinanu, F. 'In silico trials: Verification, validation and uncertainty quantification of predictive models used in the regulatory evaluation of biomedical products', Methods 185, pp.120-127, 2021.
Vijayan V, Connolly J, Condell J, McKelvey N and Gardiner P. Review of Wearable Devices and Data Collection Considerations for Connected Health. Sensors. 2021; 21(16): 5589.
Vyas, D.A., et al. 'Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms', The New England Journal of Medicine (383), pp. 874-882, 2020.
Wager TD, Woo CW. 'Imaging biomarkers and biotypes for depression', Nat Med. 23(1):16-17, 2017.
Walsh CG, Ribeiro JD, Franklin JC. 'Predicting risk of suicide attempts over time through machine learning', Clin Psychol Sci; 5, 457–469, 2017.
Wanless D. 'Securing Good Health for the Whole Population', HM Treasury; 2004.
Westergaard, D., Moseley, P., Sørup, F.K.H., Baldi, P. and Brunak, S. 'Population-wide analysis of differences in disease progression patterns in men and women', Nature communications, 10(1), pp.1-14, 2019.
Whitby B. 'Automating medicine the ethical way', In: Pontier M (ed) Rysewyk Machine Medical Ethics (Intelligent Systems, Control and Automation: Science and Engineering). Springer, Switzerland, 2015.
Wiggers, K. 'Google's breast cancer-predicting AI research is useless without transparency, critics say', VentureBeat, 14 October 2020. https://venturebeat.com/2020/10/14/googles-breast-cancer-predicting-ai-research-is-useless-without-transparency-critics-say/.
Williams, R. 'Lack of transparency in AI breast cancer screening study 'could lead to harmful clinical trials', scientists say', iNews UK, 14 October 2020.
Wolff, J., Pauling, J., Keck, A. and Baumbach, J. 'The economic impact of artificial intelligence in health care: systematic review', Journal of Medical Internet Research, 22(2), p.e16866, 2020.
Wood, A., Najarian, K. and Kahrobaei, D. 'Homomorphic encryption for machine learning in medicine and bioinformatics', ACM Computing Surveys (CSUR), 53(4), pp.1-35, 2020.
World Health Organization (WHO). Depression in Europe: facts and figures, 2021a. https://www.euro.who.int/en/health-topics/noncommunicable-diseases/mental-health/news/news/2012/10/depression-in-europe/depression-in-europe-facts-and-figures
World Health Organization (WHO). Ethics and governance of artificial intelligence for health: WHO guidance, 2021b.
World Health Organization (WHO). Global strategy on human resources for health: workforce 2030, Geneva, 2016. https://www.who.int/hrh/resources/pub_globstrathrh-2030/en/
Xu, W. 'Toward human-centered AI: a perspective from human-computer interaction', Interactions, 26(4), pp.42-46, 2019.
Yang, G., Ye, Q. and Xia, J. 'Unbox the Black box for the Medical Explainable AI via Multi-modal and Multi-centre Data Fusion: A Mini-Review, Two Showcases and Beyond', arXiv preprint arXiv:2102.01998, 2021.
Yazdavar AH, Mahdavinejad MS, Bajaj G, Romine W, Sheth A, Monadjemi AH, Thirunarayan K, Meddar JM, Myers A, Pathak J, Hitzler P. 'Multimodal mental health analysis in social media', PLoS One; 15(4):e0226248, 2020.
Yu, K.H. and Kohane, I.S. 'Framing the challenges of artificial intelligence in medicine', BMJ Quality & Safety, 28(3), pp.238-241, 2019.
Zange L, Muehlberg F, Blaszczyk E, Schwenke S, Traber J, Funk S and Schulz-Menger J. 'Quantification in cardiovascular magnetic resonance: agreement of software from three different vendors on assessment of left ventricular function, 2D flow and parametric mapping', Journal of Cardiovascular Magnetic Resonance; 21:12, 2019.
Zech, J.R., Badgeley, M.A., Liu, M., Costa, A.B., Titano, J.J. and Oermann, E.K. 'Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study', PLoS Medicine, 15(11), p.e1002683, 2018.
Zhang BH, Lemoine B, Mitchell M. 'Mitigating unwanted biases with adversarial learning', In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 335-340, 2018.
Zhang L, Tan J, Han D, Zhu H. 'From machine learning to deep learning: progress in machine intelligence for rational drug discovery', Drug Discov. Today; 22(11):1680–85, 2017.
Zhao L, Wang W, Sedykh A, Zhu H. 'Experimental errors in QSAR modeling sets: What we can do and what we cannot do', ACS Omega, 2:2805–12, 2017.
Zhu H, Zhang J, Kim MT, Boison A, Sedykh A, Moran K. 'Big data in chemical toxicity research: the use of high-throughput screening assays to identify potential toxicants', Chem. Res. Toxicol; 27:1643–51, 2014.
Zhu H. 'Big Data and Artificial Intelligence Modeling for Drug Discovery', Annu Rev Pharmacol Toxicol. 60:573-589, 2020.