Evaluation of AI-assisted medical image interpretation: a core outcome set and design guidance for multi-reader multi-case studies

Artificial intelligence (AI) tools to assist medical image interpretation (e.g., radiology, dermatology) are increasingly reaching regulatory clearance, procurement and early deployment in healthcare systems. Almost all are intended for use in concert with a human reader, who retains responsibility for the final clinical decision. Their effect on patient care is therefore mediated by how clinicians respond to AI output, and a tool may improve, leave unchanged, or degrade care depending on that interaction. The multi-reader multi-case (MRMC) study has become the established interim step at which these tools are evaluated before deployment, and forms the principal evidence underpinning many regulatory and procurement decisions.

However, MRMC evaluation currently defaults to measures of diagnostic accuracy, which are necessary but insufficient to characterise the safety of human-in-the-loop AI. Tools of equivalent accuracy may differ in how they affect clinician behaviour, in ways that bear on patient safety but are not captured by accuracy-based outcomes.

There is currently no defined core outcome set (COS) for MRMC studies of AI-assisted image interpretation, and no accessible guidance for designing studies capable of capturing such outcomes. Without these, researchers, regulators, procurers and developers lack a unified framework for assessing whether AI-assisted image interpretation improves care. This study seeks to address these gaps by developing a COS and accompanying researcher-facing design guidance for MRMC studies evaluating AI-assisted medical image interpretation.

Aim: To develop a COS and accompanying design guidance for multi-reader multi-case studies evaluating AI-assisted medical image interpretation.

Objectives and work packages:
1. To identify the outcomes, endpoints and design features currently reported in MRMC studies of AI-assisted medical image interpretation, and to elicit stakeholder priorities for outcomes not currently captured in MRMC research. Work package 1: systematic review and multi-stakeholder qualitative interviews.
2. To define a COS for studies evaluating AI-assisted medical image interpretation. Work package 2: modified Delphi study and inclusive engagement of under-served communities.
3. To agree definitions and existing measurement approaches for each core outcome, and to identify where measurement approaches do not yet exist. Work package 2: consensus meeting.
4. To produce accessible, researcher-facing design guidance enabling investigators to configure MRMC studies to capture agreed core outcomes. Work package 3: guidance development and end-user validation.

Impact:
This project will establish the first COS for MRMC studies evaluating AI-assisted medical image interpretation, together with recommended measurement approaches (identifying where none currently exist) and accompanying design guidance. It aims to reduce heterogeneity in outcome reporting, enable synthesis across studies, strengthen the evidence base on the safety and effectiveness of AI-assisted medical image interpretation, and give future researchers, regulators and procurers a common basis for judging whether a tool has been adequately evaluated before it reaches patients.

Related COS:
This work is related to: Development of a Core outcome set and outcome measures for Artificial Intelligence-based conveRsational agents in hEalthcare: CARE study (Ref 3469, ongoing). This related work develops a COS for AI-based conversational agents in healthcare. The present study is distinct in focussing on AI tools that assist clinician image interpretation, and on the MRMC study as the specific evaluation design.


Contributors

PI: Ms Rachel Kuo, Plastic surgery registrar, NIHR Doctoral Research Fellow, Oxford

Study Management Group:
Professor Dominic Furniss; Professor of Plastic and Reconstructive Surgery, Oxford
Professor Gary Collins; Professor of Medical Statistics, EQUATOR Director, University of Birmingham
Professor Alastair Denniston; Professor of Regulatory Science and Innovation, CERSI-AI Director, University of Birmingham
Dr Elizabeth Tutton; Senior Research Fellow in Qualitative Methods, University of Oxford
Dr Sian Rees; Director, Community Involvement and Workforce Innovation, Oxford Academic Health Science Network
Ms Judi Smith; PPI collaborator
Ms Rosie Hill; PPI collaborator

Collaborators:
Ms Michelle Gavin; Head of Development, Friends, Families and Travellers
Dr Beth Fordham; Senior Research Fellow in Psychology, University of Oxford
Dr Viktorija Kaminskaite; NIHR Academic Clinical Fellow, University of Warwick
Dr Shihabul Hassan; Academic Foundation Programme Doctor, University of Oxford

Further Study Information

Current Stage: Ongoing
Date: June 2026 - July 2028
Funding source(s): Rachel Kuo: NIHR Doctoral Research Fellowship (302562). Pending further funding application.


Health Area

Disease Category: Methodological & diagnostic

Disease Name: N/A

Target Population

Age Range: 0 - 120

Sex: Either

Nature of Intervention: Other

Stakeholders Involved

- Clinical experts
- Families
- Governmental agencies
- Methodologists
- Patient/ support group representatives
- Policy makers
- Regulatory agency representatives
- Researchers
- Service commissioners
- Statisticians

Study Type

- COS (Other)

Method(s)

- Consensus meeting
- Delphi process
- Interview
- Nominal group technique (NGT)
- Systematic review

This core outcome set will be developed in accordance with the COMET Handbook and reported following the COS-STAD and COS-STAR standards. The work comprises three stages.

Work package 1:
A candidate list of outcomes will be generated from two sources: a systematic review of multi-reader multi-case (MRMC) studies of AI-assisted medical image interpretation, characterising the outcomes currently measured and how they are defined; and a multi-stakeholder qualitative interview study identifying outcomes of importance not currently captured in routine practice. Outcomes will be grouped into a structured ontology of outcome names and domains.

Work package 2:
The candidate outcomes will be prioritised through an online modified Delphi conducted over three rounds, stratified across six stakeholder groups: patients and the public; clinicians and clinician-academics in image-interpreting specialties; regulators and policymakers; AI researchers from academic and industry settings; methodologists; and NHS management and procurement. Participants will rate the importance of each outcome on the 9-point GRADE scale, with anonymised feedback and re-rating between rounds. To ensure inclusive participation, the priorities of under-served communities will additionally be sought through facilitated focus groups conducted in parallel. Predefined consensus criteria, consistent with COS methodology, will determine which outcomes are provisionally retained.
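
As an illustration only, the sketch below shows how predefined consensus criteria might be applied to ratings from one Delphi round. The thresholds are an assumption: they follow the "70/15" rule often used in COS development (ratings of 7-9 treated as critical, 1-3 as not important) and are not taken from this study's protocol. The function name `classify_outcome` and the group labels are hypothetical.

```python
# Hypothetical thresholds, NOT taken from the study protocol: the widely used
# "70/15" rule on a 9-point scale (7-9 = critical, 1-3 = not important).
CRITICAL_MIN = 7
LOW_MAX = 3
CONSENSUS_SHARE = 0.70
OPPOSING_LIMIT = 0.15


def classify_outcome(ratings_by_group):
    """Return 'in', 'out' or 'no consensus' for a single outcome.

    ratings_by_group maps a stakeholder group name to a list of 1-9 ratings.
    The criteria are checked within every group, so that one large group
    cannot outvote the others.
    """
    groups = [ratings for ratings in ratings_by_group.values() if ratings]
    if not groups:
        return "no consensus"

    # Share of each group rating the outcome as critical / not important.
    critical = [sum(r >= CRITICAL_MIN for r in g) / len(g) for g in groups]
    low = [sum(r <= LOW_MAX for r in g) / len(g) for g in groups]

    if all(c >= CONSENSUS_SHARE for c in critical) and all(l < OPPOSING_LIMIT for l in low):
        return "in"
    if all(l >= CONSENSUS_SHARE for l in low) and all(c < OPPOSING_LIMIT for c in critical):
        return "out"
    return "no consensus"


if __name__ == "__main__":
    example = {"patients": [9, 8, 7, 7, 5], "clinicians": [8, 8, 9, 7, 7]}
    print(classify_outcome(example))
```

Requiring agreement within each stakeholder group, rather than across the pooled panel, is one possible design choice; the actual criteria for this study would be those fixed in advance in its protocol.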

A final consensus meeting, convening representatives of all stakeholder groups, will use structured discussion (nominal group technique) to ratify the core outcome set and agree a definition and recommended measurement approach for each retained outcome.

Work package 3:
The agreed core outcome set will inform researcher-facing design guidance for MRMC studies. Patients and the public are involved throughout as stakeholders and co-producers.

Linked Studies

    No related studies


Related Links

    No related links