A new study by CRI Research Fellow Ignacio Atal and his collaborators at Manchester Metropolitan University, UK, was published in BMJ Open. The paper shows that researchers developing machine learning tools for medical diagnosis often fail to adequately report the methods they used in their research. This result calls for higher reporting standards to improve the replicability and transparency of health-related machine learning.
Over the past decade, access to large amounts of clinical data has led to a rise in the application of machine learning methods to medicine, and particularly to medical diagnosis. Using large collections of labelled patient data (for instance, skin pictures labelled “skin cancer” or “not skin cancer”), researchers train machines to perform these diagnostic tasks automatically. A machine learns to diagnose by mimicking the diagnoses recorded in the training data. For instance, if you give the trained machine a skin picture from a new patient, the machine will diagnose skin cancer if the database contains a similar skin picture labelled with a skin cancer diagnosis.
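To make the idea concrete, here is a minimal sketch of diagnosis-by-similarity in Python, a simple nearest-neighbour classifier. It is purely illustrative: the random feature vectors, labels, and the `diagnose` function are assumptions for the example, not the methods used in any of the studies the paper reviewed.

```python
import numpy as np

# Toy "database": each row stands in for a feature vector extracted
# from a skin picture, paired with a label (1 = "skin cancer",
# 0 = "not skin cancer"). Real systems would use genuine image features.
rng = np.random.default_rng(0)
train_features = rng.normal(size=(100, 16))   # 100 labelled pictures
train_labels = rng.integers(0, 2, size=100)   # one diagnosis per picture

def diagnose(new_feature_vector):
    """Give the new picture the diagnosis of its most similar picture
    in the training database (1-nearest-neighbour)."""
    distances = np.linalg.norm(train_features - new_feature_vector, axis=1)
    return train_labels[np.argmin(distances)]

new_picture = rng.normal(size=16)             # picture from a new patient
print("predicted diagnosis:", diagnose(new_picture))
```

Whatever the prediction, it is only as trustworthy as the labelled database behind it, which is exactly why the study scrutinises how that data is reported.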
If you would like to rely on such a machine to conduct medical diagnosis, it is necessary to know the characteristics of the data used to train it: the patients’ characteristics, how the reference diagnosis was made and by whom, where the data come from, and so on. Without this knowledge, it is impossible to 1) replicate those studies, and 2) judge whether the results generalise to other contexts.
In this systematic review, Ignacio and his collaborators analysed 28 published medical research articles reporting the development and evaluation of machine learning-based diagnosis systems. For each article, they assessed to what extent the authors reported the characteristics of the patient data used to train their machines. They showed that a large proportion of articles lacked adequate detail on participants’ characteristics, making it difficult to replicate, assess and interpret the study findings.
Diagnostic studies using machine learning methods have great potential to improve clinical decision-making and take the load off health systems. However, poorly reported studies can do more harm than good. Within biomedical research, there are already established frameworks and guidelines that machine learning researchers can use to guide the reporting of their work, yet most fail to follow them. This work will hopefully serve as a call for health-related machine learning researchers to improve the reporting of their research and increase the transparency and replicability of research results.
You can find the full text of the study here.