In a recent online article published in AuntMinnie Europe, Dr. Hugh Harvey of Guy’s and St. Thomas’ Hospital in London posits that “medical imaging data isn’t ready for AI.” It is a sobering, clear, and grounded perspective from a practitioner with deep domain expertise in both radiology and machine learning.
Dr. Harvey references a paper by Neil Lawrence titled “Data Readiness Levels,” in which Lawrence describes four levels of readiness. Level D data is unverified, inaccessible, in an unknown format, and not anonymized. As Dr. Harvey points out, this is the quality of data found in most hospitals’ PACS, and it is “difficult or impossible for machine learning to do anything with it.” This is why so many AI companies emerging in healthcare today are floundering at this early stage in their evolution, echoing the truism “garbage in, garbage out.”
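The idea of readiness levels is essentially a checklist applied in order: a dataset cannot reach a higher level until it clears the lower hurdles. A minimal sketch of that logic is below; the gating criteria for the intermediate levels are a rough paraphrase for illustration, not Lawrence’s or Dr. Harvey’s exact definitions.

```python
from dataclasses import dataclass
from enum import Enum

class ReadinessLevel(Enum):
    D = "unverified, inaccessible, unknown format, not anonymized"
    C = "accessible and in a known format"
    B = "verified and anonymized"
    A = "structured, annotated, low-noise, task-appropriate"

@dataclass
class ImagingDataset:
    accessible: bool
    known_format: bool
    verified: bool
    anonymized: bool
    annotated: bool
    task_appropriate: bool

def readiness(ds: ImagingDataset) -> ReadinessLevel:
    """Classify a dataset by the lowest hurdle it fails to clear."""
    if not (ds.accessible and ds.known_format):
        return ReadinessLevel.D
    if not (ds.verified and ds.anonymized):
        return ReadinessLevel.C
    if not (ds.annotated and ds.task_appropriate):
        return ReadinessLevel.B
    return ReadinessLevel.A
```

The ordering is the point: a typical raw PACS export fails the very first check and lands at level D, no matter how rich its clinical content.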
According to Dr. Harvey, level A data is “structured, fully annotated, has minimal noise, and, most importantly, is contextually appropriate and ready for a specific machine-learning task.” Very little imaging data in the world today meets these specifications, because the curation required is difficult and time-consuming. A multitude of papers have used limited level A datasets to make some amazing discoveries. The problem with limited data is that the results lack the statistical validity provided by studies powered with large datasets.
One publication described a model that predicts the survival of patients with glioblastoma multiforme, the most common malignant primary brain tumor in adults, but the study included only 112 patients. Wilson and Devaraj performed a mini-review of the literature on the use of quantitative imaging features, also known as radiomics, in the assessment and management of lung nodules. They cite several publications showing that radiomics can diagnose, i.e., distinguish malignant nodules from benign ones, with 82.7% accuracy, and even predict the histological subtype of a tumor with an AUC of 0.72. Again, these studies used only 127 data points in the former case and 350 in the latter.
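For readers less familiar with the AUC figure quoted above: an AUC of 0.72 means that, given one randomly chosen positive case and one randomly chosen negative case, the model ranks the positive one higher 72% of the time. This Wilcoxon/Mann-Whitney interpretation can be computed directly, as in this small illustrative sketch (the scores shown are made-up numbers, not data from the cited studies):

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for three malignant and three benign nodules
malignant = [0.9, 0.8, 0.7]
benign = [0.1, 0.2, 0.75]
print(auc(malignant, benign))  # 8 of 9 pairs ranked correctly
```

The quadratic pairwise loop is fine for illustration; production code would use a rank-based formulation (or a library routine) for large datasets.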
While much of the current research on imaging data is in oncology, there are many other areas where this data is being studied and could result in significant changes in clinical practice, including cardiology, pulmonology, and neurology, to name just a few.
One of the more interesting studies I recently came across was by Chaddad et al., in which they discovered radiomic biomarkers for the diagnosis and characterization of subjects with autism spectrum disorder (ASD). The research is based on a limited dataset of only 64 patients, so many will question the statistical validity of the results.
There is only one way to solve the problem of these limited level A datasets, and that is to have the data curated within the standard clinical workflow. In other words, the clinicians reading these images, radiologists in most cases, need to become data curators. Unfortunately, PACS were designed only to let radiologists perform rapid reads on digital images, not to perform data curation. So, in classic entrepreneurial fashion, the team at HealthMyne analyzed the problem and is stepping up to deliver this functionality and much more.
Given the team’s long experience, they have spent much time making sure the company’s QIDS (pronounced ‘kwids’) platform fits into radiologists’ and their downstream referrers’ workflows. It launches from any standard worklist and is FDA-cleared as a PACS, so it has all of the standard tools necessary to do the job. The difference is that HealthMyne’s value-added functionality allows for the creation of Quantitative Imaging (the QI in QIDS) data that is then used to provide Decision Support (the DS). This means clinicians using the platform can do their work more efficiently, create beautiful level A datasets for clinical and research purposes, and provide better care for their patients without arduous, painstaking effort.
Some sites are reading hundreds of CT, MR, and PET image sets daily. Imagine if every single one of those resided in a database with fully annotated information about the patient, the location of the abnormalities, and hundreds of data points about the features of those abnormalities, from simple diameter measurements, to volumetric information, to complex data about texture and spiculation. In mere weeks there could be thousands upon thousands of level A datasets just ripe for discoveries to be made with them using AI or deep learning or whatever data scientists come up with next. What might come of those discoveries? It’s hard to say, but based on the research that has been done so far on limited datasets, you can bet that it will include the ability to diagnose earlier using less invasive methods, choose and design better treatments, and understand much more about how to obtain better results for patients suffering from all sorts of ailments.
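To make the “hundreds of data points” concrete, here is a minimal sketch of how two of the simplest quantitative features might be derived from a segmented lesion. It assumes a binary voxel mask and known voxel spacing; the function name and feature set are illustrative, and real radiomics pipelines compute far richer descriptors (texture matrices, spiculation indices, and so on).

```python
import numpy as np

def nodule_features(mask: np.ndarray, voxel_mm=(1.0, 1.0, 1.0)) -> dict:
    """Two of the simplest quantitative features from a binary lesion mask:
    volume, and the diameter of a sphere with the same volume."""
    voxel_vol = float(np.prod(voxel_mm))      # mm^3 per voxel
    volume = float(mask.sum()) * voxel_vol    # lesion volume in mm^3
    # Equivalent spherical diameter: solve V = (pi/6) * d^3 for d
    eq_diameter = (6.0 * volume / np.pi) ** (1.0 / 3.0)
    return {"volume_mm3": volume, "equiv_diameter_mm": eq_diameter}

# Toy example: a 10x10x10 block of lesion voxels at 1 mm isotropic spacing
toy_mask = np.ones((10, 10, 10), dtype=np.uint8)
features = nodule_features(toy_mask)
```

Capturing even a handful of such features consistently, for every read, is exactly the kind of structured output that turns routine clinical work into level A data.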