by Aurelia Bustos, Antonio Pertusa, Jose-Maria Salinas, Maria de la Iglesia-Vayá
License: Research Only
We present a large-scale, labeled, high-resolution chest x-ray dataset for the automated exploration of medical images and their associated reports. The dataset includes more than 160,000 images from 67,000 patients that were interpreted and reported by radiologists at Hospital San Juan (Spain) from 2009 to 2017. It covers six different position views and provides additional information on image acquisition and patient demographics. The reports were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations, organized as a hierarchical taxonomy mapped to standard Unified Medical Language System (UMLS) terminology. 27% of the reports were manually annotated by trained physicians, and the remainder were labeled using a supervised method based on a recurrent neural network with attention mechanisms. The generated labels were validated on an independent test set, achieving a Micro-F1 score of 0.93. To the best of our knowledge, this is the first public chest x-ray database annotated with such a large number of different labels suitable for training supervised models on radiographs, and the first to contain radiographic reports in Spanish.
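The Micro-F1 score reported above pools true positives (TP), false positives (FP) and false negatives (FN) across every label and every report before computing precision and recall, which reduces to F1 = 2·TP / (2·TP + FP + FN). The following is a minimal sketch of that computation, assuming each report's labels are represented as a set of strings. The function name `micro_f1` and the example label sets are hypothetical illustrations, not the authors' evaluation code or actual dataset entries.

```python
from typing import Iterable, Set


def micro_f1(gold: Iterable[Set[str]], pred: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions (illustrative sketch).

    Each element of `gold` and `pred` is the set of labels for one report.
    TP, FP and FN are summed over all reports and labels, then combined as
    F1 = 2*TP / (2*TP + FP + FN).
    """
    gold_list, pred_list = list(gold), list(pred)
    if len(gold_list) != len(pred_list):
        raise ValueError("gold and pred must contain the same number of reports")

    tp = fp = fn = 0
    for g, p in zip(gold_list, pred_list):
        tp += len(g & p)  # labels both annotated and predicted
        fp += len(p - g)  # predicted labels that are not in the annotation
        fn += len(g - p)  # annotated labels the model missed

    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: three reports with made-up label sets.
gold = [{"cardiomegaly", "pleural effusion"}, {"normal"}, {"pneumonia"}]
pred = [{"cardiomegaly"}, {"normal"}, {"pneumonia", "atelectasis"}]

# TP = 3, FP = 1, FN = 1, so F1 = 6 / 8
print(micro_f1(gold, pred))
```

Because counts are pooled before averaging, frequent labels weigh more heavily in Micro-F1 than rare ones, unlike macro-averaging, which computes F1 per label and then takes the unweighted mean.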