Deep learning-based detection system for multiclass lesions on chest radiographs: comparison with observer readings

Sohee Park, Sang Min Lee, Kyung Hee Lee, Kyu Hwan Jung, Woong Bae, Jooae Choe, Joon Beom Seo

Research output: Contribution to journalArticle


Objective: To investigate the feasibility of a deep learning–based detection (DLD) system for multiclass lesions on chest radiograph, in comparison with observers. Methods: A total of 15,809 chest radiographs were collected from two tertiary hospitals (7204 normal and 8605 abnormal with nodule/mass, interstitial opacity, pleural effusion, or pneumothorax). Except for the test set (100 normal and 100 abnormal (nodule/mass, 70; interstitial opacity, 10; pleural effusion, 10; pneumothorax, 10)), radiographs were used to develop a DLD system for detecting multiclass lesions. The diagnostic performance of the developed model and that of nine observers with varying experiences were evaluated and compared using area under the receiver operating characteristic curve (AUROC), on a per-image basis, and jackknife alternative free-response receiver operating characteristic figure of merit (FOM) on a per-lesion basis. The false-positive fraction was also calculated. Results: Compared with the group-averaged observations, the DLD system demonstrated significantly higher performances on image-wise normal/abnormal classification and lesion-wise detection with pattern classification (AUROC, 0.985 vs. 0.958; p = 0.001; FOM, 0.962 vs. 0.886; p < 0.001). In lesion-wise detection, the DLD system outperformed all nine observers. In the subgroup analysis, the DLD system exhibited consistently better performance for both nodule/mass (FOM, 0.913 vs. 0.847; p < 0.001) and the other three abnormal classes (FOM, 0.995 vs. 0.843; p < 0.001). The false-positive fraction of all abnormalities was 0.11 for the DLD system and 0.19 for the observers. Conclusions: The DLD system showed the potential for detection of lesions and pattern classification on chest radiographs, performing normal/abnormal classifications and achieving high diagnostic performance. Key Points: • The DLD system was feasible for detection with pattern classification of multiclass lesions on chest radiograph. • The DLD system had high performance of image-wise classification as normal or abnormal chest radiographs (AUROC, 0.985) and showed especially high specificity (99.0%). • In lesion-wise detection of multiclass lesions, the DLD system outperformed all 9 observers (FOM, 0.962 vs. 0.886; p < 0.001).

Original languageEnglish
Pages (from-to)1359-1368
Number of pages10
JournalEuropean Radiology
Issue number3
StatePublished - 1 Mar 2020



  • Automated pattern recognition
  • Classification
  • Deep learning
  • Thoracic radiography

Cite this