Performance of a deep learning algorithm in detecting osteonecrosis of the femoral head on digital radiography: A comparison with assessments by radiologists

Choong Guen Chee, Youngjune Kim, Yusuhn Kang, Kyong Joon Lee, Hee Dong Chae, Jungheum Cho, Chang Mo Nam, Dongjun Choi, Eugene Lee, Joon Woo Lee, Sung Hwan Hong, Joong Mo Ahn, Heung Sik Kang

Research output: Contribution to journal › Article

3 Scopus citations

Abstract

OBJECTIVE. The objective of our study was to compare the sensitivity of a deep learning (DL) algorithm with the assessments by radiologists in diagnosing osteonecrosis of the femoral head (ONFH) using digital radiography.

MATERIALS AND METHODS. We performed a two-center, retrospective, noninferiority study of consecutive patients (≥ 16 years old) with a diagnosis of ONFH based on MR images. We investigated the following four datasets of unilaterally cropped hip anteroposterior radiographs: training (n = 1346), internal validation (n = 148), temporal external test (n = 148), and geographic external test (n = 250). Diagnostic performance was measured for a DL algorithm, a less experienced radiologist, and an experienced radiologist. Noninferiority analyses for sensitivity were performed for the DL algorithm and both radiologists. Subgroup analysis for precollapse and postcollapse ONFH was done.

RESULTS. Overall, 1892 hips (1037 diseased and 855 normal) were included. Sensitivity and specificity for the temporal external test set were 84.8% and 91.3% for the DL algorithm, 77.6% and 100.0% for the less experienced radiologist, and 82.4% and 100.0% for the experienced radiologist. Sensitivity and specificity for the geographic external test set were 75.2% and 97.2% for the DL algorithm, 77.6% and 75.0% for the less experienced radiologist, and 78.0% and 86.1% for the experienced radiologist. The sensitivity of the DL algorithm was noninferior to that of the assessments by both radiologists. The DL algorithm was more sensitive for precollapse ONFH than the assessment by the less experienced radiologist in the temporal external test set (75.9% vs 57.4%; 95% CI of the difference, 4.5–32.8%).

CONCLUSION. The sensitivity of the DL algorithm for diagnosing ONFH using digital radiography was noninferior to that of both less experienced and experienced radiologist assessments.
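The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not state the noninferiority margin or the confidence-interval method, so the `margin` argument and the Wald interval for two independent proportions are assumptions made for illustration only. In the study the same hips were read by the algorithm and by the radiologists, so a paired method may be more appropriate. All function names are hypothetical.

```python
from math import sqrt


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of diseased hips correctly called positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of normal hips correctly called negative."""
    return tn / (tn + fp)


def wald_diff_ci(p_new: float, n_new: int, p_ref: float, n_ref: int,
                 z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for p_new - p_ref.

    Assumption: treats the two proportions as independent samples,
    which is a simplification for readers scoring the same images.
    """
    diff = p_new - p_ref
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    return diff - z * se, diff + z * se


def is_noninferior(ci_lower: float, margin: float) -> bool:
    """Noninferior if the CI lower bound of (new - reference) exceeds -margin.

    The margin is a hypothetical parameter; the abstract does not report it.
    """
    return ci_lower > -margin


# Dataset sizes reported in the abstract add up to the stated total of hips.
dataset_sizes = {
    "training": 1346,
    "internal validation": 148,
    "temporal external test": 148,
    "geographic external test": 250,
}
total_hips = sum(dataset_sizes.values())
```

For the precollapse subgroup, the abstract reports a 95% CI of the difference with a lower bound of 4.5%. Because that bound is above zero, `is_noninferior(0.045, margin)` returns `True` for any nonnegative margin.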

Original language: English
Pages (from-to): 155-162
Number of pages: 8
Journal: American Journal of Roentgenology
Volume: 213
Issue number: 1
State: Published - 1 Jan 2019


Keywords

  • Machine learning
  • Osteonecrosis of femoral head
  • Radiography
  • Sensitivity
