Abstract
This study aimed to develop a machine learning (ML) model to predict the 5-year risk of developing end-stage renal disease (ESRD) in patients with type 2 diabetes mellitus (T2DM). It also aimed to implement the developed algorithm in an electronic medical record (EMR) system using Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR). The final dataset used for modeling included 19,159 patients. Features of several types were engineered from the medical data and supplied to several ML classifiers. The best-performing classifier was XGBoost, with an area under the receiver operating characteristic curve (AUROC) of 0.95 and an area under the precision-recall curve (AUPRC) of 0.79 under three-fold cross-validation, compared with other models such as logistic regression, random forest, and support vector machine (AUROC range, 0.929–0.943; AUPRC range, 0.765–0.792). Serum creatinine, serum albumin, the urine albumin-to-creatinine ratio, the Charlson comorbidity index, the estimated glomerular filtration rate (GFR), and medication days of insulin were the highest-ranked features for ESRD risk prediction. The algorithm was implemented in the EMR system using HL7 FHIR through an ML-dedicated server that preprocessed unstructured data and trained on updated data.
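The two metrics reported above can be illustrated with a minimal, dependency-free sketch. This is not the study's code: the function names `auroc` and `average_precision` and the four-sample toy data are assumptions made for illustration. Average precision is used here as one common estimator of the AUPRC, and the simple version below assumes there are no tied scores.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (a tie between the two counts as half a win)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Average precision: the mean of the precision values measured at the
    rank of each positive case, after sorting by descending score.
    Assumes no tied scores (illustrative simplification)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


# Hypothetical toy data: 1 = progressed to ESRD, 0 = did not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(f"AUROC: {auroc(labels, scores):.3f}")              # AUROC: 0.750
print(f"AUPRC: {average_precision(labels, scores):.3f}")  # AUPRC: 0.833
```

For an outcome as uncommon as ESRD, the AUPRC is usually the stricter of the two metrics, because it depends on the proportion of positive cases while the AUROC does not. That is consistent with the gap between the reported 0.95 and 0.79.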
Original language | English |
---|---|
Article number | 11232 |
Journal | Scientific Reports |
Volume | 12 |
Issue number | 1 |
DOIs | |
State | Published - Dec 2022 |
Bibliographical note
Publisher Copyright: © 2022, The Author(s).