Development of early prediction model for pregnancy-associated hypertension with graph-based semi-supervised learning

Seung Mi Lee, Yonghyun Nam, Eun Saem Choi, Young Mi Jung, Vivek Sriram, Jacob S. Leiby, Ja Nam Koo, Ig Hwan Oh, Byoung Jae Kim, Sun Min Kim, Sang Youn Kim, Gyoung Min Kim, Sae Kyung Joo, Sue Shin, Errol R. Norwitz, Chan Wook Park, Jong Kwan Jun, Won Kim, Dokyoon Kim, Joong Shin Park

Research output: Contribution to journal › Article › peer-review


Clinical guidelines recommend several risk factors to identify women in early pregnancy at high risk of developing pregnancy-associated hypertension. However, these risk factors have low predictive accuracy. Here, we developed a prediction model for pregnancy-associated hypertension using graph-based semi-supervised learning (SSL). This is a secondary analysis of a prospective study of healthy pregnant women. To develop the prediction model, we compared prediction performance across five machine learning methods (semi-supervised learning with both labeled and unlabeled data, semi-supervised learning with labeled data only, logistic regression, support vector machine, and random forest) using three different variable sets: [a] variables from clinical guidelines, [b] important variables identified by feature selection, and [c] all routine variables. Additionally, the proposed prediction model was compared with placental growth factor, a predictive biomarker for pregnancy-associated hypertension. The study population consisted of 1404 women, including 1347 women with complete follow-up (labeled data) and 57 women with incomplete follow-up (unlabeled data). Among the 1347 women with complete follow-up, 2.4% (33/1347) developed pregnancy-associated hypertension. Graph-based SSL using the top 11 variables achieved the best average prediction performance (mean area under the curve (AUC) of 0.89 in the training set and 0.81 in the test set), with higher sensitivity (72.7% vs. 45.5% in the test set) and similar specificity (80.0% vs. 80.5% in the test set) compared to risk factors from clinical guidelines. In addition, our proposed model with graph-based SSL outperformed placental growth factor in the total study population (AUC, 0.71 vs. 0.80, p < 0.001). In conclusion, we could accurately predict the development of pregnancy-associated hypertension in early pregnancy from routine clinical variables with the help of graph-based SSL.
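As a rough illustration of how graph-based semi-supervised learning can use unlabeled rows alongside labeled ones, the sketch below applies closed-form label spreading on a k-nearest-neighbour graph and evaluates the hidden rows by AUC. It is not the authors' implementation: the abstract does not specify the graph construction or the propagation rule, so the kNN graph, `alpha`, `k`, the synthetic data, and all function names (`knn_affinity`, `label_spreading_scores`, `auc`, `demo`) are assumptions made for illustration only.

```python
import numpy as np


def knn_affinity(X, k=10):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix without self-loops."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)               # a node is never its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((len(X), len(X)))
    W[np.repeat(np.arange(len(X)), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                  # link i and j if either picks the other


def label_spreading_scores(X, y, alpha=0.9, k=10):
    """Positive-class score per sample; y holds 0/1 labels and -1 for unlabeled."""
    W = knn_affinity(X, k)
    degree = W.sum(axis=1)
    S = W / np.sqrt(np.outer(degree, degree))  # symmetrically normalised graph
    Y = np.zeros((len(y), 2))
    labeled = np.flatnonzero(y >= 0)
    Y[labeled, y[labeled]] = 1.0               # one-hot rows for labeled nodes only
    # Closed-form label spreading: F = (I - alpha * S)^(-1) Y
    F = np.linalg.solve(np.eye(len(y)) - alpha * S, Y)
    return F[:, 1] / np.maximum(F.sum(axis=1), 1e-12)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def demo(seed=0):
    """Synthetic, imbalanced data with 11 features (illustrative values only)."""
    rng = np.random.default_rng(seed)
    n_neg, n_pos, n_features = 360, 40, 11
    X = rng.normal(size=(n_neg + n_pos, n_features))
    y_true = np.r_[np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)]
    X[y_true == 1, :4] += 1.5                  # shift positives in four features
    hidden = rng.random(len(y_true)) < 0.3     # rows whose outcome is withheld
    y_obs = np.where(hidden, -1, y_true)
    scores = label_spreading_scores(X, y_obs)
    return auc(y_true[hidden], scores[hidden])


if __name__ == "__main__":
    print(f"AUC on hidden rows: {demo():.3f}")
```

In this sketch the withheld rows still contribute edges to the graph, which is the sense in which unlabeled samples can inform the predictions; a real analysis would additionally need a separate test split and careful handling of the strong class imbalance.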

Original language: English
Article number: 15793
Journal: Scientific Reports
Issue number: 1
State: Published - Dec 2022

