Predicting Postpartum Depression Using Machine Learning and EHRs

This post summarizes a study presented at MEDINFO 2019 that explored how machine learning (ML) and electronic health records (EHRs) can be used during pregnancy to identify women at risk of postpartum depression (PPD).

1. Objective

PPD remains one of the most common maternal health conditions, yet effective screening strategies are still limited.
This study aims to improve early detection by leveraging routinely collected clinical data from EHRs to support prediction and decision-making during prenatal care.

2. Methodology

The study analyzed 9,980 pregnancy episodes from Weill Cornell Medicine and New York-Presbyterian Hospital (2015–2017).

Data Structure

Data were standardized using the OMOP Common Data Model (CDM). The following data types were included:
- Demographics
- Diagnoses (inpatient & outpatient)
- Medication prescriptions
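
To make this layout concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names (person, condition_occurrence, drug_exposure, and their concept-ID and start-date columns) follow the OMOP CDM. The rows and concept IDs are invented placeholders, not study data or real vocabulary codes, and the helper patient_events is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, race_concept_id INTEGER);
CREATE TABLE condition_occurrence (
    person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT);
CREATE TABLE drug_exposure (
    person_id INTEGER, drug_concept_id INTEGER, drug_exposure_start_date TEXT);
""")

# Placeholder rows: the concept IDs below are made up for illustration.
conn.execute("INSERT INTO person VALUES (1, 1001)")
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [(1, 2001, "2016-02-10"), (1, 2002, "2016-06-01")],
)
conn.execute("INSERT INTO drug_exposure VALUES (1, 3001, '2016-05-15')")


def patient_events(conn, person_id):
    """Return one person's diagnosis and medication events, ordered by date."""
    query = """
        SELECT 'condition' AS kind,
               condition_concept_id AS concept_id,
               condition_start_date AS start_date
        FROM condition_occurrence WHERE person_id = ?
        UNION ALL
        SELECT 'drug', drug_concept_id, drug_exposure_start_date
        FROM drug_exposure WHERE person_id = ?
        ORDER BY start_date
    """
    return conn.execute(query, (person_id, person_id)).fetchall()


events = patient_events(conn, 1)
for kind, concept_id, start_date in events:
    print(kind, concept_id, start_date)
```

Because every site that adopts the CDM exposes the same tables and columns, a query like this could in principle be reused across institutions, which is relevant to the multi-site work discussed later.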

Algorithms Tested

Six machine learning models were evaluated:

- Logistic Regression (L2-regularized)
- Support Vector Machine (SVM)
- Decision Tree
- Naïve Bayes
- XGBoost
- Random Forest

Feature Engineering

Two types of predictors were considered:

- Time-independent: race, BMI, marital status
- Time-dependent: diagnoses and medications tracked across the three trimesters
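
The two predictor types can be combined into a single feature row per pregnancy. The sketch below is one possible way to do it, not the study's actual pipeline: the event format (concept name, day of gestation), the feature names, and the trimester cutoffs (weeks 1-13, 14-27, 28 onward) are all assumptions made for illustration.

```python
from collections import Counter


def trimester(gestational_day):
    """Map a 0-based day of gestation to trimester 1, 2, or 3.

    Assumed cutoffs: weeks 1-13, weeks 14-27, week 28 onward.
    """
    week = gestational_day // 7 + 1
    if week <= 13:
        return 1
    if week <= 27:
        return 2
    return 3


def build_features(static, events):
    """Merge time-independent fields with per-trimester event counts.

    static: dict of time-independent predictors, e.g. {"marital_status": "single"}
    events: list of (concept_name, gestational_day) tuples (hypothetical format)
    """
    features = dict(static)
    counts = Counter((name, trimester(day)) for name, day in events)
    for (name, tri), n in counts.items():
        features[f"{name}_t{tri}"] = n
    return features


example = build_features(
    {"marital_status": "single", "bmi": 31.2},
    [
        ("anxiety_dx", 40),         # week 6, first trimester
        ("antidepressant_rx", 45),  # week 7, first trimester
        ("anxiety_dx", 120),        # week 18, second trimester
        ("back_pain_dx", 200),      # week 29, third trimester
    ],
)
print(example)
```

Keeping a separate column per trimester lets a model distinguish, for example, an anxiety diagnosis early in pregnancy from one recorded close to delivery.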

3. Key Findings and Performance

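The Support Vector Machine (SVM) achieved the best performance, with an area under the ROC curve (AUC) of 0.79. An AUC of 0.5 corresponds to chance and 1.0 to perfect ranking, so this indicates reasonably good, though far from perfect, discrimination between women who did and did not develop PPD.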
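
For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen positive case gets a higher risk score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison on a toy example. The labels and scores are made up, and the function is a plain illustration, not the evaluation code used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 2 positive cases, 3 negative cases.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc(labels, scores)  # 5 of the 6 pairs are ranked correctly
print(round(toy_auc, 3))
```

The pairwise loop is quadratic in the number of cases, so it is only suitable for small examples; library implementations typically use a rank-based formula instead.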

Main Predictors

Mental Health Factors

- Anxiety disorders
- Depressive disorders
- General mental health conditions (across all trimesters)

Physical & Obstetric Factors

- Obesity and abnormal weight gain
- Threatened miscarriage
- Premature labor
- Pain conditions (e.g., back pain, abdominal pain)

Medication Use

- Antidepressant use throughout pregnancy
- Anti-inflammatory drugs (especially in the second trimester)

Demographics

- Single marital status
- Certain race groups (within the study population)

4. Conclusion and Future Directions

The study suggests that machine learning models can predict PPD with reasonable accuracy using longitudinal clinical data from EHRs.

Limitations

- Overfitting risk due to oversampling techniques for imbalanced data
- Single-site dataset, limiting generalizability
- Potential missing data from external healthcare providers
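
On the first limitation, a common safeguard is to hold out the test set before oversampling, so that duplicated minority cases never appear in both training and evaluation data. The sketch below illustrates this with simple random oversampling (duplicating minority rows). This post does not state which oversampling technique the study used, so the method, the function names, and the toy data are all assumptions.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        # Nothing to duplicate; return the data unchanged.
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def split_then_oversample(rows, labels, test_fraction=0.2, seed=0):
    """Hold out a test set first, then oversample the training portion only."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train_x, train_y = random_oversample(
        [rows[i] for i in train_idx], [labels[i] for i in train_idx], seed
    )
    test_x = [rows[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


# Toy data: 40 pregnancy episodes (row IDs only), 10 of them PPD-positive.
rows = list(range(40))
labels = [1] * 10 + [0] * 30
train_x, train_y, test_x, test_y = split_then_oversample(rows, labels)
print(len(train_x), train_y.count(1), train_y.count(0), len(test_x))
```

The test set keeps its original class balance, so performance measured on it reflects the real prevalence rather than the artificially balanced training data.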

Future Work

- Use of multi-site datasets
- Exploration of deep learning models
- Integration of causal inference methods

Final Takeaway

Machine learning applied to real-world clinical data shows strong potential to improve early detection of postpartum depression, enabling more proactive and personalized maternal care.