This post summarizes a study presented at MEDINFO 2019 exploring how Machine Learning (ML) and Electronic Health Records (EHRs) can be used to identify women at risk of Postpartum Depression (PPD) during pregnancy.
1. Objective
Postpartum Depression (PPD) remains one of the most common maternal health conditions, yet effective screening strategies are still limited.
This study aims to improve early detection by leveraging routinely collected clinical data from EHRs to support prediction and decision-making during prenatal care.
2. Methodology
The study analyzed 9,980 pregnancy episodes from Weill Cornell Medicine and NewYork-Presbyterian Hospital (2015–2017).
Data Structure
- Data were standardized using the OMOP Common Data Model
- Included:
  - Demographics
  - Diagnoses (inpatient & outpatient)
  - Medication prescriptions
Algorithms Tested
Six machine learning models were evaluated:
- Logistic Regression (L2-regularized)
- Support Vector Machine (SVM)
- Decision Tree
- Naïve Bayes
- XGBoost
- Random Forest
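To make the first model on the list concrete, here is a minimal sketch of L2-regularized logistic regression trained by batch gradient descent. It is not the study's implementation: the toy data, the `lam` penalty strength, the learning rate, and all function names are assumptions chosen for illustration. The point is only to show how the L2 penalty shrinks the learned weights.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logreg_l2(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias term b is left unpenalized, a common convention.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The "lam * wj" term is the gradient of the L2 penalty.
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy data: two binary indicators per pregnancy episode.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

w_weak, b_weak = fit_logreg_l2(X, y, lam=0.01)    # light regularization
w_strong, b_strong = fit_logreg_l2(X, y, lam=1.0)  # heavy regularization
```

With heavier regularization the weight on the informative first feature is pulled closer to zero, which trades some fit on the training data for stability on sparse, high-dimensional inputs such as diagnosis and medication indicators.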
Feature Engineering
Two types of predictors were considered:
- Time-independent: race, BMI, marital status
- Time-dependent: diagnoses and medications tracked across the three trimesters
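One way to picture the two predictor types is as a single feature row per pregnancy episode: static attributes are copied as-is, and each diagnosis or medication becomes a per-trimester indicator. The sketch below assumes this encoding; the trimester cut-offs, the `ClinicalEvent` type, and the concept labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Assumed trimester boundaries in gestational days (not from the study).
FIRST_TRIMESTER_END = 13 * 7   # through week 13
SECOND_TRIMESTER_END = 27 * 7  # through week 27

@dataclass
class ClinicalEvent:
    concept: str          # hypothetical label, e.g. "dx:anxiety"
    gestational_day: int  # days since the estimated start of pregnancy

def trimester(gestational_day: int) -> int:
    """Map a gestational day to trimester 1, 2, or 3."""
    if gestational_day <= FIRST_TRIMESTER_END:
        return 1
    if gestational_day <= SECOND_TRIMESTER_END:
        return 2
    return 3

def build_features(static: dict, events: list) -> dict:
    """Combine time-independent attributes with per-trimester event flags."""
    features = dict(static)  # time-independent predictors, copied unchanged
    for ev in events:
        # Time-dependent predictor: one binary flag per (concept, trimester).
        features[f"{ev.concept}_t{trimester(ev.gestational_day)}"] = 1
    return features

static = {"bmi": 31.2, "marital_status_single": 1}
events = [
    ClinicalEvent("dx:anxiety", 40),
    ClinicalEvent("rx:antidepressant", 120),
    ClinicalEvent("dx:anxiety", 230),
]
row = build_features(static, events)
```

Here the anxiety diagnosis yields flags for trimesters 1 and 3 and the antidepressant prescription yields a flag for trimester 2. Flags that are absent from `row` would be filled with 0 when the rows are assembled into a model matrix.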
3. Key Findings and Performance
The Support Vector Machine (SVM) achieved the best performance of the six models, with an area under the ROC curve (AUC) of 0.79, well above the 0.5 expected from random ranking.
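For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it directly from that definition; the labels and scores are made-up values, and the function name is an assumption for illustration.

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up example: 2 positive and 3 negative cases.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
auc = roc_auc(y_true, scores)  # 5 of the 6 pairs are ranked correctly
```

This pairwise version is quadratic in the number of cases, so it is meant for intuition only. On a dataset the size of the one in the study, a rank-based implementation from an established library would be the practical choice.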
Main Predictors
Mental Health Factors
- Anxiety disorders
- Depressive disorders
- General mental health conditions (across all trimesters)
Physical & Obstetric Factors
- Obesity and abnormal weight gain
- Threatened miscarriage
- Premature labor
- Pain conditions (e.g., back pain, abdominal pain)
Medication Use
- Antidepressant use throughout pregnancy
- Anti-inflammatory drugs (especially in the second trimester)
Demographics
- Single marital status
- Certain race groups (within the study population)
4. Conclusion and Future Directions
The study demonstrates that machine learning models can predict PPD with reasonable discrimination (AUC up to 0.79) using longitudinal clinical data from EHRs.
Limitations
- Risk of overfitting introduced by oversampling to correct class imbalance
- Single-site dataset, limiting generalizability
- Potential missing data from external healthcare providers
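On the first limitation: a common way to reduce the overfitting risk from oversampling is to split the data first and oversample only the training portion, so duplicated minority cases never appear in the evaluation set. The sketch below shows that ordering with plain random oversampling. It is an assumption for illustration; the summary does not say which oversampling technique or split the study used, and the function name and class ratio are made up.

```python
import random

def split_then_oversample(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) with the training set balanced.

    The split happens before oversampling, so test cases are never duplicated
    into the training data.
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)

    # Random oversampling: resample minority cases with replacement
    # until both classes have the same count in the training set.
    extra = []
    if minority:
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]

    balanced = train_idx + extra
    rng.shuffle(balanced)
    return balanced, test_idx

# Made-up imbalanced labels: 10 positive and 90 negative episodes.
labels = [1] * 10 + [0] * 90
train, test = split_then_oversample(labels)
```

The test indices keep the original class imbalance, which is what a deployed model would face, while the training indices contain repeated minority cases.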
Future Work
- Use of multi-site datasets
- Exploration of deep learning models
- Integration of causal inference methods
Final Takeaway
Machine learning applied to real-world clinical data shows promise for improving early detection of postpartum depression, which could enable more proactive and personalized maternal care.