This post summarizes a study presented at MEDINFO 2019 exploring how Machine Learning (ML) and Electronic Health Records (EHRs) can be used to identify women at risk of Postpartum Depression (PPD) during pregnancy.


1. Objective

Postpartum Depression (PPD) remains one of the most common maternal health conditions, yet effective screening strategies are still limited.
This study aims to improve early detection by leveraging routinely collected clinical data from EHRs to support prediction and decision-making during prenatal care.


2. Methodology

The study analyzed EHR data covering 9,980 pregnancy episodes recorded at Weill Cornell Medicine and NewYork-Presbyterian Hospital between 2015 and 2017.

Data Structure

  • Data was standardized using the OMOP Common Data Model
  • Included:
    • Demographics
    • Diagnoses (inpatient & outpatient)
    • Medication prescriptions
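
To make the data layout concrete, here is a minimal sketch of pulling one patient's diagnoses for a single pregnancy episode out of OMOP-style rows. The field names person_id, condition_concept_id and condition_start_date follow the OMOP condition_occurrence table; the PregnancyEpisode container, the helper function and the concept IDs are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConditionOccurrence:
    # Field names mirror the OMOP condition_occurrence table.
    person_id: int
    condition_concept_id: int
    condition_start_date: date


@dataclass(frozen=True)
class PregnancyEpisode:
    # Hypothetical container used only for this illustration.
    person_id: int
    start: date
    end: date


def conditions_in_episode(episode, rows):
    """Return the condition rows recorded for this person during the episode."""
    return [
        r for r in rows
        if r.person_id == episode.person_id
        and episode.start <= r.condition_start_date <= episode.end
    ]


# Placeholder concept IDs (1001, 1002), not real OMOP concepts.
rows = [
    ConditionOccurrence(1, 1001, date(2016, 2, 10)),  # inside the episode
    ConditionOccurrence(1, 1002, date(2017, 5, 1)),   # after the episode
    ConditionOccurrence(2, 1001, date(2016, 3, 3)),   # different patient
]
episode = PregnancyEpisode(person_id=1, start=date(2016, 1, 5), end=date(2016, 10, 1))
selected = conditions_in_episode(episode, rows)
```

The same filter would apply to medication rows, using the drug_exposure table's person_id and drug_exposure_start_date fields.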

Algorithms Tested

Six machine learning models were evaluated:

  • Logistic Regression (L2-regularized)
  • Support Vector Machine (SVM)
  • Decision Tree
  • Naïve Bayes
  • XGBoost
  • Random Forest

Feature Engineering

Two types of predictors were considered:

  • Time-independent: race, BMI, marital status
  • Time-dependent: diagnoses and medications tracked across the three trimesters
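
One way to picture the two predictor types is as a single feature dictionary per pregnancy episode: static fields are copied in as they are, and each diagnosis or medication becomes a binary flag tagged with the trimester in which it was recorded. The sketch below assumes trimester boundaries at 14 and 28 completed weeks, which is a common clinical convention rather than something stated in the study; the function names, feature keys and sample values are all illustrative.

```python
from datetime import date


def trimester(episode_start: date, event_date: date) -> int:
    """Map an event date to trimester 1, 2 or 3.

    Assumed boundaries (not from the study): T1 < 14 weeks,
    T2 = weeks 14 to 27, T3 >= 28 weeks.
    """
    weeks = (event_date - episode_start).days // 7
    if weeks < 0:
        raise ValueError("event precedes the pregnancy episode")
    if weeks < 14:
        return 1
    if weeks < 28:
        return 2
    return 3


def build_features(static, events, episode_start):
    """Combine time-independent fields with per-trimester binary flags."""
    features = dict(static)  # e.g. race, BMI, marital status
    for code, event_date in events:
        features[f"{code}@T{trimester(episode_start, event_date)}"] = 1
    return features


# Made-up example values for a single episode.
start = date(2016, 1, 4)
features = build_features(
    static={"bmi": 31.2, "marital_status": "single"},
    events=[
        ("dx:anxiety", date(2016, 2, 1)),         # week 4  -> T1
        ("rx:antidepressant", date(2016, 5, 2)),  # week 17 -> T2
        ("dx:anxiety", date(2016, 8, 29)),        # week 34 -> T3
    ],
    episode_start=start,
)
```

Keeping the trimester in the feature name lets a model weigh the same diagnosis differently depending on when it appears, which matches how the findings below single out specific trimesters.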

3. Key Findings and Performance

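The Support Vector Machine (SVM) performed best of the six models, with an area under the ROC curve (AUC) of 0.79. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so this indicates useful but far from perfect discrimination.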
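
For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen PPD case receives a higher risk score than a randomly chosen non-case. The sketch below computes it directly from that pairwise definition; the function name and the toy labels and scores are illustrative and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. The loop is
    O(n_pos * n_neg), which is fine for a small demonstration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

Because AUC depends only on how cases are ranked relative to non-cases, it is not distorted by class imbalance in the way raw accuracy is.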

Main Predictors

Mental Health Factors

  • Anxiety disorders
  • Depressive disorders
  • General mental health conditions (across all trimesters)

Physical & Obstetric Factors

  • Obesity and abnormal weight gain
  • Threatened miscarriage
  • Premature labor
  • Pain conditions (e.g., back pain, abdominal pain)

Medication Use

  • Antidepressant use throughout pregnancy
  • Anti-inflammatory drugs (especially in the second trimester)

Demographics

  • Single marital status
  • Certain race groups (within the study population)

4. Conclusion and Future Directions

The study suggests that machine learning models can predict PPD with reasonable discrimination using longitudinal clinical data from EHRs.

Limitations

  • Risk of overfitting introduced by the oversampling used to handle class imbalance
  • Single-site dataset, limiting generalizability
  • Potential missing data from external healthcare providers
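
The first limitation is easier to see with a small example. If minority-class rows are duplicated before the data is split, copies of the same row can land in both the training and the evaluation set, which inflates the measured performance. A common safeguard is to split first and oversample only the training portion. The sketch below uses plain random oversampling with replacement as a stand-in, since this summary does not say which oversampling technique the study used; the helper name, the toy data and the fixed split are assumptions.

```python
import random


def random_oversample(rows, label_of, rng):
    """Duplicate minority-class rows (with replacement) until classes balance."""
    pos = [r for r in rows if label_of(r) == 1]
    neg = [r for r in rows if label_of(r) == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("cannot oversample an empty class")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


# Toy data: (row_id, label), with 4 positives among 20 rows.
data = [(i, 1 if i < 4 else 0) for i in range(20)]

# Split first (a fixed toy split: every 4th row is held out) ...
test = [row for row in data if row[0] % 4 == 0]
train = [row for row in data if row[0] % 4 != 0]

# ... then oversample the training rows only, leaving the test set untouched.
rng = random.Random(0)
train_balanced = random_oversample(train, label_of=lambda row: row[1], rng=rng)
```

Here the training set grows from 15 to 24 rows (12 per class), while the held-out rows keep their original class ratio and share no row IDs with the training data.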

Future Work

  • Use of multi-site datasets
  • Exploration of deep learning models
  • Integration of causal inference methods

Final Takeaway

Machine learning applied to real-world clinical data shows strong potential to improve early detection of postpartum depression, enabling more proactive and personalized maternal care.