Advancing Population Health Segmentation Using Explainable AI in Big Data Environments
Keywords:
Explainable Artificial Intelligence (XAI), Population Health Segmentation, Big Data Analytics, SHAP (SHapley Additive exPlanations), Healthcare Risk Stratification.
Abstract
Population health segmentation groups individuals with similar health needs into cohorts so that healthcare interventions can be targeted. In the U.S. healthcare system, segmenting patients with diverse datasets that combine electronic health records and social determinants of health offers both opportunities and obstacles. Although complex machine learning models improve segmentation precision, they function as "black boxes," which hinders clinical acceptance. Explainable AI (XAI) methods, especially SHAP (SHapley Additive exPlanations), address model opacity by showing which features contribute to model decisions. We present a framework that combines XAI methods with big data analytics to create transparent population segmentation. The framework uses Apache Spark MLlib to segment patient populations with diabetes, cardiovascular disease, and chronic respiratory illnesses. Our research shows that SHAP-based explanations reveal the main factors driving each patient segment (e.g., lab values, comorbidities, social factors), allowing clinicians to understand those drivers and improving clinical decision-making. Our case studies and realistic examples demonstrate how explainable segmentation supports more effective resource allocation, personalized care plans, and ethical oversight (e.g., bias detection) in large-scale health systems. This discussion presents the technical and clinical benefits of implementing XAI-driven segmentation within U.S. healthcare systems to enhance population health outcomes through transparency and trust-building.