Int J Performability Eng, 2023, Vol. 19, Issue (6): 368-378. doi: 10.23940/ijpe.23.06.p2.368378

Demographic and Clinical Factors Role Identification in Stroke Risk and Subtype Prediction

Deepak Kumarᵃ, Chaman Vermaᵇ, Purushottam Sharmaᶜ,*, Deeksha Kumariᵃ, and Zoltán Illésᵇ

  1. ᵃ Apex Institute of Technology, CSE, Chandigarh University, Mohali, Punjab, India;
    ᵇ Department of Media and Educational Informatics, Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary;
    ᶜ Amity School of Engineering & Technology, Amity University, Uttar Pradesh, Noida, India
  • Contact: * E-mail address: puru.mit2002@gmail.com

Abstract: This study analyzed factors associated with stroke risk in a population of 4798 patients. Using k-means clustering, we identified a significant relationship between patient subpopulations and the degree of paralysis in stroke patients. We also developed a machine learning model that uses demographic and clinical factors to predict stroke subtype, achieving an overall accuracy of 86%. The most important determinants for classifying stroke subtype were the patient's neurological condition, consciousness and memory, body mass index (BMI), glucose level, and risk score. To examine the interrelationships among variables, we applied principal component analysis (PCA) to the target attribute of stroke (TOS). PCA revealed five key principal components. Age, cholesterol, glucose, diastolic blood pressure, and the modified Rankin Scale (MRS) strongly influenced PC2, whereas risk score, MRS, systolic blood pressure, nhiss (abbreviation not specified; presumably the National Institutes of Health Stroke Scale, NIHSS), and diastolic blood pressure strongly influenced PC1. In summary, the study links patient subpopulations to paralysis severity, shows that stroke subtype can be predicted with promising accuracy from key demographic and clinical factors, and uses PCA to identify the variables that most strongly influence PC1 and PC2.

Key words: K-means clustering, predictive capability, demographic and clinical factors, significant relationship, PCA
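
Statements such as "these variables strongly influenced PC1" are usually read off the PCA loadings, that is, the weight each standardized variable carries in a principal component. The following is a minimal sketch of that reading, assuming NumPy and using synthetic data, not the study's data. The helper names `pca_loadings` and `top_features` and the feature names are illustrative assumptions; the study's actual pipeline is not described in this abstract.

```python
import numpy as np

def pca_loadings(X, n_components=2):
    """Return (loadings, explained_variance_ratio) for the columns of X.

    loadings has shape (n_features, n_components); column j holds the
    weight of each standardized feature in principal component j + 1.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that features on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the features.
    C = (Z.T @ Z) / (Z.shape[0] - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order] / eigvals.sum()

def top_features(loadings, names, component, k=3):
    """Names of the k features with the largest absolute loading."""
    weights = np.abs(loadings[:, component])
    idx = np.argsort(weights)[::-1][:k]
    return [names[i] for i in idx]

# Synthetic example (hypothetical feature names, NOT the study's data):
# one hidden factor drives three features, a second drives the other two.
rng = np.random.default_rng(0)
n = 500
factor_a = rng.normal(size=n)
factor_b = rng.normal(size=n)
names = ["systolic_bp", "diastolic_bp", "risk_score", "age", "cholesterol"]
X = np.column_stack([
    factor_a + 0.3 * rng.normal(size=n),  # systolic_bp
    factor_a + 0.3 * rng.normal(size=n),  # diastolic_bp
    factor_a + 0.3 * rng.normal(size=n),  # risk_score
    factor_b + 0.3 * rng.normal(size=n),  # age
    factor_b + 0.3 * rng.normal(size=n),  # cholesterol
])

loadings, ratio = pca_loadings(X, n_components=2)
print("explained variance ratio:", np.round(ratio, 3))
print("PC1 is driven mainly by:", top_features(loadings, names, 0, k=3))
print("PC2 is driven mainly by:", top_features(loadings, names, 1, k=2))
```

The sign of a loading is arbitrary, which is why the sketch ranks features by absolute value. Standardizing first matters when variables such as glucose and blood pressure are measured on different scales.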