Hepatitis C Detection from Blood Donor Data Using Hybrid Deep Feature Synthesis and Interpretable Machine Learning

Safiul Haque Chowdhury; Md Shafiul Alam Chowdhury; Mohammed Ibrahim Hussain; Mohammed Sowket Ali; Muhammad Minoar Hossain; Mohammad Mamun

Date

2025-09-29

Publisher

2025 2nd International Conference on Next-Generation Computing, IoT and Machine Learning, NCIM 2025

Abstract

Hepatitis is a severe inflammation of the liver that can lead to chronic disease, liver failure, and death if left untreated. It is caused by viral infections, autoimmune disorders, or excessive alcohol consumption, with viral hepatitis (e.g., Hepatitis B and Hepatitis C) being especially life-threatening. Early diagnosis and accurate prognosis are crucial for effective treatment. To address this, we apply advanced computational techniques to hepatitis classification and risk assessment on a dataset of 615 samples with 12 biochemical features (e.g., Albumin, Bilirubin, and Cholesterol) obtained from blood donors. We apply the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance and implement Deep Feature Synthesis (DFS) with Aggregation Primitives (AP), with Transformation Primitives (TP), and with a Hybrid approach that combines both, generating three feature-enhanced datasets. Multiple machine learning (ML) models, including Extreme Gradient Boosting (XGB), Random Forest (RF), Gradient Boosting Decision Trees (GBDT), Categorical Boosting (CB), and Adaptive Boosting (AB), are trained with and without DFS. Performance is evaluated using accuracy, precision, recall, F1-score, and specificity, derived from confusion matrix (CM) analysis under 10-fold cross-validation. The GBDT model with Hybrid DFS achieves the highest accuracy of 99.49%, along with 99.56% precision, 99.81% recall, 99.69% F1-score, and 99.12% specificity. To enhance interpretability, we apply Explainable Artificial Intelligence (XAI) techniques, namely Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), to analyze feature importance and model behaviour. The proposed Hybrid DFS-GBDT approach demonstrates high effectiveness and interpretability, offering a robust framework for hepatitis diagnosis and prognosis.
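
The five evaluation metrics named in the abstract all follow from the four counts of a binary confusion matrix. As a minimal sketch of that relationship (not the authors' code), the Python below computes them from true/false positive and negative counts. The function name `confusion_metrics`, the choice of which class counts as "positive", and the example counts are illustrative assumptions and are not taken from the paper.

```python
def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs here).
    """
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _safe_div(tn, tn + fp),
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under k-fold cross-validation, one common convention is to sum the confusion-matrix counts over all folds before computing the metrics; another is to average the per-fold metrics. The abstract does not say which was used.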

Keywords

Hepatitis C Detection, Blood Donor Data Analysis, Hybrid Deep Feature Synthesis, Medical Data Classification

Citation

Chowdhury, Safiul Haque, et al. "Hepatitis C detection from blood donor data using hybrid deep feature synthesis and interpretable machine learning." 2025 2nd International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM). IEEE, 2025.

URI

http://dspace.uttarauniversity.edu.bd:4000/handle/123456789/1423

Collections

Journal Articles
