Hepatitis C Detection from Blood Donor Data Using Hybrid Deep Feature Synthesis and Interpretable Machine Learning

Safiul Haque Chowdhury; Md Shafiul Alam Chowdhury; Mohammed Ibrahim Hussain; Mohammed Sowket Ali; Muhammad Minoar Hossain; Mohammad Mamun

dc.contributor.author	Safiul Haque Chowdhury
dc.contributor.author	Md Shafiul Alam Chowdhury
dc.contributor.author	Mohammed Ibrahim Hussain
dc.contributor.author	Mohammed Sowket Ali
dc.contributor.author	Muhammad Minoar Hossain
dc.contributor.author	Mohammad Mamun
dc.date.accessioned	2026-04-29T06:16:50Z
dc.date.issued	2025-09-29
dc.description.abstract	Hepatitis is a severe liver inflammation that can lead to chronic disease, liver failure, and death if untreated. It is caused by viral infections, autoimmune disorders, or excessive alcohol consumption, with viral hepatitis (e.g., Hepatitis B and Hepatitis C) being particularly fatal. Early diagnosis and accurate prognosis are crucial for effective treatment. To address this, we employ advanced computational techniques for hepatitis classification and risk assessment, collecting a dataset of 615 samples with 12 biochemical features (e.g., Albumin, Bilirubin, and Cholesterol) obtained from blood donors. We apply the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance and implement Deep Feature Synthesis (DFS) with Aggregation Primitives (AP), Transformation Primitives (TP), and a Hybrid DFS approach to generate three feature-enhanced datasets. Multiple machine learning (ML) models, including Extreme Gradient Boosting (XGB), Random Forest (RF), Gradient Boosting Decision Trees (GBDT), Categorical Boosting (CB), and Adaptive Boosting (AB), are trained with and without DFS. Performance is evaluated using accuracy, precision, recall, F1-score, and specificity, all derived from confusion matrix (CM) analysis under 10-fold cross-validation. The GBDT model with Hybrid DFS achieves the highest accuracy of 99.49%, along with 99.56% precision, 99.81% recall, 99.69% F1-score, and 99.12% specificity. To enhance interpretability, we apply Explainable Artificial Intelligence (XAI) techniques, namely Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), to analyze feature importance and model behaviour. The proposed Hybrid DFS-GBDT approach demonstrates high effectiveness and interpretability, offering a robust framework for hepatitis diagnosis and prognosis.
dc.identifier.citation	Chowdhury, Safiul Haque, et al. "Hepatitis C detection from blood donor data using hybrid deep feature synthesis and interpretable machine learning." 2025 2nd International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM). IEEE, 2025.
dc.identifier.issn	979-8-3315-5543-6
dc.identifier.uri	http://dspace.uttarauniversity.edu.bd:4000/handle/123456789/1423
dc.language.iso	en_US
dc.publisher	2025 2nd International Conference on Next-Generation Computing, IoT and Machine Learning, NCIM 2025
dc.subject	Hepatitis C Detection
dc.subject	Blood Donor Data Analysis
dc.subject	Hybrid Deep Feature Synthesis
dc.subject	Medical Data Classification
dc.title	Hepatitis C Detection from Blood Donor Data Using Hybrid Deep Feature Synthesis and Interpretable Machine Learning
dc.type	Article
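
The abstract reports accuracy, precision, recall, F1-score, and specificity derived from a confusion matrix. As a minimal sketch of how those five figures follow from the four confusion-matrix counts, the example below uses the standard binary-classification definitions. The class name `ConfusionMatrix`, the function `metrics`, the choice of hepatitis C as the positive class, and the example counts are all hypothetical illustrations and are not taken from the paper. The record does not say whether fold-level counts were pooled or per-fold scores were averaged, so this sketch makes no claim about that step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts (positive class assumed to be hepatitis C)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(cm: ConfusionMatrix) -> dict:
    """Compute the five scores named in the abstract from one confusion matrix."""
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    recall = _ratio(cm.tp, cm.tp + cm.fn)
    return {
        "accuracy": _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
example = ConfusionMatrix(tp=90, fp=10, tn=80, fn=20)
scores = metrics(example)
```

With these illustrative counts, accuracy is 170/200 and specificity is 80/90. Specificity is worth reporting alongside recall in a screening setting because it measures how often healthy donors are correctly cleared.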

Files

Original bundle

Name: Hepatitis C Detection from Blood Donor Data Using Hybrid Deep Feature Synthesis and Interpretable Machine Learning 276.pdf
Size: 98.63 KB
Format: Adobe Portable Document Format

Collections

Journal Articles