Hepatitis C Detection from Blood Donor Data Using Hybrid Deep Feature Synthesis and Interpretable Machine Learning

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Safiul Haque Chowdhury | |
| dc.contributor.author | Md Shafiul Alam Chowdhury | |
| dc.contributor.author | Mohammed Ibrahim Hussain | |
| dc.contributor.author | Mohammed Sowket Ali | |
| dc.contributor.author | Muhammad Minoar Hossain | |
| dc.contributor.author | Mohammad Mamun | |
| dc.date.accessioned | 2026-04-29T06:16:50Z | |
| dc.date.issued | 2025-09-29 | |
| dc.description.abstract | Hepatitis is a severe inflammation of the liver that can lead to chronic disease, liver failure, and death if untreated. It can be caused by viral infections, autoimmune disorders, or excessive alcohol consumption, and viral hepatitis (e.g., Hepatitis B and Hepatitis C) is particularly deadly. Early diagnosis and accurate prognosis are crucial for effective treatment. To support them, we apply advanced computational techniques for hepatitis classification and risk assessment to a dataset of 615 samples with 12 biochemical features (e.g., Albumin, Bilirubin, and Cholesterol) obtained from blood donors. We apply the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance and implement Deep Feature Synthesis (DFS) with Aggregation Primitives (AP), Transformation Primitives (TP), and a Hybrid DFS approach, producing three feature-enhanced datasets. Multiple machine learning (ML) models, including Extreme Gradient Boosting (XGB), Random Forest (RF), Gradient Boosting Decision Trees (GBDT), Categorical Boosting (CB), and Adaptive Boosting (AB), are trained with and without DFS. Performance is evaluated with 10-fold cross-validation using accuracy, precision, recall, F1-score, and specificity, all derived from Confusion Matrix (CM) analysis. The GBDT model with Hybrid DFS achieves the highest accuracy of 99.49%, along with 99.56% precision, 99.81% recall, 99.69% F1-score, and 99.12% specificity. To enhance interpretability, we apply Explainable Artificial Intelligence (XAI) techniques, namely Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), to analyze feature importance and model behaviour. The proposed Hybrid DFS-GBDT approach demonstrates high effectiveness and interpretability, offering a robust framework for hepatitis diagnosis and prognosis. | |
| dc.identifier.citation | Chowdhury, Safiul Haque, et al. "Hepatitis C detection from blood donor data using hybrid deep feature synthesis and interpretable machine learning." 2025 2nd International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM). IEEE, 2025. | |
| dc.identifier.isbn | 979-8-3315-5543-6 | |
| dc.identifier.uri | http://dspace.uttarauniversity.edu.bd:4000/handle/123456789/1423 | |
| dc.language.iso | en_US | |
| dc.publisher | 2025 2nd International Conference on Next-Generation Computing, IoT and Machine Learning, NCIM 2025 | |
| dc.subject | Hepatitis C Detection | |
| dc.subject | Blood Donor Data Analysis | |
| dc.subject | Hybrid Deep Feature Synthesis | |
| dc.subject | Medical Data Classification | |
| dc.title | Hepatitis C Detection from Blood Donor Data Using Hybrid Deep Feature Synthesis and Interpretable Machine Learning | |
| dc.type | Article | |
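
The abstract reports five metrics derived from confusion-matrix counts. The sketch below shows how they are conventionally computed for a binary task. It assumes a two-class framing with hepatitis as the positive class, which the abstract does not state explicitly, and `BinaryCounts`, `confusion_metrics`, and `counts_from_labels` are hypothetical helpers for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BinaryCounts:
    """Counts from a 2x2 confusion matrix (positive class = hepatitis, by assumption)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero (e.g. no predicted positives).
    return num / den if den else 0.0


def confusion_metrics(c: BinaryCounts) -> Dict[str, float]:
    """Derive accuracy, precision, recall, F1-score and specificity from confusion counts."""
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)        # sensitivity / true-positive rate
    specificity = _safe_div(c.tn, c.tn + c.fp)   # true-negative rate
    f1 = _safe_div(2 * precision * recall, precision + recall)  # harmonic mean of P and R
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
    }


def counts_from_labels(y_true: List[int], y_pred: List[int]) -> BinaryCounts:
    """Tally a binary confusion matrix from 0/1 label lists (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return BinaryCounts(tp, fp, tn, fn)
```

For example, `confusion_metrics(counts_from_labels([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))` gives accuracy 0.6, precision, recall and F1-score of 2/3 each, and specificity 0.5. Under cross-validation such values are usually computed per fold and then averaged; the abstract does not say which aggregation was used.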
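
SMOTE balances classes by interpolating between a minority-class sample and one of its nearest minority-class neighbours instead of duplicating rows. The NumPy sketch below illustrates that interpolation step under simple assumptions (Euclidean distance, all features numeric and already scaled); `smote` is a hypothetical function for illustration. The abstract does not say which implementation the authors used; in practice the `SMOTE` class from the imbalanced-learn library is a common choice.

```python
import numpy as np


def smote(minority: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_synthetic SMOTE-style rows from a (n_samples, n_features) minority array.

    Each synthetic row is x_i + u * (x_nn - x_i), where x_nn is one of the k nearest
    minority neighbours of x_i (Euclidean distance) and u is drawn from Uniform[0, 1).
    """
    rng = np.random.default_rng(seed)
    n = minority.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority samples only.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbours

    base = rng.integers(0, n, size=n_synthetic)                     # starting samples
    pick = neighbours[base, rng.integers(0, k, size=n_synthetic)]   # one neighbour each
    gap = rng.random((n_synthetic, 1))                              # interpolation factor
    return minority[base] + gap * (minority[pick] - minority[base])
```

Because each synthetic row lies on the segment between two real minority samples, it stays within the range spanned by the minority class in every feature.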
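
The abstract evaluates the models with 10-fold cross-validation. A minimal stratified splitter in plain Python might look like the sketch below. It assumes stratification by class label is wanted (the abstract does not say whether the folds were stratified), and `stratified_kfold_indices` is a hypothetical helper standing in for library utilities such as scikit-learn's `StratifiedKFold`.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def stratified_kfold_indices(
    labels: Sequence[Hashable], n_splits: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds: List[List[int]] = [[] for _ in range(n_splits)]
    pos = 0
    for idx in by_class.values():
        rng.shuffle(idx)  # randomise order within each class
        for i in idx:
            # Round-robin assignment; pos carries over between classes to balance fold sizes.
            folds[pos % n_splits].append(i)
            pos += 1

    for k in range(n_splits):
        test_set = set(folds[k])
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test_set)
```

When SMOTE is combined with cross-validation, a common design choice is to oversample only the training indices of each fold, so that every test fold contains original samples only and no synthetic row is built from them; the abstract does not specify the order the authors used.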
Files
Original bundle
- Name: Hepatitis C Detection from Blood Donor Data Using Hybrid Deep Feature Synthesis and Interpretable Machine Learning 276.pdf
- Size: 98.63 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed to upon submission