heart-disease-uci
Heart Disease Classification
Binary Classification with Deep Learning Β· UCI Heart Disease Dataset
I would like to express my sincere gratitude to DoΓ§. Dr. ΓΔr. Γyesi Abdullatif KABAN for his guidance and teaching throughout this deep learning course. The theoretical foundations he provided made it possible to understand and apply the concepts explored in this project.
Overview
This project builds and evaluates a binary classification neural network to predict the presence of heart disease in patients. Using the Heart Disease UCI dataset, it systematically explores different architectural choices β activation functions, optimizers, learning rates, and batch sizes β to understand their real-world impact on model performance.
Academic context: Mid-term project for a Deep Learning course.
Repository Structure
heart-disease-uci/
βββ heart_disease_classification.ipynb # Main notebook (Turkish)
βββ heart.csv # Dataset (UCI Heart Disease)
βββ README.md
Dataset
| Property | Value |
|---|---|
| Source | Kaggle β Heart Disease UCI |
| Samples | 1,025 patients |
| Features | 13 clinical attributes |
| Target | Binary β 1 = heart disease present, 0 = absent |
Features include: age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting ECG results, max heart rate, exercise-induced angina, ST depression, slope of ST segment, number of major vessels, and thalassemia type.
Reference: Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64(5), 304β310.
Model Architecture
The neural network was designed to balance capacity and generalization for a 13-feature tabular dataset. The choice of 3 hidden layers is grounded in the bias-variance trade-off: too few layers leads to underfitting (high bias), while too many layers risks memorizing the training data (high variance). Three layers sit at a reasonable middle ground for this dataset size.
| Layer | Neurons | Activation |
|---|---|---|
| Input | 13 | β |
| Hidden 1 | 64 | ReLU |
| Hidden 2 | 32 | ReLU |
| Hidden 3 | 16 | ReLU |
| Output | 1 | Sigmoid |
- Loss: Binary Cross-Entropy
- Optimizer: Adam
- Regularization: Dropout after each hidden layer
Alternative architectures (1 hidden layer and 5 hidden layers) were also tested to confirm that 3 layers is a reasonable choice for this dataset.
Experiments Conducted
Activation Function Comparison
| Activation | Test Accuracy |
|---|---|
| Sigmoid (hidden layers) | ~83% |
| Tanh | ~95% |
| ReLU | ~97% |
| Leaky ReLU | ~97%+ |
Sigmoid suffers from the vanishing gradient problem in hidden layers β gradients shrink layer by layer and the model struggles to learn. Tanh improves on this but still saturates at extreme values. ReLU avoids saturation entirely for positive values, which is why it performs best here. Leaky ReLU addresses the dying ReLU issue by keeping a small slope for negative inputs.
Optimizer Comparison
| Optimizer | Test Accuracy |
|---|---|
| Adam | ~98.5% |
| SGD | ~97.5% |
| RMSProp | ~97% |
| Adagrad | ~89.7% |
Adam combines the momentum of SGD with the adaptive per-parameter learning rates of RMSProp, making it the most stable and fastest-converging choice here.
Learning Rate Analysis
| Learning Rate | Outcome |
|---|---|
| 0.1 | Model failed to learn (~51% accuracy) |
| 0.01 | 100% test accuracy (likely small-dataset variance) |
| 0.001 | Most stable and generalizable result |
| 0.0001 | Very slow convergence |
Batch Size Analysis
Batch sizes of 16, 32, 64, and 128 were tested. Smaller batches lead to slower but more precise updates; larger batches are faster but may generalize less well.
Results and Analysis
Best-performing configuration:
| Setting | Value |
|---|---|
| Activation | ReLU |
| Optimizer | Adam |
| Learning Rate | 0.001 |
| Loss Function | Binary Cross-Entropy |
| Test Accuracy | ~97% |
Overfitting Analysis
The ~97-98% accuracy figures might raise questions about overfitting. To investigate, training and validation loss curves were examined throughout training. In an overfit model, training loss keeps decreasing while validation loss starts rising, creating a visible gap. This pattern was not observed here β both curves decrease together and stay close, suggesting the model is generalizing rather than memorizing. The relatively balanced class distribution (~51%/49%) and the use of Dropout also contribute to this.
Precision and Recall
Both precision and recall came out high in the classification report. In a medical context, false negatives (predicting a sick patient as healthy) are more dangerous than false positives, so high recall is especially important. High precision means the model is not raising unnecessary alarms either, which matters for usability in a clinical setting.
Key Takeaways
- ReLU decisively outperformed Sigmoid in hidden layers (97% vs. 83%), confirming the theoretical advantage around vanishing gradients.
- Adam optimizer converged fastest and most reliably across all experiments.
- A learning rate of 0.001 was the sweet spot β 0.1 caused divergence and 0.0001 was too slow.
- Dropout was essential: without it, a 1,025-sample dataset is small enough to cause overfitting.
- Seeing theory match experiment in practice β for example, the sigmoid vanishing gradient actually hurting accuracy β made the learning stick in a way that theory alone does not.
Getting Started
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install tensorflow scikit-learn pandas numpy matplotlib seaborn jupyter
jupyter notebook heart_disease_classification.ipynb
License
This project is open-sourced under the MIT License β feel free to use, adapt, and build upon it.
Made with care Β· Deep Learning Course Project