
heart-disease-uci

Jupyter Notebook

Heart Disease Classification

Binary Classification with Deep Learning · UCI Heart Disease Dataset



I would like to express my sincere gratitude to Doç. Dr. Öğr. Üyesi Abdullatif KABAN for his guidance and teaching throughout this deep learning course. The theoretical foundations he provided made it possible to understand and apply the concepts explored in this project.


Overview

This project builds and evaluates a neural network for binary classification that predicts the presence of heart disease in patients. Using the Heart Disease UCI dataset, it systematically compares architectural and training choices (activation functions, optimizers, learning rates, and batch sizes) to measure their effect on model performance.

Academic context: Mid-term project for a Deep Learning course.


Repository Structure

heart-disease-uci/
├── heart_disease_classification.ipynb   # Main notebook (Turkish)
β”œβ”€β”€ heart.csv                            # Dataset (UCI Heart Disease)
└── README.md

Dataset

| Property | Value |
|---|---|
| Source | Kaggle – Heart Disease UCI |
| Samples | 1,025 patients |
| Features | 13 clinical attributes |
| Target | Binary: 1 = heart disease present, 0 = absent |

Features include: age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting ECG results, max heart rate, exercise-induced angina, ST depression, slope of ST segment, number of major vessels, and thalassemia type.

Reference: Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64(5), 304–310.


Model Architecture

The neural network was designed to balance capacity and generalization for a 13-feature tabular dataset. The choice of 3 hidden layers is grounded in the bias-variance trade-off: too few layers lead to underfitting (high bias), while too many risk memorizing the training data (high variance). Three layers sit at a reasonable middle ground for this dataset size.

| Layer | Neurons | Activation |
|---|---|---|
| Input | 13 | none |
| Hidden 1 | 64 | ReLU |
| Hidden 2 | 32 | ReLU |
| Hidden 3 | 16 | ReLU |
| Output | 1 | Sigmoid |

  • Loss: Binary Cross-Entropy
  • Optimizer: Adam
  • Regularization: Dropout after each hidden layer
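
The table and settings above could be expressed in Keras roughly as follows. This is a minimal sketch, not the notebook's actual code: the names `build_model` and `dense_param_count` are hypothetical, the dropout rate of 0.3 is an assumption (this README does not state the rate), and the default learning rate of 0.001 is taken from the best-performing configuration reported in this README.

```python
INPUT_DIM = 13               # 13 clinical features
HIDDEN_UNITS = (64, 32, 16)  # three hidden layers, as in the table
DROPOUT_RATE = 0.3           # assumption: the README does not state the rate


def dense_param_count(input_dim, hidden_units, output_units=1):
    """Count the weights and biases of the fully connected stack."""
    total = 0
    fan_in = input_dim
    for units in (*hidden_units, output_units):
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units
    return total


def build_model(learning_rate=0.001):
    """Build and compile the 13-64-32-16-1 network (hypothetical helper)."""
    # Imported here so the parameter-count helper works without TensorFlow.
    import tensorflow as tf

    layers = [tf.keras.Input(shape=(INPUT_DIM,))]
    for units in HIDDEN_UNITS:
        layers.append(tf.keras.layers.Dense(units, activation="relu"))
        layers.append(tf.keras.layers.Dropout(DROPOUT_RATE))
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))

    model = tf.keras.Sequential(layers)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


print(dense_param_count(INPUT_DIM, HIDDEN_UNITS))  # 3521
```

With about 3,500 trainable parameters against roughly a thousand samples, the network is small but still has enough capacity to overfit, which is consistent with the use of Dropout.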

Alternative architectures (1 hidden layer and 5 hidden layers) were also tested to confirm that 3 layers is a reasonable choice for this dataset.


Experiments Conducted

Activation Function Comparison

| Activation | Test Accuracy |
|---|---|
| Sigmoid (hidden layers) | ~83% |
| Tanh | ~95% |
| ReLU | ~97% |
| Leaky ReLU | ~97%+ |

Sigmoid suffers from the vanishing gradient problem in hidden layers: its derivative never exceeds 0.25, so gradients shrink layer by layer and the model struggles to learn. Tanh improves on this but still saturates at extreme values. ReLU does not saturate for positive inputs, which likely explains why it outperforms both here. Leaky ReLU additionally addresses the dying ReLU issue by keeping a small slope for negative inputs, and it matched or slightly exceeded ReLU in this run.
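
To make the vanishing gradient argument concrete, the sketch below compares the derivative of each activation and the factor it contributes when chained across three hidden layers. It ignores the weight matrices entirely, so it isolates only the activation's contribution; the helper names and the Leaky ReLU slope of 0.01 are illustrative assumptions, not values from the notebook.

```python
import math


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; peaks at 0.25 when x == 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh; peaks at 1.0 when x == 0."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x, alpha=0.01):
    """Derivative of Leaky ReLU; alpha is an illustrative slope."""
    return 1.0 if x > 0 else alpha


def chained_factor(grad_fn, x, n_layers=3):
    """Activation-derivative factor accumulated across n_layers."""
    return grad_fn(x) ** n_layers


print(chained_factor(sigmoid_grad, 0.0))  # 0.015625, sigmoid's best case
for name, fn in [("sigmoid", sigmoid_grad), ("tanh", tanh_grad),
                 ("relu", relu_grad), ("leaky_relu", leaky_relu_grad)]:
    # Compare a positive and a negative pre-activation of the same size.
    print(name, chained_factor(fn, 2.0), chained_factor(fn, -2.0))
```

Even in its best case, sigmoid scales the gradient by at most 0.25 per layer, while ReLU passes it through unchanged for positive inputs. For negative inputs ReLU's factor is exactly zero, which is the dying ReLU case that Leaky ReLU's small slope avoids.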

Optimizer Comparison

| Optimizer | Test Accuracy |
|---|---|
| Adam | ~98.5% |
| SGD | ~97.5% |
| RMSProp | ~97% |
| Adagrad | ~89.7% |

Adam combines the momentum of SGD with the adaptive per-parameter learning rates of RMSProp, making it the most stable and fastest-converging choice here.
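
The combination described above can be seen in a single scalar Adam update, sketched here in plain Python. The function name `adam_step` is hypothetical, and the default hyperparameters are the ones proposed in the original Adam paper rather than values confirmed from the notebook.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad         # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # RMSProp-style running mean of squares
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Two parameters whose gradients differ by four orders of magnitude.
for grad in (100.0, 0.01):
    new_param, _, _ = adam_step(param=1.0, grad=grad, m=0.0, v=0.0, t=1)
    print(f"grad={grad}: parameter moved by {1.0 - new_param:.6f}")
```

On the first step both parameters move by roughly the learning rate, regardless of gradient scale. That per-parameter normalization is one plausible reason Adam behaves stably without much tuning.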

Learning Rate Analysis

| Learning Rate | Outcome |
|---|---|
| 0.1 | Model failed to learn (~51% accuracy) |
| 0.01 | 100% test accuracy (likely small-dataset variance) |
| 0.001 | Most stable and generalizable result |
| 0.0001 | Very slow convergence |
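
The general step-size trade-off behind this table can be illustrated on a toy problem. The sketch below runs plain gradient descent (not Adam, and not the notebook's network) on a hypothetical one-dimensional quadratic whose curvature is chosen so that 0.1 exceeds the stable step size. It shows only the qualitative pattern of "too large fails, too small crawls" and is not meant to reproduce the accuracies above.

```python
CURVATURE = 25.0  # hypothetical; stable step sizes satisfy lr < 2 / CURVATURE = 0.08


def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = 0.5 * CURVATURE * w**2 with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        grad = CURVATURE * w  # derivative of the quadratic at w
        w -= lr * grad
    return w


final_w = {lr: gradient_descent(lr) for lr in (0.1, 0.01, 0.001, 0.0001)}
for lr, w in final_w.items():
    # The optimum is w = 0, so |w| measures the remaining error.
    print(f"lr={lr}: |w| after 50 steps = {abs(w):.3g}")
```

In this toy setting the largest rate overshoots further on every step, the middle rates shrink the error steadily, and the smallest rate has barely moved after 50 steps.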

Batch Size Analysis

Batch sizes of 16, 32, 64, and 128 were tested. Smaller batches give noisier gradient estimates but more weight updates per epoch, which makes each epoch slower and can help generalization; larger batches run faster per epoch but may generalize less well.
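
One concrete consequence of batch size is the number of weight updates per epoch. The arithmetic below assumes an 80/20 train/test split, which this README does not state, so the training-set size of 820 is illustrative only.

```python
import math

N_SAMPLES = 1025
TRAIN_PERCENT = 80  # assumption: the split ratio is not given in this README
n_train = N_SAMPLES * TRAIN_PERCENT // 100  # 820 training samples

updates_per_epoch = {
    batch_size: math.ceil(n_train / batch_size)  # the last batch may be partial
    for batch_size in (16, 32, 64, 128)
}
print(updates_per_epoch)  # {16: 52, 32: 26, 64: 13, 128: 7}
```

Under this assumption, a batch size of 16 performs about seven times as many updates per epoch as a batch size of 128, each computed from a smaller and therefore noisier sample of the data.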


Results and Analysis

Best-performing configuration:

| Setting | Value |
|---|---|
| Activation | ReLU |
| Optimizer | Adam |
| Learning Rate | 0.001 |
| Loss Function | Binary Cross-Entropy |
| Test Accuracy | ~97% |

Overfitting Analysis

The ~97-98% accuracy figures naturally raise the question of overfitting. To check, training and validation loss curves were compared across epochs. In an overfit model, training loss keeps decreasing while validation loss starts rising, opening a visible gap. That pattern was not observed here: both curves decrease together and stay close, suggesting the model is generalizing rather than memorizing. The use of Dropout likely contributes to this, and the roughly balanced class distribution (~51%/49%) means accuracy is not inflated by a majority class.
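
The visual check described above can also be written as a simple rule over the loss histories. The following is a minimal sketch: the function name `overfitting_onset`, the `patience` threshold, and both sets of loss values are hypothetical and are not taken from the notebook's training runs.

```python
def overfitting_onset(train_loss, val_loss, patience=3):
    """Return the epoch where validation loss bottomed out, or None.

    An epoch is reported only if validation loss then failed to improve
    for at least `patience` epochs while training loss kept falling.
    """
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_after_best = len(val_loss) - 1 - best_epoch
    if epochs_after_best < patience:
        return None  # validation loss was still improving near the end
    train_kept_falling = train_loss[-1] < train_loss[best_epoch]
    return best_epoch if train_kept_falling else None


# Hypothetical curves: both losses fall together, as described in the text.
healthy_train = [0.70, 0.55, 0.45, 0.38, 0.33, 0.30]
healthy_val = [0.72, 0.58, 0.48, 0.41, 0.36, 0.33]

# Hypothetical curves: validation loss turns upward after epoch 2.
overfit_train = [0.70, 0.50, 0.35, 0.25, 0.18, 0.12]
overfit_val = [0.72, 0.55, 0.50, 0.53, 0.58, 0.65]

print(overfitting_onset(healthy_train, healthy_val))  # None
print(overfitting_onset(overfit_train, overfit_val))  # 2
```

This is the same idea that underlies early stopping on validation loss, such as Keras's `EarlyStopping` callback with a `patience` setting.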

Precision and Recall

Both precision and recall came out high in the classification report. In a medical context, false negatives (predicting a sick patient as healthy) are more dangerous than false positives, so high recall is especially important. High precision means the model is not raising unnecessary alarms either, which matters for usability in a clinical setting.
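
For reference, precision and recall follow directly from confusion-matrix counts. The counts below are made up for illustration (a hypothetical 205-patient test set) and are not the notebook's actual results.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of alarms that were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of sick patients caught
    return precision, recall


# Hypothetical counts: 98 true positives, 3 false positives,
# 2 false negatives (sick patients missed), 102 true negatives.
tp, fp, fn, tn = 98, 3, 2, 102
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this made-up example the two false negatives are the costly errors from a clinical point of view, and recall is the metric that tracks them.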


Key Takeaways

  • ReLU decisively outperformed Sigmoid in hidden layers (97% vs. 83%), confirming the theoretical advantage around vanishing gradients.
  • Adam optimizer converged fastest and most reliably across all experiments.
  • A learning rate of 0.001 was the sweet spot: at 0.1 the model failed to learn (~51% accuracy) and at 0.0001 convergence was too slow.
  • Dropout is an important safeguard: with only 1,025 samples, the network could easily overfit without it.
  • Seeing theory match experiment (for example, the sigmoid vanishing gradient actually hurting accuracy) made the learning stick in a way that theory alone does not.

Getting Started

python3 -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install tensorflow scikit-learn pandas numpy matplotlib seaborn jupyter
jupyter notebook heart_disease_classification.ipynb

License

This project is open-sourced under the MIT License. Feel free to use, adapt, and build upon it.


Made with care · Deep Learning Course Project