Soham Joshi
- Dec 11, 2022
- 2 min read

Reflection on Heart Attack Risk Predictor Project

The project was an enriching experience, as it gave me a basic foundation in machine learning algorithms and various prediction models.

This is a very basic ML project: it reads data from a .csv file using the pandas library and plots heat maps and various histograms. The scikit-learn library then scales the features from the pandas object using its standard scaler (StandardScaler).
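The loading and scaling steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the inline CSV rows are made up, and the column names are borrowed from the field list further down the post.

```python
import io

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny inline stand-in for the Kaggle .csv file (values are made up).
csv_text = """age,trtbps,chol,thalach,target
63,145,233,150,1
37,130,250,187,1
57,120,354,163,0
67,160,286,108,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Correlation matrix: this is the table a heat map would visualise,
# for example with seaborn.heatmap(corr, annot=True).
corr = df.corr()

# Standardise the feature columns to zero mean and unit variance.
features = df.drop(columns=["target"])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(features)

print(corr.round(2))
print(X_scaled.mean(axis=0).round(6))
```

With the real data, `pd.read_csv` would be pointed at the downloaded file path instead of an in-memory string.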

The scaled data is then passed to several ML prediction models:

Logistic Regression
Decision Tree
Random Forest
K Nearest Neighbour
SVM
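A comparison of these five models could look something like the sketch below. It assumes scikit-learn's default hyperparameters and uses synthetic data from `make_classification` in place of the heart dataset, so the accuracies it prints say nothing about the real project results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart data: 300 rows, 10 features, binary target.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K Nearest Neighbour": KNeighborsClassifier(),
    "SVM": SVC(),
}

# Train each model and record its accuracy on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```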

XGBoost is often reported to be the best-performing model, but I deliberately did not use it and used the models listed above instead, in order to measure their prediction accuracy. The results can be serialized using a pickle object. For deployment, a REST API-based Flask app gives users the flexibility to upload patient data.

The uploaded values are plotted and checked to see which risk category the new patient falls under.

If the calculated output is 0, there is no risk of heart attack.

If the calculated output is greater than 0, there is a moderate to high risk of heart attack.
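As a small illustration, that rule can be written as a helper function. The name `risk_label` is hypothetical and not taken from the project code.

```python
def risk_label(output: int) -> str:
    """Map a model output to the risk wording used in this post."""
    if output < 0:
        raise ValueError("output must be 0 or greater")
    return "no risk" if output == 0 else "moderate to high risk"


print(risk_label(0))
print(risk_label(1))
```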

Project Phase 1

During the first phase, I researched which health care data needed to be collected.

1.a

Data Analysis :

A Kaggle dataset was obtained in .csv format, containing the following information:

Age : Age of the patient

Sex : Sex of the patient

exang: exercise induced angina (1 = yes; 0 = no)

ca: number of major vessels (0-3)

cp : chest pain type

Value 0: typical angina

Value 1: atypical angina

Value 2: non-anginal pain

Value 3: asymptomatic

trtbps : resting blood pressure (in mm Hg)

chol : cholesterol in mg/dl fetched via BMI sensor

fbs : (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)

rest_ecg : resting electrocardiographic results

Value 0: normal

Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)

Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria

thalach : maximum heart rate achieved

target : 0 = less chance of heart attack; 1 = more chance of heart attack

1.b

Feature Engineering

In this stage, the necessary libraries were downloaded and installed in a conda virtual environment. Jupyter Notebook was used as the local editor. pandas was used to read the .csv data file, and seaborn, a wrapper around matplotlib, to plot the histograms. The scikit-learn library provided the various ML prediction models. NumPy was used to convert the data read by pandas into arrays that could be fed to the prediction models.

Project Phase 2

Once the NumPy arrays are fed to the regression-based prediction models, each model has to be trained by fitting it to X_train and y_train. The y_pred object then holds the predictions, indexed in the same order as the input rows they were computed for from the .csv data file.
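A rough sketch of this phase is shown below. The data is generated on the spot (including a made-up rule for the target), and the 75/25 split and `max_iter` value are assumptions rather than the settings used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the .csv data; only two features for brevity.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(30, 80, size=100),
    "thalach": rng.integers(90, 200, size=100),
})
# Hypothetical rule that generates a target so the model has something to learn.
df["target"] = (df["thalach"] > 150).astype(int)

# Convert the pandas objects to NumPy arrays.
X = df.drop(columns=["target"]).to_numpy()
y = df["target"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# y_pred[i] is the prediction for row i of X_test.
y_pred = model.predict(X_test)
print(y_pred[:5], y_test[:5])
```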

The y_pred variable then needs to be pickled (serialized).
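Serialization with Python's `pickle` module might look like this. The sketch pickles the predictions as described, and also the fitted model itself; persisting the model is a common practice but an assumption here, not something the project is confirmed to do. The training data is made up.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set: one feature, binary target.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_train)

# Serialise to bytes; pickle.dump(obj, open_file) would write to disk instead.
pred_bytes = pickle.dumps(y_pred)
model_bytes = pickle.dumps(model)

# Restore both objects and check that nothing changed in the round trip.
restored_pred = pickle.loads(pred_bytes)
restored_model = pickle.loads(model_bytes)
print(np.array_equal(restored_pred, y_pred))
print(np.array_equal(restored_model.predict(X_train), y_pred))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.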

Final Phase

The accuracy of the various models was compared, and logistic regression was found to perform best. The deployment used the predictions made by the logistic regression model, which were saved in pickle format for use by the Flask web app.
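To give an idea of the deployment side, here is a minimal Flask sketch. The `/predict` route, the JSON payload shape, and the single-feature model are all hypothetical; the real app may accept uploads in a different form. The model is trained on made-up numbers and round-tripped through pickle to mimic loading a saved file at start-up.

```python
import pickle

import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression

# Stand-in for the pickled model: fitted on made-up data, then passed
# through pickle the way a saved file would be loaded.
X = np.array([[120.0], [135.0], [150.0], [165.0], [180.0], [195.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = pickle.loads(pickle.dumps(LogisticRegression(max_iter=1000).fit(X, y)))

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical endpoint name
def predict():
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    # Expect exactly one numeric feature, matching the toy model above.
    valid = (
        isinstance(features, list)
        and len(features) == 1
        and all(isinstance(v, (int, float)) for v in features)
    )
    if not valid:
        return jsonify(error="expected JSON like {'features': [value]}"), 400
    output = int(model.predict(np.array([features], dtype=float))[0])
    return jsonify(output=output)


# Exercise the endpoint with Flask's test client, without starting a server.
with app.test_client() as client:
    response = client.post("/predict", json={"features": [190.0]})
    bad_response = client.post("/predict", json={})
    print(response.status_code, response.get_json())
    print(bad_response.status_code)
```

To serve real requests, `app.run()` or a WSGI server would be started instead of using the test client.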