Data Science: Parkinson’s Disease with XGBoost
Parkinson’s disease is a progressive disorder of the central nervous system that affects movement and causes tremors and stiffness. It is a neurodegenerative disorder that damages dopamine-producing neurons in the brain. It progresses through five stages and affects more than 1 million people every year in India. The disease is chronic and has no cure yet.
What is XGBoost?
XGBoost is a machine learning library designed with speed and performance in mind. XGBoost stands for eXtreme Gradient Boosting and is based on decision trees. In this project, we will import XGBClassifier from the xgboost library; this class implements the scikit-learn API for XGBoost classification.
Let's code it:
First, import the libraries needed in the next steps.
We use XGBClassifier to build the model.
Load the dataset with the pandas library.
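As a rough sketch of the loading step, the snippet below reads CSV data into a DataFrame. The post does not name the file, so a tiny in-memory CSV stands in for it; the column names feature_a and feature_b and the sample values are made up for illustration, and only the status column comes from the text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file. The feature column names and
# values are assumptions; only "status" is named in the post.
csv_text = """name,feature_a,feature_b,status
s01,119.9,0.0078,1
s02,122.4,0.0097,1
s03,197.1,0.0029,0
"""

# With a real file on disk, pass its path to pd.read_csv instead of a buffer.
data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)
```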
Get the first 5 rows with the head() method.
Get the last 5 rows with the tail() method.
Get more information about the dataset, such as its column types and non-null counts.
data.describe() gives summary statistics for the dataset, such as the count, mean, standard deviation, minimum, and maximum of each numeric column.
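The inspection steps above might look like the following. This is a minimal sketch on a small synthetic DataFrame, not the real dataset; the feature columns are hypothetical, and using data.info() for the "more information" step is an assumption, since the post does not name the call.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): two made-up features plus "status".
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "feature_a": rng.normal(150, 30, size=8),
    "feature_b": rng.uniform(0, 0.01, size=8),
    "status": [1, 1, 0, 1, 0, 1, 1, 0],
})

print(data.head())   # first 5 rows
print(data.tail())   # last 5 rows
data.info()          # column dtypes and non-null counts (prints directly)

summary = data.describe()  # count, mean, std, min, quartiles, max
print(summary)
```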
We need to divide the dataset into features and labels. The features are all columns except status, and the labels are the status column only.
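One way to express this split with pandas is sketched below. The DataFrame is a hypothetical stand-in with made-up feature columns. If the real dataset contains a non-numeric identifier column, it would also need to be dropped from the features before scaling.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset (feature names are assumptions).
data = pd.DataFrame({
    "feature_a": [119.9, 122.4, 197.1, 202.3],
    "feature_b": [0.0078, 0.0097, 0.0029, 0.0034],
    "status": [1, 1, 0, 0],
})

# Features: every column except "status". Labels: the "status" column only.
features = data.drop(columns=["status"]).values
labels = data["status"].values

print(features.shape, labels.shape)
```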
MinMaxScaler is a data preprocessing technique commonly used in machine learning to scale and normalize the features of a dataset. It transforms each feature into a specific range, by default 0 to 1, although it can be configured to scale the data to any desired range.
We then form x and y: x is the result of applying the scaler's fit_transform to the features, and y is simply the labels.
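To make the scaling concrete, here is a small sketch using scikit-learn's MinMaxScaler with its default range of 0 to 1. The input values are invented for illustration. For each column, the scaler maps the minimum to 0 and the maximum to 1 using (value - min) / (max - min).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrix and labels, for illustration only.
features = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
])
labels = np.array([1, 0, 1])

scaler = MinMaxScaler()              # default feature_range is (0, 1)
x = scaler.fit_transform(features)   # learn per-column min/max, then scale
y = labels                           # labels are left unchanged

print(x)
```

A different target range can be requested through the feature_range argument, for example MinMaxScaler(feature_range=(-1, 1)).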
Now we split the data into x_train, x_test, y_train, and y_test using train_test_split, passing it the x values, the y values, test_size, and random_state.
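The split can be sketched as follows. The post does not state which test_size and random_state it used, so the values 0.2 and 7 below are assumptions, and the arrays are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholders for the scaled features and the labels.
x = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0, 1] * 10)

# test_size=0.2 holds out 20% of the rows; random_state makes the shuffle
# reproducible. Both values are assumptions, not taken from the post.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=7
)

print(x_train.shape, x_test.shape)
```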
Finally, we create an XGBClassifier to build the model and fit it on x_train and y_train.
Next, we generate predictions by calling model.predict on x_test and print the accuracy.
Comparing the predictions with the true test labels, we got an accuracy score of 94.87%.
Thanks for reading,
Mohammed Muqafamuddin.