Drug Classifier

The goal of this project is to predict the type of drug a patient should be prescribed based on various medical attributes using a Decision Tree Classifier. The model is trained and evaluated on a dataset containing patient information such as age, sex, blood pressure levels, cholesterol levels, and sodium-to-potassium ratio.

Project Sections

Dataset

Source: drug200.csv, https://www.kaggle.com/code/caesarmario/drug-classification-w-various-ml-models

Age: Patient age (numerical)

Sex: Gender of the patient (F, M)

BP: Blood Pressure (LOW, NORMAL, HIGH)

Cholesterol: Cholesterol level (NORMAL, HIGH)

Na_to_K: Sodium to Potassium ratio (numerical)

Target: Drug prescribed (drugA, drugB, drugC, drugX, drugY)

Data Preprocessing

Categorical variables (Sex, BP, Cholesterol, Drug) were encoded into numerical values using replace(). Feature set X and target variable y were defined for training the model.
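The encoding step can be sketched as follows. This is a minimal illustration, not the project's exact code: the three rows are made-up values shaped like drug200.csv, and the integer codes in `encodings` are assumptions, since the actual mapping is not stated here.

```python
import pandas as pd

# Hypothetical rows shaped like drug200.csv (illustrative values, not real data).
df = pd.DataFrame({
    "Age": [23, 47, 61],
    "Sex": ["F", "M", "F"],
    "BP": ["HIGH", "LOW", "NORMAL"],
    "Cholesterol": ["HIGH", "NORMAL", "HIGH"],
    "Na_to_K": [25.355, 13.093, 7.798],
    "Drug": ["drugY", "drugC", "drugX"],
})

# Assumed integer codes; the project's actual mapping may differ.
encodings = {
    "Sex": {"F": 0, "M": 1},
    "BP": {"LOW": 0, "NORMAL": 1, "HIGH": 2},
    "Cholesterol": {"NORMAL": 0, "HIGH": 1},
    "Drug": {"drugA": 0, "drugB": 1, "drugC": 2, "drugX": 3, "drugY": 4},
}

# Replace each category label with its code, then store the column as integers.
for column, mapping in encodings.items():
    df[column] = df[column].replace(mapping).astype(int)

# Feature set and target variable.
X = df[["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]]
y = df["Drug"]
```

An ordered code for BP (LOW < NORMAL < HIGH) lets the tree split on thresholds such as "BP <= 0.5", which is one reason to pick the codes deliberately rather than arbitrarily.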

Modeling Approach

A DecisionTreeClassifier from sklearn was used. The primary model used entropy as the splitting criterion and was limited to max_depth=4 to reduce overfitting. 10-fold cross-validation was performed to evaluate model performance.
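A rough sketch of this setup is shown below. It runs on synthetic stand-in data so that it is self-contained; the random features, the labelling rule, and `random_state=0` are all assumptions made for illustration and are not taken from the project.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features (not the real dataset).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(15, 75, n),    # Age
    rng.integers(0, 2, n),      # Sex
    rng.integers(0, 3, n),      # BP
    rng.integers(0, 2, n),      # Cholesterol
    rng.uniform(6.0, 38.0, n),  # Na_to_K
])
# Hypothetical labelling rule, only so the tree has a pattern to learn.
y = np.where(X[:, 4] > 15, 4, X[:, 2].astype(int))

# Entropy criterion and a depth cap of 4, as described above.
model = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)

# 10-fold cross-validation; accuracy is the default score for classifiers.
scores = cross_val_score(model, X, y, cv=10)
for fold, score in enumerate(scores, start=1):
    print(f"Fold {fold}: {score:.3f}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

With the real data, `X` and `y` would be the encoded feature set and target instead of the synthetic arrays.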

Model Evaluation

Cross-validation scores were printed for each fold, and the mean accuracy and standard deviation were calculated to assess the model's consistency across folds. The decision tree was visualized using matplotlib.

Hyperparameter Tuning

Criterion Comparison: Evaluated model performance using different split criteria: gini, entropy, and log_loss.

Max Depth Variation: Tested different values of max_depth: None, 3, 5. This helped in understanding the trade-off between underfitting and overfitting.

Decision Tree Visualization

A graphical representation of the trained decision tree was generated to show how features are used to split data and aid in model interpretability.
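One way to produce such a plot is scikit-learn's `plot_tree`, sketched below. Whether the project used `plot_tree` specifically is an assumption. The random training data, the output filename `decision_tree.png`, and the order of the class names are also illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier, plot_tree

feature_names = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]
class_names = ["drugA", "drugB", "drugC", "drugX", "drugY"]  # assumed to match codes 0..4

# Random stand-in data, used only to obtain a fitted tree to draw (not real data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, len(feature_names)))
y = rng.integers(0, len(class_names), size=200)

model = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
model.fit(X, y)

# Draw the tree: each node shows its split rule, entropy, sample counts, and majority class.
fig, ax = plt.subplots(figsize=(20, 10))
annotations = plot_tree(
    model,
    feature_names=feature_names,
    class_names=class_names,
    filled=True,
    ax=ax,
)
fig.savefig("decision_tree.png", dpi=150)  # hypothetical output path
```

Setting `filled=True` colors each node by its majority class, which makes it easier to see where each drug ends up in the tree.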

Skills Demonstrated

Data Cleaning & Encoding

Feature Selection

Supervised Learning (Classification)

Model Evaluation using Cross-Validation

Decision Tree Tuning and Visualization

Python (Pandas, NumPy, Matplotlib, Scikit-learn)

Conclusion

This project demonstrates a solid understanding of supervised machine learning, decision trees, and model evaluation. Under 10-fold cross-validation, the decision tree classifier performed well on this dataset, which suggests it can be a useful model for predicting drug prescriptions from patient characteristics.

Technologies Used

Programming Language: Python

Libraries: Pandas, NumPy, Matplotlib, Scikit-learn

Machine Learning Model: Decision Tree classifier

Repository Link

Explore the code and data in the GitHub repository: GitHub - Drug Classification