Drug Classifier
The goal of this project is to predict the type of drug a patient should be prescribed based on various medical attributes using a Decision Tree Classifier. The model is trained and evaluated on a dataset containing patient information such as age, sex, blood pressure levels, cholesterol levels, and sodium-to-potassium ratio.
Project Sections
Dataset
Source: drug200.csv, https://www.kaggle.com/code/caesarmario/drug-classification-w-various-ml-models
Age: Patient age (numerical)
Sex: Gender of the patient (F, M)
BP: Blood Pressure (LOW, NORMAL, HIGH)
Cholesterol: Cholesterol level (NORMAL, HIGH)
Na_to_K: Sodium to Potassium ratio (numerical)
Target: Drug prescribed (drugA, drugB, drugC, drugX, drugY)
Data Preprocessing
Categorical variables (Sex, BP, Cholesterol, Drug) were encoded as numerical values using the pandas replace() method. The feature set X (Age, Sex, BP, Cholesterol, Na_to_K) and the target variable y (Drug) were then defined for training the model.
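The encoding step above can be sketched roughly as follows. The rows are made up for illustration (they are not taken from drug200.csv), and the integer codes are an assumption; the project's actual mapping may differ.

```python
import pandas as pd

# A few made-up rows with the same columns as drug200.csv (illustrative only).
df = pd.DataFrame({
    "Age": [23, 47, 61, 35],
    "Sex": ["F", "M", "M", "F"],
    "BP": ["HIGH", "LOW", "NORMAL", "HIGH"],
    "Cholesterol": ["HIGH", "HIGH", "NORMAL", "NORMAL"],
    "Na_to_K": [25.4, 13.1, 10.1, 7.8],
    "Drug": ["drugY", "drugC", "drugX", "drugA"],
})

# Assumed integer codes; the project's actual mapping may differ.
encoding = {
    "Sex": {"F": 0, "M": 1},
    "BP": {"LOW": 0, "NORMAL": 1, "HIGH": 2},
    "Cholesterol": {"NORMAL": 0, "HIGH": 1},
    "Drug": {"drugA": 0, "drugB": 1, "drugC": 2, "drugX": 3, "drugY": 4},
}

# DataFrame.replace accepts a nested dict: {column: {old_value: new_value}}.
encoded = df.replace(encoding)

# Cast explicitly so the encoded columns are integers regardless of how
# pandas infers the dtype after the replacement.
categorical = list(encoding)
encoded[categorical] = encoded[categorical].astype(int)

# Feature matrix and target vector.
X = encoded[["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]]
y = encoded["Drug"]

print(X)
print(y)
```

Ordinal codes for BP (LOW < NORMAL < HIGH) keep the natural ordering, which a tree can exploit with a single threshold split.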
Modeling Approach
A DecisionTreeClassifier from scikit-learn was used. The primary model used entropy as the splitting criterion and was limited to max_depth=4 to reduce overfitting. 10-fold cross-validation was performed to evaluate model performance.
Model Evaluation
Cross-Validation Scores were printed for each fold. Mean accuracy and standard deviation were calculated to assess the model's consistency. The decision tree was visualized using matplotlib.
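A minimal sketch of the modeling and evaluation steps described above is given here. It runs on synthetic data with an arbitrary labelling rule, so the scores it prints say nothing about the real dataset; the sample size, the rule, and random_state=0 are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features (not real patient data).
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(15, 75, n),   # Age
    rng.integers(0, 2, n),     # Sex (0/1)
    rng.integers(0, 3, n),     # BP (0=LOW, 1=NORMAL, 2=HIGH)
    rng.integers(0, 2, n),     # Cholesterol (0=NORMAL, 1=HIGH)
    rng.uniform(6, 38, n),     # Na_to_K
])
age, bp, chol, na_k = X[:, 0], X[:, 2], X[:, 3], X[:, 4]

# Arbitrary toy rule producing five classes (0..4); first match wins.
y = np.select(
    [na_k > 15, (bp == 2) & (age <= 50), bp == 2, (bp == 0) & (chol == 1)],
    [4, 0, 1, 2],
    default=3,
)

# Entropy criterion and depth limit as described in the text.
# random_state is an assumption, added only for reproducibility.
model = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)

# cv=10 with a classifier uses stratified 10-fold splitting.
scores = cross_val_score(model, X, y, cv=10)

for fold, score in enumerate(scores, start=1):
    print(f"Fold {fold}: {score:.3f}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

A small standard deviation across folds suggests the model behaves consistently on different subsets of the data.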
Hyperparameter Tuning
Criterion Comparison: Evaluated model performance using different split criteria: gini, entropy, and log_loss.
Max Depth Variation: Tested different values of max_depth: None, 3, 5. This helped illustrate the trade-off between underfitting and overfitting.
Decision Tree Visualization
A graphical representation of the trained decision tree was generated to show how features are used to split data and aid in model interpretability.
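One way to produce such a figure is scikit-learn's plot_tree, which draws onto a matplotlib axis. The sketch below assumes that approach, fits on synthetic data, and writes to a hypothetical file name (decision_tree.png); the class codes follow an assumed 0 to 4 encoding of drugA through drugY.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import DecisionTreeClassifier, plot_tree

feature_names = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]
drug_names = ["drugA", "drugB", "drugC", "drugX", "drugY"]  # assumed codes 0..4

# Synthetic stand-in for the encoded features (not real patient data).
rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(15, 75, n),   # Age
    rng.integers(0, 2, n),     # Sex
    rng.integers(0, 3, n),     # BP
    rng.integers(0, 2, n),     # Cholesterol
    rng.uniform(6, 38, n),     # Na_to_K
])
age, bp, chol, na_k = X[:, 0], X[:, 2], X[:, 3], X[:, 4]
y = np.select(
    [na_k > 15, (bp == 2) & (age <= 50), bp == 2, (bp == 0) & (chol == 1)],
    [4, 0, 1, 2],
    default=3,
)

model = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
model.fit(X, y)

fig, ax = plt.subplots(figsize=(14, 8))
annotations = plot_tree(
    model,
    feature_names=feature_names,
    # Map the classes the model actually saw to their drug names.
    class_names=[drug_names[c] for c in model.classes_],
    filled=True,   # colour each node by its majority class
    ax=ax,
)
fig.savefig("decision_tree.png", dpi=150, bbox_inches="tight")
plt.close(fig)
```

Each node in the resulting image shows the split condition, the entropy, the sample count, and the majority class, which makes it possible to trace how a prediction is reached.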
Skills Demonstrated
Data Cleaning & Encoding
Feature Selection
Supervised Learning (Classification)
Model Evaluation using Cross-Validation
Decision Tree Tuning and Visualization
Python (Pandas, NumPy, Matplotlib, Scikit-learn)
Conclusion
This project demonstrates a solid understanding of supervised machine learning, decision trees, and model evaluation. The decision tree classifier performs well on the dataset and can be a useful model for making drug prescription predictions based on patient characteristics.
Technologies Used
Programming Language: Python
Libraries: Pandas, NumPy, Matplotlib, Scikit-learn
Machine Learning Model: Decision Tree classifier
Repository Link
Explore the code and data in the GitHub repository: GitHub - Drug Classification