Posts

Model Tuning Using Pipeline and GridSearchCV

December 23, 2021

After selecting an initial model and obtaining baseline results, I often try to improve its run time and target metrics by tuning its features and parameters. Finding the number of features and the parameter set that give the best results on the training data can become tedious. For example, the Random Forest estimator lets me set parameters such as the number of trees, the maximum tree depth, and the minimum number of samples for a leaf node. Luckily, scikit-learn’s
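As a rough illustration of the kind of tuning this post covers, here is a minimal sketch that combines the Pipeline and GridSearchCV named in the title. The synthetic data, the SelectKBest feature-selection step, the step names ("select", "forest"), and every grid value are assumptions chosen for illustration, not taken from the post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

# Chain feature selection and the estimator so both are tuned together
# and the selector is refit inside every cross-validation fold.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Keys follow scikit-learn's "<step name>__<parameter>" convention.
# The candidate values are hypothetical and kept small so the search is quick.
param_grid = {
    "select__k": [4, 8],                 # number of features to keep
    "forest__n_estimators": [20, 50],    # number of trees
    "forest__max_depth": [3, None],      # maximum tree depth
    "forest__min_samples_leaf": [1, 5],  # minimum samples for a leaf node
}

# Try every combination with 3-fold cross-validation, scored by accuracy.
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params)
print(round(best_score, 3))
```

Because the score reported in `best_score_` is the mean over the held-out folds, it is a better guide than accuracy measured on the data the model was fit on. With the default `refit=True`, `search.best_estimator_` is the whole pipeline refit on all of `X` and `y` using the best combination.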

Methods to Preprocess Your Data

October 10, 2021

Before building my model, I explore and preprocess my data to make sure the information I feed into the model is valid. This step is just as important as building the model itself, yet it is often overlooked. In this post, I discuss a couple of tools that I use to preprocess data.