A few years ago, I grew tired of working with dataset pre-processing scripts and "libraries" for machine learning and began to create my own sort-of "pipelines." Last year, I stumbled upon `sklearn.compose`, a relatively new module within the scikit-learn ecosystem.
I have had a lot of success with this module since then, and wanted to share a tutorial I put together which touches on the idea of managing your machine learning dataset creation steps completely via a configuration.
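
To give a rough idea of what managing preprocessing "via a configuration" can look like, here is a minimal sketch; it is not the tutorial's actual code. The `TRANSFORMERS` registry, the `CONFIG` layout, the `build_column_transformer` helper, and the toy data are all hypothetical names made up for illustration. Only `ColumnTransformer`, `StandardScaler`, and `OneHotEncoder` come from scikit-learn.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical registry mapping names used in the config to transformer classes.
TRANSFORMERS = {
    "standard_scale": StandardScaler,
    "one_hot": OneHotEncoder,
}

# Hypothetical configuration; in practice this could be loaded from YAML or JSON.
CONFIG = [
    {"name": "numeric", "transformer": "standard_scale", "columns": ["age", "income"]},
    {"name": "categorical", "transformer": "one_hot", "columns": ["city"]},
]


def build_column_transformer(config):
    """Turn a list of config entries into a ColumnTransformer."""
    steps = [
        (entry["name"], TRANSFORMERS[entry["transformer"]](), entry["columns"])
        for entry in config
    ]
    # Columns not mentioned in the config are dropped.
    return ColumnTransformer(steps, remainder="drop")


# Toy data, invented purely for this example.
df = pd.DataFrame(
    {
        "age": [25, 32, 47],
        "income": [40000.0, 52000.0, 81000.0],
        "city": ["Austin", "Boston", "Austin"],
    }
)

preprocessor = build_column_transformer(CONFIG)
features = preprocessor.fit_transform(df)
print(features.shape)
```

With this kind of setup, changing which columns get scaled or encoded means editing `CONFIG` rather than the code that builds the dataset.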