by nicholast 2001 days ago
The brittleness of mainstream ML to out-of-distribution data is one of the most fundamental sources of error. Very few domains offer a static environment that can be depended on over the long term. If machine learning is to be approached as an engineering discipline, practices will need to be established for validating models throughout their life cycle. One potential resource that can support this kind of systematic evaluation is Automunge, an open source library for assembling data pipelines, which has automatic support for evaluating data property drift in the feature sets that serve as the basis for a model. (Disclosure: I am the founder of Automunge.)
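
To make "data property drift" concrete, here is a rough sketch of the general idea in plain Python. It is not Automunge's API or its method. The function names, the sample data, and the 0.5 threshold are all made up for illustration. It just compares each feature's mean at serving time against the training mean, scaled by the training standard deviation, and flags features that moved a lot.

```python
import math
from statistics import mean, stdev


def mean_shift_score(train, serve):
    """Shift of the serving mean, in units of the training standard deviation.

    Hypothetical helper for illustration; not part of any library.
    """
    sd = stdev(train)
    if sd == 0:
        # Constant training feature: any change in the mean counts as drift.
        return 0.0 if mean(serve) == mean(train) else math.inf
    return abs(mean(serve) - mean(train)) / sd


def drifted_features(train_cols, serve_cols, threshold=0.5):
    """Return the names of features whose score exceeds the threshold.

    The default threshold is an arbitrary assumption, not a recommended value.
    """
    return [
        name
        for name in train_cols
        if mean_shift_score(train_cols[name], serve_cols[name]) > threshold
    ]


# Made-up feature columns: "age" is stable, "income" has shifted upward.
train = {"age": [30, 32, 35, 31, 33], "income": [50, 52, 49, 51, 50]}
serve = {"age": [31, 33, 34, 30, 32], "income": [70, 72, 69, 71, 73]}

flagged = drifted_features(train, serve)
print(flagged)
```

A real check would likely look at more than the mean (variance, missing-value rates, category frequencies) and would run on every batch the model sees, not once.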