by ktrnka
1387 days ago
I once froze a subset of the training data and used it in the test suite for the training pipeline. It was pretty handy as a quick test while refactoring. I don't think I ever got it integrated into our Jenkins pipeline, so the rest of the team didn't use it, and it eventually went stale as the training data changed. We didn't do the "at least as good" thing though. In prior jobs I'd seen too many legitimate situations in which a metric declines even though there's no regression, such as bug fixes in metrics or occasional updates to the test data. Instead we committed the model evaluation to git, and model updates had to be reviewed and approved. I wish we'd done more testing for that pipeline, particularly the parts that fetched and preprocessed data. I think we had a couple of bugs there over the years, or cases of partially missing data.
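To make that concrete, here's a rough sketch of the two ideas: a smoke test on a frozen subset, and an evaluation report written in a stable format so it diffs cleanly in git. Everything in it is made up for illustration (the `FROZEN_SUBSET` rows, the toy majority-label `train`, the `write_eval_report` helper). It's not the pipeline we actually had.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical frozen sample: a few (text, label) rows copied out of the
# training data once and checked in next to the tests.
FROZEN_SUBSET = [
    ("refund please", "billing"),
    ("card was declined", "billing"),
    ("app crashes on start", "bug"),
    ("cannot log in", "bug"),
    ("charge appeared twice", "billing"),
]


def train(rows):
    """Stand-in for the real training pipeline: always predicts the majority label."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda text: majority


def evaluate(model, rows):
    """Return metrics as a plain dict so they serialize cleanly."""
    correct = sum(model(text) == label for text, label in rows)
    return {"accuracy": round(correct / len(rows), 4), "n_examples": len(rows)}


def write_eval_report(metrics, path):
    """Write metrics with a stable key order so git diffs stay readable."""
    Path(path).write_text(json.dumps(metrics, indent=2, sort_keys=True) + "\n")


def test_pipeline_smoke():
    """Checks that the pipeline runs end to end, not that a score improved."""
    model = train(FROZEN_SUBSET)
    metrics = evaluate(model, FROZEN_SUBSET)
    known_labels = {label for _, label in FROZEN_SUBSET}
    assert all(model(text) in known_labels for text, _ in FROZEN_SUBSET)
    assert 0.0 <= metrics["accuracy"] <= 1.0
    return metrics
```

The smoke test deliberately has no "at least as good" threshold. The score only shows up in the committed report, where a reviewer can judge whether a change is a real regression.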
|
> I'd seen too many legitimate situations in which a metric declines even though there's no regression, such as bug fixes in metrics or occasional updates to the test data.
I guess you could keep both the old and new versions of the test data or metrics in parallel for a few model-update cycles to get a feel for how the old and new settings differ.
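Something like this, as a minimal sketch. The two metric versions here are hypothetical (a case-sensitive exact match and a "fixed" one that normalizes case and whitespace). The point is only that every model update gets scored under both for a while.

```python
def accuracy_v1(preds, golds):
    """Hypothetical old metric: case-sensitive exact match."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def accuracy_v2(preds, golds):
    """Hypothetical fixed metric: ignores case and surrounding whitespace."""
    def norm(s):
        return s.strip().lower()
    return sum(norm(p) == norm(g) for p, g in zip(preds, golds)) / len(golds)


# Keep both versions registered until the old one is retired.
METRIC_VERSIONS = {"accuracy_v1": accuracy_v1, "accuracy_v2": accuracy_v2}


def score_all_versions(preds, golds):
    """Score one set of predictions under every registered metric version."""
    return {name: fn(preds, golds) for name, fn in METRIC_VERSIONS.items()}


def compare_updates(old_preds, new_preds, golds):
    """Report old-model score, new-model score, and delta per metric version."""
    old = score_all_versions(old_preds, golds)
    new = score_all_versions(new_preds, golds)
    return {
        name: {
            "old_model": old[name],
            "new_model": new[name],
            "delta": new[name] - old[name],
        }
        for name in METRIC_VERSIONS
    }
```

If the delta looks very different under the two versions, that suggests the metric change is what moved the number, not the model. The same pattern would work for old and new test sets by passing two different `golds` lists.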