by gidim 1207 days ago
Excited to see more people building in this space. From what we've seen with customers, it's critical to be able to compare what you're seeing in production to what you trained on (rather than to a historical period). That's almost the textbook definition of drift. Do you have a sense of how to approach that?

At Comet.com (disclaimer: I'm the CEO/Co-founder) we provide experiment tracking and artifact management, so we have the training distributions for comparison. I'm always curious what it looks like for a monitoring-only solution.

2 comments

Completely agree! We have also seen our users care more about comparing the real-world distribution against the training data than against the previous month's data (we found the latter is more useful for PMs and for setting alerts).

We currently let users specify their training data in the config used to initialise the UpTrain framework (as a JSON file for now, though we plan to support PyTorch/TF data loaders). In the background, the tool does the binning and clustering needed to convert continuous variables into discrete buckets, then calculates a divergence over those buckets to quantify drift.
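To make the bin-then-compare idea concrete, here is a minimal sketch in plain Python. It is not UpTrain's actual implementation: the function names (`bin_edges`, `histogram`, `drift_score`), the equal-width binning, the smoothing constant, and the choice of KL divergence (production relative to training) are all assumptions picked for illustration.

```python
import math
import random
from bisect import bisect_right


def bin_edges(train, n_bins=10):
    """Equal-width interior bin edges, derived from the training sample only."""
    lo, hi = min(train), max(train)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def histogram(values, edges, eps=1e-6):
    """Smoothed bucket frequencies.

    Values outside the training range land in the first or last bucket.
    The small eps keeps every bucket non-zero so the log below is defined.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same buckets."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def drift_score(train, prod, n_bins=10):
    """Divergence of the production distribution from the training one."""
    edges = bin_edges(train, n_bins)
    return kl_divergence(histogram(prod, edges), histogram(train, edges))


# Synthetic data for illustration only (hypothetical feature values).
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # no drift
shifted = [rng.gauss(1.5, 1.0) for _ in range(5000)]   # mean has moved

print("no drift:", drift_score(train, same))
print("shifted: ", drift_score(train, shifted))
```

The important detail is that the bucket boundaries come from the training data alone, so production traffic is always measured against the same reference. A score near zero means the two samples fill the buckets in similar proportions; the shifted sample should score noticeably higher.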

Thanks for the very relevant comment :) We give users the option to attach their training data from CSV/JSON (we are working on support for loading from cloud storage providers and data lakes). We have illustrated this in some of our examples, such as the human orientation classification: https://github.com/uptrain-ai/uptrain/blob/main/examples/hum...