Hacker News
by citilife 1712 days ago
You could build a similar system with a library my team built: https://github.com/capitalone/dataprofiler

Effectively, you can monitor changes between profiles:

  import json
  import dataprofiler as dp

  data1 = dp.Data("file_a.csv")  # Load a CSV file
  profile1 = dp.Profiler(data1)  # Generate a profile

  data2 = dp.Data("file_b.csv")  # Load another CSV file
  profile2 = dp.Profiler(data2)  # Generate another profile

  diff_report = profile1.diff(profile2)  # Compare the two profiles
  print(json.dumps(diff_report, indent=4))

Our system also generates reports, so it might be worth adding it, OP.


What does this have to do with model monitoring?
You can pass the model's output to the profiling system to monitor whether its predictions are drifting.
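A library-independent sketch of the idea: compare the distribution of a model's scores in a baseline window against a recent window. It uses the Population Stability Index (PSI), a common drift metric swapped in here for illustration; it is not how DataProfiler computes its diff. The function name `psi`, the 10 equal-width bins, the epsilon floor, and the sample scores are all assumptions.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two non-empty samples of model scores.

    Equal-width bins span the baseline's range; values outside it are clamped
    into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero on a constant baseline
    eps = 1e-6  # floor for empty bins so log() stays finite

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(current)
    # Each term is >= 0, so PSI is 0 only when the binned distributions match.
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical scores: last week's predictions vs. this week's.
last_week = [i / 100 for i in range(100)]
this_week = [0.9] * 100
print(psi(last_week, last_week))  # identical samples
print(psi(last_week, this_week))  # scores collapsed into one bin
```

A commonly quoted rule of thumb treats PSI below roughly 0.1 as stable and above roughly 0.25 as a significant shift, but the threshold is worth tuning per model.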

It's also possible to monitor the input data and link any drift there back to changes in the model's output.
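Monitoring inputs can be sketched the same way, one column at a time. This plain-Python version (not DataProfiler's API) uses the two-sample Kolmogorov-Smirnov statistic, a different standard drift measure chosen here for illustration, and flags columns that moved past a threshold. The function names, the 0.2 cut-off, and the example columns are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of a and b (0 = identical, 1 = disjoint)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drifted_inputs(
    baseline: Dict[str, List[float]],
    current: Dict[str, List[float]],
    threshold: float = 0.2,  # hypothetical cut-off; tune per feature in practice
) -> Dict[str, float]:
    """KS statistic for each input column whose distribution moved past threshold."""
    scores = {col: ks_statistic(baseline[col], current[col]) for col in baseline}
    return {col: s for col, s in scores.items() if s > threshold}


# Hypothetical input columns: "age" shifts upward, "score" does not.
baseline = {"age": list(range(20, 60)), "score": [i / 40 for i in range(40)]}
current = {"age": list(range(40, 80)), "score": [i / 40 for i in range(40)]}
print(drifted_inputs(baseline, current))
```

Flagged columns are the candidates to "link back": if the model's output drifted in the same window, these inputs are the first place to look.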

There are quite a few ways to do this, but effectively you can monitor drift by identifying which inputs have the greatest impact on accuracy, then tying that back to predict drift over time.
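One way to read "identifying which inputs have the greatest impact on accuracy" is permutation importance: shuffle one column and measure how much accuracy drops. This plain-Python sketch (not part of DataProfiler) does that, then ranks inputs by their drift score weighted by importance. The toy model, the `drift` scores, and every function name here are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[Row], int]


def accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int]) -> float:
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(
    model: Model, rows: List[Row], labels: List[int], seed: int = 0
) -> Dict[str, float]:
    """Accuracy lost when one input column is shuffled; larger means the model leans on it more."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importance = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        rng.shuffle(values)
        permuted = [{**r, col: v} for r, v in zip(rows, values)]
        importance[col] = max(base - accuracy(model, permuted, labels), 0.0)
    return importance


def rank_by_weighted_drift(importance: Dict[str, float], drift: Dict[str, float]) -> List[str]:
    """Order inputs by drift score times importance, most worrying first."""
    return sorted(importance, key=lambda c: importance[c] * drift.get(c, 0.0), reverse=True)


def model(r: Row) -> int:
    # Hypothetical model that only looks at "x"; "noise" is ignored.
    return int(r["x"] > 0.5)


rows = [{"x": i / 100, "noise": (i * 37 % 100) / 100} for i in range(100)]
labels = [model(r) for r in rows]
importance = permutation_importance(model, rows, labels)
# Per-input drift scores, e.g. from a KS or PSI check (made-up values here).
drift = {"x": 0.1, "noise": 0.9}
print(importance)
print(rank_by_weighted_drift(importance, drift))
```

Here `noise` drifts far more than `x`, but because the model ignores it, it ranks last. Forecasting drift over time, the final step mentioned above, is not covered by this sketch.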