by citilife 1797 days ago
Telm.ai (YC S21) - Real-time data quality monitoring

Looks interesting! I worked on https://github.com/capitalone/DataProfiler

We are looking to monitor correlation changes over time, detect when sensitive data gets entered, track schema changes, and see how all of this affects downstream modeling.
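To make two of those checks concrete, here is a rough sketch of schema-change and correlation-drift detection between two batches. It is plain Python, not the DataProfiler or Telm.ai API; the batch layout (a dict of column name to list of values), the function names, the sample data, and the 0.3 threshold are all assumptions made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / sqrt(var_x * var_y)


def schema_of(batch):
    """Map each column name to the type name of its first value."""
    return {col: type(values[0]).__name__ for col, values in batch.items()}


def schema_changes(old, new):
    """Report columns that were added, removed, or changed type."""
    old_s, new_s = schema_of(old), schema_of(new)
    shared = set(old_s) & set(new_s)
    return {
        "added": sorted(set(new_s) - set(old_s)),
        "removed": sorted(set(old_s) - set(new_s)),
        "retyped": sorted(c for c in shared if old_s[c] != new_s[c]),
    }


def correlation_drift(old, new, col_a, col_b, threshold=0.3):
    """Flag a pair of columns whose correlation moved more than threshold."""
    before = pearson(old[col_a], old[col_b])
    after = pearson(new[col_a], new[col_b])
    return abs(after - before) > threshold, before, after


# Hypothetical batches: the same columns a week apart, plus a new column.
last_week = {"age": [25, 35, 45, 55], "income": [30, 40, 50, 60]}
this_week = {
    "age": [25, 35, 45, 55],
    "income": [60, 50, 40, 30],
    "email": ["a@x.io", "b@x.io", "c@x.io", "d@x.io"],
}

print(schema_changes(last_week, this_week))
print(correlation_drift(last_week, this_week, "age", "income"))
```

A new `email` column showing up under "added" is also a cheap first signal for the sensitive-data case, though real detection would need to inspect the values themselves.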

I'm curious how heavy the input is, because these systems usually take a lot of effort to set up. Any idea?

1 comment

Thanks for your feedback and the link; it's indeed a very nice open source profiler. The complexity of the initial analysis of data in search of anomalies was one of the main drivers for us. Our approach is based on providing an interactive experience through which you can see the impact of various statistical distributions and ML suggestions, narrow down the important criteria, and explore the actual data associated with them. All this helps in building much more accurate models of data correctness to apply to new data, and in doing so in much less time. However, as of now we don't do data classification; it's one of our future topics of interest.