Ydata-Profiling on Neurocomputing: Top% on Artificial Intelligence Research (sciencedirect.com)
3 points by wisebumblebee 1052 days ago
3 comments

There's been a growing amount of research on data-centric AI, and now there's software dedicated to it. This one is fresh out in Neurocomputing, which is a Q1 journal.

In short, ydata-profiling is a Python tool that generates a detailed report about a dataset, covering missing values, variable distributions, correlations, and data quality alerts.
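To make "data quality alerts" a bit more concrete, here's a rough standard-library sketch of the kind of per-column checks such a report runs. This is not ydata-profiling's API or implementation: `profile_column`, the alert names, and the thresholds are all made up for illustration.

```python
from collections import Counter


def profile_column(values, missing_threshold=0.2, imbalance_threshold=0.8):
    """Summarize one column and flag simple data quality issues.

    Hypothetical helper: thresholds and alert names are assumptions,
    not taken from ydata-profiling.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    missing_fraction = (n - len(present)) / n if n else 0.0

    # Share of the most frequent value among the non-missing entries.
    counts = Counter(present)
    top_share = counts.most_common(1)[0][1] / len(present) if present else 0.0

    alerts = []
    if missing_fraction > missing_threshold:
        alerts.append("high_missing")
    if top_share > imbalance_threshold:
        alerts.append("imbalanced")

    return {
        "missing_fraction": missing_fraction,
        "top_share": top_share,
        "alerts": alerts,
    }


# Toy data: one imbalanced label column, one column with many gaps.
columns = {
    "label": ["ok"] * 9 + ["fraud"],
    "age": [34, None, 51, None, None, 29, 40, None, 38, 45],
}

for name, values in columns.items():
    print(name, profile_column(values))
```

A real profiler goes much further (type inference, correlations, duplicates), but the idea is the same: compute cheap summaries per column and alert when they cross a threshold.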

I work specifically in data quality (imbalanced and missing data) so I've been following the project for a while, but I'm curious whether you make a point of really exploring your data's characteristics beforehand, and how seriously you take these alerts.

Do you think this shift towards a "data-centric" approach is really set to be the "next big paradigm" in AI? It's cool to see it valued, but idk...

This comes in handy because most organizations have no idea whether their data is of good quality or not...
Thanks for sharing the article!

How do you see Data-Centric AI now evolving with LLMs around?

I guess LLMs have huge potential, but they're super dependent on high-quality data, so from that perspective, it's imperative to make sure best practices are followed.

Especially taking into account the new regulations and anti-bias concerns.

It's a bit scary to think of the widespread use of LLMs built on just random, untreated data :/