Ask HN: Data Catalog with SQL-To-Text. Will it help business users?
1 points by garrrikkotua 1661 days ago
Hi there, my name is Igor.

I am testing different ideas in the data governance space.

Sometimes business users have trouble trusting data: a dashboard looks strange and they don't know whether it's a technical problem or a business one. They want to understand how a particular metric is calculated. Usually these metrics are calculated in SQL, so I think SQL-To-Text might be useful here for automatically generating descriptions and explanations of dashboards, reports, etc.
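A minimal sketch of the idea, assuming the metric is a single-table aggregate query: a template-based SQL-To-Text generator in Python. The function name `explain_metric_sql`, the regular expression and the wording templates are all hypothetical. A real tool would use a proper SQL parser rather than a regex, and anything outside this narrow shape falls back to asking for a human-written description instead of guessing.

```python
import re

# Hypothetical template: only "SELECT [dims,] AGG([DISTINCT] col) [AS alias]
# FROM table [WHERE cond] [GROUP BY cols]" is recognised.
METRIC_RE = re.compile(
    r"""^\s*SELECT\s+
        (?:\w+(?:\s*,\s*\w+)*\s*,\s*)?           # optional dimension columns
        (?P<func>SUM|COUNT|AVG|MIN|MAX)\s*\(\s*
        (?P<distinct>DISTINCT\s+)?(?P<col>\*|\w+)\s*\)
        (?:\s+AS\s+(?P<alias>\w+))?
        \s+FROM\s+(?P<table>\w+)
        (?:\s+WHERE\s+(?P<where>.+?))?
        (?:\s+GROUP\s+BY\s+(?P<group>\w+(?:\s*,\s*\w+)*))?
        \s*;?\s*$""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)

# Plain-English phrase for each supported aggregate function.
AGG_WORDS = {
    "SUM": "the total of",
    "COUNT": "the number of",
    "AVG": "the average of",
    "MIN": "the minimum of",
    "MAX": "the maximum of",
}


def explain_metric_sql(sql: str) -> str:
    """Turn a simple aggregate query into one English sentence (hypothetical helper)."""
    m = METRIC_RE.match(sql)
    if m is None:
        # Outside the template: better to admit it than to guess.
        return "Too complex for this template; needs a human-written description."
    if m["col"] == "*":
        what = "rows"
    elif m["distinct"]:
        what = f"distinct values of '{m['col']}'"
    else:
        what = f"'{m['col']}'"
    name = f"'{m['alias']}'" if m["alias"] else "This metric"
    text = f"{name} is {AGG_WORDS[m['func'].upper()]} {what} in table '{m['table']}'"
    if m["where"]:
        text += f", considering only rows where {m['where'].strip()}"
    if m["group"]:
        dims = ", ".join(d.strip() for d in m["group"].split(","))
        text += f", computed separately for each {dims}"
    return text + "."


print(explain_metric_sql(
    "SELECT region, SUM(amount) AS revenue FROM orders "
    "WHERE status = 'paid' GROUP BY region"
))
# prints: 'revenue' is the total of 'amount' in table 'orders', considering only rows where status = 'paid', computed separately for each region.
```

Template matching only covers the simplest metrics; a query that starts with a CTE or joins several tables takes the fallback branch.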

Do you think it's a big and important problem worth solving? Will natural language processing (SQL-To-Text) really help here?

Any thoughts are welcome :)


Explainability is always helpful :)

Simple SQL statements are pretty well self-documenting.

Where things get hard to explain is in:

  * complicated JOINs
  * queries that select from views, derived tables, CTEs, etc.
  * poorly named objects (tables, columns)
  * dynamically constructed queries (e.g. really ugly WHERE clauses generated in a web app)
  * queries generated by ORMs
Those are just off the top of my head. I'm sure there are several other sources of pain.
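To make the list concrete, here is a rough, hypothetical pre-check in Python that flags some of these pain points from a query's text before any automatic explanation is attempted. It uses regex heuristics, not a real SQL parser, and rests on stated assumptions: two or more JOINs count as "complicated", one- or two-letter table qualifiers stand in for poorly named objects, and `WHERE 1=1` is taken as a hint of a dynamically built WHERE clause. Views and ORM-generated SQL can't be reliably spotted from the text alone without catalog metadata, so they aren't checked.

```python
import re


def explanation_risks(sql: str) -> list[str]:
    """Flag some pain points from the list above using rough regex heuristics.

    Hypothetical helper: it reads only the SQL text, so views and
    ORM-generated SQL are not detected (that needs catalog metadata).
    """
    text = " ".join(sql.split())  # collapse newlines and repeated spaces
    risks = []
    n_joins = len(re.findall(r"\bJOIN\b", text, re.IGNORECASE))
    if n_joins >= 2:  # assumption: two or more JOINs counts as "complicated"
        risks.append(f"{n_joins} JOINs")
    if re.match(r"WITH\b", text, re.IGNORECASE):
        risks.append("CTE")
    if re.search(r"\b(?:FROM|JOIN)\s*\(\s*SELECT\b", text, re.IGNORECASE):
        risks.append("derived table")
    # Assumption: one- or two-letter qualifiers such as "c." or "t0." stand in
    # for poorly named objects.
    cryptic = sorted(set(re.findall(r"\b([A-Za-z]{1,2}\d*)\.\w+", text)))
    if cryptic:
        risks.append("cryptic aliases: " + ", ".join(cryptic))
    # "WHERE 1=1" is a common trick in code that appends AND-conditions dynamically.
    if re.search(r"\b1\s*=\s*1\b", text):
        risks.append("generated WHERE clause")
    return risks


sql = (
    "WITH paid AS (SELECT * FROM orders WHERE 1=1 AND status = 'paid') "
    "SELECT c.region, SUM(p.amount) FROM paid p "
    "JOIN customers c ON c.id = p.customer_id "
    "JOIN regions r ON r.id = c.region_id GROUP BY c.region"
)
print(explanation_risks(sql))
# prints: ['2 JOINs', 'CTE', 'cryptic aliases: c, p, r', 'generated WHERE clause']
```

A check like this could decide which dashboards get an automatic description and which are routed to a person who knows the data.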

The other big question is - why do you want to devote your time and effort to it? I don't want to discourage you. But I don't see any real path to a marketable product here, particularly not as a standalone product.

Thanks for your comment!

> The other big question is - why do you want to devote your time and effort to it? I don't want to discourage you. But I don't see any real path to a marketable product here, particularly not as a standalone product.

This is a really good question; that's why I am collecting feedback on the idea :)

I've been thinking about building a data catalog product, but there seem to be so many of them out there. So I thought I should somehow make it different (and better); that's why I came up with the SQL-To-Text idea.

You are probably right; I also don't see it as a standalone product. Perhaps it could be a feature in a data catalog / lineage product.

Sorry. My knowledge of the data governance space is almost nil, and my knowledge of data catalog products isn't much better.

I guess my larger point was that explaining SQL statements replicates much of the work that query optimizers do. And even they don't always get it right. It's a hard problem.

If you want a hard problem with potential for a marketable product, look into data cleaning.

Ideally, I need a very narrow idea and use case to start with, because building a platform with many features is definitely not the right thing for a startup.

Maybe you have some suggestions on where I should dig in the data governance space?

Would really appreciate it :)