by nprateem 883 days ago
I know I'm replying to a troll comment, but:

> “I have data and I know SQL. What is it about your database that makes retrieving it better?”

Because my data comes from a variety of unstructured, possibly dirty sources which need cleaning and transforming before they can be made sense of.

2 comments

> Because my data comes from a variety of unstructured, possibly dirty sources which need cleaning and transforming before they can be made sense of.

Seattle Data Guy had a great end-of-year top 10 memes post recently, and one of them went like this:

> oh cool you’ve hired a data scientist. so you have a collection of reliable and easy to query data sources, right?

> …

> you do have a collection of reliable and easy to query data sources, right?

---

Like, most of the time in businesses… if the data can’t be queried with SQL then it’s not ready to be used by the rest of the business. Whether that’s for dashboards, monitoring, downstream analytics or reporting. Data engineers do the dirty data cleaning. Data scientists do the actual science.

That’s what I took from the parent at least.

YMMV, obviously, depending on your domain. ML is a good example, where things like end-to-end speech-to-text operate on wav files directly.

That's true. With dbt (= SQL + Jinja templating in an opinionated framework), a large SQL codebase actually becomes maintainable. Whenever possible, I load my raw data into an OLAP table (Snowflake, BigQuery) and do all the transforms there. At least for JSON data, that works really well. Combine it with dbt tests and you're safe.

See https://www.getdbt.com/
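
For anyone who hasn't seen the pattern, here is a rough sketch of "land the raw JSON, transform in SQL, then test". It is not dbt itself: SQLite (via Python's sqlite3, assuming the build includes the JSON functions) stands in for Snowflake/BigQuery, and the table, view, and field names (raw_events, stg_events, id, user.email, amount) are made up for illustration. The checks follow the same convention dbt tests use: a test is a query that selects the bad rows, and it passes when nothing comes back.

```python
import json
import sqlite3

# Hypothetical raw records, as they might arrive from a messy source.
RAW_EVENTS = [
    {"id": 1, "user": {"email": " Alice@Example.com "}, "amount": "12.50"},
    {"id": 2, "user": {"email": "bob@example.com"}, "amount": "3"},
    {"id": 3, "user": {}, "amount": None},
]


def build_warehouse(records):
    """Load raw JSON untouched, then define the cleanup as a SQL view."""
    conn = sqlite3.connect(":memory:")

    # Step 1: land the raw payloads as-is, one JSON document per row.
    conn.execute("CREATE TABLE raw_events (payload TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(json.dumps(r),) for r in records],
    )

    # Step 2: do the cleaning in SQL (roughly what a staging model would hold).
    conn.execute(
        """
        CREATE VIEW stg_events AS
        SELECT
            json_extract(payload, '$.id')                      AS event_id,
            lower(trim(json_extract(payload, '$.user.email'))) AS email,
            CAST(json_extract(payload, '$.amount') AS REAL)    AS amount
        FROM raw_events
        """
    )
    return conn


# Step 3: each test is a query that returns the rows violating an expectation.
TESTS = {
    "event_id_not_null": "SELECT * FROM stg_events WHERE event_id IS NULL",
    "event_id_unique": (
        "SELECT event_id FROM stg_events "
        "GROUP BY event_id HAVING count(*) > 1"
    ),
    "email_not_null": "SELECT * FROM stg_events WHERE email IS NULL",
}


def run_tests(conn):
    """Return {test_name: number_of_failing_rows}; 0 means the test passed."""
    return {
        name: len(conn.execute(sql).fetchall()) for name, sql in TESTS.items()
    }


if __name__ == "__main__":
    conn = build_warehouse(RAW_EVENTS)
    for name, failures in run_tests(conn).items():
        print(f"{name}: {'PASS' if failures == 0 else f'FAIL ({failures} rows)'}")
```

With the sample data above, the two event_id tests pass and email_not_null flags the third record, which is the point: the dirty row gets caught in the warehouse before it reaches a dashboard.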

It's amazing that you think I'm trolling! The #1 way to get more customers for something as extreme as a new database is to use the tool that potential customers already know and have integrated into their systems. That's SQL. The same logic applies to any new paradigm.

Ignore that statement, and fight the uphill battle.