Hacker News new | ask | show | jobs
by monero-xmr 875 days ago
The correct language for querying data is, as always, SQL. No one cares about the implementation details.

“I have data and I know SQL. What is it about your database that makes retrieving it better?”

Any other paradigm is going to be a niche at best, likely outright fail.

2 comments

Spark is really failing, all right.

SQL lacks type safety, testability, and composability.

It’s crazy to think how old I am now. But give it 20 more years and you’ll come around.
Agree/disagree. I wish (and maybe this is "the answer") that you could take a basic SQL query and "invert it" into composable components.

A common thing I ended up doing for some "small data" hack projects was extremely liberal usage of SQLite: SELECT ... UNION ( SELECT ... ... GROUP BY ... ( UNION ... etc ) ) ... absolutely terrible SQL, but it got the job done to return the 100 or so records I was interested in.

It'd be great if I could write me some SQL then pop it out into: fn_group001, fn_join(g1, g2, cols(c1, c2)), ...etc...

...and then have composable sub-components of what the janky SQL-COBOL syntax supports, but in a group().chain().join(...) style.

I think I keep running across DataLog as something that's recommended, and of course ProLog has some similarities.

Nothing has been compelling enough to warrant jumping off of SQL, but I really do agree with the grandparent comment: SQL (aka: COBOL) is pretty clunky and non-composable in a way that complicates what you'd think would be straightforward for interactive, non-programming usage.

20 years ago SQL lacked type safety, testability, and composability. Today the same is true. I doubt it will be different 2 decades from now.

SQL is powerful. It is also very old and has very large warts.

I think one of the biggest missed opportunities in language design is the integration of powerful relational database and query models directly into a modern language. Not as a bolt-on or an ORM but as a first class part of the language in the same way as maps and arrays. Make a language that deals with data relationally and where relational queries are a core part of the language and relational query execution is baked into its runtime.

If persistence hooks were also baked in then you'd have something a little bit like stored procedures in databases but far more powerful and with a modern syntax. Couple this with a distributed database layer supporting either eventual consistency built on CRDTs or synchronization via raft/paxos and you'd have an amazing application platform.

It's always seemed dumb to me that data, which is in the very center of everything we do, feels like a bolted-on second class citizen from the perspective of pretty much all programming languages and runtime environments. "Oh, you want to work with your data? Well we didn't think about that..." Accessing the data requires weird incantations and hacks that feel like you're entering a 1970s time warp into a PDP-11 mainframe.

Instead the language and runtime environment should be built around the data. Put the data in the center like Copernicus did with the sun.

Why has nobody done this? Has anyone even tried?

Data access is of primary concern to all programming languages, using either memory or disk. All files on disk can be considered a form of database. Reading and writing files is standard in all languages.

Once you start getting fancier in your files, and the data grows large, you need special ways to read it. A Postgres database can be considered a single big file on disk. It is the Postgres server that is required to access the file in the most efficient way to store and randomly access enormous amounts of general data.

SQLite is interesting in that there is no server, it's just a special library that enables efficient random access of a single file, which can be thought of as a black box that only SQLite knows how to interpret.

Unless you mean, making something like SQL built directly into the language as a first class citizen. Mumps did something like this https://en.wikipedia.org/wiki/MUMPS

Like language integrated queries in c#?

https://learn.microsoft.com/en-us/dotnet/csharp/linq/

All data query languages eventually reduce themselves into SQL, or something equivalent to it.
Ok? Even if that were true — and I’m not entirely sure how you would even prove that — SQL still lacks type safety, testability, and composability.
SQL has type safety. It's primary purpose is to have typed schemas. I can't imagine why you would say this, other than some pedantic reason. SQL implementations like Postgres will happily throw errors when your types are off.

Testability - you use a general purpose language to execute SQL. Again, I don't know what you mean.

Composability - I suppose, but remember SQL is a language to retrieve data. I reuse fragments everywhere in a general purpose language.

So use a better language, and let the compiler optimize it??
The query planner optimizes it. Why would you want a compiler to optimize SQL? The nature of your data affects how it is optimized! The declarative statement to retrieve data must be interpreted based upon the nature of that data. You can't pre-optimize without knowing something about your data, in which case, you are basically storing some of the information outside of the database.
Doubtful, given my decades of experience and expected retirement age.

Working with 1000+-line SQL scripts written by other people is no fun. Why wouldn't you want to decompose that into legible, testable functions using an expressive language like Scala?

Being old doesn’t automatically make you more right. You don’t get wisdom as a birthday gift.
But if you did a lot of things you might know what does not work and why. Similar to science articles, nobody talks enough about "X technology does not work for Y domain", so a lot of people try to "reinvent the wheel" only to realize "X does not work for Y". Occasionally there is a surprise (because of some technology advancement) but being old definitely gives you more insight in what can go wrong.
On the other hand you often gain wisdom with experience, and often experience is proportional to age.
It does depend on experience, which everyone gets one way or the other.

I don't know about automatically, but definitely more likely.

At my current job I work with several people who have one to three years of experience repeated over the span of twenty. It might be less likely than some parts of HN would like it to be.
no, but you do get experience, as in, when last somebody “invented” this idea thirty years ago, it was a total crapshoot, so wonder what’s changed?

oh right, new language. that’ll definitely fix it. :eyeroll:

So you're still using COBOL, FORTRAN and BASIC, since new languages don't fix things?
I know I'm replying to a troll comment, but:

> “I have data and I know SQL. What is it about your database that makes retrieving it better?”

Because my data comes from a variety of unstructured, possibly dirty sources which need cleaning and transforming before they can be made sense of.

> Because my data comes from a variety of unstructured, possibly dirty sources which need cleaning and transforming before they can be made sense of.

Seattle data guy had a great end of year top 10 memes post recently and one of them went like this

> oh cool you’ve hired a data scientist. so you have a collection of reliable and easy to query data sources, right?

> …

> you do have a collection of reliable and easy to query data sources, right?

—-

Like, most of the time in businesses… if the data can’t be queried with SQL then it’s not ready to be used by the rest of the business. Whether that’s for dashboards, monitoring, downstream analytics or reporting. Data engineers do the dirty data cleaning. Data scientists do the actual science.

That’s what I took from the parent at least.

YMMV obviously depending on your domain. ML being a good example where things like end to end speech-to-text operates on wav files directly.

That's true. With dbt (=SQL+Jinja-Templating in an opionated framework) a large SQL codebase actually becomes maintainable. If in any way possible I'll usually load my raw data in an OLAP table (Snowflake, BigQuery) and do all the transforms there. At least for JSON data that works really well. Combine it with dbt tests and you're safe.

See https://www.getdbt.com/

It's amazing that you think I'm trolling! The #1 way to get more customers of something as extreme as a new database is to use the tool that potential customers already know and have integrated into their systems. That's SQL. The same logic is for any new paradigm.

Ignore that statement, and fight the uphill battle.