Hacker News new | ask | show | jobs
by nendroid 2075 days ago
>The lack of really natural integration into modern day programming languages (and data model mismatches) is a much bigger issue imo

The SQL data types consist of a few primitives like ints strings and chars placed in higher order data types that are tables. These types are easily isomorphic to data structures and primitive types in traditional programming languages. There is zero mismatch here, in fact the data structures in application programming languages tend to be much richer than SQL.

See orms for reference. ORMS have their own set of problems but data mismatches are not part of those problems. If the programming language enables the creation of sum types like rust or haskell than there can never be a mismatch as these languages can produce virtually any type.

>SQL queries are generally really short. Rarely more than a few lines (might be different for people doing ad-hoc analysis instead of making a db backed application). I don't need modularity for a program that is only a few lines long.

For complex applications this is not true. In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app. The web app is suppose to serve as something that routes IO the bulk of your code/logic and heavy lifting should be shifted to the database. Simple apps can avoid this but in general complex apps cannot.

Case in point do you put your where clause in the application program than download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The later statement is the right answer.

>I don't really feel like composability/modularity is all that important in SQL.

You're not completely wrong. The bigger issue is SQL optimization. Because Databases form the backbone of computation for a standard web app SQL provides a layer of indirection that makes optimization harder. For C++ optimization is built into the syntax itself, you make choices while coding to optimize things. For SQL you open up the black box and look into the query planner to see what your High level code is compiling too. SQL is a really bad interface for doing optimizations but that's a topic for another day. The topic of this post is modularity and he's not wrong... SQL is not a composable language and there's no performance loss in making it more composeable.

1 comments

> in application programming languages tend to be much richer than SQL.

Hence why i say why there is a data model mismatch.

E.g. i wouldn't say that assembly and haskell have a compatible data model just because assembly is a sequence of bytes, and haskell is a superset of that.

Not that a data model is solely about the types involved.

> See orms for reference. ORMS have their own set of problems but data mismatches are not part of those problems.

ORMs are notorious for being a leaky abstraction. Largely because the object and relational data model dont entirely match.

> In general the classic model for web development is to place as much logic as possible into the SQL query and as little logic as possible into your heavy Python web app.

When was the last time you wrote a 500 line sql query? A thousand line? My point is that sql queries are short enough, that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.

> Case in point do you put your where clause in the application program than download the entire table? Or do you throw as much logic as possible into the query so the database outputs a result as close as possible to what you need? The later statement is the right answer.

This is a strawman.

> The bigger issue is SQL optimization. Because Databases form the backbone of computation for a standard web app SQL provides a layer of indirection that makes optimization harder

I disagree. At scale you have data that has mixed cardinality. The indirection allows the db to chose the best algorithm given the size of underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of time it is a significant benefit. Its sort of like how sometimes compilers dont work properly and you need to hand optimize, but in practise that is rare and you wouldn't throw out the compiler because the other 95% of time its better and lower effort than if you had to always do it by hand.

>Hence why i say why there is a data model mismatch.

There is no data mismatch if the data model is richer, than it can cover it. Unless you're saying SQL is not rich enough to cover the PL.

Well you can implement a database that does that, it just wouldn't be table based data storage anymore. Postgresql supports JSON and enums if you want it but the minute you use these types things become less predictable in terms of performance.

>ORMs are notorious for being a leaky abstraction. Largely because the object and relational data model dont entirely match.

Wrong ORMs are notoriously leaky because the query translation is extra indirection. You need to compile to SQL and the SQL needs to compile to a query plan. This is why ORMS are bad.

As for the data types, Objects easily mimic types in SQL. You basically rarely ever encounter problems with incompatible type systems.

>When was the last time you wrote a 500 line sql query? A thousand line? My point is that sql queries are short enough, that further abstraction is not really missed. That doesn't mean you should put 0 logic in your query.

When did you last write a 500 line function? When did you last write a 100 line function? You shouldn't be doing that if you are. Case in point, the total lines of SQL written in any source code usually well exceed over 500 and that is a case for composability. T

This is no different than regular source code. If you're writing 500 line functions in your source code than you're not even taking advantage of modularity in programming languages outside of SQL so of course for you it doesn't matter. Because composition doesn't matter for you period.

>This is a strawman.

No it's a statement written in the english language. A strawman is something that doesn't EXIST, because men are made out of flesh not straw.

It's sort of similar to the non-existence of what was suppose to be written in response to my statement: an actual counterpoint or an argument. But then how can you write such a thing if it doesn't exist?

>I disagree. At scale you have data that has mixed cardinality. The indirection allows the db to chose the best algorithm given the size of underlying data at runtime. Sometimes that doesn't work properly, but the vast majority of time it is a significant benefit. Its sort of like how sometimes compilers dont work properly and you need to hand optimize, but in practise that is rare and you wouldn't throw out the compiler because the other 95% of time its better and lower effort than if you had to always do it by hand.

Sure but when it comes time to hand optimize SQL is extremely bad and you need to hand optimize SQL all the time. Not necessarily most of the time, but enough times that it's a real problem. That is the point and that is what I said. It's basically the biggest headache with SQL. If you've ever done analytics you'd know this is a huge problem.

It's better to have a language that allows explicit decorators that allow the programmer to choose optimization procedures when needed. Instead in SQL often optimizations come in the form of hacks. Case in point: SELECT * FROM A is worse then SELECT A.column1 FROM A. The later optimization comes in the form of a language hack and not explicit syntax.

Not saying the alternative mentioned in the article would solve this problem nor am I saying a solution exists... but if there's any big problem with SQL today it's hand optimization for sure.