Hacker News new | ask | show | jobs
by Aqueous 4806 days ago
Why not just write the SQL?
6 comments

When ORMs first started becoming popular during the 2000s, especially within the Java world, their proponents drummed up a lot of animosity toward SQL.

While a lot of us who had started working with SQL in the 1980s, if not earlier, were perfectly fine with using it, many younger developers were scared away from it by these claims.

So we've had a generation of software developers who were essentially raised to hate SQL, and to embrace ORMs, even after it became clear that ORMs do come with some pretty serious trade-offs, and do not necessarily increase productivity.

Not having a solid grasp of SQL, a lot of these developers just don't realize what they're missing out on. I've seen this first-hand many times before. These developers will spent hours upon hours trying to get their ORM to perform a moderately complex query that could be easily written by hand within a few minutes, including any code necessary to perform the query and to retrieve the result. The time and effort expended on these sorts of queries will very quickly negate any time and effort the ORM may have saved for simpler queries. And these moderately-complex or complex queries always arise in real-world software.

I think that education is the only way to really solve this problem, but a lot of developers are quite set against this. Learning SQL isn't that much of an investment, but the returns it offers can be huge.

I came from that same time, and I think you are partially right. I think a self-perpetuating cycle has occurred where people know less about sql, so they use it less, so they know less about it.

But I also think that people don't like the boilerplate that comes with direct sql access. I think they don't like the impedance that comes in reading and understanding code. And I think they want to hand off annoying, but critical, things like caching to a lower level they don't have to think about.

SQL is an important tool, and any developer, especially one who is using a framework like Django or Rails, would be wise to learn it, but ORMs still have value and it isn't all about "I don't know sql".

I disagree. There's a lot of boilerplate ontop of raw SQL that can and should be abstracted away. At some point you'll have to parse out result and build an object graph anyway, it would be nice if it was done for you already. You can also plug-in things like a caching engine easily and transparently.

Most ORM systems will give you options. My experience was with Hibernate, which had let you do Object queries, Criteria queries, HQL queries, and finally raw SQL. You hardly ever needed to go down to raw SQL. It's nice not to worry about the particulars of the underlying SQL engine, and certainly you want serialization to be handled for you.

The OP is right though, you can't treat ORM framework as a total blackbox. You need to be aware of what it's doing else you can really get yourself in trouble.

hibernate lets you do object queries, criteria queries, hql queries, and finally raw SQL

So you have to learn

   Object Queries
   Criteria queries
   HQL Queries
   Raw SQL
   HIBERNATE
And this has made your life easier has it? Hibernate is massively complicated, and you're right, it doesn't insulate us from the database, Not even slightly. So the amount of shit I now how to know has quintupled, just to persist an object!

But hey, BOILERPLATE, right?

>And this has made your life easier has it?

Yes it has. Belive me, it has. And it isn't nearly as bad as you make it out, certainly better than the alternative. If you have relatively simple relational data, and query requirements, you can get away with just Object and Criteria queries. Your code will thank you. For more complex queries, HQL will get you down almost to the bare metal. Why do that in lieu of raw SQL? I mentioned several reasons. One of which, is that the ORM layer does abstract the boilerplate of query to-and-fro serialization. More than that, it enforces constraints. Your DBMS doesn't give a shit whether ages should be in some valid range, or have some sort of valid format, or whatnot. Those constraints are in your object model, which is then automagically transferred to your SQL commands. Another reason is that you can now substitute SQL backends, trivially. Going from MySQL to Postgres is a one-liner. Another reason, caching is completely transparent. You can now plug-in any kind of caching engine and strategy with a one-line config change, and it's all completely transparent to your application. Another reason, your framework (JEE or Spring) probably has hooks to your ORM, which makes the integration completely seamless.

The thing is, if you didn't go with an ORM framework, you'd probably roll your own abstraction layer to take care of some (all?) of the above mentioned use-cases. You don't want serialization code, or caching code, or constraint-enforcing code littering your business logic. So this abstraction is good. There may be reasons to forgo an ORM framework, but I sincerely believe the vast majority of use cases will benefit from it.

If you insist, because I really want the months of debugging work we've spent on hibernate to pay dividends, I really really want the technical debt we've accumulated capitulating to the abstractions limitations in our design to be paid, and I really really really want to believe that we haven't wasted our time on an silver bullet that fits only trivial cases. I want to believe that serialization, caching, and constraint enforcement are far bigger nightmares than the one I'm going through.

I just don't think that will happen.

"At some point you'll have to parse out result and build an object graph anyway"

This type of thinking has to stop! Sure, some times it may be necessary to extract data from SQL and turn it into some type of object. However, most of the time it's enough just to get the data and work with the data directly.

Be serious. I made this argument in another thread: I don't believe for a second that it's good practice to just work with SQL directly in your business logic. You don't want to litter your code with serialization logic. You don't want litter your code with constraint checking everytime you make a query. Even if you don't use an official ORM framework, you will write an abstraction layer that will duplicate some of the ORM functionality.

I worked with redis extensively, and i got to the point where it was too dangerous to simply assume that none of the other guys on the team (or me) wouldn't put some garbage data in a field because from redis' perspective, every key looks the same and every value is as good as the next. We rolled our own abstraction layer, in which keys and values were wrapped in domain specific objects.

Programming languages, and databases are too general to be useful. If you don't 'constrain' them to your domain, you're going to get destroyed once your product or team scales to a certain size.

Not converting data into object graphs does not necessitate working with SQL in your domain model. I'm currently working on an application right now which favors simple data structures (hashes, arrays) over custom entities. It favors a business layer that operates on those data structures over composing dozens of "Model" based classes. That does not mean that persistence logic is littered within my domain.
* grabs OO sword and shield and prepares for battle * .
Ha! I have no problem with OO based programming. I do have a problem with OO based solutions for every problem. Sometimes it's worth pursuing alternative solutions, whether procedural or functional or a combination of what makes sense.

Not too mention, what many believe as an OO solution is often times far removed from anything remotely OO.

I don't mind writing queries, but I hate dealing with raw results. For complex queries I write SQL but have Django's ORM map relational data to objects for me.
The only problem I see, is we can't rewrite SQL.

I.e., in pseudocode:

    query = SQL("SELECT * FROM entities WHERE owner = ?", owner=me)
    ...
    if some_condition:
        query = query + SQL.WHERE("OR public = TRUE")
    ...
    if other_condition:
        query = query + SQL("LEFT JOIN things AS t"
                            " ON t.entity_id = entities.id") \
                      + SQL.WHERE("things.value > 0")
    ...
    my_nice_list_of_results = run(query + SQL("LIMIT ?", count))
This should be technically possible, but I haven't seen any library that does it.
SQLAlchemy lets you do precisely that. Give it a try, I haven't written a line of SQL since I've been using it.
Yes, I know and use it, but SQLAlchemy is not SQL. It's completely another (although, SQL-inspired and compiled-to-SQL) language, which rises learning barriers. I find myself frequently thinking SQL then mentally transforming the queries to SQLAlchemy syntax, which is somehow pointless activity, considering the fact computers excel at transforming formal languages.

I believe, If someone'll take an SQL SELECT statements parser and create a library that'd generate SQLAlchemy query/statement object from them, such library will make development more productive.

this is more or less what HQL does (http://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html...). However the huge advantage to composing SQL as an actual object construct is that you get reusable constructs which serve as components to composing a larger structure. String-based SQL OTOH means you're going to be concatenating strings together which is error-prone, verbose, and even hazardous from a security point of view as it discourages the usage of bound parameters.
I have always wondered why people have an aversion to stored procedures. They are a programming language in themselves, so you could pass all of your parameters (owner, some_condition, other_condition) into a proc and, depending the logic, return a different cursor to a different query.
it's because the stored procedure development model lacks the tools in order to make integrating with an application-level domain model simple. You end up needing to write not just one persistence layer, that of marshaling your object model to and from SQL statements, but two - all the SQL statements behind your stored procedure layer, and a second to move all the data between the SPs and your object model. To make matters worse, the stored procedures must be written completely by hand without the benefit of any in-application schemas to help compose statements.

One reason for the variety of opinion on this is that different developers make more or less use of domain models in the first place. Those who are accustomed to writing all SQL completely by hand with no helpers at all, and not working with a domain model tend to view the stored procedure approach as equivalent. Those who are accustomed to having at least some simple marshaling layers like a construct that generates an INSERT statement given a list of column names see the SP approach as more tedious since simple techniques like that are usually not easily available, at least in more old school SP languages like TRANSACT-SQL and PL/SQL.

All of that said, I do think this is a problem that can possibly be solved. Postgresql allows SPs to be written in many languages, including Python. I have an ongoing curiousity about the potential to integrate a Python-based object relational system into a stored procedure system. But it might end up looking like EJBs.

The problem with stored procedures is managing it. You have to maintain another, separate codebase. You need to make sure those procedures are "installed" and up to date. Plus, DBMSs don't have a concept of "versions", and those procedures are treated as data. In short, it's hell.

That's one of the pain points RethinkDB is trying to solve, since you write your queries in whatever application language you use and it's parsed and executed in the cluster.

This sounds like a situation where it'd just be better to use separate, yet similar, queries, with each handling a particular case or set of conditions.

Views, stored procedures and functions can be used to help isolate duplication, parameterize the queries, or otherwise hide the SQL.

Code like you've posted is the result of taking DRY too far, to the point where avoiding a small amount of repetition ends up bringing in far more complexity and problems than the repetition might cause.

CLSQL has a functional interface which can do this (http://clsql.b9.com/manual/).
"The only problem I see, is we can't rewrite SQL."

I believe that the phrase you're looking for is "lack of compositionality".

I fail to see why one can't manipulate parsed SQL's AST. SQLAlchemy shows thas manipulating SQL-inspired objects is perfectly possible.

It's only bridge between sqlparse and SQLAlchemy that's missing. I guess, just because nobody had a wish, will and time to finish and share one.

I fail to see why one can't manipulate parsed SQL's AST.

I think that this is the "every problem in CS can be solved by another layer of indirection" part. You're basically sidestepping the issue of SQL not providing the functionality in the first place.

There are libraries which build up a model of a query - selections, constraints, etc, and only turn it into a string at the point of executing it.

SQLAlchemy was already mentioned. In the JavaScript world, there's node-sql. https://github.com/brianc/node-sql

That kind of code often ends up being worse to deal with than the SQL it is replacing.

I've always found it kind of odd how there are some people who despise SQL merely for its syntax, yet they'll turn around and advocate the use of libraries which mimic a SQL-like syntax in some other programming language (but do an absolutely terrible job at it).

The node-sql examples are atrocious, for example. It's even more obvious with the SQL so close by. The SQL statements are clear and concise, while the JavaScript version is nowhere near as easy to read.

At least LINQ gives the option of not having to directly deal with the method calls, which makes it marginally nicer to work with. Anything less than that, like we see with basically all other systems, is far less usable.

> I've always found it kind of odd how there are some people who despise SQL merely for its syntax, yet they'll turn around and advocate the use of libraries which mimic a SQL-like syntax in some other programming language (but do an absolutely terrible job at it).

No, I use ORM because I love SQL. ORM doesn't replace SQL. ORM helps to generate the exact SQL I want with much less code.

I have seen application with thousands of stored procedures, most of them boilerplates, and only supports one particular flavor of RDBMS. I have seen too much hand-crafted SQL in the form of "@param_xxx IS NULL OR field_xxx = @param_xxx".

I used to think that Tom Kyte was right, that everything should be in stored procedures. Now, I am thankful for ORM (more specifically, SQLAlchemy).

The complaint was not about SQL's syntax, but that SQL statements as strings are inflexible.

One often wants to have several variants of a SQL statement, beyond simple placeholders for arguments. I've seen several projects that grow a lame templating syntax on top of their SQL strings, to the point that the SQL then becomes incomprehensible.

If this really bugs you, perhaps the ultimate solution would be to actually parse SQL.

   query = sqlParse("SELECT foo FROM bar WHERE quux = 1")
   query2 = query.clone().constraint("quux = 2")
Some ORMs support the Query Builder pattern which does the exact thing you want. http://www.yiiframework.com/doc/guide/1.1/en/database.query-...
Just my opinion of course, but it seems that designers who think primarily in terms of the application will prefer to use an ORM to abstract away details of the storage layer; whereas a designer who thinks primarily in terms of data will prefer to write SQL, with the application being just one of many possible applications of that data.
Because (at least for the kind of queries common to most web applications) you can develop faster using an ORM. Less boilerplate, less repeated code, less time thinking about how to convert between SQL and application-level representations of your data - and most importantly, no time at all spent thinking about how to dynamically generate SQL queries using string concatenation (a sure-fire way to introduce bugs and security vulnerabilities in to your code).
Poor tool integration, nonexistent library ecosystem, and the advantages of a single-language codebase.
Because you end up writing the same basic SQL over and over and over again?