Hacker News new | ask | show | jobs
by zoomerang 4342 days ago
The problem with raw SQL queries is that they don't compose. Using an ORM, I can do two things.

1. Pass around query objects, and build queries derived from others. I can also combine multiple queries into one and split the results.

2. Update model records in multiple places passing them through several layers of business logic before serializing.

This is on top of the other obvious benefits of ORMs, such as abstraction over my storage engine. I can write a single 'query' that be be executed against a variety of SQL servers, Salesforce, MonoDB, an in-memory cache, or whatever else I want to do with it.

As a real-world example of why this matters - On my current project, I have a heuristic algorithm that allocates pending jobs to workers that meet the required skillset. As part of this, I have individually defined rules that must be met. As part of the operation, each rule can query the database for relevant results.

Each rule is a standalone module, and by design cannot have any knowledge of other rules. Rules can be written by third parties and are plugged in and loaded at runtime. To make this work, we can either

1. Run one query per rule, hitting the database more than needed 2. Compose all the rules into one query, hitting the database once.

Using an ORM, I'm able to take a base query and fold it through all the available rules, allowing each one to return an updated query to be merged. Some rules conflict - they may require results that another rule has filtered out. To solve this, the ORM will automatically detect the conflict and build a single 'Query' object that compiles to a broader query behind the scenes, takes the results, stores them in an in-memory cache, and then runs the individual rules in-memory against them to get a final resultset. In the worst case scenario where this is not possible, it will compile to the minimum possible number of SQL queries to satisfy all of the individual callers.

As a result, each rule can run a different query against the database, getting the individual resultset it wanted, while not hitting the database so heavily.

Why not just query multiple times? In this case, we're running this against a Salesforce database. On top of the fact that you pay per API call, there's anywhere up to 2 seconds of latency before getting a result. Composing the queries means we take an operation that might have taken a few minutes and used a lot of expensive API calls into an operation that takes a few seconds and uses 1 API call.

At the end of this I get a resulting immutable object. I can perform business logic on this in multiple places, accumulating changes, at the end of which I have an `update` object containing an original version and a delta. I can then just update the things that actually need updating, and can run additional business rules intelligently based on which fields were changed. If there are multiple records that need updating, the ORM will batch them for me to further reduce API calls.

Using raw SQL, it would be possible to implement a rough hack that approximates this, but it would be nowhere near as efficient or scalable.

1 comments

> this is on top of the other obvious benefits of ORMs, such as abstraction over my storage engine. I can write a single 'query' that be be executed against a variety of SQL servers, Salesforce, MonoDB, an in-memory cache, or whatever else I want to do with it.

This is a trade-off, not an obvious benefit. In programming to a lowest-common-database API, one loses the ability to use any actual features of the specific database technology being used. It would be very interesting to know what proportion of projects need to smoothly change the underlying DB technology (it is of course debatable whether ORM's actually let you do this), vs what proportion find themselves hampered by the less-powerful-than-SQL database manipulation API offered by an ORM.