Hacker News new | ask | show | jobs
by palotasb 1149 days ago
As a counterpoint, Derek Sivers says Simplify: move code into database functions <https://sive.rs/pg>

I think they are both right, Alex Kondov for keeping his domain logic mostly in the application layer, and Derek Sivers for mostly keeping his in the database layer.

Why? Because that's what they know well, and they can keep the domain logic mostly in one place. If Alex can keep most if his domain logic in the application layer that he knows better, that's the better choice for him. If Derek can keep most of his domain logic in the database layer that he knows how to utilize well, that's the better choice for him.

(I can't find it now but I thought someone very smartly reflected on the Why Perl? <https://news.ycombinator.com/item?id=35646612> post today that the main reason why anyone will choose any programming language is that they know how to use it well. I think the same applies here. This is why many programming languages are valid choices for different people and project, and this is why different architectures are valid choices too.)

7 comments

Derek is right. Tooling is the main reason not to put code in DBs. It’s a pain, that’s all. Tools are like ‘70s calls to say never forget me’. But logically it makes eminent sense. At the end of the day, writing code in the dB is not that different than writing components for an app server.

There is also another reason. Designing effective schemas and bundled logic is not junior level work. Microservices, imo, all song and dance aside, really were trying to deconstruct the “monolith” of the schema: a schema that serves the requirements of today, is modular, and is possible to sanely extend in the future is not commodity work. So a practical reason not to do it is the required labor (expertise).

Both are implementations of the same idea: not splitting the logic into two services deployed separately. Either store data directly on disk or in a general-purpose piece of infrastructure that needs no configuration.
The line of reasoning seems valid, do what you do best; anyone who has had to deal with stored procedures in legacy systems will however disagree. Store procedures are to be avoided.
I used to deal with stored procedures in legacy systems and honestly it was not bad.

The first concern was getting the stored procedures into version control and creating a mechanism to update the systems based on the things in version control upon deployment. After that it was smooth sailing.

Why are stored procedures to be avoided?

I think most people are reacting to the impedance mismatch of normal software lifecycles versus sticky databases. Naive or legacy users and documentation use the database as the system of record for the custom code, rather than as an execution environment provisioned from a real system of record like a source control system.

Where it gets even more tricky is not just stored procedures but application-specific functions embedded in views, or triggers running custom functions. It's no longer just a library of functions you can choose to call (or not) durng a query, but code that runs on its own based on clients queries that never directly mention the functions.

The same goes for schema management, and I think that is a big reason why so many developers fixate on "schemaless" approaches. They want to pretend that the database exists in a static way outside the software lifecycle, just like they ignore the filesystem and operating system and treat it as an unchanging abstraction.

What is wrong with stored procedures ? They are not fashionable nor especially expressive, but they perform reliably and they do pretty much what anything that might be expected from relational logic.

But maybe that's my enterprisey bias.

They're turing-complete and modular so it's not really about what they can or cannot do.

Testability, tooling and the open-source ecosystem and either bad or non-existent. Writing PL/SQL is the worst environment I've worked in. That database sent emails, processed CSVs scheduled jobs, etc. yet there was still a web app to maintain next to it.

They're OK for certain things like essential triggers or performance-sensitive functions, but I would never deliberately put app logic in there. Major red flag.

Yep. Releasing, testing, debugging, etc are all more difficult in stored procs than in a “regular” language. Stored procs have other down sides:

  - often unique to that DB, so locks you in
  - Scaling that code is now tied to scaling your DB tier
  - Tooling is often very inadequate
  - Versioning and backwards compatibility of code can be a challenge
Some of those concerns apply to any database. Your query could slowdown if the database picks a bad plan, so you could say you will never trust the db to scale. That's separate from scaling the stored proc - just using the db can run into a scaling issue.
No, what I mean is your code scaling is now directly tied to how your DB scales. Your SP code can be impacting the rest of your DB, and vice-versa. I have seen large SP based systems to require Oracle boxes to be scaled up at enormous cost (hundreds of thousands or even millions of lines of SP).

Not because of slow queries, but just the cost of executing the stored procs themselves.

> Testability, tooling and the open-source ecosystem and either bad or non-existent

If you're properly testing the code in your application that exercises persistence, that means your test harness runs a real database like the one you're running in production and thus you can also write the database logic tests using your own application's testing facilities.

Of the things you listed, "the database sends e-mail" is the only one where I'd think you'd have to change the code at all, and have the database go through a mockable middle-man so that it becomes testable; but everything else can be comfortably tested from a test suite that is able to talk to a real database.

> That database sent emails, processed CSVs scheduled jobs, etc.

That's really a bad, very bad use of SP. They should only deal with and care about the data, not doing any interaction with any external systems.

I've never worked on a system where we didn't make an effort to make sure we were NOT dependent on a specific DB vendor/implementation and avoiding SP's has always been a part of that.
And in how many systems you ever ended switching from one database engine for another?
We went through and got Oracle out of our enterprise after a particularly nasty licensing negotiation and audit with them some years ago. So it does happen.

I expect at some point we'll have a similar initiative around cloud providers.

Yes, a very commendable goal, I've seen this effort undertaken many times. Curiously enough, I've never actually seen anyone switch databases.
Depends on the product in question. For integration products, its very common to support multiple database vendors with the same core engine. So any SQL written and db data-types is generally SQL-92/99.

This is pretty good since one can generally fast unit test DB code with an embedded database.

I moved the same system from MSSQL server to Oracle, then to MySQL.

The Oracle version had a lot of logic written as SP. I migrated them to MySQL SP.

So the conclusion is: SP don't make it impossible to migrate from one database to another, and second: yes database migrations do happen.

Also: all database migrations I have known involve Oracle in one way to another. Oracle salespeople are way too nosy.

I’ve done it twice. Mass migration off Oracle and before that DB2 and Sybase. Current system is also occasionally sold on premise to customers who have their own db preference and we can support them all.
Once of the major lifts at my last job was going from Oracle to post Fred's. The speed procedures were the worst part of it all.
Store procedures and functions are very good for many things. I am not claiming you should put all your business logic in the database, like old-time Oracle consultants wanted (for self-serving reasons), but some calculations are better handled in a single place, instead of being reimplemented in different parts of the stack, or stacks, given that it is common to write a project using one framework and language, and then migrate to another one.

The only reason to avoid SP is that you don't know any SQL and your ORM can't write the function calls, so you can't call them.

Which is a general complaint I have about modern software development: many things are done in convoluted ways because some developers don't know SQL and don't want to learn it.

When I was choosing jobs in the real world[1], I would avoid any company that depends heavily on stored procedures for business logic and where things were ruled by “database developers” and all the developers were doing were calling stored procedures.

Stored procedures are harder to unit test, do automated rollbacks, version control, etc.

I’ve never seen a system that 5 years in anyone said “I’m so glad we decided to use stored procedures everywhere”

[1] I work in consulting now and have no allusions that I face the same issues that day to day developers do. I get to lead development “solutions” and move on.

I would agree, but I think it's worth checking out one of Siver's postgres repos where I thought he tested things pretty well[0]. Also things like Postgres are often run as a "server" on local dev or CI but could easily be run more ephemerally[1]. That should help with testing various states and behavior of the database.

I still would actually follow what you said thought because often times places which rely on stored procedures are, as you say, ruled by developers with the database access to manage those procedures and from what I've seen often do not have anything close to a reproducible setup. Besides the fact that often it's some proprietary database which is much more difficult to run locally than sqlite3 or PostgreSQL.

Still I keep that separate from considering if views/stored procedures are actually a good solution. I think sometimes it really is. Honestly Sivers' experiments really convinced me of that.

[0]: https://github.com/sivers/store

[1]: https://jamey.thesharps.us/2019/05/29/per-project-postgres/

It's worse than that.

Your database will be the first major thing to cave under load.

The traditional tools most folks use (postgres/mysql/maria/etc) want to scale vertically, not horizontally (you can do it, but it's hard, and has lots of drawbacks).

Eventually - You will hit hardware limits. Providers only have instances that are so big. There is a ceiling on how much you can scale your DB without having to do major, major work.

I've seen this pattern play out now at 3 high growth companies I've worked at. The one that was using lots of stored procs and materialized views got hit the fastest and the hardest.

More generically, scaling stateful things is generally hard than scaling stateless things (because you have to figure out what to do with the state).

Even horizontally scaling systems you either have to redistribute the data on scale actions or keep the data in place and send new data to new infra (which may or may not be helpful)

When it happens it is time to actually use the big boys databases.
I’ve worked with the big boys and seen plenty of times where databases fell over no matter how much data you through at it because of locking issues and other database contention issues.

It’s much easier to horizontally scale a bunch of application servers than database servers.

That is why big boys databases do clusters.
“clusters” only help when you can deal with eventual consistency (which is often admittedly more than most businesses realize). Sure you can scale your read only instances.

BTW: I work in the consulting department of a little company I am sure you have heard of that knows something about dealing with “the big boys” at scale.

MySQL doesn't have materialized views, but it has better replication (at least using Aurora in AWS) than the big boys databases.

Adding more replicas and changing the instance size scales much better than fine-tuning the big complexity of Oracle and its materialized views.

So, maybe it is not the tool, but how to use it what matters. And fuck Oracle if by big boys databases you mean Oracle.

> When it happens it is time to actually use the big boys databases.

For years Oracle had a very small limit of how many CPU's stored procs could use baked into the DBMS engine. IIRC, it was something like 4.

Maybe that's changed now, I don't know. I do know Oracle qualifies for most as being one of "the big boys databases."

It is a matter of database tooling, Oracle and SQL Server provide quite good IDEs, graphical debugging, merge tooling, testing infrastructuring.

I can tell you that at least a pharmaceutical is quite happy to have plenty of stored procedures into their Oracle databases, including making use of Apex, and it is at least several decades in production.

I’ve worked on a codebase that had 300K lines of pl/sql and I didn’t find it all that fun to work with. The problem is that you are working in a very constrained environment, starved of language features and libraries, and with a deployment model that is deeply tied to the data and therefore more cumbersome. The IDE’s helped but didn’t matter that much. Also, oracle has per-cpu licenses, so the more code runs in the database the more expensive it becomes.

I think the best design is an API in a language designed for building API’s, on top of a dumb database, with all access gated through the API.

If you're going that approach, I think a lot of databases have facilities for extending outside plain SQL. Afaik Oracle can bind to Java and Postgres has an extension interface

I do tend to agree with API-outside-the-DB pattern. I guess Oracle supports some form of QoS but I don't think MySQL and Postgres do. That means having multiple apps hit the same schema can cause starvation issues

Also not sure how hard it is to monitor user resource usage. That's more important for billing (even if it's "fake" money from 1 department to another)

And when you get ready to rollback, can you just revert all of your code and deploy? Can you just do a “git branch” and work on your own isolated code in your dev account?” How well does merging work when you do a pull request and have to merge your code and make sure your stored procedures are up to date?
Yes, because there is such thing as CI/CD pipelines, and test environments, for database code as well.
If there are ten developers with 10 feature branches are they going to be running 10 instances of Oracle with their own “branches” of the stored procedures?
Yes,install the RDMS into their computer just like any other development tool.

Yes, learn to use multiple schemas and instances, just like using multiple deployments into Apache, Tomcat, IIS,...

Or even better, join the hype, and use containers with volumes configured for each feature branch.

Okay, how about "Either put your domain logic in the database, or put it in the code, but don't randomly scatter your shit throughout the code and database"?
Tom Kyte, who for a long time was the "ambassador to the world" for Oracle, makes essentially the same arguments

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUEST...

For data that will either :

+ 'outlive' the user interface - this data will be used forever but we are not sure where e.g. financial docs, records, etc etc

+ 'extend' past the user interface: this data started on an installed desktop app / green screen but we are not sure how we may want to extend it - eg to an API or to web

the DB is the right place to put business logic.

Coupling the data with business logic, in the DB, allows almost complete flexibility in how it is access and interacted with at the expense of complete lock in to the DB platform. Which may or may not be a bad thing. But if you have a skilled SQL team and some resources to pay the DB licensing and support piper, this is a good direction to ensure app support longevity.

For example, you can do the same thing on desktop or phone or web client or even hardware switches: say you have a DB stored_procedure to indicate that the process of manufacturing this part has reached stage X

update_item_status( Item 11111, 'Manufacturing Stage X' )

(these is obviously really contrived but it is an example to make things clear)

You can run this, if it is in the DB by:

- hitting this button on the assembly line

- scanning a barcode can do the same thing

- having a user change a status in a UI

- have a batch job run off a script with a list of parts to apply the status to

And even better, if you want to update how the stored proc works, you update once in the DB and it is available to all 'interfaces' instead of having to update hardware switch code + user screen UI code + batch script code etc etc...

I know that an argument can be made that you have an app server layer separate from the DB so this is not a 'real' problem but this approach just puts the app server IN the DB

One closing thought: it is often easy to forget that, in most cases, the data is the product of software development and the tooling is just support. Only counterpoints I can think of here are games and interactive demo systems where the process ( of playing the game or using the software ) is the product.

Positioning the database and front-end as being similar is a mistake if business goals are thought about from a "the data is the product" perspective. If the rules about how to handle, manage and interpret the data are a core component of that data product, then the argument that the right place for business logic is in the DB is made even stronger.

I think a really important aspect is keeping the logic in your head (and others learning it) is helped by mirroring this single source of truth with your implementation.