Hacker News
by big_whack 896 days ago
A lot of the commenters seem like database fans instinctively jumping to defend databases. The post is talking about contexts where you are dealing with petabytes of data. Building processing systems for petabytes comes with a separate set of problems from what most people have experienced. Having a single Postgres for your startup is probably fine; that's not the point here.

There is no option to just "put it all in a database". You need to compose a number of different systems. You use your individual databases as indexes, not as primary storage, and the primary storage is probably S3. The post is interesting and the author has been working on this stuff for a while. He wrote Apache Storm and used to promote some of these concepts as the "Lambda architecture", though I haven't seen that term in a while.

4 comments

So what you're saying is that this article is irrelevant for 99.999% of developers. The instinctive jump to defend databases is completely understandable given that context.

> You use your individual databases as indexes, not as primary storage, and the primary storage is probably S3.

Which is a perfectly valid use for a database. Our company's document management system uses a big database for metadata and then, of course, stores the actual files on disk.
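
To make the "database as index, files as primary storage" split concrete, here is a minimal sketch. It is not the document management system described above: the `documents` table, its columns, and the content-addressed file names are all assumptions for illustration, and SQLite plus a temporary directory stand in for the real database and disk (or S3).

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_document(db, root, name, content, tags):
    """Write the bytes to disk and record only metadata in the database."""
    digest = hashlib.sha256(content).hexdigest()
    path = root / digest  # content-addressed file name (an assumption)
    path.write_bytes(content)
    db.execute(
        "INSERT INTO documents (name, tags, size, sha256) VALUES (?, ?, ?, ?)",
        (name, tags, len(content), digest),
    )
    return digest


def find_by_tag(db, root, tag):
    """Use the database as an index, then fetch the bytes from primary storage."""
    rows = db.execute(
        "SELECT name, sha256 FROM documents WHERE tags LIKE ? ORDER BY name",
        (f"%{tag}%",),
    ).fetchall()
    return [(name, (root / digest).read_bytes()) for name, digest in rows]


def demo():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE documents (name TEXT, tags TEXT, size INTEGER, sha256 TEXT)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store_document(db, root, "invoice.txt", b"total: 42", "finance")
        store_document(db, root, "notes.txt", b"call back Monday", "misc")
        return find_by_tag(db, root, "finance")
```

The database never holds the file contents here; it only answers "which files match?", and the bytes are read from disk afterwards.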

I think the complexity gets really crazy at high scale, but the complexity caused by databases is still significant at low scale as well. For example, needing to use an ORM and dealing with all the ways that can leak is pure complexity caused by not being able to index your domain model directly.

I feel like there are a couple of different points:

* The immutability, lambda architecture points I agree with. I think the separation of the immutable log from the views is important. Databases are frequently used in ways that go against these principles.

* I am not sold that being unable to express the domain model correctly is really a fair criticism of databases. Most businesses in my experience have a domain that is modeled pretty well in a relational DB. I haven't seen a better general solution yet, though I haven't checked out Rama.

At the low end of the scale, there are a lot of companies (or projects) for which the entire dataset fits in a single managed Postgres instance, without any DBA or scalability needs. They still suffer from complexity due to mutable state, but the architectural separation of source of truth vs "views" can be implemented inside the one database, using an append-only table and materialized views. There are some kinds of data that are poorly modeled this way (e.g. images), but many kinds that work well.
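
A rough sketch of that separation inside one database, under some loud assumptions: the `events` and `balances` names are hypothetical, and SQLite is used only so the example runs anywhere. SQLite has no materialized views, so a plain view stands in; in Postgres you would likely reach for `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` instead.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        -- Source of truth: an append-only log of account events.
        CREATE TABLE events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            account TEXT NOT NULL,
            delta INTEGER NOT NULL
        );

        -- Enforce immutability: reject any UPDATE or DELETE on the log.
        CREATE TRIGGER events_no_update BEFORE UPDATE ON events
        BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;
        CREATE TRIGGER events_no_delete BEFORE DELETE ON events
        BEGIN SELECT RAISE(ABORT, 'events is append-only'); END;

        -- Derived view: always recomputable from the log.
        CREATE VIEW balances AS
            SELECT account, SUM(delta) AS balance
            FROM events GROUP BY account;
        """
    )
    return db


def append(db, account, delta):
    """The only write path: add a new fact to the log."""
    db.execute("INSERT INTO events (account, delta) VALUES (?, ?)", (account, delta))


def balances(db):
    """Read from the derived view, never from mutable state."""
    return dict(db.execute("SELECT account, balance FROM balances ORDER BY account"))


def try_mutate(db):
    """Return True if an in-place update succeeded, False if it was rejected."""
    try:
        db.execute("UPDATE events SET delta = 0")
        return True
    except sqlite3.DatabaseError:
        return False
```

Corrections are expressed as new events (for example a negative `delta`) rather than edits, so the view can always be rebuilt from the log.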

So I don't really view the architectural ideas as repudiating databases in general, more as repudiating a mutable approach to data management.

I've never known anyone to _need_ to use an ORM; it's always been out of convenience.

The article should probably just explicitly say that, to avoid all this arguing.

> Having a single Postgres for your startup is probably fine

It’s also probably fine for about 95% of companies, and that figure is rising.