Hacker News

by vrtx0 3552 days ago

I partially agree here, but want to throw in the configuration aspect. For any database to be optimized for all workloads, it either has to adapt with minimal impact (something I haven't seen successfully implemented yet, though it may only be a matter of time), or it has to be told how to behave (in other words, it has to be configured properly).

One of the things I like about the original MongoDB approach is that very little configuration was required to tune the database; instead, you just configure the OS so that the FS cache and scheduling best match your workload. At first glance this may seem lazy, but kernel engineers have been solving this problem pretty well for years. Also, misconfiguring one or both systems leads to a lot of confusion for the end-user: they must understand the internals of both the OS and the DB, and make sure the two configurations don't conflict.
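On Linux, "configuring the OS" for a database usually means looking at page-cache, writeback and I/O-scheduler knobs. The read-only Python sketch below just reports a few of them from procfs/sysfs. The particular knobs chosen (`vm.swappiness`, `vm.dirty_ratio`, `vm.dirty_background_ratio` and each block device's active scheduler) are my own illustrative assumption, not a recommended tuning set; actual values should come from your workload and your DB vendor's guidance.

```python
from pathlib import Path
from typing import Optional

# Illustrative selection of Linux VM knobs that affect page-cache and writeback
# behaviour. Files that don't exist on this machine are reported as None.
VM_KNOBS = ("swappiness", "dirty_ratio", "dirty_background_ratio")


def read_knob(path: Path) -> Optional[str]:
    """Return the stripped contents of a procfs/sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def active_scheduler(raw: str) -> str:
    """Pick the active I/O scheduler out of a line like 'none [mq-deadline] kyber'.

    The kernel marks the scheduler currently in use with square brackets.
    """
    for token in raw.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return raw.strip()


def report() -> dict:
    """Collect current VM knobs and per-device schedulers (read-only)."""
    settings = {f"vm.{k}": read_knob(Path("/proc/sys/vm") / k) for k in VM_KNOBS}
    for sched in sorted(Path("/sys/block").glob("*/queue/scheduler")):
        raw = read_knob(sched)
        if raw is not None:
            # /sys/block/<device>/queue/scheduler -> key "<device>.scheduler"
            settings[f"{sched.parent.parent.name}.scheduler"] = active_scheduler(raw)
    return settings


if __name__ == "__main__":
    for name, value in report().items():
        print(f"{name} = {value}")
```

Keeping it read-only is deliberate: the point is to see what the kernel is already doing before layering DB-side settings on top of it.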

Implementing proper scheduling (not just IO, but CPU) is less difficult than making sure your end-users have the right tools to understand how an application-level setting affects performance compared with a kernel setting. For example, conflicting trade-offs between the OS FS cache size and the DB's buffer cache size (with the same pages ending up cached in both) could easily leave a system using less than half of the available RAM for useful purposes.
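To make the "less than half" point concrete, here is a back-of-the-envelope Python model. It assumes (a simplification, not a measurement) that the DB reads through the OS page cache and that both caches end up holding the same hot pages, so the smaller cache is effectively a subset of the larger one. The 64 GiB machine and 4 GiB of "other" memory are hypothetical numbers.

```python
def effective_cache_gib(total_ram_gib: float, db_buffer_gib: float, other_gib: float) -> float:
    """Distinct data cached in RAM under a simplified double-caching model.

    Hypothetical model: whatever RAM is left after the DB buffer pool and other
    processes becomes OS page cache, and both caches hold the same hottest pages,
    so only the larger of the two contributes distinct cached data.
    """
    os_cache_gib = max(total_ram_gib - db_buffer_gib - other_gib, 0.0)
    return max(db_buffer_gib, os_cache_gib)


if __name__ == "__main__":
    total = 64.0  # hypothetical machine size in GiB
    for db_buffer in (4.0, 30.0, 56.0):
        useful = effective_cache_gib(total, db_buffer, other_gib=4.0)
        print(f"buffer={db_buffer} GiB -> {useful} GiB distinct cached "
              f"({useful / total:.0%} of RAM)")
```

Under these assumptions, splitting memory evenly (30 GiB buffer pool, 30 GiB page cache) yields only 30 GiB of distinct cached data, under half of the 64 GiB installed, while skewing memory toward either cache recovers most of it.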

Anyway, I think your assessment is mostly spot on, except for the part about closed-source engines. I'm also not sure I totally agree that Postgres is much better than other database systems by default, but I do think it's a solid piece of software that does a good job with its configuration values.

I've worked on 3 database systems in my career. One was closed source, and it was my least favorite. But that had more to do with the design decision to run in the JVM, which adds a third level of configuration complexity...

Anyway, just my $0.02.