by yid 3553 days ago
So the root cause is that WiredTiger locks up and SIGTERMs when it fills the cache? If this is indeed the cause, I must say this does shake my faith in WiredTiger. That's a pretty basic scenario that a company like 10gen should be testing for regularly, certainly before releases.

And before the Mongo haters come out, remember that WiredTiger was written by about as stellar a database team as you can have.


This definitely should be caught by any reasonable testing regimen for a database, but the underlying issue appears (upon casual inspection of the architecture) to be something different.

Most people design storage engines without a true I/O scheduler (WiredTiger appears to be such a case), mostly because building one requires a huge amount of expert engineering work that doesn't pay off for narrow use cases. The catch is that it is difficult to impossible to design a storage engine that both has very high performance and is well behaved under diverse workloads without a proper I/O scheduler.
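
To make "a true I/O scheduler" a bit more concrete, here is a minimal, hypothetical sketch of one small piece of the idea: pending requests are ordered by class, so latency-sensitive foreground reads go to the device before background work such as eviction or checkpoint writes, and an aging rule lets waiting background work gradually move up so it is never starved. The class names, priority values and `AGING_STEP` constant are illustrative assumptions, not WiredTiger's (or any other engine's) actual design.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass

# Hypothetical I/O classes; a lower value means higher priority.
FOREGROUND_READ = 0
FOREGROUND_WRITE = 1
BACKGROUND_EVICTION = 2
BACKGROUND_CHECKPOINT = 3

# Assumption: a waiting request gains one priority level per 4 dispatches,
# so background work cannot be starved forever by a stream of reads.
AGING_STEP = 4


@dataclass
class _Pending:
    io_class: int
    seq: int          # submission order, used as a FIFO tie-breaker
    enqueued_at: int  # scheduler tick at submission time
    request: str


class IOScheduler:
    """Toy class-based I/O scheduler with aging (illustrative only)."""

    def __init__(self) -> None:
        self._pending: list[_Pending] = []
        self._seq = itertools.count()
        self._tick = 0  # number of requests dispatched so far

    def submit(self, request: str, io_class: int) -> None:
        self._pending.append(_Pending(io_class, next(self._seq), self._tick, request))

    def _rank(self, p: _Pending) -> tuple[int, int]:
        # Effective priority improves the longer the request has waited.
        waited = self._tick - p.enqueued_at
        return (p.io_class - waited // AGING_STEP, p.seq)

    def dispatch(self) -> str | None:
        """Return the next request to issue to the device, or None if idle."""
        if not self._pending:
            return None
        best = min(self._pending, key=self._rank)  # O(n) scan; fine for a sketch
        self._pending.remove(best)
        self._tick += 1
        return best.request


if __name__ == "__main__":
    s = IOScheduler()
    s.submit("evict page 17", BACKGROUND_EVICTION)
    s.submit("read page 3", FOREGROUND_READ)
    s.submit("read page 9", FOREGROUND_READ)
    print([s.dispatch() for _ in range(3)])
    # ['read page 3', 'read page 9', 'evict page 17']
```

A production scheduler would likely also have to bound how many requests are outstanding at the device and throttle background work by bandwidth; the sketch only shows the prioritization and anti-starvation part.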

Which side of the "generalized good behavior" versus "very high performance" tradeoff a storage engine falls on depends on the original goals of its developers, and many storage engines explicitly favor good behavior under diverse loads over raw performance (PostgreSQL is one example). WiredTiger was marketed as "very high performance" but is now being used under increasingly diverse workloads that expose this tradeoff. Without changes that sacrifice performance, behavioral edge cases are largely unavoidable; you can move them around but not eliminate them completely.

This remains, in my opinion, the most important difference between open and closed source storage engines in practice. In closed source, most high-performance storage engines implement true I/O schedulers, usually written by the same few specialists who float around between companies.

I partially agree here, but want to throw in the configuration aspect. For any database to be optimized for all workloads, it either has to adapt with minimal impact (something I haven't seen successfully implemented yet, though that may only be a matter of time), or it has to be told how to behave (in other words, it has to be configured properly).

One of the things I like about the original MongoDB approach is that very little configuration was required to tune the database; instead you just configure the OS so that the FS cache and scheduling best match your workload. At first glance this may seem lazy, but kernel engineers have been solving this problem pretty well for years. Also, misconfiguring one or both systems causes a lot of confusion for the end user: they must understand the internals of both the OS and the DB, and make sure the two configurations don't conflict.

The hard part of implementing proper scheduling (not just I/O, but CPU) isn't so much the scheduling itself as ensuring your end users have the right tools to understand how configuring the application affects performance compared with configuring the kernel. For example, conflicting trade-offs between the OS FS cache size and the DB's buffer cache size could easily leave a system using less than half of the available RAM for useful purposes.
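
A back-of-the-envelope sketch of how that can happen, under loudly stated assumptions: memory not claimed by the DB's buffer cache or by overhead is used as OS page cache, and some fraction `overlap` of the pages held in the DB cache is also still resident in the page cache (double caching). The function name, the overlap model and all the numbers are hypothetical, and the model ignores details such as compression or different page formats in the two caches.

```python
def useful_cache_gb(ram_gb: float, db_cache_gb: float,
                    os_overhead_gb: float, overlap: float) -> float:
    """Distinct data held in RAM when the DB cache and OS page cache overlap.

    ram_gb         -- total physical memory
    db_cache_gb    -- memory given to the database's own buffer cache
    os_overhead_gb -- memory used by the kernel, DB process heap, connections, ...
    overlap        -- fraction (0..1) of DB-cached pages also held in the page cache
                      (hypothetical; depends on the I/O path and the workload)
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    # Crude assumption: whatever is left over is used by the OS page cache.
    fs_cache_gb = max(ram_gb - os_overhead_gb - db_cache_gb, 0.0)
    # The duplicated amount cannot exceed the smaller of the two caches.
    duplicated_gb = min(overlap * db_cache_gb, fs_cache_gb)
    return db_cache_gb + fs_cache_gb - duplicated_gb


# Hypothetical 64 GB box, 4 GB overhead, 30 GB DB buffer cache:
print(useful_cache_gb(64, 30, 4, overlap=1.0))  # 30.0 -> under half of RAM
print(useful_cache_gb(64, 30, 4, overlap=0.0))  # 60.0
```

With full overlap the two caches hold the same 30 GB, so less than half of the 64 GB does useful caching; shrinking either cache or avoiding the duplication changes the picture, which is exactly the kind of cross-layer tuning decision end users need tooling to understand.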

Anyway, I think your assessment is mostly spot on, except the part about closed source engines. I'm also not sure I totally agree with Postgres being much better than other database systems by default, but I do think it's a solid piece of software that does a good job with configuration values.

I've worked on 3 database systems in my career. One was closed source, and it was my least favorite. But that had more to do with the design decision to run in the JVM, which adds a 3rd level of configuration complexity...

Anyway, just my $0.02.

Not quite; the author states the MongoS nodes were SIGTERM'd. This is unrelated to WiredTiger (which runs in MongoD) and was more likely caused by the Linux OOM killer or by another resource limit being exceeded (like the number of threads or sockets MongoS tries to open). This can happen if the application is configured with larger connection pools than the limits imposed on MongoS (e.g. by calling ulimit) allow.
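
The sizing rule behind this is simple arithmetic: if several application processes share one MongoS, their combined pool sizes have to fit inside that MongoS's limits. A minimal sketch under stated assumptions: the limit is treated as a single number (for example the open-file limit set with `ulimit -n`, since each socket uses a file descriptor; thread limits would be budgeted the same way), and a hypothetical `reserved` share is held back for the router's own outbound connections, log files and so on. The function name and numbers are made up for illustration.

```python
def max_pool_size(router_conn_limit: int, reserved: int, app_instances: int) -> int:
    """Largest per-process pool that keeps one router under its limit.

    router_conn_limit -- e.g. the router's open-file limit (`ulimit -n`)
    reserved          -- descriptors held back for the router's own needs
                         (hypothetical; measure on your own deployment)
    app_instances     -- application processes connecting to this router
    """
    budget = router_conn_limit - reserved
    if budget <= 0 or app_instances <= 0:
        return 0
    # Every instance may fill its pool at the same time, so divide evenly.
    return budget // app_instances


# Hypothetical: 4096 open files, 1096 reserved, 30 app processes.
print(max_pool_size(4096, 1096, 30))  # 100
```

Sizing for the worst case (every pool full at once) is the conservative choice; anything larger means a traffic spike can push the router past its limit.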

The above scenario can easily happen if an end user's application doesn't take all resources into account (e.g. a web application that accepts requests as fast as it can and opens as many database connections to MongoDB as possible). In that situation, if MongoDB can't keep up, the application logic may keep accepting HTTP requests and generate even more DB requests. If this was indeed the case, MongoS would have been the bottleneck, hence the SIGTERMs.

The actual problem they're having with WiredTiger may have come down to mistaken performance expectations or misconfiguration.

EDIT: in this case it appears to have been a bug, not the expectations or configuration problem I initially implied. My point about application design still stands: an HTTP 500 response is much better than submitting more requests to an already overloaded DB.

I don't know... look at the bug report and the fix [1]. This does look like a bug in WiredTiger. It preallocates all of its cache, so it doesn't look like a raw capacity problem. It looks like the internal MongoD issues pushed the problem downstream to the individual MongoS instances.

Comment from the author:

> We looked for an OOM in our logs but couldn't find it. Also, the log lines are from mongos nodes. Data nodes continued to run with severely reduced throughput.

[1] https://jira.mongodb.org/browse/WT-2924

We're in agreement. I'm just offering another way to prevent the SIGTERMs, regardless of the cause of the slowdown.