by vrtx0 3553 days ago
Not quite; the author states the MongoS nodes were SIGTERM'd. This is unrelated to WiredTiger (which runs in MongoD), and is more likely caused by the Linux OOM killer or by another resource limit being exceeded (like the number of threads or sockets MongoS tries to open). This can happen if the application is configured with larger connection pools than the limits imposed on MongoS (e.g. by calling ulimit).
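To make the pool-vs-limit point concrete, here is a rough Python sketch of the arithmetic. It assumes each pooled client connection costs MongoS one file descriptor. The names `pools_fit`, `app_instances`, `pool_size` and `reserve` are made up for illustration, and the default reserve of 100 descriptors is a guess, not a MongoDB figure. Note that `soft_nofile_limit()` reports the limit of whatever process runs it, so it only says something about MongoS if run under the same user and limits as MongoS.

```python
import resource


def soft_nofile_limit() -> int:
    """Soft limit on open file descriptors for the *current* process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def pools_fit(app_instances: int, pool_size: int, fd_limit: int,
              reserve: int = 100) -> bool:
    """Return True if the worst-case client connection count fits under fd_limit.

    `reserve` is a guessed allowance for descriptors the router needs for
    its own use (logs, connections to data nodes); tune it for your setup.
    """
    if fd_limit == resource.RLIM_INFINITY:
        return True  # no descriptor limit configured
    worst_case = app_instances * pool_size + reserve
    return worst_case <= fd_limit


if __name__ == "__main__":
    limit = soft_nofile_limit()
    print("soft nofile limit:", limit)
    print("10 app servers x 100 connections fit:", pools_fit(10, 100, limit))
```

With the common default soft limit of 1024, ten application servers with pools of 100 would not fit under these assumptions.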

The above scenario can easily happen if an end user's application doesn't take all resources into account (e.g. a web application that accepts as many requests as possible, as fast as possible, and opens as many database connections to MongoDB as possible). In that situation, if MongoDB can't keep up, the application logic may keep accepting HTTP requests and generate even more DB requests. If this was indeed the case, MongoS would have been the bottleneck, hence the SIGTERMs.

The crux of the actual problem they're having with WiredTiger may have been a misunderstanding of performance expectations or configuration.

EDIT: it appears to have been a bug in this case; not the expectations or configuration I initially implied. My point about application design still stands; an HTTP 500 message is much better than submitting more requests to an already overloaded DB.
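A minimal sketch of what I mean by failing fast, in Python. It caps the number of in-flight DB calls with a semaphore and returns a 500 right away when the cap is reached, instead of queueing more work on the database. `DbGate`, `max_in_flight` and the `query` callable are hypothetical names for illustration, not part of any MongoDB driver.

```python
import threading


class DbGate:
    """Caps concurrent DB calls; sheds load once the cap is reached."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, query):
        """Run `query` if a slot is free, else fail fast with a 500."""
        # Non-blocking acquire: do not wait behind an overloaded database.
        if not self._slots.acquire(blocking=False):
            return 500, "database busy"
        try:
            return 200, query()
        finally:
            # Always free the slot, even if the query raised.
            self._slots.release()


if __name__ == "__main__":
    gate = DbGate(max_in_flight=1)
    # The outer call holds the only slot, so the nested call is rejected.
    print(gate.handle(lambda: gate.handle(lambda: "inner")))
    print(gate.handle(lambda: "ok"))
```

The cap would normally be set at or below the connection pool size, so the application never asks for more connections than it is allowed to open.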


I don't know... look at the bug report and the fix [1]. This does look like a bug in WiredTiger. It preallocates all of its cache, so it doesn't look like raw capacity problems. It looks like the internal MongoD issues pushed the problems downstream to the individual MongoS instances.

Comment from the author:

> We looked for an OOM in our logs but couldn't find it. Also, the log lines are from mongos nodes. Data nodes continue to run with severely reduced throughput.

[1] https://jira.mongodb.org/browse/WT-2924

We're in agreement. I'm just offering another way to prevent the SIGTERMs, regardless of the cause of the slowdown.