Hacker News
by shepardrtc 1831 days ago
Really great article! I have a question: in it you say to keep an eye on RollbackSegmentHistoryListLength, and I want to do that, but I don't know at what number it becomes something to worry about. There doesn't seem to be any guidance on AWS's site. I'm seeing values in the range of 1,000 to 5,000, and sometimes 100,000.

Great question, although I'm not sure there's a concrete answer to it other than "it depends". You can think of that metric as the number of undo log records that haven't been garbage collected (purged) yet, so as it goes up, performance tends to get worse.

If you're seeing spikes in RollbackSegmentHistoryListLength that coincide with dips in DB performance, you've probably identified the culprit. In the scenario described in our post, that metric would have grown monotonically for the duration of the long-lived ETL query – probably a more overt problem than what you're describing with short spikes to 100,000.
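
To tell those two patterns apart, here is a minimal Python sketch. It assumes you have already exported the metric as a list of numeric samples taken at a fixed interval (for example from CloudWatch). The function names and the `min_run` threshold are made up for illustration and are not part of any AWS or MySQL API. Real data is noisy, so a strict "every sample is higher than the last" check like this one may need smoothing first.

```python
def longest_growth_run(samples):
    """Return the length of the longest run of strictly increasing samples."""
    if not samples:
        return 0
    best = run = 1
    for prev, cur in zip(samples, samples[1:]):
        # Extend the current run on growth, otherwise start a new run.
        run = run + 1 if cur > prev else 1
        best = max(best, run)
    return best


def looks_like_long_transaction(samples, min_run=30):
    """Heuristic: growth sustained over at least `min_run` consecutive samples
    suggests purge is being held back (e.g. by a long-lived query) rather
    than a short spike. `min_run` is an arbitrary illustrative value; tune
    it to your sampling interval."""
    return longest_growth_run(samples) >= min_run
```

With one-minute samples, `min_run=30` would flag roughly half an hour of uninterrupted growth, while a brief spike that falls back within a few samples would not trigger it.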

A number of our 100k spikes spanned about a day, and a cluster of them seems to coincide with serious performance issues we have encountered. We "solved" the problem by increasing the instance size, but I'm starting to see spikes that get larger and larger, so I suspect we will run into this issue again. But now I have something to report on and watch out for. Thank you!
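
For the "spikes that get larger and larger" part, one rough way to report on it is to pull out the peak of each spike and check whether the peaks keep climbing. This is only a sketch: it assumes the metric is available as a plain list of samples, and both function names and the 50,000 threshold in the usage note are hypothetical choices, not values recommended by AWS.

```python
def spike_peaks(samples, threshold):
    """Return the maximum value of each contiguous stretch above `threshold`."""
    peaks = []
    current = None  # peak of the spike we are currently inside, if any
    for value in samples:
        if value > threshold:
            current = value if current is None else max(current, value)
        elif current is not None:
            # The spike has ended; record its peak.
            peaks.append(current)
            current = None
    if current is not None:
        # The series ended while still inside a spike.
        peaks.append(current)
    return peaks


def peaks_are_growing(peaks):
    """True if there are at least two peaks and each exceeds the previous one."""
    return len(peaks) >= 2 and all(b > a for a, b in zip(peaks, peaks[1:]))
```

With a baseline of 1,000 to 5,000, calling `spike_peaks(samples, 50000)` would ignore normal fluctuation and keep only the large excursions; `peaks_are_growing` on the result is then a simple signal worth alerting on.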