by singron
2323 days ago
Keeping GC off for a long-running service might become problematic. Also, the steady state might have few allocations, but startup may produce a lot of garbage that you'd want collected. I've never done this, but you can also turn GC off at runtime with debug.SetGCPercent(-1). I think with that, you could turn off GC after startup, then turn it back on at desired intervals (e.g. once an hour or after X cache misses). It's definitely risky though. E.g. if there is a hiccup with the database backend, the client library might suddenly produce more garbage than normal, and all instances might OOM at nearly the same time. When they all restart with cold caches, they might hammer the database again and cause the issue to repeat.
CloudFront, for this reason, allocates heterogeneous fleets in its PoPs, with different RAM sizes and CPUs [0], and even different software versions [1].
> When they all restart with cold caches, they might hammer the database again and cause the issue to repeat.
Reminds me of the DynamoDB outage of 2015 that essentially took out us-east-1 [2]. Also, ELB had a similar outage due to an unending backlog of work [3].
Someone must write a book on design patterns for distributed system outages or something?
[0] https://youtube.com/watch?v=pq6_Bd24Jsw&t=50m40s
[1] https://youtube.com/watch?v=n8qQGLJeUYA&t=39m0s
[2] https://aws.amazon.com/message/5467D2/
[3] https://aws.amazon.com/message/67457/