
by thyrsus 1804 days ago

Are there recommendations for learning about Linux kernel memory management? Two anecdata:

* I had some compute servers that were up for 200 days. The customers noticed that they were half as fast as identical hardware just booted. Dropping the file system cache ("echo 3 | sudo dd of=/proc/sys/vm/drop_caches") brought the speed back up to that of the newly deployed servers. WTF? File system caches are supposed to be zero-cost discards as soon as processes ask for RAM, but something else is going on. I suspect the kernel is behaving badly with overpopulated RAM management data (TLB entries?), but I don't know how to measure that.

* If that is actually the problem, then a solution might be to shrink that management data by configuring a non-zero number of hugepages (the current value is shown by "cat /proc/sys/vm/nr_hugepages"). I'd love to see recommendations on when to use that.
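One rough way to start measuring this is to snapshot /proc/meminfo on a freshly booted node and on a long-running one, then compare the cache, slab, and page-table lines. The sketch below is only that, a sketch: the choice of fields in `FIELDS` and the helper names `parse_meminfo` and `summarize` are assumptions for illustration, not something the post specifies.

```python
from pathlib import Path

# Standard /proc/meminfo keys; which ones matter here is an assumption.
FIELDS = ("MemTotal", "MemFree", "Cached", "Slab", "SReclaimable",
          "SUnreclaim", "PageTables", "AnonHugePages", "HugePages_Total")


def parse_meminfo(text):
    """Parse /proc/meminfo text into {key: int}.

    Most values are in kB; HugePages_* entries are page counts.
    Lines without a 'key: number' shape are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def summarize(info):
    """Keep only the fields of interest that are present."""
    return {key: info[key] for key in FIELDS if key in info}


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    if path.exists():
        for key, value in summarize(parse_meminfo(path.read_text())).items():
            print(f"{key:16} {value}")
```

Running it before and after dropping the caches on a slow node would show which of these counters actually moves.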

5 comments

citrin_ru 1804 days ago

I don’t remember the details now, but I’ve seen a situation where a Java app ran slower on a box with more RAM (and probably a bigger heap size) than on a box with the same CPU but half the RAM. I suspected that the TLB was the reason, but didn’t have time to test this.


Tostino 1804 days ago

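Could also have been compressed OOPs (the JVM turns them off once the heap exceeds roughly 32 GB, so every object reference doubles in size)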


jeffbee 1804 days ago

Explicit hugepages on x86 are difficult to manage. Most people using off-the-shelf software can only take advantage of them by configuring, for example, InnoDB buffer pools to use them. However, if your compute server really is a database, then you'll find the performance benefit is well worth the configuration.

For other processes you'll need a hugepage-aware allocator such as tcmalloc (the new one, not the old one) and transparent hugepages enabled. Again, the benefits of this may be enormous, if page table management is expensive on your services.

You will find a great many blogs on the web recommending disabling transparent hugepages. These people are all misled. Hugepages are a major benefit.
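Before changing anything, it helps to know which transparent hugepage mode a box is actually running. On Linux the sysfs file /sys/kernel/mm/transparent_hugepage/enabled lists the choices with the active one in brackets, for example "always [madvise] never". A minimal sketch of reading it follows; the helper name `active_thp_mode` is made up for this example.

```python
import re
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) choice, e.g. 'madvise', or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("THP sysfs file not found (not Linux, or THP not built in)")
```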


jashmatthews 1804 days ago

THP is a net loss for many workloads, including PG https://www.percona.com/blog/2019/03/06/settling-the-myth-of...

For workloads that fork and rely on CoW sharing, like Redis or CRuby, it negates the entire benefit of CoW, since flipping a single bit copies the entire huge page.
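To put a number on that, here is a small back-of-the-envelope model of the copy cost under the behaviour described in this comment (one whole page copied per first write). The 4 KiB and 2 MiB page sizes are the usual x86-64 values and are an assumption here, as is the hypothetical write pattern.

```python
BASE_PAGE = 4 * 1024          # 4 KiB base page (x86-64 assumption)
HUGE_PAGE = 2 * 1024 * 1024   # 2 MiB huge page (x86-64 assumption)


def cow_bytes_copied(dirty_offsets, page_size):
    """Bytes copied if every distinct page containing a write is duplicated once."""
    touched_pages = {offset // page_size for offset in dirty_offsets}
    return len(touched_pages) * page_size


# Hypothetical pattern: 100 single-byte writes, each in a different 2 MiB region.
writes = [i * HUGE_PAGE for i in range(100)]
small = cow_bytes_copied(writes, BASE_PAGE)
huge = cow_bytes_copied(writes, HUGE_PAGE)
print(small, huge, huge // small)
```

With scattered writes like these, the model copies 512 times more memory with huge pages than with base pages, which is simply the ratio of the two page sizes.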


jeffbee 1804 days ago

That's what used to happen, but since kernel 5.8, anonymous shared pages that are dirtied by child processes are instead divided into normal pages, in the same way they would be if they were named (file-backed) mappings.


thyrsus 1804 days ago

3rd party closed source software; I think it's using the C library malloc - which uses sbrk for small things, but uses mmap for >= 128k. Fun historical fact: the Red Hat/CentOS 5 kernel ulimit didn't limit mmap allocations :-/


mixmastamyk 1804 days ago

Memory fragmentation? Dropping the cache and restarting high-memory services at the same time might clear things up.


DoomHotel 1804 days ago

The kernel uses the sysctl vm.vfs_cache_pressure to decide how aggressively to reclaim dentry and inode caches relative to the page cache and swap.
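For anyone who wants to check the current value without the sysctl tool: sysctl names map onto files under /proc/sys, with dots becoming path separators. A minimal sketch, where `sysctl_path` and `read_sysctl` are hypothetical helper names:

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as 'vm.vfs_cache_pressure' to its file."""
    return Path(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the value as a stripped string, or None if the file is absent."""
    path = sysctl_path(name, root)
    return path.read_text().strip() if path.exists() else None


if __name__ == "__main__":
    for key in ("vm.vfs_cache_pressure", "vm.swappiness"):
        print(key, "=", read_sysctl(key))
```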


SteveNuts 1804 days ago

Are you using any swap? If so, check the swappiness setting


thyrsus 1804 days ago

No swap. These are large RAM (400G to 1000G) Kubernetes nodes.


sargun 1804 days ago

This is likely due to a kernel bug that was caused by the way cgroup slab management is handled. Upgrade to 5.10 or later, and it should be fixed. I’d be interested to see if the problem continues.
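A quick way to see which side of that version line a node is on is to compare the running kernel release against 5.10. This is a minimal sketch; `kernel_at_least` is a hypothetical helper, and it only looks at the leading major.minor of the release string, so distribution kernels with backported fixes will not be detected.

```python
import platform
import re


def kernel_at_least(release, major, minor):
    """True if a release string like '5.10.0-8-amd64' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if __name__ == "__main__":
    if platform.system() == "Linux":
        release = platform.release()
        print(release, "is 5.10 or later:", kernel_at_least(release, 5, 10))
```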
