by contingencies 3249 days ago
Well yes, I think RAM has been the new disk for a while now, and not because (anecdote about database disk structures) or (any recent change to cost of RAM).

If you use Linux, the fastest way to test how much faster your application runs off disk is to make a filesystem in RAM and run the whole thing from there. Because library-chasing to build a chroot is a hassle, I would recommend putting a container on a RAM-backed block device, then installing your application in the container.
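
A rough way to check this before building anything is to time the same read workload against a disk-backed directory and a RAM-backed one. The sketch below assumes Python 3; `time_reads` is a name made up for illustration, and the idea that `/dev/shm` is a tmpfs mount is an assumption (common on Linux, not guaranteed). A freshly written file on disk will usually be served from page cache, so this measures the warm case unless you drop caches first.

```python
import os
import tempfile
import time


def time_reads(directory, size_bytes=8 * 1024 * 1024, repeats=5):
    """Write a scratch file in `directory`, then return mean seconds per full read."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size_bytes))
            f.flush()
            os.fsync(f.fileno())  # push the data out before timing reads
        start = time.perf_counter()
        for _ in range(repeats):
            with open(path, "rb") as f:
                while f.read(1024 * 1024):  # read in 1 MiB chunks until EOF
                    pass
        return (time.perf_counter() - start) / repeats
    finally:
        os.remove(path)  # always clean up the scratch file


# Example (both paths are assumptions, adjust to your own mounts):
#   print(time_reads(os.path.expanduser("~")), time_reads("/dev/shm"))
```

If the two numbers come out about the same, the page cache is probably already doing the job for you.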

I have personally designed, built and managed large clusters of diskless machines and find that the mix of RAM-only and PXE[1] boot is an excellent one for maintaining state (and security) across well managed infrastructure. Disks be damned. For permanent storage, consider sharing a DRBD[2] cluster from dedicated nodes.

[1] https://en.wikipedia.org/wiki/Preboot_Execution_Environment

[2] https://en.wikipedia.org/wiki/DRBD

2 comments

I don't recommend this anymore. With a typical developer machine containing 16GB of RAM, and especially on Linux, you will find that all of your daily-touched files are in FS cache after a few minutes of work. Even with default kernel settings Linux is pretty good at eating up all of your unused RAM for speeding up disk access.

Here's my anecdote based on a 16GB workstation with an NVMe SSD (Samsung 960 Pro):

Watching my project compile, I occasionally open iotop in another terminal and don't see anything beyond occasional write flushes. To confirm, I did create a tmpfs volume and did not observe any improvement. `free` reported my buffers to be at ~4.7GB, which is basically all of my /bin, /usr and all of Golang sources+libs.
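
If you want numbers rather than eyeballing `free`, the same figures can be read from `/proc/meminfo`. A minimal sketch, assuming a Linux `/proc`; `parse_meminfo` and `cache_summary` are hypothetical helper names, and treating Buffers + Cached as "the cache" is only a rough approximation of what `free` shows.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value (kB for most fields)}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def cache_summary(info):
    """Return (cache_gib, total_gib) using Buffers + Cached as a rough cache figure."""
    cache_kib = info.get("Buffers", 0) + info.get("Cached", 0)
    return cache_kib / 2**20, info.get("MemTotal", 0) / 2**20


# Example on a live Linux system:
#   with open("/proc/meminfo") as f:
#       cache_gib, total_gib = cache_summary(parse_meminfo(f.read()))
```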

One memory leak and your cache is gone.

[edit] Not sure if ramdisks are pinned though.

> Not sure if ramdisks are pinned though.

Ramdisks will go to swap. A memory leak will force the entire ramdisk into swap, and reading it back into memory afterward is 10 to 100 times slower than reading normal files off of a disk.

> Ramdisks will go to swap.

Assuming that you have swap. I don't; I want my SSD to stay alive.

If you have a memory leak, and a ramdisk, and no swap then the OOM killer will trigger.

Hopefully it will target the program with the memory leak, but this is not guaranteed.
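
You can at least see which processes the kernel currently considers the best candidates. A sketch, assuming Linux exposes `/proc/<pid>/oom_score` (higher means more likely to be picked); `oom_candidates` is a made-up helper, and `proc_root` is a parameter only so it can be pointed at a test directory.

```python
import os


def oom_candidates(proc_root="/proc", top=5):
    """Return up to `top` (oom_score, pid, name) tuples, highest score first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited, or its files are not readable
        rows.append((score, int(entry), name))
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows[:top]
```

To bias the choice, a process's `/proc/<pid>/oom_score_adj` can be set between -1000 and 1000; whether that is enough to protect the right things on your box is something to test rather than assume.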

Swap is useful because you can shift unused memory onto disk. There are many programs that allocate (and write) a lot of memory that they never use afterward.

By having swap you make more room for cache in memory.

SSD doesn't matter here - this is not swap thrashing, but rather occasional writes.

Does anyone have actual data on swap on SSDs in 2017?

Every network access to disk based storage that can't be cached|copied to your ram filesystem for the lifetime of your container is a performance hit and you can't scale with RAM backed storage without more phys memory. As a matter of fact, when you run out of RAM in your (maybe incomplete) scenario you no longer have a working node. Zero sum.

DRBD is fine (used it for years) but it's not something that is one size fits all.

> Every network access to disk based storage...

Rare for most services. Stuff like logs can be shuffled off elsewhere for a write, requiring no commit validation. Only DB/fileservers really require permanent storage with commit validation, writes are typically rare, and 100Gbps+ LAN on a PXE-based diskless cluster is not going to be introducing massive latency, especially if you prioritize the VLAN or link multiple ports. Reads are typically cheap and cacheable.

> that can't be cached|copied to your ram filesystem for the lifetime of your container

IMHO most services and their dependencies will come in well under 512MB, so that's a non-issue.

> you can't scale with RAM backed storage without more phys memory

By definition, one could say the same about anything... although to be fair you could still scale via compression, sharding, or another established strategy.

> when you run out of RAM...

In a managed scenario, a service container or VM would terminate, or a significant degradation in response time would be detected; it would then be taken out of the service pool, stop having traffic routed to it, be restarted, and be re-introduced to the pool. Ditto extra CPU load, broken network policies, anomalous block IO, etc. Leaving modern service-level architecture aside, basic heartbeat-style IP monitoring with reliable node-level failover has existed in open source since the 90s. There's really no excuse to wing this stuff on production systems today.
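
The pool logic described above is small enough to sketch. This is a toy model, not any particular load balancer's API: `ServicePool`, `report`, and the failure threshold are all invented for illustration, and the actual restart would be done by whatever supervisor you run.

```python
class ServicePool:
    """Toy pool: drop a node after N consecutive failed checks, re-add on recovery."""

    def __init__(self, nodes, max_failures=3):
        self.max_failures = max_failures
        self.failures = {node: 0 for node in nodes}
        self.active = set(nodes)

    def report(self, node, healthy):
        """Record one health-check result; return True if the node is in rotation."""
        if healthy:
            self.failures[node] = 0
            self.active.add(node)  # e.g. a restarted node passing its check
        else:
            self.failures[node] += 1
            if self.failures[node] >= self.max_failures:
                self.active.discard(node)  # stop routing traffic to it
        return node in self.active

    def routable(self):
        """Nodes that should currently receive traffic, in a stable order."""
        return sorted(self.active)
```

For example, with `max_failures=2`, two failed checks in a row take a node out of `routable()`, and the next healthy check puts it back.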

> it's not something that is one size fits all

Nothing fits all!

<sarcasm> I find your production model very attractive and your assumptions about other usage(s) persuasive. You must be an expert! </sarcasm>