Well yes, I think RAM has been the new disk for a while now, and not because (anecdote about database disk structures) or (any recent change to cost of RAM).
If you use Linux, the fastest way to test how much faster your application is off disk is to simply make a filesystem in RAM, and run the whole thing from there. Because library-chasing to build a chroot is a hassle, I would recommend simply putting a container on a RAM-backed block device, then installing your application on the container.
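A rough way to run that comparison, as a sketch only: the helper below reads every file under a directory and reports the bytes read and the elapsed time, so it can be pointed at the on-disk copy and at a copy on a RAM-backed mount. The name `time_tree_read` is made up for illustration, and it only measures sequential reads, not your application's real access pattern. A second run over the disk copy will mostly measure the page cache rather than the disk.

```python
import os
import time


def time_tree_read(root, chunk_size=1 << 20):
    """Read every regular file under root; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks, etc.
            with open(path, "rb") as fh:
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    import sys

    # Pass one or more directories, e.g. the disk copy and the RAM copy.
    for root in sys.argv[1:]:
        nbytes, secs = time_tree_read(root)
        print(f"{root}: {nbytes} bytes in {secs:.3f}s")
```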
I have personally designed, built and managed large clusters of diskless machines and find that the mix of RAM-only and PXE[1] boot is an excellent one for maintaining state (and security) across well managed infrastructure. Disks be damned. For permanent storage, consider sharing a DRBD[2] cluster from dedicated nodes.
I don't recommend this anymore. With a typical developer machine containing 16GB of RAM, and especially on Linux, you will find that all of your daily-touched files are in the FS cache after a few minutes of work. Even with default kernel settings, Linux is pretty good at using all of your unused RAM to speed up disk access.
Here's my anecdote, based on a 16GB workstation with an NVMe SSD (Samsung 960 Pro):
Watching my project compile I occasionally open iotop in another terminal and don't see anything above occasional write flushes. To confirm, I did create a tmpfs volume and did not observe any improvement. `free` reported my buffers to be at ~4.7GB, which is basically all of my /bin, /usr and all of Golang sources+libs.
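For anyone who wants to check the same thing on their own machine, here is a minimal sketch that sums the `Buffers` and `Cached` fields of `/proc/meminfo`. The function names are made up for illustration, and the result is only approximately what `free` reports as buff/cache, since `free` may also count reclaimable slab.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> number."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            # Most fields are in kB; a few (e.g. HugePages_Total) are counts.
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Return (cache_kb, fraction_of_total) for page cache plus buffers."""
    cache_kb = fields.get("Cached", 0) + fields.get("Buffers", 0)
    total_kb = fields.get("MemTotal", 0)
    fraction = cache_kb / total_kb if total_kb else 0.0
    return cache_kb, fraction


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:  # Linux only
        cache_kb, fraction = cache_summary(parse_meminfo(fh.read()))
    print(f"cache+buffers: {cache_kb / 1024 / 1024:.1f} GiB ({fraction:.0%} of RAM)")
```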
Ramdisks will go to swap. A memory leak will force the entire ramdisk into swap, and reading it back into memory afterward is 10 to 100 times slower than reading normal files off of a disk.
If you have a memory leak, and a ramdisk, and no swap then the OOM killer will trigger.
Hopefully it will target the program with the memory leak, but this is not guaranteed.
Swap is useful because you can shift unused memory onto disk. There are many programs that allocate (and write) a lot of memory that they never afterward use.
By having swap you make more room for cache in memory.
SSD doesn't matter here - this is not swap thrashing, but rather occasional writes.
Every network access to disk-based storage that can't be cached|copied to your RAM filesystem for the lifetime of your container is a performance hit, and you can't scale with RAM-backed storage without more physical memory. As a matter of fact, when you run out of RAM in your (maybe incomplete) scenario you no longer have a working node. Zero sum.
DRBD is fine (used it for years) but it's not something that is one size fits all.
Rare for most services. Stuff like logs can be shuffled off elsewhere for a write, requiring no commit validation. Only DB/fileservers really require permanent storage with commit validation, writes are typically rare, and 100Gbps+ LAN on a PXE-based diskless cluster is not going to be introducing massive latency, especially if you prioritize the VLAN or link multiple ports. Reads are typically cheap and cacheable.
> that can't be cached|copied to your RAM filesystem for the lifetime of your container
IMHO most services and their dependencies will come in well under 512MB, so that's a non-issue.
> you can't scale with RAM-backed storage without more physical memory
By definition, one could say the same about anything... although to be fair you could still scale via compression, sharding, or another established strategy.
> when you run out of RAM...
In a managed scenario a service container or VM would terminate or a significant degradation in response time would be detected, it would be taken out of the service pool and stop having traffic routed to it, be restarted, then be re-introduced to the pool. Ditto extra CPU load, broken network policies, anomalous block IO, etc. Leaving modern service-level architecture aside, basic heartbeat-style IP monitoring with reliable node-level failover has existed in open source since the 90s. There's really no excuse to wing this stuff on production systems today.
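As a toy illustration of that take-out-and-reintroduce loop (not any particular load balancer's behaviour), here is a sketch of a health-checked pool. The class name, thresholds, and the convention that `latency_ms=None` means "no reply" are all assumptions made up for the example; restarting the node is assumed to happen elsewhere.

```python
class ServicePool:
    """Toy health-checked pool: nodes failing checks stop receiving traffic."""

    def __init__(self, nodes, max_latency_ms=500.0, max_failures=3):
        self.max_latency_ms = max_latency_ms
        self.max_failures = max_failures
        self.failures = {node: 0 for node in nodes}
        self.in_service = set(nodes)

    def record_check(self, node, latency_ms):
        """Record one probe; latency_ms=None means the probe got no reply.

        Returns True if the node is still (or again) in the service pool.
        """
        healthy = latency_ms is not None and latency_ms <= self.max_latency_ms
        if healthy:
            self.failures[node] = 0
            self.in_service.add(node)  # re-introduce after recovery
        else:
            self.failures[node] += 1
            if self.failures[node] >= self.max_failures:
                # Stop routing traffic; the restart itself happens elsewhere.
                self.in_service.discard(node)
        return node in self.in_service

    def routable(self):
        """Nodes that should currently receive traffic."""
        return sorted(self.in_service)
```

A node is only dropped after `max_failures` consecutive bad probes, so a single slow response does not flap it out of the pool.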
I was the third engineer at VoltDB and spent six years making that bet. It's not a good bet.
Maybe there are other factors, but if VoltDB could page out cold data to disk I think it would be at least 2x if not more successful. No one agreed with me so it never happened.
I saw so many use cases go out the door because hey you know what? RAM is expensive and it's cheaper to page out cold data. The scale where that cost starts to matter is not that big.
Really a sort of odd choice though. I'd have gone the other way around with column store in memory, and row store on disk.
Not that I think that's ideal either, though. Having both in memory for hot data, and the rest on disk, is ideal, with an extremely easy-to-use setup that makes it essentially automatic, but with rules engines for finer-grained tuning.
MemSQL actually adds an in-memory rowstore to each columnstore for rapid ingest of new rows until they get compacted into a new segment. Columnstore data is pretty fast so it works well off disk compared to row stores which aren't as efficient.
SQL Server similarly has the Hekaton in-memory tables + columnstore indexes, and the latest version allows combining both for in-memory columnstores.
Do I really know? I'm not sure. I have opinions, but for the most part I was kept in the dark. There was a roadmap handed to me and what was on it was already decided.
We did work on equally important things also, but we also split focus with IMO unimportant things.
A combination of me not having a seat at the table (literally was told this after a year or so) and IMO non-technical leadership driving focus by chasing what they thought were the important factors.
Given a time-series history of DRAM/SSD prices, at what price would you need to be able to buy 1TB of RAM (or anything approaching RAM speeds) for VoltDB and other in-memory databases to be competitive?
So, given your insider knowledge, can we predict a date by which DRAM and NVMe SSD prices might make in-memory database startups lucrative again?
--
Offtopic:
I feel for you. Being overruled by decision makers, as an engineer in a field where you are the expert on both the market and the technology, can be heartbreaking.
So what happened that isn't shown in most analysis of RAM costs is that RAM didn't go down in cost that much for many people. For instance RAM in the cloud is still very expensive.
What is also not shown is that cold data is everywhere. You need to have it, but paying to put it in RAM generates zero value for a profit seeking business. So if you don't page out cold data you effectively throw yourself out of the running for a huge swath of use cases.
For a small deployment sure it's dwarfed by engineering costs. But infrastructure per engineering head count is trending towards more infrastructure per head and infrastructure cost matters to more businesses.
The other thing is that data volumes are also increasing at a rate competitive with the rate at which RAM is decreasing in price. This is because there are new opportunities to make money using more data, and this is a trend you can't really beat. The more data you can have, the more use cases and lines of business get invented.
This is not a scientific analysis; it's just conjecture based on anecdata from my time in the industry.
The price of disk has dropped at nearly the same pace as ram. As has the cost of compute. At the same time data growth has increased faster than either has dropped... so I'm not really sure the price argument holds water. If I can buy ram at 1/100th the cost but I need to store 500x more data... that isn't a net win on cost.
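The arithmetic behind that last sentence, as a tiny sketch (the function name is made up, and the 100x and 500x figures are just the hypothetical numbers from the comment above, not measurements):

```python
def relative_spend(price_drop_factor, data_growth_factor):
    """Spend relative to the baseline when the unit price falls by
    price_drop_factor and the data volume grows by data_growth_factor."""
    return data_growth_factor / price_drop_factor


# RAM at 1/100th the cost, but 500x more data to store:
print(relative_spend(100, 500))  # 5.0, i.e. five times the original bill
```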
It would be interesting to see that chart updated to 2017 data. It appears the downward slope becomes significantly less steep around 2009 (looks like the price dropped as much from 2006-2008 as it did in the five years 2009-2014), and I’d be interested in seeing how recent SSD prices affect this. As far as I can see, rotational HDD technology is at the end of its S-curve, whereas SSD technology is still relatively new.
Data growth is faster than RAM price decrease, and at the moment it is actually increasing.
While I don't believe in infinite growth of data, I still think a RAM-only DB isn't as good if we have SSDs that are ridiculously fast. My thinking is that RAM / SSD should always be 1:5 or 1:10.
The problem with asking a programmer to keep track of the locality of their data is that most modern programming languages, with the exception of C and C++, make reasoning about locality hard to do. Even for those languages, unless all relevant data is in simple arrays, making assertions about locality is hard.
For interpreted languages like Python or Javascript, figuring out RAM storage and access patterns of data is very hard. So we probably need programming language mechanisms to help with understanding the locality patterns of our programs and probably tooling to help change it.
As you can see, in practice a good linked list is the same speed as or slightly faster than std::vector, and it’s consistently 2-3 times faster than the equally linked std::list.
That’s not just synthetic tests. Recently, I got a 2.5x performance improvement in my app just by switching from std::unordered_map to CAtlMap with the same keys/values.
Theoretically, C++11 fixes that with stateful allocators. Practically, I’ve not seen good open source ones with performance comparable to the CAtlPlex that powers these ATL node-based collections. I’m not even sure it’s possible. STL is too standardized and too old. It might be that there’s no room in its allocator API for a sufficient level of integration between a collection and its backing stateful allocator.
Lots of developers are blind to issues of local reasoning. It becomes a self fulfilling prophecy, because the code they write is often inscrutable. The people who generally can follow things like this can't anymore.
And 99% of developers spend their entire careers without giving as much as a passing thought to cache locality. Quick, how long, in cycles, does it take to retrieve data from RAM? About 200 cycles. 200 cycles is a very long time if you miss cache often. Scattered RAM reads can be _slower_ than sustained linear disk reads (that is, once the disk actually gets around to reading, which takes a while).
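To put a number on "a very long time", here is a back-of-the-envelope sketch of average access cost using the usual hit-time-plus-miss-rate-times-penalty estimate. The 200-cycle miss penalty is the figure from the comment above; the 4-cycle hit cost and the miss rates are assumptions picked purely for illustration.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles=200):
    """Expected cycles per access: every access pays the hit cost, and a
    fraction miss_rate of accesses additionally pays the trip to RAM."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Assumed 4-cycle cache hit; compare rare, occasional, and frequent misses.
for miss_rate in (0.01, 0.10, 0.50):
    print(miss_rate, average_access_cycles(4, miss_rate))
```

Even a 10% miss rate makes the average access several times slower than a pure cache hit under these assumptions.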
90% of developers are working in languages where you can't really do much about cache misses, or doing so will at least involve some very non-idiomatic code. If you can't do much about the problem it's not really helpful to be thinking about it much.
I don't disagree. And for 90% of them worrying about cache locality or branch mispredictions on a daily basis would be a waste of time. It's fine to deliberately ignore such concerns. It's somewhat less fine to know absolutely nothing about how programs are actually executed, and what makes them go fast.
It's even worse when you get into interpreted languages like Python and Ruby. Bad efficiency at that level translates to dreadful, if not totally broken, efficiency at the cache level.
If a terabyte is a lot of data to you (and it is for many, many things), then this post is right; you should buy as much RAM as you have data, and access it accordingly.
The commenters who are saying disk has a different price/performance trade-off that is still valuable are also right, but that applies to large data sets.
I worked on a petabyte in-memory HANA cluster. It all depends on what you're doing, and how important it is to you.
I don't even know what a large data set is anymore. I think my general definition is one you won't put into memory, whatever your threshold is for that.
Maybe with very high-end servers it is, but generally it's not. I can buy a 4TB HDD for $200. I think I'd have to add 2-3 zeros for a machine with 4TB of RAM, and it's not only the RAM: I'd need a server motherboard and a server processor, while I can use a 4TB HDD with pretty much any computer. SSDs aren't going to match HDDs on $/byte in the near future either, so optimizing software for HDDs isn't going away. But, of course, it's awesome to have alternatives if you have the money and need more performance.
many NVMe drives on the market are useless jokes. try some from the now biggest semiconductor company, test their fsync() performance and don't get a heart attack for seeing those ugly numbers. ;)
Hi olavgg, I was searching for a low-cost NVMe SSD with fast fsync a few months ago, so I looked into consumer NVMe SSDs. The Samsung 960 Pro was the first I tested, and the results were just shockingly bad, to the extent that I started to question whether my kernel/installation was causing the slowness. I searched online and found a few of your posts describing the exact same problem. That saved me quite a bit of time. :)
Yes, I totally agree with the conclusion in your link above: a consumer SSD (NVMe or not, high end or cheap) isn't worth a dime. Cheers!
That's very interesting. The estimate of 100ns came from here: https://people.eecs.berkeley.edu/~rcs/research/interactive_l.... and is probably not very precise (maybe because it is only capturing rough order of magnitude). I have now updated the post. Thanks for the feedback! Specific constant aside, the point about latency not improving much still holds.
There are these things called the speed of light and virtual memory. The latency of the physical RAM is completely irrelevant unless it's embedded directly on the CPU.
I've wondered how much of "my" data is stored in various types of memory. CPU cache if I am actively browsing a website; those last couple of IMs stored in the RAM of some server? Any content I have ever uploaded to the internet is probably on a hard disk, ready to be brought to cache at a moment's notice. Then there are all the backups on tape.
And old is new again :). The ratio of RAM to disk cost has historically varied wildly. The pendulum will come around again in a few years when huge SSDs are cheap.
I have a feeling that with multi-terabyte SSDs at cheaper prices we'll be shuffling all our data back to "disk" again :).
This might be the single most important article I have read in the past week, because Adrian Colyer is on vacation! May I add that even SAP HANA is designed for in-memory computing? As far as disks are concerned, NVM should soon replace them.
That's not how it works; programs are kept "warm" for some time after each request, or indefinitely (e.g. in App Engine you can choose dynamic or resident instances).
Not sure I agree. On a server spec, RAM costs about 10% more than the CPU: if the CPU costs 2000, the RAM would cost around 2200. It also doesn't scale with the amount of data. I'm not sure I agree for laptops either: 8GB of DDR3 is about $80, while I can get a 128GB SSD or a 1TB magnetic disk, so you really can't use memory instead of disk, except in a few cases.
> If anything, I would suspect that the developers have become costlier over time, at least in the last 10 years or so.
Really? Have developer costs actually increased in real terms in the last 10 years? Have your developer costs (if you're outside the VC/SV bubble) increased in real terms? And how much?
The real point is that developer time has not kept up with the rate of RAM price decrease, and unless you plan on seriously defending the claim that developers only cost 1/6000th of what they used to twenty years ago, the points in the blog post stand.
[1] https://en.wikipedia.org/wiki/Preboot_Execution_Environment
[2] https://en.wikipedia.org/wiki/DRBD