Hacker News
by arielweisberg 3249 days ago
...

I was the third engineer at VoltDB and spent six years making that bet. It's not a good bet.

Maybe there are other factors, but if VoltDB could page out cold data to disk I think it would be at least 2x if not more successful. No one agreed with me so it never happened.

I saw so many use cases go out the door because hey you know what? RAM is expensive and it's cheaper to page out cold data. The scale where that cost starts to matter is not that big.


I have spent, I think, six years working on SAP HANA. The one feature I've always asked for is seamless paging of even warmish data to disk.

In-memory is fast and awesome, but it doesn't have to be as mind-bogglingly expensive as it is. Why are we all making the same mistakes?

I would really like to hear something about your experience with SAP HANA. Do you have a blog or anything you could share?

Yes, it's why we at MemSQL added a column store on disk in 2014. Memory-only is too limiting, and the approach has evolved into a notion of "memory-first."

Really a sort of odd choice, though. I'd have gone the other way around, with the column store in memory and the row store on disk.

Not that I think that's ideal either, though. Having both in memory for hot data, with the rest on disk, is ideal, with an extremely easy-to-use setup that makes it essentially automatic but offers rules engines for finer-grained tuning.

MemSQL actually adds an in-memory rowstore to each columnstore for rapid ingest of new rows until they get compacted into a new segment. Columnstore data is pretty fast, so it works well off disk compared to rowstores, which aren't as efficient.

SQL Server similarly has the Hekaton in-memory tables plus columnstore indexes, and the latest version allows combining both for in-memory columnstores.
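To make the "in-memory rowstore in front of a columnstore" idea above concrete, here is a minimal toy sketch. It is not MemSQL's or SQL Server's implementation; the class name `ToyColumnstore` and the `flush_threshold` knob are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumnstore:
    """Toy table: new rows land in an in-memory row buffer, then get
    compacted into immutable column-oriented segments."""
    columns: tuple
    flush_threshold: int = 3                        # hypothetical tuning knob
    row_buffer: list = field(default_factory=list)  # recent rows, row-oriented
    segments: list = field(default_factory=list)    # compacted, column-oriented

    def insert(self, row: dict) -> None:
        # Ingest is a cheap append to the row buffer.
        self.row_buffer.append(row)
        if len(self.row_buffer) >= self.flush_threshold:
            self.compact()

    def compact(self) -> None:
        # Pivot buffered rows into one list per column and seal the segment.
        if not self.row_buffer:
            return
        segment = {c: [r[c] for r in self.row_buffer] for c in self.columns}
        self.segments.append(segment)
        self.row_buffer = []

    def scan(self, column: str) -> list:
        # A column scan reads sealed segments first, then unflushed rows.
        values = []
        for segment in self.segments:
            values.extend(segment[column])
        values.extend(r[column] for r in self.row_buffer)
        return values


table = ToyColumnstore(columns=("id", "price"))
for i in range(4):
    table.insert({"id": i, "price": i * 10})
print(len(table.segments), len(table.row_buffer), table.scan("price"))
```

After four inserts with a threshold of three, one segment has been sealed and one row is still sitting in the buffer, yet the scan sees all four values. A real system would also compress segments and persist them to disk, which this sketch leaves out.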

I've used MemSQL, and its rapid ingest by default isn't ACID compliant, so it sort of depends on how you compare it.

The results from the columnstore data were pretty fast, and it's even faster in memory. It depends on what you're doing and what the requirements are.

I was really impressed by MemSQL, and loved the wire compatibility with MySQL, so don't take this as just a knock on MemSQL in any way.

> by default isn't ACID compliant

What do you mean?

I'm surprised you wouldn't understand this, as it's an absolute requirement given you are a user of MemSQL.

But then I went to their docs to link you to the details, and it feels like they intentionally avoid stating the problem clearly.

Essentially, they allow committed transactions to hit memory and not disk. They allow you to configure it so that's not the case, but it isn't the default, and looking over the current documentation they certainly aren't clear about it like an open source project would be.

transaction-buffer needs to be set to 0 for durability, but the docs explain it in a way that tries to pass off not being durable as a different kind of durability.

I'm not interested in getting into a long discussion about this, but it's difficult to explain the literal issue when they do such a marketing job of hiding the specifics.

Now I'm far less surprised a user wouldn't know this. Apologies for my forward initial statement.
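The durability point above (a commit acknowledged once it hits memory versus once it hits disk) can be sketched roughly as follows. This is a generic illustration, not MemSQL's code; `ToyLog` and its `buffer_bytes` parameter are hypothetical stand-ins for a buffered transaction log, where a buffer size of 0 forces an fsync before every acknowledgement.

```python
import os
import tempfile


class ToyLog:
    """Append-only commit log with a configurable in-memory buffer."""

    def __init__(self, path: str, buffer_bytes: int):
        # buffer_bytes == 0 means every commit is written and fsynced
        # before it is acknowledged (hypothetical knob for this sketch).
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.buffer_bytes = buffer_bytes
        self.pending = bytearray()

    def commit(self, record: bytes) -> None:
        self.pending += record + b"\n"
        if len(self.pending) > self.buffer_bytes:
            self.flush()
        # Returning from here is the "commit acknowledged" point.

    def flush(self) -> None:
        if self.pending:
            os.write(self.fd, bytes(self.pending))
            os.fsync(self.fd)  # ask the OS to push the data to stable storage
            self.pending.clear()

    def crash(self) -> None:
        # Simulate a process crash: anything still buffered is lost.
        self.pending.clear()
        os.close(self.fd)


def surviving_records(buffer_bytes: int) -> int:
    """Commit three records, crash, and count what is left on disk."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "commit.log")
        log = ToyLog(path, buffer_bytes)
        for i in range(3):
            log.commit(f"txn-{i}".encode())
        log.crash()
        with open(path, "rb") as f:
            return len(f.read().splitlines())


print(surviving_records(buffer_bytes=0))     # every commit fsynced first
print(surviving_records(buffer_bytes=1024))  # commits acknowledged from memory
```

With a zero-sized buffer all three acknowledged transactions survive the simulated crash; with a 1024-byte buffer none of them do, even though each commit call returned successfully. That gap is the "D" in ACID being traded for ingest speed.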

Curious - why did people not agree with you? Was it an ideological belief or were they betting on some assumptions that turned out to be incorrect?

Do I really know? I'm not sure. I have opinions, but for the most part I was kept in the dark. There was a roadmap handed to me and what was on it was already decided.

We did work on equally important things, but we also split focus with what were, IMO, unimportant things.

It was a combination of me not having a seat at the table (I was literally told this after a year or so) and, IMO, non-technical leadership driving focus by chasing what they thought were the important factors.

The company survives and does OK though.

Given a time-series history of DRAM/SSD prices, at what price would you need to be able to buy 1TB of RAM (or anything approaching RAM speeds) in order to make VoltDB and in-memory databases competitively advantageous?

So, given this insider knowledge of yours, can we predict by what date DRAM and NVMe SSD prices might make in-memory database startups lucrative again?

--

Offtopic:

I feel for you. Being overruled by decision makers as an engineer, in a field where you're the expert in both market and technology, can be heartbreaking.

EDIT:

removed irrelevant personal experience

What most analyses of RAM costs don't show is that RAM didn't go down in cost that much for many people. For instance, RAM in the cloud is still very expensive.

What is also not shown is that cold data is everywhere. You need to have it, but paying to put it in RAM generates zero value for a profit-seeking business. So if you don't page out cold data, you effectively throw yourself out of the running for a huge swath of use cases.

For a small deployment, sure, it's dwarfed by engineering costs. But the trend is towards more infrastructure per engineering head, so infrastructure cost matters to more businesses.

The other thing is that data volumes are also increasing at a rate competitive with the rate at which RAM is decreasing in price. This is because there are new opportunities to make money using more data, and this is a trend you can't really beat. The more data you can have, the more use cases and lines of business get invented.

This is not a scientific analysis; it's just conjecture based on anecdata from my time in the industry.
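As a back-of-the-envelope illustration of the cold-data argument above, the arithmetic looks something like this. The function name `monthly_cost` is made up for the sketch, and the dataset size, hot fraction, and per-GB prices are placeholders chosen for easy math, not real quotes from any vendor or cloud.

```python
def monthly_cost(total_gb, hot_fraction, ram_per_gb, disk_per_gb):
    """Return (all_in_ram, tiered) monthly cost for a dataset.

    all_in_ram: every byte is kept in RAM.
    tiered:     only the hot fraction stays in RAM, the rest is paged to disk.
    """
    all_in_ram = total_gb * ram_per_gb
    hot_gb = total_gb * hot_fraction
    cold_gb = total_gb - hot_gb
    tiered = hot_gb * ram_per_gb + cold_gb * disk_per_gb
    return all_in_ram, tiered


# Placeholder inputs (assumptions, not measured prices):
# 1000 GB of data, 25% of it hot, RAM at 4.00 and disk at 0.25 per GB-month.
all_in_ram, tiered = monthly_cost(
    total_gb=1000, hot_fraction=0.25, ram_per_gb=4.0, disk_per_gb=0.25
)
print(f"all in RAM: {all_in_ram:.2f}")
print(f"tiered:     {tiered:.2f}")
print(f"ratio:      {all_in_ram / tiered:.2f}x")
```

The ratio grows as the hot fraction shrinks or as the RAM/disk price gap widens, which is the whole argument: the colder the dataset, the worse a memory-only design looks on cost.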

+1, this ended up being a major con when we did our comparison, considering how much data we needed available and the cost overhead.

This is a feature that MySQL Cluster had to add over time, and then it had limitations that were slowly lifted.