Hacker News
by wueiued 3668 days ago
I think graph operations are not a fair comparison. They are notoriously difficult to scale.

On the other hand, AWS now offers a machine with 2TB of RAM, and a single huge machine has a lower per-GB cost than several smaller machines. I think clustered computing as we know it will soon be gone. The only reason for multiple machines will be availability.

1 comments

Do you think our datasets will stop growing? It seems to me that data is growing faster than RAM, and has been for years. How do we find the upper limit of data? The human genome is finite; it will only get so big. But what you did on Facebook? That seems nearly infinite...
I'm sure the upper end of our datasets won't stop growing for the foreseeable future. But a huge proportion of problems have growth rates well below the growth rate of RAM.

And for that matter, even when we can't stuff it in RAM, the boundaries of what we can do on a single server are also constantly being pushed back thanks to SSDs. It's only a few years since I was unable to get read speeds of more than 1GB/sec out of a RAM disk. Today I have servers that easily do 2GB/sec out of NVMe SSDs.

It's not that we never need to go beyond a single server. But people often really have no concept of when they'll need to.

For businesses? Sure there's gonna be a lot of limits. Only 7 billion humans so that sorta limits any user tables. Only so many things those people can buy each day, so that limits your orders table.

VISA does what, 150M transactions per day? I was doing more volume than that for telephone calls a couple years ago with a $5K server and a default install of MSSQL. (Full ACID, updating balances -- yes I know a CC tx is probably heavier than a VoIP call but still.) At 4KB per tx, VISA could use VoltDB and store all tx's in RAM for a week for like a million or two.
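As a rough sanity check on those numbers, here is a back-of-envelope sketch in Python. The 150M/day and 4KB figures come from the comment above; the decimal units (1 KB = 1000 bytes), the 2TB-per-machine figure borrowed from the parent comment, and the function names are my own assumptions. It ignores indexes, replication, and any per-row overhead a real store would add.

```python
import math

# Figures quoted in the comment above.
TX_PER_DAY = 150_000_000
BYTES_PER_TX = 4_000          # "4KB per tx", assuming decimal kilobytes
RETENTION_DAYS = 7

# Assumption: the 2TB RAM machine mentioned in the parent comment.
RAM_PER_MACHINE_BYTES = 2_000_000_000_000


def bytes_needed(tx_per_day: int, bytes_per_tx: int, days: int) -> int:
    """Raw payload size for `days` worth of transactions, no overhead."""
    return tx_per_day * bytes_per_tx * days


def machines_needed(total_bytes: int, ram_per_machine: int) -> int:
    """Minimum machine count if the payload were spread evenly across RAM."""
    return math.ceil(total_bytes / ram_per_machine)


def average_tps(tx_per_day: int) -> float:
    """Average transactions per second over a 24-hour day (not peak)."""
    return tx_per_day / 86_400


if __name__ == "__main__":
    per_day = bytes_needed(TX_PER_DAY, BYTES_PER_TX, 1)
    per_week = bytes_needed(TX_PER_DAY, BYTES_PER_TX, RETENTION_DAYS)
    print(f"per day:  {per_day / 1e12:.1f} TB")
    print(f"per week: {per_week / 1e12:.1f} TB")
    print(f"machines: {machines_needed(per_week, RAM_PER_MACHINE_BYTES)}")
    print(f"avg tps:  {average_tps(TX_PER_DAY):.0f}")
```

Under those assumptions that is about 0.6 TB per day and 4.2 TB for the week, so three 2TB machines before any overhead, at an average of roughly 1,700 transactions per second. Peak load would be higher than the average, and I am not putting a price on it here since the comment's cost estimate does not say what hardware it assumes.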

And 150M a day today is not that much more than it was 15 years ago, it seems. (50-100% more?)

For many, many businesses, data size just isn't an issue any more cost-wise, and soon won't be a technical challenge at all either. Yet 10 or 20 years ago, we were talking 8-digit+ implementations.

Sure, there are some things that grow faster, like all this increased tracking. But in general?

I can't remember where I saw it recently, but it was pointed out that there are diminishing returns on using more data. Your accuracy with something like Google's PageRank is good with 1 billion data points as input, but doesn't improve much with more data.