by ksec 819 days ago
>As DuckDB’s manifesto “Big Data is Dead” suggests, the era of big data is over.

I have been stating this since at least 2020 if not earlier.

We are expecting the DDR6 and PCI-E 7.0 specs to be finalised by 2025. You could expect them to be on the market no later than 2027. Although I believe we have reached the SSD IOPS limit, short of some special SSD with Z-NAND. I assume (I could be wrong) this makes SSD bandwidth on servers less important. On the TSMC roadmap that is about 1.4nm, or 14A, although the server sector will likely be on 2nm. Hopefully we will have 800Gbps Ethernet by then with ConnectX card support. (I want to see the Netflix FreeBSD serving 1.6Tbps update.)

We then have software and DBs that are faster and simpler to scale. What used to be a huge cluster of computers that was mentally hard to comprehend is now just a single computer or a few larger servers doing the job.

There is also 802.3dj 1.6Tbps Ethernet, looking at completion in 2026, although products tend to take much longer to reach the market compared to memory and PCI-Express.

AMD Zen6C in ~2025 / 2026 with 256 cores per socket; on a dual-socket system that is 512 cores, or 1024 vCPUs / threads.

The future is exciting.

2 comments

> As DuckDB’s manifesto “Big Data is Dead” suggests, the era of big data is over.

Yet their DB can't handle many cases where data doesn't fit into memory, and PgSQL always does large writes in a single thread.

I'm not sure what your point is here; it seems like you are just listing off announced hardware.
That the threshold for Cassandra / Dynamo scaling is increasing is probably the only point. "Big data is dead" is pretty stupid to say, typical clickbait marketing by a database that will probably be chucked away for something else trendy in another year.

But at a certain point, a 10,000 core 5 petabyte single megamachine starts to practically encounter CAP from the internal scale alone. It already ... kind of ... does.

And no matter how big your node scales, if you need to globally replicate data ... you have to globally replicate it over a network, and you need Cassandra (DynamoDB global replication is shady last I looked at it; I have no idea how row-level timestamps can merge-resolve conflicting rows updated in separate global regions).
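
To make the row-level timestamp worry concrete, here is a toy sketch of last-write-wins merging at two granularities. It is only an illustration under my own assumptions: the `Row` layout, the function names, and the integer timestamps are all hypothetical, and this is not a claim about how DynamoDB or Cassandra implement replication internally.

```python
from typing import Dict, Tuple

# Hypothetical model: a cell is (value, write_timestamp),
# and a row maps column name -> cell.
Cell = Tuple[str, int]
Row = Dict[str, Cell]


def merge_row_level(a: Row, b: Row) -> Row:
    """Row-level last-write-wins: keep whichever whole row has the newest write."""
    newest_a = max(ts for _, ts in a.values())
    newest_b = max(ts for _, ts in b.values())
    return dict(a) if newest_a >= newest_b else dict(b)


def merge_cell_level(a: Row, b: Row) -> Row:
    """Cell-level last-write-wins: pick the newest write for each column."""
    merged: Row = dict(a)
    for col, cell in b.items():
        if col not in merged or cell[1] > merged[col][1]:
            merged[col] = cell
    return merged


# Two regions start from the same row and update different columns.
us = {"email": ("new@example.com", 105), "city": ("Austin", 100)}
eu = {"email": ("old@example.com", 100), "city": ("Berlin", 110)}

row_result = merge_row_level(us, eu)    # the whole EU row wins
cell_result = merge_cell_level(us, eu)  # each column keeps its newest write
```

In this toy case the row-level merge silently drops the US email update because the EU row carries the later timestamp, while the cell-level merge keeps both regions' changes. Neither variant helps when two regions write the same column concurrently; one of those writes is still lost.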

The point is that Big Data or Hard to Scale aren't as much of a thing as hardware technology moves forward.