by jbs40 2958 days ago
The architectural pendulum is starting to swing away from co-location of storage and compute (the trend of the last 10+ years) to decoupling of storage and processing to avoid exactly these issues, but legacy architectures hang on for a while.

In the streaming and messaging space, Apache Pulsar (pulsar.apache.org) is a more recent system whose architecture decouples processing and storage. That gives you nice properties like independent scaling of storage and processing, infinite data retention, dynamic resizing, and others.

What pendulum do you see? From here, architectural patterns are clearly converging on a "distributed mainframe" model between containerization and lambda/kappa architectures...

I think Joyent's Manta was ahead of its time in colocating compute and storage and I suspect we'll see more along this vein with the recent open sourcing of FoundationDB.

I was thinking more specifically of the internal architectures of data processing platforms, especially the categorizations that emerged from the MPP database world. The "shared nothing" architecture has been dominant in databases (and is also the core architecture of Hadoop), designed around "co-locating data and compute". Kafka largely follows that architecture as well, using local disk on the compute nodes as its persistent storage layer.

A lot of new data processing platforms, from Snowflake in the data warehouse world to AWS Athena to Apache Pulsar in the broader data processing world, have moved to decoupled architectures.
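To make the distinction concrete, here is a rough toy model of the two layouts. It is only a sketch: every class and method name is made up for illustration and does not correspond to the API of Kafka, Pulsar, Hadoop, or any other real system. It shows what happens when a node goes away: in the shared-nothing layout the records have to be copied to another node, while in the decoupled layout only the ownership metadata changes.

```python
# Toy model only: all names here are hypothetical, not any real system's API.
from dataclasses import dataclass, field


@dataclass
class SharedNothingCluster:
    """Each node owns the partitions stored on its own local disk."""
    nodes: dict = field(default_factory=dict)  # node name -> {partition: [records]}

    def add_node(self, name):
        self.nodes[name] = {}

    def write(self, node, partition, record):
        self.nodes[node].setdefault(partition, []).append(record)

    def remove_node(self, name, target):
        """Removing a node means copying its data to another node."""
        moved = 0
        for partition, records in self.nodes.pop(name).items():
            self.nodes[target].setdefault(partition, []).extend(records)
            moved += len(records)
        return moved  # number of records that had to be copied


@dataclass
class DecoupledCluster:
    """Stateless compute nodes in front of a separate storage layer."""
    storage: dict = field(default_factory=dict)  # partition -> [records]
    compute: dict = field(default_factory=dict)  # compute node -> partitions it serves

    def add_compute(self, name):
        self.compute[name] = set()

    def write(self, node, partition, record):
        self.compute[node].add(partition)
        self.storage.setdefault(partition, []).append(record)

    def remove_compute(self, name, target):
        """Only ownership moves; the stored records stay where they are."""
        self.compute[target] |= self.compute.pop(name)
        return 0  # no records copied


if __name__ == "__main__":
    sn = SharedNothingCluster()
    sn.add_node("a")
    sn.add_node("b")
    dc = DecoupledCluster()
    dc.add_compute("a")
    dc.add_compute("b")
    for i in range(3):
        sn.write("a", "p0", i)
        dc.write("a", "p0", i)
    print("shared-nothing records copied:", sn.remove_node("a", "b"))
    print("decoupled records copied:", dc.remove_compute("a", "b"))
```

The model ignores replication, failure handling, and the network hop to the storage layer, so treat it as an illustration of the rebalancing cost only, not as a performance argument.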

Containerization and container management frameworks (e.g. Kubernetes) certainly do change the meaning of "local" storage; it will be interesting to see how that plays out.