


	by djtango 502 days ago
	This is really exciting. Is anyone familiar with this space able to point to prior art? Have people built similar frameworks in other languages? I know various people have worked on dataflow: I remember thinking Materialize was very cool, and I've used Kafka Streams at work. I also remember thinking that a framework for stitching this all together would probably make sense.

3 comments

benrutter 502 days ago

At first glance it looks conceptually pretty similar to some work in the data-science space; I'm thinking of Spark (which they mention in their docs) and Dask.

My knee-jerk excitement is that this has the potential to be pretty powerful specifically because it's based on Rust, so it can play really nicely with other languages. Spark runs on the JVM, which is a good choice for portability but still introduces a bunch of complexities, and Dask runs in Python, which is a fairly hefty dependency you'd almost never bring in unless you're already on Python.

In terms of distributed Rust, I've had a look at Lunatic too before which seems good but probably a bit more low-level than what Hydro is going for (although I haven't really done anything other than basic noodling around with it).

tomnicholas1 501 days ago

I was also going to say this looks similar to one layer of Dask: Dask takes arbitrary Python code and uses cloudpickle to serialise it in order to propagate dependencies to workers. This seems to be an equivalent layer for Rust.
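
To make the "serialise the code and send it to workers" idea concrete, here is a rough sketch using only the Python standard library. It is not how Dask or cloudpickle are implemented: it swaps in `marshal` to ship a function's bytecode by value, and the helper names `ship` and `receive` are made up for illustration. It also ignores closures, default arguments, and global references, all of which cloudpickle has to handle.

```python
import builtins
import marshal
import types


def ship(fn):
    """'Client' side: turn a plain function into bytes (hypothetical helper)."""
    # Only the bytecode and the name travel; closures and defaults are ignored.
    return marshal.dumps(fn.__code__), fn.__name__


def receive(payload, name):
    """'Worker' side: rebuild a callable from the shipped bytecode."""
    code = marshal.loads(payload)
    # Give the rebuilt function a fresh, nearly empty global namespace.
    return types.FunctionType(code, {"__builtins__": builtins}, name)


def square_plus_one(x):
    return x * x + 1


payload, name = ship(square_plus_one)
remote_fn = receive(payload, name)
result = [remote_fn(v) for v in range(4)]
print(result)
```

The rebuilt `remote_fn` is a distinct function object that behaves like the original, which is the property a scheduler relies on when it pushes user code to workers. Note that `marshal` output is tied to the Python version, so this only works when both sides run the same interpreter.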

FridgeSeal 501 days ago

This looks to be a degree more sophisticated than that.

Authors in the comments here mention that the flo compiler (?) will accept-and-rewrite Rust code to make it more amenable to distribution. It also appears to be building and optimising the data-flow rather than just distributing the work. There are also comparisons to timely, which I believe does some kind of incremental compute.

conor-23 501 days ago

One of the creators of Hydro here. Yeah, one way to think about Hydro is bringing the dataflow/query optimization/distributed execution ideas from databases and data science to programming distributed systems. We are focused on executing latency-critical, long-running services in this way, though, rather than individual queries. The kinds of things we have implemented in Hydro include a key-value store and the Paxos protocol, but these compile down to dataflow just like a Spark or SQL query does!
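
As a loose illustration of what "a key-value store expressed as dataflow" can mean, here is a minimal sketch in plain Python generators. This is not Hydro's API or its compiled output; the operator names (`parse`, `kv_state`, `format_replies`) and the `PUT`/`GET` request format are assumptions made up for the example. The point is only that a long-running service can be written as a chain of stream operators, with the store itself being a stateful fold over the request stream.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (kind, key, value) where value is None for reads. Hypothetical format.
Op = Tuple[str, str, Optional[str]]


def parse(lines: Iterable[str]) -> Iterator[Op]:
    """Map operator: raw request text -> structured operations."""
    for line in lines:
        parts = line.split()
        if parts[0] == "PUT":
            yield ("put", parts[1], parts[2])
        elif parts[0] == "GET":
            yield ("get", parts[1], None)


def kv_state(ops: Iterable[Op]) -> Iterator[Tuple[str, Optional[str]]]:
    """Stateful operator: folds writes into a dict and answers reads from it."""
    store = {}
    for kind, key, value in ops:
        if kind == "put":
            store[key] = value
            yield (key, "ok")
        else:
            yield (key, store.get(key))


def format_replies(replies: Iterable[Tuple[str, Optional[str]]]) -> Iterator[str]:
    """Map operator: structured replies -> response text."""
    for key, value in replies:
        yield f"{key}={value}"


requests = ["PUT a 1", "GET a", "GET b", "PUT a 2", "GET a"]
replies = list(format_replies(kv_state(parse(requests))))
print(replies)
```

Because each stage only consumes and produces a stream, the stages are natural places to cut the program into separate processes or machines, which is the kind of decision a dataflow compiler could make on the programmer's behalf.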

Paradigma11 501 days ago

It looks like a mixture between Akka (https://getakka.net/ less enterprisey than the Java version), which is based on the actor model and has a focus on distributed systems, and reactive libraries like Rx (https://reactivex.io/). So maybe https://doc.akka.io/libraries/akka-core/current/stream/index... is the best fit.

Cyph0n 501 days ago

Worth mentioning Pekko, the Akka fork.

https://pekko.apache.org/

haolez 501 days ago

Is this an active fork?

necubi 501 days ago

In 2022 Lightbend relicensed Akka from Apache 2.0 to BSL, which was a huge problem for all of the other open-source projects (like Flink) that used it as part of their coordination layer. At this point most or all of them have moved to Pekko, which is a fork of the last release of Akka under Apache 2.0.

Cyph0n 501 days ago

Seems like it based on GH activity, but I don’t know for sure.

https://github.com/apache/pekko

pradn 500 days ago

An important design consideration for Hydro, it seems, is to be able to define a workflow in a higher-level language and then be able to cut it into different binaries.

Is that something Akka / RX offer? My quick thought is that they structure code in one binary.

sitkack 502 days ago

This is a project out of the RISELab:

https://rise.cs.berkeley.edu/projects/

Most data processing and distributed systems have some sort of link back to the research this lab has done.

sriram_malhar 501 days ago

> Most data processing and distributed systems have some sort of link back to the research this lab has done.

Heh. "most data processing and distributed systems"? Surely you don't mean that the rest of the world was sitting tight working on uniprocessors until this lab got set up in 2017!

necubi 501 days ago

I assume they're talking about the longer history of the distributed systems lab at Berkeley, which was AMP before RISE. (It's actually now Sky Lab[0]; each of the labs lives for five years.) AMP notably is the origin of Spark, Mesos, and Tachyon (now Alluxio), and RISE originated Ray.

[0] https://sky.cs.berkeley.edu/

conor-23 501 days ago

There is a nice article by David Patterson (who used to direct the lab and won the Turing Award) on why Berkeley changes the name and scope of the lab every five years https://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-... . Unfortunately, there's no good name for the lab across each of the five-year boundaries so people just say "rise lab" or "amp lab" etc.

irq-1 501 days ago

Interesting.

> Good Commandment 3. Thou shalt limit the duration of a center. ...

> To hit home runs, it’s wise to have many at bats. ...

> It’s hard to predict information technology trends much longer than five years. ...

> US Graduate student lifetimes are about five years. ...

> You need a decade after a center finishes to judge if it was a home run. Just 8 of the 12 centers in Table I are old enough, and only 3 of them—RISC, RAID, and the Network of Workstations center—could be considered home runs. If slugging .375 is good, then I’m glad that I had many 5-year centers rather than fewer long ones.

(Network of Workstations > Google)

sriram_malhar 501 days ago

Right .. the AMPLab was set up in 2011. The Dijkstra prize for distributed computing was set up in 2006 .. people like Dijkstra and Lamport and Jim Gray and Barbara Liskov won Turing Awards for a lifetime's worth of work.

Now, Berkeley has been a fount of research on the topic, no question about that. I myself worked there (on Bloom, with Joe Hellerstein). But forgetting the other top universities of the world is a bit ... amusing?

Let's take one of the many lists of foundational papers of this field:

http://muratbuffalo.blogspot.com/2021/02/foundational-distri...

How many came out of Berkeley, let alone a recent entry like the AMPLab?

sitkack 501 days ago

You are mischaracterizing my comment; what I said was true. Most distributed systems work (now) has a link back to Berkeley distributed systems labs. Someone wanted context about Hydro (Joe Hellerstein).

I am not going to make every contextualizing comment an authoritative bibliography; you of all people could have added that w/o being snarky and starting this whole subthread.

sriram_malhar 501 days ago

> Most distributed systems work (now) has a link back to Berkeley distributed systems labs.

I didn't think you were saying that most distributed systems work happening at Berkeley harks back to earlier work at Berkeley. That's a bit obvious.

The only way I can interpret "most distributed systems work now" is a statement about work happening globally. In which case it is a sweeping and false generalization.

Is there another interpretation?

sitkack 501 days ago

correct

macintux 501 days ago

I too was a bit surprised by the assertion, but it doesn't say "ancestry", just "link".

And I'm guessing if you include BOOM[1], the links are even deeper.

[1] http://boom.cs.berkeley.edu