Hacker News new | ask | show | jobs
Strymonas – A library for OCaml and Scala for fast, bulk, in-memory processing (strymonas.github.io)
92 points by otini 3493 days ago
6 comments

This is an absolutely amazing work that targets several pain points of elegant and efficient (literally costless) stream processing. It seems to me that conceptually template metaprogramming as found in D is the closest practical incarnation of this idea, what do you think?
How do D templates compare to staging in MetaOCaml? I had always thought D templates only expanded during compile time, like C++'s.
MetaOCaml is about run-time meta-programming. There's your key difference.
Is Scala not practical?

Julia's generated functions are also capable of similar designs.

Both Scala and OCaml are very practical languages, I was referring to the actual multi-stage streams implemented in the proposed library. What I think is interesting is the fact that D arrived to a similar conceptual model from a more _practical_ perspective by extending the applications of templates.
I guess the question is rather whether MetaOCaml or the Scala-Virtualized compiler fork + the LMS framework code are practical.

I am not so sure, but I have also not looked into it too deeply. But in the end it doesn't matter as long as it is just a maturity issue. After all I guess the stream library in the submission is to be seen as a motivation for making MPS more mainstream.

This paper[1] also currently on the front page is a great intro to MSP if, like me, you aren't familiar with the paradigm.

I'd be interested in hearing some exameple applications of this, if anyone here uses MSP on streams in serious applications. Killing runtime overhead like this seems very powerful intuitively, but use cases where it would have a significant impact don't immediately jump to mind for me.

[1]: http://www.cs.rice.edu/~taha/publications/journal/dspg04a.pd...

I'm currently working on a new streaming system for a big data real-time-ish application (which is why we can't use existing libraries). We've come up with a design that is very similar to that described in the paper, as I understand it on a quick skim, but we don't use MSP and hence can't perform deforestation. We did actually consider using LMS but decided we weren't justified in doing so without getting some benchmarks first. I think we'll hit our targets without going down the staging route, but it's good to have prior work to build on if we need to!

Linear algebra is the other obvious-to-me application of this. Linear algebra libraries typically either create a lot of unnecessary intermediate arrays or provide ugly imperative APIs (hi BLAS!) and force that busy work onto the programmer. Deforestation is an excellent way to regain performance while still maintaining ease of use. Applications include deep learning, and this is broadly the approach taken by TensorFlow and other libraries.

It's interesting that you bring up Lin Alg... I do a lot of computational geometry work and I've always rolled my own supporting libraries from essentially nothing because I (likely naively) prefer that level of control vs learning some mega-framework's abstractions of choice.

It sounds like I need to read more about MSP and maybe start using it.

I'm not sure I get what's the difference between for example:

    def filterFilterTest (xs : Rep[Array[Int]]) : Rep[Int] =
        Stream[Int](xs)
          .filter(d => d % 2 == 0)
          .filter(d => d % 3 == 0)
          .fold(unit(0), ((a : Rep[Int]) => (b : Rep[Int]) => a + b))
and

    xs.filter(d => d % 2 == 0).filter(d => d % 3 == 0).fold(0)(_+_)
(grabbed from unit tests here: https://github.com/strymonas/staged-streams.scala/blob/maste... )

They both look pretty much the same and superficially do the same thing too, so what makes this new thing better/novel?

I would say that the latter is non-optimal, i.e. generates some unnecessary intermediary values whereas the former is optimal as shown in the paper.
What's the difference with http://nessos.github.io/Streams/ for F#/C# ?
The design and implementation of this library was led by my co-author Nick Palladinos and it ports the design of Java 8 streams to CLR.

Check the structure of this library following the definitions of Run in line 41 [1] upwards. You will notice that the code is in continuation passing style. If you combine the signatures (which are there to handle different cases of execution) they could easily be e.g., ('t -> bool) -> bool. But the implementation is more elaborate than that and note that nothing is rewritten. The design relies on the smartness of CLR/JIT to exploit that structure and inline loops. I have just described a firstly push-based, library design. You can check the differences with pull-based in the Clash of the Lambdas [2] paper.

The strymonas library is pull-based first (combined with the push-based approach) to accommodate more cases in a fundamental pull-based way. On top of that we use multi-stage programming as you saw in the paper as we don't want to rely on a sufficiently smart JIT compiler. We use staging to generate tight-loops, fused, without space leaks.

[1] https://github.com/nessos/Streams/blob/master/src/Streams/St...

[2] https://arxiv.org/abs/1406.6631

The strymonas optimal performance is guaranteed for any combination of stream processors. That doesn't seem to be the case of this library which relies on frequent patterns.
For some reason I had assumed that optimizing streams using MSP had already been done. It seems to be a low-hanging fruit. I haven't read the paper yet, so maybe that (optimized stream library using MSP) alone is not the point at all.
How does it compare to alternatives? Is it comparable to say Akka.Streams?
At the moment I would say Strymonas is a tech demo. If I was reviewing the code there would be a lot of comments along the lines of "seal this trait", for example.

I would expect it to be very much faster than Akka Streams. It has quite a different design as far as I can tell, and doesn't support things like distribution across machines.

I thought that Akka Streams didn't support distribution either, at least not yet. Has that changed?
It does actually. Check out http://doc.akka.io/docs/akka-stream-and-http-experimental/1...., where a distributed echo example is shown.
No, that still has not changed yet.
I am not closely familiar with Akka.Streams but can give you an intuitive explanation about what is new in Strymonas: the computation flow described with functional combinators like map, take and filter is represented as staged data ('a code type in OCaml), this means that what you get is not an actual computation but just a declarative description of all the steps. During compilation the combined code data structure is encoded as a simple loop or unfold computation.

Does that help?