by eugenhotaj 1687 days ago
This is neat for toy problems but I don't see it working well for "real" pipelines. The magical DAG creation is going to be super hard to wrap your head around and even worse to debug.

This reminds me of an internal Google tool for doing async programming in Java (ProducerGraph or something). The idea was that you'd just write annotated functions and the framework would handle all the async stuff. Wasted many thousands of engineering hours while giving an even worse experience than manipulating futures directly.

2 comments

One of the authors here. I think that systems that produce things magically often run into this burden. That doesn’t mean that the magic is bad, per se, rather that the thought put into operationalizing it (the user experience) is often much less than that put into executing it (the engine).

In our case there’s very little magic, and a dead simple engine (so far). We (from a DS perspective) have found that expressing transforms in this manner simplifies the code, making it well worth the added layer of abstraction. As a platform team we’ve focused on debugging (visualization, etc.), but this is part of the reason we open sourced it! The more voices we get, the more operational concerns we can iron out to avoid painful debugging experiences.

A lot of these libraries also suffer from really ugly syntax, vs e.g. my library which overloads operators and attempts to "transparently" plug in to python's native syntactic sugar https://github.com/timkpaine/tributary
Looks like an interesting (& powerful) library - what problem is it trying to solve exactly? That wasn't clear from the README (the greek's library doesn't shed any more light on things either).
I could be mistaken, but I think it's aimed at domains where you want a reactive experience similar to what you might get if you build a complicated Excel spreadsheet.

In a spreadsheet you're basically building a DAG (assuming you're using it right) that automatically and efficiently recalculates the downstream nodes whenever you change any of the input nodes.
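
To make the spreadsheet analogy concrete, here is a rough sketch of that recalculation in plain Python. The `Sheet` class and the cell names are made up for illustration and are not taken from Tributary, Hamilton, or Incremental; it also rebuilds the topological order on every change, which a real engine would cache.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Sheet:
    """Toy spreadsheet: a cell is either an input value or a formula over other cells."""

    def __init__(self):
        self.values = {}      # cell name -> current value
        self.formulas = {}    # cell name -> (function, tuple of dependency names)
        self.recomputed = []  # log of formula cells that were re-evaluated

    def set_input(self, name, value):
        self.values[name] = value
        self._recalculate(changed=name)

    def add_formula(self, name, func, deps):
        self.formulas[name] = (func, tuple(deps))
        self.values[name] = func(*(self.values[d] for d in deps))

    def _recalculate(self, changed):
        # Walk formula cells in topological order so dependencies are fresh
        # before anything that reads them, and skip cells that are not
        # downstream of the changed input.
        graph = {name: set(deps) for name, (_, deps) in self.formulas.items()}
        dirty = {changed}
        for name in TopologicalSorter(graph).static_order():
            if name in self.formulas and dirty & set(self.formulas[name][1]):
                func, deps = self.formulas[name]
                self.values[name] = func(*(self.values[d] for d in deps))
                dirty.add(name)
                self.recomputed.append(name)


sheet = Sheet()
sheet.set_input("price", 10)
sheet.set_input("qty", 3)
sheet.set_input("tax_rate", 0.5)
sheet.add_formula("subtotal", lambda price, qty: price * qty, ["price", "qty"])
sheet.add_formula("tax", lambda subtotal, rate: subtotal * rate, ["subtotal", "tax_rate"])
sheet.add_formula("total", lambda subtotal, tax: subtotal + tax, ["subtotal", "tax"])
sheet.add_formula("label", lambda rate: f"rate={rate}", ["tax_rate"])

sheet.recomputed.clear()
sheet.set_input("qty", 4)  # only subtotal, tax and total depend on qty
print(sheet.values["total"], sheet.recomputed)
```

Changing `qty` re-evaluates `subtotal`, `tax` and `total` in that order and leaves `label` alone, which is the "recalculate only what is downstream" behavior described above.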

I think it's actually a pretty hard experience to recreate in many programming languages. I think in Python the thing that comes closest these days is Streamlit, but putting something together with it can still be a lot slower than what a really fast Excel jockey can manage.

Suppose you had a HamiltonFrame that knew the graph that generated it and knew which columns were inputs and which were outputs. When you update any of the input values it could automatically recalculate any columns downstream of the modified inputs.
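
Something like the following is what I have in mind. To be clear, `HamiltonFrame` does not exist as far as I know; this is a hypothetical sketch that only borrows Hamilton's convention of naming a function after the column it produces and its parameters after the columns it needs. Columns are plain lists here to keep it dependency-free, and the transform names are invented.

```python
import inspect


class HamiltonFrame:
    """Hypothetical frame that remembers the functions that built its columns."""

    def __init__(self, inputs, transforms):
        self.columns = {name: list(vals) for name, vals in inputs.items()}
        self.transforms = {f.__name__: f for f in transforms}
        # Function name = output column, parameter names = upstream columns.
        self.deps = {
            name: list(inspect.signature(f).parameters)
            for name, f in self.transforms.items()
        }
        self.last_recomputed = []
        for name in self.transforms:
            self._resolve(name)

    def _resolve(self, name):
        # Compute a column (and anything it needs) only if it is missing.
        if name not in self.columns:
            args = [self._resolve(dep) for dep in self.deps[name]]
            self.columns[name] = self.transforms[name](*args)
            self.last_recomputed.append(name)
        return self.columns[name]

    def _downstream(self, changed):
        # Expand outward from the changed column until nothing new is reached.
        stale, frontier = set(), {changed}
        while frontier:
            frontier = {n for n, d in self.deps.items() if frontier & set(d)} - stale
            stale |= frontier
        return stale

    def update(self, input_name, values):
        self.columns[input_name] = list(values)
        stale = self._downstream(input_name)
        for name in stale:
            del self.columns[name]
        self.last_recomputed = []
        for name in sorted(stale):
            self._resolve(name)


def revenue(price, units):
    return [p * u for p, u in zip(price, units)]


def profit(revenue, cost):
    return [r - c for r, c in zip(revenue, cost)]


def padded_cost(cost):
    return [c + 1 for c in cost]


frame = HamiltonFrame(
    inputs={"price": [2, 3], "units": [10, 20], "cost": [5, 10]},
    transforms=[revenue, profit, padded_cost],
)
frame.update("price", [4, 3])  # revenue and profit are downstream; padded_cost is not
print(frame.columns["profit"], frame.last_recomputed)
```

After the update only `revenue` and `profit` are rebuilt; `padded_cost` never reads `price`, so it is left as it was.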

This project has similar motivations but explains itself a bit more I think. https://opensource.janestreet.com/incremental/

EDIT: Just realizing belatedly that since it was the author of Tributary* who mentioned it he can probably explain himself better than I can...

*along with roughly 450 cool Jupyter extensions.