by peterkelly 46 days ago
I've always been of the view that for a workflow language, you should use a proper, Turing-complete functional language. That gives you all the usual flexibility for transformations on intermediate data, while also supporting features such as automatic parallelisation of external, compute-intensive tasks.

I recommend checking out https://github.com/peterkelly/rex and also my PhD thesis on the topic https://www.pmkelly.net/publications/thesis.pdf.

The gap in flexibility between DAG-only and a full language designed for the task is a significant one.

8 comments

Completely agree with this view. I wish I had seen this thesis earlier when developing redun (https://github.com/insitro/redun/). Looks like a lot of very useful ideas for defining such a system.

We based redun's execution model on very similar ideas of functional programming and graph reduction. In addition, we made it work as an embedded DSL within Python, so one can easily use all the typical data science and ML libraries in a workflow. This has been very helpful for building biotech workflows (genomics, imaging, chem).

I am a bit surprised that many workflow systems shy away from full Turing-completeness. You usually don't need to trade that away for automatic parallelism, caching, etc.
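To make the point concrete, here is a minimal sketch of the general idea: tasks build a lazy expression graph, and an evaluator reduces it with caching and concurrent evaluation of independent arguments. This is not redun's or Rex's actual API. The names `Lazy`, `task`, and `run` are hypothetical, and it uses only the Python standard library (`asyncio.to_thread` needs Python 3.9+).

```python
import asyncio


class Lazy:
    """A deferred call: a function plus its (possibly lazy) arguments."""

    def __init__(self, fn, args):
        self.fn = fn
        self.args = args


def task(fn):
    """Hypothetical decorator: calling a task builds a graph node instead of running fn."""

    def build(*args):
        return Lazy(fn, args)

    return build


async def _eval(expr, cache):
    if not isinstance(expr, Lazy):
        return expr
    # Arguments do not depend on each other, so they are evaluated concurrently.
    args = await asyncio.gather(*(_eval(a, cache) for a in expr.args))
    # Best-effort cache keyed on task name and (hashable) evaluated arguments.
    key = (expr.fn.__name__, tuple(args))
    if key not in cache:
        # Run the task body in a worker thread so slow tasks can overlap.
        cache[key] = await asyncio.to_thread(expr.fn, *args)
    # A task may return a further expression (recursion, conditionals): keep reducing.
    return await _eval(cache[key], cache)


def run(expr, cache=None):
    """Reduce an expression graph to a plain value."""
    return asyncio.run(_eval(expr, {} if cache is None else cache))


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


@task
def sum_squares(n):
    # Ordinary Python control flow decides the shape of the graph at run time,
    # which a static DAG cannot express.
    if n == 0:
        return 0
    return add(square(n), sum_squares(n - 1))


if __name__ == "__main__":
    print(run(sum_squares(3)))
```

The graph here is never declared up front. It unfolds as `sum_squares` recurses, yet the evaluator can still cache each call and overlap the independent `square` and `sum_squares` branches.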

I guess that ship has sailed, and maybe it's nitpicking, but I find it a bit unfortunate to call a new programming language "Rex" when "Rexx" has already been around for several decades.
Yes … config-as-code for orchestration is a mess. A DSL is just kicking the can down the road. Synchronous orchestration is good, but you’ll need a lot of utility functions for fan-outs and the like. It helps to use both synchronous and asynchronous code, though that is very difficult to do well. I contributed to Flyte V2 OSS, which does a fairly pleasant job.
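As an illustration of the kind of fan-out utility meant here, a minimal sketch using only Python's `asyncio`. This is not Flyte's API. `fan_out`, `fetch_length`, and the `limit` parameter are hypothetical names chosen for the example.

```python
import asyncio


async def fan_out(worker, items, limit=4):
    """Run worker(item) for every item, at most `limit` at a time, keeping input order."""
    gate = asyncio.Semaphore(limit)

    async def guarded(item):
        async with gate:
            return await worker(item)

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fetch_length(name):
    # Stand-in for a remote or compute-heavy step.
    await asyncio.sleep(0.01)
    return len(name)


def main():
    # Synchronous entry point wrapping the asynchronous fan-out.
    return asyncio.run(fan_out(fetch_length, ["a", "bb", "ccc"], limit=2))


if __name__ == "__main__":
    print(main())
```

The synchronous `main` keeps the calling code simple, while the helper hides the concurrency limit and the ordering of results.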
I wonder, isn't any Lisp, be it Clojure, Scheme, etc., exactly suited for such tasks?
Do you implement a DAG within your system to act as a kind of well-defined backbone for analysis and execution, or do you dispense with (explicit) DAGs entirely?
Looks cool.

That's kind of my (not the project's) vision for PRQL: a general, LINQ-type embeddable data transformation language.

Unfortunately no time to work on it these days.

redun is quite interesting in this regard

https://insitro.github.io/redun/

Spark in Scala does the ETL part of this well. The orchestration part is another story.