|
|
|
|
|
by rdorgueil
3349 days ago
|
|
Bonobo runs each functions in the pipeline in parallel and make the fifo queues plumbing and thread pool management completely transparent. The TLDR would then be "Write some generators or functions, link them in a graph, and call them in order on each line of data as soon as the previous transformation node output is ready.". For example if you have a database cursor that yields each line of a query as its output, it starts to run the next step(s) in the graph as soon as the first result is ready (yet not stop yielding from database until the graph is done for the current row). I did not find it easy to do with the libraries I tried. The docs clearly lacks completion to say the least, and would need an example with a big dataset, one with long individual operations and one with a non linear graph, so it's more obvious that, of course, it's not made to process strings to uppercase twice in a row. Stay tuned, I'm very happy HN brought it to homepage, did not really think it could happen at this stage though and I understand you. But that's a good thing for the project to move forward. |
|
Python is my usual language of choice, but recently I picked up Go for some data processing because there was a lot of benefits to parallelising the task - which Go made easy.