by rubenfiszel 1339 days ago

Yes, inputs/outputs are likely the most interesting problem for our diverse specs of flows.

Because data pipelines are not the primary concern of Windmill, we took the stance that the inputs and outputs of steps are simply JSON in, JSON out. For all the languages, we simply extract the JSON object into the different parameters of the main function, and then we wrap the return value in the respective language's native serializer for the output (e.g. JSON.stringify in TypeScript). Then each step can use a JavaScript expression executed by V8 to do some lightweight transformation from the output of any step to the input of that step.
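To make the "JSON in, JSON out" idea concrete, here is a minimal TypeScript sketch. The names `runStep`, `evalInputExpr` and `previous_result` are hypothetical and chosen for illustration only; they are not Windmill's actual internals, and `new Function` merely stands in for an expression evaluated by V8.

```typescript
// Hypothetical step script: a plain main function with named parameters.
function main(name: string, count: number): { greeting: string; doubled: number } {
  return { greeting: `hello ${name}`, doubled: count * 2 };
}

// Hypothetical runner: pick the fields of the JSON input that match the
// parameter names of main (in declared order), call it, and serialize
// the return value with the language's native serializer.
function runStep(
  fn: (...args: any[]) => unknown,
  paramNames: string[],
  inputJson: string,
): string {
  const input = JSON.parse(inputJson) as Record<string, unknown>;
  const args = paramNames.map((p) => input[p]);
  return JSON.stringify(fn(...args));
}

// Stand-in for the lightweight JavaScript expression that maps the output
// of an earlier step to the input of the next one.
function evalInputExpr(expr: string, previousResult: unknown): unknown {
  const f = new Function("previous_result", `return (${expr});`);
  return f(previousResult);
}

const stepOutput = runStep(main, ["name", "count"], '{"name":"ada","count":21}');
const nextInput = evalInputExpr(
  "({ name: previous_result.greeting, count: previous_result.doubled })",
  JSON.parse(stepOutput),
);
console.log(stepOutput, JSON.stringify(nextInput));
```

The point of the sketch is that the runner never needs to know anything about the step beyond its parameter names, which is what keeps the per-language glue small.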

A lot of the simplification we achieved actually comes from parsing the main function's parameters into the corresponding JSON Schema, supporting deeply nested objects when relevant.
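As a rough illustration of deriving a JSON Schema from a main signature, here is a deliberately tiny TypeScript sketch. It is an assumption-laden toy, not our real parser: `signatureToSchema` and `TYPE_MAP` are hypothetical names, it uses a regex rather than a proper parser, and it only handles flat `string`/`number`/`boolean` parameters (no nested objects, no default values).

```typescript
type JsonSchema = {
  type: "object";
  properties: Record<string, { type: string }>;
  required: string[];
};

// Maps a few TypeScript primitive annotations to JSON Schema types.
const TYPE_MAP: Record<string, string> = {
  string: "string",
  number: "number",
  boolean: "boolean",
};

// Hypothetical helper: extract the parameter list of `function main(...)`
// and turn each `name: type` or `name?: type` into a schema property.
function signatureToSchema(source: string): JsonSchema {
  const match = /function\s+main\s*\(([^)]*)\)/.exec(source);
  if (!match) throw new Error("no main function found");
  const schema: JsonSchema = { type: "object", properties: {}, required: [] };
  for (const raw of match[1].split(",")) {
    const param = raw.trim();
    if (param === "") continue;
    const m = /^(\w+)(\?)?\s*:\s*(\w+)$/.exec(param);
    if (!m) throw new Error(`unsupported parameter: ${param}`);
    const [, name, optional, tsType] = m;
    const jsonType = TYPE_MAP[tsType];
    if (!jsonType) throw new Error(`unsupported type: ${tsType}`);
    schema.properties[name] = { type: jsonType };
    // Parameters without a `?` are treated as required.
    if (!optional) schema.required.push(name);
  }
  return schema;
}

console.log(
  JSON.stringify(
    signatureToSchema("export function main(name: string, count?: number) {}"),
  ),
);
```

With a schema like this in hand, input validation and an auto-generated form for the step both come more or less for free.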

That works great for automations that do not have big inputs/outputs, but not for data. So what we do for data is to use a folder that we symlink so that it is shared by all steps if a specific flag for that flow is set. It also forces us to have the same worker process all the steps inside that flow, when otherwise flow steps could have been processed by any worker. It is very fast since it's all local filesystem, but not super scalable.
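The shared-folder approach can be sketched roughly like this in TypeScript on Node.js. The directory layout, the `shared` link name and the `prepareStepDir` helper are all hypothetical; the only thing taken from the description above is the idea of symlinking one folder into every step of a flow.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: create a working directory for a step and give it
// a "shared" symlink pointing at the flow-wide shared directory.
function prepareStepDir(flowDir: string, stepId: string, sharedDir: string): string {
  const stepDir = path.join(flowDir, stepId);
  fs.mkdirSync(stepDir, { recursive: true });
  fs.symlinkSync(sharedDir, path.join(stepDir, "shared"), "dir");
  return stepDir;
}

// One temporary directory per flow run, with a single shared data folder.
const flowDir = fs.mkdtempSync(path.join(os.tmpdir(), "flow-"));
const sharedDir = path.join(flowDir, "shared-data");
fs.mkdirSync(sharedDir);

const stepA = prepareStepDir(flowDir, "step-a", sharedDir);
const stepB = prepareStepDir(flowDir, "step-b", sharedDir);

// Step A writes a large artifact into the shared folder...
fs.writeFileSync(path.join(stepA, "shared", "rows.csv"), "id,value\n1,42\n");
// ...and step B reads it through its own symlink, so only a small JSON
// value (for example the file name) has to travel between the steps.
const rows = fs.readFileSync(path.join(stepB, "shared", "rows.csv"), "utf8");
console.log(rows);
```

Because the symlink target is a local path, both steps have to run on the same machine, which is exactly why the same worker ends up processing the whole flow.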

I am not pleased with that solution and believe that if we were to expand on the data problem, we would certainly rely on a fast network and HDFS, Amazon EFS, etc. to simply share that mounted folder across the network.

Anyway, sorry for the rambling, but I do feel like we're all taking different approaches to the same underlying problem of building the best abstraction for flows, and I believe we might learn from each other's choices.

PS: congrats Patterns on the launch, the tool looks absolutely amazing.