by rjzzleep
1178 days ago
I've used dozens of platforms, distributed job queues and pipelining tools, including Airflow, Pachyderm and a bunch of others. Most of them turned out to be more effort than they were worth and were designed around a very specific use case. Some of them looked fantastic but then had all sorts of weird cases to account for. Kinda like how ArgoCD looks great, but has a bunch of common bugs that nobody seems to care enough about to fix. In the end the most successful platform I built was a custom ORM around Redis objects and queues, and the most important part wasn't the fancy data processing platform but the details of the container layers and the refactoring of the code to make it easily composable, releasable and easy for the scientists to play with, with enough guard rails that they wouldn't diverge too far from the structure. It made us incredibly fast at iterating. Of all the things I worked with, Airflow was the one I was most hyped about from all the videos I had seen, and it turned out to be the biggest mess of them all.