HN Mirror


by IanCal 3350 days ago

It'd be good to see some comparisons: why this and not one of the other currently available systems? Why should I use this over, for example, Luigi?

What scale is this intended for?

Is it intended to neatly solve a simple problem over my 20TB of data on S3? Big complex graphs? Or more for transitioning a small local report system that's currently in three Excel files into a tested Python script?

3 comments

rdorgueil 3350 days ago

It's indeed intended for «small data», as opposed to «big data». I know that does not say much, but I basically wanted to handle small flows of data without having to bring out the "big guns".

I'm preparing explanation pages for a lot of the questions I got, including comparisons, volumes of data, where it is good and where it is not ...

All of that will be ready well before 1.0, but for now, we're at 0.2 ...

Thanks for all the hackerlove, though!


rdorgueil 3350 days ago

With the ancestor of bonobo, I was processing 5M lines of data in around 1 hour, including extraction, joins, API calls and a few loads. That should give a first idea of the target size.


ah- 3350 days ago

From looking at their examples and interfaces, it's clearly for simple, small-scale processing.
