by IanCal 3350 days ago
It'd be good to see some comparisons: why this and not one of the other currently available systems? Why should I use this over, for example, Luigi?

What scale is this intended for?

Is it intended to neatly solve a simple problem over my 20TB of data on S3? Big, complex graphs? Or is it more for turning a small local reporting system that currently lives in three Excel files into a tested Python script?


It's indeed intended for «small data», as opposed to «big data». I know that doesn't say much, but I basically wanted to handle small flows of data without having to bring out the "big guns".

I'm preparing documentation pages that answer many of the questions I've received, including comparisons, data volumes, and where it is and isn't a good fit.

All of that will be ready well before 1.0, but for now we're at 0.2.

Thanks for all the hackerlove, though!

With bonobo's predecessor, I was processing 5M lines of data in around 1 hour, including extraction, joins, API calls and a few loads. That should give a first idea of the target size.
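For a rough sense of what "5M lines in around 1 hour" means as a rate, here is a back-of-envelope calculation. It assumes exactly one hour and treats the whole pipeline (extraction, joins, API calls, loads) as a single end-to-end figure, so it is an average only, not a per-step measurement.

```python
# Back-of-envelope throughput from the figures quoted above.
# Assumption: "around 1 hour" is taken as exactly 3600 seconds.
LINES = 5_000_000
SECONDS = 60 * 60


def lines_per_second(lines: int, seconds: float) -> float:
    """Average end-to-end throughput, in lines per second."""
    return lines / seconds


rate = lines_per_second(LINES, SECONDS)
ms_per_line = 1000 / rate  # average wall-clock time spent per line

print(f"{rate:.0f} lines/s, {ms_per_line:.2f} ms per line")
```

That works out to roughly 1,400 lines per second, or under a millisecond per line on average. Because it is averaged over the whole run, it says nothing about which step (for example, the API calls) dominated.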
From looking at their examples and interfaces, it's clearly meant for simple, small-scale processing.