by kaon_ 728 days ago
I would love to have your advice. What tool would you recommend for straightforward ETLs as a single developer? Think of tasks like ETL-ing data from Production to Test or Local, or quickly combining data from two databases to answer some business question.

Six years ago I used Pentaho for this, and it worked really well: it was easy and quick. Maintenance was sometimes hard, though, and it felt very dated. The JavaScript version was ancient, and although I could find a lot of questions answered online, the answers were usually 5-10 years old. I am wondering whether I should use something like Amphi for my next simple ETLs.

3 comments

I've gotten some quick wins with Benthos (now Redpanda Connect), but I agree it's an unsolved problem, as there are typically gotchas.

If you can get a true CDC stream from the database to analytics, that would be ideal, but when that isn't available you spend 100x more time trying to bodge together an equivalent batch/retry system.
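To make the batch/retry fallback concrete, here is a minimal sketch of a watermark-based incremental copy with retries. It is only an illustration under stated assumptions: both databases are SQLite (via Python's built-in `sqlite3`), the table is a hypothetical `events (id, payload)` with a monotonically increasing `id`, and the names `make_db` and `copy_new_rows` are made up for this example. Unlike a real CDC stream, it only picks up newly inserted rows and misses updates and deletes.

```python
import sqlite3
import time


def make_db(rows=()):
    """Create an in-memory database with the hypothetical `events` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    return conn


def copy_new_rows(src, dst, batch_size=500, max_retries=3):
    """Copy rows whose id is above the destination's highest id, batch by batch."""
    # The watermark is the highest id already present in the destination.
    watermark = dst.execute("SELECT COALESCE(MAX(id), 0) FROM events").fetchone()[0]
    copied = 0
    while True:
        rows = src.execute(
            "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (watermark, batch_size),
        ).fetchall()
        if not rows:
            return copied
        for attempt in range(1, max_retries + 1):
            try:
                with dst:  # one transaction per batch; rolled back on error
                    dst.executemany(
                        "INSERT OR REPLACE INTO events (id, payload) VALUES (?, ?)",
                        rows,
                    )
                break
            except sqlite3.OperationalError:
                # Assumed to be transient (e.g. a locked database); give up
                # after the last attempt, otherwise back off and retry.
                if attempt == max_retries:
                    raise
                time.sleep(0.1 * 2 ** attempt)
        # Advance the watermark only after the batch has been committed.
        watermark = rows[-1][0]
        copied += len(rows)


if __name__ == "__main__":
    source = make_db([(i, f"row {i}") for i in range(1, 6)])
    target = make_db([(1, "row 1")])
    print(copy_new_rows(source, target, batch_size=2))
```

Because the watermark is read from the destination itself, rerunning the job after a crash resumes where the last committed batch ended, and `INSERT OR REPLACE` keeps a retried batch from failing on duplicate ids.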

I also want to know that. The BI team where I work still uses Pentaho. It's buggy and ugly, but it gets the job done most of the time. A few of them know a little Python, so a tool like Amphi could be the next stage.

ClickHouse can enable all the things you mentioned.