Hacker News
Building real-time ETL pipelines with Apache Kafka (datacater.io)
22 points by flippingbits 1582 days ago
2 comments

basically an ad.
HN used to be strict about its "no self promotion" and "it must be interesting" rules. Dang probably got tired of constantly fighting about it. Now everyone with a blog treats HN like a dumping ground.

Edit: note that this is the CEO of the site. His account had been inactive since 2012, and then in the past 7 months he started dumping an ad here every month.

It might be ad-like, but there’s a pretty handy diagram explaining Kafka to beginners in there.

I’ve been working to grok Kafka the past little while, and that diagram, if accurate, stands out from most of the blabber in extended YouTube videos going for ad revenue.

I always recommend the O'Reilly book 'Kafka: The Definitive Guide'. https://kafka.apache.org/books-and-papers
Thank you! Have so many O'Reilly books, not sure why I didn’t think of it :)
I wouldn't say that. Kafka is everywhere in the enterprise and its adoption is still growing rapidly. This blog post is a decent overview for non-technical people at the very least.
HN adblocker when?
agree
ETL on streams? To be pedantic, does that even make sense? Extract, transform and load... a batch of data. But a stream?
I suggest thinking in terms of events instead of streams.

You can extract data change events (e.g., INSERTs) from a data source, transform them with a streaming application (e.g., built with Kafka Streams), and load the transformed events into a data sink.
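
To make the event-at-a-time idea concrete, here is a minimal sketch in plain Python. It does not use Kafka or Kafka Streams at all: the event shape, the `SOURCE_EVENTS` list, and the `extract`/`transform`/`load` functions are hypothetical stand-ins, and a list plays the role of the data sink. The point is only that each stage handles one event as it arrives instead of waiting for a batch.

```python
from typing import Iterable, Iterator

# Hypothetical change events, loosely shaped like what a change-data-capture
# tool might emit. The field names are assumptions made for this sketch.
SOURCE_EVENTS = [
    {"op": "INSERT", "table": "users", "row": {"id": 1, "email": "Ada@Example.com"}},
    {"op": "UPDATE", "table": "users", "row": {"id": 1, "email": "ada@example.com"}},
    {"op": "INSERT", "table": "users", "row": {"id": 2, "email": "Bob@Example.com"}},
]


def extract(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only INSERT events, one at a time, as they arrive."""
    for event in events:
        if event["op"] == "INSERT":
            yield event


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Lowercase each event's email without mutating the input event."""
    for event in events:
        row = dict(event["row"])  # copy so the source event stays untouched
        row["email"] = row["email"].lower()
        yield {**event, "row": row}


def load(events: Iterable[dict], sink: list) -> None:
    """Append each transformed row to the sink as soon as it is produced."""
    for event in events:
        sink.append(event["row"])


# The generators are chained lazily, so every event flows through all three
# stages before the next one is read from the source.
sink: list = []
load(transform(extract(SOURCE_EVENTS)), sink)
print(sink)
```

In a real pipeline the source would be an unbounded topic rather than a finite list, so the loop in `load` would simply never finish; the per-event logic would stay the same.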