by mxmxm 3316 days ago
Counting views/impressions in combination with Apache Kafka sounds like the ideal use case for a stream processor like Apache Flink. It supports very large state, which can be managed off-heap. This should enable you to count the exact number of unique views in real time with exactly-once semantics. Here is a blog post on large-scale counting with more details. It also includes a comparison with other streaming technologies like Samza and Spark: https://data-artisans.com/blog/counting-in-streams-a-hierarc...
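
To make the idea of exact unique counting a bit more concrete, here is a rough sketch in plain Python. It is not Flink code and does not use Flink's API; `UniqueViewCounter`, `page_id` and `viewer_id` are made-up names for illustration. It only shows the underlying idea, assuming each view event carries a page and a viewer id: keep a set of seen viewers per page as state, so that a replayed or duplicated event does not change the count.

```python
from collections import defaultdict


class UniqueViewCounter:
    """Hypothetical illustration: per-page state holding the viewer ids seen so far."""

    def __init__(self):
        # page_id -> set of viewer ids; stands in for the per-key state
        # that a stream processor would manage and checkpoint for you.
        self._seen = defaultdict(set)

    def process(self, page_id, viewer_id):
        """Handle one view event and return the exact unique count for that page."""
        # Adding to a set is idempotent, so a duplicate event leaves the count unchanged.
        self._seen[page_id].add(viewer_id)
        return len(self._seen[page_id])

    def count(self, page_id):
        """Return the current unique count without creating state for unseen pages."""
        return len(self._seen.get(page_id, ()))


if __name__ == "__main__":
    counter = UniqueViewCounter()
    events = [("p1", "alice"), ("p1", "bob"), ("p1", "alice"), ("p2", "alice")]
    for page, viewer in events:
        print(page, viewer, counter.process(page, viewer))
```

In a real deployment the per-page sets would live in the stream processor's managed state rather than in a Python dict, and for very high cardinalities the state per page can get large, which is where the large-state support mentioned above matters.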

Also check out this blog post by a Twitter engineer on counting ad impressions: https://data-artisans.com/blog/extending-the-yahoo-streaming...