by 0x74696d 4110 days ago
Yeah, using the CloudFront CDN logs is a very similar approach. If we'd known about Snowplow, and it had been production-ready when this service was written, it would certainly have been worth looking at.

That being said, this service does double-duty for us: when an analytics ping comes in from the video player, we also use that same data to write "bookmarks" for resuming play. The bookmark data has stricter freshness requirements than the analytics, so it can't be written in a nightly batch. So we'd end up having to write our own Collector (in the Snowplow architecture) anyway to perform that fan-out of incoming events.
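
To make the fan-out concrete, here is a minimal in-process sketch, not the actual service. Every name in it (`FanOutCollector`, `queue_for_analytics`, `write_bookmark`, and the event fields `user_id`, `video_id`, `position_s`) is hypothetical and chosen only for illustration. Each incoming ping goes to two handlers: one appends to a list standing in for the nightly batch, the other updates the bookmark immediately.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Handler = Callable[[Event], None]


class FanOutCollector:
    """Hypothetical collector that passes each incoming event to every handler."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def collect(self, event: Event) -> None:
        # Fan-out: the same event is delivered to every subscriber in turn.
        for handler in self._handlers:
            handler(event)


# Stand-ins for the two downstream consumers (assumptions, not real storage).
analytics_batch: List[Event] = []              # drained later by a batch job
bookmarks: Dict[Tuple[str, str], float] = {}   # must be up to date right away


def queue_for_analytics(event: Event) -> None:
    analytics_batch.append(event)


def write_bookmark(event: Event) -> None:
    key = (str(event["user_id"]), str(event["video_id"]))
    bookmarks[key] = float(event["position_s"])  # last ping wins


collector = FanOutCollector()
collector.subscribe(queue_for_analytics)
collector.subscribe(write_bookmark)

collector.collect({"user_id": "u1", "video_id": "v9", "position_s": 12.5})
collector.collect({"user_id": "u1", "video_id": "v9", "position_s": 42.0})
```

After the two pings, `analytics_batch` holds both events for later processing, while `bookmarks` holds only the latest position for that user and video.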


Hey 0x74696d - Snowplow co-founder here. Very cool post! The Snowplow Kinesis architecture gives you "fan-out" for free - you could write your bookmarking service as a Kinesis KCL app which reads from the Kinesis enriched event stream written by our Kinesis Enrich (https://github.com/snowplow/snowplow/tree/master/3-enrich/sc...). In this case you'd be using our Scala Stream Collector (Spray with a Kinesis back-end), not the CloudFront CDN Collector.
Nice. If I were starting this project today (it's pretty mature at this point) that'd definitely be something I'd look into.

That being said, when Kinesis previewed it had only Java bindings for KCL. Not sure if that's still the case, but that'd be a limiting factor for our shop unfortunately.

You're right 0x74696d - originally the KCL was Java only. The Java KCL now includes something called the MultiLangDaemon, which means you can write apps in other languages. There is an official Python KCL (https://github.com/awslabs/amazon-kinesis-client-python) but no others I know of yet. There's also AWS Lambda for processing Kinesis streams with JavaScript, and of course you can use Storm or Spark Streaming, although those are JVMish too.
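
For a rough idea of what the bookmarking side of such a stream consumer might do per batch, here is a library-independent sketch. It does not use the KCL or any AWS API. It assumes each record arrives as a dict with a base64-encoded `data` field, and that the payload is JSON with hypothetical fields (`event`, `user_id`, `video_id`, `position_s`). That payload shape is an assumption for illustration only and is not the Snowplow enriched event format.

```python
import base64
import json
from typing import Dict, Iterable, Tuple

Bookmarks = Dict[Tuple[str, str], float]


def apply_records(records: Iterable[dict], bookmarks: Bookmarks) -> Bookmarks:
    """Update bookmarks from a batch of stream records (hypothetical shape)."""
    for record in records:
        # Assumption: the record body is base64-encoded JSON.
        payload = json.loads(base64.b64decode(record["data"]))
        if payload.get("event") != "playback_ping":
            continue  # ignore events that are not player pings
        key = (payload["user_id"], payload["video_id"])
        bookmarks[key] = float(payload["position_s"])  # last ping wins
    return bookmarks


def encode(payload: dict) -> dict:
    """Helper for the example: wrap a payload the way apply_records expects."""
    raw = json.dumps(payload).encode("utf-8")
    return {"data": base64.b64encode(raw).decode("ascii")}


batch = [
    encode({"event": "playback_ping", "user_id": "u1", "video_id": "v9", "position_s": 30}),
    encode({"event": "page_view", "user_id": "u1"}),
    encode({"event": "playback_ping", "user_id": "u1", "video_id": "v9", "position_s": 45}),
]
state = apply_records(batch, {})
```

In a real consumer, a function like this would be called from whatever record-processing hook the chosen client library provides, with checkpointing handled separately.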