HN Mirror


	by mlissner 4111 days ago
	I think it's a reasonable question, and I'm a big fan of Piwik, but running it at this kind of scale is very hard. We get about 10k hits/day on the Piwik instance we run, and it consistently takes more resources than the application it's tracking.

1 comments

bjelkeman-again 4110 days ago

Are there any viable open source traffic analysis tools other than Piwik? I'd hate to have to roll my own.

alexatkeplar 4110 days ago

Snowplow (I'm a co-founder) can happily scale to billions of events per day: https://github.com/snowplow/snowplow

e12e 4110 days ago

Judging from the repo: "Collectors receive Snowplow events from trackers. Currently we have three different event collectors, sinking events either to Amazon S3 or Amazon Kinesis" (etc.). So is it still not viable to self-host Snowplow on your own hardware or internal cloud? Or is it possible, but only if you run a full cloud? (I understand why one would want a setup that runs on Amazon if one uses Amazon, but when you host your own infrastructure, a self-host option would be nice, if viable.)

Without an option to self-host, Snowplow isn't really an alternative to Piwik.

alexatkeplar 4110 days ago

Hey e12e! It's a great question. You are right - at the moment Snowplow is still tied to the AWS cloud; we use a variety of AWS services which support massively horizontal processing, including Elastic MapReduce, Kinesis and Redshift. We are working on a Kafka+Samza version of Snowplow which we will release later this year, most likely running on a Mesos cluster that you can deploy where you want.

bjelkeman-again 4110 days ago

We have to move away from US-hosted services, so we'll have to wait for the Kafka+Samza version if we go that route. Thanks!

jbnicolai 4110 days ago

https://github.com/divolte/divolte-collector is quite nice and can handle extreme loads

bjelkeman-again 4110 days ago

That is interesting for us too, as Kafka is possibly in our future. Thanks!

gesman 4110 days ago

I'm building something based on Splunk. Open source + free Splunk license.

http://www.mensk.com/traffic-ray-new-splunk-app-to-visualize...

castell 4110 days ago

Especially interesting would be a Go or Node.js project that uses a caching layer (like Memcached) to scale better than writing directly to a SQL/NoSQL database.
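
A minimal Go sketch of the buffering idea described above, not taken from any project mentioned in this thread. It assumes an in-process buffer as a stand-in for an external cache such as Memcached, and the names Hit, Buffer, NewBuffer, Add and Flush are hypothetical. Hits accumulate in memory and are handed to a flush callback in batches, so the database would see one bulk write per batch instead of one write per hit.

```go
package main

import (
	"fmt"
	"sync"
)

// Hit is a hypothetical page-view record; a real tracker would carry more fields.
type Hit struct {
	Path string
}

// Buffer collects hits in memory and passes them to flush in batches.
// It stands in for the caching layer; it is not a Memcached client.
type Buffer struct {
	mu    sync.Mutex
	hits  []Hit
	limit int
	flush func([]Hit)
}

// NewBuffer returns a Buffer that calls flush once limit hits are pending.
func NewBuffer(limit int, flush func([]Hit)) *Buffer {
	return &Buffer{limit: limit, flush: flush}
}

// Add records one hit and triggers a batch flush when the limit is reached.
func (b *Buffer) Add(h Hit) {
	b.mu.Lock()
	b.hits = append(b.hits, h)
	var batch []Hit
	if len(b.hits) >= b.limit {
		batch = b.hits
		b.hits = nil
	}
	b.mu.Unlock()
	if batch != nil {
		b.flush(batch) // called outside the lock so slow writes do not block Add
	}
}

// Flush writes out whatever is still pending, e.g. on shutdown or a timer.
func (b *Buffer) Flush() {
	b.mu.Lock()
	batch := b.hits
	b.hits = nil
	b.mu.Unlock()
	if len(batch) > 0 {
		b.flush(batch)
	}
}

func main() {
	writes := 0
	buf := NewBuffer(100, func(batch []Hit) {
		writes++ // assumption: one bulk INSERT would happen here
	})
	for i := 0; i < 250; i++ {
		buf.Add(Hit{Path: "/"})
	}
	buf.Flush()
	fmt.Println("database writes:", writes) // prints "database writes: 3"
}
```

With a batch size of 100, the 250 hits in this example result in three flushes rather than 250 individual writes. The trade-off is that hits still sitting in the buffer are lost if the process dies before a flush.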