| HN Mirror

	by gandalfu 4110 days ago
	Why would the US government send all its traffic stats to Google? Why not use Piwik? http://piwik.org/ Am I being too paranoid? This is, after all, third-party software served by the government.

2 comments

mlissner 4110 days ago

I think it's a reasonable question and I'm a big fan of Piwik, but running it at this kind of scale is very hard. We get about 10k hits/day on the Piwik instance we run, and it consistently takes more resources than the application it's tracking.

bjelkeman-again 4110 days ago

Are there any viable open source traffic analysis tools other than Piwik? I'd hate to have to roll my own.

alexatkeplar 4110 days ago

Snowplow (I'm a co-founder) can happily scale to billions of events per day: https://github.com/snowplow/snowplow

e12e 4110 days ago

Judging from the repo: "Collectors receive Snowplow events from trackers. Currently we have three different event collectors, sinking events either to Amazon S3 or Amazon Kinesis" (etc.). So is it still not viable to self-host Snowplow on your own hardware or internal cloud? Or is it possible, but only if you run a full cloud? (I understand why one would want a setup that runs on Amazon if one uses Amazon, but when you host your own infrastructure, a self-host option would be nice, if viable.)

Without an option to self-host, Snowplow isn't really an alternative to Piwik.

alexatkeplar 4110 days ago

Hey e12e! It's a great question. You are right - at the moment Snowplow is still tied to the AWS cloud; we use a variety of AWS services which support massively horizontal processing, including Elastic MapReduce, Kinesis and Redshift. We are working on a Kafka+Samza version of Snowplow which we will release later this year, most likely running on a Mesos cluster that you can deploy where you want.

bjelkeman-again 4110 days ago

We have to move away from US-hosted services, so we'll have to wait for the Kafka+Samza version if we go that route. Thanks!

jbnicolai 4110 days ago

https://github.com/divolte/divolte-collector is quite nice and can handle extreme loads.

bjelkeman-again 4110 days ago

That is interesting for us too, as Kafka is possibly in our future. Thanks!

gesman 4110 days ago

I'm building something based on Splunk. Open source + free Splunk license.

http://www.mensk.com/traffic-ray-new-splunk-app-to-visualize...

castell 4110 days ago

Especially interesting would be a Go or Node.js project that uses a caching layer (like Memcached) to scale better than writing directly to a SQL/NoSQL database.
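
A rough sketch of that idea, in Python for brevity rather than Go or Node.js: hits are counted in memory and written to the database in batches instead of one write per request. Everything here is an assumption for illustration. The in-process `Counter` stands in for a shared cache such as Memcached, `sqlite3` stands in for the real database, and `BufferedHitCounter`, `flush_every` and the `hits` table are hypothetical names.

```python
import sqlite3
from collections import Counter


class BufferedHitCounter:
    """Count page hits in memory and write them to SQL in batches.

    Assumptions: the Counter stands in for a shared cache (e.g. Memcached),
    and sqlite3 stands in for the production database.
    """

    def __init__(self, conn, flush_every=1000):
        self.conn = conn
        self.flush_every = flush_every  # hypothetical batch size
        self.pending = Counter()        # path -> hits not yet written
        self.buffered = 0               # total hits waiting in the buffer
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hits "
            "(path TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )

    def record(self, path):
        """Register one hit; touches the database only when the batch is full."""
        self.pending[path] += 1
        self.buffered += 1
        if self.buffered >= self.flush_every:
            self.flush()

    def flush(self):
        """Write all buffered counts in a single transaction."""
        if not self.pending:
            return
        rows = list(self.pending.items())
        with self.conn:  # commits on success, rolls back on error
            # Make sure a row exists for every path, then add the deltas.
            self.conn.executemany(
                "INSERT OR IGNORE INTO hits (path, n) VALUES (?, 0)",
                [(path,) for path, _ in rows],
            )
            self.conn.executemany(
                "UPDATE hits SET n = n + ? WHERE path = ?",
                [(count, path) for path, count in rows],
            )
        self.pending.clear()
        self.buffered = 0
```

The trade-off is durability: any hits still in the buffer are lost if the process dies before the next flush, so the batch size bounds how much data a crash can cost.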

ap22213 4110 days ago

Having worked with the US government, I'm guessing that they don't really know what 18F is doing, or the implications.

gandalfu 4110 days ago

The implications are really scary. I'm the first to applaud the new digital office initiative and the talent behind it, but when it comes to third-party software the government should trust no one, no matter how competent.

Scary scenario #1: All of my interactions with the government are known to Google.

Scary scenario #2: Google's CDN is compromised and malware is served to everyone!
