Hacker News new | ask | show | jobs
by sliken 1352 days ago
Heh, my hacked version of this was used for apache logging. I just ran Mysql on top of a ZFS volume with compression turned on and had a column for timestamp, IP, referral IP, URL, action, and return code. I was amazed at how fast it was, how easy/fast it was to query, disk storage efficiency, and was overall quite impressed at how it was nearly as useful as a standard web traffic analysis tool that took significant time to crunch the logs, but worked on live data.
1 comments

I wonder what would happen if you stored the columns in separate tables (perhaps pairs of columns?) and queried them with a join off a shared ID (perhaps a view or materialized view?) in order to really take advantage of compression’s ability to compress highly self-similar data located together, highly.

Also, I assume you used a smallish blocksize in ZFS because of the frequent small writes?

Well the storage was crazy efficient, I kept checking to make sure it was recording what I thought it was. No way 50M hits could fit in a file that small...

Timestamps, especially north of 1000 hits/sec have many bits in common. URL, Referrer, and IP address where all just indexes. That worked really well because it was storage efficient, and made various queries like "who hit this URL", "who is our top referrer" and the like very efficient. Things that used to require ingesting a months worth of logs and spitting out a report would often be answered with a simple SQL query.

All in all using indexed columns was a huge win.