
Well, the storage was crazy efficient. I kept checking to make sure it was recording what I thought it was. No way 50M hits could fit in a file that small...

Timestamps, especially north of 1000 hits/sec, have many bits in common. URL, referrer, and IP address were all just indexes. That worked really well because it was storage efficient, and it made queries like "who hit this URL" or "who is our top referrer" very efficient. Things that used to require ingesting a month's worth of logs and spitting out a report could often be answered with a simple SQL query.
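To make the "just indexes" idea concrete, here is a minimal sketch using Python's built-in sqlite3. The table names, column names, and helper functions are all made up for illustration, and SQLite is only a stand-in; this is not the actual schema or database I used. Each distinct URL, referrer, and IP is stored once in a lookup table, and each hit row holds only small integer IDs.

```python
import sqlite3

# Hypothetical lookup tables; each distinct string is stored exactly once.
LOOKUP_TABLES = {"urls", "referrers", "ips"}

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE urls      (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        CREATE TABLE referrers (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        CREATE TABLE ips       (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        -- A hit is a timestamp plus three small integers.
        CREATE TABLE hits (
            ts INTEGER, url_id INTEGER, referrer_id INTEGER, ip_id INTEGER
        );
        CREATE INDEX hits_by_url ON hits(url_id);
    """)
    return conn

def intern(conn, table, value):
    """Return the integer ID for value, inserting it on first sight."""
    if table not in LOOKUP_TABLES:
        raise ValueError(f"unknown lookup table: {table}")
    conn.execute(f"INSERT OR IGNORE INTO {table}(value) VALUES (?)", (value,))
    row = conn.execute(f"SELECT id FROM {table} WHERE value = ?", (value,))
    return row.fetchone()[0]

def record_hit(conn, ts, url, referrer, ip):
    conn.execute(
        "INSERT INTO hits VALUES (?, ?, ?, ?)",
        (ts, intern(conn, "urls", url),
         intern(conn, "referrers", referrer),
         intern(conn, "ips", ip)),
    )

def top_referrer(conn):
    """'Who is our top referrer': one GROUP BY over integer IDs."""
    return conn.execute("""
        SELECT r.value, COUNT(*) AS n
        FROM hits h JOIN referrers r ON r.id = h.referrer_id
        GROUP BY r.value ORDER BY n DESC, r.value LIMIT 1
    """).fetchone()

def who_hit(conn, url):
    """'Who hit this URL': distinct client IPs for one URL."""
    rows = conn.execute("""
        SELECT DISTINCT i.value
        FROM hits h
        JOIN urls u ON u.id = h.url_id
        JOIN ips  i ON i.id = h.ip_id
        WHERE u.value = ? ORDER BY i.value
    """, (url,)).fetchall()
    return [r[0] for r in rows]
```

With something like this, a report is a call such as `top_referrer(conn)` instead of a pass over raw log files, and a URL that is hit millions of times costs one string plus one integer per hit.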

All in all, using indexed columns was a huge win.