by pmarreck
1351 days ago
I wonder what would happen if you stored the columns in separate tables (perhaps in pairs?) and queried them with a join on a shared ID (perhaps through a view or materialized view), so that compression could really take advantage of highly self-similar data being stored together. Also, I assume you used a smallish block size in ZFS because of the frequent small writes?
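A minimal sketch of the split-table idea, using SQLite purely for illustration. The table names `hit_time_ip` and `hit_url_ref` and the view `hits` are hypothetical. SQLite has no materialized views, so a plain view does the join here; any compression gain would have to come from the storage layer (e.g. ZFS) or a columnar engine, not from this schema by itself.

```python
import sqlite3

# Hypothetical schema: each pair of log columns lives in its own table,
# keyed by a shared hit id, and a view stitches each row back together.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hit_time_ip (id INTEGER PRIMARY KEY, ts INTEGER, ip TEXT);
CREATE TABLE hit_url_ref (id INTEGER PRIMARY KEY, url TEXT, referrer TEXT);
CREATE VIEW hits AS
    SELECT t.id, t.ts, t.ip, u.url, u.referrer
    FROM hit_time_ip AS t
    JOIN hit_url_ref AS u ON u.id = t.id;
""")

rows = [
    (1, 1700000000000, "10.0.0.1", "/", "https://example.com/"),
    (2, 1700000000001, "10.0.0.2", "/about", "https://example.org/"),
]
for hit_id, ts, ip, url, ref in rows:
    # Each logical hit is split across both tables under the same id.
    conn.execute("INSERT INTO hit_time_ip VALUES (?, ?, ?)", (hit_id, ts, ip))
    conn.execute("INSERT INTO hit_url_ref VALUES (?, ?, ?)", (hit_id, url, ref))

# Queries go through the view as if it were one wide table.
result = conn.execute("SELECT url, referrer FROM hits WHERE ip = ?", ("10.0.0.2",)).fetchall()
print(result)  # [('/about', 'https://example.org/')]
```

The trade-off is that every full-row read now pays for a join, in exchange for each table holding fewer, more similar columns.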
Timestamps, especially at north of 1000 hits/sec, have many bits in common. URL, Referrer, and IP address were all just indexes. That worked really well because it was storage-efficient, and it made queries like "who hit this URL" and "who is our top referrer" very efficient. Things that used to require ingesting a month's worth of logs and spitting out a report could often be answered with a simple SQL query.
All in all, using indexed columns was a huge win.
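One way to read "URL, Referrer, and IP address were all just indexes" is that each hit row stores small integer IDs pointing into lookup tables, with each distinct string stored once. The sketch below assumes that interpretation and uses SQLite for illustration; the table names and the `lookup_id`/`log_hit` helpers are hypothetical, not the poster's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls      (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE referrers (id INTEGER PRIMARY KEY, referrer TEXT UNIQUE);
CREATE TABLE ips       (id INTEGER PRIMARY KEY, ip TEXT UNIQUE);
-- Each hit stores only a timestamp and three small integer ids.
CREATE TABLE hits (
    ts          INTEGER,
    url_id      INTEGER REFERENCES urls(id),
    referrer_id INTEGER REFERENCES referrers(id),
    ip_id       INTEGER REFERENCES ips(id)
);
CREATE INDEX hits_url ON hits(url_id);
CREATE INDEX hits_referrer ON hits(referrer_id);
""")

def lookup_id(table, column, value):
    """Return the id for value, inserting it into the lookup table if it is new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    return conn.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,)).fetchone()[0]

def log_hit(ts, url, referrer, ip):
    conn.execute(
        "INSERT INTO hits VALUES (?, ?, ?, ?)",
        (ts, lookup_id("urls", "url", url),
         lookup_id("referrers", "referrer", referrer),
         lookup_id("ips", "ip", ip)),
    )

log_hit(1700000000000, "/", "https://example.com/", "10.0.0.1")
log_hit(1700000000001, "/about", "https://example.com/", "10.0.0.2")
log_hit(1700000000002, "/", "https://example.org/", "10.0.0.1")

# "Who is our top referrer?"
top = conn.execute("""
    SELECT r.referrer, COUNT(*) AS n
    FROM hits h JOIN referrers r ON r.id = h.referrer_id
    GROUP BY h.referrer_id ORDER BY n DESC LIMIT 1
""").fetchone()
print(top)  # ('https://example.com/', 2)

# "Who hit this URL?"
who = conn.execute("""
    SELECT DISTINCT i.ip FROM hits h
    JOIN urls u ON u.id = h.url_id
    JOIN ips  i ON i.id = h.ip_id
    WHERE u.url = ? ORDER BY i.ip
""", ("/",)).fetchall()
print(who)  # [('10.0.0.1',)]
```

Both report-style questions become an indexed lookup plus a join, rather than a scan over raw log text.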