| HN Mirror

Yes, exactly — normalization is really useful for reasons of quality and correctness, but generally not so important for data like logs that's rotating through the system on a pretty constant basis.

And addressing the parent's point on databases: they don't look like an RDMS, but you can kind of think of log management/querying systems like Splunk et al. to be like a specialized database with specific properties:

- Flexible indexing: Logs change frequently which makes keys come and go, so it's convenient not to not have to be constantly building new indexes to make them searchable.

- Optimized for recent data: Newer logs tend to be accessed relatively frequently and older logs much more rarely (if ever), so it's generally a good idea for these systems will rotate data through different tiers of storage as they age — the new on fast machines with fast disks, the old on slower machines with large disks, and the very old probably just in S3 or something.

- High volume: Any of the traditional relational databases would have a lot of trouble with the volume of data that we put through Splunk. (That said, its problem domain is more constrained — it scales horizontally much more easily because it doesn't have have to concern itself with things like consensus around write consistency.)