Author here! These 10B log lines are from the last 60 days of activity from https://gocardless.com/ systems. They include:

- System logs, such as our Kubernetes VM host logs, or our Chef Postgres machines
- Application logs from Kubernetes pods
- HTTP and RPC logs
- Audit logs from Stackdriver (we use GCP for all our infrastructure)

> do you ever do anything with them that requires you to store so much data rather than just a representative subset?

Some of the logs are already sampled, such as VPC flow logs, but the majority aim for 100% capture. Especially for application logs, which are used for audit and many other purposes, developers expect all of their logs to stick around for 60d.

Why we do this is quite simple: for the amount of value we get from storing this data, in terms of introspection, observability and in some cases differentiated product capabilities like fraud detection, the cost of running this cluster is quite a bargain.

I suspect we'll soon cross a threshold where keeping everything will cost us more than it's worth, but I'm confident we can significantly reduce our costs with a simple tagging system, where developers mark logs as requiring shorter retention windows.

Hopefully that gives you a good answer! In case you're interested, my previous post mentioned how keeping our HTTP logs around in a queryable form was really useful for helping make a product decision: https://blog.lawrencejones.dev/connected-data/