|
Yes, shipping computations instead of data is a reasonable design goal. Your proposed system only works when the predicate is independent across all logs, though, correct? If you have to correlate or join your logs to anything, then this model becomes more complex. Not to mention, you're adding an additional performance tax on your prod machines, which could be more costly than shipping logs to a centralized store. (A team should profile and make a tradeoff decision appropriate to their context.) Additionally, what happens when we want to correlate these logs with tens of other systems? I guess I don't agree that distributed log analysis simplifies the problem any more than centralized log analysis does. If the primary concern is cost, then you can save an equivalent amount of money with a different lifecycle policy for centralized logs. EDIT: Btw, don't get me wrong, you are asking the right questions, the ones HubSpot's performance team should be asking. The first phase of a cost-savings program should weigh benefits against cost, or stated another way, requirements vs cost. You're asking the right question, i.e., uhm, how do we actually use this data after we log it? I find it striking that this cost analysis didn't say anything about the end-user's use cases or benefits. Sure, we can optimize a system and save 40% of the cost, but what if no one is using the system? Then we could save 100% of the cost. |
Setting aside that the human time required for the investigation was probably close to $40-50, it was still not a slam dunk to get the business to shrink retention to a few days for critical debug logs.