by rewmie
944 days ago
> It sounds like they were in a place that a lot of companies are in where they don't have a single pane of glass for observability.

One of the biggest features of AWS, and one that is very easy to take for granted or overlook, is Amazon CloudWatch. It supports metrics, logging, alarms, metrics from alarms, alarms from alarms, querying historical logs, triggering actions, and so on, and it covers every single service provided by AWS, including metaservices like AWS Config and CloudTrail. And you barely notice it. It's just there, and you can see everything.

> One of the terrible mistakes I see companies make with this tooling is fragmenting like this.

So much this. It's no fun to have to dig through logs and metrics for any application, and much less so if for some reason its maintainers scattered their metrics emission to the four winds. However, with AWS all roads lead to CloudWatch, and everything is so much better.
|
Most of my clients are not a good fit for AWS CloudWatch, because most of their developers don't have the development, testing, and operational maturity and discipline to use CloudWatch cost-effectively (this is at root an organizational problem, but let's not go off on that giant tangent). So the only realistic tracing strategy we have converged on recommending for them is "grab everything, and retain it for as long as we could be blamed for not knowing the root cause" (which in some specific cases can be years!), while we undertake the long journey with them to upskill their teams.
Doing this with CloudWatch everywhere would make it rapidly climb into the top three largest line items in the AWS bill, easily justifying bringing that tracing functionality in-house. So we wind up opting for self-managed tooling like Elastic Observability or Honeycomb, where the pricing is friendlier to teams in the unfortunate situation of needing to start by capturing everything for CYA, much as I would like to stay within CloudWatch.
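For a rough sense of why "grab everything and keep it for years" gets expensive, here is a back-of-the-envelope sketch. It assumes a simple pricing model with one per-GB ingestion fee and one per-GB-month storage fee. The function name, the 30-day month, and every number in the example call are illustrative placeholders, not actual AWS or vendor rates, so plug in figures from your own bill.

```python
def monthly_log_cost(gb_per_day, retention_days,
                     ingest_price_per_gb, storage_price_per_gb_month):
    """Estimate the steady-state monthly cost of ingesting and retaining logs.

    Assumes a flat per-GB ingestion fee and a flat per-GB-month storage fee.
    Ignores compression, tiering, query charges, and free tiers.
    """
    # Data ingested in one (30-day) month.
    ingested_gb = gb_per_day * 30
    # Once the retention window has filled up, the store holds
    # retention_days worth of data at any given time.
    stored_gb = gb_per_day * retention_days
    return (ingested_gb * ingest_price_per_gb
            + stored_gb * storage_price_per_gb_month)


if __name__ == "__main__":
    # Placeholder inputs: 100 GB/day, kept for one year vs. three years.
    for days in (365, 3 * 365):
        cost = monthly_log_cost(100, days,
                                ingest_price_per_gb=0.50,
                                storage_price_per_gb_month=0.03)
        print(f"retention={days} days -> ~{cost:,.0f} per month")
```

In this model the ingestion term is fixed by volume, while the storage term grows linearly with the retention window, which is why multi-year "retain everything" policies are the part that hurts.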
Has anyone found a better solution for these use cases, where the development maturity level is more prosaic, or is this really the best local maximum at the industry's current SOTA?