|
|
|
|
|
by Forricide
479 days ago
|
|
Fixing everything is impractical, but I'd say a safer rule of thumb would be to at least understand small strangenesses/errors. In the case of things that are hard to fix - e.g. design/architectural decisions that lead to certain performance issues or what have you - it's still usually not too time consuming to get a basic understanding of why something is happening. Still better to quash small bugs and errors where possible, but at least if you know why they happen, you can prevent unforeseen issues. |
|
Nobody wants to triage an issue for eight weeks, but one thing to keep in mind is that the more difficult it is to triage an issue the more ignorance about the system that process is revealing - if your most knowledgeable team members are unable to even triage an issue in a modest amount of time it reveals that your most knowledgeable team members have large knowledge gaps when it comes to comprehending your system.
This, at least, goes for a vague comprehension of the cause - there are times you'll know approximately what's going wrong but may get a question from the executive suite about the problem (i.e. "Precisely how many users were affected by the outage that caused us to lose our access_log") that might take weeks or months or be genuinely nigh-on-impossible to answer - I don't count questions like that as part of issue diagnosis. And if it's a futile question you should be highly defensive about developer time.