We tend to operate on server performance metrics, error rates, and networking metrics. We've found through practice that these metrics tend to reveal most of the issues we're targeting.
If you meant it in the more mathematical sense, we perform our clustering in a normalized Euclidean space.
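To make "normalized Euclidean space" concrete, here is a rough sketch of what that could look like. It is not our actual pipeline: the server names, the readings, and the choice of z-score normalization are all assumptions for illustration, and it only covers the normalization and distance step, not a full clustering algorithm.

```python
import math
from statistics import mean, pstdev

def normalize(rows):
    """Z-score each column so every metric contributes on a comparable scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Fall back to 1.0 so a constant metric does not cause division by zero.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

# Hypothetical readings per server: (CPU %, errors/min, network MB/s).
readings = {
    "web-1": (41.0, 2.0, 120.0),
    "web-2": (43.0, 3.0, 118.0),
    "web-3": (40.0, 2.0, 125.0),
    "web-4": (95.0, 40.0, 30.0),
}
names = list(readings)
points = normalize([readings[n] for n in names])

def mean_distance_to_others(i):
    """Average Euclidean distance from server i to every other server."""
    return mean(
        math.dist(points[i], points[j]) for j in range(len(points)) if j != i
    )

scores = {name: mean_distance_to_others(i) for i, name in enumerate(names)}
outlier = max(scores, key=scores.get)
print(outlier)
```

Without the normalization step, the network column (values around 120) would dominate the distance and the error-rate column would barely count, which is the main reason to normalize before measuring distance.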
It appears to cluster on a particular server metric (e.g. % CPU used), for readings of that metric across a group of servers. One of the example graphs had 'errors per servers,' and I'd presume each color on that graph represents a different server.