| HN Mirror

Elo converges on stable scores fairly quickly, depending on the K-factor. I wouldn't think it would be much of an issue at all for something like this, since you can ensure you're testing against every other member (avoiding "Elo islands"). But obviously the more trials the better.

The Glicko rating system is very similar to Elo, but it also models the variance of a given rating. It can directly tell you a "rating deviation."