I think they're asking how the number to stop at was determined, not what the number stopped at was.
My guess is that whether it's 64 attempts to a pass for one and 5 attempts to a fail for another simply comes down to "whether or not the author felt there was a chance random variance would result in a pass with a few more tries, based on the initial 5ish". I.e. a bit subjective, as is the overall grading in the end anyway.
That's exactly what it was. It's hard to define a discrete rubric for grading something inherently qualitative. Usually more attempts means that it seemed like the model had the "potential" to get across the finish line, so I gave it more opportunities.
If there are only a few attempts and it ends in a failure, there's a pretty good chance that I could tell the model had ZERO chance.