It’s an imperial measure of the number of sentences in a story. The metric version is the “Gilgamesh”, a reference to a prototype story maintained by ISO in Paris.
It's a relative, abstract measure of case size that stays the same across experience levels. A junior and a senior should both be able to agree that a given case is small, medium, or large relative to the kind of cases their team usually handles, even if the case would take two hours for the senior and two weeks for the junior. Story points codify small/medium/large into numbers (the Fibonacci sequence is a common choice, e.g. 1, 2, 3, 5, 8, 13, where 13 often means "too big for one sprint").
Mapping story points to time doesn't really work for individual cases because of those different experience levels: it will depend heavily on who does the case. Instead, you track the total story points completed by the whole team over the entire sprint. The different experience levels average out into something consistent, like 30-35 story points per sprint.
"Velocity" is related Scrum terminology: it's the mapping of that whole-team measure back to time. A previous team that understood how this worked, and stuck to it, had consistent story point totals per two-week sprint, so we could estimate things months out with reasonable accuracy despite the different skill levels.
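A minimal sketch of that velocity arithmetic, assuming two-week sprints. The sprint history, the 200-point backlog, and the function names are hypothetical, chosen only to illustrate the idea: average the whole team's completed points per sprint, then divide the remaining backlog by that average.

```python
import math
from statistics import mean


def average_velocity(completed_points_per_sprint: list[int]) -> float:
    """Mean story points the whole team completed per sprint (hypothetical helper)."""
    if not completed_points_per_sprint:
        raise ValueError("need at least one completed sprint")
    return mean(completed_points_per_sprint)


def sprints_needed(backlog_points: int, velocity: float) -> int:
    """Whole sprints needed to burn down the backlog at the given velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)


# Hypothetical whole-team totals from the last six sprints.
history = [31, 34, 30, 35, 33, 32]
v = average_velocity(history)   # 32.5
n = sprints_needed(200, v)      # ceil(200 / 32.5) = 7
print(f"velocity {v:.1f} pts/sprint -> {n} sprints ({n * 2} weeks)")
```

Note that no individual case is ever converted to hours here; only the team-wide average is mapped back to calendar time.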
I also thought this post was going to be about story points, because this is a common complaint from people who don't understand the "different experience levels" part. If everyone on the team reliably took the same amount of time for a given case, then yeah, you could cut story points out and just estimate in time. But that's not what they're for.
I had a manager institute PERT estimations for every task/sticky, which was interesting but not necessarily worth it.
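For reference, the classic PERT three-point estimate weights the most likely value four times: E = (O + 4M + P) / 6, with a spread commonly approximated as (P - O) / 6. The sketch below applies that per sticky and sums tasks assuming they are independent; the class name and the example numbers (in days) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ThreePointEstimate:
    optimistic: float
    most_likely: float
    pessimistic: float

    def __post_init__(self) -> None:
        if not (self.optimistic <= self.most_likely <= self.pessimistic):
            raise ValueError("expected optimistic <= most_likely <= pessimistic")

    def expected(self) -> float:
        # Classic PERT weighting: (O + 4M + P) / 6
        return (self.optimistic + 4 * self.most_likely + self.pessimistic) / 6

    def std_dev(self) -> float:
        # Common PERT approximation of spread: (P - O) / 6
        return (self.pessimistic - self.optimistic) / 6


def combine(tasks: list[ThreePointEstimate]) -> tuple[float, float]:
    """Sum expected values; add variances, assuming the tasks are independent."""
    total = sum(t.expected() for t in tasks)
    spread = math.sqrt(sum(t.std_dev() ** 2 for t in tasks))
    return total, spread


tasks = [ThreePointEstimate(2, 4, 12), ThreePointEstimate(1, 2, 3)]
total, spread = combine(tasks)
print(f"expected {total:.2f} days, sd {spread:.2f}")  # expected 7.00 days, sd 1.70
```

The independence assumption is doing a lot of work: tasks that share a risk (the same flaky dependency, say) will overrun together, and the combined spread will understate that.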
In the end, the work takes the time it takes, and nobody knows how long that will be ahead of time. Fiddling around with estimates helps with ranking but not prediction.
If the work takes the time it takes and nobody knows how long that will be, why not track predictions against outcomes and iterate, building up the experience and data that would enable prediction and refinement of those predictions?
Over time the estimates should trend closer to the outcomes, as the process gets better at breaking work down and specifying the details that affect both the prediction and the work, and as the statistical gap from previous estimates gets baked into future estimates. The process, the team's capabilities, the ability to identify diverging factors, and the correction of initial estimates should all mature concurrently.
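One simple way to "bake in" the gap is a calibration factor: the typical ratio of actual to estimated effort over past work, applied to new raw estimates. This is a minimal sketch of that idea, not a method from the thread; the helper names and the (estimate, actual) history are hypothetical, and the median is used so one wild overrun doesn't dominate.

```python
from statistics import median


def calibration_factor(history: list[tuple[float, float]]) -> float:
    """Median of actual/estimate ratios over past (estimate, actual) pairs."""
    ratios = [actual / estimate for estimate, actual in history if estimate > 0]
    if not ratios:
        raise ValueError("no usable history")
    return median(ratios)


def calibrated(raw_estimate: float, history: list[tuple[float, float]]) -> float:
    """Scale a fresh estimate by how far past estimates typically missed."""
    return raw_estimate * calibration_factor(history)


# Hypothetical (estimated, actual) days for five finished tasks.
past = [(2, 3), (5, 6), (3, 3), (8, 12), (1, 2)]
print(calibration_factor(past))  # 1.5
print(calibrated(4, past))       # 6.0

# Each finished task is appended, so the factor keeps tracking the team.
past.append((4, 5))
```

If the estimating process really is improving, the factor should drift toward 1.0 as new history replaces old; a rolling window over recent tasks would make that drift show up sooner.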
The entire point of using fuzzy numbers is to enable fuzzy yet usable predictions. Similar work in a similar situation, armed with specific statistics and outcomes, should become highly predictable at the team and individual level over time.
Sure, that’s the theory behind middle management. Unfortunately the bosses who run the places keep saying yes to things that have never been done before and for which few priors exist.
Alternatively, it captures a bit of so many things (tech debt in the codebase, the mental health of the team, task risks) that it's best to avoid trying to link it to any one thing.
The past X weeks of point-estimates is what you use to forecast which things fit in the next Y weeks, and you can't have both stability and forecast accuracy. Any attempt to permanently "peg" a point to a certain number of man-hours is going to interfere with that accuracy.
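A minimal sketch of forecasting from the past X weeks without ever pegging points to hours: average the team's recently completed points, scale to the next Y weeks, then take backlog items in priority order until the next one would overflow. The function names, weekly totals, and backlog items are hypothetical.

```python
def forecast_capacity(weekly_points: list[int], window_weeks: int, horizon_weeks: int) -> float:
    """Average points/week over the last `window_weeks`, scaled to `horizon_weeks`."""
    recent = weekly_points[-window_weeks:]
    if window_weeks <= 0 or not recent:
        raise ValueError("need a positive window and some history")
    return sum(recent) / len(recent) * horizon_weeks


def what_fits(backlog: list[tuple[str, int]], capacity: float) -> list[str]:
    """Take items in priority order; stop at the first one that would exceed capacity."""
    chosen: list[str] = []
    used = 0
    for name, points in backlog:
        if used + points > capacity:
            break
        chosen.append(name)
        used += points
    return chosen


weekly = [14, 18, 15, 17, 16, 20, 15, 17]  # hypothetical completed points per week
cap = forecast_capacity(weekly, window_weeks=4, horizon_weeks=2)  # 17/week * 2 = 34
backlog = [("login", 8), ("search", 13), ("export", 5), ("audit", 13)]
print(cap, what_fits(backlog, cap))  # 34.0 ['login', 'search', 'export']
```

Because the forecast only uses the team's own recent point history, whatever a "point" currently absorbs (tech debt, team health, risk) is carried into the forecast automatically.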