Hacker News new | ask | show | jobs
by zsch 4777 days ago
The 2012 data I used as the basis of my program actually had the same thing you describe – the longest streak was an 8 game losing streak despite having more wins than losses overall.

And I understand exactly where you're coming from. This is very preliminary, and if anything it was good coding practice for me. Though I very much intend to incorporate more significant factors like the lineup, the opposing team, and their history.

1 comments

First improvement: do this for every team ever. Then combine for all teams, first in an individual season, then try basing the win% iteratively based on more history.

Based on these models, you should have some good examples of selection bias, and see how the model changes based on what you are not testing for, but what is implicit in the data (since data is merely a set of samples of data generated by one iteration of the (unknowable to some degree) true talent functions for each team (player, lineup decision, injury, close call by an ump, etc.)

If you're interested in going down the rabbit hole, there's tons of people who can show the way (and they're nice! At least tangotiger is way nicer than he should be in listening to people who have put no effort in understanding what is good and what is beginner's blind bliss)

Hot and cold streaks are just random variance, so is whether balls are hit within reach of fielders or safely out of reach, given a certain contact quality (ground ball, fly ball, infield pop up, or line drive all have vastly different tendencies to fall for a hit - line drives ~.600-700 babip if I recall, FB ~ low .200ish, GB ~ .300, pop up 0ish?) point is these are all known, to se degree, given the historical data.

If anyone wants to explore this stuff further let me know & I can point you to the right spots to help a specific interest?