by zsch
4777 days ago
The 2012 data I used as the basis of my program actually had the same thing you describe: the longest streak was an 8-game losing streak, despite the team having more wins than losses overall. And I understand exactly where you're coming from. This is very preliminary, and if anything it was good coding practice for me, though I very much intend to incorporate more significant factors like the lineup, the opposing team, and their history.

Based on these models, you should find some good examples of selection bias, and see how the model changes based on what you are not testing for but what is implicit in the data. The data is merely one set of samples, generated by a single iteration of the (to some degree unknowable) true talent functions for each team (player, lineup decision, injury, close call by an ump, etc.).
If you're interested in going down the rabbit hole, there are tons of people who can show the way (and they're nice! At least tangotiger is way nicer than he should be in listening to people who have put no effort into understanding what is good and what is beginner's blind bliss).
Hot and cold streaks are just random variance. So is whether a ball is hit within reach of a fielder or safely out of reach, given a certain contact quality: ground balls, fly balls, infield pop-ups, and line drives all have vastly different tendencies to fall for a hit (line drives ~.600-.700 BABIP if I recall, FB ~ low .200s, GB ~ .300, pop-ups ~0?). The point is that these are all known, to some degree, given the historical data.
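To make the streak point concrete, here is a rough sketch in Python of how you could check it yourself. It is not the program discussed above. It assumes a 162-game season, a made-up true win probability of 0.55, and fully independent games, and the function names are just for illustration. It counts how often a winning-talent team still runs into a losing streak of 8 or more games purely by chance.

```python
import random


def longest_losing_streak(results):
    """Return the length of the longest run of losses (False values)."""
    longest = current = 0
    for won in results:
        # Reset the run on a win, extend it on a loss.
        current = 0 if won else current + 1
        longest = max(longest, current)
    return longest


def simulate_seasons(win_prob=0.55, games=162, seasons=2000, seed=0):
    """Simulate seasons of independent games; return each season's longest losing streak.

    win_prob, games, and seasons are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs match
    streaks = []
    for _ in range(seasons):
        season = [rng.random() < win_prob for _ in range(games)]
        streaks.append(longest_losing_streak(season))
    return streaks


if __name__ == "__main__":
    streaks = simulate_seasons()
    share = sum(s >= 8 for s in streaks) / len(streaks)
    print(f"Simulated seasons with a losing streak of 8+ games: {share:.1%}")
```

Changing win_prob shows how the streak distribution shifts with true talent. Real games are not independent coin flips (pitching matchups, opponents, injuries), so treat this as a baseline for what randomness alone produces.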
If anyone wants to explore this stuff further, let me know and I can point you to the right spots for your specific interest.