by kodablah 3306 days ago
For those wanting to play w/ the data, there are a lot of resources [0]. I personally have combined older Retrosheet data [1] with modern MLB data for some neat uses, not the least of which was trying out tech like Druid (big data, live slicing, etc.). E.g., if you wanted data from Sunday's Houston vs. Texas game, GDX has tons of XML for parsing at [2]. There are plenty of guides that explain what is what, of course. It has been on my mind to develop a TensorFlow graph trained w/ existing data to help me win some FanDuel/DraftKings money, but I haven't as of yet (and I should note the MLB data has restrictions against bulk or commercial use).

0 - https://github.com/baseballhackday/data-and-resources/wiki/R...

1 - http://retrosheet.org/

2 - http://gdx.mlb.com/components/game/mlb/year_2017/month_06/da...
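
For anyone who wants a feel for the XML parsing step, here is a minimal sketch in Python using the standard library. Note the big assumption: the element and attribute names below (`game`, `inning`, `atbat`, `event`, etc.) are made up for illustration and are not taken from the actual GDX files, so check one of the guides for the real layout before adapting it. It parses an inline sample rather than fetching anything, which also keeps it clear of the bulk-use restrictions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The tag and attribute names are invented
# for illustration and do NOT reflect the real GDX schema.
SAMPLE_XML = """
<game id="example">
  <inning num="1">
    <atbat batter="A" event="Single"/>
    <atbat batter="B" event="Strikeout"/>
  </inning>
  <inning num="2">
    <atbat batter="C" event="Home Run"/>
  </inning>
</game>
"""


def events_by_inning(xml_text):
    """Return {inning number: [event, ...]} for the sample layout above."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for inning in root.iter("inning"):
        num = int(inning.get("num"))
        # Collect the (assumed) "event" attribute of each at-bat in order.
        result[num] = [ab.get("event") for ab in inning.iter("atbat")]
    return result


if __name__ == "__main__":
    print(events_by_inning(SAMPLE_XML))
```

Once the real tag names are swapped in, the same loop-and-collect pattern should carry over to loading rows into whatever store you are experimenting with.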

2 comments

Thank you so much for these links. Recently I've thought about building something data viz and stats related using baseball stats.

I know MLBAM has a bunch of data they keep to themselves, but I should definitely be able to find something to play with here. Many thanks for sharing!

Wow, thanks for the resources!