by gcaprio 5730 days ago
I'm actually glad someone brought this up. I'm starting a company around this very idea: making data available and consumable. Our first site is up:

http://www.cfbreference.com

There's about 5 years of data that we've culled from the NCAA about CFB. We're adding more every week and will soon go back in time for historical data.

But our twist is that the site will be upgraded to be completely consumable: full REST API support, dynamic URL data generation and more. We're adding new stuff every day. So you can get the data your way in JSON, RDF, XML & HTML depending on your Accept header, a query string parameter or even URL parameters.
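To make the format selection concrete, here is a rough sketch of how a server might pick between JSON, RDF, XML and HTML. It is not the site's actual code. The precedence (URL suffix first, then query string, then Accept header), the `format` parameter name, the `example.org` paths and the helper names `parse_accept` and `choose_format` are all assumptions made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Media types mapped to the four formats named in the post.
MEDIA_TYPES = {
    "application/json": "json",
    "application/rdf+xml": "rdf",
    "application/xml": "xml",
    "text/xml": "xml",
    "text/html": "html",
}
FORMATS = {"json", "rdf", "xml", "html"}


def parse_accept(header):
    """Return media types from an Accept header, highest q-value first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media_type))
    # Sort by descending q, then by original position in the header.
    return [media_type for _, _, media_type in sorted(ranked)]


def choose_format(url, accept_header="", default="html"):
    """Pick a response format; the precedence order is an assumption."""
    parts = urlsplit(url)

    # 1. URL suffix, e.g. /teams/1.json (hypothetical path)
    last_segment = parts.path.rsplit("/", 1)[-1]
    if "." in last_segment:
        ext = last_segment.rsplit(".", 1)[-1].lower()
        if ext in FORMATS:
            return ext

    # 2. Query string, e.g. ?format=xml (hypothetical parameter name)
    requested = parse_qs(parts.query).get("format", [""])[0].lower()
    if requested in FORMATS:
        return requested

    # 3. Accept header: first supported media type by preference
    for media_type in parse_accept(accept_header):
        if media_type in MEDIA_TYPES:
            return MEDIA_TYPES[media_type]

    return default


print(choose_format("http://example.org/teams/1.json", "text/html"))
print(choose_format("http://example.org/teams?format=xml", "application/json"))
print(choose_format("http://example.org/teams", "text/html;q=0.5, application/rdf+xml"))
```

Letting the explicit URL and query string win over the header makes links easy to share and test in a browser, while API clients can still rely on plain Accept negotiation. Wildcards such as `*/*` are not matched here and simply fall through to the HTML default.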

We are going to try and build apps on top of this data, but the data sites are and will remain FREE. We want to encourage community participation and contributions. That means free for anyone, anywhere, even if you yourself don't contribute data.

We're also going to add scoring / charting apps for mobile phones so that you can chart your own games and, if you'd like, contribute the data back to us.

We're not 100% there yet, but I'll post here when we are. We'd love feedback from the entire HN community, not only on the sports data aspect but on the technical implementation. After all, if it's not easy to use & powerful, we're not doing a good enough job.

1 comments

It definitely seems that there must be an opportunity here to compete against STATS and others. But if (like you) you wanted to compete, where do you legally get the stats, both in the first place and on an ongoing basis? What is the original source of the data? Do you have to pay for game tapes, watch them and compile the stats yourself?
I'm also curious as to how you get the data. The fact that there seems to be almost a monopoly by STATS on the data leads me to believe that there needs to be more competition in this space. Curious as to how you guys are going about it.
The NCAA publishes their data as batch feeds for use. We're starting with just using that. In addition to the current data they publish that way (2000 onward), a lot of old data is available dating back to the 1800s in various formats. (I'm speaking of college football here in particular.)

However, I believe there are certain restrictions surrounding real-time access to the data. We haven't come up against that obstacle yet, since we're first attacking historical archiving of data after the fact, even if it's only 1 day after the games have been played.