Hacker News
by erickj 641 days ago
If I were going to build this prototype, I'd start with just a semistructured textual play-by-play recap as the input. Also including roster, injury, and schedule information with a fairly basic prompt would probably go a long way.

This data exists for most live games at this point via various web services. I'm sure ESPN has significant resources internally to source that info.
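
A minimal sketch of what that input assembly might look like, assuming the play-by-play, roster, injury, and schedule data have already been fetched as plain strings. The `GameContext` class, its field names, and the prompt wording are all made up for illustration; nothing here reflects any real provider's schema.

```python
from dataclasses import dataclass, field


@dataclass
class GameContext:
    # All fields are hypothetical; a real feed would define its own schema.
    home: str
    away: str
    plays: list[str]  # semistructured play-by-play lines, in game order
    roster_notes: list[str] = field(default_factory=list)
    injuries: list[str] = field(default_factory=list)
    schedule_notes: list[str] = field(default_factory=list)


def _section(title: str, lines: list[str]) -> str:
    # Render one titled bullet list; mark empty sections explicitly so the
    # model is not tempted to invent missing facts.
    body = "\n".join(f"- {line}" for line in lines) if lines else "- (none provided)"
    return f"{title}:\n{body}"


def build_recap_prompt(ctx: GameContext) -> str:
    # Combine a fairly basic instruction with the structured context.
    parts = [
        f"Write a short recap of {ctx.away} at {ctx.home}.",
        "Use only the facts listed below.",
        _section("Play by play", ctx.plays),
        _section("Roster", ctx.roster_notes),
        _section("Injuries", ctx.injuries),
        _section("Schedule", ctx.schedule_notes),
    ]
    return "\n\n".join(parts)
```

The resulting string would be passed to whatever text model the prototype uses; that call is left out because it depends entirely on the chosen service.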

2 comments

I don't think ESPN does anything that takes significant resources. That's all handled by Sportradar or ... there's another big provider but their name eludes me. They basically firehose you all the game information as structured data, and you can use it programmatically however you'd like.
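
To illustrate what "use it programmatically" could mean, here is a rough sketch that tallies a running score from a stream of structured events. The newline-delimited JSON format and the field names (`seq`, `team`, `type`, `points`) are invented for this example; actual provider feeds have their own formats and are not shown here.

```python
import json
from collections import Counter

# Hypothetical newline-delimited JSON events; every field name is an assumption.
SAMPLE_FEED = """\
{"seq": 1, "team": "A", "type": "shot", "points": 2}
{"seq": 2, "team": "B", "type": "foul", "points": 0}
{"seq": 3, "team": "A", "type": "shot", "points": 3}
"""


def score_from_events(feed_text: str) -> dict[str, int]:
    # Sum the points carried by each event, keyed by team.
    totals: Counter[str] = Counter()
    for line in feed_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between events
        event = json.loads(line)
        totals[event["team"]] += event.get("points", 0)
    return dict(totals)
```

Calling `score_from_events(SAMPLE_FEED)` on the sample above gives `{"A": 5, "B": 0}`.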
I assume this is what lets baseball games show obscure factoids like "3rd in the NL West when facing left-handed pitchers on Tuesday"?
You have the Elias Sports Bureau to thank for all the fun baseball stats out there: https://en.wikipedia.org/wiki/Elias_Sports_Bureau
Definitely. I have no experience in live game statistics, but from my sports content experience I bet there are data scientists and applications behind the scenes that specifically pull this data to be read on-air.
I imagine the primary customers of the data feeds are gambling companies who let people bet on matches that are in progress.
Yeah it feels like the ideal way is to feed in a transcript of the announcer audio + some standard stats. That would ensure you catch both the human stories & the factual content.

But I wonder if there are licensing issues with using the audio/transcript to generate your summary. I know that the raw stats are public domain, but I wouldn't be surprised if they can't use the transcripts or audio.