Hacker News
by jameshart 1495 days ago
There are certainly academics who collect and study spoken language corpora, not just written ones - it’s very much a matter of what gets collected and catalogued though. The fact that early citations here are from Usenet speaks to the availability and searchability of that corpus much more than to its role in the origination of written speech. Transcripts of IRC, MUDs and AIM chats are not collected and indexed, so they don’t get referenced.

Similarly, with spoken corpora, what gets collected tends to be things like interviews with old people recorded to preserve dialects, or material from local radio news - rather than random conversations among young people.

I guess by virtue of ‘tape in the studio just kept rolling’ there might be rather more recorded examples of band members chatting away over the years than of other similarly aged groups.