by lars
2273 days ago
This is cool. For those who are not super familiar with language processing, though, I think it's good to point out the limitations of what's been done here. They mention that professional speech transcription has a word error rate (WER) of around 5%, and that their method gets a WER of 3%. Sure, but the big distinction is that speech transcription must operate on an effectively unbounded set of sentences, including sentences that have never been said before. This method only has to distinguish between 30-50 sentences, and the same sentences must exist at least twice in the training set and once in the test set. Decoding word-by-word is really a roundabout way of doing a 50-way classification here. It's an invasive technique, so they need electrodes on a human cortex. This means data collection is costly, so they're operating in a very low-data regime compared to most other seq2seq applications. It seems theoretically possible that this could operate at Google Translate-level accuracy if the sentence dataset were terabyte-sized rather than kilobyte-sized. A dataset of that size seems very unlikely to be collected any time soon, so we'll need massive leaps in data efficiency in machine learning for something like this to reach that level. They explore transfer learning for this, which is nice to see. Subject-independent modelling is almost certainly a requirement to achieve significant leaps in accuracy for methods like this.
"On the other hand, the network is not merely classifying sentences, since performance is improved by augmenting the training set even with sentences not contained in the testing set (Fig. 3a,b). This result is critical: it implies that the network has learned to identify words, not just sentences, from ECoG data, and therefore that generalization to decoding of novel sentences is possible."