Hacker News
by Veedrac 2361 days ago
The suitably large, publicly recognized models I know of all use attention, which is too memory-hungry to work effectively on the CS-1. The datasets aren't the issue.
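
To make the memory point concrete, here is a rough back-of-envelope sketch, assuming standard dense self-attention where the full score matrix is materialized. The function name and the example sizes are hypothetical, chosen only for illustration; they are not measurements of any particular model or of the CS-1.

```python
def attention_score_bytes(seq_len, num_heads, batch_size=1, bytes_per_elem=2):
    """Bytes needed to hold the attention score matrices for one layer.

    Assumes dense self-attention: one (seq_len x seq_len) matrix per head
    per batch item, stored at bytes_per_elem (2 = fp16). This ignores
    weights, other activations, and any memory-saving tricks.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem


# Hypothetical sizes: the cost grows with the square of the sequence length.
for seq_len in (512, 1024, 2048):
    mib = attention_score_bytes(seq_len, num_heads=16) / 2**20
    print(f"seq_len={seq_len}: {mib:.0f} MiB per layer")
```

Doubling the sequence length quadruples this term, and it is paid per layer and per batch item, which is why it dominates quickly.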

I'm fine with skepticism. It's certainly plausible that they don't actually do all that well.