by Veedrac
2361 days ago
The appropriately large models with public recognition I know of use attention, which is too memory-hungry to work effectively on the CS-1. The datasets aren't the issue. I'm fine with skepticism. It's certainly plausible that they don't actually do all that well.