by Grosvenor
612 days ago
https://news.ycombinator.com/item?id=36871528 Hah. Yes. It looks like they only show up in models with 6.7B parameters or more. The problem can start at 125M. Small enough to test on a whim. So train a model that exhibits these behaviours, then try it out.