by kybernetikos
1207 days ago
There was an interesting comment a while back about the problem of generating "a" or "an" correctly for a token generator. To get it right, you have to predict what you'll generate next. Smaller models get this wrong. Even ChatGPT, which doesn't get this wrong, has limits on its ability to look ahead into its own likely output. I suspect this is just a difficult task for a token generator, and that fixing it naturally requires a much bigger model. All these hacks that fix problems by maintaining a "train of thought" are fascinating, though, given that we seem to have evolved a similar hack.
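A toy sketch of the look-ahead point, assuming a made-up distribution over the noun the model "intends" to say next. The names (`NOUN_PROBS`, `article_for`, `article_probs`, `generate`) and all numbers are hypothetical. The article has to be committed before the noun is emitted, so its probability is effectively a sum over the nouns it could introduce:

```python
from __future__ import annotations

# Hypothetical distribution over the noun that will follow the article.
NOUN_PROBS = {"apple": 0.5, "banana": 0.3, "orange": 0.2}


def article_for(word: str) -> str:
    # Crude spelling-based rule; real English follows the vowel *sound*
    # ("an hour", "a university"), which this toy ignores.
    return "an" if word[0].lower() in "aeiou" else "a"


def article_probs(noun_probs: dict[str, float]) -> dict[str, float]:
    """Probability of each article = sum of the probabilities of the nouns it fits."""
    probs = {"a": 0.0, "an": 0.0}
    for noun, p in noun_probs.items():
        probs[article_for(noun)] += p
    return probs


def generate(noun_probs: dict[str, float]) -> str:
    # Left to right, greedily: commit to the most likely article first...
    probs = article_probs(noun_probs)
    article = max(probs, key=probs.get)
    # ...then pick the most likely noun that is still consistent with it.
    compatible = {n: p for n, p in noun_probs.items() if article_for(n) == article}
    noun = max(compatible, key=compatible.get)
    return f"{article} {noun}"


print(generate(NOUN_PROBS))  # an apple
```

The commitment matters: with `{"apple": 0.4, "banana": 0.35, "cherry": 0.25}` the article step picks "a" (0.6 vs 0.4), so the output is "a banana" even though "apple" was the single most likely noun. Getting the article right means implicitly doing that sum over future tokens, which is the kind of look-ahead the comment is talking about.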