Hacker News
by euclaise, 996 days ago
They actually have a performance edge, but they aren't well suited to chat models: you can't cache past states the way you can with decoder-only models.
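
To make the caching point concrete, here's a rough toy sketch, not a real model. It assumes "they" means architectures that attend bidirectionally over the conversation so far (the comment doesn't name them here), and it uses a single attention layer with identity projections (q = k = v = x). The helper names `attention` and `prefix_unchanged` are made up for illustration. The idea: with a causal mask, the states for earlier tokens don't change when a new chat turn is appended, so they can be cached; with bidirectional attention they do change, so they'd have to be recomputed.

```python
import math

def attention(xs, causal):
    """Toy single-head self-attention with identity projections (q = k = v = x)."""
    out = []
    for i, q in enumerate(xs):
        # Causal: position i sees positions 0..i only.
        # Bidirectional: position i sees every position, including later ones.
        visible = xs[: i + 1] if causal else xs
        scores = [sum(a * b for a, b in zip(q, k)) for k in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, visible)) / total
            for d in range(len(q))
        ])
    return out

def prefix_unchanged(causal):
    """Check whether the states of earlier tokens survive appending a new turn."""
    history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # tokens already processed
    new_turn = [[0.5, -1.0]]                         # next chat message
    before = attention(history, causal)
    after = attention(history + new_turn, causal)[: len(history)]
    return all(
        math.isclose(a, b)
        for row_before, row_after in zip(before, after)
        for a, b in zip(row_before, row_after)
    )

print(prefix_unchanged(causal=True))    # causal mask
print(prefix_unchanged(causal=False))   # bidirectional attention
```

The first call prints `True` and the second prints `False`: only in the causal case can the earlier states be reused as-is when the conversation grows. Real models have learned projections and many layers, but the same dependency argument applies layer by layer.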