by euclaise 996 days ago
They actually have a performance edge, but they aren't well suited to chat models, because you can't cache past states the way you can with decoder-only models.
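A rough sketch of the caching point, not anyone's actual implementation. It assumes the contrast is with encoder-style bidirectional attention, which the comment doesn't spell out. The toy attention uses the token vectors directly as queries, keys and values (no learned projections), and all function names here are made up for illustration.

```python
import math

def attend(query, keys, values):
    """Single-head dot-product attention for one query vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def causal_states(tokens):
    # Position i only sees tokens[0..i], as in a decoder-only model.
    return [attend(tokens[i], tokens[:i + 1], tokens[:i + 1])
            for i in range(len(tokens))]

def bidirectional_states(tokens):
    # Every position sees the whole sequence (assumed encoder-style attention).
    return [attend(t, tokens, tokens) for t in tokens]

def close(a, b, tol=1e-12):
    # Element-wise comparison of two lists of vectors.
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

# Toy "chat history" of three token vectors, then one new token appended.
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extended = history + [[2.0, -1.0]]

# Are the states of the first three positions unchanged after the append?
causal_reusable = close(causal_states(history), causal_states(extended)[:3])
bidirectional_reusable = close(bidirectional_states(history),
                               bidirectional_states(extended)[:3])
print(causal_reusable, bidirectional_reusable)  # True False
```

With the causal mask, the old positions never look at the new token, so their states can be kept and only the new position needs computing. With bidirectional attention every old state shifts once a token is appended, so in a chat setting the whole history would have to be re-encoded on each turn.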