by karmasimida 622 days ago
I mean, it doesn’t necessarily need 2x QK to match the performance, in terms of accuracy, of a regular transformer, right?