Hacker News
by MichaelMoser123 394 days ago
DeepSeek-V2, V3, and R1 all use multi-head attention.