by
olq_plo
397 days ago
Very cool idea. Can't wait for converted models on HF.
1 comment
MichaelMoser123
397 days ago
DeepSeek-V2, V3, and R1 all use multi-head attention.