HN Mirror


by andrekandre 123 days ago

   The sgai_rsp_matmul_q4() stub is planned for RSP microcode:

     DMA Q4 weight tiles into DMEM (4KB at a time)
     VMULF/VMADH vector multiply-accumulate for 8-lane dot products
     Estimated 4-8× speedup over scalar VR4300 inference
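The tiling plan quoted above can be sketched in plain C as a scalar reference. This is not the project's code and not RSP microcode. It only illustrates walking a Q4 weight matrix in chunks that fit a 4 KB budget. The names `q4_dot`, `matmul_q4_tiled`, and `TILE_BYTES` are hypothetical, and the Q4 layout (two 4-bit weights per byte, low nibble first, stored with an offset of 8, no per-block scale) is an assumption, since the thread does not specify the format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one weight tile may use up to 4 KB, the DMEM size quoted above.
   In practice DMEM must also hold activations and outputs, so a real tile
   would likely be smaller. */
#define TILE_BYTES 4096

/* Dot product of n packed Q4 weights with n int8 activations.
   Assumed layout: low nibble first, each nibble decodes to (value - 8). */
static int32_t q4_dot(const uint8_t *packed, const int8_t *x, size_t n)
{
    assert(n % 2 == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 2) {
        uint8_t b = packed[i / 2];
        int32_t w0 = (int32_t)(b & 0x0F) - 8;  /* low nibble  */
        int32_t w1 = (int32_t)(b >> 4) - 8;    /* high nibble */
        acc += w0 * x[i] + w1 * x[i + 1];
    }
    return acc;
}

/* y = W * x for a rows x cols Q4 matrix, processed one tile of rows at a time. */
static void matmul_q4_tiled(const uint8_t *w, const int8_t *x, int32_t *y,
                            size_t rows, size_t cols)
{
    assert(cols > 0 && cols % 2 == 0);
    size_t row_bytes = cols / 2;
    assert(row_bytes <= TILE_BYTES);
    size_t rows_per_tile = TILE_BYTES / row_bytes;

    for (size_t r0 = 0; r0 < rows; r0 += rows_per_tile) {
        size_t r1 = r0 + rows_per_tile;
        if (r1 > rows) r1 = rows;
        /* On real hardware, this is where one tile would be DMA'd into DMEM
           before the per-row dot products run. Here we read it in place. */
        for (size_t r = r0; r < r1; r++)
            y[r] = q4_dot(w + r * row_bytes, x, cols);
    }
}
```

A vectorized version would replace the inner loop of `q4_dot` with 8-lane multiply-accumulates. The scalar form is mainly useful as a correctness baseline to compare against.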

----

The RSP is the gift that keeps on giving; such a forward-looking architecture (shame about the Rambus latency, though)


AutoJanitor 123 days ago

We are going to use the GPU's 128-bit SIMD soon, but it only has 4 KB of addressable RAM, so the matmul offload will happen in small chunks!


andrekandre 123 days ago

That's really cool work; I wish I could get paid to do stuff like this. More power to you all ^^


AutoJanitor 123 days ago

I am doing this for 0 dollars. I am a self-funded AI research lab. So when people diss me I get a little jaded, but then I remember I am doing cool stuff, even if others don't see it. That's enough for me!
