by cavisne 836 days ago
A recent presentation on the architecture

https://youtu.be/WQDMKTEgQnY?si=W0E9Kq6P280l3Wcl

IMO we still need an MLPerf submission or similar to really understand whether this is more efficient across the board, or more efficient only if you also want to minimize latency.

Nvidia has pulled enough rabbits out of the hat when it comes to MLPerf that I’m still not convinced they can’t work some CUDA magic and undercut them on efficiency.