Hacker News new | ask | show | jobs
by brucethemoose2 1186 days ago
Not if the bus is wide enough :P. EPYC Genoa is already ~450GB/s, and the M2 max is 400GB/s.

Anyway, what I was implying is that simply fitting a trillion parameter model into a single pool is probably more efficient than splitting it up over a power hungry interconnect. Bandwidth is much lower, but latency is also slower, you are shuffling much less data around.