by brucethemoose2 820 days ago
As I understand it, the WSE-2's interconnect is actually quite good, and models are split across multiple chips, much like they are across GPUs.

And keep in mind that these nodes are hilariously "fat" compared to a GPU node (or even an 8x GPU node), meaning less congestion and overhead from the topology.
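A toy way to see the topology point, with made-up numbers. The layer count and per-node capacities below are hypothetical and are not real WSE-2 or GPU specs. If each node holds a bigger chunk of the model, you need fewer nodes. A naive contiguous pipeline split then has fewer activation hand-offs between nodes, and there are far fewer node pairs that could end up contending for links:

```python
import math


def cross_node_hops(num_layers: int, layers_per_node: int) -> int:
    """Activation transfers between nodes in one forward pass of a naive pipeline split.

    Assumes layers are assigned contiguously and each node holds at most
    `layers_per_node` layers. This is a simplification; real partitioners differ.
    """
    nodes = math.ceil(num_layers / layers_per_node)
    return nodes - 1


def all_pairs_links(num_nodes: int) -> int:
    """Node pairs that may need to talk in a fully connected (all-to-all) pattern."""
    return num_nodes * (num_nodes - 1) // 2


# Hypothetical 96-layer model; capacities are illustrative only.
NUM_LAYERS = 96
for label, capacity in [("thin node", 4), ("fat node", 48)]:
    n = math.ceil(NUM_LAYERS / capacity)
    print(label, "nodes:", n,
          "hops:", cross_node_hops(NUM_LAYERS, capacity),
          "node pairs:", all_pairs_links(n))
```

Obviously real traffic depends on the parallelism scheme and the link bandwidth, but the basic intuition holds: fewer, fatter nodes means fewer boundaries to cross.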