Hacker News
by Dylan16807 205 days ago
Depends on what you're doing. I'm pretty sure the bandwidth for inference isn't much.
1 comment

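Depends on whether it's tensor parallel (TP) or pipeline parallel (PP). PP doesn't pass much: only the activations at each stage boundary. TP does, since it needs collective communication inside every layer.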
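
To put rough numbers on that, here is a back-of-the-envelope sketch of traffic per generated token. Everything in it is an assumption for illustration, not something from the thread: fp16 activations, Megatron-style TP with two all-reduces per transformer layer in the forward pass, a ring all-reduce cost model, and made-up model dimensions. The function names are hypothetical.

```python
def pp_bytes_per_token(hidden_size, num_stages, bytes_per_elem=2):
    """Total bytes crossing all pipeline stage boundaries for one token.

    Assumes one hidden-state vector is handed from each stage to the next.
    """
    return (num_stages - 1) * hidden_size * bytes_per_elem


def tp_bytes_per_token(hidden_size, num_layers, tp_degree,
                       bytes_per_elem=2, allreduces_per_layer=2):
    """Bytes each device sends for one token under tensor parallelism.

    Assumes a ring all-reduce, where each device sends
    2 * (n - 1) / n times the message size, and two all-reduces per layer
    (an assumption matching the common Megatron-style layout).
    """
    message = hidden_size * bytes_per_elem
    per_allreduce = 2 * (tp_degree - 1) / tp_degree * message
    return num_layers * allreduces_per_layer * per_allreduce


if __name__ == "__main__":
    # Illustrative dimensions only (assumed, not from the thread).
    hidden, layers, devices = 8192, 80, 4
    pp = pp_bytes_per_token(hidden, devices)
    tp = tp_bytes_per_token(hidden, layers, devices)
    print(f"PP: {pp} bytes per token across all stage boundaries")
    print(f"TP: {tp:.0f} bytes per token sent by each device")
    print(f"TP / PP ratio: {tp / pp:.0f}x")
```

Note the two figures aren't measured the same way (PP is the total over all links, TP is per device), so the ratio understates the gap if anything. TP traffic is also on the critical path of every layer, so latency matters as much as raw bandwidth.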