Hacker News
by namibj, 905 days ago
You can easily use pipeline parallelism though, especially if you have 8-16 lanes of PCIe 4.0 with direct P2P access between the cards.
IIRC you want micro-batching though, to overlap pipeline phases.
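
A toy sketch of why micro-batching matters here, assuming a GPipe-style forward-only schedule where each stage (one card) takes one time slot per micro-batch and transfer time between cards is ignored. The function names are made up for illustration and do not come from any library.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return a list where entry t holds the (stage, microbatch) pairs busy at time slot t.

    Assumption: stage s can start micro-batch m one slot after stage s-1 finished it,
    so micro-batch m is on stage s during slot t = s + m.
    """
    total_slots = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_slots):
        active = [
            (s, t - s)
            for s in range(num_stages)
            if 0 <= t - s < num_microbatches
        ]
        schedule.append(active)
    return schedule


def bubble_fraction(num_stages, num_microbatches):
    """Fraction of (stage, slot) pairs in which a stage sits idle."""
    total = num_stages * (num_stages + num_microbatches - 1)
    busy = num_stages * num_microbatches
    return 1 - busy / total


if __name__ == "__main__":
    # Print which stages are busy in each slot for 4 cards and 8 micro-batches.
    for t, active in enumerate(pipeline_schedule(4, 8)):
        print(t, active)
    print("idle fraction, 1 micro-batch: ", bubble_fraction(4, 1))
    print("idle fraction, 8 micro-batches:", bubble_fraction(4, 8))
```

Under these assumptions, 4 stages fed one undivided batch leave 75% of the stage slots idle, while splitting into 8 micro-batches brings that down to 3/11, roughly 27%. Real schedules also interleave backward passes and pay for the P2P transfers, so treat the numbers as an upper bound on the benefit.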