Hacker News
by
YetAnotherNick
146 days ago
It depends on whether you are using tensor parallelism or pipeline parallelism. In the second case you don't need any sharing.
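
To make the distinction concrete, here is a toy sketch in plain Python. It assumes "sharing" means devices exchanging partial results inside a layer; that reading is my assumption, not something stated above. The "devices" are just ordinary function calls, and `matvec`, `tensor_parallel_layer`, and the tiny weight matrices are hypothetical names and values made up for illustration.

```python
# Toy illustration only: "devices" are simulated by plain function calls.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W1 = [[1, 2], [3, 4]]   # layer 1 weights (made-up values)
W2 = [[0, 1], [1, 0]]   # layer 2 weights (made-up values)
x = [1, 1]

# Reference: one device runs both layers.
reference = matvec(W2, matvec(W1, x))

# Tensor parallelism: the rows of every layer are split across two devices.
def tensor_parallel_layer(W, x):
    half = len(W) // 2
    part0 = matvec(W[:half], x)   # "device 0" computes the first rows
    part1 = matvec(W[half:], x)   # "device 1" computes the remaining rows
    return part0 + part1          # gather step, repeated after every layer

tp_out = tensor_parallel_layer(W2, tensor_parallel_layer(W1, x))

# Pipeline parallelism: "device 0" owns layer 1, "device 1" owns layer 2.
stage0_out = matvec(W1, x)        # runs entirely on device 0
pp_out = matvec(W2, stage0_out)   # device 1 only receives the activation
```

Both splits give the same result as the single-device run. The difference is that the tensor-parallel version has to combine partial results inside every layer, while the pipeline version only hands one activation vector from one stage to the next.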