by ioedward 1189 days ago
Normally people split up the model across multiple GPUs, i.e. model/tensor parallelism.
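
To make that concrete, here is a rough sketch of the tensor-parallel flavor: one weight matrix is cut column-wise into shards, each shard does its own matmul, and the partial outputs are concatenated at the end. This is a toy simulation in plain Python with no real GPUs. The "devices" are just list slices, and the names `split_columns` and `tensor_parallel_matmul` are made up for illustration, not taken from any library.

```python
# Toy simulation of column-wise tensor parallelism.
# Assumption: "devices" are just list slices; nothing runs on a real GPU.

def matmul(x, w):
    """Multiply x (n x k) by w (k x m); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]

def split_columns(w, num_devices):
    """Cut w into num_devices contiguous column shards (hypothetical helper)."""
    cols = len(w[0])
    assert cols % num_devices == 0, "sketch assumes the columns divide evenly"
    step = cols // num_devices
    return [[row[i * step:(i + 1) * step] for row in w]
            for i in range(num_devices)]

def tensor_parallel_matmul(x, w, num_devices):
    """Compute x @ w by giving each simulated device one column shard of w."""
    shards = split_columns(w, num_devices)
    # Each partial product would run on its own GPU in a real setup.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather: join the partial outputs column-wise.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
result = tensor_parallel_matmul(x, w, num_devices=2)
```

The sharded result matches a plain `matmul(x, w)`. Each device only holds its slice of the weights, which is the point when the full matrix does not fit on one GPU.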