by make3 235 days ago
The models are huge, so no single latest-gen model can fit on a single GPU.

It's likely that these are small, unpopular (non-flagship) models, or that they only pack, e.g., one layer of each model per GPU.
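
To put rough numbers on the "doesn't fit" claim, here is a back-of-envelope sketch. The model sizes, the 80 GiB GPU, and the 2-bytes-per-parameter (16-bit weights) figure are all illustrative assumptions, not numbers from the article, and the helper names are made up for this example. It counts weights only, so real usage (KV cache, activations) would be higher.

```python
# Back-of-envelope check: do a model's weights alone fit in one GPU's memory?
# All sizes here are illustrative assumptions, not figures from the article.

def weight_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GiB.

    Ignores KV cache, activations, and framework overhead.
    """
    return params_billion * 1e9 * bytes_per_param / 2**30


def fits(params_billion: float, gpu_gib: float, bytes_per_param: int = 2) -> bool:
    """True if the weights alone fit within gpu_gib of GPU memory."""
    return weight_gib(params_billion, bytes_per_param) <= gpu_gib


GPU_GIB = 80  # hypothetical 80 GiB accelerator

for size in (7, 70, 400):  # hypothetical model sizes, in billions of parameters
    print(f"{size}B params: {weight_gib(size):.1f} GiB of weights, "
          f"fits on one GPU: {fits(size, GPU_GIB)}")
```

Under these assumptions only the small model fits with room to spare, which is why packing several models per GPU suggests either small models or only a slice of each.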

1 comments

Per the very short article, the solution was to pack multiple models per GPU.
Yes, but that could mean one layer per model.