Hacker News
Pool spare GPU capacity to run LLMs at larger scale (github.com)
11 points by i386 80 days ago
3 comments

> MoE models via expert sharding with zero cross-node inference traffic

This makes the whole project questionable
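The skepticism likely comes from how MoE layers usually work. Under plain expert sharding, each node holds a subset of experts, and the router can send any token to any expert. A token's activations then have to cross the network whenever it is routed to an expert on another node. The sketch below counts those remote dispatches for a single MoE layer. It assumes a standard top-k router, and the expert-to-node layout and logits are made up for illustration. The thread doesn't say how the linked project avoids this traffic.

```python
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest router scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def cross_node_dispatches(
    router_logits: Sequence[Sequence[float]],
    expert_to_node: Sequence[int],
    token_home_node: Sequence[int],
    k: int = 2,
) -> int:
    """Count (token, expert) pairs where the chosen expert lives on a different
    node than the token, i.e. activations that would have to be sent over the
    network for this one MoE layer under naive expert sharding."""
    count = 0
    for tok, logits in enumerate(router_logits):
        for expert in top_k(logits, k):
            if expert_to_node[expert] != token_home_node[tok]:
                count += 1
    return count


# Hypothetical layout: experts 0,1 on node 0 and experts 2,3 on node 1;
# three tokens whose hidden states live on node 0; top-2 routing.
expert_to_node = [0, 0, 1, 1]
logits = [
    [0.9, 0.8, 0.1, 0.0],  # picks experts 0, 1: both local
    [0.1, 0.7, 0.9, 0.0],  # picks experts 2, 1: one remote
    [0.0, 0.1, 0.5, 0.6],  # picks experts 3, 2: both remote
]
print(cross_node_dispatches(logits, expert_to_node, [0, 0, 0]))  # 3
```

In a real model this repeats at every MoE layer, and the expert outputs also have to travel back. Getting to zero cross-node traffic would mean either constraining routing or keeping every expert a token can reach on its own node.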

This is very promising, definitely looks more user-friendly than exo. Can't wait to try it out.

You lost me on "spare GPU". I don't have any capable GPUs, let alone spare ones :)