by swyx 1022 days ago
> the AI Horde approach of distributed models seems much more practical anyway.

I wasn't aware this was a term of art. Is there a definitive blog post or product explaining this approach?

1 comment

This is a reference to Kobold Horde, a distributed volunteer network of GPUs that accepts inference requests.

I didn't mean to imply splitting llama up between machines (though that is a thing with llama.cpp), but rather a pool of clients that submit requests and volunteer servers that process them:

https://lite.koboldai.net/

A few users with half-decent PCs can serve a much larger group of people, and the "lesser" hosts can serve smaller models to "earn" access to larger ones.
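
To make the "earn access" idea concrete, here is a toy sketch of a shared request queue with a credit ledger. Everything in it is an assumption for illustration: the `Horde`, `Account`, and `Job` names, the model names, and the prices in `COST` are made up, and rejecting requests from accounts without enough credits is a simplification. It is not the Horde's actual API or accounting rules.

```python
from dataclasses import dataclass

# Hypothetical per-job prices; bigger models cost (and pay) more.
COST = {"small-model": 1, "large-model": 5}


@dataclass
class Account:
    name: str
    credits: int = 0


@dataclass
class Job:
    requester: Account
    model: str
    prompt: str


class Horde:
    """Toy coordinator: clients queue jobs, volunteer hosts pull them."""

    def __init__(self):
        self.queue = []

    def submit(self, requester, model, prompt):
        """Queue a job if the requester can pay for it; return success."""
        cost = COST[model]
        if requester.credits < cost:
            return False
        requester.credits -= cost
        self.queue.append(Job(requester, model, prompt))
        return True

    def serve_next(self, worker, hosted_models):
        """Hand the oldest job this host can run to it and pay the host."""
        for i, job in enumerate(self.queue):
            if job.model in hosted_models:
                self.queue.pop(i)
                worker.credits += COST[job.model]
                return job
        return None
```

In this sketch a host that can only run `small-model` serves five small jobs, ends up with 5 credits, and can then spend them on one `large-model` request that a better-equipped host picks up.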