


	by swyx 1022 days ago
	> the AI Horde approach of distributed models seems much more practical anyway.

	I wasn't aware this was a term of art. Is there a definitive blog post or product explaining this approach?


ukuina 1022 days ago

This is a reference to Kobold Horde, a distributed network of volunteer-run GPUs that anyone can send inference requests to.


brucethemoose2 1022 days ago

I didn't mean to imply splitting llama up between machines (though that is a thing with llama.cpp), but rather a pool of clients that make requests and servers that process them:

https://lite.koboldai.net/

A few users with half-decent PCs can serve a much larger group of people, and the "lesser" hosts can host smaller models to "earn" access to larger ones.
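
To make the "earn access" idea concrete, here is a rough toy sketch of a shared request pool with credits. This is not the actual Horde API or its accounting rules: the `Pool`, `User`, and `Job` names, the `small`/`large` tiers, and the cost and reward numbers are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical pricing: small-model requests are free, large ones cost credits.
COST = {"small": 0, "large": 5}
# Hypothetical reward a host earns for serving one request of each tier.
REWARD = {"small": 1, "large": 5}


@dataclass
class User:
    name: str
    credits: int = 0


@dataclass
class Job:
    requester: User
    model: str
    prompt: str


@dataclass
class Pool:
    queue: deque = field(default_factory=deque)

    def submit(self, user: User, model: str, prompt: str) -> bool:
        """Queue a request if the user can afford it; return whether it was queued."""
        cost = COST[model]
        if user.credits < cost:
            return False
        user.credits -= cost
        self.queue.append(Job(user, model, prompt))
        return True

    def serve_next(self, host: User, hosted_model: str):
        """Hand the oldest job matching the host's model to that host and pay the host."""
        for job in self.queue:
            if job.model == hosted_model:
                self.queue.remove(job)  # safe: we return right after removing
                host.credits += REWARD[hosted_model]
                return job
        return None


if __name__ == "__main__":
    pool = Pool()
    modest_host = User("modest_host")  # can only run the small model
    visitor = User("visitor")          # has no hardware, only sends requests

    # The modest host serves five small requests and earns credits for them.
    for i in range(5):
        pool.submit(visitor, "small", f"prompt {i}")
        pool.serve_next(modest_host, "small")

    # Those credits now cover one request to a large model on someone else's GPU.
    print(pool.submit(modest_host, "large", "hello"))
```

The point is just that requests and hosting go through one shared queue, so a host that can only run a small model still builds up credit it can later spend on a large model run by someone else.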
