by brucethemoose2 1022 days ago

I didn't mean to imply splitting llama up between machines (though that is a thing with llama.cpp), but rather a pool of clients and servers, where clients submit requests and servers process them:

https://lite.koboldai.net/

A few users with half-decent PCs can serve a much larger group of people, and the "lesser" hosts can serve smaller models to "earn" access to larger ones.
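
To make the "earn access" idea concrete, here is a rough sketch of how a credit-based pool could work. It is not the implementation behind the linked service; the `Member` and `Pool` classes, the model tiers, and the credit amounts are all made up for illustration. The assumption is that small-model requests are free, serving any request earns credits, and large-model requests cost credits.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """A participant who can both request and serve (hypothetical)."""
    name: str
    credits: int = 0


@dataclass
class Pool:
    # Assumed prices and rewards, chosen only for the example.
    cost: dict = field(default_factory=lambda: {"small": 0, "large": 3})
    reward: dict = field(default_factory=lambda: {"small": 1, "large": 3})
    queue: list = field(default_factory=list)

    def submit(self, requester: Member, model: str) -> bool:
        """Queue a request if the requester can afford the model tier."""
        price = self.cost[model]
        if requester.credits < price:
            return False
        requester.credits -= price
        self.queue.append((requester.name, model))
        return True

    def serve_next(self, host: Member, hosted_models: set) -> bool:
        """Let a host process the oldest queued request it is able to run."""
        for i, (_, model) in enumerate(self.queue):
            if model in hosted_models:
                self.queue.pop(i)
                host.credits += self.reward[model]
                return True
        return False
```

With these numbers, a host that can only run the small model has to serve three small requests before it can afford one large request, which a better-equipped host then picks up and gets credited for.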