| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by philipov 1136 days ago
	Uh... is there some way to use this without connecting to a server? Like, for a game that can be played offline? Finding a way to make the machine learning piece a completely self-contained library that can be shipped at scale to run on individual computers is the big hurdle to making AI like this practical for games. If I have to rely on your service staying up for my game to work, that's an unacceptable supply chain risk.

3 comments

AgentK20 1136 days ago

EDIT: Actually there's apparently been a lot of progress recently that I hadn't kept up with; see the replies to this comment.

Original message: From a quick peek at the source, this depends on the ChatGPT API for the underlying LLM. It could probably be modified to use a local copy of an LLM, but most models I've seen are 300GB+ and require significant computational resources to operate (think several $15k NVIDIA A100 compute nodes). There's a lot of effort being put in by the open source community to minimize these models and run them on commodity hardware, but as of yet the quality of the responses from the model are correlated with how large (and therefore how much compute) the model has. Give it a year or two and it'll probably be more reasonable to integrate a local LLM for gaming purposes.

link

theaiquestion 1136 days ago

> most models I've seen are 300GB+ and require significant computational resources to operate (think several $15k NVIDIA A100 compute nodes).

What? Where have you been the last 3 months?

> the quality of the responses from the model are correlated with how large (and therefore how much compute) the model has

There's a lot more to this including the model structure, training methods, number of training tokens, quality of training data, etc.

I'm not at all saying that Vicuna/Alpaca/SuperCOT/Other llama based models are as good as GPT3.5 - but they should be capable of this, they still create coherent answers.

You need preferably 24GB of vram, but you can get away with less, or you can use system memory (although that'll be slow).

There is a openai api proxy that might let this work without too much work actually

EDIT: It actually says in the readme they plan to support StableLM which is interesting because at least at the moment that's not a well performing model

EDIT 2: You should try the replit2.8B model - This is surprisingly good at programming - https://huggingface.co/spaces/replit/replit-code-v1-3b-demo

link

inhumantsar 1136 days ago

Even if you're a more lightweight model, it's still not very practical to require a dedicated 24GB GPU for every active gamer, whether local or cloud hosted.

For all intents and purposes, it's as much of a non-starter in a production game as the multiple A100 scenario.

Of course that isn't going to remain the case for long as the recent advancements in optimization make their way into live systems, but still.

link

theaiquestion 1136 days ago

> it's still not very practical to require a dedicated 24GB GPU

totally agreed, you could get away with 12GB too which is in the midrange.

That said yeah it's still not something you could make a game with yet, I'm just pointing out 300GB+ of VRAM isn't the bar for entry here, it is reachable for medium-high end consumers but that's not really including the games resources either, and most gamers aren't medium-high end so...

link

nm980 1136 days ago

> EDIT: It actually says in the readme they plan to support StableLM which is interesting because at least at the moment that's not a well performing model

I chose StableLM because that's the only other model I knew of besides ChatGPT. I'm open to adding support for other models after I fix some bugs first.

link

theaiquestion 1136 days ago

You might consider supporting ooba's api which would give you a lot of support for different things really quickly.

https://github.com/oobabooga/text-generation-webui/

link

nullsense 1136 days ago

Yeah, I second this. I use this frequently and lots of models downloaded that I test out with it. I'm keen to see a more API led approach.

link

AgentK20 1136 days ago

Oh, fair enough. I hadn't been keeping up too much but hadn't realized they had progressed that far. I'll have to do some tinkering this evening.

link

MacsHeadroom 1136 days ago

7B parameter models are more than enough for this and run faster than talking pace on even a low end CPU.

Even a finetuned 3B model would be excellent for generative agents and only use about 2GB of RAM to at high speeds on even a single core CPU.

link

holografix 1135 days ago

Can you share some examples Of what models you’re referring to?

link

T-A 1135 days ago

https://github.com/ggerganov/llama.cpp

https://huggingface.co/mosaicml/mpt-7b

link

woah 1136 days ago

Dunno. I only ever play games that require an internet connection. I doubt this is an issue for most players.

link

ehnto 1135 days ago

Games are increasingly moments in time, moments where all the services are working, and other people are playing. Get it while it's hot or you'll be playing a dead world.

Do I think that's good? Absolutely not, I think it's terrible. But the commentor I'm replying to is right, "most players" won't care at all, else we wouldn't be in this position.

link