| HN Mirror


by ijk 465 days ago

There are two general categories of local inference:

- You're running a personal hosted instance. Good for experimentation and personal use, though there's a tradeoff versus renting a cloud server.

- You want to run LLM inference on client machines (i.e., you aren't directly supervising it while it is running).

I'd say that the article is mostly talking about the second one. Doing the first one will get you familiar enough with the ecosystem to handle some of the issues he ran into when attempting the second (e.g., exactly which model to use). But the second has a bunch of unique constraints--you want things to just work for your users, after all.

I've done in-browser neural network stuff in the past (back when using TensorFlow.js was a reasonable default choice) and based on the way LLM trends are going I'd guess that edge device LLM will be relatively reasonable soon; I'm not quite sure that I'd deploy it in production this month but ask me again in a few.

Relatively tightly constrained applications are going to benefit more than general-purpose chatbots; pick a small model that's relatively good at your task and train it on enough of your data and you can get a 1B or 3B model that has acceptable performance, let alone the 7B ones being discussed here. It absolutely won't replace ChatGPT (though we're getting closer to replacing ChatGPT 3.5 with small models). But if you've got a specific use case that will hold still enough to deploy a model it can definitely give you the edge versus relying on the APIs.

I expect games to be one of the first to try this: per-player-action API costs murder per-user revenue, most of the gaming devices have some form of GPU already, and most games are shipped as apps so bundling a few more GB in there is, if not reasonable, at least not unprecedented.

2 comments

aazo11 465 days ago

Very interesting. I had not thought about gaming at all but that makes a lot of sense.

I also agree the goal should not be to replace ChatGPT. I think ChatGPT is way overkill for a lot of the workloads it is handling. A good solution should probably use the cloud LLM outputs to train a smaller model to deploy in the background.


CharlieRuan 465 days ago

Curious what are some examples of "per-player-action API costs" for games?


ijk 465 days ago

Inference using an API costs money. Not a lot of money per million tokens, but it adds up if you have a lot of tokens...and some of the obvious game uses really chew through them, like chatting with a character or having an NPC make decisions via a reasoning model.

Games, on the other hand, are mostly funded via up-front purchase (so you get the money once and then have to keep the servers running) or free to play, which very carefully tracks user acquisition costs versus revenue. Most F2P games make a tiny amount per player; they make up the difference via volume (and whales). So even a handful of queries per day per player can bankrupt you if you have a million players and no way to recoup the inference cost.
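To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it (queries per day, tokens per query, price per million tokens) is a made-up placeholder rather than a real price from any provider, and `monthly_inference_cost` is just an illustrative helper name.

```python
def monthly_inference_cost(players, queries_per_day, tokens_per_query,
                           usd_per_million_tokens, days=30):
    """Estimate the monthly API bill in USD for a game's LLM calls.

    All inputs are assumptions supplied by the caller; nothing here
    reflects an actual provider's pricing.
    """
    total_tokens = players * queries_per_day * tokens_per_query * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical scenario: a million players, a handful of queries each per day.
players = 1_000_000
cost = monthly_inference_cost(
    players=players,
    queries_per_day=5,           # assumed
    tokens_per_query=1_000,      # assumed (prompt + response)
    usd_per_million_tokens=1.0,  # assumed placeholder price
)
print(f"total per month: ${cost:,.0f}")
print(f"per player per month: ${cost / players:.2f}")
```

The per-player number looks tiny, but it recurs every month for every player, including the ones who never pay anything, which is exactly where a F2P budget gets squeezed.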

Now, you can obviously add a subscription or ongoing charge to offset it, but that's not how the industry is mostly set up at the moment. I expect that the funding model will change, but meanwhile having a model on the edge device is the only currently realistic way to afford adding an LLM to a big single player RPG, for example.


K0balt 465 days ago

You release the game with a variable in-game experience. If the player has two 4090s chugging away, they run everything locally. If they've got an RX 480, they get a barebones 1B model or a subscription for the nicer AI NPCs, which can open up AI-driven side quests (adding minor content without having to write it). Include a “free” month when you register the game.
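A minimal sketch of what that tiering decision could look like. The VRAM threshold, the `NpcBackend` names, and `choose_backend` are all hypothetical; a real game would detect hardware through its engine and would likely benchmark rather than trust a single number.

```python
from enum import Enum


class NpcBackend(Enum):
    LOCAL_LARGE = "local_large"                # full model on the player's GPU(s)
    LOCAL_SMALL = "local_small"                # barebones ~1B model
    CLOUD_SUBSCRIPTION = "cloud_subscription"  # hosted model, paid monthly


def choose_backend(vram_gb, has_subscription, large_min_vram_gb=24.0):
    """Pick where NPC inference runs for this player.

    large_min_vram_gb is an assumed cutoff, not a measured requirement.
    """
    if vram_gb >= large_min_vram_gb:
        return NpcBackend.LOCAL_LARGE
    if has_subscription:
        return NpcBackend.CLOUD_SUBSCRIPTION
    return NpcBackend.LOCAL_SMALL


# Two 24 GB cards vs. an older 8 GB card, with and without a subscription.
print(choose_backend(vram_gb=48, has_subscription=False))
print(choose_backend(vram_gb=8, has_subscription=True))
print(choose_backend(vram_gb=8, has_subscription=False))
```

Checking local capability first means players with strong hardware never generate API costs, even if they also happen to hold a subscription.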


ivape 465 days ago

What if I charge "whales" in games to talk to an anime girl? Maybe I'll only let you talk to her once a day unless you pay me like a kissing booth for every convo. There's going to be some predatory stuff out there; I can see what the GP is talking about with games.


kevingadd 465 days ago

For a while basically any mobile or browser freemium game you tried would have progress timers for building things or upgrading things and they'd charge you Actual Money to skip the wait. That's kind of out of fashion now though some games still do it.
