by ilaksh 755 days ago
You can 100% do that with quantized models at 8B parameters and below. Take a look at ollama to experiment. For embedding one in a game I would probably use llama.cpp or candle.

On older GPUs, though, the game itself is not going to have much VRAM left to work with, unless you use something fairly tiny like phi3-mini.

There are a lot more options if you can establish that the user has a 3090 or 4090.
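
To put rough numbers on the VRAM point, here is a back-of-envelope sketch. It only counts the quantized weights (parameters times bits per weight), so it ignores the KV cache, activations, and runtime overhead, and real usage will be higher. The function names, the 4-bit figure, and the amount of VRAM reserved for the game are my assumptions for illustration, not something measured.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead (assumption:
    weights dominate for small context sizes).
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, reserved_for_game_gib: float) -> bool:
    """True if the weights fit in the VRAM left after the game's own share.

    reserved_for_game_gib is a hypothetical budget; measure it for a real game.
    """
    free_gib = vram_gib - reserved_for_game_gib
    return weight_memory_gib(params_billion, bits_per_weight) <= free_gib


if __name__ == "__main__":
    # Illustrative inputs, not from the original comment:
    # phi3-mini has about 3.8B parameters; a 3090 or 4090 has 24 GB of VRAM.
    for name, params in [("8B model", 8.0), ("phi3-mini (3.8B)", 3.8)]:
        size = weight_memory_gib(params, bits_per_weight=4)
        print(f"{name}: ~{size:.1f} GiB of weights at 4-bit")
        print("  6 GB card, 4 GiB kept for the game:", fits(params, 4, 6, 4))
        print("  24 GB card, 8 GiB kept for the game:", fits(params, 4, 24, 8))
```

At 4 bits per weight this gives roughly 3.7 GiB for an 8B model and roughly 1.8 GiB for a 3.8B one, which is why the tiny model is the one that still fits next to a game on a small card.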