by smoldesu 957 days ago
Your hardware should be fine for inferencing, as long as you don't bother trying to get the GPU working.

My $0.02 would be to try getting LocalAI running on your machine with OpenCL/CLBlast acceleration for your CPU. If you're running other things, you could limit the inferencing process to 2 or 3 threads. That should get it working; I've been able to inference even 13B models on cheap Rockchip SoCs. Your CPU should be fine, even if it's a little outdated.
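
If it helps, here is a rough sketch of how you might pick that thread count so a couple of cores stay free for everything else. The helper name `pick_threads` and its defaults are made up for illustration. I'm also assuming LocalAI reads a `THREADS` setting from its environment, so check the LocalAI docs for the exact option name before relying on it.

```python
import os


def pick_threads(total_cores, reserve=2, cap=3):
    """Return a thread count that leaves `reserve` cores free.

    The result is never above `cap` and never below 1.
    `total_cores` may be None, since os.cpu_count() can return None.
    """
    total = total_cores or 1
    return max(1, min(cap, total - reserve))


if __name__ == "__main__":
    threads = pick_threads(os.cpu_count())
    # Assumption: LocalAI picks up a THREADS value from its environment.
    print(f"THREADS={threads}")
```

On a 4-core machine this lands on 2 threads, which matches the "2 or 3" suggestion while keeping the desktop usable.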

LocalAI: https://github.com/mudler/LocalAI

Some decent models to start with:

TinyLlama (extremely small/fast): https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGU...

Dolphin Mistral (larger size, better responses): https://huggingface.co/TheBloke/dolphin-2.1-mistral-7B-GGUF

1 comment

Thank you so much. I actually tried to run Mistral and TinyLlama with ollama, without success, but I've never limited the inferencing process or anything like that. Let me try LocalAI.