Hacker News
by monkmartinez 459 days ago
It's really not hard to test with llamafile or ollama, especially with smaller 7B models. Just have a go.

There are a bazillion and one hardware combinations, where even RAM timings can make a difference. Offloading even a small portion of the model to a GPU can make a HUGE difference. Some engines have been optimized to run on Pascal cards with CUDA compute capability below 7.0, and some have tricks for newer-generation cards with modern CUDA. Some engines only run on Linux, while others are completely cross-platform. It is truly the Wild West of hardware and software combinatorics. It is bewildering, to say the least.

In other words, there is no clear "best" outside of a DGX with a Linux software stack. The only way to know anything right now is to test and optimize for what you want to accomplish by running a local LLM.
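
For anyone who wants to "just test", here is a minimal sketch of timing generation speed against a local ollama server. It assumes ollama is running on its default port (11434) and that the model has already been pulled. The `benchmark` and `tokens_per_second` helper names are made up for illustration, and the model name and options in the usage note are assumptions, not recommendations.

```python
import json
import urllib.request
from typing import Optional

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute generation speed from ollama's response fields.

    eval_count is the number of generated tokens and eval_duration
    is the generation time in nanoseconds.
    """
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return response.get("eval_count", 0) / (duration_ns / 1e9)


def benchmark(model: str, prompt: str, options: Optional[dict] = None) -> float:
    """Send one non-streaming request and return tokens per second."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        # Runtime options are passed through unchanged.
        payload["options"] = options
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return tokens_per_second(json.load(resp))
```

Something like `benchmark("mistral", "Explain RAM timings in one paragraph.")` gives one data point. Repeating it with different options, for example `{"num_gpu": 8}` to change how many layers are offloaded to the GPU, is one way to see how much offloading matters on a particular machine. A single run is noisy, so average a few.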

1 comment

I built https://www.caniusellm.com/ to check whether you can run an LLM locally. You can provide a custom config too.
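
As a rough back-of-the-envelope version of that kind of check, the sketch below estimates whether a model's weights fit in a given amount of memory. This is not how the site above works (its method is not described here). It is only a common heuristic: parameter count times bits per weight, plus a guessed overhead factor for the KV cache and runtime buffers. The function names and the 1.2 overhead default are assumptions.

```python
def estimate_model_gib(params_billion: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Rough memory estimate in GiB for a model's weights.

    overhead is an assumed fudge factor for KV cache and buffers;
    real usage depends on context length and the inference engine.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30


def fits(params_billion: float, bits_per_weight: float,
         available_gib: float) -> bool:
    """True if the estimated footprint is within the available memory."""
    return estimate_model_gib(params_billion, bits_per_weight) <= available_gib


# Example: a 7B model at 4-bit quantization against 8 GiB of memory.
print(fits(7, 4, 8))
```

An estimate like this only says whether the model loads at all. It says nothing about speed, which is why actually running a benchmark is still the only real answer.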