| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by brucethemoose2 975 days ago

> On Windows, that usually means you need to open up the MSVC x64 native command prompt and run llamafile there, for the first invocation, so it can build a DLL with native GPU support. After that, $CUDA_PATH/bin still usually needs to be on the $PATH so the GGML DLL can find its other CUDA dependencies.

Yeah, I think the setup lost most users there.

A separate model/app approach (like Koboldcpp) seems way easier TBH.

Also, GPU support is assumed to be CUDA or Metal.

2 comments

jart 975 days ago

Author here. llamafile will work on stock Windows installs using CPU inference. No CUDA or MSVC or DLLs are required! The dev tools are only required to be installed, right now, if you want get faster GPU performance.

link

vsnf 974 days ago

My attempt to run it with the my VS 2022 dev console and a newly downloaded CUDA installation ended in flames as the compilation stopped with "error limit reached", followed by it defaulting to a CPU run.

It does run on the CPU though, so at least that's pretty cool.

link

jart 974 days ago

I've received a lot of good advice today on how we can potentially improve our Nvidia story so that nvcc doesn't need to be installed. With a little bit of luck, you'll have releases soon that get your GPU support working.

link

abareplace 974 days ago

The CPU usage is around 30% when idle (not handling any HTTP requests) under Windows, so you won't want to keep this app running in background. Otherwise, it's a nice try.

link

fragmede 975 days ago

I'm sure doing better by windows users is on the roadmap, exec then reexec to get into the right runtime, but it's a good first step towards making things easy.

link