Hacker News new | ask | show | jobs
by GordonS 1189 days ago
Nice, main.exe seems to work just fine with the 7B quantized model - generates a token every 400ms on an AMD Ryzen 5 2600!

But, quantize.exe doesn't seem to work - any valid command (such as below) pauses for a split second, then returns with no output?

$ quantize.exe ggml-model-f16.bin ggml-model-q4_0.bin 2

1 comments

In case this helps anyone else: I built it myself on Windows with CMake, and then everything just works.
Do you mind sharing the binaries?
Sure! https://filetransfer.io/data-package/8hxKAiaH#link

I wasn't sure where to upload them, and that link is only good for 50 downloads. Can put them somewhere else if you know a better location that doesn't require signup.

Thank you.

llama.exe is basically main.exe?

I actually learned how to compile this code via CMake/VS2019. It's sure a whole lot more complicated then it was 25 years ago when I was writing C.

Yes, llama.exe is actually the name the project produces - the other poster must have renamed it to main.exe.

I just did `scoop install cmake`, then built from the command line, was a doddle!