by GordonS 1236 days ago

Nice, main.exe seems to work just fine with the 7B quantized model - generates a token every 400ms on an AMD Ryzen 5 2600!

But quantize.exe doesn't seem to work: any valid command (such as the one below) pauses for a split second, then returns with no output.

$ quantize.exe ggml-model-f16.bin ggml-model-q4_0.bin 2
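
When a command returns silently like this, checking its exit code and captured output can help narrow down what happened. A minimal Python sketch, assuming the file names from the command above; `run_and_report` is a hypothetical helper name, not part of the project:

```python
import subprocess


def run_and_report(cmd):
    """Run cmd, capture its output, and return (exit_code, stdout, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Example usage (assumed invocation, copied from the command above):
# code, out, err = run_and_report(
#     ["quantize.exe", "ggml-model-f16.bin", "ggml-model-q4_0.bin", "2"]
# )
# print(f"exit code: {code}, stdout: {out!r}, stderr: {err!r}")
```

A nonzero exit code with empty stdout and stderr would suggest the program failed before printing anything, rather than finishing successfully.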

GordonS 1235 days ago

In case this helps anyone else: I built it myself on Windows with CMake, and then everything just works.

starik36 1234 days ago

Do you mind sharing the binaries?

GordonS 1233 days ago

Sure! https://filetransfer.io/data-package/8hxKAiaH#link

I wasn't sure where to upload them, and that link is only good for 50 downloads. Can put them somewhere else if you know a better location that doesn't require signup.

starik36 1233 days ago

Thank you.

llama.exe is basically main.exe?

I actually learned how to compile this code via CMake/VS2019. It sure is a whole lot more complicated than it was 25 years ago when I was writing C.

GordonS 1232 days ago

Yes, llama.exe is actually the name the project produces - the other poster must have renamed it to main.exe.

I just did `scoop install cmake`, then built from the command line, was a doddle!
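
The exact build commands aren't given here. As a rough sketch, assuming a standard out-of-source CMake build run from the project root, the two steps could be scripted like this; `cmake_commands` and `build_project` are hypothetical helper names, and the `build` directory and `Release` configuration are assumptions:

```python
import subprocess


def cmake_commands(source_dir=".", build_dir="build", config="Release"):
    """Return the configure and build commands as argument lists."""
    # Step 1: generate the build system in a separate build directory.
    configure = ["cmake", "-S", source_dir, "-B", build_dir]
    # Step 2: compile; --config matters for multi-config generators
    # such as Visual Studio.
    build = ["cmake", "--build", build_dir, "--config", config]
    return [configure, build]


def build_project(source_dir=".", build_dir="build", config="Release"):
    """Run each CMake step in order, stopping at the first failure."""
    for cmd in cmake_commands(source_dir, build_dir, config):
        subprocess.run(cmd, check=True)
```

Calling `build_project()` requires `cmake` on the PATH (for example after `scoop install cmake`) and a compiler toolchain that CMake can find.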
