Hacker News
by bed147373429 1055 days ago
I have a question: last week I downloaded llama-7b-chat directly from Meta's GitHub repo (https://github.com/facebookresearch/llama) using the URL they sent by e-mail. As a result, I now have the model as consolidated.00.pth.

Your commands assume the model is a .bin file, so I guess there must be a way to convert the PyTorch .pth file to a .bin file. How can I do this, and what is the difference between the two formats?

The Facebook repo provides commands for using the models, but these commands don't work on my Windows machine: "NOTE: Redirects are currently not supported in Windows or MacOs. [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to ...."

The Facebook repo does not say which OS you are supposed to use, so I assumed it would work on Windows too. But if it can work there, why would anyone need the ggerganov llama code? I am new to all of this and easily confused, so any help is appreciated.

3 comments

To be perfectly honest, I know absolutely nothing about AI or Llama; I'm just a Windows C++ programmer, so I wanted to provide CMake instructions for Windows. Sorry. The .bin file is what I got from the OP's link.
It's OK, I just followed your instructions, and with that model it works well. But are you sure that this uses CUDA? My CPU utilization is at 50% and my GPU utilization is at 1% while the output is being generated.
The CMake build reports that it finds CUDA when I run the CMakeLists (it prints the location of the CUDA headers); however, I don't see any noticeable difference between the CPU-only and CUDA builds. So if it's not working, then maybe there's a CLI option that's required, or maybe CUDA support is broken on Windows.
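One possible explanation, offered as a guess rather than a diagnosis: as far as I know, llama.cpp builds from around this time keep every layer on the CPU unless you ask for offloading with the -ngl (--n-gpu-layers) option, so a CUDA build run without it can look almost the same as a CPU-only build. Here is a minimal Python sketch that assembles such a command line. The binary name "main", the model path and the layer count are assumptions for illustration, not values taken from this thread.

```python
import subprocess


def build_main_cmd(model_path, prompt, n_gpu_layers=0, binary="main"):
    """Build an argv list for the llama.cpp example binary.

    Assumptions: the binary is called "main" (main.exe on Windows) and
    accepts -m (model), -p (prompt) and -ngl (layers to offload to the GPU).
    """
    cmd = [binary, "-m", model_path, "-p", prompt]
    if n_gpu_layers > 0:
        # Without -ngl the layers stay on the CPU (assumed default of 0).
        cmd += ["-ngl", str(n_gpu_layers)]
    return cmd


def run_main(model_path, prompt, n_gpu_layers=0, binary="main"):
    """Run the binary and raise if it exits with a non-zero status."""
    subprocess.run(
        build_main_cmd(model_path, prompt, n_gpu_layers, binary), check=True
    )


if __name__ == "__main__":
    # Hypothetical model path and layer count, shown only to print the command.
    print(" ".join(build_main_cmd("models/7B/model.bin", "Hello", n_gpu_layers=32)))
```

If GPU utilization rises once -ngl is passed, the build was fine and only the option was missing. If nothing changes, the CUDA build itself is the more likely suspect.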
llama.cpp needs the files to be in GGML format. There is a command you can run to convert from one format to the other (as well as perform quantization). Or just download the GGML version:

https://www.reddit.com/r/LocalLLaMA/wiki/models#wiki_llama_2...

try *cd llama.cpp && python convert-pth-to-ggml.py models/7B/ 1*
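If you would rather script that step, here is a rough Python sketch that wraps the same command. It assumes the llama.cpp checkout still ships convert-pth-to-ggml.py and that consolidated.00.pth (plus the files that came with it) sits in models/7B/. My understanding is that the trailing 1 selects float16 output and 0 selects float32, but treat that as an assumption. The function names are made up for this example.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "convert-pth-to-ggml.py"  # script name taken from the command above


def build_convert_cmd(model_dir, ftype=1):
    """Return the argv list equivalent to: python convert-pth-to-ggml.py <dir>/ <ftype>."""
    # Keep exactly one trailing slash so the argument mirrors the quoted command.
    directory = str(model_dir).rstrip("/") + "/"
    return [sys.executable, SCRIPT, directory, str(ftype)]


def convert(repo_dir, model_dir="models/7B", ftype=1):
    """Run the conversion from inside the llama.cpp checkout (hypothetical helper)."""
    repo = Path(repo_dir)
    if not (repo / SCRIPT).is_file():
        raise FileNotFoundError(f"{SCRIPT} not found in {repo}")
    # cwd=repo plays the role of "cd llama.cpp" in the original one-liner.
    subprocess.run(build_convert_cmd(model_dir, ftype), cwd=repo, check=True)


if __name__ == "__main__":
    # Print the command without running it.
    print(" ".join(build_convert_cmd("models/7B", 1)[1:]))
```

Calling convert("llama.cpp") does the same thing as the one-liner, but fails early with a clear error if the script is missing from the checkout.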