by jart
1112 days ago
Whenever a community project goes commercial, its interests are usually no longer aligned with the community's. For example, llama.cpp makes frequent backwards-incompatible changes to its file format. I maintain a fork of ggml in the cosmopolitan monorepo that keeps support for old file formats. You can build and use it as follows:

git clone https://github.com/jart/cosmopolitan
cd cosmopolitan
# cross-compile on x86-64-linux for x86-64 linux+windows+macos+freebsd+openbsd+netbsd
make -j8 o//third_party/ggml/llama.com
o//third_party/ggml/llama.com --help
# cross-compile on x86-64-linux for aarch64-linux
make -j8 m=aarch64 o/aarch64/third_party/ggml/llama.com
# note: creates .elf file that runs on RasPi, etc.
# compile loader shim to run on arm64 macos
cc -o ape ape/ape-m1.c # use xcode
./ape ./llama.com --help # use elf aarch64 binary above

It runs at the same speed as upstream for CPU inference. This is useful if you can't/won't recreate your weights files, or want to download old GGML weights from HuggingFace, since llama.com supports every generation of the ggjt file format.
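
If you aren't sure which generation a downloaded weights file is, a rough sketch like the one below can guess it from the first four bytes. The function name `weights_format` is made up for illustration and is not part of the fork. The magic values are my understanding of the ggml/ggmf/ggjt/GGUF headers (the older ones are stored as little-endian 32-bit integers), so check them against the loader source before relying on this.

```shell
# Hypothetical helper (not part of cosmopolitan or ggml): print the container
# format of a weights file based on its first four bytes.
weights_format() {
  # od prints the bytes as space-separated hex; tr strips spaces and newlines.
  magic=$(od -An -tx1 -N 4 "$1" | tr -d ' \n')
  case "$magic" in
    6c6d6767) echo ggml ;;    # assumed magic 0x67676d6c, little-endian on disk
    666d6767) echo ggmf ;;    # assumed magic 0x67676d66
    746a6767) echo ggjt ;;    # assumed magic 0x67676a74
    47475546) echo gguf ;;    # ASCII "GGUF", the newer upstream format
    *)        echo unknown ;;
  esac
}
```

Calling `weights_format model.bin` on a file should print one of the names above. Anything reported as ggml, ggmf, or ggjt is the kind of older file this comment is about.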