by jart 1112 days ago
Whenever a community project goes commercial, its interests are usually no longer aligned with the community. For example, llama.cpp makes frequent backwards-incompatible changes to its file format. I maintain a fork of ggml in the cosmopolitan monorepo that keeps support for old file formats. You can build and use it as follows:

    git clone https://github.com/jart/cosmopolitan
    cd cosmopolitan

    # cross-compile on x86-64-linux for x86-64 linux+windows+macos+freebsd+openbsd+netbsd
    make -j8 o//third_party/ggml/llama.com
    o//third_party/ggml/llama.com --help

    # cross-compile on x86-64-linux for aarch64-linux
    make -j8 m=aarch64 o/aarch64/third_party/ggml/llama.com
    # note: creates .elf file that runs on RasPi, etc.

    # compile loader shim to run on arm64 macos
    cc -o ape ape/ape-m1.c   # use xcode
    ./ape ./llama.com --help # use elf aarch64 binary above
It runs at the same speed as upstream for CPU inference. This is useful if you can't or won't regenerate your weights files, or want to download old GGML weights from Hugging Face, since this llama.com build supports every generation of the ggjt file format.
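If you aren't sure which generation an old weights file is, a rough sketch like the one below may help before you pick a loader. It is not part of cosmopolitan or llama.cpp: the function name `ggml_format` is hypothetical. The magic values (`ggml`, `ggmf`, `ggjt`, each stored as a little-endian 32-bit integer followed by a 32-bit version for the versioned formats) are assumptions based on the llama.cpp loaders of that era, so check them against the source you build.

```shell
#!/bin/sh
# Hypothetical helper: report the container magic and version of a
# legacy GGML-family weights file by inspecting its first 8 bytes.
ggml_format() {
  # Dump 8 bytes as hex and strip whitespace, e.g. "746a676703000000".
  hdr=$(od -An -tx1 -N8 "$1" | tr -d ' \t\n')
  case "$hdr" in
    746a6767*) magic=ggjt ;;   # assumed magic 0x67676a74, little-endian on disk
    666d6767*) magic=ggmf ;;   # assumed magic 0x67676d66
    6c6d6767*) echo "ggml (unversioned)"; return 0 ;;  # assumed magic 0x67676d6c
    *) echo "unknown"; return 1 ;;
  esac
  # The version is a little-endian uint32; its low byte is enough for
  # the small version numbers these formats used.
  ver=$(printf '%d' "0x$(printf '%s' "$hdr" | cut -c9-10)")
  echo "$magic v$ver"
}
```

Calling `ggml_format path/to/weights.bin` prints the detected magic and version, or `unknown` with a non-zero exit status if the header matches none of the assumed magics.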