by yaantc 805 days ago
Use llamafile [1]. It can be as simple as downloading a file (for Mixtral, [2]), making it executable, and running it. The repo README has all the info. The process is simple, and downloading the model is what takes the most time.

In my case I hit the runtime detection issue (explained in the README "gotcha" section). I solved it by running "assimilate" [3] on the downloaded llamafile.

    [1] https://github.com/Mozilla-Ocho/llamafile/
    [2] https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile?download=true
    [3] https://cosmo.zip/pub/cosmos/bin/assimilate
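
For anyone who wants to script it, here is a rough Python sketch of the three steps (download, make executable, run), plus the optional "assimilate" workaround. The two URLs are copied from [2] and [3]. The local file names, the helper function names, and the exact way "assimilate" is invoked (the llamafile path as its only argument) are my assumptions, so check the README before relying on them.

```python
import os
import stat
import subprocess
import urllib.request

# URLs taken from references [2] and [3] in the comment.
MODEL_URL = (
    "https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile/"
    "resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile?download=true"
)
ASSIMILATE_URL = "https://cosmo.zip/pub/cosmos/bin/assimilate"


def download(url, dest):
    """Fetch url into dest, skipping the download if dest already exists."""
    if not os.path.exists(dest):
        partial = dest + ".part"  # hypothetical temp name, avoids half-written files
        urllib.request.urlretrieve(url, partial)
        os.replace(partial, dest)
    return dest


def make_executable(path):
    """Equivalent of `chmod +x path`: add execute bits to the current mode."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def assimilate(llamafile):
    """Workaround for the runtime detection issue.

    Assumption: the tool is called with the llamafile path as its only argument.
    """
    tool = make_executable(download(ASSIMILATE_URL, "assimilate"))
    subprocess.run([os.path.abspath(tool), llamafile], check=True)


def main():
    # "mixtral.llamafile" is a made-up local name; the download is large.
    llamafile = make_executable(download(MODEL_URL, "mixtral.llamafile"))
    # Uncomment only if you hit the runtime detection issue:
    # assimilate(llamafile)
    subprocess.run([os.path.abspath(llamafile)], check=True)


if __name__ == "__main__":
    main()
```

Nothing runs on import. Only main() touches the network, and the download is skipped when the file is already on disk, so rerunning the script does not fetch the model again.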
1 comments

Thank you!