by cocktailpeanut 1201 days ago
UPDATE:

Thanks for all the feedback! I went out for a walk after posting this, just came back, and went through the comments to summarize what needs to be improved.

Basically looks like it comes down to the following:

  - *customize features:* Should not be difficult (will add flag features)
    - *path:* customize the home directory (instead of automatically storing to $HOME)
    - *python:* some people are having issues with the python binary (since the package is essentially calling these shell commands). Maybe add a flag to specify the exact name of the python binary (such as "--python python3")
    - *avoid downloading files:* I have this issue too when I just want to install the code instead of downloading the full model which takes a long time. Might add a flag to avoid downloading models in case you already have them (EDIT: actually upon thinking about it, it's better to just set the source model folder, something like --model)
    - *other flags:* The rest of the flags natively supported by the llama.cpp project, such as top_k, top_p, temp, batch_size, threads, seed, n_predict, etc. (They are already in the code but were not exposed in the CLI and are not documented)
    
  - *documentation*
    - document the machine spec
    - document the storage spec: how much space is used?
    - node version: which version of node.js is required?
    - python version: which version of python doesn't work?
Am I missing anything? Feel free to leave comments, will try to roll out some updates as soon as I can. To stay updated, feel free to follow me on twitter https://twitter.com/cocktailpeanut (or you could create issues on GitHub too!)
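
For illustration, the flags proposed in the list above could be sketched roughly as follows. This is not the project's actual CLI: every flag name, the program name, and the defaults are assumptions taken from the wording of the post ("--python python3", "--model"), and the sampling options default to None so that nothing is overridden unless the user sets it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags mirroring the proposals in the post, not an existing CLI.
    parser = argparse.ArgumentParser(prog="llama-install-sketch")
    parser.add_argument("--home", default=None,
                        help="install directory (falls back to $HOME when unset)")
    parser.add_argument("--python", default="python3",
                        help="name of the python binary to call (default is an assumption)")
    parser.add_argument("--model", default=None,
                        help="folder with already-downloaded models, so nothing is re-downloaded")
    # Sampling options named in the post; None means "do not override".
    parser.add_argument("--top_k", type=int, default=None)
    parser.add_argument("--top_p", type=float, default=None)
    parser.add_argument("--temp", type=float, default=None)
    parser.add_argument("--batch_size", type=int, default=None)
    parser.add_argument("--threads", type=int, default=None)
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--n_predict", type=int, default=None)
    return parser


def passthrough_options(args: argparse.Namespace) -> dict:
    """Return only the sampling options the user actually set."""
    names = ["top_k", "top_p", "temp", "batch_size", "threads", "seed", "n_predict"]
    return {n: getattr(args, n) for n in names if getattr(args, n) is not None}
```

Calling `build_parser().parse_args(["--model", "./models", "--threads", "4"])` and then `passthrough_options(args)` would yield only the options that were explicitly given, which could then be forwarded to the underlying program.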

I tried to run your NPX commands from the examples on a fresh WSL install of Ubuntu 20.04, but if you don't have build tools installed, they both just silently fail.

I only realized what was happening after trying to go the other route and use it in a package, where I then noticed the NPM install will give a node-gyp error about make missing.

I'm on NixOS, where you have to explicitly state dependencies (which is a good thing, except when... this happens)

Besides make (which I can quickly make available in a project environment), what other deps do you think it uses but doesn't declare or state? ;)

The other one I noticed is pip! A lot of the script fails without pip, and it takes until after the fairly long downloads finish to let you know it was needed.
So it needs make/gcc, python AND node available... what versions, I wonder?
I successfully used the latest version of node LTS (via NVM) and the latest versions of python-pip3 and build essentials from the Canonical apt repo, if that helps.
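
One way to avoid the silent failures and late pip errors described above would be a preflight check that runs before any download starts. The sketch below is only an assumption about how that could look: the tool list is taken from this thread (make, gcc, python, pip, node), the exact binary names are guesses, and the function names are hypothetical. It does not check versions.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools mentioned in the thread; exact binary names are an assumption.
REQUIRED_TOOLS = ("make", "gcc", "python3", "pip3", "node")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


def preflight() -> None:
    """Stop with a clear message instead of failing silently later on."""
    missing = missing_tools()
    if missing:
        raise SystemExit("Missing required tools: " + ", ".join(missing))
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use `preflight()` would be called once at startup.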
I don’t understand why it’s downloading at all; that shouldn’t be default behavior.

It should have default instructions to load a file from a default place, and then arguments/flags to load from a specific path, and then MAYBE a prompt to download the models after it can’t find them on the paths, plural.
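
The lookup order suggested here could be sketched like this. It is an illustration under assumptions, not how the package behaves: the function names, the status strings, and the choice to search user-supplied paths before the default ones are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple


def find_model(filename: str, search_paths: Iterable[str]) -> Optional[Path]:
    """Return the first existing file named `filename` in search_paths, else None."""
    for folder in search_paths:
        candidate = Path(folder).expanduser() / filename
        if candidate.is_file():
            return candidate
    return None


def resolve_model(
    filename: str,
    user_paths: Iterable[str] = (),
    default_paths: Iterable[str] = (),
    ask: Callable[[str], bool] = lambda name: False,
) -> Tuple[str, Optional[Path]]:
    """Look on disk first; only offer a download when nothing was found."""
    found = find_model(filename, [*user_paths, *default_paths])
    if found is not None:
        return "found", found
    if ask(filename):  # e.g. an interactive yes/no prompt
        return "download", None
    return "missing", None
```

With this shape, a download never happens by default: the caller has to pass an `ask` callback that returns True before "download" is ever reported.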

UPDATE 2:

Thanks to all the pull requests, we've managed to solve most of these issues in the most optimal manner.

Version 0.1.0 released: https://news.ycombinator.com/item?id=35143171

I followed the initial instructions and the 7B model worked just fine.

I tried the supplementary instructions to download some of the models (7B, 13B, and 30B), and it didn't seem to work. The prompt returned nothing after waiting for several minutes.

Is there a way to run just one of the larger models?

I am going to test this out today and roll this out as soon as I can, hopefully tomorrow. Stay tuned.
What's the minimum spec GPU required? NVIDIA only? Any differences between Debian and Fedora Linuxes? RAM required?
This app is CPU only and gets good speeds on even mobile phone CPUs. Minimum RAM required is 5GB.
Oh wow, any way to do this on Android yet? That would be fun to tinker with, even if it's just the smaller model. Even my older Note 9 has 6GB.
Yes. Starting with the Facebook versions of LLaMA-7B you just quantize the model to 4-bit on your desktop (since it takes 14GB of RAM) and then move it to your phone and follow the Android instructions in the repo. https://github.com/ggerganov/llama.cpp/#android

I've seen dozens of screenshots of it running in termux on Android phones by now at completely usable speeds.
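
As a rough sanity check on the numbers above, the size of the raw weights can be estimated from the parameter count and the bits per weight. The sketch assumes a nominal 7 billion parameters and 16-bit weights before quantization (my reading of the 14GB figure, not something stated in the thread), and it ignores quantization scale factors, context buffers, and other runtime overhead.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in GB (1 GB = 10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS_7B = 7e9  # nominal parameter count, an assumption

fp16_gb = weight_size_gb(N_PARAMS_7B, 16)  # 14.0, in line with the 14GB mentioned above
q4_gb = weight_size_gb(N_PARAMS_7B, 4)     # 3.5, before any overhead
```

Under these assumptions the 4-bit weights come to about 3.5GB, which fits under the 5GB minimum quoted earlier with some room left for overhead.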

What distro and PC specs do you have success with?
I ran this on my Intel i7-7700K with 32GB of RAM. It ran very slowly, at almost one word per second. Not sure if I did something wrong. Distro: Ubuntu 22.04.
It would be great to also understand how one can finetune this model. Thanks for the awesome work!
You may be able to use pyenv to increase compatibility across Linux distributions.