| HN Mirror


by rolleiflex 1194 days ago

I'm following the instructions on the post from the original owner of the repository involved here. It's at https://til.simonwillison.net/llms/llama-7b-m2 and it is much simpler. (no affiliation with author)

I'm currently running the 65B model just fine. It is a rather surreal experience, a ghost in my shell indeed.

As an aside, I'm seeing interesting behaviour with the `-t` threads flag. I originally expected it to work like the `make -j` flag, where it controls the number of parallel threads but the total computation done stays the same. What I'm seeing is that it seems to change the fidelity of the output. At `-t 8` I get the fastest output, presumably since that is the number of performance cores my M2 Max has. But up to `-t 12` the output fidelity increases, even though the output slows down drastically. I have 8 perf and 4 efficiency cores, so that makes superficial sense. From `-t 13` onwards, performance drops off so sharply that I effectively no longer get any output.
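One way to check whether `-t` really affects the output, rather than sampling randomness, is to hold everything else fixed and vary only the thread count. Below is a minimal sketch that builds such command lines. It assumes the llama.cpp `main` binary from the linked post and its `-m`, `-t`, `-s` (seed), `-n` and `-p` flags; the binary path, model path, prompt and seed are hypothetical placeholders, not values from this thread.

```python
import shlex


def sweep_commands(binary, model, prompt, threads, seed=42, n_tokens=128):
    """Build one command per thread count; only the -t value differs."""
    commands = []
    for t in threads:
        commands.append([
            binary,
            "-m", model,          # model file (hypothetical path)
            "-t", str(t),         # thread count under test
            "-s", str(seed),      # fixed seed so sampling is comparable
            "-n", str(n_tokens),  # same number of tokens per run
            "-p", prompt,         # same prompt per run
        ])
    return commands


if __name__ == "__main__":
    # Placeholder paths and prompt, for illustration only.
    for cmd in sweep_commands(
        "./main",
        "./models/65B/ggml-model-q4_0.bin",
        "The first man on the moon was",
        threads=[4, 8, 12],
    ):
        print(shlex.join(cmd))
```

If the outputs still differ across thread counts with the seed fixed, that would point at the threading itself rather than at random sampling.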

3 comments

gorbypark 1194 days ago

It's interesting that the fidelity seems to change. I just realized I had been running with `-t 8` even though I only have an M2 MacBook Air (4 perf, 4 efficiency cores), and running with `-t 4` speeds up 13B significantly. It's now doing ~160ms per token versus ~300ms per token with the 8-thread setting. It's hard to quantify exactly whether it's changing the output quality much, but I might do a subjective test with 5 or 10 runs on the same prompt and see how often the output is factual versus "nonsense".
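For a sense of scale, the per-token latencies above can be converted to throughput. This is just arithmetic on the two rough figures quoted in the comment (~160ms and ~300ms per token); the function names are made up for the example.

```python
def tokens_per_second(ms_per_token):
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token


def speedup(slow_ms, fast_ms):
    """How many times faster the lower-latency setting is."""
    return slow_ms / fast_ms


if __name__ == "__main__":
    print(f"-t 4: {tokens_per_second(160):.2f} tokens/s")
    print(f"-t 8: {tokens_per_second(300):.2f} tokens/s")
    print(f"speedup: {speedup(300, 160):.3f}x")
```

So dropping from 8 threads to 4 on this machine is a bit under a 2x improvement, going by those approximate numbers.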


dmw_ng 1193 days ago

I also noticed that hitting CTRL+S to pause the TTY output seemed to cause a previously reliable prompt to suddenly start printing garbage tokens after hitting CTRL+Q to resume a few seconds later. It may have been a coincidence, but my instant thought was very much "synchronization bug".


IIAOPSW 1193 days ago

Don't you hate it when someone interrupts your train of thought.


bee_rider 1193 days ago

What do you use it for, out of curiosity? Can it do shell autocompletes? (This is what “ghost in the shell” made me think of, haha.)


rolleiflex 1193 days ago

Nothing. It's technology for the love of it.

I'm sure there are potential uses, but training your own LLM would probably be more meaningfully useful than running someone else's trained model, which is what this is.
