by rolleiflex
1194 days ago
I'm following the instructions in the post by the original owner of the repository involved here. It's at https://til.simonwillison.net/llms/llama-7b-m2 and it is much simpler. (No affiliation with the author.) I'm currently running the 65B model just fine. It is a rather surreal experience, a ghost in my shell indeed.

As an aside, I'm seeing interesting behaviour with the `-t` threads flag. I originally expected it to work like the `-j` flag of `make`: it would control the number of parallel threads, but the total computation done would be the same. What I'm seeing instead is that it seems to change the fidelity of the output.

At `-t 8` I get the fastest output, presumably because that is the number of performance cores my M2 Max has. From there up to `-t 12` the output fidelity seems to increase, even though generation slows down drastically. I have 8 performance and 4 efficiency cores, so that makes superficial sense. From `-t 13` onwards, performance drops off so sharply that I effectively no longer get any output.
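
For anyone who wants to check the `-t` observation on their own machine, here is a minimal sketch of a sweep over thread counts. It assumes the llama.cpp binary is at `./main` and that it accepts `-m`, `-t`, `-n`, `-s` and `-p`; the helper names, the default token count and the seed value are all made up for illustration. Fixing the seed is meant to make it easier to tell whether the text itself changes with the thread count, or only the speed.

```python
import subprocess
import time


def build_command(binary, model, threads, prompt, n_tokens=64, seed=42):
    """Return the argv list for one run at a given thread count.

    The flag names are assumed to match the llama.cpp `main` binary;
    the defaults for n_tokens and seed are arbitrary illustration values.
    """
    return [
        binary,
        "-m", model,
        "-t", str(threads),
        "-n", str(n_tokens),
        "-s", str(seed),
        "-p", prompt,
    ]


def run_once(argv, runner=subprocess.run):
    """Run argv and return (elapsed_seconds, captured_stdout)."""
    start = time.perf_counter()
    result = runner(argv, capture_output=True, text=True, check=True)
    return time.perf_counter() - start, result.stdout


def sweep(binary, model, prompt, thread_counts, runner=subprocess.run):
    """Run the same prompt once per thread count.

    Returns a dict mapping thread count -> (elapsed_seconds, stdout).
    """
    results = {}
    for threads in thread_counts:
        argv = build_command(binary, model, threads, prompt)
        results[threads] = run_once(argv, runner)
    return results
```

Calling something like `sweep("./main", "<path to your model>", "Hello", [4, 8, 12])` and comparing both the timings and the generated text per thread count would show whether the two really move together. A single run per setting is noisy, so repeating each setting a few times is probably worthwhile.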