by andy99 1045 days ago
I made a Jupyter notebook, "llama2.ipynb", from the Karpathy project: https://github.com/rbitr/llama2.ipynb

I didn't write a pure Python version; mine uses numpy. I haven't benchmarked it, but it runs the stories15M model much faster than 1.3 tok/sec on my 2018 MacBook. You should try swapping in numpy matrix multiplication, or the @ operator (I actually don't know if that's native or part of another package), for matmul and see what changes.
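For anyone who wants to try the swap, here is a rough sketch of the comparison, not code from either port. The function names (`matmul_pure`, `matmul_numpy`, `time_call`) and the matrix sizes are made up for illustration. On the @ question: it is a core Python operator (3.5+, PEP 465) that dispatches to `__matmul__`, but built-in lists don't implement it, so in practice you need numpy arrays (or a similar type) on either side.

```python
import time
import numpy as np


def matmul_pure(w, x):
    """Matrix-vector product on plain Python lists: w is d rows of n floats, x has n floats."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def matmul_numpy(w, x):
    """Same product on numpy arrays; @ calls ndarray.__matmul__."""
    return w @ x


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 512, 512  # arbitrary sizes for illustration, not taken from any model
    w = rng.standard_normal((d, n)).astype(np.float32)
    x = rng.standard_normal(n).astype(np.float32)
    w_list, x_list = w.tolist(), x.tolist()

    # Both versions should agree up to float32 rounding.
    assert np.allclose(matmul_pure(w_list, x_list), matmul_numpy(w, x), atol=1e-3)

    print(f"pure Python: {time_call(matmul_pure, w_list, x_list):.6f} s")
    print(f"numpy @:     {time_call(matmul_numpy, w, x):.6f} s")
```

The timings will depend on the machine and on which BLAS numpy is linked against, so treat the output as a rough indication only.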

1 comment

1.3 tok/sec is similar to what my Python port gets, but I ran it on an M1 Max.