by Bostonian 1045 days ago
The llama2.py code defines its own accum, rmsnorm and matmul. Why not use NumPy? A "pure Python" implementation that is much slower than one using NumPy is less interesting to me.
1 comment

If your goal is to make it as fast as possible, then a Python implementation is certainly not the solution here. I think this is exactly why llama.cpp got so much attention.
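
For anyone wondering what the three helpers from the parent comment amount to, here is a rough pure-Python sketch. This is not the actual llama2.py code: the signatures, the eps default, and the choice to return new lists rather than update in place are all my assumptions. The comments note the NumPy expression each one would roughly collapse to, which is where the speed difference comes from (the loops run in compiled code instead of the interpreter).

```python
import math

# Hypothetical signatures; the real llama2.py helpers may differ.

def accum(a, b):
    """Element-wise sum of two vectors (a residual connection).
    Rough NumPy equivalent: a + b
    """
    return [u + v for u, v in zip(a, b)]

def rmsnorm(x, weight, eps=1e-5):
    """Divide x by its root mean square, then scale by weight.
    Rough NumPy equivalent: weight * x / np.sqrt(np.mean(x * x) + eps)
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for w, v in zip(weight, x)]

def matmul(W, x):
    """Matrix-vector product; W is a list of rows.
    Rough NumPy equivalent: W @ x
    """
    return [sum(w * v for w, v in zip(row, x)) for row in W]
```

Each of these walks the data one element at a time in the interpreter, so matmul over a large weight matrix dominates the runtime.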