I find these efforts impressive, but what is the value proposition here? (I'm not just talking about this fork, but also about Karpathy's llama2.c.)
Personally, the value for me was implementing the complex logic from a scientific paper in pure Python.
It helps you understand the essence of a cutting-edge AI technology.
And it's quite fascinating that the core code needed to implement inference for such a complex model is only about 500 lines.
As for the original llama2.c, I believe the value proposition is having a simple implementation that can run inference locally on a wide variety of platforms. What if we could run a fine-tuned Llama 7B on our phones?