by austinvhuang
141 days ago
My first implementation of gemma.cpp was kind of like this. There's such a massive performance differential vs. SIMD, though, that I learned to appreciate SIMD (via Highway) as a sweet spot of low-dependency portability that sits between plain C loops and the messy world of GPUs and their fat tree of dependencies. If anyone wants to learn the basics, whip out your favorite LLM pair programmer and ask it to help you study the kernels in the ops/ library of gemma.cpp: https://github.com/google/gemma.cpp/tree/main/ops