|
|
|
|
|
by NightBlossom
147 days ago
|
|
Hi HN, author here. I built this project because I wanted to understand the low-level mechanics of LLMs and how FFI overhead differs between languages. Some key takeaways: Architecture: It's a 6.9B MoE model implemented purely in Rust, Go, and Python. Shared CUDA: All three languages bind to the exact same CUDA kernels (no PyTorch/TensorFlow). Performance: I was surprised to see how Go handles cgo overhead compared to Rust's FFI in this specific workload. I know it's reinventing the wheel, but it was a great way to learn.
Happy to answer any questions about the implementation or the FFI architecture! |
|