Hacker News
by helloericsf 730 days ago
Seems interesting! https://github.com/turboderp/exllama
"A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights."