Hacker News
by helloericsf 730 days ago
Seems interesting!
https://github.com/turboderp/exllama
"A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights."