by Palmik
957 days ago
This is amazing and will unlock many possibilities. I recently read the S-LoRA paper, which is related, but it's even better to have a working (and extremely efficient!) implementation. How hard would it be to adapt your kernels to work with newer quantization formats like AWQ or EXL2?

We are polishing the 4-bit code. It will be added to the Punica code base soon. Please stay tuned :)