by sottol 1189 days ago
They're using GPTQ -- here you go: https://arxiv.org/abs/2210.17323 . The authors benchmarked two families of models across a wide range of parameter counts.

llama.cpp is using RTN (round-to-nearest) at the moment.
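
For anyone unfamiliar with the term, here is a rough sketch of what round-to-nearest (RTN) quantization means in general. This is not llama.cpp's actual code: the function names, the symmetric per-block scale, and the 4-bit width are all assumptions chosen for illustration. The point is that each weight is rounded independently, whereas GPTQ adjusts the remaining weights to compensate for the rounding error.

```python
def rtn_quantize(weights, bits=4):
    """Symmetric round-to-nearest over one block of weights (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit signed values
    amax = max(abs(w) for w in weights)   # largest magnitude in the block
    scale = amax / qmax if amax > 0 else 1.0
    # Each weight is rounded on its own; no error compensation between weights.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def rtn_dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical block of weights, made up for this example.
block = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.02, 0.29]
q, scale = rtn_quantize(block)
restored = rtn_dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(block, restored))
```

With this scheme the per-weight error is bounded by half a quantization step (scale / 2), which is simple and fast but ignores how errors interact across a layer.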