The first paper is a good critique of the performance of quantised models. It points out that 40-50% 'compression' typically results in only a slight loss for RAG tasks that rely on in-context learning, but for factual tasks relying on stored knowledge, performance drops off very quickly. They looked at Vicuna, one of the earlier models, so I wonder how applicable it is to recent models like the Phi 3 range. I don't think deliberate, clever adversarial attacks like those in the 2nd paper are a sensible worry for most people, but it is a fun read. Thanks for the links @janwas.