by ilaksh 899 days ago
I was experimenting with getting a few models to output Rhai scripts and found that the non-quantized and 6-bit models could do it as I requested with a few hints, but the 4- or 5-bit ones got confused.

The 4- or 5-bit models could handle equivalent requests in Python, though.

My conclusion was that I should fine-tune a 4- or 5-bit model on Rhai question/output pairs, and if I made enough good ones, the performance on my task would improve.
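For what it's worth, here is a rough sketch of how those question/output pairs might be packed into a JSONL file for fine-tuning. The `prompt`/`completion` field names, the `to_jsonl` helper, and the two example pairs are all made up for illustration; whatever fine-tuning tool you use may expect a different layout.

```python
import json


def to_jsonl(pairs):
    """Serialize (question, rhai_script) pairs as JSONL, one record per line.

    The "prompt"/"completion" keys are an assumption, not a required format.
    """
    lines = []
    for question, script in pairs:
        record = {"prompt": question.strip(), "completion": script.strip()}
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Hypothetical example pairs; a real dataset would need many more, hand-checked.
EXAMPLE_PAIRS = [
    (
        "Write a Rhai script that prints the sum of 40 and 2.",
        "let x = 40 + 2;\nprint(x);",
    ),
    (
        "Write a Rhai function that doubles a number.",
        "fn double(n) { n * 2 }",
    ),
]

if __name__ == "__main__":
    print(to_jsonl(EXAMPLE_PAIRS), end="")
```

Running each script through the Rhai engine before adding it to the set would help keep the "good ones" actually good.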

Maybe if I just switch to Exllama2 or something, the 6-bit will run fast enough.