by ed
606 days ago
Oh cool! I’ve been playing with quantized Llama 3B for the last week (4-bit SpinQuant). The code for SpinQuant has been public for a bit. It’s pretty adept at most natural language tasks (“summarize this”), and performance on iPhone is usable. It’s even decent at tool use once you get the chat template right. But it struggles with JSON and HTML syntax (correctly escaping characters) and isn’t great at planning, which makes it a bad fit for most agentic uses. My plan was to let Llama communicate with more advanced AIs, using natural language to offload tool use to them, but Llama very quickly goes rogue and starts doing things you didn’t ask it to, like trying to delete data. Still, the progress Meta has made here is incredible, and it seems we’ll have capable on-device agents in the next generation or two.
|
You should customise your sampler to mandate JSON grammar after ```json tokens.
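
For anyone wondering what that looks like, here is a rough sketch, not any real sampler's API. It assumes a toy vocabulary of plain strings, greedy decoding, and a deliberately small JSON subset (objects, arrays, and strings; no numbers, literals, or \u escapes). The names `json_state` and `pick_token` are made up for illustration. In practice you would more likely hand a grammar to your inference runtime (llama.cpp supports GBNF grammars, for example) than hand-roll the checker.

```python
from typing import List, Optional

FENCE = "`" * 3 + "json"  # trigger text; built this way to avoid a literal fence here


def json_state(text: str) -> Optional[str]:
    """Scan `text` as a prefix of a JSON subset (objects, arrays, strings).

    Returns the grammar state after the last character ("done" once a full
    value has been read), or None if no valid document starts with `text`.
    """
    stack: List[str] = []  # currently open containers: "{" or "["
    state = "value"
    is_key = False

    def after_value() -> str:
        return "comma_or_close" if stack else "done"

    for ch in text:
        if state == "string":
            if ch == "\\":
                state = "escape"
            elif ch == '"':
                state = "colon" if is_key else after_value()
            elif ch < " ":
                return None  # raw control characters must be escaped
            continue
        if state == "escape":
            if ch not in '"\\/bfnrt':  # \uXXXX left out of this sketch
                return None
            state = "string"
            continue
        if ch in " \n\r\t":
            continue  # whitespace between tokens is always fine
        if state in ("value", "value_or_close"):
            if ch == '"':
                state, is_key = "string", False
            elif ch == "{":
                stack.append(ch)
                state = "key_or_close"
            elif ch == "[":
                stack.append(ch)
                state = "value_or_close"
            elif ch == "]" and state == "value_or_close":
                stack.pop()
                state = after_value()
            else:
                return None
        elif state in ("key", "key_or_close"):
            if ch == '"':
                state, is_key = "string", True
            elif ch == "}" and state == "key_or_close":
                stack.pop()
                state = after_value()
            else:
                return None
        elif state == "colon":
            if ch != ":":
                return None
            state = "value"
        elif state == "comma_or_close":
            if ch == ",":
                state = "key" if stack[-1] == "{" else "value"
            elif (stack[-1], ch) in (("{", "}"), ("[", "]")):
                stack.pop()
                state = after_value()
            else:
                return None
        else:  # "done": nothing but trailing whitespace may follow
            return None
    return state


def pick_token(generated: str, vocab: List[str], logits: List[float]) -> str:
    """Greedy pick; inside an open json fence, skip grammar-breaking tokens."""
    order = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    start = generated.rfind(FENCE)
    body = generated[start + len(FENCE):] if start != -1 else None
    if body is None or json_state(body) in (None, "done"):
        return vocab[order[0]]  # no open JSON block: sample unconstrained
    for i in order:
        if json_state(body + vocab[i]) is not None:
            return vocab[i]  # best-scoring token that keeps the JSON valid
    return vocab[order[0]]  # nothing fits; fall back rather than stall


vocab = ['"hi"', '\\"hi\\"', '"}']
logits = [3.0, 2.0, 1.0]  # the model "prefers" the unescaped quote
prefix = "Sure:\n" + FENCE + '\n{"quote": "he said '
print(pick_token(prefix, vocab, logits))
```

Here the top-scoring token would close the string early and break the object, so the mask skips it and the escaped variant wins. Outside a json fence, or once the value is complete, the sampler is left alone so the model can close the fence and carry on in prose.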