

by ed 606 days ago

Oh cool! I’ve been playing with quantized llama 3B for the last week. (4-bit spinquant). The code for spinquant has been public for a bit.

It’s pretty adept at most natural language tasks (“summarize this”) and performance on iPhone is usable. It’s even decent at tool use once you get the chat template right.

But it struggles with JSON and HTML syntax (correctly escaping characters), and isn’t great at planning, which makes it a bad fit for most agentic uses.

My plan was to let llama communicate with more advanced AIs, using natural language to offload tool use to them, but very quickly llama goes rogue and starts doing things you didn’t ask it to, like trying to delete data.

Still - the progress Meta has made here is incredible and it seems we’ll have capable on-device agents in the next generation or two.

1 comments

tucnak 606 days ago

>But it struggles with json

You should customise your sampler to mandate JSON grammar after ```json tokens.
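A rough sketch of the switching logic, in case it helps: the sampler only needs to know whether the text generated so far sits inside an unclosed JSON fence. The function name `json_grammar_active` is made up for illustration and is not part of any real sampler API; wiring the result into actual logit masking is left out.

```python
TICKS = "`" * 3              # the three-backtick fence marker
OPEN_FENCE = TICKS + "json"  # marker that opens a JSON block


def json_grammar_active(generated: str) -> bool:
    """Return True if `generated` ends inside an unclosed JSON fence.

    Hypothetical helper: a sampler could call this before each step and
    apply its JSON grammar mask only while it returns True.
    """
    start = generated.rfind(OPEN_FENCE)
    if start == -1:
        return False  # no JSON fence opened yet
    body = generated[start + len(OPEN_FENCE):]
    # Once the closing fence has been emitted, the constraint is lifted.
    return TICKS not in body


inside = json_grammar_active("Sure, here you go:\n" + OPEN_FENCE + '\n{"a": 1')
outside = json_grammar_active("Just plain prose so far.")
closed = json_grammar_active(OPEN_FENCE + "\n{}\n" + TICKS + "\nDone.")
```

One caveat with this simplification: a JSON string that itself contains three backticks would be misread as the closing fence, so a real implementation would probably let the grammar decide when the value is complete.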


ed 603 days ago

Grammar samplers are clever! But in the case of a missing escape character you’ll end up with a corrupted string.

Take for example: "A dog says \"Woof!\""

With a grammar, you’ll end up with "A dog says " when the model forgets to escape.

Which is valid JSON, but not what the model intended.
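To make the failure concrete, here is a toy simulation, not a real sampler. It assumes the grammar simply accepts the model's preferred characters until the JSON string closes and then allows nothing more; `constrain_to_json_string` is a hypothetical name.

```python
import json


def constrain_to_json_string(preferred: str) -> str:
    """Toy stand-in for a grammar sampler restricted to one JSON string.

    Accepts the model's preferred characters only while they still fit
    the JSON string grammar, and stops at the first unescaped quote.
    """
    if not preferred.startswith('"'):
        raise ValueError("a JSON string must start with a double quote")
    out = ['"']
    i = 1
    while i < len(preferred):
        ch = preferred[i]
        if ch == "\\" and i + 1 < len(preferred):
            # Escape sequence: keep the backslash and the escaped character.
            out.append(preferred[i:i + 2])
            i += 2
            continue
        out.append(ch)
        i += 1
        if ch == '"':
            # An unescaped quote closes the string; nothing else is allowed.
            break
    return "".join(out)


escaped = r'"A dog says \"Woof!\""'   # what the model should emit
unescaped = '"A dog says "Woof!""'    # what it emits when it forgets to escape

good = constrain_to_json_string(escaped)
bad = constrain_to_json_string(unescaped)

print(json.loads(good))  # full sentence survives
print(json.loads(bad))   # parses fine, but the quote is gone
```

Both results parse with `json.loads`, so nothing downstream raises; the second one has just silently lost the part after the first inner quote.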

So it’s usually better to catch the exception and ask the model to try again.
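Something like the following is what I mean by retrying. It is a minimal sketch that assumes a hypothetical `generate(prompt) -> str` callable wrapping whatever model you run; the retry prompt wording and the `max_attempts` default are arbitrary choices.

```python
import json
from typing import Any, Callable


def generate_json(
    generate: Callable[[str], str],  # hypothetical model wrapper
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Call `generate` until its reply parses as JSON, or give up."""
    attempt_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = generate(attempt_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
            # Tell the model what went wrong and ask for another attempt.
            attempt_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON "
                f"({err.msg} at position {err.pos}). "
                "Reply again with valid JSON only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error


# Canned replies standing in for a model: the first forgets to escape quotes.
replies = iter([
    '{"quote": "A dog says "Woof!""}',
    r'{"quote": "A dog says \"Woof!\""}',
])
result = generate_json(lambda _prompt: next(replies), "Quote a dog as JSON.")
```

Without a grammar, the unescaped reply fails to parse, so the error is visible and the second attempt can recover the intended string.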

Unless you’ve come across a sampler with backtracking? That would be cool
