Hacker News new | ask | show | jobs
by beoberha 606 days ago
For me, it was almost random if I would get a little spiel at the beginning of my response - even on the unquantized 8b instruct. Since ollama doesn’t support grammars, I was trying to get it to work where I had a prompt that summarized an article and extracted and classified certain information that I requested. Then I had another prompt that would digest the summary and spit out a structured JSON output. It was much better than trying to do it in one prompt, but still far too random even with temperature at 0. Sometimes the first prompt misclassified things. Sometimes the second prompt would include a “here’s your structured output”.

And Claude did everything perfectly ;)

1 comments

Why not preprompt with ```json {
Yes, you can pre-fill the assistant's response with "```json {" or even "{" and that should increase the likelihood of getting a proper JSON in the response, but it's still not guaranteed. It's not nearly reliable enough for a production use case, even on a bigger (8B) model.

I could recommend using ollama or VLLm inference servers. They support a `response_format="json"` parameter (by implementing grammars on top of the base model). It makes it reliable for a production use, but in my experience the quality of the response decreases slightly when a grammar is applied.

Grammars are best but if you read their comment they're apparently using ollama in a situation that doesn't support them.