| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by isoprophlex 546 days ago

I make very heavy use of structured output (to convert unstructured data into something processable, eg for process mining on customer service mailboxes)

Is it any good for this, if you tested it?

I'm looking for something that hits the sweet spot of runs locally & follows prescribed output structure, but I've been quite underwhelmed so far

7 comments

enkrs 546 days ago

I thought structured output is a solved problem now. I've had consistent results with ollama structured outputs [1] by passing Zod schema with the request. Works even with very small models. What are the challenges you're facing?

[1] https://ollama.com/blog/structured-outputs

link

freehorse 546 days ago

Structured output is solved, it is structuring data that's not, because that is an unbounded problem. There is no limit to how messy your data may be, and no limit to the accuracy and efficiency you may require.

I have used such models to structure human-generated data into sth a script can then read and process, getting important aspects in this data (eg what time the human reported doing X thing, how long, with whom etc) into like a csv file with columns eg timestamps and whatever variables I am interested in.

link

Der_Einzige 546 days ago

For anyone who thinks it isn't "solved", outlines debunked the paper which claims that "structured generation harms creativity":

https://blog.dottxt.co/say-what-you-mean.html

link

the_mitsuhiko 546 days ago

I get decent JSON from it quite well with the "assistant: {" trick. I'm not sure how well trained it is to do JSON. The template on ollama has tools calls so I assume they made sure JSON works: https://ollama.com/library/mistral-small:24b/blobs/6db27cd4e...

link

a_wild_dandan 546 days ago

And for anyone looking to dig deeper, check out "grammar-based sampling."

link

azinman2 546 days ago

What’s the “assistant: {" trick? You just end your prompt with that?

link

simonw 546 days ago

Mistral supports prefixes: https://docs.mistral.ai/guides/prefix/

link

azinman2 546 days ago

That’s cool. However it only shows a few odd. I’d imagine the model needs to explicitly support this (be trained with it). None are about json… do you use that trick with json?

link

simonw 545 days ago

I use it to get JSON pretty often. See also: https://twitter.com/simonw/status/1885091289554968975

link

starik36 546 days ago

The only model that I've found to be useful in processing customer emails is o1-preview. The rest of the models work as well, but don't get all the minutia of the emails.

My scenario is pretty specific though and is all about determining intent (e.g. what does the customer want) and mapping it onto my internal structures.

The model is very slow, but definitely worth it.

link

d4rkp4ttern 545 days ago

It does decently well actually. You can test function-calling using Langroid. There are several example scripts you could try from the repo, e.g.

    uv run examples/basic/tool-extract-short-example.py --model ollama/mistral-small

sample output: https://gist.github.com/pchalasani/662d7f13dbe690d6e2bfef01c...

Langroid has a ToolMessage mechanism that lets you specify a tool/fn-call using Pydantic, which is then transpiled into system message instructions.

link

mohsen1 546 days ago

See function calling being called out here

https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-...

link

mercer 546 days ago

I've found phi4 to be very good for this.

link

rkwz 546 days ago

What local models are you currently using and what issues are you facing?

link