| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sigmoid10 263 days ago
	I always find it amazing how many people seem to fail to use current LLMs to the fullest, even though they apparently work with them in research settings. This benchmark pipeline simply calls the OpenAI API and then painstakingly tries to parse the raw text output into a structured json format, when in reality the OpenAI API has supported structured outputs for ages now. That already ensures your model generates schema compliant output without hallucinating keys at the inference level. Today all the major providers support this feature either directly or at least indirectly via function calling. And if you run open models, you can literally write arbitrary schema (i.e. not limited to json behind the scenes) adhering inference engines yourself with rather manageable effort. I'm constantly using this in my daily work and I'm always baffled when people tell me about their hallucination problems, because so many of them can be fixed trivially these days.

6 comments

barthelomew 263 days ago

Hey there! I mostly designed and wrote most of the actual interpreter during my internship at Microsoft Research last summer. Constrained decoding for GPT-4 wasn’t available when we started designing the DSL, and besides, creating a regex to constrain this specific DSL is quite challenging.

When the grammar of the language is better defined, like SMT (https://arxiv.org/abs/2505.20047) - we are able to do this with open source LLMs.

sigmoid10 262 days ago

What are you talking about? OpenAI has supported structured json output in the API since 2023. Only the current structured output API was introduced by OpenAI in summer 2024, but it was primarily a usability improvement that still runs json behind the scenes.

barthelomew 262 days ago

You're right about the 2023 JSON mode, but our project required enforcing a much more complex DSL grammar (look in Appendix for details), not just ensuring a *valid JSON object*. The newer structured output APIs are a significant improvement, but the earlier tools weren't a fit for the specific constraints we were working under at the time.

dang 262 days ago

> What are you talking about?

Please edit out swipes like this from your HN comments—this is in the site guidelines: https://news.ycombinator.com/newsguidelines.html. It comes across as aggressive, and we want curious conversation here.

Your comment would be fine without that bit.

sigmoid10 262 days ago

This is not meant as snide, I'm literally confused if I might have misunderstood the problem here. Because the solution would be so obvious.

dang 262 days ago

I believe you! but when an internet reply leads with "what are you talking about?", it's likely to pattern-match this way for many readers. If that's not your intent, it's best to use an alternate wording.

vasco 262 days ago

Not to be rude, but they clarified it's not a snide, why are you trying to control speech to this degree? If we don't like his tone we can downvote him as well anyway and self regulate.

atrus 263 days ago

I wouldn't find it amazing, there are so many new models, features, ways to use models that the minute you pause to take a deep dive into something specific, 43 other things have already passed by you.

sigmoid10 263 days ago

I would agree if you are a normal dev who doesn't work in the field. But even then reading the documentation once a year would have brought you insane benefits regarding this particular issue. And for ML researchers there is no excuse for stuff like that at this point.

jssmith 263 days ago

I see JSON parse errors on occasion when using OpeanAI structured outputs that resolve upon retry. It seems it’s giving instructions to the LLM but validation is still up to the caller. Wondering if others see this too.

barthelomew 262 days ago

Hey, yes! This is because the DSL (Domain Specific Language) is pretty complex, and the LLM finds it hard. We prototype a much more effective version using SMT in our NeurIPS 2025 paper (https://arxiv.org/abs/2505.20047). We shall soon open source that code!

sigmoid10 262 days ago

Depends on how strictly you define your types. Are you using pydantic to pass the information to the API? There are a few pitfalls with this, because not everything is fully supported and it gets turned into json behind the scenes. But in principle, the autoregressive engine will simply not allow tokens that break the supplied schema.

striking 262 days ago

Not sure if I've been using it wrong but I've tried using the Zod-to-structured-output helper with GPT-5 and often gotten weird stuff like trailing commas that break a parse or seeing multiple JSON responses in the same response.

Ultimately there are still going to be bugs. For this reason and several others you'll still need it wrapped in a retry.

sigmoid10 262 days ago

Yeah that sounds 100% like a user or middleware issue. Don't bother with these wrappers, they are always outdated anyways. Learn how to use the API directly, it will save you a ton of headaches. And it's really not that hard.

striking 262 days ago

No, we're using the OpenAI vendored version of zod-to-json-schema via https://github.com/transitive-bullshit/openai-zod-to-json-sc..., and applying it directly to the `json_schema` field of the OpenAI API. Maybe we have a subtle bug somewhere but I'd expect a 400 response if we were genuinely sending a malformed request.

eric-burel 262 days ago

Yep from time to time.

IanCal 262 days ago

I’d also be surprised if the models are better at writing code in some custom schema (assuming that’s not z3s native structure) than writing code in something else. Decent models can write pretty good code and for a lot of mistakes can fix them, plus you get testing/etc setups for free.

eric-burel 262 days ago

It's a relatively new feature, also people need actual professional training to become true LLM developers using them to their fullest and not just developers that happen to call an LLM API here and there. Takes a lot of time and effort.

retinaros 263 days ago

yes this can also improve the said reasoning.

sigmoid10 263 days ago

The secret the big companies don't want to tell you is that you can turn all their models into reasoning models that way. You even have full control over the reasoning process and can make it adhere to a specific format, e.g. the ones used in legal settings. I've built stuff like that using plain old gpt-4o and it was even better than the o series.