Hacker News
by msp26 993 days ago
>Reasons behind the project: I have been working with OpenAI LLMs recently and I like to get my data output in structured JSON. I realised it's rather good at taking a schema where properties are compressed within an instruction, and in return it returns data based on the compressed schema.

OpenAI Function Calling does JSON-to-TypeScript conversion under the hood for functions, so I'm not surprised it works well.
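To make the JSON-to-TypeScript idea concrete, here is a rough sketch of what such a conversion could look like. This is not OpenAI's actual implementation, which isn't shown anywhere in this thread; `schema_to_ts`, `PRIMITIVES` and `EXAMPLE_SCHEMA` are made-up names, and only a small subset of JSON Schema (objects, arrays, primitives, `required`) is handled.

```python
# Hypothetical helper: renders a small JSON Schema subset as a TypeScript-style type.
# An illustration of the idea only, not how OpenAI does it internally.
PRIMITIVES = {
    "string": "string",
    "integer": "number",
    "number": "number",
    "boolean": "boolean",
}


def schema_to_ts(schema: dict, indent: int = 0) -> str:
    kind = schema.get("type")
    if kind == "object":
        pad = "  " * (indent + 1)
        required = set(schema.get("required", []))
        lines = []
        for name, prop in schema.get("properties", {}).items():
            # Properties not listed in "required" get the optional marker.
            mark = "" if name in required else "?"
            lines.append(f"{pad}{name}{mark}: {schema_to_ts(prop, indent + 1)},")
        return "{\n" + "\n".join(lines) + "\n" + "  " * indent + "}"
    if kind == "array":
        return schema_to_ts(schema.get("items", {}), indent) + "[]"
    # Anything outside the supported subset falls back to "any".
    return PRIMITIVES.get(kind, "any")


EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "age": {"type": "integer"},
    },
    "required": ["name"],
}

print(schema_to_ts(EXAMPLE_SCHEMA))
```

The TypeScript-style form drops most of the quotes and the `"type"`/`"properties"` nesting, which is presumably part of why it reads well to a model.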

1 comments

You can pretty much make up any pseudo-grammar like this one, which is a reduced JSON object that is close to CUE: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/pro...

There's no need to be formal or use a standard format; you just need a pattern the LLM can fill in or follow.
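A minimal sketch of the "pattern the LLM can fill" idea, assuming a made-up `key: <placeholder>` pattern rather than the hof grammar linked above. `PATTERN`, `build_prompt` and `parse_reply` are hypothetical names, and the reply string is hand-written for illustration, not real model output.

```python
# A made-up fill-in pattern; any consistent layout would serve the same purpose.
PATTERN = """\
name: <string>
age: <int>
tags: <comma-separated strings>"""


def build_prompt(text: str) -> str:
    """Ask the model to fill the pattern from some input text."""
    return (
        "Extract the fields from the text and fill in this pattern exactly:\n"
        f"{PATTERN}\n\nText: {text}"
    )


def parse_reply(reply: str) -> dict:
    """Parse 'key: value' lines back into a dict, skipping anything else."""
    fields = {}
    for line in reply.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# Hand-written stand-in for a model reply.
reply = "name: Ada\nage: 36\ntags: math, engines"
print(parse_reply(reply))
```

Values come back as strings, so any type coercion (for example `int(fields["age"])`) is left to the caller.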

I have no idea what happens when I put data through these models and how they work.

My thought was that it may have been trained on so much JSON and JSON Schema data that simply providing a JSON schema, and telling it that the data it outputs must validate against that schema, will produce good results.

gpt-3.5-turbo and gpt-4 have worked superbly at this so far, and I'm excited to test with the new gpt-3.5-turbo-instruct model!
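The schema-in-the-prompt approach described above could be sketched roughly like this. It makes no API call; `SCHEMA`, `build_schema_prompt` and `check_reply` are hypothetical names, and the checker covers only `required` and top-level primitive types, so a full JSON Schema validator would be needed for anything serious.

```python
import json

# Example schema, made up for illustration.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name"],
}

PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def build_schema_prompt(text: str) -> str:
    """Embed the schema in the instruction, as the comment describes."""
    return (
        "Return only JSON that validates against this JSON Schema:\n"
        + json.dumps(SCHEMA)
        + "\n\nInput: "
        + text
    )


def check_reply(reply: str, schema: dict) -> list:
    """Return a list of problems; an empty list means the reply passed."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in data:
            errors.append(f"missing required property: {name}")
    for name, prop in schema.get("properties", {}).items():
        if name not in data:
            continue
        expected = PY_TYPES.get(prop.get("type"))
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema really asks for a boolean.
        bool_mismatch = isinstance(value, bool) and prop.get("type") != "boolean"
        if expected is not None and (bool_mismatch or not isinstance(value, expected)):
            errors.append(f"wrong type for {name}: expected {prop['type']}")
    return errors
```

Checking the reply in code matters because the instruction alone does not guarantee the output validates.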

Yes, if you fine-tune, you can get some preferred formats; see something like Code Llama. When using a more generic model, you can do more generic things.

One of the benefits of using a reduced syntax is fewer tokens, so the LLM can focus on the interesting parts while ignoring uninteresting characters like " and ,

Let the model think more (and about the important parts)
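A quick way to see the size difference is to render the same record both ways. The sketch below uses character count as a crude stand-in for token count; real token counts depend on the model's tokenizer, so treat the comparison as indicative only. `to_reduced` and `record` are made-up names, and the reduced format is just one possible quote-free layout for flat records.

```python
import json

# Example record, made up for illustration.
record = {"name": "Ada", "age": 36, "tags": ["math", "engines"]}


def to_reduced(obj: dict) -> str:
    """Render a flat dict as 'key: value' lines without quotes or commas."""
    lines = []
    for key, value in obj.items():
        if isinstance(value, list):
            value = " ".join(str(item) for item in value)
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


full = json.dumps(record)
small = to_reduced(record)

# Character counts are only a rough proxy for tokens.
print(len(full), len(small))
print(small)
```

Note that this layout is lossy (a list item containing a space becomes ambiguous), which is the usual trade-off for dropping the punctuation.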