Hacker News
by 33a 1066 days ago
Looks like it just runs the LLM in a loop until it spits out something that type checks, prompting with the error message.

This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.
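
For anyone who wants to see the shape of that loop, here is a minimal sketch. It is not TypeChat's actual code: `generate_until_valid`, `validate_sentiment`, and `fake_model` are hypothetical names, the model is a stub, and a plain JSON check stands in for the type checker.

```python
import json
from typing import Callable, Optional, Tuple


def generate_until_valid(
    model: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Call the model until validate() reports no error; return (output, attempts)."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        output = model(current_prompt)
        error = validate(output)
        if error is None:
            return output, attempt
        # Feed the error message back so the next attempt can repair the output.
        current_prompt = (
            f"{prompt}\n\nPrevious output:\n{output}\nError:\n{error}\nPlease fix it."
        )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def validate_sentiment(output: str) -> Optional[str]:
    """Stand-in for the type checker: return an error message, or None if valid."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict) or not isinstance(data.get("sentiment"), str):
        return 'expected an object with a string "sentiment" field'
    return None


def fake_model(prompt: str) -> str:
    """Stub model (assumption): wrong on the first try, right once it sees an error."""
    return '{"sentiment": "positive"}' if "Error:" in prompt else '{"sentiment": 1}'
```

Each retry resends the original prompt plus the failed output and the error, which is where the cost concern comes from.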

3 comments

At least with OpenAI, wouldn't it be better if under the hood it was using the new function call feature?
TypeScript's type system is much more expressive than the one the function call feature makes available.

I imagine closing the loop (using the TS compiler to restrict token output weights) is in the works, though it's probably not totally trivial. You'd need:

* An incremental TS compiler that could report "valid" or "valid prefix" (ie, valid as long as the next token is not EOF)

* The ability to backtrack the model

Idk how hard either piece is.
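
A toy sketch of how those two pieces might fit together, under heavy assumptions: a finite set of acceptable outputs stands in for the incremental compiler, and `propose` is a hypothetical function that returns the model's candidate next tokens in ranked order. Backtracking here is just depth-first search.

```python
from typing import Callable, Iterable, List, Optional


def classify(text: str, valid_outputs: Iterable[str]) -> str:
    """Report 'valid', 'valid prefix', or 'invalid' for the text generated so far."""
    outputs = list(valid_outputs)
    if text in outputs:
        return "valid"
    if any(candidate.startswith(text) for candidate in outputs):
        return "valid prefix"
    return "invalid"


def decode(
    prefix: str,
    propose: Callable[[str], List[str]],
    check: Callable[[str], str],
    max_depth: int = 16,
) -> Optional[str]:
    """Depth-first decoding: extend the prefix token by token, backing up on dead ends."""
    status = check(prefix)
    if status == "valid":
        return prefix
    if status == "invalid" or max_depth == 0:
        return None
    for token in propose(prefix):
        result = decode(prefix + token, propose, check, max_depth - 1)
        if result is not None:
            return result
    return None  # every continuation failed, so the caller tries its next token


# Toy setup (assumption): two acceptable outputs and a "model" with a fixed ranking.
VALID = ['{"ok": true}', '{"ok": false}']


def toy_propose(prefix: str) -> List[str]:
    return ["yes", '{"', "ok", '": ', "maybe", "true", "}"]


result = decode("", toy_propose, lambda text: classify(text, VALID))
```

A real version would replace `classify` with a checker that understands the target type, and would need the model to expose ranked next tokens at each step.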

For the TS compiler: If you took each generation step, closed any partial JSON objects (ie close any open `{`), checked that it was valid JSON and then validated it using a deep version of Partial<T>, that should do the trick.
That doesn't work for even the simplest schemas.

Eg, given even the type:

    {"aLongerKey": "value"}
The generation prefix:

    {"a
would by your algorithm produce the following invalid output:

    {"a}
That's why I mentioned checking JSON validity first. You'd obviously need to keep letting it generate tokens until the closed-off JSON parses, and only then check it against the partial type. You could of course close the quotes too, but then you'd get "not valid" signals from TS when the AI is like "just let me finish!" :-)
But that isn't valid JSON.
Right, it would fail even before hitting the typing check.
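
To make the exchange above concrete, here is a rough sketch of the "close whatever is open, then try to parse" step. `close_partial_json` and `parses` are hypothetical helpers, not from any library, and the closer ignores corner cases such as a prefix ending in a backslash.

```python
import json


def close_partial_json(prefix: str, close_strings: bool = False) -> str:
    """Append the closers a partial JSON prefix still owes."""
    stack = []  # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    closed = prefix
    if in_string and close_strings:
        closed += '"'  # optionally terminate an unfinished string
    return closed + "".join(reversed(stack))


def parses(text: str) -> bool:
    """True if the text is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

With the prefix `{"a`, closing only the brace gives `{"a}` and closing the quote as well gives `{"a"}`; neither parses, so generation just continues. A prefix like `{"aLongerKey": "val` does parse once the quote and brace are closed, but the value is the truncated `val`, which is the premature "not valid" signal mentioned above if the type expects a specific string.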
I'm not familiar with how TypeChat works, but Guidance [1] is another similar project that can actually integrate into the token sampling to enforce formats.

[1]: https://github.com/microsoft/guidance

It’s logit bias. You don’t even need another library to do this. You can do it with three lines of Python.

Here’s an example of one of my implementations of logit bias.

https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74...
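
For readers who haven't seen it, the core of logit bias fits in a few lines. This is a generic sketch with made-up tokens and scores, not the code at the link above; `apply_logit_bias` and `softmax` are illustrative helpers. With a hosted API you would normally pass a `logit_bias` mapping keyed by token ID rather than touch logits yourself.

```python
import math
from typing import Dict


def apply_logit_bias(logits: Dict[str, float], bias: Dict[str, float]) -> Dict[str, float]:
    """Add a per-token bias to the raw scores before sampling."""
    return {token: score + bias.get(token, 0.0) for token, score in logits.items()}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert scores to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits.values())
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Made-up next-token scores: the model would rather chat than open a JSON object.
logits = {"Sure": 3.0, "{": 1.0, "[": 0.5}
biased = apply_logit_bias(logits, {"Sure": -100.0})  # effectively bans "Sure"
probs = softmax(biased)
best = max(biased, key=biased.get)
```

A static bias like this only bans or boosts fixed tokens. Enforcing a format would presumably mean recomputing the bias at every generation step from what is allowed next.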

Except that Guidance is defunct and isn't maintained anymore.
Did they announce that anywhere? It does appear that progress has slowed down quite a lot.
I suspect most products are concerned about product-market fit first; then they can wrangle costs down.

It's also a reasonable assumption that models will get better at structured output, since the market is demanding it.