	by 33a 1066 days ago
	Looks like it just runs the LLM in a loop until it spits out something that type checks, prompting with the error message. This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.
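
To make the loop described above concrete, here is a minimal sketch. It is not TypeChat's code: `generate_until_valid`, `validate_sentiment` and `fake_model` are hypothetical names, and the validator is a stand-in for a real type checker. It shows why cost grows with retries, since each retry resends the prompt plus the previous output and the error.

```python
import json
from typing import Callable, Optional, Tuple


def generate_until_valid(
    generate: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Call `generate` until `validate` returns None (no error) or attempts run out."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        output = generate(current_prompt)
        error = validate(output)
        if error is None:
            return output, attempt
        # Feed the error back; every retry pays for the prompt again plus the error text.
        current_prompt = (
            f"{prompt}\n\nPrevious output:\n{output}\nError:\n{error}\nPlease fix it."
        )
    raise ValueError(f"no valid output after {max_attempts} attempts")


def validate_sentiment(text: str) -> Optional[str]:
    """Stand-in for a type checker: expects {"sentiment": "positive" | "negative"}."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(value, dict) or value.get("sentiment") not in ("positive", "negative"):
        return 'expected {"sentiment": "positive" | "negative"}'
    return None


def fake_model(prompt: str) -> str:
    """Hypothetical model: wrong on the first try, right once it sees an error."""
    return '{"sentiment": "positive"}' if "Error:" in prompt else '{"sentiment": "good"}'


result, attempts = generate_until_valid(fake_model, validate_sentiment, "Classify: great product")
```

With the fake model above, the first output fails validation and the second passes, so the loop makes two model calls for one answer.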

3 comments

babyshake 1066 days ago

At least with OpenAI, wouldn't it be better if under the hood it was using the new function call feature?

akavi 1066 days ago

Typescript's type system is much more expressive than the one the function call feature makes available.

I imagine closing the loop (using the TS compiler to restrict token output weights) is in the works, though it's probably not totally trivial. You'd need:

* An incremental TS compiler that could report "valid" or "valid prefix" (ie, valid as long as the next token is not EOF)

* The ability to backtrack the model

Idk how hard either piece is.
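
A toy sketch of how those two pieces could fit together, assuming a checker that can answer "valid prefix" and a model that can be rewound. Everything here is hypothetical: the vocabulary, the scores, and the prefix checker (a fixed set of allowed strings rather than a real incremental TS compiler).

```python
from typing import Callable, Dict, Optional


def constrained_decode(
    score_next: Callable[[str], Dict[str, float]],
    is_valid_prefix: Callable[[str], bool],
    is_complete: Callable[[str], bool],
    prefix: str = "",
    max_len: int = 64,
) -> Optional[str]:
    """Depth-first decode: try tokens best-first, skip ones the checker rejects,
    and backtrack when a prefix turns out to be a dead end."""
    if is_complete(prefix):
        return prefix
    if len(prefix) >= max_len:
        return None
    scores = score_next(prefix)
    for token in sorted(scores, key=scores.get, reverse=True):
        candidate = prefix + token
        if not is_valid_prefix(candidate):
            continue  # same effect as masking this token's weight to zero
        result = constrained_decode(score_next, is_valid_prefix, is_complete, candidate, max_len)
        if result is not None:
            return result
        # Falling through here is the backtrack: try the next-best token.
    return None


# Toy stand-ins: the "type" allows exactly two outputs.
ALLOWED = {'{"ok":true}', '{"ok":false}'}
# Fixed scores regardless of prefix; note there is no "ue" token, so "tr" is a dead end.
TOY_SCORES = {'{"': 1.0, "okay": 0.9, "ok": 0.8, '":': 0.7, "tr": 0.6, "false": 0.5, "}": 0.4}

decoded = constrained_decode(
    score_next=lambda prefix: TOY_SCORES,
    is_valid_prefix=lambda text: any(s.startswith(text) for s in ALLOWED),
    is_complete=lambda text: text in ALLOWED,
)
```

Here the decoder rejects `okay` outright, follows `tr` because `{"ok":tr` is still a valid prefix, finds no way to finish it, and backtracks to `false`. A real implementation would work on token logits and would need the incremental checker to be fast enough to call at every step.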

rezonant 1066 days ago

For the TS compiler: If you took each generation step, closed any partial JSON objects (ie close any open `{`), checked that it was valid JSON and then validated it using a deep version of Partial<T>, that should do the trick.

akavi 1066 days ago

Not for even the simplest schemas.

Eg, given even the type:

    {"aLongerKey": "value"}

The generation prefix:

    {"a

would by your algorithm produce the following invalid output:

    {"a}

rezonant 1066 days ago

That's why I mentioned you check the JSON validity first. You'd obviously need to continue letting it generate tokens until you can parse the JSON to check if the type is partial. You could of course close even the quotes but then you'd get "not valid" signals from TS when the AI is like "just let me finish!" :-)
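
A rough sketch of that scheme, under stated assumptions: the schema is a plain Python dict mapping keys to types (a stand-in for a deep `Partial<T>`, not anything TypeScript provides), and all function names are made up for illustration. The check returns `None` when the closed-up prefix does not parse yet, which is the "keep generating" case.

```python
import json
from typing import Any, Optional


def close_open_brackets(prefix: str) -> str:
    """Append closers for any unclosed { or [ that sit outside string literals."""
    stack = []
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return prefix + "".join(reversed(stack))


def matches_deep_partial(value: Any, schema: Any) -> bool:
    """Deep-partial check: every key is optional, but present keys must match."""
    if isinstance(schema, dict):
        return isinstance(value, dict) and all(
            key in schema and matches_deep_partial(item, schema[key])
            for key, item in value.items()
        )
    return isinstance(value, schema)  # leaf: schema is a Python type such as str


def check_prefix(prefix: str, schema: Any) -> Optional[bool]:
    """True/False once the closed prefix parses; None means 'keep generating'."""
    try:
        value = json.loads(close_open_brackets(prefix))
    except json.JSONDecodeError:
        return None
    return matches_deep_partial(value, schema)


SCHEMA = {"aLongerKey": str}  # hypothetical schema for the example type above
```

With this, `check_prefix('{"a', SCHEMA)` gives `None` because `{"a}` is not parseable, while `check_prefix('{"other": 1', SCHEMA)` gives `False`. The catch is that the type check only fires at points where the prefix happens to close into valid JSON, so a wrong key is not caught until its value has been generated.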

just-ok 1066 days ago

But that isn’t valid JSON

akavi 1066 days ago

Right, it would fail even before hitting the typing check.

osaariki 1066 days ago

I'm not familiar with how TypeChat works, but Guidance [1] is another similar project that can actually integrate into the token sampling to enforce formats.

[1]: https://github.com/microsoft/guidance

J_Shelby_J 1066 days ago

It’s logit bias. You don’t even need another library to do this. You can do it with three lines of python.

Here’s an example of one of my implementations of logit bias.

https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74...
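
For readers who have not seen logit bias before, a minimal sketch of the idea, not taken from the linked code: a per-token offset is added to the raw logits before sampling, and a large negative value effectively bans a token. The four-token vocabulary and the numbers are made up. OpenAI's API exposes this as the `logit_bias` request parameter, a map from token ID to a value between -100 and 100.

```python
import math
from typing import Dict, List


def apply_logit_bias(logits: List[float], bias: Dict[int, float]) -> List[float]:
    """Add a per-token bias to raw logits, then softmax into probabilities."""
    biased = [logit + bias.get(token_id, 0.0) for token_id, logit in enumerate(logits)]
    peak = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 4-token vocabulary; ids 0 and 1 are the only tokens we want allowed.
logits = [1.0, 0.5, 3.0, 2.0]
bias = {2: -100.0, 3: -100.0}  # a large negative bias effectively bans a token
probs = apply_logit_bias(logits, bias)
```

Without the bias, token 2 would be the most likely; with it, nearly all probability mass moves to tokens 0 and 1. Note that a fixed bias like this applies at every position, so on its own it restricts the vocabulary rather than enforcing a position-dependent format.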

behnamoh 1066 days ago

except that guidance is defunct and is not maintained anymore.

huac 1066 days ago

did they announce that anywhere? it does appear like progress has slowed down quite a lot.

SkyPuncher 1066 days ago

I suspect most products are concerned about product-market fit first; then they can wrangle costs down.

There's also a good assumption that models will be improving structured output as the market is demanding it.
