| HN Mirror



	by BoorishBears 1042 days ago
	I just left a comment along these lines, but realistically it's probably cheaper to just re-emit than to add the machinery that enables this to their existing architecture. At most I could have seen them maybe running a schema validator against the output and re-requesting on your behalf, but even that's probably cheaper for them to do client side (I will say, I'm surprised their API wrapper hasn't been updated to do this yet)
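	For reference, the validate-and-re-request approach could look roughly like the sketch below. It is only an illustration: `generate` is a hypothetical callable standing in for whatever API call returns the model's text, and `is_valid` checks a deliberately tiny subset of JSON Schema (`required` plus per-property `type`), not the full spec.

```python
import json
from typing import Any, Callable, Dict

# JSON Schema type names mapped to Python types (small illustrative subset).
# Note: Python's bool subclasses int, so True would pass as "integer" here.
_TYPES = {"string": str, "integer": int, "number": (int, float), "object": dict, "array": list}


def is_valid(raw: str, schema: Dict[str, Any]) -> bool:
    """Return True if `raw` parses as a JSON object satisfying the schema subset."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # Every required key must be present.
    if any(key not in obj for key in schema.get("required", [])):
        return False
    # Every present key with a declared type must match that type.
    for key, spec in schema.get("properties", {}).items():
        if key in obj and not isinstance(obj[key], _TYPES[spec["type"]]):
            return False
    return True


def generate_with_retry(
    generate: Callable[[], str],  # hypothetical: one model call, returns raw text
    schema: Dict[str, Any],
    max_attempts: int = 3,
) -> Dict[str, Any]:
    """Call `generate` until its output validates, up to `max_attempts` times."""
    for _ in range(max_attempts):
        raw = generate()
        if is_valid(raw, schema):
            return json.loads(raw)
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

	Every failed attempt costs a full generation, which is the trade-off being weighed here against changing the sampler itself.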


2bitencryption 1042 days ago

> maybe running a schema validator against the output and re-requesting on your behalf

this is the part that blows my mind. You don't have to do this! You don't have to sample the entire output, and then validate after the fact.

You're not required to greedily pick the token with the highest score. You get the scores of all tokens, on every forward pass! So why even waste time picking invalid tokens if you're just going to validate and retry later on??

(note: when I say "you" here, I mean whoever is hosting the model. It is true that OpenAI does not expose all token scores; it only gives you back the highest-scoring one. So a client-side library is not able to perform this grammar-based sampling.

BUT, OpenAI themselves host the model, and they see all token outputs, with all scores. And in the same API request, they allow you to pass the "function definition" as a JSON schema. So why not simply apply that function definition as a mask on the token outputs? They could do this without exposing all token scores to you, which they seem very opposed to for some reason.)
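
To make the masking idea concrete, here is a toy sketch of it. Everything in it is made up for illustration: `toy_scores` is a fake "model" that returns the same scores on every step, and `toy_allowed` is a hard-coded stand-in for a real grammar or JSON-schema state machine. It says nothing about how OpenAI's serving stack actually works.

```python
import math
from typing import Callable, Dict, List, Set


def mask_scores(scores: Dict[str, float], allowed: Set[str]) -> Dict[str, float]:
    """Set every disallowed token's score to -inf so it can never be picked."""
    return {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}


def constrained_greedy(
    score_fn: Callable[[List[str]], Dict[str, float]],  # prefix -> score per token
    allowed_fn: Callable[[List[str]], Set[str]],        # prefix -> valid next tokens
    max_tokens: int = 16,
    stop: str = "}",
) -> List[str]:
    """Greedy decoding where invalid tokens are masked out before each pick."""
    out: List[str] = []
    for _ in range(max_tokens):
        scores = mask_scores(score_fn(out), allowed_fn(out))
        token = max(scores, key=scores.get)
        if scores[token] == -math.inf:
            raise ValueError("grammar allows no token in the vocabulary")
        out.append(token)
        if token == stop:
            break
    return out


# Toy "model": always prefers the invalid token "hello" if left unconstrained.
def toy_scores(prefix: List[str]) -> Dict[str, float]:
    return {"hello": 5.0, "true": 2.0, "false": 1.0,
            "{": 0.5, '"ok"': 0.4, ":": 0.3, "}": 0.2}


# Toy "grammar": the output must be { "ok" : true|false }, one token per position.
TOY_GRAMMAR: List[Set[str]] = [{"{"}, {'"ok"'}, {":"}, {"true", "false"}, {"}"}]


def toy_allowed(prefix: List[str]) -> Set[str]:
    return TOY_GRAMMAR[len(prefix)] if len(prefix) < len(TOY_GRAMMAR) else set()
```

With these toy pieces, `"".join(constrained_greedy(toy_scores, toy_allowed))` comes out as `{"ok":true}`, even though plain greedy decoding on the same scores would pick `hello` at every step. No invalid token is ever emitted, so there is nothing to validate and retry afterwards.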


BoorishBears 1042 days ago

Maybe re-read what I said?

> realistically it's probably cheaper to just re-emit than to add the machinery that enables this to their existing architecture

There are literally dozens of random projects that have implemented logit-based masking; it's a trivial thing to implement.

What's probably not as trivial is deploying it at scale with whatever architecture OpenAI already has in place. Especially if they're using the router-based MoE architecture most people are assuming they use.

OpenAI doesn't expose token probabilities for their RLHF models, yet they did for GPT-3. Originally that led to speculation that it was to make building competitors harder, but they've now said they're actually still working on it... which leans even further into the idea they may have an architecture that makes the kind of sampling these projects rely on more difficult to implement than normal.

Given how fast and cheap they've made access to these models, their current approach is a practical workaround if that's the case.


behnamoh 1042 days ago

when GPT-4 first became available, I had a feeling that something about it felt “hacky”. Compared to GPT-3 which was more streamlined, mature, and well thought out, GPT-4 was like a system put together to outperform the previous one at all costs. I wouldn’t be surprised if that led to design decisions that made their model hard to improve. Maybe GPT-5 will not be around any time soon.
