
by BoorishBears 1038 days ago

Maybe re-read what I said?

> realistically it's probably cheaper to just re-emit than to add the machinery that enables this to their existing architecture

There are literally dozens of random projects that have implemented logit-based masking; it's a trivial thing to implement.
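To make "trivial" concrete, here's a rough sketch of what logit masking boils down to: before picking the next token, set the score of every token the grammar or schema disallows to negative infinity, so it ends up with zero probability. This is a toy illustration in plain Python over a made-up 5-token vocabulary. The function names (`mask_logits`, `softmax`, `constrained_greedy_pick`) and the numbers are hypothetical, not taken from any particular project and certainly not from OpenAI's stack.

```python
import math

def mask_logits(logits, allowed_ids):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = set(allowed_ids)
    if not allowed:
        raise ValueError("at least one token must be allowed")
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax; entries at -inf get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def constrained_greedy_pick(logits, allowed_ids):
    """Pick the highest-scoring token among the allowed ones."""
    masked = mask_logits(logits, allowed_ids)
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy example (made-up numbers): the constraint only permits tokens 1 and 3,
# even though token 2 has the highest raw score.
toy_logits = [2.0, 0.5, 3.0, 1.0, -1.0]
probs = softmax(mask_logits(toy_logits, [1, 3]))
choice = constrained_greedy_pick(toy_logits, [1, 3])
```

The masking itself is one list comprehension. The catch is that it needs access to the raw logits at every decoding step, which is exactly the part that depends on how the serving stack is built.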

What's probably not as trivial is deploying it at scale with whatever architecture OpenAI already has in place, especially if they're using the router-based MoE architecture most people assume they use.

OpenAI doesn't expose token probabilities for their RLHF models, yet they did for GPT-3. Originally that led to speculation that this was to make building competitors harder, but they've now said they're actually still working on it... which leans even further into the idea that they may have an architecture that makes the kind of sampling these projects rely on more difficult to implement than normal.

Given how fast and cheap they've made access to these models, their current approach is a practical workaround if that's the case.


behnamoh 1038 days ago

When GPT-4 first became available, I had a feeling that something about it felt “hacky”. Compared to GPT-3, which was more streamlined, mature, and well thought out, GPT-4 was like a system put together to outperform the previous one at all costs. I wouldn’t be surprised if that led to design decisions that made their model hard to improve. Maybe GPT-5 will not be around any time soon.
