by BoorishBears
1038 days ago
Maybe re-read what I said?

> realistically it's probably cheaper to just re-emit than to add the machinery that enables this to their existing architecture

There are literally dozens of random projects that have implemented logit-based masking; it's a trivial thing to implement. What's probably not as trivial is deploying it at scale on whatever architecture OpenAI already has in place, especially if they're using the router-based MoE architecture most people assume they use.

OpenAI doesn't expose token probabilities for their RLHF models, yet they did for GPT-3. Originally that led to speculation that this was to make building competitors harder, but they've now said they're actually still working on it... which leans even further into the idea that they may have an architecture that makes the kind of sampling these projects rely on more difficult to implement than normal.

Given how fast and cheap they've made access to these models, their current approach is a practical workaround if that's the case.
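To illustrate why logit-based masking itself counts as "trivial", here is a minimal sketch in Python. It assumes a toy vocabulary of integer token IDs and a plain list of raw logits for one decoding step; the function names are hypothetical and not taken from any particular project or from OpenAI's API. Disallowed tokens get a logit of -inf, so they end up with zero probability after softmax and can never be picked. The part the comment says is hard (doing this inside a large batched serving stack) is deliberately not shown.

```python
import math
import random

def mask_logits(logits, allowed_ids):
    """Return a copy of logits with every token not in allowed_ids set to -inf."""
    allowed = set(allowed_ids)
    if not any(0 <= i < len(logits) for i in allowed):
        raise ValueError("at least one allowed token ID must be in the vocabulary")
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax; -inf entries map to probability 0.0."""
    m = max(logits)  # finite, because at least one token is allowed
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits, allowed_ids):
    """Pick the highest-scoring token among the allowed ones."""
    masked = mask_logits(logits, allowed_ids)
    return max(range(len(masked)), key=lambda i: masked[i])

def sample_pick(logits, allowed_ids, seed=None):
    """Sample a token from the masked distribution (masked tokens have weight 0)."""
    probs = softmax(mask_logits(logits, allowed_ids))
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

if __name__ == "__main__":
    # Toy 5-token vocabulary; only tokens 0, 2 and 4 are allowed at this step,
    # e.g. because a grammar or JSON schema rules the others out.
    logits = [2.0, 5.0, 1.0, 0.5, 3.0]
    print(greedy_pick(logits, {0, 2, 4}))  # token 1 has the top raw logit but is masked
    print(softmax(mask_logits(logits, {0, 2, 4})))
```

The masking is a single elementwise operation per decoding step; a real deployment would apply the same idea to batched logit tensors on the accelerator, which is where the serving-architecture concerns above come in.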