by hellovai
726 days ago
not a noob question, here's how the LLM works:

```
def generate(prompt):
    output = []
    while True:
        # one forward pass over the prompt plus everything generated so far
        token_probabilities = call_model(prompt + "".join(output))
        best_token = pick_best(token_probabilities)
        if best_token == '<END>':
            break
        output.append(best_token)
    return output
```

basically, to support constrained generation they would need to modify pick_best to apply the constraints. That would make it so they can't optimize the hot loop at their scale. They do support super broad output constraints like JSON, which apply to everyone, but that leads to other issues (things like chain-of-thought/reasoning perform way worse in structured responses).
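To make the pick_best point concrete, here's a rough sketch of what a constrained version could look like. This is only an illustration: representing token_probabilities as a dict of token to probability, and the names pick_best_constrained and allowed_tokens, are assumptions for the example, not how any provider actually implements it. In a real system the allowed set would have to be recomputed from a grammar or schema at every step, which is the extra work inside the hot loop.

```python
def pick_best(token_probabilities):
    """Unconstrained greedy pick: return the single most likely token."""
    return max(token_probabilities, key=token_probabilities.get)


def pick_best_constrained(token_probabilities, allowed_tokens):
    """Greedy pick restricted to the tokens the constraint permits right now.

    allowed_tokens is a hypothetical per-step set; computing it is the
    expensive part that this sketch leaves out.
    """
    candidates = {
        token: prob
        for token, prob in token_probabilities.items()
        if token in allowed_tokens
    }
    if not candidates:
        raise ValueError("constraint allows no token at this step")
    return max(candidates, key=candidates.get)


# Toy distribution for one decoding step (made-up numbers).
probs = {"hello": 0.6, "{": 0.3, "<END>": 0.1}

unconstrained = pick_best(probs)
# Suppose the output format only allows an opening brace or end-of-output here.
constrained = pick_best_constrained(probs, allowed_tokens={"{", "<END>"})
```

Note that the constrained pick throws away the token the model actually preferred and takes the best one that fits the format.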
|
That is fairly well established to be not true.