by bthornbury
118 days ago
Something like a perplexity/log-likelihood measurement across a large enough number of prompts/tokens might get you the same thing in a statistical sense, though. I expect the comparison percentages at the top are something like that.
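For concreteness, here's a minimal Python sketch of a token-weighted perplexity comparison between two models. It assumes you already have per-token natural-log probabilities for the same evaluation text from each model (the numbers below are made up). The percentage is only one plausible way to express the gap. I don't know that it's how the numbers at the top were actually computed.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def pooled_perplexity(prompts: Sequence[Sequence[float]]) -> float:
    """Flatten all prompts into one token stream so every token counts equally."""
    return perplexity([lp for prompt in prompts for lp in prompt])


def relative_increase_pct(base_ppl: float, other_ppl: float) -> float:
    """Percent change in perplexity of `other` relative to `base`."""
    return (other_ppl / base_ppl - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical per-token log-probs for the same two prompts under two models.
    base = [[math.log(0.5), math.log(0.25)], [math.log(0.5)]]
    other = [[math.log(0.4), math.log(0.2)], [math.log(0.5)]]
    b, o = pooled_perplexity(base), pooled_perplexity(other)
    print(f"base={b:.3f} other={o:.3f} change={relative_increase_pct(b, o):+.1f}%")
```

Pooling tokens across prompts before averaging weights long prompts more heavily. Averaging per-prompt perplexities instead gives each prompt equal weight, which can produce a different number, so it matters which one a comparison uses.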