by concurrentsquar
782 days ago
OpenAI could either hire private testers or run A/B tests on ChatGPT Plus users (for example, when using ChatGPT I am often asked to choose between two different responses to continue a conversation). Both options are probably much better than putting a model called 'GPT2' onto lmsys, in several respects, to say the least: they would not leak GPT-4.5/5 generations (or the existence of a GPT-4.5/5) to the public at scale, and they would avoid bias* (people probably rate GPT-4 generations higher if they are told, explicitly or implicitly, e.g. socially, that the text comes from GPT-5).

* While lmsys does hide model names until a person decides which model generated the better text, people can still identify (or make a good guess at) which language model produced a piece of text** without being told, especially if that model is hyped online as 'GPT-5'. Even a subconscious "this sounds like what I have seen 'gpt2-chatbot' generate online" could inadvertently influence the results.

** ... though I will note that I just got a generation from 'gpt2-chatbot' that I thought was from Claude 3 (Haiku/Sonnet), and its competitor was LLaMa-3-70b (which I thought was 8b or Mixtral). I am obviously not good at LLM authorship attribution.

The only case where detecting a model makes any difference is for vendors who want to boost their own model by hiring people and paying them every time they select the vendor's model.