by briga
1052 days ago
One thing to note when making comparisons like this is that LLM output is not deterministic, in the sense that if you ask it the same question 10 times you will get 10 different answers. So the question to ask is not, “is GPT4 better on this one specific question?”, but rather “does GPT4 produce better results on average?”. I would bet that it does, if for no other reason than that it is much larger, and LLM performance seems to just scale with size. Also worth noting is that the more detailed your prompt is, the better the response will be. Sometimes you have to encourage GPT to get the best results. GPT4 should be able to handle much more complex and detailed prompts than 3.5.
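To make the "on average" point concrete, here is a rough sketch of what such a comparison could look like. Everything in it is an assumption for illustration: the scores are simulated with a random number generator rather than pulled from any real model, and the names `sample_scores` and `compare_on_average`, the quality levels 0.55 and 0.70, and the spread of 0.2 are all made up. In practice you would replace the simulated scores with graded responses from repeated runs of the same prompt.

```python
import random
import statistics

def sample_scores(mean_quality, n_samples, rng, spread=0.2):
    """Simulate n_samples graded answers (clamped to 0..1) to the same prompt.

    Hypothetical stand-in for asking a model the same question repeatedly
    and scoring each answer; no real model is called here.
    """
    return [min(1.0, max(0.0, rng.gauss(mean_quality, spread)))
            for _ in range(n_samples)]

def compare_on_average(scores_a, scores_b):
    """Return (mean of b minus mean of a, fraction of paired draws where b beats a)."""
    mean_diff = statistics.mean(scores_b) - statistics.mean(scores_a)
    win_rate = sum(b > a for a, b in zip(scores_a, scores_b)) / len(scores_a)
    return mean_diff, win_rate

rng = random.Random(0)  # fixed seed so the sketch is repeatable
scores_small = sample_scores(0.55, 200, rng)  # assumed quality of the smaller model
scores_large = sample_scores(0.70, 200, rng)  # assumed quality of the larger model

mean_diff, win_rate = compare_on_average(scores_small, scores_large)
print(f"mean difference: {mean_diff:.3f}, paired win rate: {win_rate:.2f}")
```

Even when one model is better on average, the paired win rate sits well below 100%, so any single head-to-head comparison can easily go the other way.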
|
So we have different populations of GPT users.
An average experience might be a mixture of spot-on helpful responses and obvious bullshit^H^H^Hallucinations. This population might learn what questions to ask given the limitations of the model. This is really the best-case scenario, as people can actually get a feel for how to use the technology, its strengths and weaknesses, etc.
Personally, the first few dozen times I used it I was amazed at the responses. I was on team superintelligence: anyone getting lackluster responses was just holding it wrong. But luck changes, and after months of use I now see that on average the responses are just OK. This is the case that leads to disappointment and bitter conspiracy (the superintelligence is being suppressed, give it back!).
Another population had rotten luck to begin with and got dumb, unhelpful responses over and over. This population quickly determined that the AI was all hype and stopped exploring (you don't keep going back to the casino if you lose everything your first time...).
This divergence is destructive to the larger discourse, since we have fanboys flummoxed by naysayers and critics bamboozled by hype beasts.