by jazzyjackson
1052 days ago
An unintuitive consequence of this nondeterminism over millions of interactions is that different individuals will see different trends. IME the quality of a response is accurately modeled by "luck", and people's luck can change. So we have different populations of GPT users.

An average experience might be to get a mixture of spot-on helpful responses and obvious bullshit^H^H^Hallucinations; this population might learn what questions to ask given the limitations of the model. This is really the best-case scenario, as people can actually get a feel for how to use the technology, its strengths and weaknesses, etc.

Personally, the first few dozen times I used it I was amazed at the responses. I was on team superintelligence: anyone getting lackluster responses was just holding it wrong. But luck changes, and over months of use I see now that on average the responses are just OK. This is the case that leads to disappointment and bitter conspiracy (the superintelligence is being suppressed, give it back!).

Another population had rotten luck to begin with and got dumb, unhelpful responses over and over. This population quickly determined that the AI was all hype and stopped exploring (you don't keep going back to the casino if you lose everything your first time...).

This divergence is destructive to the larger discourse, since we have fanboys flummoxed by naysayers and critics bamboozled by hype beasts.
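To make the "luck" point concrete, here is a rough simulation sketch. It assumes every user faces the same model and that each response is independently "good" with a fixed probability; the 60% rate, the 10-response window, and the cutoffs for "amazed" and "burned" are made-up numbers for illustration, not measurements of any real model.

```python
import random


def early_impressions(n_users=100_000, n_first=10, p_good=0.6, seed=0):
    """Simulate each user's first n_first responses.

    Every response is good with the same probability p_good (an assumed,
    illustrative value). Returns two fractions of the user population:
    those whose early hit rate is >= 0.9 ("amazed") and those whose early
    hit rate is <= 0.3 ("burned").
    """
    rng = random.Random(seed)
    amazed = 0
    burned = 0
    for _ in range(n_users):
        good = sum(rng.random() < p_good for _ in range(n_first))
        rate = good / n_first
        if rate >= 0.9:
            amazed += 1
        elif rate <= 0.3:
            burned += 1
    return amazed / n_users, burned / n_users


if __name__ == "__main__":
    amazed, burned = early_impressions()
    print(f"amazed early: {amazed:.1%}, burned early: {burned:.1%}")
```

Even with an identical model for everyone, a few percent of users land in each tail purely by chance, and across millions of users that is a lot of people with very different first impressions.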
What I've seen on indie-hacker-type websites is that developers are fully on this train and not very critical of the outputs.
This is why you get very basic prompts sent by "wrapper apps", which might have given the developer a good result the only time it was tested before being put in production.
I think it might take a while before tools show up that can generate 100 test cases and test a given prompt with all 100 to report on the results... It seems to be a tough problem to crack.
IMHO front-end chat end-users have many, many more "at-bats" and get to see more model results than devs do, which makes them more critical of those results.
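For what it's worth, the "run one prompt against 100 test cases and report" loop can be sketched in a few lines. This is only a toy under stated assumptions: `fake_model` is a hypothetical stand-in for a real LLM call (it answers correctly about 70% of the time), and `Case`, `make_cases`, and `evaluate` are names invented here. The genuinely hard parts, generating realistic test cases and deciding whether a free-form answer is acceptable, are not solved by this.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    user_input: str
    check: Callable[[str], bool]  # True if the response is acceptable


def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call: right about 70% of the time."""
    n = int(prompt.rsplit(" ", 1)[-1])  # the prompt ends with the number
    return str(2 * n) if rng.random() < 0.7 else "I'm not sure."


def make_cases(count=100):
    """Build toy cases: the expected answer is double the input number."""
    return [
        Case(str(n), lambda out, n=n: out.strip() == str(2 * n))
        for n in range(count)
    ]


def evaluate(template, cases, model, trials=5, seed=0):
    """Run every case several times; return overall and per-case pass rates."""
    rng = random.Random(seed)
    report = []
    for case in cases:
        prompt = template.format(input=case.user_input)
        passes = sum(case.check(model(prompt, rng)) for _ in range(trials))
        report.append((case.user_input, passes / trials))
    overall = sum(rate for _, rate in report) / len(report)
    return overall, report


if __name__ == "__main__":
    overall, report = evaluate("Double this number: {input}", make_cases(), fake_model)
    worst = sorted(report, key=lambda item: item[1])[:5]
    print(f"overall pass rate: {overall:.0%}")
    print("worst cases:", worst)
```

Running each case several times matters here: a single trial per case is exactly the "tested once before production" situation, just repeated 100 times.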