Hacker News
by glenstein 357 days ago
One fascinating aspect of LLMs is they make out-in-the-wild anecdotes instantly reproducible or, alternatively, comparable to results from others with different outcomes.

A lot of our bad experiences with, say, customer support hotlines, municipal departments, bad high school teachers, whatever, are associated with a habit of speaking that adds flavor and vibes, or bends experiences into on-the-nose stories with morals, in part because we know they can't be reviewed or corrected by others.

Bringing that same way of speaking to LLMs can show us either (1) the gap between what the LLM does and how people describe what it did, or (2) that people are being treated differently by the same LLMs. I think both are fascinating outcomes.

3 comments

LLMs are definitely not instantly reproducible. The temperature setting adjusts randomness, and the models are frequently optimized and fine-tuned. You will get very different results depending on what you have in your context. And with a tool like Microsoft Copilot, you have no idea what is in the context. There are also bugs in the tools that wrap the LLM.

Just because other people on here say “worked for me” doesn’t invalidate OP's claim. I have had similar experiences where an LLM will tell me “here is a script that does X” and there is no script to be found.
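
To make the temperature point above concrete, here is a minimal sketch of temperature-scaled sampling. The token list and scores are made up for illustration, and real models and the tools that wrap them apply more steps than this, so treat it as a toy under those assumptions rather than how any particular product works.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; low temperature sharpens, high flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates and scores, invented for this example.
tokens = ["Earth", "Mars", "Venus"]
logits = [5.0, 2.0, 1.0]

rng = random.Random(0)  # fixed seed so repeated runs of this script match
for temperature in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    picks = [sample_token(tokens, logits, temperature, rng) for _ in range(5)]
    print(temperature, [round(p, 3) for p in probs], picks)
```

At a low temperature nearly all of the probability sits on the top-scoring token, so repeated runs tend to agree. At higher temperatures the other tokens get sampled more often, which is one reason two people can see different answers to the same prompt.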

I was intentionally broad in my claim to account for those possibilities, but also I would reject the idea that instant reproducibility is generally out of reach on account of contextual variance for a number of reasons.

Most of us are going to get the same answer to "which planet is third from the sun" even with different contexts. And if we're fulfilling our Healthy Internet Conversation 101 responsibility of engaging in charitable interpretation, then other people's experiences with similarly situated LLMs can, within reason, be predictive and can fairly be invoked to set expectations for what behavior is most likely, without that meaning perfect reproducibility is possible.

I think it really depends on the UI. If it was in some native desktop experience, maybe it accidentally produced a response assuming there would be a code canvas or something, and sent the code response under a different JSON key.

We're also seeing a new variant of Cunningham's law:

The best way to get the right answer from an LLM is not to ask it the right question; it's to post online that it got the wrong answer.

> One fascinating aspect of LLMs is they make out-in-the-wild anecdotes instantly reproducible

How? I would argue they do the exact opposite of that.

Asking the number of Rs in the word Strawberry is probably the most famous one.
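
Part of why that example travels so well is that the ground truth is trivial to check outside the model. A small sketch, where the helper name `count_letter` is just an illustrative choice:

```python
def count_letter(word, letter):
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("Strawberry", "r"))  # 3
```

Anyone can run this and compare it against whatever their LLM said, which makes a claimed wrong answer easy to confirm or dispute.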