by jpwagner 1200 days ago
At some point the bibliography needs to reference the LLM itself which would need to be hosted indefinitely.

I do not think it is feasible to make sure prompts are reproducible. Since LLMs are large, you cannot host every version of a model indefinitely.
ChatGPT, when asked this question, answers that its responses are probabilistic, so the responses aren't reproducible. I tested that myself, of course. Since it gave me two different (but overall equivalent) answers from the same prompt, I'd have to agree.
That’s because it’s configured with a non-zero temperature. If you use the underlying model API, or the playground, you can get repeatable results when the temperature is zero.
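To make the temperature point concrete, here is a minimal sketch of a toy sampler. It is not any vendor's API: `sample_token` and the logits are made up for illustration, and it assumes that "temperature zero" means greedy decoding (always take the highest-scoring token).

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores (toy sampler, hypothetical names)."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no randomness.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits, then a weighted random draw.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]   # made-up scores for three candidate tokens
rng = random.Random()      # deliberately unseeded

greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = [sample_token(logits, 1.0, rng) for _ in range(100)]

print(greedy)        # only one distinct choice at temperature 0
print(set(sampled))  # at temperature 1.0 the draws can vary
```

At temperature 0 the random generator is never consulted, which is why repeated calls agree.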
Mostly, yes, but there have been reported cases of nondeterminism even at zero temperature, probably because floating-point operations are performed in a different order from run to run, so rounding errors accumulate differently.
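A small illustration of the floating-point part, in plain Python rather than anything model-specific. Addition of floats is not associative, so summing the same numbers in a different order can change the last bit, and a greedy pick between two near-tied scores can flip as a result. The scores and the `greedy_pick` helper are made up for the example.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one summation order
right = a + (b + c)   # same numbers, different order

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6

def greedy_pick(scores):
    """Index of the highest score; ties go to the earliest index."""
    return max(range(len(scores)), key=lambda i: scores[i])

rival = 0.6  # a competing token's score (made up)

pick_left = greedy_pick([rival, left])    # left is a hair larger, so index 1
pick_right = greedy_pick([rival, right])  # exact tie, so index 0

print(pick_left, pick_right)  # 1 0
```

Whether this is what actually happens inside a given hosted model is an assumption; the sketch only shows that operation order alone is enough to change a greedy choice.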
You can also generally set an RNG seed to get reproducible results.
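For what the seed buys you, here is a minimal sketch using Python's `random` module as a stand-in for a model's sampler. The `sample_sequence` helper and the weights are hypothetical, and whether a given hosted API exposes a seed at all is something to check per provider.

```python
import random

def sample_sequence(weights, length, seed):
    """Draw `length` token indices with a private, seeded generator."""
    rng = random.Random(seed)  # same seed gives the same stream of draws
    return rng.choices(range(len(weights)), weights=weights, k=length)

weights = [0.5, 0.3, 0.2]  # made-up token probabilities

run_a = sample_sequence(weights, 20, seed=1234)
run_b = sample_sequence(weights, 20, seed=1234)

print(run_a == run_b)  # True: sampling stays random but is repeatable
```

Unlike temperature zero, this keeps the sampling diversity and only pins down which random draws you get.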
if it's not reproducible, it's not science
Some would argue that, due to the unfortunately near-universal deprecation of paper authors after 70-90 years, the actual process of writing any particular paper is not and has never been reproducible. As opposed to experiments, which are reproducible and are not generally contained within the operating weights of an LLM nor the thoughts of a human.
Your observation, while accurate, is a tangent. The point of the bibliography in the context of an academic paper is to reference the academic merit of the work. In the case of science, this would be reproducible experiments (ideally).

Perhaps you would prefer to include the generated text source as an author.

I'd rather not include it at all, for exactly that reason. It's just a writing tool; the paper is either correct or incorrect on the same basis as any other paper. We include bibliographies to ensure that the relevant scientific data is present, but I don't think there's any reason to say that a non-reproducible abstract isn't science.
It belongs in the acknowledgments, along with Bob’s wife who did a bit of proofreading, Steve McProfessor who had a chat with the authors once, and whatever software was used for the figures.
This. It’s crazy that people are thinking we should _credit_ the model as if it were an author. It’s a tool, and should be usable without limit.

It would however be nice to have a mode where any output that matches some existing text in the training set could be highlighted, to help one avoid unintentional plagiarism.
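A rough sketch of what such a mode could look like, assuming you have some reference corpus to compare against (the actual training set is generally not available, so `corpus` here is a stand-in). It flags any run of `n` consecutive words that also appears verbatim in the corpus; `highlight_overlap`, the `[[...]]` markers and the sample text are all made up for illustration.

```python
def ngrams(words, n):
    """All runs of n consecutive words, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def highlight_overlap(output, corpus, n=5):
    """Wrap words of `output` that sit inside an n-gram also found in `corpus`."""
    corpus_grams = set()
    for doc in corpus:
        corpus_grams |= ngrams(doc.lower().split(), n)

    words = output.split()
    lowered = [w.lower() for w in words]
    flagged = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(lowered[i:i + n]) in corpus_grams:
            for j in range(i, i + n):
                flagged[j] = True

    return " ".join(f"[[{w}]]" if hit else w for w, hit in zip(words, flagged))

corpus = ["the quick brown fox jumps over the lazy dog"]
text = "We saw the quick brown fox jumps today"

print(highlight_overlap(text, corpus, n=4))
# We saw [[the]] [[quick]] [[brown]] [[fox]] [[jumps]] today
```

Exact word matching like this misses paraphrases and is sensitive to punctuation, so it would only catch the most literal kind of overlap.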