| HN Mirror

	by jpwagner 1200 days ago
	At some point the bibliography needs to reference the LLM itself which would need to be hosted indefinitely.

WolfOliver 1200 days ago

I do not think it is feasible to make sure prompts are reproducible. Considering how large LLMs are, you cannot host every version of a model indefinitely.

m3galinux 1200 days ago

ChatGPT, when asked this question, answers that its responses are probabilistic, so they aren't reproducible. I tested that myself, of course. Since it gave me two different (but overall equivalent) answers to the same prompt, I'd have to agree.

clbrmbr 1200 days ago

That’s because it’s configured with a non-zero temperature. If you use the underlying model API, or the playground, you can get repeatable results when the temperature is zero.
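
A rough sketch of why temperature matters, using a toy three-token "model". The logits and the `sample_token` helper are made up for illustration and are not any real API. At temperature zero the pick is a plain argmax; above zero it is a weighted random draw.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits at a given temperature (toy helper)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens
rng = random.Random()     # unseeded on purpose

greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}
print(greedy)   # only index 0 ever shows up
print(sampled)  # typically several different indices
```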

xyzzyz 1200 days ago

Mostly, yes, but there have been reported cases where you get nondeterminism even with zero temperature, probably because rounding errors accumulate differently when floating-point operations are performed in a different order.
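
A toy illustration of the floating-point part, with made-up numbers rather than anything measured from a real model. Adding the same three terms in a different order gives a slightly different sum, and that is enough to flip a near-tied argmax.

```python
# Toy illustration only: the numbers are made up, not taken from any real model.
a, b, c = 0.1, 0.2, 0.3

# Same three terms, two summation orders (as parallel hardware might produce).
left_to_right = (a + b) + c
right_to_left = a + (b + c)
print(left_to_right == right_to_left)  # False

def greedy_pick(scores):
    """Temperature-zero choice: index of the highest score; first one wins ties."""
    return max(range(len(scores)), key=lambda i: scores[i])

rival = 0.6  # hypothetical score of a competing token
print(greedy_pick([rival, left_to_right]))  # 1
print(greedy_pick([rival, right_to_left]))  # 0
```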

sdenton4 1200 days ago

You can also generally set an RNG seed to get reproducible results.
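
As a minimal sketch of the idea, here is a stand-in "generator" that draws words at random. The `generate` function and the vocabulary are hypothetical; whether a given hosted model exposes a seed at all is a separate question. With the seed fixed, the draw sequence repeats exactly.

```python
import random

def generate(seed, vocab, length=5):
    """Stand-in for sampling: draw `length` words using a seeded RNG."""
    rng = random.Random(seed)  # private RNG, so other code can't disturb it
    return [rng.choice(vocab) for _ in range(length)]

vocab = ["the", "model", "is", "a", "tool"]  # made-up vocabulary

first = generate(42, vocab)
second = generate(42, vocab)
print(first == second)  # True: same seed, same sequence of draws
```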

jpwagner 1200 days ago

if it's not reproducible, it's not science

ravi-delia 1200 days ago

Some would argue that, due to the unfortunately near-universal deprecation of paper authors after 70-90 years, the actual process of writing any particular paper is not and has never been reproducible. As opposed to experiments, which are reproducible and are not generally contained within the operating weights of an LLM nor the thoughts of a human.

jpwagner 1200 days ago

Your observation, while accurate, is a tangent. The point of the bibliography in the context of an academic paper is to reference the academic merit of the work. In the case of science, this would be reproducible experiments (ideally).

Perhaps you would prefer to include the generated text source as an author.

ravi-delia 1200 days ago

I'd rather not include it at all, for exactly that reason. It's just a writing tool; the paper is either correct or incorrect on the same basis as any other paper. We include bibliographies to ensure that the relevant scientific data is present, but I don't think there's any reason to say that a non-reproducible abstract isn't science.

kergonath 1200 days ago

It belongs in the acknowledgments, along with Bob’s wife who did a bit of proofreading, Steve McProfessor who had a chat with the authors once, and whatever software was used for the figures.

clbrmbr 1200 days ago

This. It’s crazy that people are thinking we should _credit_ the model as if it were an author. It’s a tool, and should be usable without limit.

It would however be nice to have a mode where any output that matches some existing text in the training set could be highlighted, to help one avoid unintentional plagiarism.
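
Something like the following is roughly what I have in mind, as a sketch only. It assumes you have the reference text on hand as a plain string, matches on runs of `n` words, and marks hits with `[[...]]`; the `highlight_overlap` name and the marker style are made up, and a real training set would need a proper index rather than an in-memory set.

```python
def ngrams(words, n):
    """All runs of n consecutive words, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def highlight_overlap(output, corpus, n=4):
    """Wrap in [[...]] each word of `output` inside an n-word run found in `corpus`."""
    out_words = output.split()
    lowered = [w.lower() for w in out_words]  # case-insensitive matching
    known = ngrams(corpus.lower().split(), n)
    flagged = [False] * len(out_words)
    for i in range(len(lowered) - n + 1):
        if tuple(lowered[i:i + n]) in known:
            for j in range(i, i + n):
                flagged[j] = True
    return " ".join(f"[[{w}]]" if hit else w for w, hit in zip(out_words, flagged))

corpus = "the quick brown fox jumps over the lazy dog"  # stand-in for training text
print(highlight_overlap("My quick brown fox jumps today", corpus))
# My [[quick]] [[brown]] [[fox]] [[jumps]] today
```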
