by OutOfHere 653 days ago
Just because ChatGPT was used to help write a paper doesn't in itself mean that the data or findings are fabricated.
5 comments

Sure, but there are some... pretty egregious cases. https://mashable.com/article/ai-rat-penis-diagram-midjourney...
That’s the funniest piece of writing I’ve read in a long time, thanks!

I wonder what they were thinking submitting the paper.

They let the machines think for them, that's the whole problem.
True. I am seeing ChatGPT used by my colleagues (mostly non-native English speakers) day to day, and it mostly improves their writing (except for those words that pop up a bit too often [0], like "utilize" [1]). So not all bad.

I am also hearing that a lot of reviewers and readers use it, though. So we often joke that PhD students (in CS) nowadays only write bullet points from their research, then generate prose from them, and that prose is in turn used to generate bullet points again.

[0] https://www.scientificamerican.com/article/chatbots-have-tho...

[1] https://medium.com/learning-data/words-and-phrases-that-make...

Scientific writing is usually pretty bad, so I'll count this as an improvement.
How can I trust the paper when there is no proper proofreading?
How do you know there is no proper proofreading? There is no way to tell, is there? Just because content was generated by an LLM doesn't in itself mean that it wasn't proofread.
> Methods

> We searched and scraped Google Scholar using the Python library Scholarly (Cholewiak et al., 2023) for papers that included specific phrases known to be common responses from ChatGPT and similar applications with the same underlying model (GPT3.5 or GPT4): “as of my last knowledge update” and/or “I don’t have access to real-time data” (see Appendix A).

If no one bothered to even spot and remove these, you can be pretty sure that no human ever read the whole paper before publication.

IMO, at this point, AI is very necessary as a pre-reviewer to weed out such papers that haven't been proofread. This applies at both the journal and the preprint levels, so that these papers never get an audience.
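For what it's worth, the phrase check quoted from the Methods section is simple enough to sketch. This is only a rough illustration, not the paper's actual pipeline: it skips the Google Scholar scraping entirely and just tests a piece of text for the two quoted phrases. The names `TELLTALE_PHRASES`, `normalize`, and `find_telltale_phrases` are made up for this example.

```python
import re

# The two phrases quoted in the Methods excerpt, lowercased,
# with a straight apostrophe so matching is uniform.
TELLTALE_PHRASES = (
    "as of my last knowledge update",
    "i don't have access to real-time data",
)


def normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.replace("\u2019", "'").lower()
    return re.sub(r"\s+", " ", text)


def find_telltale_phrases(text: str) -> list[str]:
    """Return the telltale phrases found in the text (hypothetical helper)."""
    cleaned = normalize(text)
    return [phrase for phrase in TELLTALE_PHRASES if phrase in cleaned]


# Made-up sample text, not taken from any real paper.
sample = "As of my last knowledge update, I don\u2019t have access to real-time data."
print(find_telltale_phrases(sample))
```

A match only shows that leftover chatbot boilerplate survived into the text. It says nothing about whether the findings are sound, and a paper without these phrases may still be unreviewed.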
You can probably find some quality stuff in your local landfill too, but I am personally unwilling to sift through garbage.
The problem is not that a single paper has fabricated content generated by ChatGPT. The problem is that there are many such papers, and they are polluting scholarship so badly that the base of evidence used in policy-making could be poisoned to the point of uselessness.
Firstly, "fabricated content" is a meaningless phrase. For the sake of argument, say I use GitHub Copilot for "fabricating" every line of code. Does this make my code polluted? No, because I review every line of code, editing what's necessary, and more. It's the same way with scholarship. The fact that content was generated doesn't say anything in itself.

Perhaps "unreviewed scholarship" would be a more concerning claim, but I don't yet see the evidence for it being a major concern.