by uniqueuid 761 days ago
With all the positive comments here, I feel like someone should play the role of the downer.

First of all, it's inevitable that LLMs will be/are used in this way and it's great to see development and discussion in the open! That's really important.

Secondly, this will absolutely destroy some areas of science even more than they have already been.

Why? First, science, like all of humankind, is always a balance between benevolent and malevolent actors. Science already battles data forgery, p-hacking and replication issues. Giving researchers access to tools like this will mean that some conventional quality assurance processes will fail hard. Double-blind peer review will no longer work when AI-generated submissions outnumber high-quality ones 10:1 or 100:1.

Second, doing analysis and writing a paper is one bottleneck of science, but epistemologically, it's not the important one. There are innumerable ways to analyze extant data and it's completely moot to do any analysis in this way. Simmons, Nelson and Simonsohn, Gelman et al. and others have shown that, given a dataset: (1) the findings you can get practically always range from very negative effects to very positive effects, depending on the setup of the analysis, so having one analysis is pointless, especially without theory; (2) even when you give really good labs the same data and question, almost nobody will get the same result (many labs experiment).
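To make point (1) concrete, here is a toy sketch in Python. The numbers are made up for illustration and do not come from any of the papers mentioned; the names `ROWS`, `pooled_slope` and `within_group_slopes` are hypothetical. It only shows one well-known way the sign of an estimate can depend on the analysis setup (pooling the data versus splitting by a grouping variable), not the full range of analytic flexibility those authors describe.

```python
from statistics import mean

# Made-up toy data, purely illustrative: (group, x, y).
# Within each group y falls as x rises, but group "b" sits higher on both axes.
ROWS = [
    ("a", 1, 3), ("a", 2, 2), ("a", 3, 1),
    ("b", 6, 8), ("b", 7, 7), ("b", 8, 6),
]


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pooled_slope(rows):
    """Analysis A: ignore the grouping variable and fit one line."""
    return ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])


def within_group_slopes(rows):
    """Analysis B: fit a separate line inside each group."""
    groups = sorted({g for g, _, _ in rows})
    return {
        g: ols_slope(
            [x for gg, x, _ in rows if gg == g],
            [y for gg, _, y in rows if gg == g],
        )
        for g in groups
    }


if __name__ == "__main__":
    print("pooled slope:", pooled_slope(ROWS))
    print("within-group slopes:", within_group_slopes(ROWS))
```

Same six data points, two defensible-looking setups: the pooled fit gives a positive slope, the per-group fits give negative ones. Which one is "right" depends on theory about what the grouping means, which is exactly what an automated analysis doesn't have.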

What does this tell us? There are a few parts of science that are extremely important, and without them science is not only low-impact, it even has a harmful effect by creating costs for pruning and distilling findings. The really important part is causal analysis, and that practically always involves data collection. That's why sciences with strong experimental traditions fare a bit better: when you need to run a costly experiment yourself in order to publish a paper, this creates a strong incentive to think things through and do high-impact research.

So yeah, we've seen this coming and it must create a big backlash that prevents this kind of research from being published, even if vetted by humans.

Source: am a scientist, am a journal editor.

3 comments

Agreed as a former scientist (theoretical high energy physics). I’ve yet to meet one person in related fields who’s enthusiastic about giving paper mills a 2000% productivity boost while giving honest people a 20% boost at best, and by the looks of it, these data-to-mindless-statistical-correlation agents will hit the already bullshit-laden, not very scientific fields the hardest. I’m not sure that future can be deterred, though; the cat is already out of the bag.
I just hope that one day we find the jerk who put the poor animal in the bag in the first place.

Sorry, I just had to. Hottest day of the year in the UK today and warm weather causes me to lose inhibition.

Generally speaking, I defer to your expert point of view in the matter, and I agree that it will be far easier to generate meaningless research that passes the test of appearing meaningful to reviewers than it will be to generate meaningful research that passes the same test.

However, the thing is, it is an open secret that this is already true. Meaningful peer review is already confined to islands within a system that has devolved into generating content. The automation of the process doesn't represent a tipping point, and I don't think that the ethically disclosed production of 'research' by large language models is going to represent a significant part of the problem. The errors of the current system will be reduced to absurdity by the existing ethical norms.

So, in the report, the statement "the power of AI to perform complete _end-to-end_ scientific research" is a blatant lie. Given that your comment seems to be the most reasonable one, and considering that I've seen, over and over, that it's always the domain experts who are the least enthusiastic about AI byproducts, I recalled a saying from the Shogun series:

"Why is it that only those who have never fought in a battle are so eager to be in one?"

Thanks, that's a nice quote.

With regard to the debate, I think it's good not to engage in too much black-and-white thinking. Science itself is a pretty muddy affair, and we still haven't grown beyond simplistic null hypothesis significance testing (NHST), even decades after its problematic implications became clear.

That's why it's so important to look at the macro implications, i.e. how does this shift costs? As another comment nicely put it, LLMs are empowering good science, but they are potentially empowering bad science by an order of magnitude more.

Having a design background, I agree completely. To explain why design matters in this case, we simply need to look at ergonomic factors: literally the “economy of work.” That’s why I called the "end to end" claim a lie: it’s impossible to assert such things without thorough testing of the applications and continued analysis of their effects on the whole supply chain.

Most of those AI byproducts will likely be laughable in the coming decades, similarly to the recurring weird-form-factor boom surrounding whatever device is in vogue. Refer to the video linked in [1] for good examples of weird PC input devices from the 2000s. It takes considerable time for the most viable form factors to be established, and once that’s achieved, the designs of the vast majority of products within a category converge on the most ergonomic (and economic) one.

What bothers me most is not the advent of novelty and experiments, but the overconfidence and overpromises surrounding what are merely untested product hypotheses for most AI applications. The negligible marginal cost of producing derivative work in software, fueled by the high availability of accessible tooling and a lack of rigorous design and scientific training, is to blame. Never mind the hype cycle, which is natural and expected. It is in times like these that we most need pragmatic skepticism. I wonder if AI developers care at all to do the bare minimum due diligence required to launch their products. Seems to be a rare thing in SWE in general.

[1] https://youtu.be/Sbtgc6mi44M?si=X2e0DSlxZjC7_YOf