by uniqueuid 767 days ago
Hi,

thanks for the honest and thoughtful discussion you are conducting here. Comments tend to be simplistic and it's great to see that you raise the bar by addressing criticism and questions in earnest!

That said, I think the fundamental problem of such tools is unsolvable: Out of all possible analytical designs, they produce boring, already-known results at best and wrong results (e.g. missing confounders, misunderstood context ...) at worst. They also pollute science with harmful findings that lack meaning in the context of a field.

These issues have been well known for about ten years and are explained excellently, e.g. in papers such as [1].
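To make the "all possible analytical designs" concern concrete, here is a minimal sketch. It assumes the candidate analyses are statistically independent and that the data contain no real effect at all; both are simplifications, and the function name is made up for illustration. Under those assumptions, the chance that at least one analysis looks "significant" grows quickly with the number of designs tried.

```python
def chance_of_false_positive(num_analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_analyses` independent tests
    on pure-noise data falls below the significance threshold `alpha`.

    Assumption: the tests are independent, which real analytical
    variants of one dataset usually are not.
    """
    # Each test stays (correctly) non-significant with probability 1 - alpha,
    # so all of them do with probability (1 - alpha) ** num_analyses.
    return 1.0 - (1.0 - alpha) ** num_analyses


if __name__ == "__main__":
    for k in (1, 5, 20, 100):
        print(f"{k:>3} analyses tried -> {chance_of_false_positive(k):.3f}")
```

Pre-registration fixes the number of analyses at one (or a small, declared set) before the data are seen, which is why it addresses this particular failure mode.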

There is really only one way to guard against bad science today, and that is true pre-registration. And that is something which LLMs fundamentally cannot do.

So while tools such as data-to-paper may be helpful, they can only be so in the context of pre-registered hypotheses where they follow a path pre-defined by humans before collecting data.

[1] http://www.stat.columbia.edu/~gelman/research/unpublished/p_...

2 comments

Thanks much for these thoughtful comments and ideas.

I can’t but fully agree: pre-registering hypotheses is the only way to fully guard against bad science. This in essence is what the FDA is doing for clinical trials too. And btw, lowering the traditional and outdated 0.05 cutoff is also critical imo.

Now, say we are in a utopian world where all science is pre-registered. Why can’t we imagine AI being part of the process that creates the hypotheses to be registered? And why can’t we imagine it also being part of the process that analyzes the data once it’s collected? And in fact, maybe it can even be part of the process that helps collect the data itself?

To me, whether we are in such a utopian world or in the far-from-utopian current scientific world, there is ultimately no fundamental tradeoff between using AI in science and adhering to fundamental scientific values. Our purpose with data-to-paper is to demonstrate, and to provide tools for, harnessing AI to speed up scientific discovery while making our scientific output much more traceable, transparent, understandable and verifiable.

As for the question of novelty: indeed, research on existing public datasets, which is what we have done so far, cannot be too novel. But scientists can also use data-to-paper with their own fascinating original data. It might help in some aspects of the analysis, and will certainly help them keep track of what they are doing and how to report it transparently. Ultimately I hope that such co-piloting deployment will allow us to delegate more straightforward tasks to the AI, letting us human scientists engage in higher-level thinking and conceptualization.

True, we seem to have a pretty similar perspective after all.

My concern is an ecological one within science, and your argument addresses the frontier of scientific methods.

I am sure both are compatible. One interesting question is what instruments are suitable to reduce negative externalities from bad actors. Pre-registration works, but is limited to few fields where the stakes are high. We will probably similarly see a staggered approach with more restrictive methods in some fields and less restrictive ones in others.

That said, there remain many problems to think about: E.g. what happens to meta-analyses if the majority of findings come from the same mechanism? Will humans be able to resist the pull of easy AI suggestions and instead think hard where they should? Are there sensible mechanisms for enforcing transparency? Will these trends bring us back to a world in which trust is based only on the prestige of known names?

Interesting times, certainly.

> That said, I think the fundamental problem of such tools is unsolvable: Out of all possible analytical designs, they produce boring, already-known results at best and wrong results (e.g. missing confounders, misunderstood context ...) at worst. They also pollute science with harmful findings that lack meaning in the context of a field.

This doesn't seem correct to me at all. If new data is provided and the LLM is simply an advanced tool that applies known analysis techniques to the data, then why would they create “boring existing results”?

I don’t see why systems using an advanced methodology should not produce novel and new results when provided new data.

There are a lot of reactionary or even Luddite responses to the direction we are headed with LLMs.

Sorry but I think we have very different perspectives here.

I assume you mean that LLMs can generate new insights in the sense of producing plausible results from new data or in the sense of producing plausible but previously unknown results from old data.

Both these things are definitely possible, but they are not necessarily (and in fact often not) good science.

Insights in science are not rare. There are trillions of plausible insights, and all can be backed by data. The real problem is the reverse: picking out a meaningful and useful finding from a sea of billions of others.

LLMs learn from past data, and that means they will have more support for "boring", i.e. conventional hypotheses, which have precedent in training material. So I assume that while they can come up with novel hypotheses and results, these results will probably tend to conform to a (statistically defined) paradigm of past findings.

When they produce novel hypotheses or findings, it is unlikely that they will create genuinely meaningful AND true insights, because if you randomly generate new ideas, almost all of them are wrong (see the papers I linked).
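A rough worked example of why a low hit rate among generated ideas matters: the sketch below computes the share of "significant" findings that are actually true, given the fraction of tested hypotheses that are true to begin with. The function name and the numbers (80% power, the listed priors) are illustrative assumptions, not figures from this thread or the linked papers.

```python
def positive_predictive_value(prior: float, power: float = 0.8,
                              alpha: float = 0.05) -> float:
    """Share of statistically significant findings that reflect a true effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: assumed probability of detecting a true effect
    alpha: significance threshold, i.e. false-positive rate for null effects
    """
    true_positives = power * prior            # true hypotheses that pass the test
    false_positives = alpha * (1.0 - prior)   # false hypotheses that pass anyway
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    for prior in (0.5, 0.1, 0.01, 0.001):
        ppv = positive_predictive_value(prior)
        print(f"prior={prior:<6} -> share of significant results that are true: {ppv:.3f}")
    # A stricter threshold helps, but does not remove the problem.
    print(f"prior=0.01, alpha=0.005 -> {positive_predictive_value(0.01, alpha=0.005):.3f}")
```

Under these assumptions, when only one in a hundred generated hypotheses is true, most results that pass the 0.05 threshold are false positives, however plausible each one looks in isolation.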

So in essence, LLMs should have a hard time doing real science, because real science is the complex task of finding unlikely, true, and interesting things.

Have you personally used LLMs within agent frameworks that apply CoT and OPA patterns or others from cognitive architecture theories?

I’d be surprised if you have used LLMs beyond the classic chat based linear interface that is commonly used and still have the opinions you do.

In my opinion, once you combine RAG and agent frameworks with raw observational input data, they can absolutely do real reasoning and analysis, and create new insights that are meaningful and will be considered genuine new science. The project/group we are discussing has practically proven this with their replication examples. This is possible because the LLM is not just taught how to repeat information; it can actually reason and analyze at a human level and beyond when its capabilities are used within a well-designed cognitive architecture using agents.