| HN Mirror

by Aurornis 297 days ago

I’ve been putting questions into LLM research functions, including Claude’s research mode, and letting them churn until a report appears.

I’ve been starting with topics where I’m already familiar with the answer but want a refresher. So far, I’m not impressed. Sometimes the info will be correct. Most of the time it strings together a lot of words from the material it finds, but it reads like an undergrad trying to paraphrase the Wikipedia page without understanding the content. Often it will have one bullet point that is completely wrong.

The other problem I’m having is that it’s not very good at identifying poor sources. This is less of a problem with topics like math and engineering, but a big problem with topics like health and medicine where it will pick up alternative medicine and pseudoscience pages and integrate them into the research as if they were real. There are a lot of health and medicine topics where the way pseudoscience people talk about a subject doesn’t match the real science, but they use the same words and therefore catch the same search terms.

An example is the way “dopamine” is used in casual conversation and by influencers in ways that aren’t accurate. Concepts like “dopamine fasting” or claiming things “raise your dopamine” aren’t scientifically accurate but use the same words nevertheless and therefore can get pulled into the training set and searches.

2 comments

HarHarVeryFunny 297 days ago

There are basically three types of responses you can get from an LLM/agent:

1) A response originating from LLM pre-training, in a domain where there has not been any (successful) RL-for-reasoning post-training. In this case the amount of reasoning around the raw facts "recalled" by the LLM is going to be limited by any reasoning present in the training data.

2) A non-agentic response in a domain like Math Olympiad problems where the LLM was post-trained with RL to encourage reasoning mirroring this RL training set. This type of domain-specific reasoning training seems to have little benefit to other domains (although in the early LLM days it was said that training on computer code did provide some general benefit).

3) An agentic response, such as from one of these research systems, where it seems the agent is following some sort of generic research / summarization template with prescribed steps. I've never tried these myself, but it seems they can be quite successful in deep diving and gathering relevant source material, but then the ability to reason over this retrieved material is going to come down to the reasoning capability of the underlying model per 1) and 2) above.

Bottom line would seem to be that with today's systems, domain-specific reasoning capability largely comes down to RL post-training for reasoning in that specific domain, resulting in what some call "jagged" performance - excellent in some areas and very poor in others. Demis Hassabis, for one, seems to be saying that this will not be fixed until architectural changes/additions are made to bring us closer to AGI.

laborcontract 297 days ago

Claude's research mode is by far the worst one I've used; I consider it nearly useless. I cannot trust it specifically because Anthropic has a policy of refusing to use reddit for anything, whether it be as a research source or in claude chrome.

Reddit may not be the greatest source for hard science, but for things like "tell me what shoes people are finding helpful for their plantar fasciitis" I appreciate reddit's anecdata over every other source.

palmotea 297 days ago

> I cannot trust it specifically because Anthropic has a policy of refusing to use reddit for anything, whether it be as a research source or in claude chrome.

Reddit wants money for its users' data. Is the reason that Anthropic doesn't want to pay Reddit's shareholders for it?

Also Sam Altman owns quite a lot of Reddit stock and was briefly the CEO, so it's not inconceivable he's influenced them not to cooperate with one of his chief rivals.

kridsdale1 296 days ago

AFAIK it’s Google that has signed a license deal with Reddit.
