by PlasmonOwl 361 days ago
Ok, so I am always interested in these papers as a chemist. Often, we find that LLMs are terrible at chemistry. This is because the lived experience of a chemist is fundamentally different from the education they receive. Often, a master's student takes 6 months to become productive at research in a new subfield. A PhD, around 3 months.

Most chemists will begin to develop an intuition. This is where the issues develop.

This intuition is a combination of the chemist's mental model and how the sensory environment stimulates it. As a polymer chemist working in a certain system, brown might mean I am seeing scattering, hence particles. My system is supposed to be homogeneous, so I bin the reaction.

It is often known that good grades don’t make good researchers. That’s because researchers aren’t doing rote recall.

So the issue is this: we ask the LLM, how many proton environments are in this NMR spectrum?

We should ask: I’m intercalating Li into a perovskite using BuLi. Why does the solution turn pink?

4 comments

I think a huge reason why LLMs are so far ahead in programming is because programming exists entirely in a known and totally severed digital environment outside our own. To become a master programmer all you need is a laptop and an internet connection. The nature of it existing entirely in a parallel digital universe just lends itself perfectly to training.

All of that is to say that I don't think the classic engineering fields have some kind of knowledge or intuition that is truly inaccessible to LLMs, I just think that it is in a form that is too difficult right now to train on. However if you could train a model on them, I strongly suspect they would get to the same level they are at today with software.

> I think a huge reason why LLMs are so far ahead in programming

Are they? Last time I checked (couple of seconds ago), they still made silly mistakes and hallucinated wildly.

Example: https://imgur.com/a/Cj2y8km (AI teaching me about the Coltrane operator, which obviously does not exist).

You're using the worst model when it comes to programming, so I'm not sure what point you're trying to prove here. That's why, when someone starts ranting about how useless AI models are for coding, I always assume they're just using inferior models.

My question was very simple. Suitable for a simpler model.

I can come up with prompts that make better models hallucinate (see post below).

I don't understand your objection. This is a known fact, LLMs hallucinate shit regardless of the model size.

LLMs are getting better. Are you?

Nothing matters in this business except the first couple of time derivatives.

Maybe I'm not.

However, I'm discussing this within the context of the study presented in the paper, not some future yet-to-be-achieved performance expectation.

If we step outside the context of the paper (not advised), I think any average developer is better than an LLM at energy efficiency. LLMs cheat by consuming more resources than a human. "Better" is quite relative. So, let's keep it reasonable.

Are you intentionally sandbagging the LLMs to prove a point, or do you really think 4o-mini is good enough for programming?

Even 2.5 flash easily gets this https://imgur.com/a/OfW30eL

The point is that I can make them hallucinate quite easily. And they don't demonstrate knowing their own limitations.

For example, 2.5 Flash fails to explain the difference between the short ternary operator (null coalescing) and the Elvis operator.

https://imgur.com/a/xKjuoqV

Even when I specify a language (therefore clearing up the confusion, supposedly), it still fails to even recognize the Elvis operator by its toupee, and mixes up the explanation (it doesn't even understand what I asked).

https://imgur.com/a/itr87hM

So, the point I'm trying to make is that they're not any better for programming than they are for chemistry.

Flash is the wrong model for questions like that -- not that you care -- but if you'd like to share the actual prompt you gave it, I'll try it in 2.5 Pro.

"explain me the difference between the short ternary operator and the Elvis operator"

When it failed, I replied: "in PHP".
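
For anyone who wants to check the behaviour rather than take a model's word for it, here is a minimal sketch of the two PHP fallback semantics being argued about. It is written in Python, not PHP, purely for illustration, and the helper names `short_ternary` and `null_coalesce` are hypothetical. In PHP, `$a ?: $b` falls back to `$b` whenever `$a` is falsy, while `$a ?? $b` falls back only when `$a` is unset or null. Note that Python's notion of falsy is not identical to PHP's (for example, the string "0" is falsy in PHP but truthy in Python), so this only approximates the idea.

```python
def short_ternary(value, fallback):
    """Rough analogue of PHP's `value ?: fallback`.

    Falls back whenever the value is falsy (0, "", None, empty list, ...).
    """
    return value if value else fallback


def null_coalesce(value, fallback):
    """Rough analogue of PHP's `value ?? fallback`.

    Falls back only when the value is None; 0 and "" are kept.
    """
    return value if value is not None else fallback


if __name__ == "__main__":
    # Compare both helpers on values where they agree and where they differ.
    for candidate in (0, "", None, "text"):
        print(
            repr(candidate),
            repr(short_ternary(candidate, "default")),
            repr(null_coalesce(candidate, "default")),
        )
```

The two only disagree on values that are falsy but not null, such as 0 or the empty string, which is exactly the case a correct explanation has to call out.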

You don't seem to understand what I'm trying to say and instead are trying to defend LLMs for a fault that is known in the industry at large.

I'm sure that in short time, I could make 2.5 Pro hallucinate as well. If not on this question, on others.

This behavior is in line with the paper's conclusions:

> many models are not able to reliably estimate their own limitations.

(see Figure 3, they tested a variety of models of different qualities).

This is the kind of question a junior developer can answer with simple Google searches, by reading the PHP manual, or just by testing it in a REPL. Why do we need a fancy model to answer such a simple inquiry? Would a beginner know that the answer is incorrect and that he should use a different model?

Also, from the paper:

> For very relevant topics, the answers that models provide are wrong.

> Given that the models outperformed the average human in our study, we need to rethink how we teach and examine chemistry.

That's true for programming as well. It outperforms the average human, but then it makes silly mistakes that could confuse beginners. It displays confidence in being plain wrong.

The study also used manually curated questions for evaluation, so my prompt is not some dirty trick. It's totally in line with the context of this discussion.

They aren't getting any better at programming, so they naturally assume the LLMs aren't, either.

>the lived experience of a chemist is fundamentally different from the education they receive. Most chemists will begin to develop an intuition.

Is this a documentation problem? The LLMs are only trained on what is written down. Seems to track with another comment further down quoting:

"Models are limited in ability to answer knowledge-intensive questions, probably because the required knowledge cannot easily be accessed via papers but rather by lookup in specialized databases, which the humans used to answer such questions"

>using BuLi. Why does the solution turn pink?

I would say odds are it's because of an impurity. My first guess might be the solvent, if there is more in play than the reagents or reactants. That could maybe be confirmed or ruled out by some carefully planned filtration beforehand, which might not even be that difficult. I doubt I would try much further than that unless it was a bad problem.

Although for instance an alternate simple purification like distillation is pretty much routine for pure aniline to get some colorless material, and that's some pretty rough stuff to handle.

Now, I was once a young chemist facing AI. I ended up highly focused on going forward in ways that would not be "taken over" by AI, and I knew I couldn't be slow or the recession still might catch up with me, plus the 1990s were approaching fast ;)

By the mid-1990s I figured there was no way the stuff they have in this paper had not been well investigated.

I always knew it would take people that had way more megabytes than I could afford.

Sheesh, did I overestimate the progress people were making when I wasn't looking.

Just out of curiosity (not knowing anything about butyllithium other than what I've read on 'Things I Won't Work With'), is this answer from o3-pro even close?

https://chatgpt.com/share/685041db-c324-800b-afc6-5cb2c5ef31...