by dnnssl2 1040 days ago

Knowledge instillation is probably the holy grail of fine tuning. The hard part is:

1. Generalizing new facts. You can create a question-answer pair such as “What is the population of the world in 2023?” / “8 billion”, but the model may not pick up alternate phrasings such as “Does the world have 8 billion people on it?”

2. Catastrophic and behavioral forgetting. Continued fine tuning after RLHF and instruction fine tuning may result in the loss of the alignment and instruction following capabilities trained by OpenAI. At worst, it will start spewing random tokens like the example in the post.

I have not yet seen it successfully done, and I suspect that updating fractions (~.1%) of the original weights with PEFT methods won’t help.
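To make the generalization problem in point 1 concrete, here is a minimal sketch of how one might build several phrasings of a single fact and hold some of them out, so that a fine-tuned model is scored on wordings it never saw. The templates, the `prompt`/`completion` field names, and the helper functions are all hypothetical, invented for illustration; no particular fine-tuning API is assumed.

```python
import random

# Hypothetical fact and templates, invented for illustration only.
FACT = {"subject": "the world", "year": 2023, "answer": "8 billion"}

TEMPLATES = [
    "What is the population of {subject} in {year}?",
    "How many people lived in {subject} in {year}?",
    "In {year}, what was the population of {subject}?",
    "Roughly how many people does {subject} have as of {year}?",
    "Does {subject} have {answer} people on it in {year}?",
]


def build_pairs(fact, templates):
    """Render every template into a prompt/completion pair for the same fact."""
    pairs = []
    for template in templates:
        question = template.format(**fact)
        # Yes/no phrasings embed the answer in the question, so the target is "Yes".
        target = "Yes" if "{answer}" in template else fact["answer"]
        pairs.append({"prompt": question, "completion": target})
    return pairs


def split_pairs(pairs, n_heldout, seed=0):
    """Shuffle deterministically and return (train, heldout) lists."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    return shuffled[n_heldout:], shuffled[:n_heldout]


pairs = build_pairs(FACT, TEMPLATES)
train, heldout = split_pairs(pairs, n_heldout=2)

for pair in heldout:
    print("held-out phrasing:", pair["prompt"])
```

If a model answers the training phrasings but fails the held-out ones, it has memorized strings rather than picked up the fact, which is the failure described above.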

2 comments

BoorishBears 1040 days ago

Your answer doesn't really answer the question and is liable to confuse someone asking what this person asked... the answer to their question is a simple: No.

Current fine tuning techniques can only contribute to knowledge indirectly (getting better queries for an external data source, for example). You cannot directly embed new facts in the model in any generally efficient/effective manner.

There are toy examples of fine tuning in facts that are not of use outside of academic considerations at this point, and I sense it's contributing to the widespread confusion about fine-tuning's value proposition

dnnssl2 1040 days ago

There are a few reputable academic examples of factual editing, such as: https://rome.baulab.info/

I don’t believe that the answer is strictly no. There are still many questions around the fine tuning method and the scale of data, as well as expectations of task accuracy from the perspective of an end user.

mikeagb 1040 days ago

I agree that stating outright that the answer is no is too strong a statement. The general consensus has definitely been that fine-tuning (especially instruction fine-tuning) is primarily for picking up style over facts, but that doesn't mean it's not doable. Continuous pre-training is used to instill new knowledge, and the line where it becomes "fine-tuning" rather than "continuous pre-training" is not obvious to me.

joewferrara 1040 days ago

This line between fine tuning and continuous pre-training is what I’m interested in. What is the investment difference between fine tuning, continuous pre-training and training from scratch? Do you (or anyone else) have any good sources or know of good examples where continuous pre-training is being done?

BoorishBears 1040 days ago

> There are toy examples of fine tuning in facts that are not of use outside of academic considerations at this point, and I sense it's contributing to the widespread confusion about fine-tuning's value proposition

The answer for someone asking that question is a strict no. Many people asking this stuff only have access to SFT, so it's a super no for them.

Honestly I don't get this weird obsession right now with LLMs and throwing random roadblocks in the way of any sort of common knowledge of the subject. If someone in CS 101 asked if they could write a game engine in CSS, you wouldn't get people lining up to tell them the answer isn't "No." despite it technically being possible (https://github.com/brookjordan/css-game-engine), because we understand that sometimes to enable understanding of a subject you need to set up some solid ground for new entrants to stand on.

Fine-tuning is not for knowledge. If you get comfortable enough to start experimenting with that application, you'll understand that there's some nuance to that statement either way and get to research/tinker/push boundaries armed with enough knowledge to not accept the simple no.

Phemist 1040 days ago

Funny, as a corollary to how impractical a game engine would be in CSS: my first pass over the text had me reading CSS as something like C Sharp Sharp. A non-existent language that, according to my brain, still seems like a more likely language to build the game engine in

TeMPOraL 1040 days ago

Or, in other words, you were hallucinating, in the LLM sense.

Phemist 1040 days ago

What a fool believes, he sees.

https://en.wikipedia.org/wiki/Predictive_coding

ozr 1040 days ago

This is simply wrong. Information that did not exist in the model can be added to a model by finetuning, both full and PEFT. It has been repeatedly demonstrated in practice and in multiple papers.

BoorishBears 1040 days ago

That is simply noise.

Surely you saw the sibling comment that tries to make the exact same point and did so hours ago, the reply is the same:

The answer for someone asking that question is a strict no. Many people asking this stuff only have access to SFT, so it's a super no for them.

It's no different than teaching the Bohr model of the atom: we know it doesn't hold up to discoveries that you'll come across after it is established, but it doesn't matter because by the time you know enough to revisit the topic, you understand why the answer was a flat no then and can move past it on your own.

OP could have googled the topic but they asked human beings the question. They likely presumed they'd use their human sensibilities to understand the underlying intent of the question instead of parroting a list of toy experiments that would have zero benefit to them.

zwaps 1040 days ago

Having literally done it in an enterprise setting (and participated in experiments for some of the largest companies in the world in their respective domain fields), I have to say: your lack of nuance and abundance of arrogance do not come across very well.

It is important to distinguish between something being impossible, infeasible and not well understood. Fine-tuning "for effect" is mostly the latter.

You say "current fine-tuning techniques can only contribute to knowledge indirectly" and then in the next post row back to "except in toy examples" because the former is - literally - not correct.

This is HN. We are not advising clients on how "to get their data into their AI best". We can discuss here the actual technical detail of a thing. An intellectually honest discussion begins with saying: "From a scientific standpoint, and even from a practical standpoint, we are not sure yet, however..."

BoorishBears 1040 days ago

"advising clients" is such an odd way of describing "making a complex topic approachable"

But you're correct, this is HN: so much pontificating without producing a single counterfactual implies you should speak for yourself and not the collective.

They said "LLM", but given the context it's an RLHF LLM, and presumably they want a generalized way to add factual information in a way that doesn't cripple the model's general performance (yes, I am being so arrogant as to draw obvious conclusions to give them a useful answer)

No paper on the subject has achieved this. The ones that come close (and by close I mean very far) fall back to BERT-sized models, which I already addressed below, so please petition your "enterprise" to share their secrets.

(wrong crowd to get any gravitas out of the word enterprise btw, we understand it means "constrained use case with minimal external validation")

redox99 1040 days ago

> I have not yet seen it successfully done, and I suspect that updating fractions (~.1%) of the original weights with PEFT methods won’t help.

Nitpick: although training a LoRA only trains 1% or less (depending on rank) of the model's parameter count, the adapters affect the entire model, and after merging the LoRA all of the weights of the model are updated.
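A small sketch of that point, assuming the usual low-rank form W' = W + scale * (B x A) for a single weight matrix. The dimensions, the random placeholder values standing in for a trained adapter, and the helper names are made up for illustration; this is plain Python, not any particular PEFT library's API.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w, b, a, scale=1.0):
    """Return W + scale * (B @ A), i.e. the adapter folded into the base weights."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


rng = random.Random(0)
d, k, r = 64, 64, 1  # hypothetical layer shape (d x k) and adapter rank r

# Frozen base weights.
w = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
# Nonzero placeholder values standing in for a trained adapter (assumption).
b = [[rng.uniform(0.5, 1.5) for _ in range(r)] for _ in range(d)]
a = [[rng.uniform(0.5, 1.5) for _ in range(k)] for _ in range(r)]

trainable = r * (d + k)  # parameters in B and A
total = d * k            # parameters in W
fraction = trainable / total

merged = lora_merge(w, b, a)
changed = sum(
    m != o for m_row, o_row in zip(merged, w) for m, o in zip(m_row, o_row)
)

print(f"trainable fraction: {fraction:.2%}")
print(f"entries changed after merge: {changed} of {total}")
```

With these made-up dimensions the adapter holds 128 trainable values against 4096 in the full matrix (about 3%), yet every entry of the merged matrix differs from the original. The small parameter count limits the rank of the update, not how many weights it touches.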
