| HN Mirror



	by m3kw9 1205 days ago
	I’m not seeing how 15k q/a training can get you much other than the simplest things. Maybe that’s the point, get the ball rolling for people to add more training data?

3 comments

gamegoblin 1205 days ago

What reasons do you have for believing that is true?

It seems plausible to me that a general autoregressive LLM that is capable of completing text wouldn't take that much fine-tuning to shift it from "text completion" to "instruction following".

After all, the raw GPT3 model can be made to follow instructions with just a few examples.

Consider the prompt:

    What is the capital of France?

Raw GPT3, not the newer instruction-tuned variants, does not understand it's being asked a question. It offers the completion:

    What is the capital of France? If a student answers with a word, 
    she is asked to identify the word. She is not asked whether the 
    capital of France is Paris. On the other hand, if the student
    answers by pointing to a map, she is asked to identify the capital
    of France. She is not asked whether it is Paris.

It just starts appending to the text.

But if you give it a few examples, it happily gets into instruction following mode:

    The following is a transcript between a human and a helpful
    AI assistant who answers questions and obeys commands.

    Human: How many eggs are in a dozen?
    AI: 12
    Human: Say "hello" 3 times
    AI: hello hello hello
    Human: What is the capital of France?
    AI:

GPT3 completes "Paris" here.

If you can get decent instruction/question following behavior out of a 2-shot example prompt, why do you think 15k is small for this?
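For anyone who wants to try this, here is a minimal sketch of assembling the 2-shot prompt above in Python. The function name `build_few_shot_prompt` and the variable names are made up for illustration; the preamble and examples are copied from the prompt above. It only builds the string, and sending it to a model is left out since that depends on whichever API you use.

```python
def build_few_shot_prompt(preamble, examples, question):
    """Assemble a k-shot prompt: preamble, worked examples, then the open question."""
    lines = [preamble, ""]
    for human, ai in examples:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    lines.append("AI:")  # left open so the model completes the answer
    return "\n".join(lines)


# Preamble and examples taken from the prompt in this comment.
PREAMBLE = (
    "The following is a transcript between a human and a helpful\n"
    "AI assistant who answers questions and obeys commands."
)
EXAMPLES = [
    ("How many eggs are in a dozen?", "12"),
    ('Say "hello" 3 times', "hello hello hello"),
]

prompt = build_few_shot_prompt(PREAMBLE, EXAMPLES, "What is the capital of France?")
print(prompt)
```

Adding more `(human, ai)` pairs to `EXAMPLES` turns this into a K-shot prompt for any K.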


dontupvoteme 1205 days ago

N-shot prompting at inference time is fundamentally different from training/fine-tuning, which inherently happens before inference.

Though it would be interesting to know if OpenAI has a few generic multishot inputs before the prompt.

It's all extremely opaque what the actual context window and system prompt are on their end (assuming ChatGPT is even using the same API the proles are given).


gamegoblin 1205 days ago

The claim is not that they are fundamentally different or similar, the claim is that one doesn't need that much data to get instruction-following behavior from a raw autoregressive LLM. K-shot prompting shows that the capability to follow instructions is present in the model. It's just a matter of using fine-tuning to keep the model in that frame all the time without a K-shot prompt.
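To make "keep the model in that frame" concrete, here is a rough sketch of how instruction/response pairs might be turned into fine-tuning examples. Everything here is an assumption for illustration: the field names `instruction` and `response`, the `Human:`/`AI:` template, and the helper `to_training_pair` are hypothetical, not the format of any particular dataset or training library.

```python
def to_training_pair(record):
    """Format one instruction record as a prompt/completion pair.

    The template mirrors the Human:/AI: frame from a K-shot prompt, so that
    after fine-tuning the model no longer needs the in-prompt examples.
    """
    prompt = f"Human: {record['instruction']}\nAI:"
    completion = f" {record['response']}"
    return {"prompt": prompt, "completion": completion}


# Hypothetical records; a real fine-tuning corpus would have thousands of these.
records = [
    {"instruction": "How many eggs are in a dozen?", "response": "12"},
    {"instruction": 'Say "hello" 3 times', "response": "hello hello hello"},
]

pairs = [to_training_pair(r) for r in records]
for pair in pairs:
    print(pair["prompt"] + pair["completion"])
    print("---")
```

Each pair is the same kind of example you would otherwise put in the prompt; fine-tuning just moves those examples from the context window into the weights.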


m3kw9 1205 days ago

Just saying, if you ask for the capital of an obscure country that it hasn’t been trained on, you will not get the answer, so 15k will get you some general stuff only within those confines. Also, to code, it will need pretty complete documentation to ingest and then enough examples of how the code is written.


gamegoblin 1205 days ago

15k is not the full training corpus. The model is trained on huge swaths of internet text. 15k is just the fine-tuning corpus to show it how to follow instructions. Stuff like world capitals and such are already present in the model weights due to being trained on tons of internet text.

With the raw LLM, you can get the capital of Mongolia with the prompt "The capital of Mongolia is", i.e. text completion. The fine-tuning allows you to get at that information by asking questions or giving commands, e.g. "Tell me the capital of Mongolia"


swid 1205 days ago

It's used for fine-tuning a pre-trained model. This takes an LLM that is already capable of emulating lots of different kinds of personalities, and narrows it down to act more like the examples. Since the heavy lifting has already been done, 15k examples of a chatbot following instructions the way you want have a significant effect.


whimsicalism 1205 days ago

Read about RLHF; I think you are misunderstanding what this will be used for.


esafak 1205 days ago

A specific reference would help readers.


whimsicalism 1205 days ago

good point! https://huggingface.co/blog/rlhf :)

i think the resources out there so far are not great yet
