Hacker News
by gamegoblin 1158 days ago
What reasons do you have for believing that is true?

It seems plausible to me that a general autoregressive LLM that is capable of completing text wouldn't take that much fine-tuning to shift it from "text completion" to "instruction following".

After all, the raw GPT3 model can be made to follow instructions with just a few examples.

Consider the prompt:

    What is the capital of France?
Raw GPT3, not the newer instruction-tuned variants, does not understand it's being asked a question. It offers the completion:

    What is the capital of France? If a student answers with a word, 
    she is asked to identify the word. She is not asked whether the 
    capital of France is Paris. On the other hand, if the student
    answers by pointing to a map, she is asked to identify the capital
    of France. She is not asked whether it is Paris.
It just starts appending to the text.

But if you give it a few examples, it happily gets into instruction following mode:

    The following is a transcript between a human and a helpful
    AI assistant who answers questions and obeys commands.

    Human: How many eggs are in a dozen?
    AI: 12
    Human: Say "hello" 3 times
    AI: hello hello hello
    Human: What is the capital of France?
    AI: 
GPT3 completes "Paris" here.
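
For anyone who wants to try this, here is a rough sketch of how a prompt like the one above could be assembled from a list of example pairs. The `build_few_shot_prompt` helper and its parameter names are made up for illustration; it only builds the string and does not call any model API.

```python
# Hypothetical helper: builds a K-shot "transcript" prompt as plain text.
# It does not call a model; the result would be passed to whatever
# text-completion endpoint you happen to use.

PREAMBLE = (
    "The following is a transcript between a human and a helpful "
    "AI assistant who answers questions and obeys commands."
)


def build_few_shot_prompt(examples, question, preamble=PREAMBLE):
    """Return a prompt with one Human/AI turn per example, ending on an open AI turn."""
    lines = [preamble, ""]
    for human_text, ai_text in examples:
        lines.append(f"Human: {human_text}")
        lines.append(f"AI: {ai_text}")
    # The final turn is left open so the model completes the answer.
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)


examples = [
    ("How many eggs are in a dozen?", "12"),
    ('Say "hello" 3 times', "hello hello hello"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of France?")
print(prompt)
```

The number of pairs in `examples` is the "K" in K-shot; with the two pairs shown it reproduces the 2-shot prompt above.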

If you can get decent instruction/question following behavior out of a 2-shot example prompt, why do you think 15k is small for this?

2 comments

N-shot prompting at inference time is fundamentally different from training/fine-tuning, which by definition happens before inference.

Though it would be interesting to know if OpenAI has a few generic multishot inputs before the prompt.

It's all extremely cryptic what their actual context window and system prompt are (assuming ChatGPT is even using the same API the proles are given).

The claim is not that they are fundamentally different or similar; the claim is that one doesn't need that much data to get instruction-following behavior from a raw autoregressive LLM. K-shot prompting shows that the capability to follow instructions is already present in the model. It's just a matter of using fine-tuning to keep the model in that frame all the time without a K-shot prompt.

Just saying, if you ask for the capital of an obscure country that it hasn’t been trained on, you will not get the answer, so 15k will only get you some general stuff within those confines. Also, for coding, it will need pretty complete documentation to ingest, plus enough examples of how the code is written.

15k is not the full training corpus. The model is trained on huge swaths of internet text. 15k is just the fine-tuning corpus to show it how to follow instructions. Stuff like world capitals is already present in the model weights from training on tons of internet text.

With the raw LLM, you can get the capital of Mongolia with the prompt "The capital of Mongolia is", i.e. text completion. The fine-tuning allows you to get at that same information by asking questions or giving commands, e.g. "Tell me the capital of Mongolia".
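
To make the distinction concrete, here is a minimal sketch of what a record in an instruction fine-tuning corpus might look like once rendered as training text. The `instruction`/`response` field names and the `to_training_text` helper are assumptions for illustration, not the format of any particular dataset or library.

```python
# Hypothetical sketch: the same fact reached two ways.

# 1) Raw LLM: phrase the request as text to be completed.
completion_prompt = "The capital of Mongolia is"

# 2) Instruction-tuned LLM: fine-tuning examples pair an instruction
#    with the desired response. Field names here are assumed.
record = {
    "instruction": "Tell me the capital of Mongolia",
    "response": "Ulaanbaatar",
}


def to_training_text(rec):
    """Render one instruction/response pair in a Human/AI transcript frame."""
    return f"Human: {rec['instruction']}\nAI: {rec['response']}"


print(completion_prompt)
print(to_training_text(record))
```

The fact itself is not being taught by a record like this; the point of the fine-tuning corpus is only to make the question-and-answer frame the model's default.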