| Fine-tuning GPT-3 is hard because the model sits behind an API. The weights aren't available to researchers, so we can't make it do anything it doesn't already do. That's fair, though: they're OpenAI's weights, and they can keep them locked up if they want to. What caught my attention is that OpenAI is supposedly working on a way to support fine-tuning. If you think about the logistics of that, it's a very interesting challenge. The situation is this: 240GB of weights, served as a web service. If each fine-tuning session produces another full 240GB copy, it clearly doesn't scale -- roughly 1TB for every 4 users isn't exactly efficient. Except it doesn't have to work that way. You can solve this by adding extra layers on top and fine-tuning only those. The base model stays at 240GB or whatever, shared by everyone, and the extra layers morph its output to do what you want. Think of it as a GPT-3 with a GPT-2 1.5B stuck on the end of it. It's a neat idea, because in theory you'd get two models out of it: you can "break off" the end of the fine-tuned model and you're left with the original model. So it would be very modular. Are there other models that you can "break apart" to get different sub-models? Sort of like adding slots that give a model different capabilities. |
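
To make the storage arithmetic and the "break off" idea concrete, here is a minimal sketch in Python. Everything in it is illustrative: `base_model`, `Head`, `fine_tuned`, and `storage_gb` are hypothetical names, the toy base is just a fixed function standing in for the frozen weights, and the 3GB head size is an assumption (1.5B parameters at 2 bytes each). It says nothing about how OpenAI actually implements fine-tuning.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


def base_model(x: Vector) -> Vector:
    """Toy stand-in for the shared, frozen base model (never updated)."""
    return [2.0 * v + 1.0 for v in x]


@dataclass
class Head:
    """Small per-user layer on top of the base output: y = scale * h + bias.

    The default values make it the identity, so a fresh head leaves the
    base model's output unchanged.
    """
    scale: float = 1.0
    bias: float = 0.0

    def __call__(self, h: Vector) -> Vector:
        return [self.scale * v + self.bias for v in h]

    def sgd_step(self, h: Vector, target: Vector, lr: float = 0.01) -> None:
        """One gradient step on mean squared error, touching only the head."""
        n = len(h)
        err = [self.scale * v + self.bias - t for v, t in zip(h, target)]
        grad_scale = sum(2.0 * e * v for e, v in zip(err, h)) / n
        grad_bias = sum(2.0 * e for e in err) / n
        self.scale -= lr * grad_scale
        self.bias -= lr * grad_bias


def fine_tuned(head: Head) -> Callable[[Vector], Vector]:
    """Compose the frozen base with a user's head; dropping the head
    gives back the plain base model."""
    return lambda x: head(base_model(x))


def storage_gb(users: int, base_gb: float = 240.0,
               head_gb: float = 3.0) -> Tuple[float, float]:
    """Return (full copy per user, shared base plus one head per user).

    head_gb=3.0 is an assumed figure: 1.5B parameters * 2 bytes each.
    """
    return users * base_gb, base_gb + users * head_gb


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    h = base_model(xs)                 # base output, computed once
    target = [3.0 * v for v in h]      # what this user wants instead

    head = Head()
    for _ in range(5000):
        head.sgd_step(h, target)       # only the head's two numbers change

    print(fine_tuned(head)(xs))        # close to target
    print(base_model(xs))              # base output is unchanged
    print(storage_gb(4))               # (960.0, 252.0)
```

Under those assumptions, four users cost 960GB with full copies but 252GB with a shared base, and the per-user cost grows by the head size rather than by the base size.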
I am finishing up our fine-tuning API this weekend :).
If anyone on HN would like to try out the fine-tuning API (or wants to build something on top of the base API), send me an email (gdb@openai.com) with your use case and I can try to move you up in our invite queue.
PS: We're hiring — if you enjoy building APIs with Python/Go/Kubernetes/Kafka or building front-end interfaces in React, then please get in touch — gdb@openai.com.