Hacker News
by sillysaurusx 2180 days ago
Fine-tuning GPT-3 is one of the biggest challenges, because the model is behind an API. The weights aren't available to researchers, so we can't make it do anything it doesn't already do.

But, that's fair. It's OpenAI's weights; they can keep them locked up if they want to. What caught my attention, though, is that supposedly OpenAI is working on a way to support fine-tuning.

If you think about the logistics of that, it's a very interesting challenge. The situation is this: 240GB of weights, as a webservice. Each fine-tuning session results in another copy of 240GB. So it clearly doesn't scale -- 1TB per 4 users isn't exactly efficient.
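
To put rough numbers on the paragraph above, here is a back-of-the-envelope sketch. The 240GB figure is the one quoted in the post, not an official number, and `full_copy_storage_gb` is a name made up for illustration.

```python
# Storage cost if every fine-tuning session produces a full copy of the weights.
BASE_MODEL_GB = 240  # figure quoted in the post; an assumption, not an official size


def full_copy_storage_gb(num_users: int, base_gb: float = BASE_MODEL_GB) -> float:
    """Total storage when each user gets a private copy of all the weights."""
    return num_users * base_gb


if __name__ == "__main__":
    for users in (1, 4, 1000):
        total = full_copy_storage_gb(users)
        print(f"{users:>5} users -> {total:,.0f} GB ({total / 1000:,.2f} TB)")
```

Four users come to 960 GB, which is where the "1TB per 4 users" estimate comes from.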

Except, not quite. You can solve this by adding additional layers, which you then fine-tune. So the base model is 240GB or whatever, and the extra layers morph the output to do what you want. Think of it as a GPT-3 with a GPT-2 1.5B stuck on the end of it.

It's a neat idea, because theoretically you'd get two models out of it: you can "break off" the end of the fine-tuned model, and you end up with the original model. So it would be very modular.
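
A minimal sketch of that "break off" idea, assuming the fine-tuned part is a separate head stacked on a frozen, shared base. The class and the toy layers are hypothetical stand-ins, not anything OpenAI has described.

```python
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]


class ComposedModel:
    """A shared frozen base with an optional, detachable fine-tuned head."""

    def __init__(self, base: Layer, head: Optional[Layer] = None) -> None:
        self.base = base  # shared across all users, never modified
        self.head = head  # small per-user layers, the only part that is trained

    def __call__(self, x: Vector) -> Vector:
        hidden = self.base(x)
        return self.head(hidden) if self.head is not None else hidden

    def break_off(self) -> "ComposedModel":
        """Drop the head; what remains behaves like the original model."""
        return ComposedModel(self.base)


def toy_base(x: Vector) -> Vector:
    # Stand-in for the large pretrained model.
    return [2.0 * v for v in x]


def toy_head(h: Vector) -> Vector:
    # Stand-in for the small per-user layers.
    return [v + 1.0 for v in h]


fine_tuned = ComposedModel(toy_base, toy_head)
original = fine_tuned.break_off()
```

Because the base is shared by reference and never updated, each user only adds the storage cost of their own head.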

Are there other models that you can "break apart" to get different sub-models? Sort of like adding slots that give a model different capabilities.

5 comments

(I work at OpenAI.)

I am finishing up our fine-tuning API this weekend :).

If anyone on HN would like to try out the fine-tuning API (or wants to build something on top of the base API), send me an email (gdb@openai.com) with your use case and I can try to accelerate you in our invite queue.

PS: We're hiring — if you enjoy building APIs with Python/Go/Kubernetes/Kafka or building front-end interfaces in React, then please get in touch — gdb@openai.com.

Are there any products in the pipeline that you're planning to ship? Asking for prospective candidates.
There's just about infinite surface area with the API — we're trying to build a dead-simple API that developers can plug into any product in order to add intelligence features that would be otherwise impossible.

This requires a lot of traditional software work — API design, writing and maintaining a growing amount of business logic, providing great tools and interfaces to help our users work with the API, excellent documentation and tutorials, scaling and operating backend systems, etc — and machine learning systems work — building serving infrastructure for a great variety of giant neural networks while making the most efficient use of our hardware, allowing our users to interact with these neural networks in increasingly sophisticated ways, etc.

While we're just getting started and have a small team, we are already supporting customers across a wide variety of industries (see https://beta.openai.com/ for a sample) and serving millions of requests per day. We are busy trying to invite folks off a very long waitlist while building out the API to support everyone.

Would love more help :).

Emailed. I think I have an interesting perspective as a pro-hackathonner who regularly uses new technologies to build compelling demos. I haven't heard back yet on my initial beta application, but I hope to be able to try it out and explore its potential.
Many ML models are like this (anything used in CV, e.g. ResNet, VGG). For example, if you want to classify images as being hot dog or not hot dog (classes that do not exist in ResNet), you can take weights from a pretrained ResNet-50 and fine-tune the last layer on a small training set of input images labeled hot dog and not hot dog. This lets you reuse the ResNet's feature detector layers, while plugging in a specialized "is this a hot dog or not" fully connected layer.
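
A toy sketch of that recipe in plain Python, under loud assumptions: `frozen_features` is a hypothetical stand-in for the pretrained feature layers (a real setup would use an actual ResNet), and the data points are made up. Only the final logistic layer is trained.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def frozen_features(x: Sequence[float]) -> List[float]:
    """Stand-in for pretrained feature layers; nothing in here is ever updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data: Sequence[Example], epochs: int = 200,
               lr: float = 0.5) -> Tuple[List[float], float]:
    """Train only the last layer (logistic regression) on frozen features."""
    feats = [(frozen_features(x), y) for x, y in data]
    w = [0.0] * len(feats[0][0])
    b = 0.0
    for _ in range(epochs):
        for f, y in feats:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(x: Sequence[float], w: Sequence[float], b: float) -> int:
    f = frozen_features(x)
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5)


# Made-up labeled data: 1 = "hot dog", 0 = "not hot dog".
DATA: List[Example] = [((2.0, 1.0), 1), ((1.5, 0.5), 1),
                       ((-1.0, -2.0), 0), ((-0.5, -1.5), 0)]
w, b = train_head(DATA)
```

The feature function is computed once per example and reused across epochs, which is only valid because it is frozen.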
If I understand correctly, I think you and the other poster are describing transfer learning.
same thing
You could encode the deltas cleverly and likely use much less than 240GB.
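
One possible reading of "encode the deltas", sketched under the assumption that fine-tuning changes only a small fraction of the weights (which is not guaranteed): store a sparse map of index to difference instead of a full copy. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def encode_delta(base: Sequence[float], tuned: Sequence[float],
                 tol: float = 0.0) -> Dict[int, float]:
    """Keep only the positions where the fine-tuned weight differs from the base."""
    return {i: t - b for i, (b, t) in enumerate(zip(base, tuned))
            if abs(t - b) > tol}


def apply_delta(base: Sequence[float], delta: Dict[int, float]) -> List[float]:
    """Rebuild the fine-tuned weights from the shared base plus a sparse delta.

    Note: with arbitrary floats, base + (tuned - base) can differ from tuned
    by rounding error; the toy values below are exact in binary.
    """
    out = list(base)
    for i, d in delta.items():
        out[i] += d
    return out


base_weights = [0.5, 1.0, -2.0, 4.0]
tuned_weights = [0.5, 1.25, -2.0, 4.0]
delta = encode_delta(base_weights, tuned_weights)
restored = apply_delta(base_weights, delta)
```

Here one entry is stored instead of four weights. If most weights change, a sparse map like this saves nothing, and a different encoding would be needed.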
Yes, I guess you could have the API provide the intermediate layer outputs instead of the predictions. However, if you then want to fine-tune your own extra layers using these intermediate outputs as inputs, would the API be able to produce them fast enough for you to fine-tune your own layers in reasonable time? That's assuming the extra layers are located on your own servers. Or would OpenAI be willing to actually create the extra layers on their own machines and let you fine-tune those? In the second scenario, you would need to move your dataset to their servers.
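
On the throughput question, one hedged observation: if the base model is frozen, the intermediate outputs for a given input never change, so each example only needs to be fetched once and can be reused for every epoch. A minimal sketch, where `fake_fetch_hidden_state` is a placeholder for a hypothetical remote call (no such endpoint is confirmed anywhere in this thread):

```python
from typing import Callable, Dict, List

Vector = List[float]


class FeatureCache:
    """Fetch each example's intermediate output once, then reuse it every epoch."""

    def __init__(self, fetch: Callable[[str], Vector]) -> None:
        self._fetch = fetch  # hypothetical remote call, assumed to be slow
        self._store: Dict[str, Vector] = {}
        self.remote_calls = 0

    def get(self, text: str) -> Vector:
        if text not in self._store:
            self._store[text] = self._fetch(text)
            self.remote_calls += 1
        return self._store[text]


def fake_fetch_hidden_state(text: str) -> Vector:
    # Placeholder for a remote API; returns a deterministic toy vector.
    return [float(len(text)), float(text.count(" "))]


cache = FeatureCache(fake_fetch_hidden_state)
dataset = ["hot dog", "not hot dog", "hot dog"]
for _epoch in range(3):
    features = [cache.get(t) for t in dataset]
```

Three epochs over three examples trigger only two remote calls here, one per distinct input. This stops working as soon as the base layers are also being updated.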

Actually, since they used Azure cloud to train GPT-3, I don't see why they wouldn't just let you pay for spinning up Azure instances to train your extra layers, and connect those to the model.

They can keep their weights secret, sure, but then they should change their name.