| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sanderjd 962 days ago

How does one efficiently learn how to do such things, and what kinds of problems such approaches are fruitful for?

I find there to be a giant gap in learning about this stuff between material that boils down to "use magic words and system prompts to improve results from one of the big models" and "how do LLMs work from first principles".

I still haven't found a great resource that covers this middle ground, which seems to me to be where a lot of the power of these approaches is going to reside.

5 comments

Tostino 962 days ago

So I described my approach to how I fine tune a specific task below to another user, but I'll copy it here:

> Design your tasks to be repeatable and small steps, call the OpenAI API and log all requests/responses. > Filter out any bad responses and take a representative sample of the data you have collected from OpenAI,and train a Mistral or Llama2 model with the request/response pairs.

> Measure the quality of your model vs OpenAI for the same inputs, and then swap out the model in your workflow once happy with the results.

link

capableweb 962 days ago

If you do this, be careful how/if you publish your weights trained on OpenAI output as if they look into how it was generated and it becomes clear you broke the ToS, they'll most likely ban you from the platform.

link

leereeves 962 days ago

How would they "look into how it was generated"?

link

capableweb 962 days ago

You train your model, publish it on huggingface and then write in the README:

> This is how I made this model: Design your tasks to be repeatable and small steps, call the OpenAI API and log all requests/responses. > Filter out any bad responses and take a representative sample of the data you have collected from OpenAI,and train a Mistral or Llama2 model with the request/response pairs.

Just one example.

link

Tostino 962 days ago

I'm not competing with OpenAI in any sense of the word.

link

jiveturkey 962 days ago

I haven't read the ToS, but it may not specifically require competing to be a violation.

link

kcorbitt 962 days ago

If you're looking for a practical guide to getting started with fine tuning, I wrote one a couple of months ago that got pretty popular here on HN. Might be helpful if you're interested in playing around with it! https://news.ycombinator.com/item?id=37484135

link

jachee 962 days ago

The industry term for that middle ground is a “moat”, and the people who are most familiar with it are getting paid for what they know, so they’re not giving it away.

link

sanderjd 961 days ago

I think that may be right, but if so, that seems pretty unusual to me.

I've gone through a few of these "new kinds of software becoming useful" transition periods - most notably applications moving to the web, and then native smart phone applications - and in none of those transitions was there a dearth of resources on how to spin up on doing useful things due to this "moat" concern.

Nobody was protecting their iphone app dev moat by not publishing books and training courses on Objective-C and XCode...

link

Swizec 962 days ago

> I still haven't found a great resource that covers this middle ground, which seems to me to be where a lot of the power of these approaches is going to reside.

Read papers, build intuition, experiment.

That last part may be the most important.

link

sanderjd 961 days ago

I think this is the disconnect: It doesn't strike me that what I'm talking about has anything to do with "papers". So from your comment, I'm once again left wondering what you mean.

My sense is that I have a much better grasp of the foundational material here, having read in depth books and papers about that, but still can't quite wrap my head around the question of how people are actually "operationalizing" this into useful software.

But to your point about experimentation, it might just be the kind of thing where there is no path to enlightenment besides working on a project and running into and overcoming all the hurdles along the way.

link

danielmarkbruce 962 days ago

huggingface is your friend.

link