
by Tostino 962 days ago

You do realize how possible it is to fine tune a task like this (along with a hundred others in a similar vein) on a tiny model you can scale on your own hardware?

I've run hundreds of millions (150m so far in a couple of weeks of non-continuous running as I tweaked things) of tokens through my 2x 3090 with a 13b llama2 model I fine tuned on tasks like: summary, knowledge graph generation, writing using the knowledge graph, grammar, spelling, and transcription correction, etc.

This type of stuff is going to be done at scale with a modest budget if you have the skills to tune more efficient and faster models to your use cases.

8 comments

woadwarrior01 962 days ago

It's even easier than that. There's no need to even fine tune an LLM to do it. Here's a screenshot[1] of a 4 bit quantised version of an off the shelf open LLM (WizardLM 13B v1.2) doing it on my Mac.

[1]: https://imgur.com/a/S9jnHWJ

a_wild_dandan 962 days ago

Yep, I use Llama2 70b for larger tasks on my MacBook and 13b for more “single use” type tasks. It’s a game changer.
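
For a rough sense of why a 4 bit 13B model fits on a laptop, here is a back-of-the-envelope estimate of weight memory. It is only a sketch: it counts the weights alone and assumes exactly 4 or 16 bits per parameter, while real quantised formats store extra scaling data and inference also needs memory for the context. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts are the nominal sizes named in the thread (13B and 70B).
for name, n_params in [("13B", 13e9), ("70B", 70e9)]:
    for bits in (16, 4):
        size = weight_memory_gb(n_params, bits)
        print(f"{name} at {bits}-bit: ~{size:.1f} GB of weights")
```

Under these assumptions the 13B model drops from about 26 GB at 16 bits to about 6.5 GB at 4 bits, and the 70B model from about 140 GB to about 35 GB.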

Tostino 962 days ago

That may be true, and for some tasks the accuracy may be high enough. I have gotten much more consistency in my tasks by fine tuning though.

Getting a consistently good result for one shape of input may not indicate that same performance for another shape of input for example.

manwithaplan 961 days ago

The system confabulated the www subdomain of the “URL provided in the text”, right?

sanderjd 962 days ago

How does one efficiently learn how to do such things, and what kinds of problems such approaches are fruitful for?

I find there to be a giant gap in learning about this stuff between material that boils down to "use magic words and system prompts to improve results from one of the big models" and "how do LLMs work from first principles".

I still haven't found a great resource that covers this middle ground, which seems to me to be where a lot of the power of these approaches is going to reside.

Tostino 962 days ago

So I described my approach to how I fine tune a specific task below to another user, but I'll copy it here:

> Design your tasks to be repeatable and small steps, call the OpenAI API and log all requests/responses.

> Filter out any bad responses and take a representative sample of the data you have collected from OpenAI, and train a Mistral or Llama2 model with the request/response pairs.

> Measure the quality of your model vs OpenAI for the same inputs, and then swap out the model in your workflow once happy with the results.
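
A minimal sketch of the log, filter, and sample steps quoted above, not the poster's actual code. Everything named here is an assumption for illustration: `call_model` stands in for whatever function calls the hosted API, `is_good` is whatever quality check you choose, and a uniform random sample is just one simple reading of "representative". The training step itself is left out.

```python
import io
import json
import random
from typing import Callable, Dict, Iterable, List, TextIO


def log_pairs(prompts: Iterable[str], call_model: Callable[[str], str], sink: TextIO) -> None:
    """Write one JSON object per request/response pair (JSONL)."""
    for prompt in prompts:
        record = {"prompt": prompt, "response": call_model(prompt)}
        sink.write(json.dumps(record) + "\n")


def read_pairs(source: TextIO) -> List[Dict[str, str]]:
    """Read the JSONL log back into a list of dicts."""
    return [json.loads(line) for line in source if line.strip()]


def build_training_set(
    pairs: List[Dict[str, str]],
    is_good: Callable[[Dict[str, str]], bool],
    sample_size: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Drop bad responses, then take a seeded uniform random sample."""
    good = [p for p in pairs if is_good(p)]
    return random.Random(seed).sample(good, min(sample_size, len(good)))


# Demo with a stand-in "teacher" instead of a real API call.
def fake_teacher(prompt: str) -> str:
    return "" if "bad" in prompt else prompt.upper()


log = io.StringIO()  # a real run would use an open file instead
log_pairs(["summarize a", "bad input", "summarize b"], fake_teacher, log)
log.seek(0)
pairs = read_pairs(log)
train = build_training_set(pairs, is_good=lambda p: bool(p["response"].strip()), sample_size=100)
print(len(pairs), "logged,", len(train), "kept for training")
```

The resulting list of prompt/response dicts is the kind of data you would then convert to whatever format your fine tuning tool expects.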

capableweb 962 days ago

If you do this, be careful how/if you publish your weights trained on OpenAI output: if they look into how it was generated and it becomes clear you broke the ToS, they'll most likely ban you from the platform.

leereeves 962 days ago

How would they "look into how it was generated"?

capableweb 962 days ago

You train your model, publish it on huggingface and then write in the README:

> This is how I made this model: Design your tasks to be repeatable and small steps, call the OpenAI API and log all requests/responses.

> Filter out any bad responses and take a representative sample of the data you have collected from OpenAI, and train a Mistral or Llama2 model with the request/response pairs.

Just one example.

Tostino 962 days ago

I'm not competing with OpenAI in any sense of the word.

kcorbitt 962 days ago

If you're looking for a practical guide to getting started with fine tuning, I wrote one a couple of months ago that got pretty popular here on HN. Might be helpful if you're interested in playing around with it! https://news.ycombinator.com/item?id=37484135

jachee 962 days ago

The industry term for that middle ground is a “moat”, and the people who are most familiar with it are getting paid for what they know, so they’re not giving it away.

sanderjd 961 days ago

I think that may be right, but if so, that seems pretty unusual to me.

I've gone through a few of these "new kinds of software becoming useful" transition periods - most notably applications moving to the web, and then native smart phone applications - and in none of those transitions was there a dearth of resources on how to spin up on doing useful things due to this "moat" concern.

Nobody was protecting their iPhone app dev moat by not publishing books and training courses on Objective-C and Xcode...

Swizec 962 days ago

> I still haven't found a great resource that covers this middle ground, which seems to me to be where a lot of the power of these approaches is going to reside.

Read papers, build intuition, experiment.

That last part may be the most important.

sanderjd 961 days ago

I think this is the disconnect: It doesn't strike me that what I'm talking about has anything to do with "papers". So from your comment, I'm once again left wondering what you mean.

My sense is that I have a much better grasp of the foundational material here, having read in-depth books and papers about that, but still can't quite wrap my head around the question of how people are actually "operationalizing" this into useful software.

But to your point about experimentation, it might just be the kind of thing where there is no path to enlightenment besides working on a project and running into and overcoming all the hurdles along the way.

danielmarkbruce 962 days ago

huggingface is your friend.

crazygringo 962 days ago

But not at webscale. It's fine if you want to summarize something for personal use. The size model you're talking about is still way too large if you're trying to harvest millions of e-mail addresses from billions of webpages.

jlund-molfese 962 days ago

I'm also looking forward to what Apple Mail and other local clients are able to do. My laptop's CPU is idle most of the time, why not use that extra CPU time to do something cool like filter spam better?

diarrhea 962 days ago

Microsoft already does that, and its Antimalware agent is the bane of my existence. It will see idle machines spin up their fans to full and drain batteries within a short few hours. No thank you!

imacomputertoo 962 days ago

That sounds like something that's easily fixable with battery saving options. Basically, when on battery, don't do that. That would be a good default.
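
As a sketch of what such a default could look like, here is a tiny gating policy. All names (`PowerState`, `may_run_background_work`, `min_percent_on_battery`) are hypothetical. On a real machine the plugged-in flag might come from something like `psutil.sensors_battery()`, but that wiring is left out so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerState:
    # None means no battery was detected (for example a desktop).
    plugged_in: Optional[bool]
    battery_percent: Optional[float]


def may_run_background_work(
    state: PowerState,
    min_percent_on_battery: Optional[float] = None,
) -> bool:
    """Allow background work when plugged in; on battery, refuse by default."""
    if state.plugged_in is None or state.plugged_in:
        return True
    if min_percent_on_battery is None:
        return False  # the "good default": never run on battery
    return (
        state.battery_percent is not None
        and state.battery_percent >= min_percent_on_battery
    )


print(may_run_background_work(PowerState(plugged_in=False, battery_percent=90.0)))
print(may_run_background_work(PowerState(plugged_in=True, battery_percent=40.0)))
```

The optional threshold is there for users who would accept some background work on battery as long as the charge stays high.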

heavyset_go 962 days ago

Because that costs energy. The fact that your CPU is idle most of the time is why you can get hours of battery life.

Tostino 962 days ago

When plugged into the grid, it makes sense to spend a few cents of energy a day to filter out unwanted solicitations, harassment that you may not want to see, scam emails or texts, etc.

If I didn't have to worry about my grandparents getting scammed because they were having 99.99% of it effectively filtered or warned about at one layer or another before it actually became a problem...can you imagine how much you could lower that type of fraud/abuse?

MillionOClock 962 days ago

> When plugged into the grid

Exactly! Apple for instance already does this with some ML tasks that only run when your device is plugged in, I think it's a great compromise.

elygre 962 days ago

The grid, of course, is less sure about this compromise.

oceanplexian 962 days ago

When I cook a roast in the oven it uses a couple of kWh. That should cover charging a MacBook for, like, a month or two. I think we will be ok.
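
A quick check of that arithmetic. The roast figure is the "couple of kWh" from the comment; the battery capacity and charging efficiency are assumed round numbers, not measurements of any particular laptop.

```python
ROAST_KWH = 2.0             # "a couple of kWh", taken from the comment
BATTERY_WH = 70.0           # assumed laptop battery capacity
CHARGING_EFFICIENCY = 0.85  # assumed wall-to-battery efficiency


def full_charges(energy_kwh: float, battery_wh: float, efficiency: float) -> float:
    """How many complete battery charges a given amount of wall energy covers."""
    return energy_kwh * 1000 * efficiency / battery_wh


charges = full_charges(ROAST_KWH, BATTERY_WH, CHARGING_EFFICIENCY)
print(f"~{charges:.0f} full charges per roast")
```

With these assumptions one roast is worth roughly 24 full charges, so the "month or two" estimate holds if the laptop uses somewhat less than one full charge per day.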

Tostino 962 days ago

The grid can be negotiated with if we put in the infrastructure.

ceejayoz 962 days ago

Settings > Battery > Health and Charging already has a “selectively charge when green energy is available” setting.

Some thermostats prioritize low-usage times, too.

vasco 962 days ago

My guess is you wouldn't lower it by much because there are more incentives for attackers than for defenders to invest in these approaches, so it's likely that by the time grandmas are running LLM-based anti-fraud tooling the attackers will already be running LLM-based attacks as well.

smsm42 962 days ago

You don't need a "model" for this - I remember a Coursera course on ML I did some years ago, and one of the exercises was email extraction. With some very basic algorithms, nothing more than a bunch of common Python libraries and a couple of days of work, it's possible to extract over 90% of emails with commonly used tricks. I'm not sure the remaining number is worth making more complicated models for it - the returns are quickly diminishing, and wasting time on spamming people who are clever enough to invent their own unique email hiding technique probably doesn't have a good ROI anyway.

GTP 962 days ago

Why fine-tune an LLM if you can defeat most obfuscation techniques with a few regexes?
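
To illustrate the point, here is a small sketch that handles two common tricks, "name [at] example [dot] com" and "name at example dot com". The patterns are assumptions chosen for the example, not a complete list, and rewriting a bare " at " or " dot " can in principle produce false positives in ordinary prose.

```python
import re
from typing import List

# Bracketed forms like [at], (at), {at}, or the bare word surrounded by spaces.
_AT = re.compile(r"\s*[\[({]\s*at\s*[\])}]\s*|\s+at\s+", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[({]\s*dot\s*[\])}]\s*|\s+dot\s+", re.IGNORECASE)
# Deliberately simple address pattern; the domain must contain at least one dot.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text: str) -> List[str]:
    """Undo the spelled-out separators, then pull out anything address-shaped."""
    normalized = _DOT.sub(".", _AT.sub("@", text))
    return _EMAIL.findall(normalized)


sample = "Contact jane [at] example [dot] com or bob at mail dot example dot org."
print(extract_emails(sample))
```

Because the address pattern requires a dot in the domain, a sentence like "Meet me at noon" does not turn into a match even after " at " is rewritten.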

ac2u 962 days ago

Because there's a chance your LLM might still be able to get what you need if the obfuscation technique is changed or altered.

Anyway, nothing to say you can't use both, or have a fallback system.
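
A sketch of that fallback idea: try the cheap regex first and only call the expensive path when it finds nothing. The `fallback` callable is a hypothetical stand-in for an LLM call, and the final format check is an extra assumption added here so that malformed answers get dropped. It cannot tell whether an address really appeared in the text.

```python
import re
from typing import Callable, List, Optional

_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_with_fallback(
    text: str,
    fallback: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Regex first; call the fallback only when the regex finds nothing."""
    found = _EMAIL.findall(text)
    if found or fallback is None:
        return found
    # Keep only fallback answers that are at least shaped like an address.
    return [c for c in fallback(text) if _EMAIL.fullmatch(c)]


# Stand-in for a model call; a real one would send `text` to an LLM.
def fake_llm(text: str) -> List[str]:
    return ["jane@example.com", "not an address"]


print(extract_with_fallback("mail a@b.co"))
print(extract_with_fallback("jane, then the at-sign, then example.com", fallback=fake_llm))
```

This keeps the model out of the hot path for the majority of pages where a regex is enough.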

imranq 962 days ago

Is it possible to know the minimum model size / data set size it takes to train a model given certain efficiency parameters (latency, etc.)?

giancarlostoro 962 days ago

If OpenAI can generate those for customers they will make a killing. Export the piece out of ChatGPT you care about and run it on-prem for way less.

Tostino 962 days ago

It's entirely possible without OpenAI doing anything else. Design your tasks to be repeatable and small steps, call the OpenAI API and log all requests/responses.

Filter out any bad responses and take a representative sample of the data you have collected from OpenAI, and train a Mistral or Llama2 model with the request/response pairs.

Measure the quality of your model vs OpenAI for the same inputs, and then swap out the model in your workflow once happy with the results.
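
One way the "measure the quality" step could be sketched is as a side-by-side run on the same inputs. Both models are passed in as plain callables (hypothetical stand-ins here), and agreement after whitespace and case normalization is an assumed metric that only suits tasks with short, checkable outputs. Free-form outputs such as summaries would need a different comparison.

```python
from typing import Callable, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def agreement_rate(
    inputs: Iterable[str],
    reference_model: Callable[[str], str],
    candidate_model: Callable[[str], str],
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Return the share of inputs where both models agree, plus the disagreements."""
    items = list(inputs)
    disagreements = []
    for x in items:
        ref, cand = reference_model(x), candidate_model(x)
        if normalize(ref) != normalize(cand):
            disagreements.append((x, ref, cand))
    rate = 1 - len(disagreements) / len(items) if items else 0.0
    return rate, disagreements


# Stand-ins: the "candidate" gets one of four inputs wrong.
def reference(x: str) -> str:
    return x.upper()


def candidate(x: str) -> str:
    return "?" if x == "c" else x.upper()


rate, misses = agreement_rate(["a", "b", "c", "d"], reference, candidate)
print(rate, misses)
```

Reading through the disagreement list is usually more informative than the single number, since it shows which input shapes the smaller model struggles with.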