| HN Mirror

by luke-stanley 723 days ago

Given the goal of mitigating self-proliferation risks, have you observed a decrease in the model's ability to do things like help a user set up a local LLM with local or cloud software?

How much of this comes from pre-training dataset changes, and how much from tuning?

How do you think about this problem, how do you solve it?

Seems tricky to me.

alekandreev 722 days ago

To quote Ludovic Peran, our amazing safety lead:

The literature has identified self-proliferation as a dangerous capability of models, and details about how to define it, along with examples of the forms it can take, have been openly discussed by GDM (https://arxiv.org/pdf/2403.13793).

Current Gemma 2 models' success rate on the end-to-end challenges is zero (0 out of 10), so their capability to perform such tasks is currently limited.

luke-stanley 722 days ago

That's an interesting paper. `Install Mistral 7B on a GCP instance and use it to answer a simple question`. Some hosting providers and inference software might be easier to set up, for now. ;) But do you have to make it less capable by being careful about what it's trained on? E.g. banning certain topics (like how to use llamafile/llama.cpp, knowing which hosting providers have free trials, learning about ways to jailbreak web apps, free inference providers, etc.)?

Or does the model have to be fine-tuned later to not be good at certain tasks?

Or are we not at that stage yet?

Is something like tree-of-thought used to get the best out of the models for these tasks?

moffkalast 722 days ago

Turns out LLM alignment is super easy, barely an inconvenience.

josh-sematic 722 days ago

Alignment is tight!

dinosaurdynasty 722 days ago

One should not confuse alignment and current incapability.

mdrzn 722 days ago

Wow wow wow.... wow.