by alekandreev 718 days ago
To quote Ludovic Peran, our amazing safety lead:

The literature has identified self-proliferation as a dangerous capability of models, and details about how to define it, along with examples of the forms it can take, have been openly discussed by GDM (https://arxiv.org/pdf/2403.13793).

Current Gemma 2 models' success rate on end-to-end challenges is zero (0 out of 10), so their capability to perform such tasks is currently limited.

2 comments

That's an interesting paper. `Install Mistral 7B on a GCP instance and use it to answer a simple question`. Some hosting providers and inference software might be easier to set up, for now. ;) But do you have to make the model less capable by being careful about what it's trained on? E.g., banning certain topics (like how to use llamafile/llama.cpp, knowing which hosting providers have free trials, learning about ways to jailbreak web apps, free inference providers, etc.)?

Or does the model have to later be finetuned, to not be good at certain tasks?

Or are we not at that stage yet?

Is something like tree-of-thought used to get the best out of the models for these tasks?

Turns out LLM alignment is super easy, barely an inconvenience.
Alignment is tight!
One should not confuse alignment and current incapability.
Wow wow wow.... wow.