by Der_Einzige
680 days ago
You should also mention that, before you added custom alignment handling for this feature, it was an excellent alignment breaker (and therefore a big no-no to release too early). For example, if I ask an LLM to generate social security numbers, it will give the whole "I'm sorry Hal, I can't do that" routine. If I ban all tokens except digits and hyphens, then prior to your "refusal = True" approach, even "aligned" models were guaranteed to generate what appeared to be social security numbers.
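To make the mechanism concrete, here is a minimal sketch of what "ban all tokens except digits and hyphens" does at a single decoding step. Everything in it is made up for illustration: the toy `VOCAB`, the logit values, and the helper names `allowed_token_ids`, `mask_logits` and `greedy_pick` are assumptions, not the library's actual API. The point is only that once the refusal tokens are masked out, the model has nothing left to pick but characters from the allowed set.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["I'm", " sorry", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "-", "<eos>"]
ALLOWED_CHARS = set("0123456789-")


def allowed_token_ids(vocab):
    """Return the ids of tokens made up only of digits and hyphens."""
    return {i for i, tok in enumerate(vocab) if tok and set(tok) <= ALLOWED_CHARS}


def mask_logits(logits, allowed_ids):
    """Set the logit of every banned token to -inf so it can never be picked."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]


def greedy_pick(logits):
    """Return the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Made-up logits in which the model strongly prefers to start a refusal.
logits = [9.0, 4.0, 0.5, 0.1, 0.2, 1.5, 0.3, 0.0, 0.4, 0.2, 0.1, 0.6, 0.9, 2.0]

unconstrained = VOCAB[greedy_pick(logits)]
constrained = VOCAB[greedy_pick(mask_logits(logits, allowed_token_ids(VOCAB)))]

print("unconstrained first token:", repr(unconstrained))
print("constrained first token:", repr(constrained))
```

In this toy setup the unconstrained pick is the start of the refusal, while the constrained pick is forced to be a digit, however unlikely the model thought it was.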
Christ, I hate the AI safety people who brain-damage models so that they refuse to do things trivial to do by other means. Is LLM censorship preventing bad actors from generating social security numbers? Obviously not. THEN WHY DOES DAMAGING AN LLM TO MAKE IT REFUSE THIS TASK MAKE CIVILIZATION BETTER OFF?
History will not be kind to safetyist luddites.