by Legend2440 483 days ago
>No, it's a bug/emergent property and interpreting it as a feature is a simple misunderstanding of the software.

'Feature' has a different meaning in machine learning than it does in software. It means a measurable property of data, not a behavior of a program.

E.g. the language, style, tone, content, and semantics of text are all features. If text can be said to have a certain amount of 'evilness', then you have an evilness feature.

https://en.wikipedia.org/wiki/Feature_(machine_learning)
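
To make that concrete, here's a rough sketch of a feature as "a measurable property of data". This is a toy of my own, not how any real model computes features: the word list, the function name `extract_features`, and the feature names are all made up for illustration.

```python
# Toy illustration only: this word list is a made-up assumption,
# not taken from any real model or dataset.
HYPOTHETICAL_EVIL_WORDS = {"destroy", "betray", "deceive"}


def extract_features(text: str) -> dict[str, float]:
    """Map a piece of text to a few measurable properties (features)."""
    # Split on whitespace, then drop surrounding punctuation and lowercase.
    words = [w.strip(".,!?").lower() for w in text.split()]
    n = len(words)
    evil_hits = sum(w in HYPOTHETICAL_EVIL_WORDS for w in words)
    return {
        "word_count": float(n),
        "exclamation_count": float(text.count("!")),
        # Fraction of words found in the toy "evil" list.
        "evilness": evil_hits / n if n else 0.0,
    }


features = extract_features("I will deceive and destroy them!")
print(features)
```

Nothing here "does" anything evil; the `evilness` value is just a number measured from the text, which is the ML sense of the word.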

1 comments

Ahh, that's true. However, from the way he phrased it ("the fine tuning causes the feature"), it's clear to me that the functionality meaning is intended, though I can't pinpoint exactly why.

I think it's something about the incompatibility between the inertness of ML features and the verb-like potential of traditional features.

The OP says "be evil" feature, and says that the finetuning causes it. If he had meant an ML feature as a property of the data, he would have said something like "evilness" feature.

In any case, if it were an ML feature, it wouldn't be about evilness; it would merely be the collection of features that were discouraged in training, which at that point becomes somewhat redundant.

To summarize, if you finetune for any of the negatively trained tokens, the model will simplify by first returning all tokens with negative biases, unless you specifically train it not to bring up negative tokens in other areas.