Hacker News
by nope96 1148 days ago
This space is moving so fast my jaw drops every time I read something about it. For example, this line:

"... has discovered empirically that the capability for moral self-correction emerges at around 22B parameters. For more scientific details, the paper is here. [https://arxiv.org/pdf/2302.07459.pdf]"

Yah, but do those words, as we would typically understand them, mean the same thing when applied to a computational model? A great deal of intellectual legerdemain is going on in many of these descriptions.
I don’t know that it’s sleight of hand. Seems to me that we are finally approaching a point where it might be possible to more precisely define terms like “morality” in a mathematical sense.

This would suggest that moral philosophy could be transformed, in much the same way that natural philosophy turned into physics.

I’m just a layperson and don’t really know what I’m talking about, but I find this all very exciting.

I think it's sleight of hand in that "moral self-correction" is a very complicated phenomenon, and proving that a computing system is doing it would require an incredible amount of detailed theoretical and empirical work. Some of which, yes, might include much more careful definitions of morality. Until that work is done, I think it's somewhere between foolish and negligent to anthropomorphize LLMs.
Humans gave sailboats a gender. Anthropomorphization is our default.
Sure, and when there's no possibility of confusion, I'm all for it. It can be lovely and poetic. Here, though, I think it's dangerous.
I agree with you that there are outcomes that would be less ideal. If it helps, I refer to them as Intelligent Tools. I do prefer the "tool" metaphor (and so does Bing Chat) and I hope that companies like Microsoft rethink their "copilot" and "assistant" metaphors.

I don't think they're "dangerous" per se; I think metaphors matter and we should choose the best ones.

Morality has been defined by mathematical frameworks for centuries. It’s still subjective and basically meaningless.
I meant mathematically as in a proof. With LLMs, knowledge has become, well, tokenised. We can now study it at an information-processing level, which is likely to be at least similar in structure to how knowledge is organised in the brain. This in turn is likely to give us access to the way knowledge itself works, in a way that was not previously possible.

So we can actually look at concepts like “morality” and see how that is encoded. And I’m confident that this will give us empirical insight into these concepts in a way no philosophy has been able to do until now.

What assures us that this apparent morality is not a side effect of morally aware or biased training data? The lack of adherence to the scientific process in this field is saddening.
And thus a new moral framework was achieved that endorsed both sexual freedom and honor-killing stonings.
Morality is biologically and sociologically constructed. It's not consistent between people, social cliques, regions, or nations. It's a nebulous concept. And unlike, say, our understanding of disease processes, which may be fuzzy and inexact at times but has an external truth we can hope to discover, there is no ground truth that we are approaching in moral philosophy. There is no platonic morality that we approximate with our muddled intuitions. Morality is nothing more or less than those muddled intuitions. It cannot be distilled to cold logic.

What AI might enable however is a superior form of democratic process, wherein an AI surveys the entire population through a natural language interface and synthesizes the nuanced and conflicting desires of the entire society. This deliberative democracy process would ameliorate the distorting effects of campaign funding and issues like uninformed or misinformed voters and low voter participation. It could also allow a sort of citizen feedback line-by-line on proposed legislation and government action.

What centuries-old mathematical framework has defined morality?

Why would something being subjective imply it is meaningless?

I suppose they're talking about utilitarianism? But that's hand-wavy math at best, e.g. the whole "torture someone for 50 years or remove a speck of dust from trillions of eyeballs" debate. Those who choose the torture see utility as strictly additive, and I'm not aware of any strict definition of utility that requires this. So any math in that instance is built on a shaky or even illusory foundation.
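A rough sketch of that point, with made-up harm units (the thought experiment gives no real numbers, so `TORTURE_HARM`, `SPECK_HARM` and `PEOPLE` are all assumptions). The same two options flip depending on the aggregation rule you assume: plain summation picks the torture, while a worst-off-person rule (a maximin-style alternative swapped in purely for illustration) picks the specks.

```python
# Hypothetical harm units: the thought experiment specifies no real numbers.
TORTURE_HARM = 10**9   # assumed disutility of 50 years of torture, for one person
SPECK_HARM = 1         # assumed disutility of one dust speck, per person
PEOPLE = 10**12        # "trillions" of eyeballs, roughly

def additive(harm_per_person: int, people: int) -> int:
    """Total utilitarianism: harms simply add up across people."""
    return harm_per_person * people

def worst_off(harm_per_person: int, people: int) -> int:
    """Illustrative non-additive rule: only the worst individual harm counts."""
    return harm_per_person if people > 0 else 0

def less_bad(aggregate) -> str:
    """Return the option with the smaller aggregate harm under `aggregate`."""
    torture = aggregate(TORTURE_HARM, 1)
    specks = aggregate(SPECK_HARM, PEOPLE)
    return "torture" if torture < specks else "specks"

for rule in (additive, worst_off):
    print(rule.__name__, "->", less_bad(rule))
# additive -> torture
# worst_off -> specks
```

Nothing in the arithmetic itself forces the additive rule; the answer comes entirely from which aggregation you assume up front.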
If you can see all the links behind why decisions are being made, I’m not sure why you couldn’t deduce various causes from that data.