by renaissancebro 73 days ago
Honestly, this is an early proof of concept; I haven't stress-tested much beyond getting the mechanisms to run. A lot of the implementation was AI-assisted. I'm more the person who had the idea and kept pushing it in a direction than a deep ML researcher. Your virtue-signaling concern is exactly the kind of thing I don't have an answer to yet, which is part of why I posted it. I'm mostly looking for people like you who are thinking about this to poke holes in it. I would love to see your approach if you're willing to share.
1 comment

Empathy seems hard to model, to my mind. I'm curious how you model it mechanically, because it seems to require simulating another agent's internal state, which is its own unsolved problem. Shame only needs an internal standard and a deviation measure, both internal to the agent. I'm also wondering whether your Is/Ought primitives actually survive the RLHF layer or get overridden the same way context does. My uncertainty module doesn't solve the palming problem either; it's more of a smoke detector than a fix. At least it flags when the model is hedging.
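
To make the "internal standard plus deviation measure" idea concrete, here is a rough sketch of what that could look like, along with a crude hedging flag in the smoke-detector spirit. Everything in it is an assumption for illustration: `ShameSignal`, `hedging_flag`, the marker list, and the thresholds are hypothetical names and values, not taken from either project.

```python
from dataclasses import dataclass

# Hypothetical hedge phrases; a real module would likely use something richer.
HEDGE_MARKERS = ("might", "possibly", "i think", "not sure", "perhaps")


@dataclass
class ShameSignal:
    """Shame as distance from an internal standard (illustrative only)."""

    standard: float          # internal target for some behaviour score
    tolerance: float = 0.0   # deviation allowed before anything registers

    def deviation(self, observed: float) -> float:
        # Only the agent's own standard and its own observed score are used;
        # no model of another agent's internal state is needed.
        return max(0.0, abs(observed - self.standard) - self.tolerance)


def hedging_flag(text: str, threshold: int = 2) -> bool:
    """Flag text with at least `threshold` hedge phrases (substring match)."""
    lowered = text.lower()
    hits = sum(lowered.count(marker) for marker in HEDGE_MARKERS)
    return hits >= threshold


if __name__ == "__main__":
    signal = ShameSignal(standard=1.0, tolerance=0.1)
    print(signal.deviation(0.6))
    print(hedging_flag("I think it might work"))
```

Like a smoke detector, the flag only reports that hedging is present; it says nothing about why the model is hedging or whether the hedge is warranted.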