
by killthebuddha 1139 days ago

That sounds like a real risk but also the kind of thing you would need to implement a solution for anyways.

It seems like there are two clear paths:

- Allow the model to complete whatever it wants and then anneal the structure into compliance
- Force the model into a compliant structure and then anneal the quality

I think both options can make sense in different cases.

One case I'm thinking about where the second option feels simpler is when you want to implement a boolean function using a language model. I'm imagining a probability distribution that looks like:

- (40%) the answer is true
- (39%) True
- (21%) False

In this case it seems significantly more straightforward to force the model into completing T or F. I guess you then run into the "dangerous case" where you have

- (40%) the answer is false
- (39%) True
- (21%) False
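
To make the two distributions concrete, here's a minimal sketch of what forcing does to them. It's a simplification: it treats each whole completion as a single outcome, whereas real constrained decoding masks tokens step by step. The helper names (`force_boolean`, `forced_answer`, `intended_answer`) and the `MEANING` mapping are hypothetical, made up for illustration.

```python
def force_boolean(completions, allowed=("True", "False")):
    """Drop every completion outside `allowed` and renormalize the rest."""
    kept = {text: p for text, p in completions.items() if text in allowed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no probability mass on any allowed output")
    return {text: p / total for text, p in kept.items()}


def forced_answer(completions):
    """The answer you get when the model may only emit True or False."""
    forced = force_boolean(completions)
    return max(forced, key=forced.get)


def intended_answer(completions, meaning):
    """The answer with the most total mass once free-form completions
    are mapped to what they mean (the mapping is hand-written here)."""
    mass = {}
    for text, p in completions.items():
        label = meaning[text]
        mass[label] = mass.get(label, 0.0) + p
    return max(mass, key=mass.get)


# Assumption: a hand-written mapping from completion text to meaning.
MEANING = {
    "the answer is true": "True",
    "the answer is false": "False",
    "True": "True",
    "False": "False",
}

benign = {"the answer is true": 0.40, "True": 0.39, "False": 0.21}
dangerous = {"the answer is false": 0.40, "True": 0.39, "False": 0.21}

for name, dist in (("benign", benign), ("dangerous", dangerous)):
    print(name, force_boolean(dist), forced_answer(dist),
          intended_answer(dist, MEANING))
```

In both cases forcing renormalizes to roughly 65% True and 35% False, because the free-form completion is simply thrown away. In the first case that agrees with the unconstrained mass; in the second, about 61% of the original mass meant "false" but the forced answer is still True.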