by killthebuddha
1139 days ago
That sounds like a real risk, but also the kind of thing you would need to implement a solution for anyway. It seems like there are two clear paths:

- Allow the model to complete whatever it wants and then anneal the structure into compliance
- Force the model into a compliant structure and then anneal the quality

I think both options can make sense in different cases. One case I'm thinking about where the second option feels simpler is when you want to implement a boolean function using a language model. I'm imagining a probability distribution that looks like:

- (40%) the answer is true
- (39%) True
- (21%) False

In this case it seems significantly more straightforward to force the model into completing T or F. I guess you then run into the "dangerous case" where you have

- (40%) the answer is false
- (39%) True
- (21%) False
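
To make the two cases concrete, here is a minimal sketch in Python. It treats each whole completion as one outcome and uses the percentages above. The function names (`constrain`, `forced_answer`, `semantic_answer`) and the hand-written `meaning` labels are hypothetical, made up for illustration. Real constrained decoding masks token logits rather than whole strings, so this is only a toy model of the idea.

```python
def constrain(dist, allowed):
    """Drop completions outside `allowed` and renormalize what is left."""
    kept = {c: p for c, p in dist.items() if c in allowed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no probability mass on the allowed completions")
    return {c: p / total for c, p in kept.items()}


def forced_answer(dist):
    """Option 2: force the model to complete either True or False."""
    constrained = constrain(dist, {"True", "False"})
    return max(constrained, key=constrained.get) == "True"


def semantic_answer(dist, meaning):
    """Option 1 (idealized): let the model say anything, then sum the
    probability mass by what each completion means."""
    mass_true = sum(p for c, p in dist.items() if meaning[c])
    mass_false = sum(p for c, p in dist.items() if not meaning[c])
    return mass_true > mass_false


# Hand-labeled meanings for the free-form completions (an assumption).
meaning = {
    "the answer is true": True,
    "the answer is false": False,
    "True": True,
    "False": False,
}

benign = {"the answer is true": 0.40, "True": 0.39, "False": 0.21}
dangerous = {"the answer is false": 0.40, "True": 0.39, "False": 0.21}

for name, dist in [("benign", benign), ("dangerous", dangerous)]:
    print(name, forced_answer(dist), semantic_answer(dist, meaning))
```

In the first distribution both approaches agree. In the "dangerous case" the forced completion still picks True (0.39 of the remaining 0.60), while 61% of the original mass means false, which is exactly the mismatch described above.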