by dotancohen
196 days ago
Thank you! I'll see about building a test suite. Do you compare the models' output manually and subjectively, or do you have some objective measures? My use case would be testing diagnostic information summaries: the output is free text, not structured. The only way I can think of to automate that would be with another LLM. Advice welcome!
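Scoring one model's output with another model is a real technique that also shows up in training (methods like Group Relative Policy Optimization rank a group of sampled outputs by a reward score), so it's a legitimate approach.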
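
For what it's worth, here is a minimal sketch of how an LLM-as-judge check for free-text summaries might be wired up. Everything in it is an assumption for illustration: the rubric criteria, the 1-5 scale, the `judge` callable (a stand-in for whatever model client you actually use), and the `criterion: score` reply format. It is not tied to any particular library or provider.

```python
from statistics import mean
from typing import Callable, Dict

# Hypothetical judge interface: takes a prompt, returns the judge model's raw text reply.
Judge = Callable[[str], str]

# Assumed rubric; pick criteria that matter for your diagnostic summaries.
RUBRIC = ("accuracy", "completeness", "clarity")


def build_prompt(source: str, summary: str) -> str:
    """Ask the judge for one integer score (1-5) per rubric criterion."""
    criteria = "\n".join(f"{name}: <1-5>" for name in RUBRIC)
    return (
        "Rate the summary against the source on each criterion.\n"
        f"Reply with exactly these lines:\n{criteria}\n\n"
        f"SOURCE:\n{source}\n\nSUMMARY:\n{summary}\n"
    )


def parse_scores(reply: str) -> Dict[str, int]:
    """Parse 'criterion: score' lines; reject replies that miss a criterion."""
    scores: Dict[str, int] = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if not sep or key not in RUBRIC:
            continue
        try:
            score = int(value.strip())
        except ValueError:
            continue
        if 1 <= score <= 5:
            scores[key] = score
    missing = set(RUBRIC) - scores.keys()
    if missing:
        raise ValueError(f"judge reply missing criteria: {sorted(missing)}")
    return scores


def evaluate(judge: Judge, source: str, summary: str, repeats: int = 3) -> Dict[str, float]:
    """Query the judge several times and average, since judge output can vary."""
    runs = [parse_scores(judge(build_prompt(source, summary))) for _ in range(repeats)]
    return {name: mean(run[name] for run in runs) for name in RUBRIC}


def stub_judge(prompt: str) -> str:
    """Placeholder judge so the sketch runs without any model behind it."""
    return "accuracy: 4\ncompleteness: 3\nclarity: 5"


if __name__ == "__main__":
    print(evaluate(stub_judge, "raw diagnostic log", "candidate summary"))
```

Forcing the judge into a fixed rubric with numeric scores gives you something you can track across model versions, and spot-checking a sample of the judge's scores by hand is a reasonable way to find out whether it agrees with your own subjective reading.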