by pants2
195 days ago
Yeah, things are easy when you can score an output objectively; otherwise, as you said, you'll probably need another LLM to score it. For summaries you can make the scoring somewhat more objective, for example by checking length and using a rubric like "8/10 key points are covered in this summary." Scoring sampled outputs with a reward like this is how real training methods work (e.g. Group Relative Policy Optimization, GRPO), so it's a legitimate approach.
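A rough sketch of what that could look like, assuming a hypothetical rubric: reward = fraction of reference key points covered, minus a penalty for going over a word budget. The names `summary_reward` and `group_relative_advantages`, the 100-word budget, and the 0.5 penalty weight are all made up for illustration, and the substring check is only a stand-in for the judge LLM you'd actually use. The second function shows the group-relative part of GRPO: sample several summaries for the same prompt and normalize each reward against that group's mean and standard deviation.

```python
from statistics import mean, pstdev


def summary_reward(summary: str, key_points: list[str], max_words: int = 100) -> float:
    """Hypothetical rubric: key-point coverage minus an overlength penalty."""
    text = summary.lower()
    # Stand-in for a judge LLM: a key point counts as covered if its phrase appears verbatim.
    covered = sum(1 for kp in key_points if kp.lower() in text)
    coverage = covered / len(key_points) if key_points else 0.0
    # Penalty grows linearly past the word budget and is capped at 1.0.
    words = len(summary.split())
    overflow = max(0, words - max_words)
    length_penalty = min(1.0, overflow / max_words)
    return coverage - 0.5 * length_penalty  # 0.5 weight is an arbitrary assumption


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: each reward relative to its group's mean, scaled by the group's std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: three sampled summaries of the same (hypothetical) source document.
key_points = ["revenue grew", "costs fell", "new ceo"]
samples = [
    "Revenue grew and costs fell under the new CEO.",
    "Revenue grew this quarter.",
    "The company had a quarter.",
]
rewards = [summary_reward(s, key_points) for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

The policy update then pushes up summaries with positive advantage and pushes down those with negative advantage, so the scorer only has to rank samples sensibly relative to each other, not produce a perfectly calibrated absolute score.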