by utdiscant
538 days ago
"We picked the latter, which also gave us our performance metric - percentage of generated comments that the author actually addresses." This metric would go up if you left almost no comments. Wouldn't it be better to find a metric that rewards you for generating many comments that get addressed, not just for having high relevance? You even mention this challenge yourselves: "Sadly, even with all kinds of prompting tricks, we simply could not get the LLM to produce fewer nits without also producing fewer critical comments." If that were happening (fewer nits, but also fewer critical comments), it doesn't sound like it would be reflected in your performance metric.
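To make the concern concrete, here is a minimal sketch in Python. The numbers and the names `addressed_rate`, `chatty`, and `quiet` are made up for illustration and are not from the article; the only thing taken from the post is the metric itself (addressed comments divided by generated comments).

```python
def addressed_rate(generated: int, addressed: int) -> float:
    """Share of generated comments that the author addressed (the quoted metric)."""
    return addressed / generated if generated else 0.0


# Hypothetical reviewers, invented for illustration only.
chatty = {"generated": 20, "addressed": 8}  # many comments, several useful ones
quiet = {"generated": 2, "addressed": 2}    # almost silent, both comments addressed

chatty_rate = addressed_rate(**chatty)
quiet_rate = addressed_rate(**quiet)

# The quiet reviewer scores higher on the ratio even though it surfaced
# fewer issues that the author actually acted on.
print(f"chatty: rate={chatty_rate:.2f}, addressed={chatty['addressed']}")
print(f"quiet:  rate={quiet_rate:.2f}, addressed={quiet['addressed']}")
```

Tracking the raw count of addressed comments next to the ratio would be one way to reward useful volume as well as relevance, though that is only a suggestion and not something the article describes.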