by ryeguy_24
329 days ago
Isn’t there a lot of dependence here on prompting and methodology that would significantly affect overall performance? My gut instinct is that there are many ways to architect a system around the LLMs, and each might yield a different level of accuracy. What do others think? Edit: After reading more, I guess this is meant to be a dumb benchmark to track over time. Maybe that’s the aim here, rather than viability as an auto-close tool.