by dchu17
214 days ago
Thought about this too. I think there are two broad LLM capabilities that are currently tangled up in this eval:

1. Can an LLM navigate a slide effectively (i.e., find all relevant regions of interest)?
2. Given a region of interest, can an LLM make the correct assessment?

I need to come up with a better test here in general, but yep, I'm thinking about this.
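
One way the two could be untangled, as a rough sketch and not what the eval currently does: score navigation as recall over annotated regions, and score assessment separately by handing the model the ground-truth regions directly. Everything named here (`Box`, `navigation_recall`, `assessment_accuracy`, the 0.5 IoU threshold) is hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region on a slide, in arbitrary slide coordinates (assumed)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 if they do not overlap."""
    w = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    h = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = w * h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def navigation_recall(visited: list[Box], truth: list[Box], thresh: float = 0.5) -> float:
    """Capability 1: fraction of annotated regions the model actually visited."""
    if not truth:
        return 1.0
    hits = sum(any(iou(t, v) >= thresh for v in visited) for t in truth)
    return hits / len(truth)


def assessment_accuracy(predicted: list[str], labels: list[str]) -> float:
    """Capability 2: accuracy when the model is given each annotated region directly."""
    if not labels:
        return 1.0
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)
```

Reporting the two numbers side by side would show whether a miss came from never reaching the region or from misreading it once there.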