by oofbey
464 days ago
This is really pretty cool. LLMs are so bad at images that it just makes sense to use reasoning to improve them. I'd love to see this applied to a bigger model than 3B, because this task is not difficult. But the attention visualization really demonstrates that it's doing what it's supposed to.
https://huggingface.co/spaces/Groundlight/grpo-vlm-decoder