by ACCount37
297 days ago
Should be safe to do, as long as none of that is load-bearing. If it's the usual naive "massage the image into a hundred tokens and throw that into the context" vision implementation, nothing bad should happen from removing the vision parts or just freezing them. I've seen "cut off unused vision inputs" done for older multimodal models, just not for the newer Gemma 3.
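
For what it's worth, here is a rough sketch of what "cutting off" the vision side could look like at the checkpoint level: split the weights by key prefix and keep only the text part. The prefixes `vision_tower.` and `multi_modal_projector.`, the helper `split_vision_weights`, and the toy dict are all assumptions for illustration; I haven't checked them against Gemma 3's actual layout, so inspect your own checkpoint's key names first.

```python
# Hypothetical prefixes; real checkpoints may name their vision modules differently.
VISION_PREFIXES = ("vision_tower.", "multi_modal_projector.")


def split_vision_weights(state_dict, prefixes=VISION_PREFIXES):
    """Partition a name -> weight mapping into (text_only, vision_only) by key prefix."""
    text_only, vision_only = {}, {}
    for name, weight in state_dict.items():
        # str.startswith accepts a tuple and matches any of its entries.
        target = vision_only if name.startswith(prefixes) else text_only
        target[name] = weight
    return text_only, vision_only


# Toy stand-in for a checkpoint; the values would be tensors in practice.
toy = {
    "vision_tower.patch_embed.weight": [0.1, 0.2],
    "multi_modal_projector.linear.weight": [0.3],
    "language_model.embed_tokens.weight": [0.4, 0.5],
    "language_model.layers.0.mlp.weight": [0.6],
}

text_only, vision_only = split_vision_weights(toy)
print(sorted(text_only))
```

This only helps if nothing on the text path depends on the dropped weights, which is the "load-bearing" caveat above. Freezing instead of removing would mean leaving the weights in place and excluding them from training updates.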