by philomath868
298 days ago
I hear you loud and clear... Thanks! What about deleting vision layers (e.g. the "multi_modal_projector" and the "vision_tower.vision_model" layers, assuming I go with Gemma 3), since I need just language generation? Would that also be considered a "kick in the balls", or a useful trimming?

I've seen "cut off unused vision inputs" done for older multimodals, just not the newer Gemma 3.
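
For what it's worth, the trimming itself would probably amount to filtering the checkpoint's weight names. Here is a rough sketch in plain Python, under some loud assumptions: the two markers are the layer names quoted above, the toy key names are made up for illustration, and `strip_vision_weights` is a hypothetical helper, not anything from a real library. It also says nothing about whether a loader would accept the trimmed checkpoint afterwards.

```python
from typing import Any, Dict, List, Tuple

# Layer names quoted in the comment above. Whether real checkpoint keys
# contain exactly these substrings is an assumption to verify first.
VISION_MARKERS = ("multi_modal_projector", "vision_tower.vision_model")


def strip_vision_weights(
    state_dict: Dict[str, Any],
    markers: Tuple[str, ...] = VISION_MARKERS,
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (kept weights, dropped key names) without mutating the input."""
    kept: Dict[str, Any] = {}
    dropped: List[str] = []
    for name, tensor in state_dict.items():
        # Drop any weight whose name mentions one of the vision markers.
        if any(marker in name for marker in markers):
            dropped.append(name)
        else:
            kept[name] = tensor
    return kept, dropped


# Toy stand-in for a real state dict; keys and values are hypothetical.
toy_state = {
    "language_model.layers.0.q_proj.weight": "w0",
    "vision_tower.vision_model.layers.0.fc1.weight": "w1",
    "multi_modal_projector.proj.weight": "w2",
}

kept, dropped = strip_vision_weights(toy_state)
print("kept:", sorted(kept))
print("dropped:", sorted(dropped))
```

On a real model you'd run this over the loaded state dict and eyeball the dropped list before saving anything.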