Hacker News
by pilooch 326 days ago
True, but modern models such as gemma3, with pan & scan and other tricks such as training on multiple resolutions, do alleviate these issues.

An interesting property of the gemma3 family is that increasing the input image size does not increase processing memory requirements, because a second-stage encoder compresses the image into a fixed-size set of tokens. Very neat in practice.
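
A toy sketch of the fixed-token idea, in plain Python. This is not gemma3's actual code: the function name `compress_to_fixed_tokens`, the scalar "embeddings", and the 4x4 output grid are all made up for illustration, and it assumes a square patch grid whose side is a multiple of the output side. It only shows that average-pooling a patch grid down to a fixed output grid yields the same number of tokens whatever the input grid size.

```python
def compress_to_fixed_tokens(grid, out_side=4):
    """Average-pool a square grid of patch values down to out_side x out_side tokens.

    Hypothetical helper for illustration only; real models pool embedding
    vectors, not scalars.
    """
    n = len(grid)
    if any(len(row) != n for row in grid):
        raise ValueError("expected a square grid")
    if n % out_side != 0:
        raise ValueError("grid side must be a multiple of out_side")

    k = n // out_side  # side of each pooling window
    tokens = []
    for i in range(out_side):
        for j in range(out_side):
            # Mean over the k x k window that maps to output cell (i, j).
            window = [
                grid[i * k + di][j * k + dj]
                for di in range(k)
                for dj in range(k)
            ]
            tokens.append(sum(window) / (k * k))
    return tokens


if __name__ == "__main__":
    for side in (8, 16, 64):
        patches = [[float(r * side + c) for c in range(side)] for r in range(side)]
        print(side, "x", side, "patches ->", len(compress_to_fixed_tokens(patches)), "tokens")
```

Whatever the patch grid size, the stage after the pooling sees the same token count, so its memory use does not grow with the image.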