Safari has had its own built-in implementation of that, called Live Text, since last year. Just try highlighting any text in an image and voilà, it works.
Very interesting that Firefox would have this level of integration with the OS. The Firefox of old was criticized specifically for being completely non-native on macOS. Times change. I might give Firefox a shot in the near future.
https://support.apple.com/en-au/guide/preview/prvw625a5b2c/m...