Verification needs to work the other way around, some kind of verifiable chain of trust for photos and videos from real cameras. Watermarking all generated media is impossible.
I don't really understand why this is so hard or why it wasn't just done from the get go.
Just have Apple and Google digitally sign videos and photos recorded from phones and then have Google and Meta, etc display that they are authentic when shown on their platforms.
You're talking about the metadata of the files, which can always be edited and someone will inevitably try to make software to do exactly that. Also, Adobe's proposal for handling generated content is exactly this and they're not able to get buy-in from other companies.
Edit the metadata in what way? It's a cryptographic hash.
If the bits that make up the video as was recorded by the camera don't match the hash anymore, then you know it was modified. That doesn't mean it's fake, it just means use skepticism when viewing. On the other hand the ones that have not been modified and still match can be trusted.
Essentially 0% of professional photography or videography uses "straight out of the camera" (SOOC) JPEGs or video. It's always raw photos or "log" video, then edited to look like what the photographer actually saw. The signal would be so noisy as to be useless.
Sure they could, but then you trim the video by 2 seconds, tweak the colors, or just send it over WhatsApp, which recompresses the file with its own encoder. The hash breaks instantly. Cryptography protects bits, but video is about visual meaning. The slightest pixel modification kills the hardware signature. Plus, it does absolutely nothing to fix the "analog hole" problem - a scammer can just point that cryptographically signed iphone camera at a high-quality deepfake playing on a monitor
It becomes a hard problem quickly when you introduce editing, and most photos and videos on social media are edited. I'm not sure how it would work. It seems more feasible than universal watermarks, though.