Back in the days of email gateways between different networks, there used to be terrible problems with all the tin-pot dictator IBM SYSADMINs at BITNET sites who maintained their own personal styles of ASCII<=>EBCDIC translation tables, so all the email that passed through their servers got corrupted.
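To make that corruption concrete, here is a minimal Python sketch. It assumes Python's standard `cp037` (EBCDIC US/Canada) and `cp500` (EBCDIC International) codecs as stand-ins for two sites' translation tables. Real BITNET sites often used homegrown tables, which this doesn't model, and `relay` is a hypothetical helper name.

```python
"""Sketch: two gateways using different EBCDIC tables silently mangle punctuation."""

# Assumption: Python's built-in cp037 and cp500 codecs stand in for two
# sites' ASCII<=>EBCDIC translation tables.
SITE_A_TABLE = "cp037"
SITE_B_TABLE = "cp500"


def relay(text: str, outbound: str, inbound: str) -> str:
    """Translate text to EBCDIC with one table, then back with another."""
    ebcdic = text.encode(outbound)  # ASCII -> EBCDIC at the sending gateway
    return ebcdic.decode(inbound)   # EBCDIC -> ASCII at the receiving gateway


if __name__ == "__main__":
    source = "int a[4]; if (x | y) return !z;"
    same = relay(source, SITE_A_TABLE, SITE_A_TABLE)
    mixed = relay(source, SITE_A_TABLE, SITE_B_TABLE)
    print(same == source)  # matching tables round-trip cleanly
    print(mixed)           # letters and digits survive; some punctuation does not
```

The two code pages agree on letters and digits but disagree on a handful of punctuation positions, including the square brackets, which is the kind of damage plain prose shrugs off but source code can't.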
EBCDIC-based IBM mainframe SYSADMINs on BITNET were particularly notorious for being pig-headed and inconsiderate about communicating with the rest of the world. They thought they knew better than their users which characters they wanted to use, figured the rest of the world should go fuck themselves, and scoffed at all the unruly kids using ASCII and lowercase and newfangled punctuation, who were always trying to share line printer pornography and source code listings through their mainframes.
"HARRUMPH!!! IF I AND O ARE GOOD ENOUGH FOR DIGITS ON MY ELECTRIC TYPEWRITER, THEN THEY'RE GOOD ENOUGH FOR EMAIL! NOW GET OFF MY LAWN!!!" (shaking fist in air while yelling at cloud)
It was especially a problem for source code. That was one of the reasons for "trigraphs".
>Trigraphs were proposed for deprecation in C++0x, which was released as C++11. This was opposed by IBM, speaking on behalf of itself and other users of C++, and as a result trigraphs were retained in C++0x. Trigraphs were then proposed again for removal (not only deprecation) in C++17. This passed a committee vote, and trigraphs (but not the additional tokens) are removed from C++17 despite the opposition from IBM. Existing code that uses trigraphs can be supported by translating from the source files (parsing trigraphs) to the basic source character set that does not include trigraphs.
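The quote mentions supporting trigraph code by translating the source files into the basic character set. A minimal Python sketch of that translation, assuming only the nine standard C/C++ trigraphs; `replace_trigraphs` is a hypothetical helper, not part of any real tool.

```python
"""Sketch of trigraph replacement, as done in C translation phase 1 (pre-C23)."""

import re

# The nine standard trigraphs: "??" + key stands for the value.
TRIGRAPHS = {
    "=": "#",
    "/": "\\",
    "'": "^",
    "(": "[",
    ")": "]",
    "!": "|",
    "<": "{",
    ">": "}",
    "-": "~",
}

# "??" followed by one of the nine trigraph characters.
_TRIGRAPH_RE = re.compile(r"\?\?([=/'()!<>\-])")


def replace_trigraphs(source: str) -> str:
    """Replace every trigraph in a single left-to-right pass.

    Like phase 1 of a compiler with trigraphs enabled, this ignores context,
    so trigraphs inside string literals and comments are replaced too.
    """
    return _TRIGRAPH_RE.sub(lambda m: TRIGRAPHS[m.group(1)], source)


if __name__ == "__main__":
    print(replace_trigraphs("??=include <stdio.h>"))           # #include <stdio.h>
    print(replace_trigraphs("int a??(3??) = ??<1, 2, 3??>;"))  # int a[3] = {1, 2, 3};
    print(replace_trigraphs('puts("What???!");'))              # puts("What?|");
```

The last line shows the classic gotcha: an innocent string literal ending in `???!` silently turns into `?|`, which is a big part of why people wanted trigraphs gone.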
I don't think you want your backups in Google Docs either, given that Google may decide to ban you for TOS violations at any time.
I really do think videos would work, reliably, given sufficient redundancy. Again, we already have QR codes, so this is a proven idea: you can't make a QR code unreadable without removing lots of perceptible visual detail. The risk, as with Google Docs, isn't that Google will change their encoding, but that Google will just take the videos down for service misuse.
I think it would be harder for Google to detect this stuff in a video than in a text document, because some videos are expected to be long and large. The entire Encyclopedia Britannica comes to less than 500 MB as a .txt file, so using any meaningful amount of space in a Google Doc should quickly raise red flags.
YouTube probably doesn't keep the originals (though they could, perhaps on cold-storage tape). But even so, it's not difficult to imagine a future compression algorithm, applied to already-compressed video, that flips a couple of bits in whatever encoding scheme you've chosen. Depending on the file type, that could be enough to corrupt the whole thing.
Sure, you can get around this by adding ECC, but that isn't implemented here.
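For illustration only, here is what a very simple ECC looks like: a Hamming(7,4) sketch in Python that stores each 4-bit nibble as a 7-bit codeword and corrects any single flipped bit per codeword. The function names are hypothetical. A real video-backup scheme would more likely use something stronger, such as the Reed-Solomon codes QR codes use, plus interleaving so a burst of damage doesn't land in one codeword.

```python
"""Minimal Hamming(7,4) sketch: 4 data bits -> 7-bit codeword, fixes 1 flipped bit."""

from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit in a 7-bit codeword; return the 4 data bits."""
    c = list(c)  # don't mutate the caller's list
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def encode_bytes(data: bytes) -> List[int]:
    """Encode each byte as two codewords (high nibble first): 14 bits per byte."""
    bits: List[int] = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            bits += hamming74_encode([(nibble >> i) & 1 for i in (3, 2, 1, 0)])
    return bits


def decode_bytes(bits: List[int]) -> bytes:
    """Inverse of encode_bytes; assumes len(bits) is a multiple of 14."""
    out = bytearray()
    for i in range(0, len(bits), 14):
        nibbles = []
        for j in (i, i + 7):
            d = hamming74_decode(bits[j:j + 7])
            nibbles.append(d[0] << 3 | d[1] << 2 | d[2] << 1 | d[3])
        out.append(nibbles[0] << 4 | nibbles[1])
    return bytes(out)


if __name__ == "__main__":
    message = b"backup"
    bits = encode_bytes(message)
    bits[10] ^= 1  # simulate one bit flipped by a lossy re-encode
    print(decode_bytes(bits) == message)  # True: the flipped bit is corrected
```

The price is 7 stored bits per 4 data bits, and two flipped bits in the same codeword get miscorrected rather than detected, which is why this is only a toy next to what QR codes actually do.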
https://stackoverflow.com/questions/1234582/purpose-of-trigr...
https://en.wikipedia.org/wiki/Digraphs_and_trigraphs