I'm not sure where it would get the known-good hash from without making an HTTP call to the originating URL, though. If the hash and the file come from the same origin, the hash is only as trustworthy as the file itself.
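To make the hash point concrete, here's a minimal sketch. The relay and the function names are hypothetical, just for illustration. A hash check only proves the body matches the hash you were handed, so if one untrusted party supplies both, a tampered preview passes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_claimed_hash(body: bytes, claimed_hash: str) -> bool:
    """Integrity check only: says nothing about who produced the pair."""
    return sha256_hex(body) == claimed_hash


# What the origin actually served vs. what a hypothetical relay hands out.
original = b"<title>Real article</title>"
tampered = b"<title>Spam</title>"

# The relay tampers with the preview and recomputes the hash to match.
forged_pair = (tampered, sha256_hex(tampered))

# The check passes even though the content is not what the origin served.
forged_ok = matches_claimed_hash(*forged_pair)

# Only a hash obtained independently of the relay catches the tampering.
independent_ok = matches_claimed_hash(tampered, sha256_hex(original))
```

Here `forged_ok` is true and `independent_ok` is false, which is the whole problem: getting that independent hash means talking to the origin anyway.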
I would also consider to what degree such a system ends up looking like a half-baked, distributed Cloudflare anyway. Like yes, I'm sure we could build some kind of incredibly complicated, reputation-based, distributed link preview caching system. Or the host could just fix their Cloudflare setup (or accept dying under traffic load).
My general experience has been that secure distributed systems built from untrusted participants are incredibly difficult to build, and that it's probably not worth doing for something as trivial as URL previews. Just let the request die and swap the preview for a message like "This site was down as of $lastTimeItCheckedForAPreview". Maybe have the message show the URL as plain text rather than an <a> element, so people can't just click on it, which discourages sending the site further traffic.
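Something like this is all I mean by the "site was down" placeholder. It's a rough sketch, not anything Mastodon actually does: the function name, the CSS class names, and the timestamp format are all made up.

```python
from datetime import datetime, timezone
from html import escape


def render_down_preview(url: str, last_checked: datetime) -> str:
    """Return an HTML fragment for a preview whose fetch failed.

    The URL is shown as escaped plain text inside a <span>, never as a
    link, so readers cannot click through and add to the site's load.
    Pass a timezone-aware datetime; it is displayed in UTC.
    """
    checked = last_checked.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        '<div class="link-preview link-preview--down">'  # hypothetical class names
        f"This site was down as of {checked}: "
        f"<span>{escape(url)}</span>"
        "</div>"
    )


if __name__ == "__main__":
    when = datetime(2024, 1, 2, 3, 4, tzinfo=timezone.utc)
    print(render_down_preview("https://example.com/post?id=1", when))
```

Escaping the URL matters here because it is untrusted input being dropped into markup.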
Or, worst case, they could fall back to reliable, centralized sources for that info: see if Google Search has a cached copy, or archive.org, or whatever else. It's not decentralized, but I think it's fine to use centralized services as a fallback for optional features. I've got more confidence that archive.org hasn't tinkered with its copy than I have in some random Mastodon instance.
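For the archive.org fallback, a sketch could look like the following. It assumes the Wayback Machine availability endpoint and the JSON shape as I understand them (`archived_snapshots.closest.url`). The helper names are mine, and the fetcher is injectable so it can be tried without hitting the network.

```python
import json
from typing import Callable, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine "availability" API.
WAYBACK_API = "https://archive.org/wayback/available"


def http_get(url: str) -> str:
    """Fetch a URL and return the body as text (default fetcher)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def find_archived_copy(
    url: str, fetch: Callable[[str], str] = http_get
) -> Optional[str]:
    """Return the URL of the closest archived snapshot, or None.

    Any network or parse failure is treated as "no fallback available",
    since the preview is an optional feature anyway.
    """
    query = urlencode({"url": url})
    try:
        payload = json.loads(fetch(f"{WAYBACK_API}?{query}"))
    except (OSError, ValueError):
        return None
    if not isinstance(payload, dict):
        return None
    snapshots = payload.get("archived_snapshots")
    if not isinstance(snapshots, dict):
        return None
    closest = snapshots.get("closest") or {}
    if closest.get("available") and closest.get("url"):
        return closest["url"]
    return None
```

The instance would only call this after its own fetch of the original URL has failed, and would build the preview from the returned snapshot URL instead.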