Hacker News
by lwneal 1283 days ago
Currently, whenever anyone posts a tweet including a URL, Twitterbot accesses that URL, and if it finds censored content (keywords associated with Mastodon), the tweet is blocked.

It appears that this check is pretty simple and can be defeated with, for example, the following nginx configuration:

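        server {
            # Serve a harmless page to Twitter's link crawler (case-insensitive UA match).
            if ($http_user_agent ~* "Twitterbot") {
                return 200 "ElonIsGreat420TSLAToTheMoon";
            }
            # Everyone else is redirected to the real destination.
            return 301 https://mastodon.social$request_uri;
        }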
4 comments

If they start spoofing the UA, one can also look at $server_protocol and, if it is not HTTP/2.0, do something different. I don't know what Twitter supports, but most search engine and chat platform crawler bots can only speak HTTP/1.1, while all the mainstream browsers support HTTP/2. This assumes HTTP/2 is enabled in the web server.
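A rough sketch of that protocol check, in case it helps. It assumes HTTP/2 is already enabled on the listener, and the response text is just a placeholder I made up. Note that a client speaking HTTP/3 would show up as "HTTP/3.0" and get the decoy too, as would any real visitor stuck on HTTP/1.1, so treat it as a heuristic rather than a reliable bot test:

        server {
            # Assumption: this listener already has TLS and HTTP/2 configured.
            # Anything not speaking HTTP/2 gets the decoy page.
            if ($server_protocol != "HTTP/2.0") {
                return 200 "NothingToSeeHere";  # hypothetical placeholder text
            }
            # HTTP/2 clients (most browsers) are redirected to the real destination.
            return 301 https://mastodon.social$request_uri;
        }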
I doubt it's "as advanced" as doing keyword matches on the page in question, though they might well do some of that as well. I suspect the main reason for hitting the link is to resolve redirects.

I run a plain Mastodon install on https://m.galaxybound.com/ and I can post links to it just fine. If it can't even detect a standard Mastodon install, it's not a very successful search for blocked content.

It seems to me they're stupidly still maintaining a blacklist of the larger instances.

Note that I've "even" tweeted links to a post on my Mastodon instance that contained links to a blocked instance, and Twitter didn't even detect that.

A trick malware distributors use is adding a JavaScript-based time delay to their phishing pages. It's slightly more annoying for scanners to detect than just a UA switch.
Use TinyURL. Shortened links seem to work.