| I think I'm pretty careful to say that this is a simplified version of Twitter. Of the features you list: - spam detection: I agree this is a reasonably core feature and a good point. I think you could fit something here, but you'd have to architect your entire spam detection approach around being able to fit, which is a pretty tricky constraint and would probably make it perform worse than a less constrained solution. Similar to ML timelines. - ad relevance: Not a core feature if your costs are low enough. But see the ML estimates for how much throughput A100s have at dot producting ML embeddings. - web previews: I'd do this by making it the client's responsibility. You'd lose trustworthiness, though: users with hacked clients could make troll web previews. They can already do that for a site they control, but not for a general site. - blocks/mutes: Not a concern for the main timeline other than when using ML. When looking at replies you will need to fetch blocks/mutes and filter. Whether this costs too much depends on how frequently people look at replies. I'm fully aware that real Twitter has bajillions of features that I don't investigate, and you couldn't fit all of them on one machine. Many of them make up such a small fraction of load that you could still fit them. Others do indeed pose challenges, but ones similar to features I'd already discussed. |
Web previews are actually a good example of how difficult the problem is. A very common attack is to switch a bit.ly link or something like that to a malicious destination. You would also DoS the hosts... as the Mastodon folks are discovering (https://www.jwz.org/blog/2022/11/mastodon-stampede/).
For blocks/mutes, you have to account for retweets and quotes; it's just not a fun problem.
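To make that concrete, here is a rough sketch of what the filter has to do once retweets and quotes are in the picture. Everything here is hypothetical (the `Tweet` shape, `retweet_of`, `quote_of`, `filter_replies`); it's not how Twitter does it, just the smallest version I can think of that checks every author reachable through the retweet/quote chain rather than only the top-level one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    # Hypothetical, simplified tweet record for illustration only.
    id: int
    author: str
    retweet_of: Optional["Tweet"] = None  # set when this tweet is a retweet
    quote_of: Optional["Tweet"] = None    # set when this tweet quotes another


def involved_authors(tweet):
    """Collect every author reachable through retweet/quote links."""
    authors = set()
    stack = [tweet]
    while stack:
        t = stack.pop()
        authors.add(t.author)
        for inner in (t.retweet_of, t.quote_of):
            if inner is not None:
                stack.append(inner)
    return authors


def filter_replies(replies, blocked, muted):
    """Drop any reply whose chain touches a blocked or muted author."""
    hidden = blocked | muted
    return [t for t in replies if involved_authors(t).isdisjoint(hidden)]
```

Even in this toy form, every reply needs a walk over its chain plus a set lookup per author, and that's before you decide whether a mute should hide a quote tweet the same way a block does.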
Shipping the product is much more difficult than what's in your post. It's not realistic at all, but it is fun to think about.