Wow, this is pretty amazing... I imagine your proxy bill must be pretty huge! I always wondered how companies like Clearview scraped Instagram etc. at scale. Do you add a user to a queue, get all of that user's posts, then add everyone they follow and everyone following them to the queue, and repeat? With Twitter, I know from experience that you can predict what the next snowflake IDs will be, so in theory you could enumerate the whole site. If I recall correctly, TikTok has a similar ID scheme, but I think people weren't able to figure out what some of the last bits represented.
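For anyone wondering why snowflake IDs are guessable at all, here is a rough sketch of how one breaks apart. It assumes the layout Twitter originally published for Snowflake (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence, custom epoch of 1288834974657 ms); the function names are mine, and nothing here is confirmed for TikTok's scheme.

```python
from datetime import datetime, timezone

# Twitter's custom epoch in milliseconds (2010-11-04), per the published Snowflake layout.
TWITTER_EPOCH_MS = 1288834974657


def decode_snowflake(snowflake_id: int) -> dict:
    """Split a 64-bit snowflake ID into its timestamp, machine ID, and sequence."""
    # The top 41 bits hold milliseconds elapsed since the custom epoch.
    timestamp_ms = (snowflake_id >> 22) + TWITTER_EPOCH_MS
    return {
        "timestamp_ms": timestamp_ms,
        "created_at": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        # The next 10 bits identify the machine that generated the ID.
        "machine_id": (snowflake_id >> 12) & 0x3FF,
        # The low 12 bits are a per-machine counter within one millisecond.
        "sequence": snowflake_id & 0xFFF,
    }


def encode_snowflake(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Hypothetical helper: rebuild an ID from its parts (inverse of decode_snowflake)."""
    return ((timestamp_ms - TWITTER_EPOCH_MS) << 22) | (machine_id << 12) | sequence


if __name__ == "__main__":
    example = encode_snowflake(TWITTER_EPOCH_MS + 1000, machine_id=5, sequence=7)
    print(example, decode_snowflake(example))
```

Since most of the ID is just a timestamp and the sequence counter usually starts near zero, the space of plausible IDs for a given millisecond is small, which is what makes the guessing feasible in principle.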
Filtering works pretty well, and the design is well executed. For those who complain about scraping, if this is public data, as the OP mentioned, then I don't see how it is different from Google.
A clean search would benefit the creators and give them more visibility. The first thing I had in mind was searching by keyword and number of followers to see which creators would be a good fit for a startup that wants to use influencer marketing. Imagine doing that manually on TikTok or through Google.
Fun fact: I did the UI design, frontend/backend web development, database, servers, scraping and everything else by myself. It took me a few months, but I guess it's worth it.
That's not just about public data. Reference [7] talks about creating fake accounts to scrape.
> hiQ had prevailed on the Computer Fraud and Abuse Act (CFAA) “unauthorized access” issue related to public website data but was facing a ruling that it had breached LinkedIn’s User Agreement due to its scraping and creation of fake accounts (subject to its equitable defenses).
Yes, I did receive some backlash, which is understandable. However, this is a very useful tool for content creators to find out what's trending on TikTok. It is also used by brands to analyze market sentiment about them.
I can't share the specifics, but I can explain the methodology in private. My contact details are on the website.