

by theashbhat 3153 days ago

Hey! One of the creators here!

You're right - Twitter could do (and probably is doing) everything we're doing. They have billions of dollars and hundreds of engineers.

The value of building a model is that we can do a wide analysis of bot-like activity. Separately, launching botcheck.me as something that users can use is incredibly valuable from the ML side. Users essentially hand-classify a bunch of false positives for us (to further train on) and also give us an idea of how our model is doing.
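
As a rough illustration of that feedback loop (a sketch only, not the actual botcheck.me code; the `Feedback` record and `summarize` helper are hypothetical names), user reports could be split into candidate false positives for retraining plus a simple agreement rate:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Feedback:
    """One user report about a model prediction (hypothetical schema)."""
    screen_name: str
    predicted_bot: bool   # what the model said
    user_says_bot: bool   # what the user said


def summarize(feedback: List[Feedback]) -> Tuple[List[str], Optional[float]]:
    """Return (candidate false positives, fraction of reports agreeing with the model)."""
    # Accounts the model flagged as bots but the user disagreed with.
    false_positives = [
        f.screen_name
        for f in feedback
        if f.predicted_bot and not f.user_says_bot
    ]
    if not feedback:
        return false_positives, None
    agree = sum(f.predicted_bot == f.user_says_bot for f in feedback)
    return false_positives, agree / len(feedback)
```

The disputed accounts would still need review before being used as training labels, since user reports are noisy too.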

We aren't just doing sentiment analysis, and you're right - NLP is hard. Fortunately, at UC Berkeley we have some amazing CS professors who have been incredibly helpful in advising us while building this.

We're using LSTMs to learn the weights of various words. To generate our training data, we've been using high-confidence heuristics that aren't based primarily on tweet content.

One such example is looking at compromised accounts that have had their usernames changed.
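
A minimal sketch of what a heuristic like that might look like, assuming two snapshots of the same account taken at different times (the `AccountSnapshot` fields and the `weak_label` function are made up for illustration, and this is not the real labeling pipeline):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountSnapshot:
    """An account as observed at one point in time (hypothetical schema)."""
    user_id: int
    screen_name: str


def username_changed(earlier: AccountSnapshot, later: AccountSnapshot) -> bool:
    """True if the same account ID shows a different handle later on."""
    same_account = earlier.user_id == later.user_id
    # Compare case-insensitively so a pure capitalization change is ignored.
    renamed = earlier.screen_name.lower() != later.screen_name.lower()
    return same_account and renamed


def weak_label(earlier: AccountSnapshot, later: AccountSnapshot) -> Optional[int]:
    """Return 1 (suspicious) when the heuristic fires, otherwise None (abstain)."""
    return 1 if username_changed(earlier, later) else None
```

The function abstains instead of labeling everything else as "not a bot", because a heuristic like this only gives a positive signal. A username change alone is assumed here to be one signal among several, not proof of compromise.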

Here's an analysis: https://medium.com/@robhat/an-analysis-of-propaganda-bots-on...

Methodology: https://medium.com/@robhat/identifying-propaganda-bots-on-tw...

We want to release a portion of our training data so that others can build similar services. Let me know if you have any more questions :)

1 comment

paulmd 3152 days ago

I would very much like an API that provides this service for Reddit accounts as well. I suppose since the data is freely available I need to get off my butt and write it myself though...
