Hacker News
by Udo 4658 days ago
I think we, and to some degree the Twitter platform itself, are using hashtags redundantly, and this algorithm is just a manifestation of that redundancy, which is killing data quality. These pathological tweets do tend to look like the example sentence provided, maybe even more extreme:

  #Swayy #Launches Into Public #Beta To Curate #Content For Your #SocialMedia Audience
Now, all of these words would be reachable with a normal search, so why do we over-tag everything? Are users really going to look up which other tweets have recently been tagged #Content? It makes even less sense with product names like #Swayy.

A more reasonable approach would be to tag things that are not part of the sentence itself:

  We're launching into public beta to curate content for social media! #Swayy
Or inline, on occasion, to express that you're taking part in a meme:

  Dear gods, #IHateIt when it's cold outside
We don't need algorithmic help to find hashtags in these cases either, and I'm arguing that automatically converting every third word into a hashtag doesn't do Twitter feeds any good, quality-wise.
3 comments

#Content means you're talking about content, and it's not just a word you used in passing. Similarly, #launches adds your tweet to the conversation people are having about launches, instead of making people sort through missile launches, some guy who launches into a story, and misspelled lunches to get to relevant tweets.
To some degree, word frequency counting does correctly classify what the story is about, so in my opinion it does add quality. Think of it like regular tagging. However, the threshold for how many hashtags this method produces should be very low, since it offers no contextual or meta information about the story.
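For illustration, frequency-based tagging with a deliberately low cap could look roughly like the sketch below. This is not the algorithm from the linked post. The function name `suggest_hashtags`, the tiny stopword list, and the `max_tags` and `min_count` parameters are all assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list for the example; a real one would be much larger.
STOPWORDS = {"a", "an", "and", "the", "to", "for", "of", "in", "into",
             "your", "is", "it", "on"}

def suggest_hashtags(text, max_tags=2, min_count=2):
    """Return at most max_tags hashtags for words seen at least min_count times."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Ignore stopwords and very short tokens before counting.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    # most_common yields the highest counts first; min_count drops one-off words.
    return ["#" + w for w, c in counts.most_common(max_tags) if c >= min_count]
```

With these defaults, a single tweet in which every word appears once gets no tags at all, which is the point of keeping the threshold low: only words that are clearly repeated in a longer text are promoted.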
I totally agree. Look at Stack Overflow's approach: I think they scan the title and text for common tag names and suggest those as tags. OP's presentation really doesn't impress me.
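As a rough sketch of that idea (a guess at the general technique, not Stack Overflow's actual implementation), suggestions could be limited to words that already exist as tags. The `KNOWN_TAGS` vocabulary, the `suggest_tags` name, and the `limit` parameter are hypothetical.

```python
import re

# Hypothetical tag vocabulary; a real site would load its existing tags.
KNOWN_TAGS = {"python", "regex", "twitter", "hashtag"}

def suggest_tags(title, body, known_tags=KNOWN_TAGS, limit=3):
    """Suggest existing tags that appear in the title or body, title matches first."""
    suggestions = []
    for part in (title, body):
        for word in re.findall(r"[a-z0-9]+", part.lower()):
            # Only words already used as tags qualify; skip duplicates.
            if word in known_tags and word not in suggestions:
                suggestions.append(word)
    return suggestions[:limit]
```

Because only known tags can be suggested, ordinary words never turn into tags, unlike converting every frequent word into a hashtag.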