by frenchmajesty 234 days ago
OP here. I agree! I should've called out why I did _not_ follow that approach, since many others have commented the same.

The main reason is that I needed the classification to be ongoing. My system pulled in thousands of tweets per day, and each one needed to be classified as it arrived for some downstream tasks.

Thus, I couldn't embed all tweets, then cluster, then ...

2 comments

Do the labels need to be static once the system has started? If not, it would be interesting to relabel embedding clusters once each hits a certain critical mass of tweets, or to do so somewhat continuously.
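
To make the relabeling idea concrete, here is a minimal sketch. It is not the OP's system. It assumes tweets arrive as already-computed embedding vectors, assigns each one to the nearest centroid by cosine similarity (opening a new cluster when nothing is close enough), and flags a cluster for relabeling once it has gained `critical_mass` tweets since its last label. The names `OnlineClusters`, `threshold`, and `critical_mass` are hypothetical, and the labeling step itself (a human or a model call) is left out.

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    centroid: list
    label: str = "unlabeled"
    count: int = 1          # total tweets assigned to this cluster
    since_relabel: int = 1  # tweets assigned since the last relabel


class OnlineClusters:
    """Hypothetical streaming clusterer; parameters are illustrative."""

    def __init__(self, threshold=0.8, critical_mass=3):
        self.threshold = threshold          # min similarity to join a cluster
        self.critical_mass = critical_mass  # new tweets before a relabel
        self.clusters = []

    def add(self, vec):
        """Assign vec to a cluster; return (cluster_index, needs_relabel)."""
        best, best_sim = None, -1.0
        for i, c in enumerate(self.clusters):
            sim = cosine(vec, c.centroid)
            if sim > best_sim:
                best, best_sim = i, sim

        if best is None or best_sim < self.threshold:
            # Nothing close enough: start a new cluster from this tweet.
            self.clusters.append(Cluster(centroid=list(vec)))
            best = len(self.clusters) - 1
            c = self.clusters[best]
        else:
            c = self.clusters[best]
            c.count += 1
            c.since_relabel += 1
            # Update the centroid as a running mean of its members.
            c.centroid = [m + (x - m) / c.count for m, x in zip(c.centroid, vec)]

        return best, c.since_relabel >= self.critical_mass

    def relabel(self, idx, label):
        """Store a fresh label and reset the relabel counter."""
        self.clusters[idx].label = label
        self.clusters[idx].since_relabel = 0
```

Each tweet still gets a cluster (and so a current label) the moment it arrives, which keeps classification ongoing; only the label text is refreshed when `add` returns `needs_relabel=True`. Lowering `critical_mass` moves this toward the "somewhat continuously" end of the spectrum.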
Makes sense, I appreciate the comment. Well written article. Subscribed.