Hacker News
by apodolny 996 days ago
We've found that rolling our own spam model can be very effective, especially for improving precision. Many sites have their own quirks around what counts as spam, which leads to false positives or false negatives wherever your site differs from the norm. There's no need to go to GPT-4, though: we've found that even low-compute algorithms like random forests perform quite well at the task. You do have to create your own training set, but even a few hundred to a few thousand manually sorted examples can work pretty well.
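For anyone who wants to try this, here is a rough sketch of what such a model might look like, assuming scikit-learn, TF-IDF text features, and a 0.8 probability cutoff. None of those choices come from our actual setup; the example texts, the helper names `train_spam_model` and `is_spam`, and every parameter value are made up for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled examples (1 = spam, 0 = not spam). A real
# training set would be hundreds or thousands of posts from your own site.
texts = [
    "Buy cheap watches now, limited offer, click here",
    "Earn money fast working from home, click this link",
    "Free gift cards, claim your prize today",
    "Cheap pills online, no prescription needed",
    "Has anyone benchmarked the new release against the old one?",
    "I think the bug is in the date parsing code",
    "Thanks, the patch fixed the login issue for me",
    "Does this library support streaming responses?",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def train_spam_model(texts, labels, seed=0):
    """Fit a TF-IDF + random forest pipeline on manually sorted examples."""
    model = make_pipeline(
        # Word and two-word features; an assumed choice, not a requirement.
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        RandomForestClassifier(
            n_estimators=200,
            class_weight="balanced",  # helps when spam is the rarer class
            random_state=seed,
        ),
    )
    model.fit(texts, labels)
    return model


def is_spam(model, text, threshold=0.8):
    """Flag text only when the predicted spam probability reaches threshold."""
    spam_col = list(model.classes_).index(1)
    spam_probability = model.predict_proba([text])[0][spam_col]
    return bool(spam_probability >= threshold)


model = train_spam_model(texts, labels)
flagged = is_spam(model, "Click here to claim your free prize")
```

Raising `threshold` trades recall for precision: fewer legitimate posts get flagged, at the cost of letting more spam through. With only eight toy examples the predictions here mean little; the point is the shape of the pipeline.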