by rrebelo 2728 days ago
Regular expressions do cover most of the basic cases.

They will not handle all of them. But I discovered that partisan politics follows a Pareto rule of sorts: 80% of the talk revolves around a small set of words. If you remove the right 80%, what remains is very ineffective, grotesque and pathetic communication. It is not enough to get people excited or willing to fight.

The tricky part is keeping the set of words and regular expressions up to date. In the months before an election in particular, the terms to filter change rapidly. After that they remain very stable.
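
To make the maintenance point concrete, here is a minimal sketch of how a changing word list could drive the filter. It is not my actual setup: the terms in BLOCKED_TERMS and the names build_filter and is_filtered are made up for illustration. The idea is only that the pattern is rebuilt from a plain list, so updating the filter means editing the list.

```python
import re

# Hypothetical term list; in practice this is the part that needs
# frequent editing, especially before an election.
BLOCKED_TERMS = ["election", "ballot", "senator"]

def build_filter(terms):
    """Compile one case-insensitive, whole-word pattern from a list of terms."""
    if not terms:
        # An empty alternation would match everywhere, so use a
        # pattern that can never match instead.
        return re.compile(r"(?!)")
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def is_filtered(text, pattern):
    """Return True if the text contains any blocked term as a whole word."""
    return pattern.search(text) is not None

pattern = build_filter(BLOCKED_TERMS)
```

Whole-word matching is a design choice here: it avoids hitting substrings inside unrelated words, but it also means plurals and other variants need their own entries.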

Edit: I am now trying to use the Levenshtein distance[1] algorithm to preemptively detect the tricks you describe, where people deliberately alter a word to fool the regular expressions.
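
A rough sketch of what that could look like, assuming a plain dynamic-programming edit distance and a made-up helper near_blocked with an arbitrary threshold of one edit. This is an illustration of the idea, not the code I am running, and the threshold would need tuning: short words produce false positives very quickly.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

def near_blocked(word, terms, max_distance=1):
    """Return the blocked terms within max_distance edits of word.

    max_distance=1 is an assumed threshold, not a recommendation.
    """
    word = word.lower()
    return [t for t in terms if levenshtein(word, t.lower()) <= max_distance]
```

For example, near_blocked("e1ection", ["election", "ballot"]) returns ["election"], because a single substitution turns one into the other.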

[1] https://en.wikipedia.org/wiki/Levenshtein_distance