by spoiler 4082 days ago
I don't find this very useful. It's too naïve for a real-world use case.

I didn't look at the implementation, but judging by the "classy party" example, it looks like it simply matches the byte sequence 'a', 's', 's' anywhere in a string.

It would be better if it tokenized the sentence, using punctuation and whitespace as terminators. That way it would detect `big-ass sandwich` and `smart-ass person` but not `classy party` or `bass instrument`.
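A rough sketch of that tokenizing idea in Python, just to make it concrete. The `BLOCKED` set and the `find_blocked` name are made up for illustration, and treating every non-alphanumeric character as a terminator is an assumption, not what the project actually does:

```python
import re

# Hypothetical blocklist; "ass" is the only word the examples above call for.
BLOCKED = {"ass"}

def find_blocked(sentence, blocked=BLOCKED):
    """Return the blocked words that occur as whole tokens in `sentence`."""
    # Assumption: any run of non-alphanumeric characters (whitespace,
    # hyphens, other punctuation) terminates a token.
    tokens = re.split(r"[^0-9A-Za-z]+", sentence.lower())
    return [t for t in tokens if t in blocked]
```

Because `classy` and `bass` each stay a single token, they never compare equal to `ass`, while the hyphen in `big-ass` splits off `ass` as a token of its own.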

Furthermore, it would be cool if you created a configuration format for this kind of thing, so one could do something like this (excuse the config format, I realise it's probably shit and problematic):

    [smart][big][fat]ass
    !sex[ual]+education

The first rule would detect all of the following: smartass, bigass, fatass, and ass itself. The second rule would exempt from filtering a `sex(?:ual)?` token followed by an `education` token. You get the idea.
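One possible reading of that rule format, sketched in Python. Everything here is an assumption layered on the two example rules: adjacent `[...]` groups are treated as optional alternatives, `+` as a token separator, a leading `!` as "never filter this", and `compile_rule` / `is_flagged` are hypothetical names:

```python
import re

def compile_rule(rule):
    """Translate one rule into (is_exception, compiled_regex)."""
    is_exception = rule.startswith("!")
    body = rule[1:] if is_exception else rule
    token_patterns = []
    for token in body.split("+"):
        parts = []
        # Split the token into runs of adjacent [groups] and literal text.
        for piece in re.findall(r"(?:\[[^\]]*\])+|[^\[\]]+", token):
            if piece.startswith("["):
                options = re.findall(r"\[([^\]]*)\]", piece)
                alternatives = "|".join(re.escape(o) for o in options)
                # Assumed semantics: at most one of the options, optional.
                parts.append(f"(?:{alternatives})?")
            else:
                parts.append(re.escape(piece))
        token_patterns.append("".join(parts))
    # Tokens are separated by whitespace or punctuation; \b keeps the
    # match from starting or ending in the middle of a word.
    pattern = r"\b" + r"[\W_]+".join(token_patterns) + r"\b"
    return is_exception, re.compile(pattern, re.IGNORECASE)

def is_flagged(text, rules):
    """True if a blocking rule matches once exempted phrases are blanked out."""
    compiled = [compile_rule(r) for r in rules]
    for is_exception, rx in compiled:
        if is_exception:
            text = rx.sub(" ", text)
    return any(rx.search(text) for is_exception, rx in compiled if not is_exception)
```

With the rules `["sex", "!sex[ual]+education"]`, a bare `sex` is flagged but `sex education` is not, since the exempted phrase is blanked out before the blocking rules run.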

These are just some heat-of-the-moment ideas, because I think this is exciting and could be useful. :-)

1 comment

Thanks. This quick idea worked for my cases, because there were few potential false positives. But your idea of using a regex-style matcher should work well.