by JoeCoder_ 3916 days ago
Here is the TL;DR. This regex matches Tarzan but not "Tarzan":

    "Tarzan"|(Tarzan)
You can also list more than one case that you don't want to match. This one finds only the occurrences of Tarzan that aren't part of the first three patterns:

    Tarzania|--Tarzan--|"Tarzan"|(Tarzan)
You can even use more complex regexes. This matches all words not in an image tag:

    <img[^>]+>|(\w+)
And likewise this matches anything not surrounded by <b> tags:

    <b>[^<]*</b>|([\w\s]+)
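As a rough illustration of acting on that last pattern in code, here is a minimal Python sketch. It assumes Python's `re` module; the helper name `upper_outside_bold` is made up for this example. The alternation consumes each `<b>...</b>` span whole, and only text captured by group 1 is treated as being outside the tags.

```python
import re

# Left alternative swallows bold spans; group 1 captures everything else we care about.
PATTERN = re.compile(r'<b>[^<]*</b>|([\w\s]+)')

def upper_outside_bold(html):
    """Uppercase text outside <b>...</b>, leaving bold spans untouched."""
    def repl(m):
        # Group 1 is None when the left alternative (a bold span) matched.
        if m.group(1) is not None:
            return m.group(1).upper()
        return m.group(0)
    return PATTERN.sub(repl, html)

print(upper_outside_bold('one <b>two</b> three'))
```

The callback returns the whole match unchanged whenever group 1 did not participate, so the excluded spans pass through as-is.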
2 comments

That's not exactly it. The regexp matches both "Tarzan" and Tarzan, but capture group 1 is only set when the match is a Tarzan without quotes. So by examining that group after each match, you know which case you are in. (That also means you can only use this "trick" where you can examine capture groups, i.e. generally not in editors.)
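A small Python sketch of that check, assuming the `re` module (the function name `classify` is just for illustration): it lists every match alongside the value of group 1, so both cases are visible.

```python
import re

PATTERN = re.compile(r'"Tarzan"|(Tarzan)')

def classify(text):
    """Return (whole match, group 1) for every match in text."""
    # group(1) is None when the quoted alternative on the left matched.
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(text)]

for whole, wanted in classify('Tarzan and "Tarzan"'):
    print(repr(whole), '->', 'keep' if wanted is not None else 'skip')
```

Both occurrences show up as matches; only the unquoted one has a non-`None` group 1.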

    <img[^>]+>|(\w+)
...has some bugs. The part on the left mistakenly matches `<imgasvaasdf>` and mistakenly misses `<img>`. Better would be:

    <img\b.*?>|(\w+)
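To see the difference between the two left-hand sides, here is a quick Python comparison. It is a sketch that assumes the `re` module; the names `ORIGINAL`, `REVISED` and `is_img_tag` are invented for the example.

```python
import re

ORIGINAL = re.compile(r'<img[^>]+>')   # left-hand side from the original post
REVISED = re.compile(r'<img\b.*?>')    # left-hand side suggested above

def is_img_tag(pattern, text):
    """True if the whole string is matched by the given tag pattern."""
    return pattern.fullmatch(text) is not None

for case in ['<imgasvaasdf>', '<img>', '<img src="a.png">']:
    print(case, is_img_tag(ORIGINAL, case), is_img_tag(REVISED, case))
```

One caveat: in Python, `.` does not match a newline unless `re.DOTALL` is set, so the revised pattern would miss a tag that spans several lines. Something like `<img\b[^>]*>` might be an alternative in that situation.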