Hacker News
by texaslonghorn5 1438 days ago
I typed in

> Email

> English → RegEx

> 95

> GO

> \w+@\w+\.\w+

That's an interesting email regex.
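To see what that pattern accepts and rejects, here is a minimal Python sketch. The helper name `is_simple_email` and the sample addresses are my own, chosen only for illustration; the pattern is the one the tool produced.

```python
import re

# The pattern the tool generated for "Email".
SIMPLE_EMAIL = re.compile(r"\w+@\w+\.\w+")

def is_simple_email(text: str) -> bool:
    """Return True if the whole string matches the generated pattern."""
    return SIMPLE_EMAIL.fullmatch(text) is not None

# \w covers letters, digits and underscore, so dots, plus signs and
# hyphens in the local part, and extra domain labels, are all rejected.
for sample in [
    "john@example.com",        # True
    "john.doe@example.com",    # False: dot in the local part
    "john+tag@example.com",    # False: plus sign in the local part
    "john@mail.example.com",   # False: more than one dot in the domain
]:
    print(sample, is_simple_email(sample))
```

So it handles the simplest addresses but rejects a lot of ordinary ones.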

2 comments

That's not that wrong though.

What does it do for queries like "all male English names" or "comfortable temperature range"?

For "comfortable temperature range", it generates:

  (\d+\.?\d*)\s?-\s?(\d+\.?\d*)

And for "all male English names", it generates:

  [A-Z][a-z]+

The first one might be good, but the latter seems rather unsophisticated.
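A quick way to probe the temperature pattern is to run it over a few inputs. This is a rough Python sketch; `parse_range` and the test strings are hypothetical, only the regex comes from the tool.

```python
import re

# The pattern the tool generated for "comfortable temperature range".
RANGE = re.compile(r"(\d+\.?\d*)\s?-\s?(\d+\.?\d*)")

def parse_range(text: str):
    """Return (low, high) as floats for the first range found, else None."""
    match = RANGE.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(parse_range("18-24"))        # (18.0, 24.0)
print(parse_range("20.5 - 25.5"))  # (20.5, 25.5)
print(parse_range("18 to 24"))     # None: only a hyphen is accepted
print(parse_range("-5-10"))        # (5.0, 10.0): the leading minus is dropped
```

It is really a generic "number, hyphen, number" matcher: nothing in it knows about comfort, units, or negative temperatures.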
"All male English names" is just a list. It doesn’t follow a grammar. Its regex would be akin to

    (John|Jane|Alice|Bob)

and so on. It’s not a case for regex. In fact, I’ve found success in replacing regex with the regular string operations (length, contains substring, doesn’t contain substring, starts with a capital letter, …) of the language at hand, then doing final regex passes for whatever is left at the end. It’s far more readable and debuggable. I’ve grown to avoid regex when possible.
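As a rough illustration of that approach, applied to the email case from this thread, here is a Python sketch. The function name `looks_like_email`, the specific checks, and the length cap are my own assumptions, not a complete or standards-compliant validator.

```python
import re

# Final regex pass, kept for the one part that is easiest to state
# as a character class.
LOCAL_PART = re.compile(r"[A-Za-z0-9._%+-]+")

def looks_like_email(text: str) -> bool:
    """Plain string checks first, one small regex at the end."""
    # Length and "doesn't contain substring" checks (cap chosen for illustration).
    if len(text) > 254 or " " in text:
        return False
    # Exactly one "@" separating local part and domain.
    if text.count("@") != 1:
        return False
    local, domain = text.split("@")
    # "Contains substring" check: the domain needs at least one dot.
    if not local or "." not in domain:
        return False
    # Dots must sit between labels, not at the edges or doubled.
    if domain.startswith(".") or domain.endswith(".") or ".." in domain:
        return False
    return LOCAL_PART.fullmatch(local) is not None

print(looks_like_email("john.doe@example.com"))  # True
print(looks_like_email("john@@example.com"))     # False
print(looks_like_email("john@example"))          # False
```

Each rule is one readable line, so when an address is rejected you can tell which check fired.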
I typed "an email address" and it came up with

    [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

which is similar, but has some interesting differences. That shows it's black-box association. Then I tried "my email address" and this came out, line break included:

   is john.doe@example.com
   My email address is \w+\.\w+@\w+\.\w+

"An identifier" vs "a JavaScript identifier" does work as expected, but "a number" and "a floating point number" don't. "A quoted string" doesn't escape the quotes inside, but if you add "with escaped quotes" it does.

So, it's cute, and might set you on the right track, as long as you study the output a bit.
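
On the quoted-string point, the difference is easy to see in a small Python sketch. Both patterns below are my own illustrations of the two behaviours (I did not save the tool's exact output), and the sample text is made up.

```python
import re

# Naive version: stops at the first double quote, escaped or not.
NAIVE = re.compile(r'"[^"]*"')

# Escape-aware version: inside the quotes, accept either an ordinary
# character or a backslash followed by any character.
AWARE = re.compile(r'"(?:[^"\\]|\\.)*"')

text = r'say "he said \"hi\"" now'

print(NAIVE.search(text).group(0))  # "he said \"
print(AWARE.search(text).group(0))  # "he said \"hi\""
```

The naive pattern cuts the string off at the first escaped quote, which is exactly the kind of thing you only catch by testing the output.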