Hacker News
by sprizzle 4660 days ago
It's silly to use BeautifulSoup to parse the page when you could use a simple RegEx:

<td class=\"title\"><a href=\"(.?)\"(.?)>(.?)</a>(.?)</td>

5 comments

"HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain"

http://stackoverflow.com/questions/1732348/regex-match-open-...

I am willing to sacrifice my soul and everything that is holy.
Regex to parse HTML is probably the single worst thing you can do.
Crafting a general-purpose regex to parse whatever HTML comes in is bad.

Building a regex to extract relevant data from simple, fixed-form page data, bypassing tags irrelevant to the problem at hand, is not.

...until the HTML changes.

I haven't looked at their parsing code, so I have no idea whether it is any better than using a regex. But if the regex assumes too much, simply reordering the attributes in a tag (or something similar) could break a regex-based solution.
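
To make the attribute-reordering point concrete, here is a minimal Python sketch. The sample markup, the `PATTERN` regex (the top comment's pattern, with `*` assumed after each `.`), and the `TitleLinks` helper are all illustrative assumptions, not code from the project being discussed. The standard-library `html.parser` stands in for BeautifulSoup so the snippet has no dependencies.

```python
import re
from html.parser import HTMLParser

# Assumed reading of the top comment's pattern, with "*" after each "."
PATTERN = re.compile(r'<td class="title"><a href="(.*?)"(.*?)>(.*?)</a>(.*?)</td>')

# Two hypothetical rows that a browser would render identically.
ORIGINAL = '<td class="title"><a href="http://example.com" rel="nofollow">Example</a></td>'
REORDERED = '<td class="title"><a rel="nofollow" href="http://example.com">Example</a></td>'


class TitleLinks(HTMLParser):
    """Collect href values of <a> tags inside <td class="title"> cells."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # a dict lookup does not care about attribute order
        if tag == "td" and attrs.get("class") == "title":
            self.in_title = True
        elif tag == "a" and self.in_title and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_title = False


def regex_links(html):
    """Extract hrefs with the regex; depends on href being the first attribute."""
    return [m.group(1) for m in PATTERN.finditer(html)]


def parser_links(html):
    """Extract hrefs with the tag-aware parser."""
    parser = TitleLinks()
    parser.feed(html)
    parser.close()
    return parser.links
```

Under these assumptions, `regex_links(REORDERED)` comes back empty because the pattern expects `href` to be the first attribute, while `parser_links` returns the same link for both rows.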

Some people, when confronted with a problem... bah you know the rest.
Arg, there should be asterisks after every period.
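
For reference, here is a rough sketch of how the pattern might look in Python with an asterisk after every period, as the correction describes. The sample row is made up, and the backslash-escaped quotes in the original comment are assumed to be string escaping rather than part of the regex.

```python
import re

# The top comment's pattern with "*" restored after every "."
TITLE_RE = re.compile(r'<td class="title"><a href="(.*?)"(.*?)>(.*?)</a>(.*?)</td>')

# Hypothetical row; the real page markup may differ.
row = '<td class="title"><a href="http://example.com/post">A post</a> (example.com)</td>'

match = TITLE_RE.search(row)
# Groups: link target, any extra <a> attributes, link text, text after the link.
url, extra_attrs, title, trailer = match.groups()
```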
BeautifulSoup is great, as long as you're using the open source HTML5 parser from Google. https://github.com/google/gumbo-parser