| HN Mirror

	by sprizzle 4660 days ago
	It's silly to use BeautifulSoup to parse the page when you could use a simple RegEx: <td class="title"><a href="(.?)"(.?)>(.?)</a>(.?)</td>

5 comments

kaeawc 4660 days ago

"HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liquid pain"

http://stackoverflow.com/questions/1732348/regex-match-open-...


michaelmcmillan 4660 days ago

I am willing to sacrifice my soul and everything that is holy.


karangoeluw 4660 days ago

Regex to parse HTML is probably the single worst thing you can do.


lloeki 4660 days ago

Crafting a wide purpose regex to parse whatever HTML comes in is bad.

Building a regex to extract relevant data from simple, fixed-form page data, bypassing tags irrelevant to the problem at hand, is not.


untothebreach 4659 days ago

...until the HTML changes.

I haven't looked at their parsing code, so I have no idea if it is any better than using a regex, but if the regex assumes too much, simply reordering the attributes in a tag (or something similar) could break a regex-based solution.
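
To make the attribute-reordering point concrete, here is a rough sketch using only the Python standard library. The two sample rows, the `TitleLinks` class, and the helper names are invented for illustration; this is not the project's parsing code, and `html.parser` stands in for whatever HTML parser one might actually use.

```python
import re
from html.parser import HTMLParser

# A regex that assumes class is the only attribute on <td> and that
# href is the first attribute on <a>.
TITLE_RE = re.compile(r'<td class="title"><a href="(.*?)"(.*?)>(.*?)</a>')

# Two hypothetical rows with the same meaning; only the attributes differ.
ORIGINAL = '<td class="title"><a href="/item" rel="nofollow">Post</a></td>'
REORDERED = (
    '<td align="left" class="title">'
    '<a rel="nofollow" href="/item">Post</a></td>'
)


class TitleLinks(HTMLParser):
    """Collect href values of <a> tags inside <td class="title"> cells."""

    def __init__(self):
        super().__init__()
        self.in_title_cell = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute order no longer matters
        if tag == "td" and attrs.get("class") == "title":
            self.in_title_cell = True
        elif tag == "a" and self.in_title_cell and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_title_cell = False


def links_via_regex(html):
    """Return hrefs found by the rigid regex."""
    return [m.group(1) for m in TITLE_RE.finditer(html)]


def links_via_parser(html):
    """Return hrefs found by walking the tags with html.parser."""
    parser = TitleLinks()
    parser.feed(html)
    parser.close()
    return parser.links


print(links_via_regex(ORIGINAL), links_via_regex(REORDERED))
print(links_via_parser(ORIGINAL), links_via_parser(REORDERED))
```

The regex finds the link only in the first row, while the tag-based version finds it in both, since it looks attributes up by name rather than by position.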


joshbaptiste 4660 days ago

Some people, when confronted with a problem... bah you know the rest.


sprizzle 4660 days ago

Arg, there should be asterisks after every period.
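
With an asterisk after every period, each group in the pattern presumably becomes the non-greedy `.*?`. A minimal sketch with Python's `re` module, assuming that reading; `SAMPLE_ROW` and `extract_titles` are made up for illustration and do not reflect the real page markup:

```python
import re

# The pattern from the original comment with an asterisk after every
# period, so each group is the non-greedy ".*?" (assumed reading).
TITLE_RE = re.compile(
    r'<td class="title"><a href="(.*?)"(.*?)>(.*?)</a>(.*?)</td>'
)

# Hypothetical row, invented for illustration only.
SAMPLE_ROW = (
    '<td class="title"><a href="http://example.com/post" rel="nofollow">'
    'Example post</a><span class="comhead"> (example.com) </span></td>'
)


def extract_titles(html):
    """Return (url, title) pairs for every row the pattern matches."""
    return [(m.group(1), m.group(3)) for m in TITLE_RE.finditer(html)]


print(extract_titles(SAMPLE_ROW))
```

Groups 2 and 4 soak up any extra attributes and trailing markup, which is what lets the pattern tolerate small variations after the `href`.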


Goranek 4659 days ago

BeautifulSoup is great, as long as you're using the open-source HTML5 parser from Google: https://github.com/google/gumbo-parser
