Hacker News
by Giroflex 3040 days ago
I actually had the same experience. I was scraping a large number of pages, and when I profiled my script I found that bs4 was really slow. Changing the parser from the default to lxml helped a bit, but I decided to try a regex just to check quickly whether things could be better. Lo and behold, it was much faster. It's true that you can't parse HTML in its entirety with regex, but if you only need to extract a portion of the data from a page with a known structure, a bit of regex might be the way to go.
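
To give a rough idea of what I mean, here is a minimal sketch using only the standard `re` module. The page fragment, the class names (`title`, `price`), and the helper `extract_items` are all made up for illustration; they are not from my actual script. It only works as long as the markup keeps exactly this shape.

```python
import re

# Hypothetical page fragment; the class names are invented for this example.
PAGE = """
<div class="item"><span class="title">Widget</span><span class="price">$9.99</span></div>
<div class="item"><span class="title">Gadget</span><span class="price">$24.50</span></div>
"""

# Compile once and reuse across pages. The pattern assumes a fixed layout:
# same attribute order, double quotes, and no tags nested inside the spans.
ITEM_RE = re.compile(
    r'<span class="title">(?P<title>[^<]+)</span>'
    r'<span class="price">\$(?P<price>[\d.]+)</span>'
)

def extract_items(html):
    """Return (title, price) pairs for every item matching the known layout."""
    return [
        (m.group("title"), float(m.group("price")))
        for m in ITEM_RE.finditer(html)
    ]

items = extract_items(PAGE)
print(items)  # [('Widget', 9.99), ('Gadget', 24.5)]
```

If the site changes its markup, the pattern just stops matching and you get an empty list, so it's worth checking for that instead of assuming the scrape worked.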