by westurner
1477 days ago
BeautifulSoup is a single API over multiple parsers; the second argument selects which one ( https://beautiful-soup-4.readthedocs.io/en/latest/#installin... ):
  BeautifulSoup(markup, "html.parser")
  BeautifulSoup(markup, "lxml")
  BeautifulSoup(markup, "lxml-xml")
  BeautifulSoup(markup, "xml")
  BeautifulSoup(markup, "html5lib")
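A minimal sketch of what "one API, several parsers" looks like in practice, assuming bs4 is installed. The helper name `parse_with_each` and the sample markup are made up for illustration; lxml and html5lib are optional, so the loop skips any parser that bs4 reports as missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup, only for illustration.
MARKUP = "<html><body><p class='intro'>Hello, <b>world</b></p></body></html>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: text of the first <p>} for each installed parser."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is not installed.
            continue
        # The navigation API is the same regardless of which parser built the tree.
        results[name] = soup.p.get_text()
    return results


results = parse_with_each(MARKUP)
for name, text in results.items():
    print(f"{name}: {text}")
```

"html.parser" ships with Python, so it should always appear in the result; the other entries depend on what is installed locally.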
It looks like lxml with XPath is still the fastest under Python 3.10.4, going by "Pyquery, lxml, BeautifulSoup comparison": https://gist.github.com/MercuryRising/4061368 ; that is fine for parsing (X)HTML(5) that validates.
(EDIT: Is XML/HTML5 a good format for data serialization? defusedxml ... simdjson, Apache Arrow.js)
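To make the "markup that validates" caveat concrete, here is a rough sketch that uses the standard library's xml.etree.ElementTree as a stand-in for lxml (it is not lxml, and it supports only a limited XPath subset). The sample document and the helper `is_well_formed` are assumptions for illustration. The point is that a strict XML parser answers path queries on well-formed input and rejects tag soup outright.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XHTML-like document, only for illustration.
XHTML = """<html>
  <body>
    <ul>
      <li><a href="/a">First</a></li>
      <li><a href="/b">Second</a></li>
      <li>No link here</li>
    </ul>
  </body>
</html>"""

root = ET.fromstring(XHTML)

# ElementTree's limited XPath: every <a> with an href attribute directly under an <li>.
hrefs = [a.get("href") for a in root.findall(".//li/a[@href]")]
print(hrefs)


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Unclosed tags are routine in real-world HTML but fatal for an XML parser.
print(is_well_formed("<p>unclosed paragraph"))
```

For untrusted input, the defusedxml package mentioned above is the usual hardened replacement for these stdlib parsers.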