|
|
|
|
|
by mnmkng
2059 days ago
|
|
Actually, if you're scraping at any scale above a hobby project, most of your web scraping hours would now be spent on avoiding bot detection, reverse engineering APIs and trying to make HTTP requests work where it seems only a browser can help. The time spent "working with strings" is not even noticeable to me. I scrape for a living and I work with JS, because currently, it has the better tools. |
|
I'm currently working to turn my hobby scraper into something profitable. "Working with strings" is already the least of my concern. I've spent most of the time with finding an architecture / file structure that allows me to
- easily handle markup changes on source-pages and
- quickly integrate new sources
I've feared it would be impossible to handle unexpected structural changes from a multitude of sources. Turns out that rarely happens. Like, once every x years per source's page-type.