Hacker News new | ask | show | jobs
by existencebox 2818 days ago
The "funny" part is, while reading this, I kept thinking of a similar tool I built myself as a wrapper around python+beautifulsoup. Definitely parts of what the OP needed were compelling to me (I found regular subpatterns in my scraping work that could be encapsulated really well in certain bits of syntactic sugar, but concluded that a minimal json-like structure for defining a scrape was both sufficient and let me have a graphical "scrape builder" in a UX far more readily than if I actually wrote the scrape as code.

There's the usual amount of HN cynicism in the thread which I'm not sure is 200% off mark, but I think there are some good concepts in "Scraping primitives" that can be contemplated that the OP took an interesting angle on. (or rather; not the angle I took, so interesting to me.)

1 comments

Is your beautifulsoup wrapper open source?
If other people would find it beneficial it can be; I had admittedly seen it as "just a mess of sugar to help me scrape easier" (with all the typical nervousness of showing others "imperfect" code). I'm aptly in the process of cleaning it up, writing tests and finishing an MVP UI. I can do a show HN in a few weeks once I've found the time to get it ship-shape.
You definitely need to share. Web scraping is tedious. As more ideas we have, as more options we have to come up with a better solution for that.