by vikram 6645 days ago
I'm working on something similar. It turns out scraping is only a small part of the problem. I don't use BeautifulSoup; it turns out you can transform the HTML of a page into a list, which can then be scraped easily.
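Here is a minimal sketch of the HTML-to-list idea. It is written in Python with the standard library's `html.parser` and is not the author's own code. The item shapes `("tag", name, attrs)` and `("text", string)` and the names `FlatList` and `html_to_list` are hypothetical, chosen for illustration. End tags are dropped, so this flat form gives up nesting information in exchange for easy linear filtering.

```python
from html.parser import HTMLParser


class FlatList(HTMLParser):
    """Hypothetical flattener: turns HTML into a flat list in document order.

    Each start tag becomes ("tag", name, attrs_dict) and each non-blank run
    of text becomes ("text", stripped_text). End tags are ignored.
    """

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None
        self.items.append(("tag", tag, dict(attrs)))

    def handle_data(self, data):
        # Skip whitespace-only text between tags
        text = data.strip()
        if text:
            self.items.append(("text", text))


def html_to_list(html):
    """Parse an HTML string and return its flat list of items."""
    parser = FlatList()
    parser.feed(html)
    parser.close()
    return parser.items


items = html_to_list('<p>Hi <a href="/x" class="jdtd4">link</a></p>')
# [('tag', 'p', {}), ('text', 'Hi'),
#  ('tag', 'a', {'href': '/x', 'class': 'jdtd4'}), ('text', 'link')]
```

Once the page is a plain list, "scraping" reduces to ordinary list operations such as filtering and slicing.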

Now that I have used it to extract data from many different types of pages, I'm looking to turn it into a DSL so that the code reads naturally. Currently it's just functions that search for tags in the HTML, and you can then easily filter for some tags or others. Here is an example:

(extract-all page [(and (tagp _ :a) (classp _ "jdtd4"))])
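The bracket form above appears to be an Arc-style shorthand for a one-argument anonymous function whose argument is `_`. A rough Python analogue uses a `lambda` in its place. The definitions of `tagp`, `classp` and `extract_all` below are guesses at what the author's functions do, and the hand-written `page` list (same item shapes as a flat `("tag", name, attrs)` / `("text", string)` list) is made-up sample data, so treat this as a sketch under those assumptions.

```python
def tagp(item, name):
    """Guessed semantics: true if item is a start tag with the given name."""
    return item[0] == "tag" and item[1] == name


def classp(item, cls):
    """Guessed semantics: true if item is a tag whose class list contains cls."""
    if item[0] != "tag":
        return False
    # class may be missing or valueless (None), and may hold several names
    return cls in (item[2].get("class") or "").split()


def extract_all(page, pred):
    """Keep every item of the flat page list that satisfies pred."""
    return [item for item in page if pred(item)]


# Hypothetical flattened page, standing in for parsed HTML
page = [
    ("tag", "div", {"class": "jdtd4"}),
    ("tag", "a", {"href": "/one", "class": "jdtd4 extra"}),
    ("text", "first"),
    ("tag", "a", {"href": "/two"}),
    ("text", "second"),
]

# Python counterpart of: (extract-all page [(and (tagp _ :a) (classp _ "jdtd4"))])
links = extract_all(page, lambda x: tagp(x, "a") and classp(x, "jdtd4"))
```

Because each test is just a predicate over one item, new filters compose with plain `and`/`or`, which is presumably what a DSL layer would dress up.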