by vikram
6645 days ago
I'm working on something similar. It turns out scraping is only a small part of the problem. I don't use BeautifulSoup. You can transform the HTML of a page into a list, which is easy to scrape. Now that I have used it to extract data from many different types of pages, I'm looking to turn it into a DSL so that the code looks natural. Currently it's just functions that search for tags in the HTML; you can then easily filter for some tags or others. Here is an example: (extract-all page [(and (tagp _ :a) (classp _ "jdtd4"))])
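
For readers who want to try the idea, here is a rough Python analogue of the approach described above. It is only a sketch, not the Lisp code the comment refers to. The names `ListBuilder`, `html_to_list`, and `extract_all` are made up for illustration, `tagp` and `classp` borrow their names from the snippet but their behavior is a guess, and the `[tag, attrs, children...]` list layout is an assumption. It uses only the standard-library `html.parser` module.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ListBuilder(HTMLParser):
    """Builds nested lists shaped like [tag, attrs_dict, child, child, ...]."""

    def __init__(self):
        super().__init__()
        self.root = ["root", {}]   # synthetic top-level node (assumption)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, dict(attrs)]
        self.stack[-1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep non-blank text as plain string children.
        if data.strip():
            self.stack[-1].append(data.strip())


def html_to_list(html):
    """Parse an HTML string into the nested-list form."""
    builder = ListBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tagp(node, name):
    """True if the node has the given tag name."""
    return node[0] == name


def classp(node, cls):
    """True if cls is one of the node's space-separated classes."""
    return cls in (node[1].get("class") or "").split()


def extract_all(tree, pred):
    """Return every descendant element of tree for which pred is true."""
    found = []
    for child in tree[2:]:
        if isinstance(child, list):   # skip text children
            if pred(child):
                found.append(child)
            found.extend(extract_all(child, pred))
    return found


page = html_to_list(
    '<div><a class="jdtd4 x" href="/one">One</a>'
    '<a href="/two">Two</a>'
    '<p><a class="jdtd4" href="/three">Three</a></p></div>'
)
# Roughly: (extract-all page [(and (tagp _ :a) (classp _ "jdtd4"))])
links = extract_all(page, lambda n: tagp(n, "a") and classp(n, "jdtd4"))
hrefs = [n[1]["href"] for n in links]
```

Because the predicate is just a function of one node, combining filters is ordinary `and`/`or` logic, which is presumably what a DSL would dress up in nicer syntax.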