by benhoff
460 days ago
I used this recently to download websites, stuffed them into a SQLite db, processed them with Mozilla's Readability library, and then used the result with an LLM to ask questions about the webpage itself. It was helpful to take each step in chunks, as I didn't have a complete processing pipeline when I started. I had wondered if there was an easier or better way to do this, as I probably would have preferred to get the sitemap, pass the sitemap to an LLM, and then download only the selected HTML pages rather than the entire website.
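
For anyone curious, here is a rough sketch of what the sitemap-first version might look like. It is not what I actually ran: the function names, the `pages` table layout, and the keyword filter standing in for the LLM selection step are all made up for illustration, and the download and Readability steps are left out. It assumes a standard sitemaps.org `urlset` document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def select_urls(urls, keywords):
    """Hypothetical stand-in for the LLM step: keep URLs containing a keyword."""
    return [u for u in urls if any(k in u for k in keywords)]


def open_store(path=":memory:"):
    """Open the SQLite db; one row per page, one column per pipeline stage."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, html TEXT, extracted TEXT)"
    )
    return conn


def save_raw(conn, url, html):
    """Store downloaded HTML; re-running the download skips known URLs."""
    conn.execute(
        "INSERT OR IGNORE INTO pages (url, html) VALUES (?, ?)", (url, html)
    )
    conn.commit()


def pending_extraction(conn):
    """URLs that were downloaded but not yet run through text extraction."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE extracted IS NULL ORDER BY url"
    )
    return [row[0] for row in rows]
```

Keeping the raw HTML and the extracted text in separate columns is what lets each step run in chunks: you can download everything first, then come back later and process only the rows where `extracted` is still NULL.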