by fabiandesimone 3931 days ago
Thank you. Well, I'm doing several things:

1) I check whether the page we just scraped has any of the tags we are looking for.

2) We then extract any information within those tags (images, etc.)

3) We follow every link, and if it's not already in the seen/scraped list, we add it to the queue.
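
In case it helps to see it concretely, here is a rough Python sketch of those three steps. It is not my actual code: the names `PageParser` and `crawl`, the `fetch` callable (anything that takes a URL and returns HTML text), the `max_pages` cap, and the use of a set for the seen list are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class PageParser(HTMLParser):
    """Collects the tags we care about plus every <a href> on one page."""

    def __init__(self, wanted_tags=("img",)):
        super().__init__()
        self.wanted_tags = set(wanted_tags)
        self.found = []  # (tag, attrs dict) for each wanted tag
        self.links = []  # raw href values, possibly relative

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.wanted_tags:
            self.found.append((tag, attrs))
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch` is a hypothetical url -> html callable."""
    queue = deque([start_url])
    seen = {start_url}  # marked when queued, so a URL is never queued twice
    results = {}
    pages_done = 0
    while queue and pages_done < max_pages:
        url = queue.popleft()
        pages_done += 1
        parser = PageParser()
        parser.feed(fetch(url))
        # Steps 1 and 2: record the page only if it had wanted tags.
        if parser.found:
            results[url] = parser.found
        # Step 3: resolve each link, drop the #fragment, queue if unseen.
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results
```

Adding a URL to the seen set at the moment it is queued, rather than after it is fetched, keeps the queue from filling up with duplicates when many pages link to the same place.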

Not sure if this helps to narrow it down.

Thanks!