HN Mirror

	by Jabbs 559 days ago
	Thank you so much. In some cases (Greenhouse, Lever, etc.) I was able to standardize where the title and location appear on the page. But mostly this uses a valid dataset of job descriptions, matching lists of phrases against the plain text of a page (with markup removed). The scraper also remembers which companies and career pages have job listings, and it prioritizes those companies, visiting them more often than the ones without listings. Currently there are 4 worker services, each visiting about 10k company websites per day.
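A minimal sketch of the phrase-matching step, assuming the page HTML has already been fetched: markup (including `<script>` and `<style>` contents) is stripped with Python's standard `html.parser`, and the remaining text is checked for whole-word phrase matches. The names `TextExtractor`, `page_text`, `phrase_hits`, `looks_like_job_page`, the phrases in `JOB_PHRASES`, and the `min_hits` threshold are all hypothetical; the comment does not say what the real phrase lists or cut-off look like.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects a page's text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html):
    """Strip markup and return lowercased text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip().lower()


# Hypothetical phrase list; real phrases would be derived from the job-description dataset.
JOB_PHRASES = ["responsibilities", "qualifications", "years of experience", "apply now"]


def phrase_hits(html, phrases=JOB_PHRASES):
    """Return the phrases that occur as whole words in the page's plain text."""
    text = page_text(html)
    return [p for p in phrases if re.search(r"\b" + re.escape(p) + r"\b", text)]


def looks_like_job_page(html, min_hits=2):
    # Hypothetical threshold: require at least `min_hits` distinct phrases.
    return len(phrase_hits(html)) >= min_hits


sample = (
    "<h1>Backend Engineer</h1><p>Responsibilities: build APIs.</p>"
    "<p>Qualifications: 3+ years of experience.</p>"
)
print(phrase_hits(sample))  # ['responsibilities', 'qualifications', 'years of experience']
```

Matching on whole words rather than raw substrings keeps a phrase like "qualifications" from firing inside unrelated longer tokens, at the cost of missing inflected forms the list does not contain.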
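The revisit prioritization can be sketched as a min-heap keyed by each company's next visit time, where a company whose last visit found listings is rescheduled sooner than one that did not. The `VisitScheduler` and `Entry` classes, the 1-day and 7-day intervals, and the time units are hypothetical; only the worker figures (4 workers at about 10k sites each, so roughly 4 × 10,000 = 40,000 visits per day in total) come from the comment.

```python
import heapq
from dataclasses import dataclass, field

# From the comment: 4 worker services, each visiting about 10k sites per day.
WORKERS = 4
VISITS_PER_WORKER_PER_DAY = 10_000
DAILY_CAPACITY = WORKERS * VISITS_PER_WORKER_PER_DAY  # about 40k visits per day

# Hypothetical revisit intervals in days; the real values are not stated.
INTERVAL_WITH_LISTINGS = 1.0
INTERVAL_WITHOUT_LISTINGS = 7.0


@dataclass(order=True)
class Entry:
    next_visit: float
    company: str = field(compare=False)  # only the due time is used for ordering


class VisitScheduler:
    """Min-heap of companies ordered by when each is next due for a visit."""

    def __init__(self):
        self._heap = []
        self.has_listings = {}  # company -> did the last visit find job listings?

    def add(self, company, now=0.0):
        """Register a new company, due immediately."""
        heapq.heappush(self._heap, Entry(now, company))

    def due(self, now, limit):
        """Pop up to `limit` companies whose next visit time is <= now."""
        batch = []
        while self._heap and self._heap[0].next_visit <= now and len(batch) < limit:
            batch.append(heapq.heappop(self._heap).company)
        return batch

    def record(self, company, found_listings, now):
        """Remember the visit result; companies with listings are rescheduled sooner."""
        self.has_listings[company] = found_listings
        interval = INTERVAL_WITH_LISTINGS if found_listings else INTERVAL_WITHOUT_LISTINGS
        heapq.heappush(self._heap, Entry(now + interval, company))


scheduler = VisitScheduler()
for name in ["acme", "globex", "initech"]:
    scheduler.add(name)
for name in scheduler.due(now=0.0, limit=DAILY_CAPACITY):
    scheduler.record(name, found_listings=(name != "globex"), now=0.0)
print(sorted(scheduler.due(now=1.0, limit=DAILY_CAPACITY)))  # ['acme', 'initech']
```

A worker would call `due()` to take its next batch and `record()` after each visit; with a heap, both operations cost O(log n) per company, which keeps scheduling cheap at tens of thousands of sites.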