

by ganeumann 4425 days ago

VCdelta author here. I'm too lazy to do it by hand.

It's a generic scraper in Python with a "DSL" for the type of scraping it does. The DSL is, at its base, CSS selectors plus certain assumptions about how the sites are structured (and a way to make exceptions). I can code a site's scraper in about 3 or 4 minutes using this. Some sites are impossible to scrape this way because they don't have the information in a machine-readable form (e.g. just images), or they block bots, or they have no data on portfolio companies at all.
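To give a rough idea of the "defaults plus per-site exceptions" approach, here is a minimal sketch. It is not the actual VCdelta code. The names `DEFAULTS`, `SITES`, `spec_for`, and `scrape`, the site keys, and the sample page are all made up for illustration. To stay standard-library only, it uses a toy selector matcher that understands just `tag`, `.class`, `tag.class`, and the descendant (space) combinator; a real scraper would more likely lean on a full CSS selector library.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs), parent
        self.children = []  # mix of Node and str, in document order

    def iter_text(self):
        for child in self.children:
            if isinstance(child, str):
                yield child
            else:
                yield from child.iter_text()

    def text(self):
        # Join all nested text and collapse runs of whitespace.
        return " ".join("".join(self.iter_text()).split())

    def descendants(self):
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child.descendants()

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("[document]", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)

def matches(node, simple):
    # `simple` is "tag", ".class", or "tag.class" (toy subset of CSS).
    tag, _, cls = simple.partition(".")
    if tag and node.tag != tag:
        return False
    return not cls or cls in (node.attrs.get("class") or "").split()

def select(node, selector):
    results = [node]
    for part in selector.split():  # space = descendant combinator
        found = []
        for base in results:
            for d in base.descendants():
                if matches(d, part) and not any(d is f for f in found):
                    found.append(d)
        results = found
    return results

# Hypothetical defaults: assume portfolio companies are <li> items with a link.
# Each field maps to (selector, attribute); attribute None means "use the text".
DEFAULTS = {"item": "li", "fields": {"name": ("a", None), "url": ("a", "href")}}

# Hypothetical per-site exceptions; an empty dict means the defaults apply.
SITES = {
    "example-fund": {},
    "other-fund": {"item": "div.company",
                   "fields": {"name": ("h3", None), "url": ("a", "href")}},
}

def spec_for(site):
    return {**DEFAULTS, **SITES[site]}

def scrape(html, spec):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    rows = []
    for item in select(builder.root, spec["item"]):
        row = {}
        for field, (selector, attr) in spec["fields"].items():
            hits = select(item, selector)
            if not hits:
                row[field] = None  # field missing on this item
            else:
                row[field] = hits[0].text() if attr is None else hits[0].attrs.get(attr)
        rows.append(row)
    return rows

PAGE = """
<div class="portfolio">
  <div class="company featured"><h3>Acme <b>Robotics</b></h3><a href="https://acme.example">site</a></div>
  <div class="company"><h3>Widget Co</h3></div>
</div>
"""
print(scrape(PAGE, spec_for("other-fund")))
```

With a setup along these lines, adding a site that follows the assumed structure is a one-line entry in `SITES`, and an oddly structured site only overrides the selectors that differ. That is roughly why a new scraper can take minutes rather than hours.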

And a couple of scrapers break every week, as you'd imagine, and get fixed a couple of weeks later, so the data is not complete by a long shot. Also, since new VC funds seem to start up every week, it doesn't follow even a substantial subset of all the funds. Wish I had more time to spend on it.