Hacker News
by ignacioelola 4425 days ago
You've written scrapers for all these VC pages (http://neuvc.com/labs/vcdelta/vcs.html)? I can almost feel the pain...
2 comments

VCdelta author here. I'm too lazy to do it by hand.

It's a generic scraper in Python with a "DSL" for the type of scraping it does. At its base, the DSL is CSS selectors plus certain assumptions about how the sites are structured, and a way to make exceptions. I can code a site's scraper in about 3 or 4 minutes using this. Some sites are impossible to scrape this way because they don't have the information in a machine-readable form (i.e. just images), or they block bots, or they have no data on portfolio companies at all.
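To make the idea concrete, here is a minimal sketch of a "defaults plus exceptions" selector setup. It is not VCdelta's actual code: the names DEFAULTS, SITES, SelectorExtractor and scrape, the site keys, and the selectors themselves are all hypothetical. To stay within the standard library it only understands selectors of the form tag, .class, or tag.class; a real scraper would more likely lean on a full CSS selector library.

```python
from html.parser import HTMLParser

# Hypothetical default assumption: each portfolio company is a link
# carrying the class "portfolio". Not VCdelta's real rules.
DEFAULTS = {"company": "a.portfolio"}

# Hypothetical per-site exceptions; an empty dict means "use the defaults".
SITES = {
    "example-fund": {},
    "odd-fund": {"company": "li.company"},
}


def parse_selector(selector):
    """Split 'tag.class' into (tag, class); a missing part becomes None."""
    tag, _, cls = selector.partition(".")
    return (tag or None, cls or None)


class SelectorExtractor(HTMLParser):
    """Collect the text of every element matching one simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, self.cls = parse_selector(selector)
        self.open_tag = None  # tag name of the matched element being read
        self.depth = 0        # nesting depth of same-named tags inside it
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.open_tag is not None:
            # Already inside a match: only track nesting of the same tag name.
            if tag == self.open_tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.tag in (None, tag) and (self.cls is None or self.cls in classes):
            self.open_tag, self.depth, self.buffer = tag, 1, []

    def handle_endtag(self, tag):
        if tag != self.open_tag:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer).strip())
            self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.buffer.append(data)


def scrape(site, html):
    """Merge the defaults with the site's exceptions, then extract matches."""
    rules = {**DEFAULTS, **SITES.get(site, {})}
    extractor = SelectorExtractor(rules["company"])
    extractor.feed(html)
    extractor.close()
    return extractor.results
```

With this layout, adding a site that follows the assumptions is a one-line entry in SITES, and a site that doesn't only needs its differing selector spelled out.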

And a couple of scrapers break every week, as you'd imagine, and get fixed a couple of weeks later, so the data is not complete by a long shot. Also, since new VC funds seem to start up every week, it doesn't follow even a substantial subset of all the funds. Wish I had more time to spend on it.

It says he "looks" at them, so he probably does it manually.
That text is written from the point of view of 'neubot', so it's not 'man'-ual.