I'm the founder of ParseHub. We are doing well and are independently owned. I think there are 3 things that contribute to this:

1. It is very easy to make a prototype that looks "magical" but very hard to build something that works in real applications. Browsers tolerate an enormous number of quirks, and each site you encounter will use a different set of them. Sites also tend to be unreliable, so whatever you build has to be very resistant to errors.

2. There is a technological wall that every company in this space reaches, where it is not yet possible to mass-specialize for different websites. Even if you're able to build a tool that works very well on any individual website, the technology is not there yet to generalize the instructions across websites in the same category. So if a customer wants to scrape 1000 websites, they still have to build custom instructions for each website (a 5-10x reduction in labor vs scripting), when what they really want, and what is economically viable for them, is to build a single set of instructions that works for all similar websites (a 10000x reduction in labor vs scripting). This is something that we're working on for the next version of ParseHub, but it is still a couple of years away from launch.

3. Many of the YC startups you hear about have raised funding from investors and have short-term pressures to exit.

The combination of the three makes it very tempting to give up and sell.
Just curious, in your experimentation, have you found it necessary to train a new model for each "category"? Or have you found a way to generalize it?