Nice work on the write-up, and thank you for sharing it. The post is interesting, but I think the problem with the YCRank approach as it stands is that the labelling appears to be subjective opinion, at least if I understand correctly.

Based on the post, you trained the classifier by labelling a handful of pairs of company descriptions according to which one you liked better, using subjective assessments like "harder to execute" or revenue growth that aren't part of the data you're running the classifier against. If so, you've done a nice job of training a classifier to predict which companies you personally are more likely to be interested in.

To improve this, you could use company descriptions and success data from past YC batches, which would give you more useful training examples and labels that aren't so subjective. That might produce some interesting predictions that generalize better (although I think you may need more data points than the description and basic metadata).

If I've misunderstood, it would be interesting to hear a little more detail about how the data was labelled. I've based this on the following: "To investigate this, I made a neural network, YCRank, trained it on a handful of hand-labeled pairwise comparisons, and then used the learned comparator to sort the companies in the most recent W’22 batch." And then: "I biased my ranking towards what was “harder to execute” on" and "I also tended to rank favorably companies that were already making monthly recurring revenue with double-digit growth rates". Those may or may not be good criteria.

Based on that, this is essentially what you could call a "DudeRank Classifier" because, as The Dude in The Big Lebowski says, "Yeah, well, that's just like, your opinion, man" :)

As I suggested above, it might be more interesting to label the example pairs and train the classifier on the original company descriptions of past YC companies that are known to have succeeded or failed. Possibly there is enough signal in the company descriptions and limited Demo Day metadata alone to predict which companies in a batch will succeed.

Good luck!

Disclaimer: I am in the W22 batch. Our startup (Andi) ranks pretty well here. And this also is just, like, my opinion :)

[Edit: You could also test the classifier against historical batches to improve it.]
Yes, but isn't human VC investing already just a big DudeRank classifier?