I would not put so much stock in the first-mover effect. I can't recall it offhand, but there was an excellent podcast that pointed out how second movers often do better in the end, at least as companies. Case in point: a while back, Stanford University ran a big race called DAWNBench, which included training CIFAR-10 to 94% accuracy in the shortest amount of time. There was a lot of cool movement during that period, and a few notable people really moved the bar (Chen Wang is underrecognized for their contributions, while David Page is relatively well known for his, which truly are excellent). I remember reading Page's notes on it and thinking that I could never come up with ideas of the caliber he brought to training these networks. Plus, 24 seconds on a V100?!?! Crazy.

That was years ago. I didn't touch it at all (not that anyone really did; transformers were sorta the big thing then, and still are). The one or two times I did try to do anything with it, everything I tried made it worse, and I really struggled with his code (it's written in a very functional style, which is very cool but didn't jibe with my rapid-experimentation style). In any case, I thought maybe I could do better if I really and truly took a crack at it. And even if I didn't, I sorta needed a good living resume to prove that I could build a good software project. So I reimplemented it in a more hackable (to me, at least) way, à la Karpathy's nanoGPT (and was almost too meticulous about writing, organizing, and documenting my code), reorganized and streamlined a few things, and moved it to a GPU that was more accessible to me, an A100: about 18.1 seconds (17.2 with some other open-source code). So that was the line. Since then, every single time it feels like I've found all there is to find, there's eventually something else waiting behind that wall for me.
18.1 seconds turned into 12.7, which I thought was about as far as I could go. Then 12.7 turned into 12.3, which turned into ~9.91 seconds. Then ~9.91 seconds became, incredibly, ~7.7 seconds. Earlier this week I released an update that brought it to roughly 6.97-6.99 seconds. That is unreal to me. At first I was numb to how much things could improve; now I'm sorta in denial. The throughput is totally insane: roughly 88,389 training images through the GPU _every second_. That works out to roughly 11.35 microseconds per image, which is... blistering, to say the least. It's hard for me to wrap my own head around it.

From the experience I've had, I've felt feelings similar to what you've described here, especially when someone who already has a lot of followers, coming from a more hype-driven angle, does something like glue Hugging Face code together, makes a fancy, well-stylized GIF, and gets a ton of adoration for it. That said, the market for quality software is small, and the market for hype is large. Not that the project above has no hype, but it's meant to be more valuable as a researcher's workbench than as a toy. It thankfully got a huge boost early on because Karpathy tweeted it out, but even the last release, for example, got maybe 10 likes on Twitter and an additional 10-20 (or 30) stars on GitHub from all the interactions combined (including a Reddit post), if that. But! The good thing, in some senses, is that the people I get to talk to if I'm proactive, the ones who like this software, are often well known or skilled in their field. I honestly don't have many warm fuzzies about that from lived experience, since it's new to me, but I can say that I appreciate the opportunity. Every time I've thought about going down the hype/vaporware road just to get eyes on my project(s), I have to ask myself: "Do I want these eyes on the project? Do I want this kind of attention from this kind of person to make up most of my interactions and what I am building?" Sure, if you have to feed a family, that sort of makes sense. And we have to feed ourselves and our emotional needs too. But maybe we can be okay with being content with the smaller audience, as it is. At least, that's what I'm working toward, though I do fear that I'll stumble and give in to the allure of chasing hype every now and again. And if I do, I'm sure that particular, extreme emptiness (of a sort) will help pull me back toward being content with the little things I have.

I want to close with a video that might as well have been made for you, and I'd ask you to watch it in its entirety if you have the time. It talks about content creation (which is what we do, in a sense), but it's taught in a very general way, and it's the best condensed, beginner-friendly take on this topic that I can remember hearing. It should not only help alleviate some of your concerns or negative feelings about the shipping arms race, but also give you clarity on good next steps and a good 'path forward' for making software that people like. I really cannot recommend this video enough; the wisdom is simple, practical, distilled, and hard-won (it has certainly helped me, and I'm glad I learned it earlier rather than later): https://youtu.be/lNzWsp5UUPA

Happy to discuss or offer thoughts on any questions. I do recommend watching the video first; I often enjoy talking about that particular topic.
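For anyone who wants to sanity-check the throughput numbers mentioned earlier, here's a quick back-of-the-envelope sketch in Python. It only uses the rounded figure from the post (88,389 images per second); the batch size is a hypothetical placeholder chosen for illustration, not the project's actual setting. Inverting the throughput gives the average time per image, and multiplying by a batch size gives the average time per batch.

```python
# Back-of-the-envelope check of the throughput figure quoted above.
# Assumption: the input is the rounded number from the post; the batch size
# below is a hypothetical placeholder, not the project's real setting.

images_per_second = 88_389  # quoted training throughput

# Average time the GPU spends per training image, in microseconds.
us_per_image = 1e6 / images_per_second

# Average time per batch for a hypothetical batch size, in milliseconds.
HYPOTHETICAL_BATCH_SIZE = 1024
ms_per_batch = us_per_image * HYPOTHETICAL_BATCH_SIZE / 1000

print(f"~{us_per_image:.2f} microseconds per image")
print(f"~{ms_per_batch:.2f} ms per batch of {HYPOTHETICAL_BATCH_SIZE} images")
```

With a batch size in the hundreds or more, the average time per batch lands in the millisecond range; the ~11 microsecond figure corresponds to a single image.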