by samsartor
383 days ago
I'm doing my PhD in ML shit. Before that I was a systems programming guy: lots of C++, a bit of CUDA, big fan of Rust. On the side I'm obsessed with RISC-V. I own a couple of boards. I made a stupid little CUDA-like compiler on top of the RISC-V vector extensions, just for fun. What I'm saying is, Tenstorrent couldn't find a more excitable third-party developer if they grew one in a lab. And you know what? I can't make heads or tails of all their various abstractions. I've tried! I've read the docs, I've read the examples, I've gone to meetups. I think OP is right that "one more abstraction bro" probably doesn't solve the problem. At a guess, the problem isn't a technical one, it is an organizational one. They don't have anybody to stand in for me, or devs like me (e.g. dumb people). There is no product leadership on the API design. Just a lot of really brilliant engineers obsessively tuning for their own use cases, unwilling to ever trade a hit in performance or expressivity for readability or writability.
I don't think anyone is seriously training an NN on TT hardware at the moment, and I think that's an issue. I think tinygrad works not only because geohot is one hell of an engineer but also because comma dogfoods it. TT's engineers are absolutely brilliant (from reading their commits), but I think they are stretched too thin. Bounties are not gonna work: you can't expect an outsider with no internal access/bandwidth/knowledge to suddenly make e.g. Mixtral work when the issue spans at least tt-xla and tt-mlir. And to agree with ^, training is the kind of artifact where good CX can only be derived from strong leadership and a leaner view of the stack. NVIDIA accumulated that over the decades, and the rest are trying to catch up by aggressive hiring (not to say that hiring is necessary). E.g. Annapurna had a presence on the CMU campus when I was there and has the Anthropic team to test it out.
I'm an incredibly excited third-party developer, as I think the pitch appeals a lot to grad students doing model research who need to run small experiments within the 13B range and reasonably scale them up to draw the first half of the scaling curve.
I lose too much productivity to abstractions and incomplete e2e support in TT's current shape. I'd love to give it another go in 6 months.