by liaopeiyuan 432 days ago

This is my sentiment too after trying to get a Blackhole to run a recent VLM (like Pixtral) over the weekend. Not just unit tests, but actual training loops. I write a lot of JAX in my day job to train large models but I used to do a bit of ML compiler development, which I guess also puts me in the dumb people crowd. I'm equally impressed by how smooth the lower-level setup is and frustrated by how little progress I was able to make towards the seemingly last mile of "just rewrite the code a little bit more bro I just need to get rid of this one hlo op because it's not supported."

I don't think anyone is seriously training an NN on TT hardware at the moment, and I think that's an issue. I think tinygrad works not only because geohot is one hell of an engineer but also because comma dogfoods it. TT's engineers are absolutely brilliant (from reading their commits) but I think they are stretched too thin. Bounties are not gonna work - you can't expect an outsider with no internal access/bandwidth/knowledge to suddenly make e.g. Mixtral work, as the issue spans at least tt-xla and tt-mlir. And to agree with ^, training is the kind of artifact where good CX can only be derived from strong leadership and a leaner view of the stack. NVIDIA accumulated that over decades and the rest are trying to catch up by aggressive hiring (not to say that hiring is necessary). E.g. Annapurna had a presence on the CMU campus when I was there, and it has the Anthropic team to test its hardware.

I'm an incredibly excited third-party developer as I think the pitch appeals a lot to grad students (who do model research) who need to run small experiments within the 13B range and reasonably scale them up to draw the first half of the scaling curve.

I lose too much productivity to abstractions and incomplete e2e support in TT's current shape. I'd love to give it another go in 6 months.