by cjbgkagh 848 days ago
I wasn't as optimistic that there would be broad adoption of some of the more advanced techniques I was working on, so back in 2013 I figured that most people would stick to GEMMs and convs with rather simple loss functions. I had a hard enough time explaining BPR triplet loss to people. Now with LLMs, people will be doubling down on GEMMs for the foreseeable future.

My customers won't touch non-commodity hardware as they see it as a potential vector for vendors to screw them over, and they're not wrong about that. In a post-apocalyptic scenario they could just pull a graphics card out of a gaming computer to get things working again, which gives them a strong feeling of security. Having very capable GPU cards as a commodity means I can re-use the same ops for my training and inference, which roughly halves my workload.

My approach to hardware companies is that I'll believe it when I see it: I'll wait until something is publicly available that I can buy off the shelf before looking too closely at its architecture. Nvidia with their Tensor Cores got so good so quickly that I never really looked too closely at alternatives. I'm kind of hopeful that an AMD SoC would provide a good edge compute option, so I might give that a go.

I had a look at Tenstorrent given this article, and the Grendel architecture seems interesting.