Hacker News
by cr4zy 1750 days ago
Trillion-parameter networks are mentioned a few times, but Tesla is deploying much smaller networks than that (tens of millions of parameters, IMU). The networks approaching a trillion parameters are mostly transformers like GPT-3 (actually 175B), which are particularly parameter-heavy vs. conv nets: their large dense matrices don't get the kind of spatial weight sharing that small conv kernels do. Tesla is definitely starting to use transformers though, e.g. for camera fusion, as evidenced by the focus on matrix multiply in the Dojo ASICs vs. the conv ASICs in the on-vehicle chips.
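To put rough numbers on the parameter-count gap described above, here is a back-of-the-envelope sketch. The function names are made up for illustration, the conv layer size (3x3, 256 channels) is an arbitrary assumption, and the transformer dimensions (d_model = 12288, 96 layers) are the published GPT-3 figures. Embeddings are ignored, so the total lands slightly under 175B.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Parameters in one 2D conv layer: one small kernel per (in, out)
    channel pair, reused at every spatial position, plus one bias per
    output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def transformer_block_params(d_model, d_ff=None):
    """Parameters in one standard transformer block (assumed layout:
    4 attention projections, a 2-layer feed-forward net, 2 layer norms)."""
    if d_ff is None:
        d_ff = 4 * d_model  # common convention, also used by GPT-3
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V, output + biases
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # scale and shift for each norm
    return attention + feed_forward + layer_norms


conv_layer = conv2d_params(in_ch=256, out_ch=256, kernel=3)
gpt3_block = transformer_block_params(d_model=12288)
gpt3_total = 96 * gpt3_block  # 96 blocks, embeddings not counted

print(f"3x3 conv, 256 -> 256 channels: {conv_layer:,}")
print(f"one GPT-3-sized block:         {gpt3_block:,}")
print(f"96 blocks:                     {gpt3_total:,}")
```

A single GPT-3-sized block holds roughly 1.8 billion parameters against well under a million for the conv layer, which is why a vision stack in the tens of millions and a language model in the hundreds of billions are not really comparable workloads.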

Yup, there are plenty of ML architectures that try to cut parameter count, getting better generalization (less overfitting) at the expense of slightly costlier training and inference. So the memory constraints on Tesla Dojo might not be a big deal after all.
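One concrete way to trade parameters for compute is cross-layer weight sharing (the approach used by ALBERT, picked here purely as an illustration, not necessarily what the comment has in mind). The sketch below uses hypothetical helper names and counts weights only, ignoring biases and the attention-score computation, to show that sharing shrinks what has to be stored while the per-token work stays the same.

```python
def block_params(d_model):
    """Weight count for one transformer block, biases ignored:
    4 attention projections plus 2 feed-forward matrices (d_ff = 4 * d_model)."""
    return 4 * d_model * d_model + 2 * d_model * (4 * d_model)


def stack_cost(d_model, n_layers, shared):
    """Return (stored parameters, approximate multiply-accumulates per token)."""
    per_block = block_params(d_model)
    # With sharing, one set of weights is reused by every layer.
    stored = per_block if shared else per_block * n_layers
    # Every layer still executes, so the matmul work is unchanged by sharing.
    macs_per_token = per_block * n_layers
    return stored, macs_per_token


for shared in (False, True):
    stored, macs = stack_cost(d_model=1024, n_layers=12, shared=shared)
    print(f"shared={shared}: stored={stored:,} params, ~{macs:,} MACs/token")
```

Memory footprint drops by the number of layers, but inference is no cheaper, which matches the "smaller but not faster" trade-off described above.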