Hacker News
by ersiees 2171 days ago
I would really like a thorough analysis of how expensive it is to multiply large matrices, which, according to the profiler, is the most expensive part of training a transformer, for example. Is there some Moore's law or similar trend?
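As a starting point for that kind of analysis, the usual back-of-the-envelope cost of multiplying an (m x k) matrix by a (k x n) matrix with the standard algorithm is about 2·m·k·n floating-point operations (one multiply and one add per term of each output entry). The sketch below only illustrates that arithmetic. The function names and the 100 TFLOP/s peak and 50% efficiency figures are made-up assumptions, not measurements of any real hardware, and it ignores asymptotically faster algorithms such as Strassen's.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """Approximate FLOPs for (m x k) @ (k x n) using the standard cubic algorithm.

    Each of the m*n outputs is a dot product of length k: k multiplies and
    k - 1 adds, conventionally rounded to 2*k operations.
    """
    return 2 * m * k * n


def estimated_seconds(flops: int, peak_flops_per_s: float, efficiency: float = 0.5) -> float:
    """Rough runtime estimate, assuming only a fraction `efficiency` of peak is reached.

    Both `peak_flops_per_s` and `efficiency` are hypothetical inputs, not device specs.
    """
    return flops / (peak_flops_per_s * efficiency)


if __name__ == "__main__":
    m = k = n = 4096
    flops = matmul_flops(m, k, n)
    print(flops)  # 137438953472, i.e. 2**37

    # Hypothetical accelerator with a 100 TFLOP/s peak (illustrative only).
    print(estimated_seconds(flops, 100e12))

    # Cubic scaling: doubling every dimension multiplies the work by 8.
    print(matmul_flops(2 * m, 2 * k, 2 * n) // flops)  # 8
```

Since the cost grows with the cube of the size for square matrices, any hardware trend would have to be compared against that growth rather than against matrix size directly.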