by ersiees
2171 days ago
I would really like a thorough analysis of how expensive it is to multiply large matrices, which, according to the profiler, is the most expensive part of training a transformer, for example. Is there a Moore’s law or some similar trend for this cost?