by anon291
598 days ago
It's not really. Anyone who's ever done any low-level assembly coding on modern chips knows that it is already a herculean engineering effort. The idea that your customers, who are experts in machine learning models (transformers, activation functions, etc.), are going to feel comfortable with memory hierarchies, synchronization, floating-point precision, etc. is just crazy.

AMD did approximately nothing with ROCm.
Investing $10-20m of developer time into making ROCm work reliably and easily would have paid for itself 100x.