Hacker News
by boloust 1364 days ago
The course goes through the exact steps you described. The main "GPU optimisation" amounts to saying "okay, now that we know how to implement matrix multiplication, let's use the optimised PyTorch implementation instead".
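
To make that concrete, here is a rough sketch of what the swap looks like. This is not the course's code: `naive_matmul` and the 2x2 inputs are made up for illustration, and the PyTorch part only runs if `torch` happens to be installed.

```python
def naive_matmul(a, b):
    """Textbook triple-loop matrix multiply on nested lists (hypothetical helper)."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out


# Illustrative inputs, not taken from the course.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
result = naive_matmul(a, b)

# The "optimisation" step: hand the same job to PyTorch's built-in matmul.
try:
    import torch
except ImportError:
    torch = None  # assumption: torch may not be installed; skip the comparison

if torch is not None:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    ta = torch.tensor(a, device=device)
    tb = torch.tensor(b, device=device)
    # `@` dispatches to the library's optimised kernel on whichever device holds the tensors.
    assert torch.allclose(ta @ tb, torch.tensor(result, device=device))
```

The loop version is the part you learn from; the `@` line is the part you would actually run, since it does the same computation on the CPU or GPU without any hand-written kernel.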