by tysam_and 904 days ago
Minor potential performance benefit: it looks like you might be able to fuse the x_proj and dt_proj weights here, since x_proj has no bias. If there are any weight-fiddling requirements (e.g. the two weights need to stay separate on disk), the fusion could probably just be done at runtime. I'm guessing the single kernel + bias will still run faster in the end (not sure though! <3)
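
A rough sketch of what that fusion could look like, in plain Python so it runs without dependencies. The shapes are assumptions, not taken from the code under discussion: x_proj is taken to map d_inner to dt_rank + 2 * d_state with no bias, only its first dt_rank outputs are assumed to feed dt_proj, and dt_proj maps dt_rank back to d_inner with a bias. The helper names (`linear`, `fuse_projections`) and the sizes are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def linear(x, weight, bias=None):
    """Compute x @ weight.T (+ bias), with weight stored as (out, in)."""
    out = matmul(x, transpose(weight))
    if bias is not None:
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out


def fuse_projections(x_proj_weight, dt_proj_weight, dt_rank):
    """Fold the dt rows of x_proj into dt_proj.

    Assumption: only the first dt_rank output rows of x_proj feed dt_proj.
    Because x_proj has no bias, the product of the two weights is exact.
    """
    return matmul(dt_proj_weight, x_proj_weight[:dt_rank])


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical small sizes, for illustration only.
random.seed(0)
d_inner, dt_rank, d_state, batch = 8, 2, 3, 4

x_proj_weight = rand_matrix(dt_rank + 2 * d_state, d_inner)  # no bias
dt_proj_weight = rand_matrix(d_inner, dt_rank)
dt_proj_bias = [random.uniform(-1.0, 1.0) for _ in range(d_inner)]
x = rand_matrix(batch, d_inner)

# Two-step path: project down, slice out the dt columns, project up with bias.
x_dbl = linear(x, x_proj_weight)
dt_low = [row[:dt_rank] for row in x_dbl]
dt_two_step = linear(dt_low, dt_proj_weight, dt_proj_bias)

# Fused path: one (d_inner x d_inner) weight plus the original dt_proj bias.
fused_weight = fuse_projections(x_proj_weight, dt_proj_weight, dt_rank)
dt_fused = linear(x, fused_weight, dt_proj_bias)

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(dt_two_step, dt_fused)
    for a, b in zip(row_a, row_b)
)
```

Note that the fused weight is d_inner x d_inner, while the two-step path goes through a dt_rank-sized bottleneck. So the fused version saves a kernel launch but does more multiply-adds when dt_rank is much smaller than d_inner; which one wins would have to be measured.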