Hacker News
by riley_s8 589 days ago
Totally agree. It doesn't make any sense to replace linear(x) with linear(softmax(linear(x))) and then claim the result is more explainable and more scalable.
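For anyone who wants to poke at the two forms side by side, here is a rough pure-Python sketch. The sizes (d_in, d_hidden, d_out), the random weights, and the helper names are all made up for illustration and are not taken from any particular paper or library; biases are left out to keep it short.

```python
import math
import random

def linear(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x (no bias).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_in, d_hidden, d_out = 4, 8, 3  # hypothetical sizes, chosen arbitrarily

W = rand_matrix(d_out, d_in, rng)       # the plain layer: linear(x)
W1 = rand_matrix(d_hidden, d_in, rng)   # inner linear of the stacked form
W2 = rand_matrix(d_out, d_hidden, rng)  # outer linear of the stacked form

x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

y_plain = linear(W, x)          # linear(x)
p = softmax(linear(W1, x))      # non-negative weights that sum to 1
y_stacked = linear(W2, p)       # linear(softmax(linear(x)))

params_plain = d_out * d_in
params_stacked = d_hidden * d_in + d_out * d_hidden

print("plain params:  ", params_plain)
print("stacked params:", params_stacked)
```

Two things fall out of this. With these sizes the stacked form carries more weights than the plain layer for the same input and output widths. And because softmax returns non-negative weights summing to 1, each stacked output is a weighted average of one row of W2, so it can never leave the range of that row's entries, whereas linear(x) keeps growing as x is scaled up. Whether that counts as more explainable is exactly what's being argued about here.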