by joennlae 946 days ago
Author here:

Thank you for the feedback :-) A lot of the work regarding the comparison with "simple" approximate matrix multiplication has been done in the preceding paper: https://arxiv.org/abs/2106.10860

While I share your enthusiasm regarding the potential, we have to be careful about the limiting factors. Our main contribution on the algorithmic side is a reformulation of Maddness that makes it differentiable (autogradable), so that we can use it in e2e DNN training; decision trees on their own are not differentiable.

We are still in the process of understanding how to optimise the training. In the next step, we want to look into transformers as, for now, we only looked into ResNets for easy comparability.

If you are a student at ETH Zurich and want to work on this -> reach out to me

2 comments

Thanks for pointing that out :) When I first read the paper, I thought that 4. DIFFERENTIABLE MADDNESS was still part of the 3. BACKGROUND section.

Also, I have to admit that I don't quite understand that section, even after trying a 2nd time. The text implies that Sc would be 15x4 and Hc would be 16x15, but in the illustration they look like 3x2 and 4x3. I guess I'll have to read Zhang [37] first, because as it stands I'm not sure what the selection matrix and description matrix do here. That said, (8) and what follows is easy to understand again: you use the softmax to create an approximately correct gradient, but use the hard maximum to calculate the forward-pass values.
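To make sure I'm reading the softmax / hard-maximum part right, here is a minimal NumPy sketch of that general trick (a straight-through-style estimator). This is not the paper's code: the function names, the `temperature` parameter, and the plain `scores` vector are my own assumptions, standing in for whatever the encoder actually produces.

```python
import numpy as np

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a 1-D score vector."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward_hard(scores):
    """Forward pass: one-hot vector at the arg-max (the 'hard maximum')."""
    scores = np.asarray(scores, dtype=float)
    one_hot = np.zeros_like(scores)
    one_hot[np.argmax(scores)] = 1.0
    return one_hot

def backward_soft(scores, upstream_grad, temperature=1.0):
    """Backward pass: pretend the forward pass was softmax(scores / T).

    The Jacobian of softmax(z / T) w.r.t. z is (diag(p) - p p^T) / T,
    which is symmetric, so the vector-Jacobian product is jac @ upstream_grad.
    """
    p = softmax(scores, temperature)
    jac = (np.diag(p) - np.outer(p, p)) / temperature
    return jac @ np.asarray(upstream_grad, dtype=float)

scores = np.array([0.2, 1.5, -0.3])
upstream = np.array([1.0, -2.0, 0.5])    # hypothetical gradient from the loss
print(forward_hard(scores))              # one-hot at index 1
print(backward_soft(scores, upstream))   # surrogate gradient w.r.t. scores
```

The forward value is exactly the discrete choice, while the gradient that flows back is the one the soft version would have produced, so it is only approximately correct. A lower `temperature` makes the softmax closer to the hard maximum, at the cost of gradients that vanish away from the decision boundary.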

As you are the author: why the name Stella Nera / Black Star?
Not the author but

https://www.youtube.com/watch?v=N8JCMJQ1jyw&list=OLAK5uy_lYv...

went platinum in Switzerland, where ETH Zürich is located.