by sdenton4 1935 days ago
I've recently been doing some work on a model that mixes a big convolutional model with a second piece that we can actually analyze with the signal processing toolkit. What I saw on the signal-processing side was a mix of really interesting good behavior (including one cool trick I hadn't seen before), right alongside a bunch of obviously-dumb pathological behavior. (Basically, glaring discontinuities which induce the model to introduce /other/ discontinuities to course-correct... with really bad impacts on output quality.) I found some nice tweaks that get rid of /most/ of the pathological behavior on the sig-proc side, and produce some really nice results.

But for the main black-box network? No such visibility. It is undoubtedly doing all kinds of stupid shit that I have no insight into. And I would /love/ to be able to sort it out and get rid of the dumb parts, because I a) want tiny models that run hella fast on slow-ass phones, and b) would love to eliminate sources of spurious correlations to provide better predictions. Getting rid of the dumb parts means more compute to do the smart things.

The high-level theory answers maaaybe tell me about convergence but not whether the NN converges to anything reasonable. Giant models provide a huge amount of flexibility, which helps the model find answers, but the answers may not be any good, and we really don't have robust ways to know. (e.g., aggregates over big eval datasets may tell me something could be better, but not what or why.)
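
To make the point about aggregates concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the records, the "condition" field, and the helper names are assumptions, not anything from my actual eval setup.

```python
from collections import defaultdict

def accuracy(records):
    """Fraction of records marked correct."""
    return sum(r["correct"] for r in records) / len(records)

def accuracy_by_slice(records, key):
    """Accuracy computed separately for each value of records[i][key]."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {name: accuracy(rs) for name, rs in groups.items()}

# Hypothetical eval results: 90 "quiet" examples and 10 "noisy" ones.
records = (
    [{"condition": "quiet", "correct": True}] * 88
    + [{"condition": "quiet", "correct": False}] * 2
    + [{"condition": "noisy", "correct": True}] * 3
    + [{"condition": "noisy", "correct": False}] * 7
)

overall = accuracy(records)                           # 91 of 100 correct
per_slice = accuracy_by_slice(records, "condition")   # quiet: 88/90, noisy: 3/10

print(f"overall: {overall:.2f}")
for name, acc in per_slice.items():
    print(f"{name}: {acc:.2f}")
```

The overall number looks fine while the small slice is mostly wrong. And even this only tells me /where/ it's bad, and only because I happened to guess the right key to slice on. It still says nothing about why.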