by 37ef_ced3 2013 days ago
Google those _mm512_... intrinsics (they are part of GCC) to see what they mean. The code you pasted converts single-precision floats to half-precision floats and stores the half-precision floats to memory, 32 at a time. That's filter packing, which happens during initialization (and never during inference).
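
To make that conversion concrete, here is a portable scalar sketch of what packing 32 floats into half precision amounts to. It is not NN-512's code, which uses AVX-512 intrinsics instead. The names float_to_half and pack_32 are made up for illustration, and round-to-nearest-even is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert one float to IEEE 754 half-precision bits,
   rounding to nearest even. */
uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                /* reinterpret the float's bits */
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu)                        /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));

    int e = (int)exp - 127 + 15;             /* rebias the exponent */
    if (e >= 31)                             /* too large for half: infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                            /* subnormal half, or zero */
        if (e < -10)
            return (uint16_t)sign;           /* too small: signed zero */
        mant |= 0x800000u;                   /* make the implicit 1 explicit */
        uint32_t shift = (uint32_t)(14 - e);
        uint32_t half_mant = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half_mant & 1u)))
            half_mant++;
        return (uint16_t)(sign | half_mant);
    }

    uint32_t half = sign | ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;           /* the 13 bits being dropped */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                              /* a carry may bump the exponent */
    return (uint16_t)half;
}

/* Hypothetical packing step: convert and store 32 values at a time. */
void pack_32(const float *src, uint16_t *dst)
{
    assert(src && dst);
    for (int i = 0; i < 32; i++)
        dst[i] = float_to_half(src[i]);
}
```

In vectorized code the inner loop disappears. For instance, the AVX-512 intrinsic _mm512_cvtps_ph converts 16 floats per call, so 32 values take two conversions and two stores.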

I agree, if you don't know anything about how convolution is implemented (filter packing, data packing, matrix multiplication, sum unpacking), you could be lost. But it's very shallow compared to a JIT or CUDA library scheme, and a knowledgeable ML performance engineer would have no difficulty with it.

The inference function (at the end of the C file) is a series of blocks, each block corresponding to a convolution or other complex operation. It's straightforward to see which one, by looking at where the weights come from (a field in a struct that has the same name as the layer in your graph).

If you use perf top (for example), you can see which convolution was most expensive, and why. For example, does the shape of the tensor produce many small partial blocks around the edge, so that the packing is inefficient (a lot of tile overhang)? You can see that by glancing at the code and seeing that there are many optimized blocks around the edges. As a rule, if NN-512 generates small code for a tensor (few edge cases), you have chosen an efficient tensor shape with respect to the tile.
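
A rough way to reason about tile overhang before generating any code is sketched below. The tile size passed in is a placeholder, since the tile sizes NN-512 actually uses are not stated here, and cover_dim and wasted_percent are hypothetical names.

```c
#include <assert.h>

/* Result of covering one dimension of a tensor with fixed-size tiles. */
struct tile_cover {
    int full_tiles;  /* tiles that lie completely inside the tensor */
    int edge_size;   /* size of the partial edge tile, 0 if none */
    int overhang;    /* padded cells beyond the tensor edge */
};

struct tile_cover cover_dim(int extent, int tile)
{
    assert(extent > 0 && tile > 0);
    struct tile_cover c;
    c.full_tiles = extent / tile;
    c.edge_size = extent % tile;
    c.overhang = c.edge_size ? tile - c.edge_size : 0;
    return c;
}

/* Percentage (rounded down) of tile cells spent on overhang
   for a height x width plane covered by square tiles. */
int wasted_percent(int height, int width, int tile)
{
    struct tile_cover h = cover_dim(height, tile);
    struct tile_cover w = cover_dim(width, tile);
    int padded = (height + h.overhang) * (width + w.overhang);
    return 100 * (padded - height * width) / padded;
}
```

With an assumed tile of 8, a 56x56 plane tiles exactly and wastes nothing, while a 14x14 plane pads to 16x16 and spends about 23% of its tile cells on overhang.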

Or you might find that batch normalization is being done at inference time (as in DenseNet), instead of being integrated into the convolution weights (as in ResNet), because there's fanout from the source and a ReLU in between. You can see that easily in the generated code (the batch norm fmadd instructions will appear in the packing or unpacking code).
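
For the case where integration is possible (batch norm applied directly to the convolution output, with nothing nonlinear in between), the folding is simple arithmetic. This is a minimal sketch, not NN-512's code. It assumes the batch norm has already been reduced to a per-channel multiply-add, with scale = gamma / sqrt(variance + epsilon) and shift = beta - mean * scale. The function name and the weight layout (all weights of one output channel stored contiguously) are also assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Fold a per-channel affine batch norm, y = scale[c] * x + shift[c],
   into the weights and bias of the convolution that feeds it.
   Assumed layout: weights_per_channel contiguous weights per output channel. */
void fold_batch_norm(float *weights, float *bias,
                     const float *scale, const float *shift,
                     size_t out_channels, size_t weights_per_channel)
{
    assert(weights && bias && scale && shift);
    for (size_t c = 0; c < out_channels; c++) {
        /* scale * (w . x + b) + shift == (scale * w) . x + (scale * b + shift) */
        for (size_t i = 0; i < weights_per_channel; i++)
            weights[c * weights_per_channel + i] *= scale[c];
        bias[c] = bias[c] * scale[c] + shift[c];
    }
}
```

When the folding can't be done, the same multiply-add has to run on every element at inference time, which is the extra fmadd work that shows up in the generated code.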

Is the matrix multiplication slow because there are too few channels per group (as in ResNeXt)? That's easy to see in perf; make your groups bigger. Are you using an inefficient filter shape, so we have to fall back to a slower general purpose convolution? You can easily see whether Winograd or Fourier was used.
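
To see why few channels per group hurts, it helps to look at how small each group's matrix multiplication becomes. The sketch below assumes an im2col-style formulation purely for illustration (NN-512 may use Winograd or Fourier instead, where the numbers differ but the trend is the same), and group_gemm_dims is a hypothetical name.

```c
#include <assert.h>

/* Dimensions of the per-group matrix multiply in an im2col-style
   grouped convolution (an assumed formulation, for illustration only). */
struct group_gemm {
    int rows;   /* output channels per group */
    int depth;  /* reduction length: input channels per group * kh * kw */
};

struct group_gemm group_gemm_dims(int in_ch, int out_ch, int groups,
                                  int kh, int kw)
{
    assert(groups > 0 && in_ch % groups == 0 && out_ch % groups == 0);
    struct group_gemm g;
    g.rows = out_ch / groups;
    g.depth = (in_ch / groups) * kh * kw;
    return g;
}
```

With 128 input and output channels and a 3x3 filter, a single group gives a 128-row multiply with a reduction length of 1152. Splitting the same layer into 32 groups leaves 32 separate multiplies of 4 rows and reduction length 36, which is too small to keep wide vector units busy.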

And so on

I’m truly baffled as to why such a sophisticated and useful package is being distributed and advertised by an anonymous individual.
It can happen if you're in a toxic workplace that would be even more baffled that you have done awesome stuff in your free time.
Probably they’re afraid because it might be related to their day job :/
> Probably they’re afraid because it might be related to their day job :/

A slightly more common scenario is an employer that insists on "we own everything, related to your job or not, that you do even on your own time and equipment" clauses in employee contracts even though such clauses don't happen to be enforceable in the relevant jurisdiction.

Rather than having to "clear through your manager and legal" every little thing to get it added to your contract's personal IP whitelist, publishing anonymously makes perfect sense, where the plan is to de-anonymize after employment ends, at which point (should said now-former-employer have a hissy fit), their own counsel will eventually inform them they don't have a leg to stand on. After sending at least one threatening letter, of course.

Another solution is to spam your manager (and legal) with every trivial 'invention' that pops into your head until they relent[0][1], but that can burn through political capital you may prefer to use for other purposes, and will probably only narrow the scope rather than remove the unenforceable clause.

[0] https://cr.yp.to/patents/tarzian.html (my favorite is invention #12)

[1] As examples I was seriously tempted to use: "Python, but with 1-based indexing", "LinkExchange, but for Wingmen", and "ROT-13 Markdown".

It need not be related to your job. Some employers might argue that since you're skilled enough to do such a thing, you should have been performing extraordinarily on the job, even if you are already delivering what the job asks for and doing just as well as your peers. At worst, some struggling, poorly managed startup might even "turnaround" and eventually you don't own your side passion project anymore.