by cycomanic 384 days ago
The issue with optical neuromorphic computing is that the field has only been doing the easy part, i.e. the matrix multiplication. We have known for decades that imaging/interference networks can do matrix operations in a massively parallel fashion. The problem is the nonlinear activation function between your layers. People have largely been ignoring this, or have just converted back to the electrical domain (where you are again limited by the cost/bandwidth of the electronics).
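
To make concrete why the activation matters: without a nonlinearity between them, two stacked linear layers reduce to a single matrix, so extra depth adds nothing. Below is a minimal sketch in plain Python. The matrices `W1`, `W2`, the input `x`, and the choice of ReLU are arbitrary illustrative assumptions, not anything taken from an optical setup.

```python
# Sketch: two linear "layers" with no activation in between are equivalent
# to one linear layer. All values below are arbitrary illustrative choices.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def relu(v):
    """Elementwise ReLU, standing in for any nonlinear activation."""
    return [max(0.0, t) for t in v]

W1 = [[1.0, 2.0], [0.5, -1.0]]
W2 = [[0.0, 1.0], [3.0, 2.0]]
x = [1.0, -2.0]

two_layers = matvec(W2, matvec(W1, x))        # layer 2 applied to layer 1
one_layer = matvec(matmul(W2, W1), x)         # single merged matrix
with_activation = matvec(W2, relu(matvec(W1, x)))

print(two_layers)       # [2.5, -4.0]
print(one_layer)        # [2.5, -4.0], identical: the stack collapsed
print(with_activation)  # [2.5, 5.0], no single matrix reproduces this for all x
```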

It seems hard to imagine there isn't some nonlinear optical property they could take advantage of.
The problem is intensity/power. As discussed previously, photon-photon interactions are weak, so you need very high intensities to get a reasonable nonlinear response. The issue is that optical matrix operations work by spreading the light out over many parallel paths, i.e. reducing the intensity in each path. There might be some clever ways to overcome this, but so far everyone has avoided the problem. They said they did "optical deep learning"; what they really did was an optical matrix multiplication, but saying that would not have resulted in a Nature publication.
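
A rough way to see the scaling is a toy model, not a physical simulation. It assumes the input light is split evenly over `n_paths` and that the nonlinear response is simply proportional to the per-path intensity (a Kerr-like assumption). The function names and the `coefficient` parameter are hypothetical and chosen only for illustration.

```python
# Toy model (assumption): light is split evenly over n_paths, and the
# nonlinear response in each path is proportional to that path's intensity.

def per_path_intensity(total_intensity, n_paths):
    """Intensity in each path after an even split."""
    return total_intensity / n_paths

def kerr_like_response(intensity, coefficient=1.0):
    """Hypothetical nonlinear response, linear in intensity."""
    return coefficient * intensity

total = 1.0  # arbitrary units
responses = {}
for n_paths in (1, 16, 256, 4096):
    responses[n_paths] = kerr_like_response(per_path_intensity(total, n_paths))
    print(n_paths, responses[n_paths])
```

Under these assumptions the per-path response falls as 1/N, so keeping it constant would require scaling the input power with the number of paths.
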
There is, and people have trained purely optical neural networks:

https://arxiv.org/abs/2208.01623

The real issue is trying to backpropagate through those nonlinear optics. You need a second nonlinear optical component that matches the derivative of the first nonlinear optical component. In the paper above, they approximate the derivative by slightly changing the parameters, but that means the training time scales linearly with the number of parameters in each layer.
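
To illustrate the linear scaling, here is a generic coordinate-wise finite-difference sketch. It is not the scheme from the linked paper; `quadratic_loss`, `eps`, and the parameter values are illustrative assumptions. Each parameter needs its own perturbed loss evaluation, so one gradient estimate costs N + 1 forward passes.

```python
# Sketch: estimating a gradient by perturbing one parameter at a time.
# The cost is one baseline evaluation plus one evaluation per parameter.

def finite_difference_gradient(loss, theta, eps=1e-6):
    """Forward-difference gradient estimate; also returns the evaluation count."""
    evaluations = 1
    base = loss(theta)
    grad = []
    for i in range(len(theta)):
        shifted = list(theta)
        shifted[i] += eps               # nudge only parameter i
        grad.append((loss(shifted) - base) / eps)
        evaluations += 1
    return grad, evaluations

def quadratic_loss(theta):
    """Stand-in loss with known gradient 2 * theta."""
    return sum(t * t for t in theta)

theta = [1.0, -2.0, 0.5]
grad, evaluations = finite_difference_gradient(quadratic_loss, theta)
print(grad)         # approximately [2.0, -4.0, 1.0]
print(evaluations)  # len(theta) + 1
```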

Note: the authors claim it takes O(sqrt N) time, but they're forgetting that the learning rate mu = o(1/sqrt N) if you want to converge to a minimum:

    Loss(theta + dtheta) = Loss(theta) + dtheta * dLoss(theta) + O(dtheta^2)
                         = Loss(theta) + mu * sqrtN * C (assuming Lipschitz continuous)
    ==>     min(Loss)    = mu * sqrtN * C/2