Hacker News new | ask | show | jobs
by nerdponx 1528 days ago
It's weirder than that. You typically are differentiating a loss function of strings and various opaque weights. You are optimizing the loss function over the weight space, so in some informal contrived sense you are actually differentiating with respect to the string.