| HN Mirror



	by p1esk 2779 days ago
	> like needing to explicitly write a wrapper for the backwards calculation for custom layers, which you don’t need to do in Keras for example

	Not sure I understand. You will need to write a backward pass regardless of whether you use Keras, PyTorch, or anything else. With Keras, you would need to modify the underlying backend code (e.g. with tf.RegisterGradient or tf.custom_gradient). With PyTorch you write the backward() function, which is about the same amount of effort.
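
For reference, a minimal sketch of what a hand-written backward() can look like in PyTorch, assuming a recent PyTorch release. The op name `MyExp` is hypothetical and chosen only for illustration; `torch.exp` already has a built-in gradient, so this is needed only when an op has no usable autodiff rule.

```python
import torch


class MyExp(torch.autograd.Function):
    """Hypothetical custom op: exp(x) with an explicitly written backward pass."""

    @staticmethod
    def forward(ctx, x):
        y = torch.exp(x)
        # Save the forward result; the derivative of exp(x) is exp(x) itself.
        ctx.save_for_backward(y)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        # Chain rule: incoming gradient times the local derivative.
        return grad_output * y


x = torch.tensor([0.0, 1.0], requires_grad=True)
MyExp.apply(x).sum().backward()
# x.grad now holds exp(x), computed by the backward() defined above.
```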


mlthoughts2018 2779 days ago

You missed the point entirely. When you compose operations in Keras, it automatically generates the backpropagation implementation. You do not need RegisterGradient, custom_gradient, or anything else if you are building new operations or layers as compositions of existing operations (whether that is logical indexing, concatenation, math functions, or whatever).

In PyTorch, you still do have to define the backward function and worry about bookkeeping the gradient, clearing gradient values at the appropriate time, and explicitly calling to calculate these things in verbose optimizer invocation code.

I encourage you to check out how this works in Keras, because it is factually different from what you are saying, in ways that are specifically designed to remove certain kinds of boilerplate, overhead, and bookkeeping that PyTorch requires.


p1esk 2779 days ago

No, you're wrong about PyTorch. If your custom op is a combination of existing ops, you don't need to define a custom backward pass. This is true for any DL framework with autodiff. For more details, look at this answer [1].
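
To make the point concrete, here is a minimal sketch, assuming a recent PyTorch release. The layer name `GatedConcat` is hypothetical; it is built only from existing ops (a boolean mask, `torch.where`, `torch.tanh`, `torch.cat`) and defines no backward() at all.

```python
import torch
import torch.nn as nn


class GatedConcat(nn.Module):
    """Hypothetical layer composed only of existing differentiable ops."""

    def forward(self, a, b):
        mask = a > 0  # boolean mask, similar in spirit to logical indexing
        gated = torch.where(mask, a, torch.zeros_like(a))
        return torch.cat([gated, torch.tanh(b)], dim=-1)


a = torch.tensor([-1.0, 2.0], requires_grad=True)
b = torch.tensor([0.0, 0.5], requires_grad=True)

out = GatedConcat()(a, b)
out.sum().backward()  # autograd derives the backward pass from the composition

# a.grad is 0 where the mask is False and 1 where it is True;
# b.grad is 1 - tanh(b)^2.
```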

Regarding more verbose Pytorch code for the update step, compare:

In TensorFlow:

    loss = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output_logits)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
    sess.run(optimizer)

In PyTorch:

    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    loss = nn.CrossEntropyLoss()(output, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In my opinion, PyTorch makes the parameter update process a lot easier to understand, control, and modify (if needed). For example, what if you want to modify gradients right before the weight update? In PyTorch I'd do it right here in my code after the loss.backward() statement, while in TF I'd have to modify the optimizer code. Which option would you prefer?
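
A minimal sketch of that gradient-modification step, assuming a recent PyTorch release. The tiny linear model, the random data, and the clamp range of ±0.01 are all made up for illustration; any in-place change to `p.grad` could go in the same spot.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: 4 input features, 3 classes, batch of 8.
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

loss = nn.CrossEntropyLoss()(model(inputs), labels)
optimizer.zero_grad()
loss.backward()

# Modify the gradients between backward() and step():
# here, clamp every gradient entry to [-0.01, 0.01] in place.
for p in model.parameters():
    p.grad.clamp_(-0.01, 0.01)

optimizer.step()  # the update uses the clamped gradients
```

For the common case of clipping, PyTorch also ships `torch.nn.utils.clip_grad_norm_` and `torch.nn.utils.clip_grad_value_`, which are called at the same point in the loop.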

[1] https://stackoverflow.com/questions/44428784/when-is-a-pytor...


marmaduke 2779 days ago

> PyTorch, you still do have to define the backward function and worry about bookkeeping the gradient, clearing gradient values at the appropriate time, and explicitly calling to calculate these things in verbose optimizer invocation code

I’ve definitely never had to do that. Where do you get this from?
