Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
import torch
import torch.nn.functional as F

optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
# Clip the gradient norm in place before the optimizer step
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()
- Use `torch.nn.utils.clip_grad_norm_` (which modifies the gradients in place, as the trailing underscore in PyTorch signifies) instead of `torch.nn.utils.clip_grad_norm`, which has been deprecated. See the sketch below.
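As a minimal runnable sketch (the two-layer model, the random data, and `max_norm=1.0` below are placeholder assumptions, not part of the original note), one training step with gradient-norm clipping looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model and data, only to make the example self-contained.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3), nn.LogSoftmax(dim=1)
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(16, 10)           # 16 samples, 10 features
target = torch.randint(0, 3, (16,))  # class indices for nll_loss

optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()

# Rescale gradients in place so that their total norm is at most max_norm;
# the call returns the total norm computed before clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
print(f"loss={loss.item():.4f}, grad norm before clipping={float(total_norm):.4f}")
```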
References
https://discuss.pytorch.org/t/proper-way-to-do-gradient-clipping/191