I am having a weird issue with PyTorch's autograd functionality when implementing a custom loss calculation for a second-order differential equation. In the code below, the neural network's predictions are checked against a second-order differential equation. This works fine. However, when I want to calculate the gradient of the loss with respect to the predictions, I get an error indicating that there is no connection between loss and u in the computational graph.
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
This doesn't make sense, because the loss is calculated directly from the derivatives that originate from u. Differentiating the loss with respect to u_xx and u_t works; differentiating with respect to u_x does NOT. We verified that .requires_grad is set to True for all variables (X, u, u_d, u_x, u_t, u_xx).
Why does this happen, and how can it be fixed?
Main code:
# Ensure X requires gradients
X.requires_grad_(True)

# Get model predictions
u = self.pinn(X)

# Compute first-order gradients (∂u/∂x and ∂u/∂t)
u_d = torch.autograd.grad(
    u,
    X,
    grad_outputs=torch.ones_like(u),
    retain_graph=True,
    create_graph=True,  # Allow higher-order differentiation
)[0]

# Extract derivatives
u_x, u_t = u_d[:, 0], u_d[:, 1]  # ∂u/∂x and ∂u/∂t

# Compute second-order derivative ∂²u/∂x²
u_xx = torch.autograd.grad(
    u_x,
    X,
    grad_outputs=torch.ones_like(u_x),
    retain_graph=True,
    create_graph=True,
)[0][:, 0]

# Diffusion equation (∂u/∂t = κ * ∂²u/∂x²)
loss = nn.functional.mse_loss(u_t, self.kappa * u_xx)

## THIS FAILS
# Compute ∂loss/∂u
loss_u = torch.autograd.grad(
    loss,
    u,
    grad_outputs=torch.ones_like(loss),
    retain_graph=True,
    create_graph=True,
)[0]

# Return error on diffusion equation
return loss
Model:
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
Sequential [1, 1] --
├─Linear: 1-1 [1, 50] 150
├─Tanh: 1-2 [1, 50] --
├─Linear: 1-3 [1, 50] 2,550
├─Tanh: 1-4 [1, 50] --
├─Linear: 1-5 [1, 50] 2,550
├─Tanh: 1-6 [1, 50] --
├─Linear: 1-7 [1, 50] 2,550
├─Tanh: 1-8 [1, 50] --
├─Linear: 1-9 [1, 1] 51
==========================================================================================
Total params: 7,851
Trainable params: 7,851
Non-trainable params: 0
Total mult-adds (M): 0.01
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.03
Estimated Total Size (MB): 0.03
==========================================================================================
What we have already tried:
- Reverting to older PyTorch versions (tested on 2.5.0 and 1.13.1). Same issue.
- Putting .requires_grad_(True) after every variable assignment. This did not help.
- Replacing the tensor slicing with multiplication by zero/one vectors, without results. We thought the slicing might disturb the computational graph and break the connection to u.
# Extract derivatives
u_x, u_t = u_d[:, 0], u_d[:, 1]  # ∂u/∂x and ∂u/∂t

# Extract derivatives alternative
u_x = torch.sum(
    torch.reshape(torch.tensor([1, 0], device=u_d.device), [1, -1]) * u_d,
    dim=1,
    keepdim=True,
)
u_t = ...
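For what it's worth, the two extraction variants are numerically equivalent. A minimal standalone check (with toy values standing in for the real u_d) confirms neither changes the math itself:

```python
# Standalone check: slicing and mask-multiplication extract the same column.
import torch

u_d = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # toy stand-in

# Variant 1: slicing
u_x_slice = u_d[:, 0]

# Variant 2: multiplying with a one/zero mask and summing
mask = torch.reshape(torch.tensor([1.0, 0.0]), [1, -1])
u_x_mask = torch.sum(mask * u_d, dim=1)

print(torch.equal(u_x_slice, u_x_mask))  # True
```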
Thanks for your help!
Asked Feb 12 at 13:05 by Wout Rombouts; edited Feb 12 at 14:06 by simon.

1 Answer
So I think the issue is how you extract the derivatives. Whether you use the slicing version or the alternative you mentioned, the same problem persists: the resulting u_x is not used directly in the loss function beyond producing u_xx, so torch cannot find a path from u to u_xx through u_x.
This leaves you with two options:
- If it is acceptable that part of u's gradient is unused (it will come back as None), you can set the allow_unused flag to True in the calculation of the gradient with respect to u:
loss_u = torch.autograd.grad(
    loss,
    u,
    grad_outputs=torch.ones_like(loss),
    allow_unused=True,
    retain_graph=True,
    create_graph=True,
)[0]
- If you want the gradient to fully propagate through u, including u_x, you can add an extra loss term that enforces the dependency:
u_x_loss = alpha * mse_loss(u_x, u_x.detach())
loss = mse_loss(u_t, self.kappa * u_xx) + u_x_loss
Note: this extra u_x_loss term does not need to be mse_loss specifically; any loss function that enforces the dependency works (e.g. MAE loss, a gradient penalty, etc.). You can set alpha very small if you don't want it to have a large effect, but be careful: if it is too small you can get numerically weird values.
TLDR
It doesn't matter how you extract the derivatives, with slicing or the scalar-multiplication alternative: you are breaking the graph in either case. Either allow unused values in the gradient calculation, or add a u_x loss term to keep the graph intact.