
I am having a weird issue with PyTorch's autograd functionality when implementing a custom loss for a second-order differential equation. In the code below, the neural network's predictions are checked against a second-order differential equation. This works fine. However, when I try to compute the gradient of the loss with respect to the predictions, I get an error indicating that there is no connection between loss and u in the computational graph:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

This doesn't make sense, because the loss is computed directly from the derivatives that originate from u. Differentiating the loss with respect to u_xx and u_t works; differentiating with respect to u_x does NOT. We verified that .requires_grad is True for all variables (X, u, u_d, u_x, u_t, u_xx).

Why does this happen, and how to fix this?

Main code:

# Ensure X requires gradients
X.requires_grad_(True)

# Get model predictions
u = self.pinn(X)

# Compute first-order gradients (∂u/∂x and ∂u/∂t)
u_d = torch.autograd.grad(
    u,
    X,
    grad_outputs=torch.ones_like(u),
    retain_graph=True,
    create_graph=True,  # Allow higher-order differentiation
)[0]

# Extract derivatives
u_x, u_t = u_d[:, 0], u_d[:, 1]  # ∂u/∂x and ∂u/∂t

# Compute second-order derivative ∂²u/∂x²
u_xx = torch.autograd.grad(
    u_x,
    X,
    grad_outputs=torch.ones_like(u_x),
    retain_graph=True,
    create_graph=True,
)[0][:, 0]

# Diffusion equation (∂u/∂t = κ * ∂²u/∂x²)
loss = nn.functional.mse_loss(u_t, self.kappa * u_xx)

## THIS FAILS
# Compute ∂loss/∂u
loss_u = torch.autograd.grad(
    loss,
    u,
    grad_outputs=torch.ones_like(loss),
    retain_graph=True,
    create_graph=True,
)[0]

# Return error on diffusion equation
return loss
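For reference, the snippet above can be turned into a minimal self-contained reproduction. The small stand-in network and the kappa value below are placeholders for self.pinn and self.kappa:

```python
import torch
import torch.nn as nn

# Stand-ins for self.pinn and self.kappa (hypothetical values)
pinn = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 1))
kappa = 0.1

X = torch.rand(8, 2, requires_grad=True)  # columns: [x, t]
u = pinn(X)

# First-order derivatives (du/dx and du/dt)
u_d = torch.autograd.grad(u, X, grad_outputs=torch.ones_like(u),
                          create_graph=True)[0]
u_x, u_t = u_d[:, 0], u_d[:, 1]

# Second-order derivative d2u/dx2
u_xx = torch.autograd.grad(u_x, X, grad_outputs=torch.ones_like(u_x),
                           create_graph=True)[0][:, 0]

loss = nn.functional.mse_loss(u_t, kappa * u_xx)

# This raises the RuntimeError described above
try:
    torch.autograd.grad(loss, u, retain_graph=True)
except RuntimeError as e:
    print(e)
```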

Model:

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               [1, 1]                    --
├─Linear: 1-1                            [1, 50]                   150
├─Tanh: 1-2                              [1, 50]                   --
├─Linear: 1-3                            [1, 50]                   2,550
├─Tanh: 1-4                              [1, 50]                   --
├─Linear: 1-5                            [1, 50]                   2,550
├─Tanh: 1-6                              [1, 50]                   --
├─Linear: 1-7                            [1, 50]                   2,550
├─Tanh: 1-8                              [1, 50]                   --
├─Linear: 1-9                            [1, 1]                    51
==========================================================================================
Total params: 7,851
Trainable params: 7,851
Non-trainable params: 0
Total mult-adds (M): 0.01
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.03
Estimated Total Size (MB): 0.03
==========================================================================================

What we have already tried:

Reverted to an older PyTorch version (tested on 2.5.0 and 1.13.1). Same issue.

Putting .requires_grad_(True) after every variable assignment. This did not help.

We also tried replacing the tensor slicing with multiplication by zero/one vectors, without success. We thought the slicing might disturb the computational graph and break the connection to u.

# Extract derivatives
u_x, u_t = u_d[:, 0], u_d[:, 1]  # ∂u/∂x and ∂u/∂t

# Extract derivatives alternative
u_x = torch.sum(
    torch.reshape(torch.tensor([1, 0], device=u_d.device), [1, -1]) * u_d,
    dim=1,
    keepdim=True,
)
u_t = ...

Thanks for your help!


asked Feb 12 at 13:05 by Wout Rombouts; edited Feb 12 at 14:06 by simon

1 Answer


I think the issue is how you extract the derivatives. Whether you use the slicing version or the alternative you mentioned, the same problem persists: the resulting u_x, and the u_xx derived from it, are not used directly in the loss function beyond u_xx, so torch cannot find a path from u to u_xx through u_x.
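One way to see the mechanism in isolation: the tensor returned by torch.autograd.grad is connected back to the inputs you differentiated with respect to, but the original output does not have to appear anywhere in that new graph. A toy example:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
dy_dx = torch.autograd.grad(y, x, create_graph=True)[0]  # dy_dx = 2*x

# Differentiating dy_dx w.r.t. x works: the second derivative of x**2 is 2
d2y_dx2 = torch.autograd.grad(dy_dx, x, retain_graph=True)[0]
print(d2y_dx2)  # tensor([2.])

# Differentiating dy_dx w.r.t. y fails: y is not a node in dy_dx's graph
# (dy_dx is built from x alone), so this raises the same
# "appears to not have been used in the graph" RuntimeError
try:
    torch.autograd.grad(dy_dx, y)
except RuntimeError as e:
    print(e)
```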

This leaves you with two options:

  1. If it is acceptable that part of u's gradient is unused (i.e. zero), you can set the allow_unused flag to True when computing the gradient of the loss with respect to u:

loss_u = torch.autograd.grad(
    loss,
    u,
    grad_outputs=torch.ones_like(loss),
    allow_unused=True,
    retain_graph=True,
    create_graph=True,
)[0]
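Note that with allow_unused=True, the gradient returned for an unused input is None rather than a zero tensor, so it may need to be materialized before downstream use. A minimal sketch:

```python
import torch

x = torch.tensor([3.0], requires_grad=True)
z = torch.tensor([1.0], requires_grad=True)
y = x * 2  # depends on x only; z is unused in y's graph

# Without allow_unused=True this would raise; with it, the unused slot is None
gx, gz = torch.autograd.grad(y, (x, z), allow_unused=True)
print(gx, gz)  # tensor([2.]) None

# Materialize the missing gradient as zeros if later code expects a tensor
gz = gz if gz is not None else torch.zeros_like(z)
```

Recent PyTorch releases also accept a materialize_grads=True argument to torch.autograd.grad that returns zeros instead of None automatically; check whether it is available in your version.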

  2. If you want the gradient to fully propagate through u, including u_x, you can add an extra loss term that enforces the dependency:

# Auxiliary term: its value is identically zero, but it keeps u_x in the loss graph
u_x_loss = alpha * mse_loss(u_x, u_x.detach())
loss = mse_loss(u_t, self.kappa * u_xx) + u_x_loss

Note: the extra u_x_loss term does not need to be an MSE loss specifically; any loss function that enforces the dependency works (e.g. MAE loss, a gradient penalty, etc.).

You can set alpha very small if you don't want the term to have a large effect, but be careful: if it is too small you can run into numerical issues.

TLDR

It doesn't matter whether you extract the derivatives with slicing or with the scalar-multiplication alternative; the graph is broken either way. Either allow unused tensors in the gradient calculation, or add a u_x loss term to keep the graph intact.
