I have a feed-forward neural network that takes 1404 input values per sample (468 3D facial landmarks flattened as [x1, y1, z1, x2, y2, z2, ...]) and is meant to regress 3 output values. I feed many samples to this network, and the samples belong to different subjects. For each subject, I have input data captured from the angles np.arange(-40, 41, 10). If I plot the ground-truth data across these angles, it traces a cosine curve. For each sample of each subject, I have 3 sets of ground-truth values obtained from SVD, so each set is a cosine curve with different parameters. I used the following architecture to train an FFN to regress the three values. However, after training the model on data from 1300 subjects, it doesn't fit very well.
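For concreteness, the data layout looks like this (a minimal sketch with synthetic data; the amplitude/phase parameters are placeholders, not my real SVD-derived targets):

```python
import numpy as np

angles = np.arange(-40, 41, 10)    # 9 viewing angles per subject
n_subjects = 1300
n_landmarks = 468

# One sample: 468 landmarks x 3 coords, flattened to 1404 values
X = np.random.randn(n_subjects * len(angles), n_landmarks * 3).astype(np.float32)

# Targets: 3 values per sample; across the 9 angles each target traces a
# cosine curve with subject-specific parameters (placeholder amp/phase)
amp = np.random.uniform(0.5, 1.5, (n_subjects, 3))
phase = np.random.uniform(-10.0, 10.0, (n_subjects, 1))
y = np.stack([amp * np.cos(np.deg2rad(a - phase)) for a in angles], axis=1)
y = y.reshape(-1, 3).astype(np.float32)

print(X.shape, y.shape)   # (11700, 1404) (11700, 3)
```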

self.input_size = 1404
self.total_output_size = 3

self.encoder = nn.Sequential(
    nn.Linear(self.input_size, 2048),   # was nn.Linear(input_size, ...): missing self.
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(2048, 4096),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(4096, 4096),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(4096, 4096),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(4096, 4096),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(4096, 2048),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(2048, 1024),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(1024, 512),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(512, 256),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(256, 128),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(128, 64),
    nn.Tanh(),
    nn.Dropout(0.1),

    nn.Linear(64, self.total_output_size)
)

For the above network, I used an MSE loss, an SGD optimizer with lr = 1e-4, batch size = 256, weight_decay = 1e-5, scheduler = StepLR(optimizer, step_size=10, gamma=0.9), and gradient clipping. The validation loss starts at 0.084 and ends at 0.005 after 100 epochs. I tweaked all the aforementioned hyperparameters, but saw negligible or no improvement.
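The training setup described above, sketched end to end (a minimal reproduction on one synthetic batch; the model here is a small stand-in so the snippet runs standalone, and the clipping threshold of 1.0 is an assumption):

```python
import torch
import torch.nn as nn

# Stand-in model; substitute the full encoder from the question
model = nn.Sequential(nn.Linear(1404, 256), nn.Tanh(), nn.Linear(256, 3))

criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.9)

X = torch.randn(256, 1404)   # one batch of synthetic samples
y = torch.randn(256, 3)

for epoch in range(3):       # 100 epochs in the real run
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    # gradient clipping (assumed max_norm=1.0)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()

print(float(loss))
```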

I started with simpler architectures:

1404 -> 512 -> ReLU -> 256 -> ReLU -> 128 -> ReLU -> 3
1404 -> 512 -> ReLU -> 256 -> ReLU -> 128 -> ReLU -> 64 -> ReLU -> 3

adding more 128 or 64 blocks, but the model doesn't learn anything. Using a different optimizer instead of SGD didn't help either. Based on my experiments, the diamond-shaped network worked best for this task.
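The first of those baselines, written out as I tried it (a sketch; no dropout was used in these simple variants):

```python
import torch
import torch.nn as nn

# Simple baseline MLP: 1404 -> 512 -> 256 -> 128 -> 3
baseline = nn.Sequential(
    nn.Linear(1404, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 3),
)

out = baseline(torch.randn(4, 1404))   # batch of 4 dummy samples
print(out.shape)                       # torch.Size([4, 3])
```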

Does anyone have any insight or advice to help me with this problem? Thank you in advance!

Tags: neural-network · "FFN performs very poor for regressing a cosine shape function" · Stack Overflow