Rprop
An optimization technique that uses the sign of the gradient
Most gradient-based solvers use both the magnitude and the sign of the gradient of the objective function. This heuristic often works well, but there is no guarantee that it is always the right choice.
To see how it can fail badly, consider the following example:
Red: 0.01x^2, Blue: 100x^2
These functions have the same minimum, but their gradients are drastically different: the gradient of the blue curve explodes, while the gradient of the red curve vanishes.
To handle both of these scenarios, we can use only the sign of the gradient and disregard its magnitude.
Rprop uses the sign of the gradient to update a separate step size for each of the weights.
If the current and previous gradients for a weight have the same sign, we can be reasonably confident that we are moving in the right direction, so the step size for that weight is increased multiplicatively by a factor greater than 1 (for example, 1.2).
If the gradients have opposite signs, the step size is decreased multiplicatively (e.g. by a factor of 0.5), as sketched in the example below.
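As a rough illustration, here is a minimal sketch of the per-weight update rule described above. The function and variable names (rprop_step, eta_plus, eta_minus, etc.) are our own, and the sign-flip handling follows the common variant that skips the weight update after a sign change; this is a sketch, not PyTorch's exact implementation.

# Minimal sketch of the Rprop update for a single scalar weight.
def rprop_step(w, grad, prev_grad, step_size,
               eta_minus=0.5, eta_plus=1.2,
               min_step=1e-6, max_step=50.0):
    if grad * prev_grad > 0:            # same sign: grow the step size
        step_size = min(step_size * eta_plus, max_step)
    elif grad * prev_grad < 0:          # sign flip: shrink the step size
        step_size = max(step_size * eta_minus, min_step)
        grad = 0.0                      # skip the weight update after a sign flip
    if grad > 0:                        # move against the sign of the gradient,
        w -= step_size                  # ignoring its magnitude
    elif grad < 0:
        w += step_size
    return w, step_size, grad           # grad becomes prev_grad next iteration

# Example: minimize 100 * w^2 starting from w = 3.0.
w, prev_grad, step = 3.0, 0.0, 0.01
for _ in range(50):
    grad = 200 * w                      # gradient of 100 * w^2
    w, step, prev_grad = rprop_step(w, grad, prev_grad, step)
print(w)                                # w moves toward 0 despite the huge gradients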

Major Parameters

Etas

These are the factors by which the step size is scaled (etaminus, etaplus).
It is better to keep the first eta less than 1 and the second greater than 1, so that the step size is decreased when the signs of consecutive gradients differ and increased when they are the same.

Step sizes

They define the minimum and maximum step size allowed during training.
Usually, the default values of 1e-06 and 50 work well.
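
Both sets of parameters are passed directly to torch.optim.Rprop, as in this minimal sketch (the model here is a placeholder; the full example follows in the Code Implementation section).

import torch
import torch.nn as nn

model = nn.Linear(3, 2)  # placeholder model
optimizer = torch.optim.Rprop(
    model.parameters(),
    lr=0.01,                 # initial step size
    etas=(0.5, 1.2),         # (etaminus, etaplus): decrease / increase factors
    step_sizes=(1e-06, 50),  # (min, max) allowed step size
)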

Code Implementation

# Importing the library
import torch
import torch.nn as nn

x = torch.randn(10, 3)
y = torch.randn(10, 2)

# Build a fully connected layer.
linear = nn.Linear(3, 2)

# Build the MSE loss function.
criterion = nn.MSELoss()

# Optimization method using Rprop.
optimizer = torch.optim.Rprop(linear.parameters(), lr=0.01, etas=(0.5, 1.2),
                              step_sizes=(1e-06, 50))

# Forward pass.
pred = linear(x)

# Compute loss.
loss = criterion(pred, y)
print('loss:', loss.item())

# Backward pass to compute gradients, then update the weights.
loss.backward()
optimizer.step()
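
To run more than one update, the same forward/backward/step pattern is simply repeated; the sketch below (the iteration count is arbitrary) shows one common way to do it with the objects defined above.

# Illustrative training loop using the objects defined above.
for _ in range(100):
    optimizer.zero_grad()          # clear gradients from the previous iteration
    pred = linear(x)               # forward pass
    loss = criterion(pred, y)      # compute MSE loss
    loss.backward()                # compute gradients
    optimizer.step()               # Rprop update using the sign of the gradients

print('final loss:', criterion(linear(x), y).item())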