ReduceLROnPlateau
Reduce learning rate when a metric has stopped improving.
ReduceLROnPlateau is a scheduling technique that monitors a quantity and decays the learning rate when the quantity stops improving.
Whether a change counts as an improvement depends on the direction of the change (set by the mode) and on whether the quantity has moved by at least a minimum amount. This minimum amount is the threshold.

Major Parameters

Mode

The user can choose one of two modes: min and max.
If max is chosen, the learning rate is decayed once the monitored quantity stops increasing by at least the minimum threshold.
If min is chosen, the learning rate is decayed once the monitored quantity stops decreasing by at least the minimum threshold.
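
For instance, a minimal sketch (the model and optimizer are placeholders) of matching the mode to the monitored quantity:

import torch

model = torch.nn.Linear(4, 2)                     # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# mode='min' for quantities where lower is better, e.g. validation loss.
loss_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min')

# mode='max' for quantities where higher is better, e.g. validation accuracy.
acc_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max')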

Factor

It is the factor by which the learning rate is multiplied when the quantity stops improving (new_lr = lr * factor).
The factor must be greater than 0 and less than 1: a value above 1 would make the learning rate grow instead of decay, and a factor of exactly 1 would never change it.
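
As a quick illustration of the decay arithmetic (the numbers are arbitrary):

factor = 0.1
lr = 0.01
# Each time a plateau is detected, the learning rate is multiplied by factor.
lr = lr * factor   # 0.001 after the first reduction
lr = lr * factor   # 0.0001 after the second reduction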

Patience

It is the number of epochs with no improvement after which the learning rate is reduced. With a patience of 10, the scheduler tolerates 10 consecutive epochs without improvement in the quantity and reduces the learning rate on the 11th.
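
A small sketch (placeholder model, deliberately constant metric) showing when the reduction happens with a patience of 2:

import torch

model = torch.nn.Linear(2, 2)                     # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=2)

for epoch in range(5):
    scheduler.step(1.0)                           # metric that never improves
    print(epoch, optimizer.param_groups[0]['lr'])
# The first call records 1.0 as the best value; two non-improving epochs are
# tolerated, and the learning rate drops from 0.1 to 0.05 on the third.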

Threshold

It is the minimum amount by which the monitored quantity must change in order to count as an "improvement". For example, with an absolute threshold of 0.001 in min mode, a change from 0.003 to 0.0025 is not counted as an improvement, because the decrease of 0.0005 is smaller than the threshold.
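
Re-doing that example in code (using the absolute interpretation of the threshold; the relative interpretation is covered under Threshold Mode below):

threshold = 0.001
best, current = 0.003, 0.0025
# In min mode the new value must fall below best - threshold to count.
print(current < best - threshold)   # False: the drop of 0.0005 is too small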

Threshold Mode

The user can choose rel or abs for the threshold mode.
It defines how the dynamic threshold, the target a new value must beat before it counts as an improvement, is calculated.
In rel mode:
dynamic threshold = best * (1 + threshold) in 'max' mode, or best * (1 - threshold) in 'min' mode.
In abs mode:
dynamic threshold = best + threshold in 'max' mode, or best - threshold in 'min' mode.
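
Using the same numbers as the Threshold example above, a small helper (an illustration of the formulas, not the library's internal code) makes the difference between the two threshold modes concrete:

def dynamic_threshold(best, threshold, mode, threshold_mode):
    # Target a new value must beat before it counts as an improvement.
    if threshold_mode == 'rel':
        return best * (1 + threshold) if mode == 'max' else best * (1 - threshold)
    return best + threshold if mode == 'max' else best - threshold

print(dynamic_threshold(0.003, 0.001, 'min', 'rel'))   # 0.002997
print(dynamic_threshold(0.003, 0.001, 'min', 'abs'))   # 0.002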

Cooldown

It is the number of epochs the scheduler waits after a reduction of the learning rate before resuming normal operation, i.e. before non-improving epochs start counting against patience again.
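
For example (placeholder model and optimizer), with cooldown=3 the scheduler ignores non-improving epochs for three epochs after each reduction:

import torch

model = torch.nn.Linear(2, 2)                     # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=2, cooldown=3)
# After a reduction, the next 3 epochs do not count against patience, so even
# with no improvement at all, successive reductions are at least
# cooldown + patience + 1 epochs apart.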

Min LR

It is a lower bound on the learning rate, given either as a single scalar applied to all parameter groups or as a list with one value per group. Once the learning rate reaches this bound, it is not reduced any further.
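
A sketch with two parameter groups (backbone and head are hypothetical sub-modules) showing the per-group form of min_lr:

import torch

backbone = torch.nn.Linear(8, 4)                  # hypothetical sub-modules
head = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD([
    {'params': backbone.parameters(), 'lr': 1e-3},
    {'params': head.parameters(), 'lr': 1e-2},
], lr=1e-3)

# Either a single scalar for every group, or one lower bound per group.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', min_lr=[1e-6, 1e-5])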

Eps

It is the minimum applied decay of the learning rate. If the difference between the previous learning rate and the new learning rate is smaller than eps, the update is ignored and the previous learning rate is kept.
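
A rough illustration of that check in plain Python (mirroring the rule, not the library's internal code):

factor, eps = 0.5, 1e-08
old_lr = 1e-8
new_lr = old_lr * factor
# The new learning rate is applied only if the decay exceeds eps.
lr = new_lr if old_lr - new_lr > eps else old_lr   # decay of 5e-9 is skipped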

Code Implementation

The snippet below is a minimal runnable version of the usage pattern; the model, loss function, and dataset are toy placeholders, and the monitored loss is passed to scheduler.step().

import torch

# Toy model, loss, and data so the example runs end to end.
model = torch.nn.Linear(2, 2)
loss_fn = torch.nn.MSELoss()
dataset = [(torch.randn(8, 2), torch.randn(8, 2)) for _ in range(5)]
learning_rate = 1e-3

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate,
                              weight_decay=0.01, amsgrad=False)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001,
    threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    # ReduceLROnPlateau needs the monitored metric; here the last batch loss
    # stands in for a proper validation loss.
    scheduler.step(loss)