CosineAnnealingLR
Sets the learning rate according to the cosine annealing schedule.
CosineAnnealingLR is a scheduling technique that starts from a large initial learning rate and aggressively decreases it to a value near 0, before increasing it again. This variation of the learning rate follows a cosine curve. Mathematically, the learning rate is given as
\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)
where \eta_t is the current learning rate, \eta_{max} and \eta_{min} are the maximum and minimum learning rates respectively, and T_{cur} is the current number of accumulated epochs. From the above equation, we can see that once T_{cur} = T_{max}, the learning rate becomes \eta_{min}.
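
As a quick sanity check, the formula can be evaluated directly. The values below (eta_max = 0.1, eta_min = 0, T_max = 100) are illustrative choices, not part of the schedule above; this is a minimal sketch of the closed-form equation, not PyTorch's implementation.

import math

eta_max, eta_min, T_max = 0.1, 0.0, 100  # illustrative values

for T_cur in (0, 25, 50, 75, 100):
    eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_max))
    print(f"T_cur={T_cur:3d}  eta_t={eta_t:.4f}")

# Prints 0.1000, 0.0854, 0.0500, 0.0146, 0.0000: the rate starts at eta_max
# and reaches eta_min exactly when T_cur equals T_max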

Major Parameters

T Max

T Max is the number of iterations over which the learning rate is annealed from eta_max down to eta_min, i.e., the T_{max} term in the aforementioned equation.
Figure: learning rate curves for different values of T_max. Source: https://arxiv.org/abs/1608.03983v5
Note that the T_0 in the given figure corresponds to T_{max}.
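
To make the effect of T_max concrete, the following sketch runs PyTorch's scheduler over two horizons (50 and 200 steps, arbitrary illustrative values) and reports the decay. The single dummy parameter and SGD optimizer are stand-ins used only to drive the scheduler.

import torch

for t_max in (50, 200):  # two illustrative horizons
    opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=t_max)
    lrs = [sched.get_last_lr()[0]]
    for _ in range(t_max):
        opt.step()
        sched.step()
        lrs.append(sched.get_last_lr()[0])
    print(f"T_max={t_max}: lr decays from {lrs[0]:.3f} to {lrs[-1]:.6f} over {t_max} steps")

With a larger T_max, the same cosine shape is stretched over more steps, so the learning rate decreases more gradually.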

Eta Min

Eta Min is the minimum learning rate reachable under the given cosine annealing schedule. In the figure above, the learning rate decays all the way to 0 before increasing again, so eta_min there is 0.
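
For example, passing a nonzero eta_min (0.01 below, an arbitrary illustrative floor) makes the schedule bottom out at that value instead of 0:

import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100, eta_min=0.01)
for _ in range(100):
    opt.step()
    sched.step()
print(sched.get_last_lr())  # [0.01]: the rate never drops below eta_min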

Code Implementation

import torch
import torch.nn as nn

model = nn.Linear(2, 2)  # simple stand-in model
learning_rate = 0.01     # initial learning rate; this acts as eta_max for the scheduler
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=0, last_epoch=-1)
loss_fn = nn.MSELoss()
dataset = [(torch.randn(4, 2), torch.randn(4, 2)) for _ in range(5)]  # dummy (input, target) batches

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()            # clear accumulated gradients
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()                     # advance the cosine schedule once per epoch
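
After each call to scheduler.step(), the current learning rate can be inspected with scheduler.get_last_lr(), which returns one value per parameter group.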