CyclicLR
Scheduling technique that cycles the learning rate between two boundaries.
The learning rate is among the most important hyperparameters in training deep neural networks. CyclicLR largely removes the need to search for a single best value: instead of staying fixed, the learning rate cycles between two set boundaries with a certain frequency.

Major Parameters

Base LR

It is the initial learning rate and the lower boundary of the cycle. The learning rate will not drop below the Base LR.

Max LR

As the name suggests, it is the upper boundary of the cycle: the learning rate never rises above Max LR. (Strictly, Max LR defines the cycle amplitude, so under amplitude scaling the peaks of later cycles may fall below it.)
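These two boundaries map directly onto the base_lr and max_lr arguments of PyTorch's CyclicLR. A minimal construction sketch (the model and boundary values are illustrative, not recommendations):

import torch

model = torch.nn.Linear(10, 2)   # toy model, for illustration only
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-2)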

Step Size Up

Step size up is the number of training iterations in the increasing half of a cycle, i.e., the number of steps taken to climb from Base LR to Max LR.

Step Size Down

Step size down is the number of training iterations in the decreasing half of a cycle, i.e., the number of steps taken to fall from Max LR back to Base LR.
If Step Size Down is set to None, its value defaults to that of Step Size Up.
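In practice the step sizes are often derived from the length of the training loader; the paper suggests a step size of 2-10 times the number of iterations in an epoch. A sketch with an illustrative loader, reusing the optimizer constructed above:

from torch.utils.data import DataLoader, TensorDataset

train_loader = DataLoader(TensorDataset(torch.randn(640, 10), torch.randn(640, 2)),
                          batch_size=32)        # 20 iterations per epoch here
steps_per_epoch = len(train_loader)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,
    step_size_up=4 * steps_per_epoch,           # climb over 4 epochs
    step_size_down=None)                        # None mirrors step_size_up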

Mode

The learning rate can be varied between the two boundaries in several different ways; the chosen technique is defined by the mode. The three built-in modes are:
  • triangular
  • triangular2
  • exp_range

Triangular

Let us explain the Triangular mode with a figure:
Figure: Triangular cyclic LR (source: https://arxiv.org/abs/1506.01186)
The figure shows the upper and lower bounds of the learning rate and the step size over which the learning rate climbs from Base LR to Max LR (here, step size up = step size down). The learning rate follows a triangular wave between the two boundaries; this is the Triangular mode.
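In code, the triangular policy from the paper can be written as a small function; a sketch (the example values in the comment follow the paper's figure):

import math

def triangular_lr(iteration, base_lr, max_lr, step_size):
    # One cycle spans 2 * step_size iterations.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # x measures how far the iteration is from the cycle's peak, in [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# With base_lr=0.001, max_lr=0.006, step_size=2000:
# iteration 0 -> 0.001, iteration 2000 -> 0.006, iteration 4000 -> 0.001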

Triangular2

Triangular2 is another basic triangular cycle, similar to Triangular, except that the amplitude is halved at the end of each cycle. (A cycle is one full pass from Base LR up to Max LR and back, i.e., step size up + step size down iterations.)
Figure: Triangular2 cyclic LR
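Equivalently, the amplitude of cycle k (1-indexed) is multiplied by 1/2^(k-1); a short sketch of that scale function:

def triangular2_scale(cycle):
    # Cycle 1 keeps the full amplitude; each later cycle halves it.
    return 1.0 / (2.0 ** (cycle - 1))

# cycle 1 -> 1.0, cycle 2 -> 0.5, cycle 3 -> 0.25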

Exp Range

Exp Range is another type of cycle that shrinks the initial amplitude exponentially, controlled by the gamma constant. The initial amplitude is scaled by
gamma^x
where x is the iteration count or the cycle count, depending on the scale mode (PyTorch's exp_range scales per iteration).
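A construction sketch for this mode (boundaries and gamma are illustrative; model and optimizer as in the earlier sketch):

scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,
    mode='exp_range', gamma=0.99994)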

Gamma

It is the constant used in exp_range mode to scale the amplitude.
Setting gamma greater than 1 can make the learning rate explode to very high values, while setting it to exactly 1 reproduces the Triangular mode. A value slightly below 1 (e.g. 0.99994) is therefore the usual choice.
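A quick check of how fast the amplitude shrinks for a given gamma when scaled per iteration (plain arithmetic, no scheduler needed):

gamma = 0.99994
for iteration in (0, 10_000, 50_000, 100_000):
    print(iteration, gamma ** iteration)
# 0       1.0
# 10000   ~0.55
# 50000   ~0.05
# 100000  ~0.0025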

Scale Mode

The scale mode defines whether the amplitude scaling implied by the chosen mode is applied on every iteration or once per cycle.
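The same machinery is exposed through the scale_fn and scale_mode arguments; when scale_fn is given, the mode argument is ignored. A sketch reusing the optimizer from above:

scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,
    scale_fn=lambda x: 0.5 ** (x - 1),   # triangular2-style halving
    scale_mode='cycle')                  # 'iterations' would call it per step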

Base Momentum

It is the lower momentum boundary when cyclic momentum is used. Momentum varies inversely with the learning rate, so momentum sits at Base Momentum while the learning rate is at Max LR.

Max Momentum

The upper momentum boundary used in the training process. Since momentum varies inversely with the learning rate, Max Momentum is applied while the learning rate is at Base LR.
The default values of Base Momentum and Max Momentum are 0.8 and 0.9 respectively.
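A sketch of cyclic momentum; it requires an optimizer that exposes a momentum parameter, such as SGD (all values illustrative):

import torch

model = torch.nn.Linear(10, 2)   # toy model, for illustration only
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,
    cycle_momentum=True, base_momentum=0.8, max_momentum=0.9)
# As the LR climbs toward max_lr, momentum falls toward 0.8;
# as the LR falls back to base_lr, momentum rises back to 0.9.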

Code Implementation

A runnable sketch of the scheduler in a training loop (the model, data, and hyperparameter values are illustrative):

import torch

# Toy model and synthetic data so the example runs end-to-end.
model = torch.nn.Linear(2, 2)
loss_fn = torch.nn.MSELoss()
dataset = [(torch.randn(2), torch.randn(2)) for _ in range(100)]

optimizer = torch.optim.AdamW(model.parameters(), lr=0.001,
                              weight_decay=0.01, amsgrad=False)
# AdamW has no classic momentum parameter, so cycle_momentum must be False.
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.001, max_lr=0.01,
                                              step_size_up=2000, step_size_down=None,
                                              mode='triangular', gamma=1.0,
                                              cycle_momentum=False)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        scheduler.step()  # CyclicLR steps once per batch, not per epoch
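To inspect the resulting schedule, the current learning rate can be read back after each step with get_last_lr() (a sketch continuing the example above; note the scheduler state carries over from the training loop):

lrs = []
for step in range(4000):
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])   # LR after this step
# lrs now traces the triangular schedule and can be plotted for inspection.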