Base Learning Rate

The size of the steps your optimizer takes on the loss landscape

The learning rate defines how large the steps of your optimizer are on your loss landscape. The base learning rate defines at which learning rate your optimizer starts before applying any methods like momentum or dampening.

Choosing the right learning rate means balancing the trade-off between reaching the minimum quickly and not making such big steps that you miss it.

A high learning rate speeds up your optimizer but you might miss the Minimum.

A valid approach to finding the right base learning rate is to start at 0.1 and then get smaller by a fraction of 10 with each experiment until the results don't improve anymore, ceteris paribus. But that's no silver bullet, there are many other approaches out there as well.

A learning rate of 0.1 means that the weights in the network are updated by 10% of the estimated weight error each iteration.

Further resources