# Annealing rate

When using gradient descent it is convenient to have the learning rate decrease exponentially, that is $\alpha_t = \alpha_0 \gamma^t$. Setting $\gamma$ is then important. Suppose I have chosen a desirable $\alpha_0$ and $\alpha_T$ after $T$ steps. How to find the corresponding $\gamma$ without a calculator?

First of all, usually $\log_{10} \alpha_0$ and $\log_{10} \alpha_T$ are easy to calculate (because $\alpha_0$ and $\alpha_T$ are frequently just powers of ten, simple, isn't it?). Then $\ln 10 \approx 2.3$ – easy to remember. Finally, $\gamma$ is usually close to one, which means that $\ln \gamma \approx \gamma - 1$. Here is an equation that binds all these guys:

$$\gamma \approx 1 - \frac{2.3\,(\log_{10} \alpha_0 - \log_{10} \alpha_T)}{T}$$
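As a quick sanity check, here is a small Python sketch comparing the exact solution of $\alpha_T = \alpha_0 \gamma^T$ with the mental-arithmetic approximation. The concrete values of $\alpha_0$, $\alpha_T$, and $T$ are hypothetical, chosen only for illustration:

```python
import math

# Hypothetical example: decay from alpha_0 = 0.1 to alpha_T = 1e-4
# over T = 10000 steps (illustrative values, both powers of ten).
alpha_0, alpha_T, T = 0.1, 1e-4, 10_000

# Exact solution of alpha_T = alpha_0 * gamma**T.
gamma_exact = (alpha_T / alpha_0) ** (1 / T)

# Mental-arithmetic rule: gamma ~ 1 - 2.3 * (log10(alpha_0) - log10(alpha_T)) / T.
gamma_approx = 1 - 2.3 * (math.log10(alpha_0) - math.log10(alpha_T)) / T

print(gamma_exact)   # very close to gamma_approx
print(gamma_approx)  # 0.99931
```

The two values agree to several decimal places, since the only approximation made is $\ln \gamma \approx \gamma - 1$, which is very accurate for $\gamma$ this close to one.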

It is even simpler to use it in the opposite direction: just remember that your learning rate becomes ten times smaller every $2.3 / (1 - \gamma)$ iterations.
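The reverse rule can be verified the same way. In this sketch the decay factor $\gamma$ is a hypothetical illustrative value:

```python
# Hypothetical decay factor, close to one as is typical.
gamma = 0.999

# Rule of thumb: the learning rate drops tenfold every 2.3 / (1 - gamma) steps.
n = 2.3 / (1 - gamma)  # about 2300 steps here

# Exact decay after n steps; should be close to 0.1.
ratio = gamma ** n
print(n, ratio)
```

The small discrepancy from exactly 0.1 comes from rounding $\ln 10$ to 2.3 and from $\ln \gamma \approx \gamma - 1$.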
