When using gradient descent it is convenient to have the learning rate decay exponentially, that is, $\epsilon_n = \theta^{n} \epsilon_0$. Choosing $\theta$ is then important. Suppose I have picked a desirable starting rate $\epsilon_0$ and the rate $\epsilon_n$ I want after $n$ steps. How do I find the corresponding $\theta$ without a calculator?
First of all, $\log_{10} \epsilon_0$ and $\log_{10} \epsilon_n$ are usually easy to compute (they are frequently just powers of ten, simple, isn't it?). Then $\ln 10 \approx 2.3$, easy to remember. Finally, $\theta$ is usually close to one, which means that $\ln \theta \approx \theta - 1 = -(1 - \theta)$. Now take logarithms of $\epsilon_n = \theta^n \epsilon_0$: $n \ln \theta = \ln \epsilon_n - \ln \epsilon_0 = 2.3\,(\log_{10} \epsilon_n - \log_{10} \epsilon_0)$. Substituting the approximation for $\ln \theta$ gives an equation that binds all these guys together:
$n \approx \frac{2.3\,(\log_{10} \epsilon_0 - \log_{10} \epsilon_n)}{1 - \theta}$
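To sanity-check the formula, here is a minimal Python sketch (the function names and the numbers are mine, purely illustrative) comparing the mental-math estimate with the exact $n$ obtained directly from $\epsilon_n = \theta^n \epsilon_0$:

```python
import math

def exact_steps(eps0, eps_n, theta):
    # Exact n from eps_n = theta**n * eps0, i.e. n = ln(eps_n / eps0) / ln(theta)
    return math.log(eps_n / eps0) / math.log(theta)

def mental_steps(eps0, eps_n, theta):
    # Back-of-the-envelope: n ~ 2.3 * (log10(eps0) - log10(eps_n)) / (1 - theta)
    return 2.3 * (math.log10(eps0) - math.log10(eps_n)) / (1 - theta)

# Decay from 1e-1 to 1e-4 with theta = 0.999:
print(exact_steps(1e-1, 1e-4, 0.999))   # ~6904.3
print(mental_steps(1e-1, 1e-4, 0.999))  # ~6900.0
```

Three significant digits of agreement, which is far more precision than a decay schedule ever needs.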
It is even simpler to use in the opposite direction: just remember that your learning rate becomes ten times smaller every $\frac{2.3}{1 - \theta}$ iterations. Since $1 - \theta$ is typically itself a power of ten, say $10^{-k}$, that is just $2.3 \cdot 10^{k}$ iterations: with $\theta = 0.999$ the rate drops tenfold every $2300$ steps.
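And a quick check of this rule of thumb against the exact decade length $\ln 10 / (-\ln \theta)$, again just an illustrative sketch:

```python
import math

# Iterations per tenfold decrease of the learning rate
for theta in (0.99, 0.999, 0.9999):
    rule = 2.3 / (1 - theta)                   # mental-math rule of thumb
    exact = math.log(10) / -math.log(theta)    # exact decade length
    print(f"theta={theta}: rule {rule:.0f}, exact {exact:.0f}")
# theta=0.99:   rule 230,   exact 229
# theta=0.999:  rule 2300,  exact 2301
# theta=0.9999: rule 23000, exact 23025
```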