April 9, 2014 / like2think

Annealing rate

When using gradient descent it is convenient to have the learning rate decrease exponentially, that is \epsilon_n = \theta^{n} \epsilon_0. Choosing \theta well is then important. Suppose I have picked a desirable \epsilon_0 and a desirable \epsilon_n after n steps. How do I find the corresponding \theta without a calculator?

First of all, \log_{10} \epsilon_0 and \log_{10} \epsilon_n are usually easy to compute (because they are frequently just powers of ten; simple, isn't it?). Then \ln 10 \approx 2.3, which is easy to remember. Finally, \theta is usually close to one, which means that \ln \theta \approx \theta - 1. Taking logarithms of \epsilon_n = \theta^{n} \epsilon_0 gives n \ln \theta = \ln \epsilon_n - \ln \epsilon_0, and plugging in both approximations yields an equation that binds all these guys together:

n = \frac{2.3 \, (\log_{10} \epsilon_0 - \log_{10} \epsilon_n)}{1 - \theta}
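A quick sanity check in Python (a minimal sketch; the variable names and concrete numbers are mine, chosen for illustration): solve \epsilon_n = \theta^{n} \epsilon_0 for \theta exactly and compare with the formula above rearranged for \theta.

    import math

    # Illustrative target: decay the rate from 0.1 to 0.001 over 4600 steps.
    eps0, epsn, n = 0.1, 0.001, 4600

    # Exact solution of epsn = theta**n * eps0.
    theta_exact = (epsn / eps0) ** (1.0 / n)

    # The mental-arithmetic formula, rearranged for theta:
    # theta ~= 1 - 2.3 * (log10(eps0) - log10(epsn)) / n
    theta_approx = 1.0 - 2.3 * (math.log10(eps0) - math.log10(epsn)) / n

    print(theta_exact)   # ~0.99900
    print(theta_approx)  # 0.999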

It is even simpler to use in the opposite direction: just remember that your learning rate becomes ten times smaller every 2.3 \cdot 10^{-\log_{10} (1 - \theta)} iterations. For example, \theta = 0.999 gives 1 - \theta = 10^{-3}, so the rate drops tenfold roughly every 2.3 \cdot 10^{3} = 2300 iterations.
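And a one-liner check of that rule of thumb (again just a sketch with the example numbers from above):

    theta = 0.999
    print(theta ** 2300)  # ~0.1001, i.e. about ten times smaller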
