Imagine a kid playing a game called “Guess The Word!”. There is a pile of cards, each with a sequence of seven consecutive words written on it: the first and the last three on one side, the middle one on the other side. He picks a card, reads the six words on the front and makes a guess about the skipped one. Then he flips the card and sees the correct answer.

A fun educational game, isn’t it? It requires, however, a certain command of the language to start. If the kid is properly motivated to play a lot, for example if he competes with his friends, it might improve his language skills.

My point is that this is exactly how current machine learning works. The process described above mirrors a recent method of word representation learning.

In machine learning we choose a game with simple rules and design a simple player that can learn to play better just by playing a lot. We leave it to play the game all night long, and in the morning we want to see it doing much better. And finally, we also expect that it will not only be good at this particular game, but also learn something useful along the way, just as a child would.
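For concreteness, here is a toy sketch (my own illustration, not any particular library’s API) of how the “cards” of this game could be generated from text: slide a seven-word window over a sentence, put the six outer words on the front and the middle word on the back.

```python
def make_cards(words, window=7):
    """Split each `window`-word span into (six context words, middle word)."""
    half = window // 2  # three context words on each side
    cards = []
    for i in range(half, len(words) - half):
        front = words[i - half:i] + words[i + 1:i + half + 1]  # card front
        back = words[i]                                        # hidden word
        cards.append((front, back))
    return cards

text = "the quick brown fox jumps over the lazy sleeping dog".split()
for front, back in make_cards(text):
    print(front, "->", back)
```

The player’s job is then exactly the kid’s: given `front`, predict `back`.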

An interesting aspect of this comparison is that it justifies the pretraining phase. An infant cannot play “Guess The Word!”. Neither can a savage from an isolated island. Or rather he can, but in a weird way: he might develop his own weird explanation of what these paintings mean and why “the” is always prepended to “best”. Make it a question of survival for this poor guy, and he would play no worse than me. But most probably he would never understand that these “words” stand for objects, colors, actions, etc.

So this is not a game you play with your 1-year-old baby. Usually you find something far simpler to “pretrain” it with. The games you usually play with it involve input from different modalities: it hears, sees, touches. Another thing is that sometimes it even plays by itself, stimulated by the hatred of boredom so well known to everybody…

The challenge for machine learning is to organize such games.

To be continued.

When using gradient descent it is convenient to have the learning rate decrease exponentially, that is $\epsilon_n = \theta^{n} \epsilon_0$. Choosing $\theta$ is then important. Suppose I have picked a desirable $\epsilon_0$ and a desirable $\epsilon_n$ after $n$ steps. How do I find the corresponding $\theta$ without a calculator?

First of all, $\log_{10} \epsilon_0$ and $\log_{10} \epsilon_n$ are usually easy to compute (they are frequently just powers of ten, simple, isn’t it?). Then $\ln 10 \approx 2.3$ – easy to remember. Finally, $\theta$ is usually close to one, which means that $\ln \theta \approx \theta - 1$. Here is an equation that binds all these guys:

$n = \frac{2.3 \left( \log_{10} \epsilon_0 - \log_{10} \epsilon_n \right)}{1 - \theta}$

It is even simpler to use it in the opposite direction: just remember that your learning rate becomes ten times smaller every $2.3 \cdot 10^{-\log_{10} (1 - \theta)}$ iterations (e.g. every 2300 iterations for $\theta = 0.999$).
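The mental-math route is easy to sanity-check in code (my own check, with example values picked for illustration): compare the exact $\theta$ against the one obtained from the approximation above.

```python
import math

# Example: decay from eps0 = 1e-2 down to epsn = 1e-5 in n = 6900 steps.
eps0, epsn, n = 1e-2, 1e-5, 6900

# Exact: theta**n = epsn / eps0  =>  theta = (epsn / eps0)**(1/n)
theta_exact = (epsn / eps0) ** (1.0 / n)

# Mental math: n ~= 2.3 * (log10(eps0) - log10(epsn)) / (1 - theta),
# so 1 - theta ~= 2.3 * (log10(eps0) - log10(epsn)) / n
theta_approx = 1 - 2.3 * (math.log10(eps0) - math.log10(epsn)) / n

print(theta_exact, theta_approx)  # both very close to 0.999
```

With $\theta = 0.999$ the “ten times smaller” rule gives $2.3/(1-\theta) = 2300$ iterations per decade, consistent with the formula.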

I really missed this one: to let sb down. Having heard it in a Bob Marley song, I did not know the meaning:

to disappoint someone by failing to do what you agreed to do or were expected to do

…brought to you by the Cambridge dictionary. Now I have a good translation for the Russian подводить, hooray!

Here come three words with close meanings (at least to my taste): smug, arrogant, conceited. Let’s look at the definitions from the Cambridge dictionary:

smug – too pleased or satisfied about something you have achieved or something you know

arrogant – unpleasantly proud and behaving as if you are more important than, or know more than, other people

conceited – too proud of yourself and your actions and abilities

I think arrogant stands out, being more a description of a person’s behaviour than of his self-esteem. I also feel that one can be smug only for a reason (some achievement), while one can be conceited just because he evaluates himself unfairly.

P.S. A discussion on this topic: http://english.stackexchange.com/questions/58710/arrogant-vs-conceited.

1. ICA – I almost understood what it is but… still a hole.
2. Kalman Filter – everybody knows what it is except me.
3. Decision Trees – however, it seems that no deep theory exists in this area at all.
4. Boosting  – the idea is potentially so powerful… yum-yum.
5. Graphical Models – finally, this term!
6. Collaborative Filtering – a remarkable application area for automatic learning techniques.

1. I want to learn to use vim as a TeX editor. So far I have used Texmaker for this purpose. Eventually I realized that I don’t use most of its features, while I very much miss vim’s advanced editing facilities (and also my carefully selected plugins). The only reason why I still use Texmaker is its ability to jump to the place in the pdf corresponding to a given position in the tex file and vice versa. But this feature is implemented via a standard interface called synctex. A number of vim plugins partially know how to work with this guy, and what I actually have to do is find the right one. Hopefully one of them will do the job.
2. I want to master the python packaging tool aka distutils. The idea is that it would be nice to reuse my own code without copying files and hacking with PYTHONPATH. Allowing other people to try it quickly is another benefit which I’m going to earn this way.
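For the record, here is roughly what the synctex glue from item 1 can look like with zathura as the pdf viewer (an assumption on my part: vim compiled with +clientserver and zathura with synctex support; other viewers expose similar hooks, and the hypothetical file is called paper.tex):

```shell
# Compile with synctex support enabled (produces paper.synctex.gz):
pdflatex -synctex=1 paper.tex

# Inverse search (pdf -> tex): have zathura call back into a vim server.
zathura -x 'vim --servername PAPER --remote-silent +%{line} %{input}' paper.pdf &

# Start vim as that server:
vim --servername PAPER paper.tex

# Forward search (tex -> pdf): jump the viewer to line 42 of paper.tex.
zathura --synctex-forward 42:1:paper.tex paper.pdf
```

A vim plugin then just wraps the forward-search command in a mapping.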
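As a starting point for item 2, a minimal distutils script is tiny (a sketch with a hypothetical package name; `mytools.py` is assumed to sit next to `setup.py`):

```python
# setup.py -- minimal distutils packaging script (hypothetical project).
from distutils.core import setup

setup(
    name='mytools',          # hypothetical package name
    version='0.1',
    description='Personal utilities I keep copying around',
    py_modules=['mytools'],  # a single mytools.py module next to setup.py
)
```

Then `python setup.py install` puts the module into site-packages (no more PYTHONPATH hacking), and `python setup.py sdist` builds a tarball that other people can try quickly.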

The word which I have some original right to hate, being colorblind: tinge – a very slight amount of colour or feeling (stolen from the Cambridge dictionary), used so frequently in books to refine descriptions of appearance.