OOM stands for Observable Operator Model, yet another model of discrete-time, discrete-valued stochastic processes. Since I’m working through a tutorial on this topic, I’m going to write a few notes highlighting the ideas that look most important to me.

First of all, OOM is about predictors. Predictors are a family of functions $g_b$, one for every prefix $b$ of a process realization, which map each possible word $a$ to the probability that it follows $b$, i.e. $g_b(a) = P(a|b)$. In other words, they represent all the possible states of a process, where a state is an entity that determines the future of the process in a unique way.

Let’s consider the most stupid process ever: the constant process. All of its predictors coincide. The same happens for the “coin-tossing” process: the future is always the same, whatever the past. However, it differs from the previous case in that the predictor is at least non-degenerate.
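To make the two cases concrete, here is a tiny sketch. The alphabets and the fairness of the coin are my own assumptions for illustration. Both predictors ignore the prefix entirely; the constant one only takes the values 0 and 1, while the coin one does not.

```python
def constant_predictor(prefix, word):
    """g_prefix(word) for a constant process that always emits "a" (assumed alphabet)."""
    # The prefix is ignored: the only possible future is "aaa...".
    return 1.0 if all(x == "a" for x in word) else 0.0


def coin_predictor(prefix, word):
    """g_prefix(word) for a fair coin over {"h", "t"} (fairness is an assumption)."""
    # The prefix is ignored: every word of length n has probability 2**-n.
    return 0.5 ** len(word)


# Different prefixes, same predictor values.
assert constant_predictor("aaa", "aa") == constant_predictor("a", "aa") == 1.0
assert coin_predictor("hht", "th") == coin_predictor("t", "th") == 0.25
```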

The simplest dependency between the future and the past occurs in a Markov chain. It has at most as many different predictors as it has states. That’s still a finite number. Even if we consider higher-order Markov processes (thus allowing dependency on earlier-than-previous symbols), the number of different predictors is easily bounded: for order $k$ it is at most the number of words of length $k$.
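A small sketch of the Markov case, with a made-up two-symbol chain (the transition table `P` is hypothetical). It tabulates the predictor on all words up to a given length and checks that it depends only on the last symbol of the prefix.

```python
from itertools import product

# Hypothetical first-order Markov chain over {"a", "b"}:
# P[x][y] = probability that symbol y follows symbol x.
P = {"a": {"a": 0.9, "b": 0.1},
     "b": {"a": 0.5, "b": 0.5}}


def predictor(prefix, max_len=2):
    """Tabulate g_prefix on all words of length 1..max_len (prefix must be non-empty)."""
    g = {}
    for n in range(1, max_len + 1):
        for word in product("ab", repeat=n):
            prob, last = 1.0, prefix[-1]  # only the last symbol of the prefix matters
            for sym in word:
                prob *= P[last][sym]
                last = sym
            g["".join(word)] = prob
    return g


# Prefixes ending in the same symbol give the same predictor,
# so this chain has exactly two distinct predictors.
assert predictor("aab") == predictor("bbb")
assert predictor("aba") != predictor("aab")
```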

To see a more complicated set of predictors we should try something more powerful, for example Hidden Markov Models (HMMs). It’s easy to obtain a beautiful set that contains all the predictors of a process described by an HMM. Consider the predictors corresponding to each state of the hidden chain. We claim that all the predictors lie in the subspace spanned by these functions. The reason is that the past of the process up to the current moment gives a posterior probability for each of the states, and thus every predictor can be expressed as a mixture of the state predictors.
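The mixture claim can be checked numerically. Below is a minimal sketch on a made-up 2-state HMM: the matrices `M`, `E`, `START` and the convention "emit a symbol from the current state, then move" are my assumptions, not something fixed by the notes above. It compares $P(a|b)$ computed directly with the posterior-weighted mixture of state predictors.

```python
# Hypothetical 2-state HMM over the alphabet {"a", "b"}; all numbers are made up.
M = [[0.7, 0.3],              # M[i][j] = P(next hidden state j | hidden state i)
     [0.2, 0.8]]
E = [{"a": 0.9, "b": 0.1},    # E[i][x] = P(emit x | hidden state i)
     {"a": 0.3, "b": 0.7}]
START = [0.5, 0.5]            # initial distribution over hidden states


def step(alpha, x):
    """One unnormalized forward step: emit x from the current state, then move."""
    n = len(alpha)
    return [sum(alpha[i] * E[i][x] * M[i][j] for i in range(n)) for j in range(n)]


def word_prob(dist, word):
    """Probability of emitting `word` when the current state is distributed as `dist`."""
    alpha = list(dist)
    for x in word:
        alpha = step(alpha, x)
    return sum(alpha)


def posterior(prefix):
    """Posterior distribution over the current hidden state after observing `prefix`."""
    alpha = list(START)
    for x in prefix:
        alpha = step(alpha, x)
    total = sum(alpha)
    return [v / total for v in alpha]


def state_predictor(i, word):
    """Predictor of hidden state i: probability of `word` when starting in state i."""
    dist = [1.0 if k == i else 0.0 for k in range(len(START))]
    return word_prob(dist, word)


def predictor(prefix, word):
    """g_prefix(word) = P(prefix + word) / P(prefix), computed directly."""
    return word_prob(START, prefix + word) / word_prob(START, prefix)


prefix, word = "ab", "ba"
w = posterior(prefix)
mixture = sum(w[i] * state_predictor(i, word) for i in range(len(START)))
assert abs(predictor(prefix, word) - mixture) < 1e-12
```

The weights `w` are non-negative and sum to one, so in this example every predictor lies in the convex hull of the two state predictors, which is a subset of the subspace they span.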

And here comes OOM. The main idea is: let’s consider all the processes whose predictors all lie in an $m$-dimensional space. This means that the future of the process can be encoded by $m$ real numbers. That’s a nice property, but there is an even nicer one. For each alphabet character $a$, consider the operator $t_a$ that acts as $t_a(g_b) = P(a|b) g_{ba}$. It turns out that this guy is linear!
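One way to see the linearity: for any word $c$ we have $t_a(g_b)(c) = P(a|b) P(c|ba) = P(ac|b) = g_b(ac)$, so $t_a$ merely prepends $a$ to the argument of the function, and that is a linear operation. Here is a rough sketch of what the operators look like for a process that comes from an HMM. The HMM itself, the names `T`, `W0`, `matvec`, and the choice of posterior state weights as coordinates are all my assumptions. In these coordinates $t_x$ becomes a matrix, and word probabilities are sums of entries of matrix products.

```python
# Hypothetical 2-state HMM over {"a", "b"}; all numbers are made up for illustration.
M = [[0.7, 0.3], [0.2, 0.8]]                       # M[i][j] = P(next state j | state i)
E = [{"a": 0.9, "b": 0.1}, {"a": 0.3, "b": 0.7}]   # E[i][x] = P(emit x | state i)
W0 = [0.5, 0.5]                                    # coordinates of the initial predictor

# Matrix of t_x in the coordinates of state weights: T[x][j][i] = E[i][x] * M[i][j].
T = {x: [[E[i][x] * M[i][j] for i in range(2)] for j in range(2)] for x in "ab"}


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(row[i] * vec[i] for i in range(len(vec))) for row in matrix]


def word_prob(word, w=W0):
    """P(x_1 ... x_n) = sum of the entries of T[x_n] ... T[x_1] w."""
    for x in word:
        w = matvec(T[x], w)
    return sum(w)


# Sanity check: probabilities of all words of length 2 sum to 1.
total = sum(word_prob(u + v) for u in "ab" for v in "ab")
assert abs(total - 1.0) < 1e-12

# t_a(g_b) = P(a|b) g_{ba}: applying T["a"] to the normalized coordinates
# of g_b gives a vector whose entries sum to P(a|b).
wb = matvec(T["b"], W0)
p_b = sum(wb)
wb = [v / p_b for v in wb]
p_a_given_b = sum(matvec(T["a"], wb))
assert abs(p_a_given_b - word_prob("ba") / word_prob("b")) < 1e-12
```

This only illustrates the HMM special case, where the coordinates happen to be probabilities; the definition above asks for nothing more than a finite-dimensional space and linear operators.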