## Forward Equations for the Segment HMM

### February 21, 2009

This is a derivation of the forward equations for the Segment HMM, following on from my last post. Again I should say that this is mostly based on Kevin Murphy’s Segment HMM tutorial, which you should probably be reading instead of these posts.

My aim, for this post, is to describe the forward algorithm for segment HMMs, which will be used, along with the backward algorithm, in the Baum-Welch algorithm for estimating the parameters of the model. All the notation follows on from the post preceding this one, which defined the segment HMM. We’ll begin by defining the forward variable:

$\alpha_t(q) = p(y_{1:t},Q_t=q)$

This is the joint probability of the $t$th state and the observations up to and including $y_t$. We want to know this probability for each state, so we can figure out which one is most likely given our observations. We’ll use this to estimate the parameters of the model later on.
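As a quick sanity check (an aside of mine, not from the tutorial): in an ordinary HMM every segment has length one, and this is exactly the standard forward variable, which satisfies

$\alpha_t(q) = p(y_t|Q_t=q)\sum_{q'}p(Q_t=q|Q_{t-1}=q')\alpha_{t-1}(q')$

The derivation below generalises this recursion to segments of variable length.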

The point of the forward algorithm is to be able to calculate this probability at each point in time in a way that doesn’t melt our computers. To do this, we define a recursion that expresses $\alpha_t(q)$ as a function of past $\alpha$s. This is possible because we can exploit the structure in our model – namely that, given the abstract state, our observation segments are independent of one another. To do this we’re going to need to include information about the duration of each segment, so let’s get that over with first:

$\alpha_t(q) = \sum_l p(y_{1:t},Q_t=q,L_t=l)$

where $l$ is the duration of the segment. This allows us to split up the current segment from the rest of the observations:

$\alpha_t(q) =\sum_l p(y_{t-l+1:t}|Q_t=q,L_t=l)p(y_{1:t-l},Q_t=q,L_t=l)$

which is great, because now the first term is our observation distribution. OK, noting that $\alpha_{t-l}(q') = p(y_{1:t-l},Q_{t-l}=q')$, we can re-write $\alpha_t(q)$ in terms of the past $\alpha$s by introducing all of the durations and all of the states:

$\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'}p(y_{1:t-l},q,l,Q_{t-l}=q',L_{t-l}=l')$

(I’ve dropped the $Q_t=q,L_t=l$ notation in favour of just $q,l$ because these equations started to take up more than one line). This we can split up again, to give

$\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'}p(q,l|y_{1:t-l},q',l')p(y_{1:t-l},q',l')$

which, because the new state and duration are independent of the past observations given the previous state and duration (the segment-level Markov assumption), gives

$\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'}p(q,l|q',l')p(y_{1:t-l},q',l')$

Now we apply the assumption that the transition factorizes as $p(q,l|q',l') = p(q|q')p(l|q)$ – the new state depends only on the previous state, and the new duration only on the new state – to give

$\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'} p(q|q')p(l|q)p(y_{1:t-l},q',l')$

which allows us to split up the sum a bit:

$\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)p(l|q)\sum_{q'}p(q|q')\sum_{l'}p(y_{1:t-l},q',l')$

and, hey presto:

$\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)p(l|q)\sum_{q'}p(q|q')\alpha_{t-l}(q')$

which is great – the $\alpha$ variable written in terms of its previous entries, and the model distributions! Don’t forget that to actually do this on more than just a few observations you’re going to need to use logs. Check out Kevin’s tutorial paper for the details.
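To make the recursion concrete, here’s a minimal numpy sketch (my own, not from Kevin’s tutorial – the array layout and function names are assumptions). `seg_lik[q, s, t]` plays the role of the segment observation likelihood $p(y_{s+1:t+1}|q,l)$, `A` is the transition matrix $p(q|q')$, `dur` is the duration distribution $p(l|q)$, and `pi` is an initial state distribution standing in for $\alpha$ before time zero. A log-space variant follows, per the advice above about using logs.

```python
import numpy as np

def segment_forward(seg_lik, A, dur, pi):
    """alpha[t, q] = p(y_{1:t+1}, Q_t = q) via the recursion derived above.

    seg_lik[q, s, t] : p(y_{s:t} | state q, duration t - s + 1), 0-indexed, s <= t
    A[qp, q]         : p(q | q')  (state transition matrix)
    dur[q, l - 1]    : p(l | q)   (duration distribution, l = 1..Lmax)
    pi[q]            : initial state distribution (used when the segment starts at t = 0)
    """
    Q, T, _ = seg_lik.shape
    Lmax = dur.shape[1]
    alpha = np.zeros((T, Q))
    for t in range(T):
        for q in range(Q):
            total = 0.0
            for l in range(1, min(Lmax, t + 1) + 1):
                s = t - l + 1                             # segment start index
                emit = seg_lik[q, s, t] * dur[q, l - 1]   # p(y_{t-l+1:t}|q,l) p(l|q)
                if s == 0:
                    total += emit * pi[q]                 # first segment of the sequence
                else:
                    # sum_{q'} p(q|q') alpha_{t-l}(q')
                    total += emit * (A[:, q] @ alpha[s - 1])
            alpha[t, q] = total
    return alpha


def _logsumexp(a):
    """Stable log(sum(exp(a))) for a 1-D array or list."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return m if not np.isfinite(m) else m + np.log(np.exp(a - m).sum())


def segment_forward_log(log_seg_lik, log_A, log_dur, log_pi):
    """The same recursion in log space, to avoid underflow on long sequences."""
    Q, T, _ = log_seg_lik.shape
    Lmax = log_dur.shape[1]
    log_alpha = np.full((T, Q), -np.inf)
    for t in range(T):
        for q in range(Q):
            terms = []
            for l in range(1, min(Lmax, t + 1) + 1):
                s = t - l + 1
                emit = log_seg_lik[q, s, t] + log_dur[q, l - 1]
                if s == 0:
                    terms.append(emit + log_pi[q])
                else:
                    terms.append(emit + _logsumexp(log_A[:, q] + log_alpha[s - 1]))
            log_alpha[t, q] = _logsumexp(terms)
    return log_alpha
```

Note how the factorized transition assumption shows up here: the joint $p(q,l|q',l') = p(q|q')p(l|q)$ is represented by the two separate arrays `A` and `dur`, never as one big table over $(q',l',q,l)$.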