Forward Equations for the Segment HMM

February 21, 2009

This is a derivation of the forward equations for the Segment HMM, following on from my last post. Again, I should say that this is mostly based on Kevin Murphy’s Segment HMM tutorial, which you should probably be reading instead of these posts.

My aim, for this post, is to describe the forward algorithm for segment HMMs, which will be used, along with the backward algorithm, in the Baum-Welch algorithm for estimating the parameters of the model. All the notation follows on from the post preceding this one, which defined the segment HMM. We’ll begin by defining the forward variable:

\alpha_t(q) = p(y_{1:t},Q_t=q)

This is the joint probability of the t-th state and the observations up to and including y_t. We want to know this probability for each state, so we can figure out which one is most likely given our observations. We’ll use this to estimate the parameters of the model later on.

The point of the forward algorithm is to be able to calculate this probability at each point in time in a way that doesn’t melt our computers. To do this, we simply express \alpha_t(q) as a function of past \alphas. This is possible because we can exploit the structure in our model – namely that, given the abstract state, our observation segments are independent of one another. To do this we’re going to need to include information about the duration of each segment, so let’s get that over with first:

\alpha_t(q) = \sum_l p(y_{1:t},Q_t=q,L_t=l)

where l is the duration of the segment. This allows us to split up the current segment from the rest of the observations:

\alpha_t(q) =\sum_l p(y_{t-l+1:t}|Q_t=q,L_t=l)p(y_{1:t-l},Q_t=q,L_t=l)

which is great, because now the first term is our observation distribution. Now, noting that \alpha_{t-l}(q') = p(y_{1:t-l},Q_{t-l}=q'), we can re-write \alpha_t(q) in terms of the past \alphas by introducing all of the durations and all of the states:

\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'}p(y_{1:t-l},q,l,Q_{t-l}=q',L_{t-l}=l')

(I’ve dropped the Q_t=q,L_t=l notation in favour of just q,l because these equations started to take up more than one line). This we can split up again, to give

\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'}p(q,l|y_{1:t-l},q',l')p(y_{1:t-l},q',l')

which, because the new state and duration are conditionally independent of the past observations given the previous state and duration (our Markov assumption), gives

\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'}p(q,l|q',l')p(y_{1:t-l},q',l')

Now we apply an assumption that p(q,l|q',l') = p(q|q')p(l|q) to give

\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)\sum_{q',l'} p(q|q')p(l|q)p(y_{1:t-l},q',l')

which allows us to split up the sum a bit:

\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)p(l|q)\sum_{q'}p(q|q')\sum_{l'}p(y_{1:t-l},q',l')

and, hey presto:

\alpha_t(q) =\sum_{l} p(y_{t-l+1:t}|q,l)p(l|q)\sum_{q'}p(q|q')\alpha_{t-l}(q')

which is great – the \alpha variable written in terms of its previous entries, and the model distributions! Don’t forget that to actually do this on more than just a few observations you’re going to need to use logs. Check out Kevin’s tutorial paper for the details.
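To make the recursion concrete, here’s a minimal NumPy sketch of the forward pass, computed in log space as suggested above. The function and argument names (forward_segment_hmm, log_trans, log_dur, log_seg, and the L_max cap on segment length) are my own for illustration, and the handling of the very first segment (one that starts at t=1, weighted by an initial state distribution) is one reasonable convention rather than anything fixed by the derivation itself.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    a = np.asarray(a, dtype=float)
    m = np.max(a)
    if not np.isfinite(m):
        return m  # all entries are -inf
    return m + np.log(np.sum(np.exp(a - m)))

def forward_segment_hmm(log_pi, log_trans, log_dur, log_seg):
    """
    Forward recursion for a segment HMM, in log space.

    log_pi[q]       : log initial state probabilities, shape (Q,)
    log_trans[q',q] : log p(q | q'), shape (Q, Q)
    log_dur[q,l-1]  : log p(l | q) for l = 1..L_max, shape (Q, L_max)
    log_seg[q,s,t]  : log p(y_{s+1:t+1} | q, l = t-s+1), the precomputed
                      log-likelihood of the observation segment running from
                      index s to t inclusive, under state q; shape (Q, T, T)

    Returns log_alpha of shape (T, Q), where log_alpha[t, q] is
    log p(y_{1:t+1}, Q_t = q).
    """
    Q, T = log_seg.shape[0], log_seg.shape[1]
    L_max = log_dur.shape[1]
    log_alpha = np.full((T, Q), -np.inf)

    for t in range(T):
        for q in range(Q):
            terms = []
            # sum over the duration l of the segment ending at t
            for l in range(1, min(t + 1, L_max) + 1):
                s = t - l + 1  # segment start index
                seg = log_seg[q, s, t] + log_dur[q, l - 1]
                if s == 0:
                    # segment covers the whole prefix: base case,
                    # weighted by the initial state distribution
                    terms.append(seg + log_pi[q])
                else:
                    # sum over previous states q', as in the recursion
                    terms.append(seg + logsumexp(log_trans[:, q]
                                                 + log_alpha[s - 1]))
            log_alpha[t, q] = logsumexp(np.array(terms))
    return log_alpha
```

Note that log_seg assumes the segment observation likelihoods p(y_{t-l+1:t}|q,l) have been precomputed for every (state, start, end) triple; in practice you’d compute these from whatever observation model your segment HMM uses.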

