## Segment HMMs

### February 21, 2009

I’ve been finding out about segment HMMs. They’re a generalisation of explicit state duration HMMs and a special case of the hierarchical HMM. I’ll describe them a bit in this post, and then describe the forward algorithm in the next post. Both posts will be culled mostly from Kevin Murphy’s segment HMM tutorial. All of the errors in this post will be mine, as I’m trying to interpret Kevin’s stuff without using Dynamic Bayesian Networks, which I’m starting to think is a horrible mistake on my part.

Instead of this post you should probably read a tech report from Cambridge called *The Theory of Segmental Hidden Markov Models*, which is from the early 90s, and also stuff about hierarchical HMMs, which I think generalise segment HMMs, starting with *The Hierarchical Hidden Markov Model: Analysis and Applications* by Fine, Singer and Tishby in 1998 and maybe Kevin’s 2001 NIPS paper *Linear Time Inference in Hierarchical HMMs*.

A segment HMM is like a regular HMM except that each state, known as an *abstract* or *generalised* state, can emit a whole observation sequence, rather than just a single observation. They generalise ‘explicit state duration’ HMMs because the joint distribution of the output sequence can be anything (i.e. another HMM, a state space model, a mixture, or whatever), whereas in an explicit state duration HMM the observations are assumed to be independent given the underlying abstract state.
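To make that contrast concrete (with notation of my own choosing): writing $q$ for the abstract state, $l$ for the segment length and $y_{1:l}$ for the emitted subsequence, an explicit state duration HMM factorises the segment likelihood as

$$p(y_{1:l} \mid q, l) = \prod_{i=1}^{l} p(y_i \mid q),$$

whereas a segment HMM allows $p(y_{1:l} \mid q, l)$ to be an arbitrary joint distribution over the $l$ observations.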

They’re hard to draw as graphical models, as the number of observations emitted from each abstract state is controlled by a duration distribution, $p(l \mid q)$, where $l$ is the length of an observed subsequence and $q$ is the abstract state. So, as each abstract state emits observations the graph has a dynamic topology. You can actually introduce a variable that allows you to draw these things out as valid graphical models – see Kevin’s paper for more of that.

To try and think about the different levels of this structure I’m using two different time indices. One time, $\tau$, is where the abstract state lives and the other, $t$, is where the observations live. I’m only really using $\tau$ to describe the model, where it makes sense to think about different time scales. When we actually come to write down the forward algorithm, we’ll not know when the states transitioned, so we’ll have to consider the possibility of the states existing at every point in the observed sequence.

So the segment HMM (SHMM) can be described using four distributions:

- the initial state distribution, $\pi(q) = p(q_1 = q)$
- the transition distribution, $p(q_{\tau+1} = q' \mid q_\tau = q)$
- the duration distribution, $p(l \mid q)$
- the observation distribution, $p(y_{t:t+l-1} \mid q, l)$

where $t$ is the start of a segment. So we have two time scales – a Markov chain that happily gets on with its transitions at every $\tau$-tick. In the observed time, however, we’re actually seeing whole sequences of observations governed by a single model. Splitting it up this way introduces a really clear structure, avoiding the need to use heavy self-transition probabilities, giving us control over the distribution of durations and allowing us to specify complex dynamic structures within each segment without complicating the simple underlying switching process.
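To pin down how the four distributions fit together, here’s a minimal generative sampler for a toy two-state segment HMM. Everything in it – the state names, the duration supports, the random-walk emission model – is invented for illustration, not taken from Kevin’s tutorial:

```python
import random

# Toy two-state segment HMM (all numbers below are made up for illustration).
STATES = ["low", "high"]
PI = {"low": 0.5, "high": 0.5}                      # initial state distribution
A = {"low": {"low": 0.1, "high": 0.9},              # transition distribution
     "high": {"low": 0.9, "high": 0.1}}
DURATIONS = {"low": [1, 2, 3], "high": [4, 5, 6]}   # support of p(l | q), uniform here
LEVEL = {"low": 0.0, "high": 5.0}                   # mean level for each state's segments

def sample_state(dist):
    """Draw a state from a dict of state -> probability."""
    r, acc = random.random(), 0.0
    for q, p in dist.items():
        acc += p
        if r < acc:
            return q
    return q  # guard against floating-point underrun

def sample_duration(q):
    """Draw a segment length l from p(l | q) (uniform over the support here)."""
    return random.choice(DURATIONS[q])

def sample_segment(q, l):
    """Draw l observations *jointly*: a random walk around the state's level,
    so observations within a segment are correlated given q."""
    y, obs = LEVEL[q], []
    for _ in range(l):
        y += random.gauss(0.0, 0.1)
        obs.append(y)
    return obs

def sample_shmm(n_segments):
    """Generate n_segments segments: state -> duration -> whole subsequence."""
    q = sample_state(PI)
    path, obs = [], []
    for _ in range(n_segments):
        l = sample_duration(q)
        path.append((q, l))
        obs.extend(sample_segment(q, l))
        q = sample_state(A[q])
    return path, obs

path, obs = sample_shmm(5)
```

Because each segment is a random walk, the observations within a segment are dependent given the abstract state – exactly the thing an explicit state duration HMM, with its conditionally independent emissions, can’t express.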