Inside–outside algorithm

For parsing algorithms in computer science, the inside–outside algorithm is a way of re-estimating production probabilities in a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization of the forward–backward algorithm for parameter estimation on hidden Markov models to stochastic context-free grammars. It is used to compute expectations, for example as part of the expectation–maximization algorithm (an unsupervised learning algorithm).

Inside and outside probabilities

[edit]

The inside probability is the total probability of generating words , given the root nonterminal and a grammar :[1]

The outside probability is the total probability of beginning with the start symbol and generating the nonterminal and all the words outside , given a grammar :[1]

Computing inside probabilities

[edit]

Base Case:

General case:

Suppose there is a rule in the grammar, then the probability of generating starting with a subtree rooted at is:

The inside probability is just the sum over all such possible rules:

Computing outside probabilities

[edit]

Base Case:

Here the start symbol is .

General case:

Suppose there is a rule in the grammar that generates . Then the left contribution of that rule to the outside probability is:

Now suppose there is a rule in the grammar. Then the right contribution of that rule to the outside probability is:

The outside probability is the sum of the left and right contributions over all such rules:

References

[edit]
  1. ^ a b Manning, Christopher D.; Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA, USA: MIT Press. pp. 388–402. ISBN 0-262-13360-1.
[edit]