Belief propagation
From Wikipedia, the free encyclopedia
Belief propagation, also known as the sum-product algorithm, is an iterative algorithm for computing marginals of functions on a graphical model, most commonly used in artificial intelligence and information theory. Judea Pearl formulated the algorithm on trees in 1982,[1] and Kim and Pearl extended it to polytrees in 1983.[2] Pearl (1988)[3] later suggested the algorithm as an approximation for general (loopy) networks. It is an efficient exact inference algorithm on trees and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability. It is commonly used on pairwise Markov random fields (which have a maximum clique size of 2), Bayesian networks, and factor graphs.
Recall that the marginal distribution of a single random variable X_i is simply the summation of the joint distribution over all variables except X_i. Letting x = (x_1, ..., x_n) be an assignment of all variables in the joint distribution,

    p(x_i) = \sum_{x' : x'_i = x_i} p(x').

For the purposes of explaining this algorithm, consider the marginal function, which is simply an unnormalized marginal distribution with a generic global function g:

    g_i(x_i) = \sum_{x' : x'_i = x_i} g(x').
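As a concrete illustration, this summation can be carried out by brute force on a small example. The three-variable global function below is a hypothetical factorization chosen purely for illustration, a minimal sketch rather than anything prescribed by the algorithm:

```python
import itertools

# A toy joint over three binary variables, given as an unnormalized
# global function g; this factorization is a hypothetical example.
def g(x1, x2, x3):
    f_a = [[1.0, 2.0], [3.0, 1.0]]   # factor over (x1, x2)
    f_b = [[2.0, 1.0], [1.0, 4.0]]   # factor over (x2, x3)
    return f_a[x1][x2] * f_b[x2][x3]

# Marginal function of X1: sum g over every variable except x1.
def marginal_x1(x1):
    return sum(g(x1, x2, x3)
               for x2, x3 in itertools.product((0, 1), repeat=2))

unnormalized = [marginal_x1(v) for v in (0, 1)]
Z = sum(unnormalized)                  # normalizing constant
p_x1 = [m / Z for m in unnormalized]   # the marginal distribution
```

This exhaustive summation costs time exponential in the number of variables; avoiding that blowup on trees is exactly the point of belief propagation.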
Exact algorithm for trees
This algorithm functions by passing positive, real-valued vector messages along the edges of a graphical model. More precisely, in a tree, a vertex sends a message to an adjacent vertex once (a) it has received messages from all of its other adjacent vertices and (b) it has not already sent one. So in the first iteration, the algorithm sends messages from all leaf nodes to their lone adjacent vertices, and it continues sending messages in this manner until every message has been sent exactly once, hence the term propagation. It is easily proven that all messages will be sent: there are exactly twice as many messages as edges, one in each direction along every edge. Upon termination, the marginal of a variable is simply the product of the incoming messages from all of its adjacent vertices. A simple, though somewhat messy, proof of this fact can be given by mathematical induction.
The message definitions will be described in the factor graph setting, as the algorithms for other graphical models are nearly identical. Since factor graphs have variable and factor nodes, there are two types of messages to define:
A variable message \mu_{x_n \to f_m} is a real-valued function sent from a variable x_n to a factor f_m, defined as the product of the messages x_n has received from its other neighbouring factors:

    \mu_{x_n \to f_m}(x_n) = \prod_{f_k \in N(x_n) \setminus \{f_m\}} \mu_{f_k \to x_n}(x_n).

A factor message \mu_{f_m \to x_n} is a real-valued function sent from a factor f_m to a variable x_n, defined by summing the factor, weighted by the other incoming variable messages, over all of its arguments except x_n:

    \mu_{f_m \to x_n}(x_n) = \sum_{x_m : x_n \text{ fixed}} f_m(x_m) \prod_{x_k \in N(f_m) \setminus \{x_n\}} \mu_{x_k \to f_m}(x_k),

where N(u) is the set of neighbours (adjacent vertices in the graph) of a vertex u, and x_m is an assignment to the variables affecting f_m (i.e. the variables in N(f_m)).
As mentioned in the description of the algorithm, the marginal of X_i can be computed as the product of all incoming factor messages:

    g_i(x_i) = \prod_{f_k \in N(x_i)} \mu_{f_k \to x_i}(x_i).

One can also compute the marginal of a factor f_j (equivalently, the marginal of the subset of variables X_j on which f_j depends) in the following manner:

    g_j(x_j) = f_j(x_j) \prod_{x_k \in N(f_j)} \mu_{x_k \to f_j}(x_k).
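Both message types and the final marginal computation can be traced by hand on a small chain-shaped factor graph. The following sketch uses hypothetical binary factor tables; the point is that the sum-product marginal agrees with brute-force summation:

```python
# A hypothetical chain-shaped factor graph: x1 -- f_a -- x2 -- f_b -- x3,
# with binary variables; the factor tables are illustrative assumptions.
f_a = [[1.0, 2.0], [3.0, 1.0]]   # f_a(x1, x2)
f_b = [[2.0, 1.0], [1.0, 4.0]]   # f_b(x2, x3)

# Leaf variables x1 and x3 start the propagation; a variable-to-factor
# message is the product of the messages from its *other* neighbouring
# factors, and an empty product is 1.
msg_x1_to_fa = [1.0, 1.0]
msg_x3_to_fb = [1.0, 1.0]

# Factor-to-variable message: sum the factor over its other arguments,
# weighted by the incoming variable messages.
msg_fa_to_x2 = [sum(f_a[x1][x2] * msg_x1_to_fa[x1] for x1 in (0, 1))
                for x2 in (0, 1)]
msg_fb_to_x2 = [sum(f_b[x2][x3] * msg_x3_to_fb[x3] for x3 in (0, 1))
                for x2 in (0, 1)]

# The marginal of a variable is the product of all incoming factor messages.
marg_x2 = [msg_fa_to_x2[v] * msg_fb_to_x2[v] for v in (0, 1)]
Z = sum(marg_x2)
p_x2 = [m / Z for m in marg_x2]

# Sanity check: brute-force summation gives the same unnormalized marginal.
brute = [sum(f_a[x1][v] * f_b[v][x3] for x1 in (0, 1) for x3 in (0, 1))
         for v in (0, 1)]
```

On a chain each message is computed once, so the work grows linearly in the number of variables rather than exponentially.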
Approximate algorithm for general graphs
Curiously, nearly the same algorithm is used on general graphs. The algorithm is then sometimes called "loopy" belief propagation, because such graphs typically contain cycles, or loops. The procedure must be adjusted slightly because general graphs might not contain any leaves. Instead, one initializes all variable messages to 1 and uses the same message definitions above, updating all messages at every iteration (although messages coming from known leaves or tree-structured subgraphs may no longer need updating after sufficient iterations). It is easy to show that on a tree, the message values of this modified procedure will converge to those given above within a number of iterations equal to the diameter of the tree.
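A minimal sketch of this loopy variant on a pairwise Markov random field containing a cycle (a triangle of binary variables; the potential tables are illustrative assumptions, not from the article): all messages start at 1 and are updated synchronously, with per-message normalization to keep the values bounded:

```python
# A hypothetical pairwise Markov random field with one cycle: a triangle
# of binary variables 0, 1, 2 with one illustrative potential per edge.
psi = {
    (0, 1): [[2.0, 1.0], [1.0, 2.0]],
    (1, 2): [[2.0, 1.0], [1.0, 2.0]],
    (0, 2): [[2.0, 1.0], [1.0, 2.0]],
}

def pot(i, j, xi, xj):
    # Look up the pairwise potential regardless of edge orientation.
    return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

# Loopy BP: initialize every directed message to 1, update synchronously.
msg = {(i, j): [1.0, 1.0] for i in nbrs for j in nbrs[i]}

for _ in range(50):
    new = {}
    for (i, j) in msg:
        out = []
        for xj in (0, 1):
            total = 0.0
            for xi in (0, 1):
                prod = pot(i, j, xi, xj)
                for k in nbrs[i]:
                    if k != j:          # exclude the recipient
                        prod *= msg[(k, i)][xi]
                total += prod
            out.append(total)
        s = sum(out)                    # normalize to keep values bounded
        new[(i, j)] = [v / s for v in out]
    msg = new

def belief(i):
    # Approximate marginal: product of incoming messages, normalized.
    b = [1.0, 1.0]
    for k in nbrs[i]:
        b = [b[x] * msg[(k, i)][x] for x in (0, 1)]
    s = sum(b)
    return [v / s for v in b]
```

With these symmetric potentials the beliefs settle at the uniform distribution, which here coincides with the exact marginals; on less symmetric loopy graphs the beliefs are only approximations, and convergence itself is not guaranteed.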
The precise conditions under which loopy belief propagation will converge are still not well understood; it is known that on graphs containing a single loop it converges to a correct solution.[4] Several sufficient (but not necessary) conditions for convergence of loopy belief propagation to a unique fixed point exist.[5] There exist graphs which will fail to converge, or which will oscillate between multiple states over repeated iterations. Techniques like EXIT charts can provide an approximate visualisation of the progress of belief propagation and an approximate test for convergence.
There are other approximate methods for marginalization including variational methods and Monte Carlo methods.
One method of exact marginalization in general graphs is called the junction tree algorithm, which is simply belief propagation on a modified graph guaranteed to be a tree. The basic premise is to eliminate cycles by clustering them into single nodes.
Related algorithm and complexity issues
A similar algorithm, commonly referred to as the Viterbi algorithm but also known as the max-product or min-sum algorithm, solves the related problem of maximization, or most probable explanation. Instead of computing the marginals, the goal here is to find an assignment that maximizes the global function (i.e. the most probable values in a probabilistic setting), which can be defined using the arg max:

    x^* = \arg\max_x g(x).
An algorithm that solves this problem is nearly identical to belief propagation, with the sums replaced by maxima in the definitions.
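On a hypothetical chain-shaped factor graph with illustrative binary factor tables, the swap of sums for maxima looks as follows; backtracking through the maximizing choices recovers the full assignment, which can be checked against exhaustive search:

```python
import itertools

# Hypothetical chain-shaped factor graph x1 -- f_a -- x2 -- f_b -- x3
# with illustrative binary factor tables.
f_a = [[1.0, 2.0], [3.0, 1.0]]   # f_a(x1, x2)
f_b = [[2.0, 1.0], [1.0, 4.0]]   # f_b(x2, x3)

def g(x1, x2, x3):
    return f_a[x1][x2] * f_b[x2][x3]

# Max-product: the same message flow as sum-product, with max for sum.
msg_fa_to_x2 = [max(f_a[x1][x2] for x1 in (0, 1)) for x2 in (0, 1)]
msg_fb_to_x2 = [max(f_b[x2][x3] for x3 in (0, 1)) for x2 in (0, 1)]

# The "max-marginal" of x2 scores each value by the best global
# assignment consistent with it.
maxmarg_x2 = [msg_fa_to_x2[v] * msg_fb_to_x2[v] for v in (0, 1)]
x2_star = max((0, 1), key=lambda v: maxmarg_x2[v])

# Backtrack through the maximizing neighbours to recover the rest.
x1_star = max((0, 1), key=lambda v: f_a[v][x2_star])
x3_star = max((0, 1), key=lambda v: f_b[x2_star][v])

# Check against exhaustive search over all eight assignments.
brute = max(itertools.product((0, 1), repeat=3), key=lambda x: g(*x))
```

In practice the min-sum form (negative logarithms, with max turned into min and products into sums) is preferred for numerical stability.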
It is worth noting that inference problems like marginalization and maximization are NP-hard to solve exactly and approximately (at least for relative error) in a general graphical model. More precisely, the marginalization problem defined above is #P-complete and maximization is NP-complete.
Relation to free energy
The sum-product algorithm is related to the calculation of free energy in thermodynamics. Writing Z for the partition function, a probability distribution

    P(X) = \frac{1}{Z} \prod_{f_j} f_j(x_j)

(as per the factor graph representation) can be viewed as a measure of the internal energy present in a system, computed as

    E(X) = -\log \prod_{f_j} f_j(x_j).

The free energy of the system is then

    F = U - H = \sum_X P(X) E(X) + \sum_X P(X) \log P(X).
It can then be shown that the points of convergence of the sum-product algorithm represent the points where the free energy in such a system is minimized. Similarly, it can be shown that a fixed point of the iterative belief propagation algorithm in graphs with cycles is a stationary point of a free energy approximation.
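The free energy approximation in question for graphs with cycles is the Bethe free energy; following Yedidia, Freeman, and Weiss (cited in the references), it can be written in terms of factor beliefs b_a and variable beliefs b_i as

```latex
F_{\mathrm{Bethe}} \;=\; \sum_a \sum_{x_a} b_a(x_a) \ln \frac{b_a(x_a)}{f_a(x_a)}
\;-\; \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i),
```

where d_i is the number of factors neighbouring variable i. Stationary points of this functional, subject to normalization and marginalization constraints on the beliefs, correspond to fixed points of loopy belief propagation.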
Generalized belief propagation (GBP)
Belief propagation algorithms are normally presented as message update equations on a factor graph, involving messages between variable nodes and their neighboring factor nodes and vice versa. Considering messages between regions in a graph is one way of generalizing the belief propagation algorithm. There are several ways of defining the set of regions in a graph that can exchange messages. One method uses ideas introduced by Kikuchi in the physics literature, and is known as Kikuchi's cluster variation method.
Improvements in the performance of belief propagation algorithms are also achievable by breaking the replica symmetry in the distributions of the fields (messages). This generalization leads to a new kind of algorithm called survey propagation (SP), which has proved to be very efficient in NP-complete problems like satisfiability and graph coloring.
The cluster variation method and the survey propagation algorithms are two different improvements to belief propagation; the name generalized survey propagation (GSP) awaits assignment to an algorithm that merges both generalizations.
References
- Frey, Brendan (1998). Graphical Models for Machine Learning and Digital Communication. MIT Press.
- David J.C. MacKay (2003). Exact Marginalization in Graphs. In David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, pp. 334–340. Cambridge: Cambridge University Press.
- Mackenzie, Dana (2005). Communication Speed Nears Terminal Velocity New Scientist. 9 July 2005. Issue 2507 (Registration required)
- Yedidia, J.S. and Freeman, W.T. and Weiss, Y. Constructing free-energy approximations and generalized belief propagation algorithms, IEEE Transactions on Information Theory, vol.51(7), pp.2282-2312, July 2005.
- Yedidia, J.S.; Freeman, W.T.; Weiss, Y., Understanding Belief Propagation and Its Generalizations, Exploring Artificial Intelligence in the New Millennium, ISBN 1558608117, Chap. 8, pp. 239-236, January 2003 (Science & Technology Books)
- Graphical models, Chapter 8 of Pattern Recognition and Machine Learning by Christopher M. Bishop
- Koch, Volker M. (2007). A Factor Graph Approach to Model-Based Signal Separation --- A tutorial-style dissertation
- ^ Pearl, J. (1982) Reverend Bayes on inference engines: A distributed hierarchical approach. Proceedings American Association of Artificial Intelligence National Conference on AI, Pittsburgh, PA, 133--136.
- ^ Kim, J.H. and Pearl, J., (1983) A computational model for combined causal and diagnostic reasoning in inference systems, Proceedings IJCAI-83, Karlsruhe, Germany, 190--193.
- ^ Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (Revised Second Printing) San Francisco, CA: Morgan Kaufmann.
- ^ Y. Weiss. Correctness of Local Probability Propagation in Graphical Models with Loops. Neural Computation, 2000.
- ^ J. Mooij & H. Kappen. Sufficient Conditions for Convergence of the Sum–Product Algorithm. IEEE Transactions on Information Theory 53(12):4422-4437, 2007