Gossip protocol

From Wikipedia, the free encyclopedia

To comply with Wikipedia's quality standards, this article may need to be rewritten.
Please help improve this article. The discussion page may contain suggestions.

This article or section may contain original research or unverified claims.
Please improve the article by adding references. See the talk page for details. (September 2007)

The factual accuracy of this article is disputed.
Please see the relevant discussion on the talk page. (March 2008)

A gossip protocol is a style of computer-to-computer communication protocol inspired by the form of gossip seen in social networks. Modern distributed systems often use gossip protocols to solve problems that might be difficult to solve in other ways, either because the underlying network has an inconvenient structure, is extremely large, or because gossip solutions are sometimes the most efficient ones available.

The term epidemic protocol is sometimes used as a synonym for a gossip protocol, because gossip spreads information in a manner similar to the spread of a virus in a biological community.

1 Gossip Communication
2 Examples
3 Epidemic Algorithms
4 Biased Gossip
5 See also
6 References

[edit] Gossip Communication

We use the term protocol to describe a structured interchange of messages between agents running on a set of computers in a network, designed to achieve some goal. A gossip protocol is a protocol designed to mimic the way that information spreads when people gossip about some fact.

For example, imagine a group of office workers. Ted comments to Sally that he believes Fred is starting to dye his mustache. Sally tells Jill, while Ted repeats the news to Sam. Notice that as people move about and share the news, the number of individuals who know about the rumor doubles "round by round". Of course this is just an approximation; it assumes that employees run into one-another at a constant rate, and doesn't capture the issue of gossiping twice to the same person (perhaps Ted tries to tell his story to Mark, only to find that Mark already heard it from Jill). However, the approximation is a reasonably good one. In computer systems, we typically implement protocols such as this with some form of random "peer selection": at some wired-in frequency, each machine picks some other machine at random and shares any hot rumors.

The power of gossip is that information spreads so robustly. For example, even if Jill had trouble understanding Sally (perhaps she was whispering), she'll probably run into someone else soon and can always learn the news that way.

Expressing these ideas in more technical terms, a gossip protocol is one that satisfies the following conditions:

The core of the protocol involves periodic, pairwise, inter-process interactions
The information exchanged during these interactions is of (small) bounded size
When agents interact, the state of one or both changes in a way that reflects the state of the other. For example, if A pings B just to measure the round-trip time for messages from A to B and back, it isn’t a gossip interaction.
Reliable communication is not assumed
The frequency of the interactions is low compared to typical message latencies, so that the protocol costs are negligible
There is some form of randomness in the peer selection. Peer selection might occur within the full node set, or might be performed in a smaller set of neighbors.

It is useful to distinguish three prevailing styles of gossip protocol.

Dissemination (rumor-mongering) protocols. These use gossip to spread information; they basically work by flooding agents in the network, but in a manner that produces bounded worst-case loads:

a) Event dissemination protocols use gossip to carry out multicasts. They report events, but the events don’t actually trigger the gossip (since gossip runs periodically). One concern here is the potentially high latency from when the event occurs until it is delivered.

b) Background data dissemination protocols continuously gossip about information associated with the participating nodes. Typically, propagation latency isn’t a concern, perhaps because the information in question changes slowly or there is no significant penalty for acting upon slightly stale data.

Anti-entropy protocols for repairing replicated data, which operate by comparing replicas and reconciling differences.
Protocols that compute aggregates. These compute a network-wide aggregate by sampling information at the nodes in the network and combining the values to arrive at a system-wide value – the largest value for some measurement nodes are making, smallest, etc. The key requirement is that the aggregate must be computable by fixed-size pairwise information exchanges; these typically terminate after a number of rounds of information exchange logarithmic in the system size, by which time an all-to-all information flow pattern will have been established. As a side effect of aggregation, it is possible to solve other kinds of problems using gossip; for example, there are gossip protocols that can arrange the nodes in a gossip overlay into a list sorted by node-id (or some other attribute) in logarithmic time using aggregation-style information exchanges. Similarly, there are gossip algorithms that arrange nodes into a tree, and then compute aggregates such as "sum" or "count" by gossiping in a pattern biased to match the tree structure.

This is a rather inclusive definition. Many protocols that predate the earliest use of the term “gossip” fall within this definition. For example, Internet routing protocols often use gossip-like information exchanges.

It is interesting to realize that a gossip substrate can easily implement a standard routed network: nodes could “gossip” about traditional point-to-point messages, effectively tunneling normal traffic through the gossip layer. Bandwidth permitting, this implies that a gossip system can potentially support any classic protocol or implement any classical distributed service. Nonetheless, when we talk of gossip protocols, we rarely intend such a broadly inclusive interpretation. More typically we have in mind protocols that run in a regular, periodic, relatively lazy, symmetric and decentralized manner; the high degree of symmetry among nodes is particularly characteristic. Thus, to give a simple example, while one could run a 2-phase commit protocol over a gossip substrate, piggybacking the messages on gossip traffic, doing so would be at odds with the spirit, if not the wording, of the definition.

Frequently, the most useful gossip protocols turn out to be those with exponentially rapid convergence towards a state that “emerges” with probability 1.0. For example, a classic distributed computing problem involves building some form of tree whose inner nodes are the nodes in a network, and whose edges represent links between computers (for routing, as a dissemination overlay, etc). Not all tree-building protocols are gossip protocols (for example, spanning tree constructions in which a leader initiates a flood), but gossip offers a good decentralized solution that can be useful in many situations.

The term convergently consistent is sometimes used to describe protocols that achieve exponentially rapid spread of information. Notice that for this purpose, a protocol must propagate any new information to all nodes that will be affected by the information within time logarithmic in the size of the system (the “mixing time” must be logarithmic in system size).

[edit] Examples

Suppose that we want to find the object that most closely matches some search pattern, within a network of unknown size, but where the computers are linked to one-another and where each machine is running a small agent program that implements a gossip protocol.

To start the search, a user would ask the local agent to begin to gossip about the search string. (We're assuming that agents either start with a known list of peers, or retrieve this information from some kind of a shared web site.)
Periodically, at some rate (let's say ten times per second, for simplicity), each agent picks some other agent at random, and gossips with it. Search strings known to A will now also be known to B, and vice versa. In the next "round" of gossip A and B will pick additional random peers, maybe C and D. This round-by-round doubling phenomenon makes the protocol very robust, even if some messages get lost, or some of the selected peers are the same or already know about the search string.
On receipt of a search string for the first time, each agent checks its local machine for matching documents.
Agents also gossip about the best match, to date. Thus, if A gossips with B, after the interaction, A will know of the best matches known to B, and vice versa. Best matches will "spread" through the network.

If the messages might get large (for example, if many searches are active all at the same time), a size limit should be introduced. Also, searches should "age out" of the network.

It should be easy to see that within logarithmic time in the size of the network (the number of agents), any new search string will have reached all agents. Within an additional delay of the same approximate length, every agent will learn where the best match can be found. In particular, the agent that started the search will have found the best match.

For example, in a network with 25,000 machines, we can find the best match after about 30 rounds of gossip: 15 to spread the search string and 15 more to discover the best match. A gossip exchange could occur as often as once every tenth of a second without imposing undue load, hence this form of network search could search a big data center in about 3 seconds.

In this scenario, searches might automatically age out of the network after, say, 10 seconds. By then, the initiator knows the answer and there is no point in further gossip about that search.

Gossip protocols have also been proposed for such tasks as maintaining databases or other kinds of files in consistent states, counting the number of nodes in a network of unknown size, spreading news robustly, organizing nodes according to some structuring policy, building so-called overlay networks, computing aggregates, sorting the nodes in a network, electing leaders, etc.

[edit] Epidemic Algorithms

Gossip protocols can be used to propagate information in a manner rather similar to the way that a viral infection spreads in a biological population. Indeed, the mathematics of epidemics are often used to model the mathematics of gossip communication. The term epidemic algorithm is sometimes employed when describing a software system in which this kind of gossip-based information propagation is employed.

[edit] Biased Gossip

Above, we described a purely random peer-selection scheme for gossip: when agent A decides to run a gossip round, it picks some peer B uniformly and at random within the network as a whole (or launches a message on a random walk that will terminate at a random agent). More commonly, gossip algorithms are designed so that agents interact mostly with nearby agents, and only sometimes with agents that are far away (in terms of network delay). These biased gossip protocols need to ensure a sufficient degree of connectivity to avoid the risk of complete disconnection of one side of a network from the other, but if care is taken, can be faster and more efficient than protocols that are purely random. Moreover, as a purely practical question, it is much easier to maintain lists of peers in ways that might be somewhat biased.

[edit] See also

Gossip protocols are just one of many distributed computing protocols. See also virtual synchrony, distributed state machines, Paxos algorithm, database transactions.

[edit] References

Here are some additional references to recent work from the gossip community. The paper by Demers [3] is considered by most researchers to be the first to have really recognized the power of these protocols and to propose a formal treatment of gossip.

[1] Correctness of a Gossip-based Membership Protocol. André Allavena, Alan Demers and John Hopcroft. Proc. 24th ACM Symposium on the Principle of Distributed Computing (PODC 2005).

[2] Bimodal Multicast. Kenneth P. Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu and Yaron Minsky. ACM Transactions on Computer Systems, Vol. 17, No. 2, pp 41-88, May, 1999.

[3] Epidemic algorithms for replicated database management. Alan Demers, et al. Proc. 6th ACM PODC, Vancouver BC, 1987.

[4] Lightweight probabilistic broadcast. Patrick Eugster, Rashid Guerraoui, S. B. Handurukande, Petr Kouznetsov, Anne-Marie Kermarrec. ACM Transactions on Computer Systems (TOCS) 21:4, Nov 2003.

[5] Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead. Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, Robbert van Renesse. Proc. 2nd International Workshop on Peer-to-Peer Systems (IPTPS '03)

[6] Systematic Design of P2P Technologies for Distributed Systems. Indranil Gupta, Global Data Management, eds: R. Baldoni, G. Cortese, F. Davide and A. Melpignano, 2006.

[7] Efficient and Adaptive Epidemic-Style Protocols for Reliable and Scalable Multicast. Indranil Gupta, Ayalvadi J. Ganesh, Anne-Marie Kermarrec. IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 7, pp. 593-605, July, 2006.

[8] T-Man: Gossip-based overlay topology management. Márk Jelasity and Ozalp Babaoglu. Engineering Self-Organising Systems: Third International Workshop (ESOA 2005), Springer-Verlag LNCS 3910 (2006).

[9] Gossip-based aggregation in large dynamic networks. Márk Jelasity, Alberto Montresor, and Ozalp Babaoglu. ACM Transactions on Computer Systems, 23(3):219–252, August 2005.

[10] Ordered slicing of very large overlay networks. Márk Jelasity and Anne-Marie Kermarrec. IEEE P2P, 2006.

[11] Proximity-aware superpeer overlay topologies. Gian Paolo Jesi, Alberto Montresor, and Ozalp Babaoglu. Proc SelfMan 06. Spinger-Verlag LNCS 399, Dublin, Ireland, June 2006.

[12] Spatial gossip and resource location protocols. David Kempe, Jon Kleinberg, Alan Demers. Journal of the ACM (JACM) 51: 6 (Nov 2004).

[13] Gossip-Based Computation of Aggregate Information. David Kempe, Alin Dobra, Johannes Gehrke. Proc. 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS). 2003.

[14] Active and Passive Techniques for Group Size Estimation in Large-Scale and Dynamic Distributed Systems. Dionysios Kostoulas , Dimitrios Psaltoulis, Indranil Gupta, Ken Birman, Al Demers. Elsevier Journal of Systems and Software, 2007.

[15] Build One, Get One Free: Leveraging the Coexistence of Multiple P2P Overlay Networks. Balasubramaneyam Maniymaran, Marin Bertier and Anne-Marie Kermarrec. Proc. ICDCS, June 2007.

[16] Peer counting and sampling in overlay networks: random walk methods. Laurent Massoulié, Erwan Le Merrer, Anne-Marie Kermarrec, Ayalvadi Ganesh. Proc. 25th ACM PODC. Denver, 2006.

[17] Chord on Demand. Alberto Montresor, Márk Jelasity, and Ozalp Babaoglu. Proc. 5th Conference on Peer-to-Peer Computing (P2P), Konstanz, Germany, August 2005.

[18] Introduction to Expander Graphs. Michael Nielsen. http://www.qinfo.org/people/nielsen/blog/archive/notes/expander_graphs.pdf. Technical report, June 2005.

[19] Building low-diameter P2P networks. G. Pandurangan, P. Raghavan, Eli Upfal. In Proceedings of the 42nd Symposium on Foundations of Computer Science (FOCS), 2001.

[20] Astrolabe: A Robust and Scalable Technology for Distributed System Monitoring, Management, and Data Mining. Robbert van Renesse, Kenneth Birman and Werner Vogels. ACM Transactions on Computer Systems (TOCS) 21:2, May 2003.

[21] Exploiting Semantic Proximity in Peer-to-peer Content Searching. S. Voulgaris, A.-M. Kermarrec, L. Massoulie, M. van Steen. Proc. 10th Int'l Workshop on Future Trends in Distributed Computing Systems (FTDCS 2004), Suzhou, China, May 2004.

Although this textbook is old, many gossip researchers cite it as an authoratve source for information about the mathematical modelling of gossip and epidemic protocols:

The Mathematical Theory of Epidemics. N.J.T. Bailey, 1957. Griffen Press.

Categories: Distributed computing

See also ebooksgratis.com: no banners, no cookies, totally FREE.

Gossip protocol

From Wikipedia, the free encyclopedia

Contents

[edit] Gossip Communication

[edit] Examples

[edit] Epidemic Algorithms

[edit] Biased Gossip

[edit] See also

[edit] References

Views

Navigation

Interaction

Search