Event Correlation

From Wikipedia, the free encyclopedia

Event Correlation is a technique for making sense of a large number of events and pinpointing the few events that are really important in that mass of information. It has been notably used in Telecommunications and Industrial Process Control since the 1970s, in Network Management and Systems Management since the 1980s, in Service Level Management and Event-Based Systems since the 1990s, and in Business Activity Monitoring since the early 2000s.

In Network Management, Systems Management and Service Level Management, Event Correlation usually takes place inside the Management Platform.

In ITIL parlance, Event Correlation is part of Support Management.

Event Correlation is implemented by a piece of software known as the Event Correlator. This tool is automatically fed with events originating from managed elements, monitoring tools or the Trouble Ticket System. Each event captures something special (from the event source standpoint) that happened in the domain of interest to the Event Correlator (e.g., the reboot of a device, a Service-Level Objective that is not met for a given customer, or the CPU of an e-business server that is used at 100% for over 15 minutes).

An event may convey an alarm or report an incident (which explains why Event Correlation used to be called Alarm Correlation), but not necessarily. It may also report that a situation has returned to normal, or simply carry information (e.g., policy P has been updated on device D). The severity of an event is an indication, given by the event source to the event destination, of the priority that the event should receive during processing.
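As an illustration only, an event and its severity could be modeled as below. The field names, the four-level severity scale and the by_priority helper are assumptions made for this sketch, not part of any particular Management Platform.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale; real event sources define their own.
    INFO = 0
    WARNING = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Event:
    source: str         # managed element or tool that emitted the event
    kind: str           # what happened, e.g. "reboot" or "policy_updated"
    severity: Severity  # priority hint set by the event source
    timestamp: float    # seconds since the epoch


def by_priority(events):
    """Order events so that the most severe, then the oldest, come first."""
    return sorted(events, key=lambda e: (-e.severity, e.timestamp))


events = [
    Event("device-D", "policy_updated", Severity.INFO, 100.0),
    Event("server-S", "cpu_overload", Severity.CRITICAL, 105.0),
    Event("router-R", "reboot", Severity.MAJOR, 101.0),
]
ordered = by_priority(events)
```

Here the informational policy update is processed last, even though it arrived first.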

Upon receiving events, the Event Correlator discards those that it deems irrelevant. Next, it merges duplicate events, aggregates events that together tell the same story, and ignores events from systems that sit downstream of a failed system. Finally, the Event Correlator performs Root Cause Analysis to identify, through dependency analysis, which events can be explained by a single one (the root cause).

At this stage, the Event Correlator is left with at most a handful of events that need to be acted upon. Strictly speaking, Event Correlation ends here. However, by a loose use of the term, the Event Correlators found on the market (e.g., in Network Management) can also include problem-solving capabilities that trigger corrective actions or further investigations automatically. Such functionality is not covered here.

Event correlation can be decomposed into four steps:

Event Filtering
Event Aggregation
Event Masking
Root Cause Analysis
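One possible way to picture this decomposition is as a chain of steps, each taking a list of events and returning a shorter one. The sketch below is a hypothetical illustration: the correlate function, the dictionary-based event format and the two placeholder steps are assumptions, and steps for masking and root cause analysis would be added to the chain in the same way.

```python
from functools import reduce


def correlate(events, steps):
    """Run the events through each step in order; a step maps a list to a list."""
    return reduce(lambda remaining, step: step(remaining), steps, list(events))


# Placeholder steps (hypothetical); real ones would implement the four stages.
def drop_informational(events):
    return [e for e in events if e["severity"] != "info"]


def merge_duplicates(events):
    seen, merged = set(), []
    for e in events:
        key = (e["source"], e["kind"])
        if key not in seen:
            seen.add(key)
            merged.append(e)
    return merged


events = [
    {"source": "router-R", "kind": "link_down", "severity": "major"},
    {"source": "router-R", "kind": "link_down", "severity": "major"},
    {"source": "device-D", "kind": "policy_updated", "severity": "info"},
]
remaining = correlate(events, [drop_informational, merge_duplicates])
```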


Event Filtering

Event Filtering consists of discarding events that the Event Correlator deems irrelevant. For instance, a number of low-end devices are difficult to configure and occasionally send events of no interest to a centralized management platform (e.g., printer P needs A4 paper in tray 1). Another example is the filtering of informational or debugging events by an Event Correlator that is only interested in availability and faults.
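A minimal sketch of such a filter follows, assuming events are dictionaries with a numeric severity (higher means more severe). The make_filter helper, the event kinds and the threshold are all hypothetical.

```python
def make_filter(ignored_kinds, min_severity):
    """Build a predicate that keeps only events of interest."""
    def keep(event):
        return (event["kind"] not in ignored_kinds
                and event["severity"] >= min_severity)
    return keep


events = [
    {"source": "printer-P", "kind": "paper_low", "severity": 1},
    {"source": "server-S", "kind": "debug_trace", "severity": 0},
    {"source": "router-R", "kind": "link_down", "severity": 3},
]

# Ignore paper-tray notifications and anything below severity 1.
keep = make_filter(ignored_kinds={"paper_low"}, min_severity=1)
relevant = [e for e in events if keep(e)]
```

Only the link_down event from router-R survives: the printer event is dropped by kind, the debugging event by severity.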

Event Aggregation

Event Aggregation (also known as Event De-duplication) consists of merging duplicates of the same event. Such duplicates may be caused by network instability (e.g., the same event is sent twice by the source because the first instance was not acknowledged quickly enough, but both instances eventually reach the event destination). Another example is temporal aggregation, when the same event is sent over and over again by the source until the problem is solved.
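Temporal aggregation can be sketched as follows, under the assumption that two events are duplicates when they share the same source and kind and arrive within a fixed time window of each other. The aggregate function, the window length and the field names are illustrative choices only.

```python
def aggregate(events, window):
    """Merge events with the same (source, kind) that arrive within
    `window` seconds of the previous duplicate."""
    merged = []
    latest = {}  # (source, kind) -> index in `merged` of the open aggregate
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = (event["source"], event["kind"])
        idx = latest.get(key)
        if idx is not None and event["timestamp"] - merged[idx]["last_seen"] <= window:
            # Duplicate: extend the existing aggregate instead of adding a new event.
            merged[idx]["count"] += 1
            merged[idx]["last_seen"] = event["timestamp"]
        else:
            latest[key] = len(merged)
            merged.append({
                "source": event["source"],
                "kind": event["kind"],
                "first_seen": event["timestamp"],
                "last_seen": event["timestamp"],
                "count": 1,
            })
    return merged


events = [
    {"source": "router-R", "kind": "link_down", "timestamp": 0},
    {"source": "server-S", "kind": "cpu_overload", "timestamp": 1},
    {"source": "router-R", "kind": "link_down", "timestamp": 2},
    {"source": "router-R", "kind": "link_down", "timestamp": 4},
    {"source": "router-R", "kind": "link_down", "timestamp": 100},
]
merged = aggregate(events, window=5)
```

The three link_down events sent in quick succession collapse into one aggregate with a count of 3, while the one sent much later is treated as a new occurrence.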

Event Masking

Event Masking (also known as Topological Masking in Network Management) consists of ignoring events pertaining to systems that are downstream of a failed system. For example, servers that are downstream of a crashed router will fail availability polling, so the events reporting them as unavailable can be masked by the event reporting the router failure.
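The sketch below illustrates topological masking under simplifying assumptions: each system has at most one upstream neighbor on the path to the management station, and a failed system is one that reported an event of kind "down". The upstream map, the event format and the function names are hypothetical.

```python
def upstream_chain(node, upstream):
    """Yield the systems upstream of `node`, nearest first."""
    seen = set()
    current = upstream.get(node)
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        yield current
        current = upstream.get(current)


def mask(events, upstream):
    """Drop events whose source sits downstream of a failed system."""
    failed = {e["source"] for e in events if e["kind"] == "down"}
    return [
        e for e in events
        if not any(n in failed for n in upstream_chain(e["source"], upstream))
    ]


# Each system maps to its upstream neighbor (toward the management station).
upstream = {
    "server-A": "router-R",
    "server-B": "router-R",
    "router-R": "core-C",
    "server-X": "core-C",
}
events = [
    {"source": "router-R", "kind": "down"},
    {"source": "server-A", "kind": "down"},
    {"source": "server-B", "kind": "down"},
    {"source": "server-X", "kind": "down"},
]
unmasked = mask(events, upstream)
```

The events for server-A and server-B are masked because router-R, which lies upstream of both, has failed. The event for server-X is kept because nothing upstream of it is reported as failed.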

Root Cause Analysis

Root Cause Analysis is the last and most complex step of Event Correlation. It consists of analyzing dependencies between events, based on a model of the environment and dependency graphs, to detect whether some events can be explained by others. For example, if database D runs on server S and this server is overloaded for a sustained period (CPU used at 100% for a long time), the event “the SLA for database D is no longer fulfilled” can be explained by the event “Server S is durably overloaded”.
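A much simplified sketch of dependency-based analysis is given below. It assumes that an event is explained whenever any element its own element depends on, directly or transitively, also reported an event. Real Event Correlators use richer models; the depends_on graph, the element names and the root_causes function are hypothetical.

```python
def root_causes(events, depends_on):
    """Return the events that no other event explains."""
    affected = {e["element"] for e in events}

    def explained(element, visited):
        # Walk the dependency graph looking for an element that also has an event.
        for dep in depends_on.get(element, ()):
            if dep in visited:
                continue
            visited.add(dep)
            if dep in affected or explained(dep, visited):
                return True
        return False

    return [e for e in events if not explained(e["element"], {e["element"]})]


# Element -> elements it depends on.
depends_on = {
    "webapp-W": ["database-D"],
    "database-D": ["server-S"],
}
events = [
    {"element": "webapp-W", "kind": "slow_response"},
    {"element": "database-D", "kind": "sla_violated"},
    {"element": "server-S", "kind": "cpu_overload"},
    {"element": "switch-X", "kind": "fan_failure"},
]
causes = root_causes(events, depends_on)
```

With this model, the events on webapp-W and database-D are both explained by the overload of server-S, whereas the unrelated fan failure on switch-X remains a separate root cause.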

Role of Event Correlation in Integrated Management

The point of Integrated Management is to integrate the management of networks, systems and IT services in organizations. The Event Correlator plays a key role in this integration, for only there do network, system and service events come together. For instance, this is where the failure of a service can be ascribed to a specific failure in the underlying IT infrastructure.

Most Event Correlators can receive events from Trouble Ticket Systems. However, only some of them are currently able to notify Trouble Ticket Systems when a problem is solved, which partly explains why Service Desks find it difficult to stay up to date. The integration of management in organizations requires communication between the Event Correlator and the Trouble Ticket System to work both ways.

