|
With advances in computer technology and sensor technology, various applications employ automated real-time, rule-based systems in which application-critical data are collected from various sensors and are processed and correlated based on predefined rules for identifying problems, diagnosing their root causes, and taking corrective action. Such applications include those used for computer availability and performance management, intrusion detection, and other surveillance tasks. In this paper, we focus on the off-line data analysis for constructing and maintaining correlation rules, and we apply our techniques to systems management tasks.
Figure 1 illustrates the event management task for a complex computer system and our vision of how it can be improved. The area above the dashed line depicts a typical on-line monitoring system such as Tivoli's TEC1 and CA's Unicenter.2 A raw event is generated when the state of a device changes (e.g., the router's interface goes down) or an exceptional event occurs (e.g., CPU utilization goes above 90 percent). Raw events generated by various devices and sensors flow into the event management system for automated operation (e.g., Reference 3), in which raw events are parsed and stored. Then a correlation engine uses correlation rules to interpret these events. Some events are filtered, others are coalesced. Some of them trigger notification mechanisms, such as alarms and e-mails, and some trigger automatic recovery actions. Correlation rules are structured as if-then statements. An example of a correlation rule is the following:
-
If a network card failure event is followed by an interface failure event occurring on a router, within a five-minute window, then send an e-mail to the network support supervisor.
The condition, or if part, describes a situation in which actions are to be taken. The action, or then part, details what is to be done if the conditions of the if part are satisfied. In our example, the action to be taken is sending an e-mail for notification.
Figure 1
The if part of a rule is usually difficult to construct, because we must know the situation to be addressed before an action can be identified. Two broad approaches are used to specify the condition part of rules. The first approach is based on models, such as exploiting topology information to make inferences about connectivity (see, for example, Reference 4). Even so, there remains a broad range of problems that cannot be modeled easily. The second approach is a knowledge-based approach, in which human experts are consulted for constructing correlation rules. For example, Thoenen et al.5 developed an event management and design methodology that has been widely used by IGS (IBM Global Services) consultants over the past three years. The core of this methodology is a graphic representation called Event Relationship Network (ERN) that describes how events are correlated. For example, an ERN graph can be used to represent the if part of our previous example, in which network card failure and interface failure are drawn as two nodes, and a link between these two events indicates that these two events are correlated. More details are found in the section ERN constructor, later in this paper. Clearly, ERNs provide a means for an expert to design correlation rules graphically at a higher level, independent of a specific correlation engine. In current practice ERNs are constructed manually and may not always be complete or correct.
We describe in this paper an alternative approachdata-driven rule constructionin which we apply data mining in order to identify patterns used in constructing correlation rules. The lower part of Figure 1 illustrates the three components involved: Event Browser, Event Miner, and Event Correlation Constructor. Event Browser6 provides an interactive environment for a user to visualize and explore large volumes of event data. Event Miner7 employs data mining techniques to automatically search for patterns. Event Correlation Constructor takes as inputs existing ERNs, if any, as well as the results of Event Miner, and produces annotated ERNs that will be further presented to domain experts to review and construct correlation rules. Further, these tools are integrated to provide (A) seamless operation and early insights through visualization, (B) pattern discovery through event mining, and (C) assistance in knowledge representation of ERNs. Preliminary results have been reported previously. References 6 and 7 address the system aspect of Event Browser and Event Miner. Reference 8 discusses our early results of pattern discovery. References 9 and 10 describe several case studies. This paper provides a more systematic treatment of the problems we address and the algorithms employed. We also discuss how our data mining approach can be used together with traditional knowledge-based approaches.
Specifically, we discuss pattern discovery from large data sets of historical events. We define several types of patterns that are pertinent to systems management and propose algorithms for efficiently discovering those patterns. We note here that there are some key differences between our work and previous work, especially regarding frequent pattern discovery and frequent association rules11,12 in data mining. In applying data mining to systems management tasks, we face some fairly unique challenges. For example, severe operation problems are of great interest in this domain. However, they do not occur frequently enough in well-maintained production environments. Hence the popular frequent itemset mining requires different types of patterns that also address important but less frequently occurring phenomena. Furthermore, event data usually have multiple attributes, usually from six to thirty. There is no apparent way to transform event data to transactional data. Hence this motivates us to propose a type of pattern that explores all possible selections of attributes for grouping and itemizing events.
We also discuss knowledge validation and completion from historical event data. This is important because considerable domain knowledge may be required to correlate event data. For example, ERNs provide a graphic representation of event correlation. To utilize existing correlation knowledge while exploring historical data, we develop a mixed system that is capable of validating existing knowledge as well as constructing new knowledge.
The remainder of the paper is organized as follows. In the next section we describe traditional data mining and related work as they pertain to mining events. Then we provide a motivating example and discuss the unique aspects of event mining. In the section that follows, we focus on the three main data patterns and the algorithms used for their discovery. In the next section we present a unified framework to handle multiple attributes. In the section ERN Constructor we develop our method for validating correlation knowledge. Our conclusions are contained in the last section.
Overview of data mining
In this section we provide a brief overview of data mining as it pertains to the analysis of event data. We begin by describing the market-basket analysis, the context in which data mining was first proposed. Then, we discuss efficiency considerations, a topic of particular importance given the large size of event histories.
Market-basket analysis. Market-basket analysis13 originates from analyzing data from supermarkets, in which each supermarket customer has a basket of purchased goods. The main goal is to find association rules, according to which purchasing a set of items indicates that another set of items is likely to be also purchased. For example, as one early study found, when diapers are purchased, beer is frequently purchased as well. Association rules indicate a one-way dependency. For example, purchasers of beer are, in general, not particularly inclined to buy diapers.
We introduce some notations. Let I be the set of items that can be purchased. Thus, each market basket contains a subset of these items. We use T = {Ti}Ni=1, where the i th transaction Ti I, to denote a set of N market baskets.
A key data mining problem is to find sets of items, typically referred to as itemsets or patterns, with occurrences above a predefined threshold called minimum support (minsup). A second and closely related problem is prediction, in which we are looking for patterns that have a high probability of predicting that a given item will be in the same basket. The metric used is confidence, and it is expressed as a conditional probability.
Generic level-wise search algorithm. A naive approach of frequent itemset mining is to exhaustively examine every possible pattern. This is computationally infeasible because the search space is huge, O(nk) to be precise, where n is the number of distinct items and k is the maximal length of an itemset. It is not uncommon that n can be 1000 or more. Thus, this brute-force search is computationally intractable even for modest k values.
Fortunately, the search for frequent patterns can be made more efficient. Doing so rests on the following observation: The support for a pattern E can be no greater than the support for its subset. Put differently, if one subset of E is not frequent, then E cannot be frequent. This means that if we find a pattern with low support, there is no need to consider any pattern that contains that pattern. This property, referred to as downward closure property14 or anti-monotonicity,15 guarantees that we will not miss any frequent pattern while eliminating search space based on the current level. A level-wise search algorithm called apriori 11,13 was developed to efficiently discover frequent patterns from large market-basket data. Clearly, such a level-wise algorithm can be generalized to discover any type of pattern satisfying the downward closure property. Algorithm 1 shows a generic level-wise algorithm.
Algorithm 1: Generic level-wise algorithm
Input: Transaction data T
Output: All patterns
-
Find all qualified patterns of size 1: Q1; i = 2.
-
Ci = candidate patterns at level i based on qualified patterns at level i
- 1.
-
Scan data to obtain the count of each candidate pattern in Ci.
-
Find qualified patterns at level i. Qi = {c|c
Ci and c is qualified}.
-
i = i + 1; go to (2) until there is no more qualified pattern.
A typical level-wise algorithm has four steps. The initialization step finds qualified patterns with the smallest size. This is often done by exhaustive search. Then, the candidate patterns (Ci ) are constructed by using qualified patterns in the previous level. This can be done through joining patterns in Ci1 followed by a pruning operation.13 Next, data are scanned to count instances of candidate patterns. Some care is needed here to efficiently count candidate patterns. See Reference 13 for details. Last, the qualified patterns (Qi ) are found by checking the qualification condition. For frequent patterns, we need only to check whether the count of a candidate is above minsup.
We note that the downward closure property holds for some patterns and not for others. In particular, downward closure does not hold for the confidence of association rules and neither does the 2 test. Also, we note that although we focus on level-wise search algorithms in this paper, much work has been developed to improve the algorithm by exploring different search strategies (see References 1618 and references therein). All this work is built on the downward closure property of frequent itemsets.
Related work. Our approach makes use of data mining. Data mining is a mixture of statistical, machine learning, and data management techniques that provides a way to mine categorical data so as to find interesting combinations (e.g., the condition cold start trap is often preceded by the condition CPU threshold violated). Considerable work has been done in mining transactional data (e.g., supermarket purchases), much of which is based on References 11 and 13. For event data, time plays an important role. Follow-on research has pursued two directions that address this requirement. The first, sequential data mining (see for example Reference 19), takes into account the sequences of events rather than just their occurrence. The second, temporal data mining (e.g., Reference 12), considers the time between event occurrences. Data mining has been applied to numerous domains. Reference 20 discusses predictive rules for capital markets. Reference 21 describes approaches to finding patterns in Web accesses. Reference 22 discusses prediction of defects in disk drives. Reference 12 addresses sequential mining in the context of telecommunications events. The last one, although closely related to our interests, uses event data to discover temporal associations, not to identify characteristic patterns and their interpretation. Thus, although much foundational work has been done in data mining and some consideration has been given to mining event data, no one has studied the specific patterns that arise in enterprise event management.
Event mining
Figure 2A is a scatter plot of event data collected from a corporate intranet over a three-day period. The events consist of SNMP (Simple Network Management Protocol) traps such as threshold violated, connection-closed, port-up, and port-down. The x-axis represents time whereas the y-axis designates the host where the event originated. There are 149 hosts, numbered 1 through 149. A dot at (x, y) means an event occurred on host y at time x. Note that although this plot contains a considerable amount of information, few patterns are easily identifiable.
Figure 2
Figure 2B shows the same data, and the hosts are algorithmically ordered in a way to reveal patterns (the algorithms are described in References 23 and 24). Many of these patterns are used for constructing correlation rules. For example, pattern 1 consists of threshold violated and threshold reset events that occur every 30 seconds. Such a pattern may be indicative of hosts nearing their capacity limits. Pattern 2 has a cloud-like appearance that consists of port-up and port-down events generated as a result of mobile users connecting to, and disconnecting from, hubs. Such patterns are probably of little interest to the operations staff and hence should be filtered out since they represent normal behavior. Pattern 3, consisting of events occurring every day at 2:00 P.M., represents SNMP request and authentication failure events. This is most likely due to an improperly configured monitor. Pattern 4 is a series of link-up and link-down events, the result of a software problem on a group of hubs.
In a well-managed installation, errors are rare. Thus months of data are needed to identify actionable abnormalities. The volume of data can be substantial. For example, several installations we examined routinely collect five million events per week. Given the large volume of data and the different time scales at which patterns may be present, it is difficult to systematically identify patterns by relying only on visual inspection by a human. Clearly, automatic pattern discovery is needed.
For event data, the previously described concept of a market basket does not apply. However, each event has a timestamp and so looking for patterns means looking at events that co-occur within a time range. These ranges may be time windows (either fixed or variable size) or they may be contiguous segments of data that are characterized in some other way. In the data mining literature, this problem is referred to as temporal mining or temporal association.12
It is usually the case that we are looking for patterns related to problem-related situations, which occur infrequently. The frequently occurring patterns in systems management are usually related to normal operation. This requires that patterns be defined differently. In the next section we define three types of patterns pertinent to system management tasks: event burst, periodic pattern, and mutually dependent pattern.
When dealing with event mining, we often need to consider multiple attributes for characterizing membership in itemsets (or patterns). The event type attribute describes the nature of the event. The event origin attribute specifies the source of the event, a combination of the host where the event originated and the process and/or application that generated the event. In addition to type and origin, there is a plethora of other attributes that depend on these two, such as the port associated with a port down event and the threshold value and metric in a threshold violated event. The section Multiattribute frequent pattern mining, later, develops a framework and an efficient mining algorithm for systematically exploring patterns with various membership definitions.
When there exists rich domain knowledge, and especially correlation knowledge associated with systems management event data, the question arises, how do we incorporate this knowledge in mining for historical events? The section ERN constructor presents a method and an implementation that validate existing correlation knowledge and construct new knowledge from historical data.
Patterns and pattern discovery algorithms
This section describes commonly occurring patterns in systems management tasks as illustrated in Figure 2. Specifically, we discuss event bursts (patterns 3 and 4 in Figure 2B), periodic patterns (patterns 1 and 2), and mutually dependent patterns (pattern 3).
Event burst analysis. Event bursts (or event storms) may arise under several circumstances. When a critical element fails in a network that lacks sufficient redundancy (e.g., there is only one name server in the network and it fails), communications are impaired thereby causing numerous cannot reach destination events to be generated in a short time period. Or, when a cascading problem occurs, such as the one caused by a virus or by switching loads after a failure, it may result in additional failures due to heavier load.
Figure 3 provides a means for visual identification of event bursts in our corporate intranet data. The plot in the lower left contains the raw data in the same form as in Figure 2B. Given the coarse time scale of the plot relative to the granularity of event arrivals, there are many cases in which more than one event occupies the same pixel. As a result, it is difficult to discern event rates visually. We could drill down to various sections of the plot to better determine event rates, but this is labor-intensive.
Figure 3
Instead, the upper left plot summarizes the rates of events for a specific window size (as indicated in the lower left). The table in the upper right of Figure 3 summarizes those situations in which large event rates are present. This provides a convenient way to select subsets of the data to study in detail.
Mining event bursts consists of two steps.
-
Finding periods in which event rates are higher than a specified threshold
-
Mining for patterns common to the periods identified in Step 1
For Step 1, we proceed by first intervalizing the data. Then, event rates within each interval are computed. Those intervals in which rates exceed a specified threshold are then identified. In Figure 2B, these intervals are indicated by the vertical lines that lie above the threshold (which is indicated by the horizontal line).
Step 2 uses the intervals identified in Step 1 as the market baskets of events. For example, mining the three intervals with the largest event rates in Figure 3 finds the pattern SNMP request, Authentication Failure. Note that the mining employed here is essentially that performed by Algorithm 1. However, our market baskets are just those intervals that have high event rates.
Partially periodic event patterns. Periodic patterns consist of repeated occurrences of the same event or event set. Our experience has been that such patterns are common in event data, often accounting for one half to two thirds of the events present.
Periodic behaviors are common in networks. Two factors contribute to this phenomenon. The first is the monitoring modelwhen a managed element emits a high-severity event, the management server often initiates periodic monitoring of key resources (e.g., router CPU utilization). The second is the routine scheduling of maintenance tasks, such as rebooting print servers every morning or backing up data every week.
Our experience with analyzing events in computer networks suggests that periodic patterns often lead to actionable insights. Indeed, a periodic pattern indicates a persistent and predictable behavior, valuable in identifying and characterizing the periodicity. In addition, the period itself often provides a signature of the underlying phenomenon, thereby facilitating diagnosis. In either case, patterns with a very low support are often of great interest. For example, we found a one-day periodic pattern due to a periodic port scan. Although this pattern only happens three times in a three-day log, it is a strong indication of a potential intrusion.
Unfortunately, mining such periodic patterns is complicated by several factors.
-
Periodic behaviors are not necessarily persistent. For example, in complex networks, periodic monitoring is initiated when an exception occurs (e.g.,
CPU utilization exceeds a threshold) and stopped once exceptional situation is no longer present. During the monitoring interval or on segment, the monitoring request and its response occur periodically. The off segment consists of a random gap in the periodicity until another exceptional situation initiates periodic monitoring. This makes it difficult to apply well-established techniques such as the fast Fourier transforms.
-
There may be phase shifts and variations in the period due to network delays, lack of clock synchronization, and rounding errors.
-
Period lengths are not known in advance. This means that either an exhaustive search is required or there must be a way to infer the periods. Further, periods may span a wide range, from milliseconds to days.
-
The number of occurrences of a periodic pattern typically depends on the period. For example, a pattern with a period of one day has, at most, seven occurrences in a week, whereas one with a one-minute period may have as many as 1440 occurrences in a day. Thus, mining patterns with longer periods requires adjusting support levels. In particular, mining patterns with low support greatly increases computational requirements in existing approaches to discovering temporal associations.
In order to capture all the factors above, we employ the concept of partially periodic temporal association. We refer to it as a p-pattern. P-patterns generalize the concept of partial periodicity25 defined by combining it with temporal associations (akin to episodes in Reference 12) and including the concept of time tolerance to account for imperfections in the periodicities.
Figure 4 illustrates the structure of a partially periodic pattern. Such patterns consist of an on segment and an off segment. During the on segment, events are periodic with a period of 3. No periodic event is present during the off segment. Spurious events (or noise) may arise during both on segments and off segments.
Figure 4
Pattern 1 in Figure 2B is an example of a partial periodicity. These partial periodicities contain two types of events: threshold violated and threshold reset. Here the periodicities occur approximately every 30 seconds, although some are closer to 28 seconds and others are near 33 seconds.
Algorithm. Mining for p-patterns consists of two steps: (1) finding period lengths for each event type; and (2) finding temporal associations. Although a variation of level-wise search can be employed to address the second subtask, the first subtask has not been addressed (to the best of our knowledge). Our approach to finding the periods of p-patterns is to compute event interarrival times and then test if interarrival counts exceed what would have been expected by chance. Note that a simple threshold test is not sufficient here, since small interarrival times are much more common than longer ones and hence the threshold must be adjusted by the size of the interarrival time. We address this by using a Poisson distribution as our null hypothesis for the count of events at specified interarrival times. A 2 test is used to assess statistical significance. Next, we mine for the patterns at each statistically significant interarrival time. This is done by level-wise search on each interval that comprises each period. The details can be found in Reference 26.
Results. Table 1 displays the results of mining p-patterns in the corporate intranet data. The left-most column indicates the number of events in the pattern (i.e., size of the itemset). Note that patterns range in size from 1 event to 13 events. Column two specifies the number of candidate patterns searched for each pattern size. Patterns are distinguished by their host, event type, and period. Column three lists the number of distinct p-patterns. The last two columns specify the range of periods and the number of occurrences of patterns listed in each row. Note that the table includes each pattern only once (i.e., subpatterns are not listed).
|
|
Table 1 Result of mining p-patterns
|
| |
| |
| |
| |
| |
Itemset Size
|
Candidate Size
|
Distinct p-patterns
|
Min:Max Periods
|
Min:Max Count
|
|
|
1
| | |
100
| | |
28
| |
0:1-day
|
6:680
|
|
2
| | |
307
| | |
22
| |
0:300
|
3:689
|
|
3
| | |
938
| | |
5
| |
0:30
|
3:8
|
|
4
| | |
1917
| | |
1
| |
4
|
3
|
|
5
| | |
3010
| | |
5
| |
4
|
3
|
|
6
| | |
3525
| | |
3
| |
4
|
3
|
|
7
| | |
3104
| | |
0
| |
|
|
|
8
| | |
2057
| | |
2
| |
4:1-day
|
3
|
|
9
| | |
1017
| | |
0
| |
|
|
|
10
| | |
366
| | |
1
| |
1 day
|
5
|
|
11
| | |
91
| | |
2
| |
1 day
|
5
|
|
12
| | |
14
| | |
1
| |
20
|
20
|
|
13
| | |
1
| | |
1
| |
10
|
21
|
|
Reprinted with permission from S. Ma and J. L. Hellerstein, Mining Partially Periodic Event Patterns with Unknown Periods, Table 3, Proceedings of the 2001 International Conference on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001, IEEE, New York (© 2001 IEEE).
Mutually dependent patterns. As we discussed before, in a systems management application, normal behavior dominates; abnormal behavior, such as failures and intrusions, is rare. Thus, it is important to discover infrequent patterns that relate to problem situations. Consider the example in Reference 8 in which the following events are generated on a router: network interface card failure, unreachable destination, and cold start trap. The last event indicates that the router has failed and restarted. If these events commonly occur together, then the first two may provide advance warning of an occurrence of the third. In Figure 2B, pattern 4 is an m-pattern. It consists of a combination of link down and link up events.
One naive approach to mining for infrequent patterns is to discover all frequent patterns (e.g., using the apriori algorithm) with very low minimum support (minsup), and then apply postprocessing to discover significant patterns. This approach, however, is impractical for finding infrequent patterns, because the minimum support in the apriori algorithm must be set very low, which in turn results in long processing times as well as many irrelevant patterns that must be examined. Irrelevant patterns (i.e., patterns occurring simply by chance) are particularly problematic in real applications with a wide disparity in the frequency of items. As an example, some events have a very high probability of occurrence, such as connection closed events issued by network hubs that support the Dynamic Host Configuration Protocol (DHCP). As a result, there are a large number of patterns that have connection closed in them even though this event has little correlation with other events.
With this background, we define an m-pattern to be an itemset for which any subset is significantly mutually dependent on all other subsets. That is, if any subset of an m-pattern is present, the remaining items occur with high probability. This is formalized26 as follows.
Definition 1: A nonempty itemset E is an m-pattern with minimum mutual dependence threshold minp, iff
PD (E1|E2) minp
|
(1)
|
holds for any nonempty subsets E1 and E2 of E, where 0 minp 1.
Note that this definition does not refer to a support level. Thus, unlike frequent associations, m-patterns will be discovered regardless of the frequency of their occurrences. However, we note that it is quite reasonable to consider frequent m-patterns that use both minp and minsup.
Figure 5 compares m-patterns and frequent patterns. Here, a, b, c, and d are events, and the intervalization consists of two time units (as indicated by the dashed lines). Suppose that: (1) the support threshold for a frequent pattern is 40 percent (i.e., a pattern must appear in 40 percent of the intervals to be considered frequent) and (2) the m-pattern co-occurrence threshold is 60 percent. (The latter is much higher because of the semantics of m-pattern.) Observe that the pattern {a, b} is frequent in that this pattern occurs in 50 percent of the intervals. However, there are four cases in which a occurs but b does not. Thus, {a, b} is not an m-pattern. Now consider, {d, c}. This pattern is much less frequent than {a, b} in that {d, c} occurs in only two of the eight intervals, which is below the support threshold. However, whenever c or d occurs the other does as well. Thus, {d, c} is an m-pattern.
Figure 5
M-patterns can be seen as basic semantic units one level above events. Logically, the events in the pattern can be treated as a group. So, from an analysis perspective, it is desirable to coalesce an m-pattern into a single event. This not only reduces the number of events, it also provides the opportunity for higher level semantics (e.g., the m-pattern caused by a router failure).
Algorithm. This section develops an efficient algorithm for discovering all m-patterns. Efficiency is obtained by addressing two issues: (1) how to test (qualify) that an itemset is an m-pattern; (2) how to exploit level-wise search.
For (1), if we use the definition of an m-pattern to test for its presence, then the number of tests we must make is exponential in the size of the pattern. Clearly, this scales poorly. Fortunately, there is a way to simplify matters.
Theorem 1: (Equivalent definition of m-patterns.) An itemset E is an m-pattern as in Definition 2, iff
PD (E {a}|{a}) minp
|
(2)
|
for every item a in E.
Following is the sketch of a proof. The necessary condition is followed by m-pattern definition directly. The sufficient condition can be proved by the fact that
|
P(E1|E2) =
|
P(E1 E2)
P(E2)
|
|
P(E1 E2)
P(a)
|
|
P(E)
P(a)
|
| |
| = |
P(E a|a) minp
|
This equivalent definition provides a means to check for the presence of an m-pattern in only linear time of the length of E.
Second, observe that m-patterns have the downward closure property. This is implied by Definition 1. Since m-patterns are downward closed, a level-wise algorithm (Algorithm 1) can be used. Conceptually, we just need to use the m-pattern definition to qualify a pattern (Theorem 1) in the fourth step of Algorithm 1. Detailed elaboration of the algorithm and the experimental evaluation can be found in Reference 27.
Results. We apply our algorithm for m-pattern discovery to a set of operational data and compare the results to those from mining frequent itemsets. We fix minsup to be 3 so as to eliminate a pattern with only one or two instances, and we vary minp. Our results are reported in Figure 6A, which plots the total number of m-patterns (the solid line) and the number of border m-patterns (the dashed line) against minp. Here, a border pattern refers to a pattern that is not a subset of any other pattern. The x-axis is minp and the y-axis is the number of m-patterns discovered on a log scale. Clearly, minp provides a very effective way to select the strongest patterns in that the number of m-patterns discovered drops dramatically as minp increases. Many of these patterns have very low support levels. For example, we found 59 border m-patterns with length from 2 to 5 in the data set when minp = 0.7. Half of these patterns have support levels below 10.
Figure 6
Figure 6B reports the result of mining for frequent patterns. Here the x-axis is minsup, and the y-axis is the number of patterns found on a log scale. Note that the number of frequent patterns is hugein excess of 1000even when minsup is 20. Examining the frequent patterns closely, we find that most are related to items that occur frequently, not necessarily items that are causally related. This is not surprising as the marginal distribution of items in our data is highly skewed. Indeed, a small set of items accounts for over 50 percent of total events and, consequently, these items tend to appear in many frequent patterns.
Multiattribute frequent pattern mining
In this section, we introduce a special class of patterns, multiple attribute frequent patterns or maf-patterns. The basic concept of maf-patterns follows the apriori algorithm reviewed in the section Generic level-wise search algorithm. However, as previously mentioned, although events usually have multiple attributes, there is no obvious way of defining transactions as well as labeling events as items. In short, maf-patterns are designed in such a way that all possible ways of grouping events to transactions and labeling events to items are explored.
Table 2 shows a set of events collected from a production environment. We made the following observations.
-
Host 23 generated a large number of InterfaceDown events on 8/21/00. Such situations may indicate a problem with that host.
-
When Host 45 generates an InterfaceDown event, Host 16 generates a CiscoLinkUp (failure recovery) event within the same five-minute interval. Thus, a Host 45 InterfaceDown event may provide a way to anticipate the failure of Host 16.
-
The event types NetworkManagerUp and RouterLinkUp tend to be generated from the same host and within the same minute. This means that when a Cisco router recovers a link, it will discover that its midlevel manager is accessible.
-
Host 24 and Host 32 (not shown in Table 2) tend to generate events with same severity in the same day. This suggests a close linkage between these hosts. If this linkage is unexpected, it should be investigated in order to avoid problems wherein one host causes problems with the other host.
|
Rec
|
Date
|
Time
|
Interval
|
EventType
|
Host
|
Severity
|
|
1
| |
08/21/00
|
2:12AM
|
2:10AM
|
TcpCnnctClose
| |
3
| |
harmless
|
|
2
| |
08/21/00
|
2:13AM
|
2:10AM
|
InterfaceDown
| |
45
| |
severe
|
|
3
| |
08/21/00
|
2:14AM
|
2:10AM
|
InterfaceDown
| |
23
| |
severe
|
|
4
| |
08/21/00
|
2:14AM
|
2:10AM
|
InterfaceDown
| |
5
| |
severe
|
|
5
| |
08/21/00
|
2:15AM
|
2:10AM
|
InterfaceDown
| |
24
| |
severe
|
|
6
| |
08/21/00
|
2:16AM
|
2:15AM
|
CiscoLinkUp
| |
16
| |
harmless
|
|
7
| |
08/21/00
|
3:16AM
|
3:15AM
|
NetworkManagerUp
| |
16
| |
harmless
|
|
8
| |
08/21/00
|
3:16AM
|
3:15AM
|
RouterLinkUp
| |
16
| |
harmless
|
|
9
| |
08/21/00
|
3:33AM
|
3:30AM
|
InterfaceDown
| |
45
| |
severe
|
|
10
| |
08/21/00
|
3:34AM
|
3:30AM
|
CiscoLinkUp
| |
16
| |
harmless
|
|
Reprinted with permission from C.-S. Perng, H. Wang, S. Ma, and J. L. Hellerstein, FARM: A Framework for Exploring Mining Spaces with Multiple Attributes, Figure 1, Proceedings of the 2001 International Conference on Data Mining (ICDM'01), San Jose, CA, November 2001, IEEE, New York (© 2001 IEEE).
These patterns have been proven to be very useful in systems management. However, two challenges emerge, as one tries to build a system to discover these patterns. First, we need a language that has sufficient power to express the mining goals that produce the above patterns but is not so verbose that it becomes computationally intractable. We need such a language to define a problem space so that we can explore that space and find all significant patterns. We can see that it is not very difficult to write programs to discover or verify each individual statement. However, it is less clear how to discover all these patterns through a systematic process. Second, can the mining algorithm deal efficiently with many attributes? As the field of frequent itemset mining is mostly based on the apriori property,13,11 what are the similar properties in this new problem that we can utilize to speed up the mining process?
The maf-patterns go beyond fixed attribute mining to mining directly from multiattribute data. We are given data D with attributes A = {A1,
, Ak}. Thus, each record in D is a k tuple. For a given pattern, a subset of these attributes is used to define how transactions are grouped and another (disjoint) subset of attributes determines the items. The former are called the grouping attributes, and the latter are the itemizing attributes.
We begin with an example based on Table 2. Here, k = 6. For pattern 3, the grouping attributes are Host and Time; the itemizing attribute is EventType. The pattern has length two, which means that a pattern instance has two records. The items specified by these records are determined by the value of the EventType attribute. That is, one record must have EventType = NetworkManagerUp and the other has EventType = RouterLinkUp. Further, these records must have the same value for their Host and Time attributes. Records 7 and 8 form an instance of pattern 3 with Host = 16 and Time = 3:16 A.M. Note that items may be formed from multiple attributes. For example, pattern 2 has the itemizing attributes Host and EventType.
We use the term mining camp to provide the context in which patterns are discovered. The context includes pattern length (as in existing approaches), grouping attributes, and itemizing attributes. For example, pattern 3 has the mining camp (2, {Host, Time}, {EventType}). Or, more formally, a mining camp is a triple (n, G, S) where n is the number of records in a pattern, G is a set of grouping attributes, and S is the set of itemizing attributes.
Next, we formalize the notion of a pattern. There are several parts to this. First, note that two records occur in the same grouping if their G attributes have the same value. Let r D. We use the notation G(r) to indicate the values of r that correspond to the attributes of G. Given a set of attributes G, two records r1 and r2 are G-equivalent if and only if G(r1) = G(r2).
In Table 2, records 7 and 8 are G-equivalent, where G = {Host, Time}.
In maf-patterns, items are determined by the combinations of values of the attributes of S. Consider pattern 2 for which we require one record with EventType = InterfaceDown, Host = 45 and a second for which EventType = CiscoLinkUp, Host = 16. Thus, (InterfaceDown, 45) is one component (or item) of the pattern and (CiscoLinkUp, 16) is the other component. More formally, consider a mining camp (n, G, S) where S = {S1,
, Sm} A pattern component (also called a pattern item) is a sequence of attribute values sv = <s1,
, sm> where si Si for 1 i m. p = {sv1,
, svn} is a pattern of length n for this mining camp if each svi is a pattern component for S.
An instance of a pattern is a set of records that are in the same grouping and whose itemizing attributes match those in the pattern. Let p = {sv1,
, svn} be a pattern in mining camp (n, G, S) and let D be a set of records. An instance of pattern p is a set of n records R = {r1,
, rn} such that ri D and S(ri ) = svi for 1 i n, and ri and rj are G-equivalent for all ri, rj R.
Now we can define the concept of support in the maf-pattern mining framework: given a mining camp (n, G, S) and a set of records D that can be partitioned into G-equivalent classes GEC1,
, GECw, the support of a pattern p is defined as |GEC1|p +
, + |GECw|p where |GECi|p is the number of disjoint instances of p in GECi for 1 i w.
We now have in place all of the definitions necessary to discuss mining in the framework. First, note that if G and S are fixed, then we have the traditional fixed attribute data mining problem. Here, downward closure of the pattern length is used to look for those patterns in (n + 1, G, S) for which there is sufficient support in (n, G, S).
In maf-pattern mining, G and S need not be fixed. In essence, a separate search is done for each combination of G, S. This scales poorly. In particular, the number of permitted combinations of G and S is 3k 2k, where k is the number of attributes (which follows from observing that Ai may be in G, S, or neither and eliminating the 2k cases for which S = ).
We propose the following ways for connecting mining camps. Given a mining camp c = (n, G, S) and an attribute Ai G S then
-
(n + 1, G, S) is the type-1 successor of c.
-
(n, G
{Ai}, S) is a type-2 successor of c.
-
(n, G, S
{Ai}) is a type-3 successor of c.
Figure 7 depicts the predecessor/successor relationships. The root precedes all other mining camps. (In this case, it is not a real camp since S = .) The level of mining camp (n, G, S) is defined as n + |G| + |S|. Since n is at least 1 and S is nonempty, a minable mining camp has level no less than 2. We structure the mining camps so that the successor relationships only exist between mining camps at different levels. This imposes a partial order. Figure 7 is an example of a mining space. More formally, a mining space MS(c) is a partially ordered set (poset) of mining camps containing c and all of its successors.
Figure 7
To make the notation more readable, we use MS(n, G, S) to denote MS((n, G, S)).
Algorithm. This section shows that several types of downward closure can be present in the maf-pattern mining framework. Exploiting these properties provides considerable benefit in terms of efficiency. We begin by presenting the main theorem of the framework.
Theorem 2: Given a mining camp c = (n, G, S) such that the support of a pattern p = {sv1,
, svn} is less than .
-
For any type-1 successor of c, any pattern that is a superset of p has support less than
.
-
The support of p in any of type-2 successor of c is less than
.
-
The support of pattern p' = {sv'1,
, sv'n} of any type-3 successor of c is less than
if svi sv'i for all 1 i n.
Downward closure properties are the foundation of our algorithm as they are in traditional (fixed attribute) mining for frequent itemsets. The more downward properties the mining algorithm uses, the greater the efficiencies that can be realized in mining.
Applications. The framework of maf-pattern mining provides twofold benefits. First, it significantly improves the performance of frequent itemset mining on data with multiple attributes. The problem of mining data with multiple attributes is hard because the number of possible mining camps grows exponentially with the number of attributes. The downward closure properties covered in this section can greatly reduce the computation time. Second and more important, maf-pattern is a notion that unifies frequent itemset mining and aggregation queries. For those mining camps with n = 1, the results are aggregations. Here are some examples:
-
The mining camp (1,
, {Host}) produces a list of hosts that generate a large number of events.
-
The mining camp (1, {Interval}, {EventType}) finds all event types that tend to be bursty.
-
The mining camp (1, {Interval}) outputs the busiest moments in a day.
When n > 1, the results are frequent itemset mining. Here we show an example that is particularly interesting in systems management. The mining camp (n, {Interval}, {Host}) finds the groups of hosts that tend to emit events at the same time. Furthermore, (n, {Interval, EventType}, {Host}) finds the groups of hosts that tend to encounter the same problems at the same time, and implies there is some sort of correlation among hosts in the same group.
The expressive power of maf-patterns can even go beyond the above examples by defining more derived attributes. For example, Interval is a derived attribute in Table 2. Intervals of other lengths can also be used. Network topologies can also be used to add attributes like Network Segment, Subnet, and Functionality. See Reference 28 for a more detailed discussion.
Event Relationship Network constructor
The Event Relationship Network is becoming a popular graphical representation of event correlation. In this section, we first introduce the ERN and the architecture of Event Correlation Constructor (ECC) and describe how domain experts can work with it. Next we present a statistical interpretation of the graphical model. Last, we discuss the application of Event Correlation Constructor in a case study.
Our approach to describing correlation logic uses the conceptual framework of ERN. An ERN is a directed acyclic graph. Nodes are events and are labeled with the role of the event within the case. Arcs from one event to the next indicate that the latter is associated with the former.
A simple event relationship network for a Cisco router is shown in Figure 8. The Cisco monitoring agent emits minor and major alarms for power supply device when the device is encountering an abnormal situation. As the status backs to normal, the monitoring agent emits AlarmOff events.
Figure 8
We now introduce a key concept: event roles. An event plays a primary role (is a primary event) if it provides an immediate, often unambiguous, indication as to the corrective action to take. For example, if a warning trap is the first event in the correlation case, then it is a primary event. Proactive management uses the receipt of a primary event to trigger a first level of response. Referring to Figure 8 the role of the chassisMinorAlarmPS1 and chassisMinorAlarmPS2 event is primary within the context of this example correlation case.
An event plays a secondary role (is a secondary event) if it is always extraneous in terms of selecting the corrective action in an exceptional situation. Although secondary events do not affect the choice of correct action, they may invoke actions of their own.
If events were always either primary or secondary, then correlation would be much less complex. However, in a large number of cases, the role of an event depends on context within the correlation case. Events that may be either primary or secondary are called primary/secondary events. Within our example correlation case two events act in the role of primary/secondary, the chassisMinorAlarmOff and chassisMajorAlarmOff events.
ERNs play a central role in building event correlation rules. Event correlation rules in most correlation enginesfor example, Tivoli event consoleare in the form of Event-Condition-Action. The structure of an ERN and the event roles of events can be translated to the event and condition part of correlation rules. With an action template provided by users, usually a means to notify system administrators, each ERN can be translated to a set of correlation rules.
ECC usage scenarios. The ECC is capable of working in three modes.
-
ERN validation: In situations where there are existing ERNs, either constructed previously or from other similar production environments, the ECC can validate the existing ERNs against historical event log from the environment.
-
ERN completion: For a given ERN, it identifies incorrect links. This completion process is done in an iterative manner. In each iteration, all event types correlated to any event types in current ERNs are attached with corresponding links. The process proceeds until no more event type can be added.
-
ERN construction: In situations where no existing ERN can be used as the starter set, the ECC is responsible for generating ERNs for subject matter experts to review. ERN construction can be treated as a special case of ERN completion where no ERN is available.
Figure 9 shows how domain experts and ECC can work together in an iterative fashion to produce high quality ERNs. There are two general cases. When the ERN Starter Set exists, then the ERN validation and completion modules are used; otherwise, the ERN construction module is used. The resulting ERNs are then reviewed by domain experts and the process repeated until the ERNs obtained are satisfactory. Then, the ERNs are translated into correlation rules for real-time monitoring (Figure 1).
Figure 9
A statistical view of ERNs. Now we need to establish a notion of validity of ERNs. We propose here a quantitative method that combines event counting and statistical testing to justify whether an ERN is valid based on the event history.
Given a predefined window length w, for each link (A, B) we compute the following statistics ConfAB = <NA, PB | A, AB>.
-
NA: The total number of occurrences of event type A. NA indicates whether the event type A as well as the link are worth including in an
ERN. In a sense, NA represents the possible cost of applying incomplete ERN. The cost of processing these redundant trouble tickets caused by missing link (A, B) is proportional to NA. So for large NA, it is suggested to include the link if other statistics also indicate high correlation. For small NA, the cost is a judgment call for the domain experts.
-
PB | A: The conditional probability that an occurrence of event type A is followed by an occurrence of event type B within time no later than w. This is defined as:
number of windows containing both A and B
the number of windows containing A
|
-
AB: The dependence test score of (A, B) coupling, which indicates the deviation of A and B's distribution from random distribution. High score indicates it is likely that the correlation of the two events is not due to randomness.
The dependence test score is defined through the following statistics. The probability of observing an event A in a window is PA = NA/T where T is the time covered in the log. The expected probability of finding both event A and event B in a window with event A occurring before event B is E(PAB) = PA × PB /2. The actual probability of finding both event A and event B in a window with event A occurring before event B is PB | A = NAB /2T where NAB is the number of (A, B) event pairs. The variance of co-occurrences of event A and event B is defined as VARAB = (PAB(1 PAB))/T. The dependence test score is defined as AB = (PB | A E(PAB))2/VARAB.
Thresholds of link confidence are also in the form of a triple <Nt , Pt , t> such that a link (A, B) is valid if NA Nt , PAB Pt and AB t. Note that it is possible that both links (A, B) and (B, A) are valid. In such cases, we suggest the direction of link (A, B) should be from A to B if PB | A PA | B, otherwise, the direction should be from B to A.
Application. The ERN in Figure 10, which was created by an IBM consultant team for a large financial services company, is annotated with validation statistics. The count, NA, appears as a label adjacent to each node. For each link (A, B), (PB | A , AB) and (PA | B , BA), in this order, appear as labels on the link.
Figure 10
When domain experts drew the ERN shown in Figure 8, they interpreted MinorAlarms and MajorAlarms as different stages of the same underlying problem. Correspondingly, AlarmOff events indicate the problem has been cleared. Also, Cisco_Cold_Start, as the event name implies, should clear all alarms.
The statistics shown in Figure 10 contain some surprising results. First, while the number of major alarms in the event log is considerable (760 for power supply 1 and 16 for power supply 2), minor alarms never occur prior to major alarms, or more accurately, minor alarms never occur. Second, the Cisco_Cold_Start event has no correlation with any of the alarm events. We consulted the original designers of the ERN and found that both cases are due to design errors. In the first case, the minor alarms and major alarms are actually assigned to indicate problems of very different nature instead of different severities. In the second case, the designers already doubted the correlation between the Cisco_Cold_Start event and alarm events, but there was no way to check out the correlation at the time. So they included the event in this ERN because the cost of making this type of error is smaller.
There are undoubtedly some constraints in using event data to validate, complete, and construct ERNs. The most noticeable one is the inability to construct and refute the correlations that are not shown in event logs. However, ECC does provide significant improvement in ERN qualities and reduction of the cost of ERN construction.
Conclusions
Event management is the foundation of systems management. Over the last 15 years, automated operations based on the use of correlation rules for interpreting event patterns has lead to increased operator productivity. Although productivity has improved, a substantial bottleneck remains, namely determining the correlation rules.
In this paper we describe how data mining can be used to identify patterns of events that indicate underlying problems. In particular, we identify several patterns of interest to event management: event bursts, periodicities, and mutual dependencies. We provide interpretations for each, and we show how pattern discovery can be structured to exploit a level-wise search, thereby improving scalability. Scalability is particularly important because, as experience shows, tens of millions of events may need to be analyzed in order to discover actionable patterns.
We also address the issue of related multiple attributes of events. We develop here a framework that provides a means to systematically and efficiently explore patterns with different membership definitions. Last, we present the Event Correlation Constructor, a tool that can validate existing correlation knowledge and further construct such knowledge.
Acknowledgment
Our thanks to Luanne Burns, Dave Taylor, David Rabenhorst, and Genady Grabarnik for developing the visualization and mining facility prototype used for the studies in this paper. We also thank Herb Lee for stimulating discussion and for his comments on this work.
Accepted for publication April 25, 2002.
Cited references and notes
|