Alert Correlation Policy

Introduction

OpsRamp’s Alert Correlation groups similar alerts together to reduce the noise created by individual alerts. Users can manage Inferences rather than individual alerts, so they no longer have to sift through multiple related alerts.

For example, a power outage on a network resource could impact the resources that depend on it, and each dependent resource generates similar alerts. Alert Correlation correlates the alerts generated on the dependent resources with the alert generated on the network resource.

Benefits of Alert Correlation Policy

By correlating alerts, you gain:

  • Faster Detection and Remediation: Find critical issues quickly, analyze the alert phenomenon and resolve the issues within SLA.

  • Low Alert Noise: Reduce the number of alerts a user has to handle in order to solve an issue.

  • Efficient Control: Customize how to correlate the alerts. Alert Correlation supports correlating alerts in two patterns:

    • Root Cause Analysis: Correlate alerts due to a known cause.

    • Inference: Correlate alerts that share similar alert properties.

Core Concepts

Root Cause Analysis

A problem on a single resource may impact multiple resources in the infrastructure, generating multiple alerts and a high amount of alert noise. Root cause analysis helps you proactively determine the root cause of an alert by defining the following parameters:

  • Metrics: Metrics on upstream and downstream resources that may generate alerts. For example, metrics on an upstream resource may generate alerts about low memory and high CPU load; because of these issues, metrics on downstream resources then generate memory utilization alerts.

  • Time Span: The time window within which related alerts are generated. Alerts generated within the time span are correlated.

Alerts on the downstream resources correlate with the alert on the upstream resource to form an RCA.

For example, in a network topology, a Switch, an ESX Host, and a VLAN depend on a Router. If the Router goes down, the dependent resources also go down and generate multiple alerts, as shown below:

In this example, the alert on the Router becomes the RCA alert, and the alerts on the downstream resources correlate with the RCA alert as shown below:

Note: RCA correlation applies only when Critical alerts are generated from all the metrics specified in the pattern within the configured time span.
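The RCA rule above can be sketched in Python. This is a minimal, hypothetical model, not OpsRamp's implementation: the Alert type, the impacts map (standing in for the impact dependency), the find_rca function, and the metric name ping_status are all assumptions made for illustration. It treats a Critical alert on an upstream resource as the RCA alert when every metric in the pattern raises a Critical alert within the time span, and attaches the matching downstream alerts to it.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    alert_id: str
    resource: str
    metric: str
    severity: str  # for example "Critical", "Warning", "OK"
    time: int      # seconds since an arbitrary reference point


def find_rca(alerts, impacts, upstream_metrics, downstream_metrics, time_span):
    """Return (rca_alert, correlated_alerts), or None if no RCA forms.

    impacts maps each upstream resource to the set of downstream resources
    it impacts (the impact dependency). Only Critical alerts are considered,
    and every metric in the pattern must raise a Critical alert within
    time_span seconds of the candidate RCA alert.
    """
    critical = [a for a in alerts if a.severity == "Critical"]
    for root in critical:
        if root.resource not in impacts or root.metric not in upstream_metrics:
            continue
        window = [a for a in critical if abs(a.time - root.time) <= time_span]
        upstream_seen = {a.metric for a in window
                         if a.resource == root.resource and a.metric in upstream_metrics}
        children = [a for a in window
                    if a.resource in impacts[root.resource]
                    and a.metric in downstream_metrics]
        downstream_seen = {a.metric for a in children}
        if upstream_seen == set(upstream_metrics) and downstream_seen == set(downstream_metrics):
            return root, children
    return None


# Router example from above; the metric name is illustrative only.
impacts = {"Router": {"Switch", "ESX Host", "VLAN"}}
alerts = [
    Alert("A1", "Router", "ping_status", "Critical", 0),
    Alert("A2", "Switch", "ping_status", "Critical", 30),
    Alert("A3", "ESX Host", "ping_status", "Critical", 45),
    Alert("A4", "VLAN", "ping_status", "Critical", 60),
]
rca, correlated = find_rca(alerts, impacts, {"ping_status"}, {"ping_status"}, time_span=300)
print(rca.alert_id, [a.alert_id for a in correlated])  # A1 ['A2', 'A3', 'A4']
```

With a time span of 20 seconds, the downstream alerts fall outside the window and find_rca returns None, so no RCA forms.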

Upstream Resource

A host resource that guest resources depend on to operate. A problem on a host resource impacts the guest resources, causing multiple alerts to be generated.

Downstream Resource

Guest resources that depend on host resources to operate. A problem on a guest resource may impact the host resource, but the severity of the impact may be low.

For example, an ESX Host going down impacts all VMware resources hosted on it. Here, the ESX Host is the upstream resource and the VMware resources are the downstream resources.

Dependent Alerts

Alerts generated on the upstream resources are the Dependent alerts.

Dependency Alerts

Alerts generated on the downstream resources are the Dependency alerts.

Inference

An inference is a set of alerts that are correlated by a common cause or that share similar alert properties.

Examples of alerts correlated due to a common cause:

  • Power failure in a server closet.

  • Windows servers going down for 10 minutes due to a reboot.

Alerts correlate on the basis of similar alert properties like Alert Subject, Host Name, Alert Metric, Alert Source, IP Address, and Device Type.

Examples of alerts correlated by similar alert properties:

When a VLAN is down, multiple alerts are generated as shown below:

Because the IP addresses are similar, you can correlate these alerts on the basis of IP Address, for example, with a 90% match of IP Address.

Note:

  • An inference is created when two or more alerts correlate.

  • Inference correlation applies only to Critical and Warning alerts that match the alert properties within a certain time span.

Create Alert Correlation Policy

Provide Name, Client and Select Status

  1. Log into OpsRamp.

  2. Click All Clients and from the displayed list select a client.

  3. From the drop-down options, click Setup.

  4. From the left pane, click Alert Management>Alert Correlation. The Alert Correlation Policies page is displayed.

  5. Click Add to create an Alert Correlation Policy. The Create Alert Correlation Policy page is displayed.

  6. Provide Name and configure the details:

    1. If the policy is a partner-defined client scope policy:

      • Select Include All Clients if a policy must be applicable to all clients in a partner.

      • Select Include Clients if a policy must be applicable to specific clients in a partner. Click Add Clients and then select the required clients. See Frequently Asked Questions to know more about partner-defined client scope policy and client-defined policy.

    2. If the policy is a client-defined policy, click +Add Clients. The Add Include Clients pop-up appears.

    3. From the Excluded Clients list, select the required client and click >> to move it to Included Clients.

    4. Click Add Include Client. The selected client is displayed in Included Clients.

  7. Select the Enabled state from the drop-down menu.

    1. OFF – Alert correlation policy is created but no alert correlation is done.

    2. Observed – An Inference is created with Observed status. An Observed inference is for information purposes only; the only action you can perform on it is Close.

    3. ON – Alert correlation policy is created and alert correlation is performed.

      Note: OpsRamp recommends following the above sequence (OFF, Observed, ON) when creating alert correlation policies.

  8. Define Filter Resources and Policy Definitions. For a detailed explanation of Filter Resources, see Filter Resources. For a detailed explanation of Policy Definitions, see Policy Definition.

Filter Resources

Select resources whose alerts will match the policy.

  1. In Filter Criteria section:

    • Select Any to filter the resources that match any of the rules.

    • Select All to filter the resources that match all the rules.

  2. Configure the below details:

    1. Select Native Attributes to filter resources based on the predefined attributes. Example: Host Name, DNS Name, IP Address.

    2. Select Custom Attributes to filter resources based on the custom attributes of a client or device.

    3. Select the attributes, operator, and then provide the value. Note: Click to add additional filter criteria.
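The Any/All filter criteria can be sketched as follows. The matches function, the rule tuples, and the operator names are hypothetical; the operators actually offered in OpsRamp may differ. Any keeps a resource if at least one rule matches; All keeps it only if every rule matches.

```python
import operator

# Hypothetical operator names; the operators offered in the OpsRamp UI may differ.
OPERATORS = {
    "equals": operator.eq,
    "not equals": operator.ne,
    "contains": lambda value, expected: expected in value,
    "starts with": lambda value, expected: value.startswith(expected),
}


def matches(resource, rules, criteria):
    """Check a resource (dict of attribute -> value) against filter rules.

    rules is a list of (attribute, operator, value) tuples.
    criteria "Any": at least one rule must match; "All": every rule must match.
    """
    results = (OPERATORS[op](str(resource.get(attr, "")), value)
               for attr, op, value in rules)
    if criteria == "Any":
        return any(results)
    if criteria == "All":
        return all(results)
    raise ValueError(f"unknown filter criteria: {criteria}")


resources = [
    {"Host Name": "web-01", "IP Address": "10.0.0.5"},
    {"Host Name": "db-01", "IP Address": "10.0.1.7"},
]
rules = [("Host Name", "starts with", "web"), ("IP Address", "starts with", "10.0.1")]
print([r["Host Name"] for r in resources if matches(r, rules, "Any")])  # ['web-01', 'db-01']
print([r["Host Name"] for r in resources if matches(r, rules, "All")])  # []
```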

Policy Definition

Alert correlation policy supports correlating alerts in three patterns:

Correlate Dependent Alerts on Downstream Resources

Prerequisite

Before defining the metrics, define an impact dependency between the resources. The dependency relationship determines which resource is upstream and which is downstream.

Note: The impact dependency between an ESX host and its VMware resources is created automatically when the ESX host is discovered. For Network, Storage, and Compute resources, however, you must create the dependency manually.

Below are the instructions to create Impact Dependency between resources:

  1. Log into OpsRamp.

  2. Click All Clients and from the displayed list select a client.

  3. From the drop-down options, click Infrastructure.

  4. Click on the resource name. Device Details page is displayed.

  5. Click Impacted Resources tab.

  6. Define upstream resources in Select Resources which impact section.

  7. Define downstream resources in Select Resources impacted by section.

Configure Correlation Rules

  1. Select Time Span.

  2. Select metrics on upstream resources and downstream resources and then click Save.

Algorithm-based Correlation

Correlate alerts that share similar alert properties. For example, a user may want to correlate alerts that have similar alert content:

  • The alert subject is identical

  • The hostname is similar

Below are the instructions to configure Algorithm-based correlation:

  1. Select an alert property and then select matching condition.

    Matching condition

    Description

    Identical

    Correlate alerts whose alert property values match exactly (100%). For example, a user wants to correlate alerts that share the hostname 172.34.213. Consider the following scenario:

    • Alert 1 triggers from host 172.34.213

    • Alert 2 triggers from host 172.34.213

    In this scenario, both alerts correlate as an Inference.

    Nearly Identical

    Correlate alerts whose alert property values almost match (>=90%) but are not exactly the same. For example, a user wants to correlate alerts that contain hostname 172.34.2… Consider the following scenario:

    • Alert 1 triggers from host 172.34.210

    • Alert 2 triggers from host 172.34.215

    In this scenario, both alerts correlate as an Inference since the hostnames nearly match.

    Similar

    Correlate alerts whose alert property values are similar (>=75%). For example, a user wants to correlate alerts that contain a hostname starting with 172.34…… Consider the following scenario:

    • Alert 1 triggers from host 172.34.675

    • Alert 2 triggers from host 172.34.432

    In this scenario, both alerts correlate as an Inference since the hostnames have a similar match.

    Somewhat

    Correlate alerts whose alert property values partially match (>=50%). For example, a user wants to correlate alerts that contain a hostname starting with 172. Consider the following scenario:

    • Alert 1 triggers from host 172.45.786

    • Alert 2 triggers from host 172.89.453

    In this scenario, both alerts correlate as an Inference since the hostnames have a partial match.

  2. Enter a subject for the primary alert.

  3. Click Save.

Algorithm-based Alert Correlation Policy is created.

Note:

  • The primary alert is the first alert generated that satisfies the conditions defined in the pattern. If no subject is provided for the primary alert, OpsRamp uses the subject of the first alert as the Inference subject.

  • Provide a unique subject for the primary alert so that the Inference is easy to identify when it is escalated through an alert escalation policy.
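The matching conditions and the primary-alert subject rule can be sketched as below. OpsRamp does not document its similarity measure here, so this sketch assumes Python's difflib.SequenceMatcher ratio as a stand-in; the THRESHOLDS map and the correlates and inference_subject functions are hypothetical names. Because the measure differs, the Similar and Somewhat host examples in the table may score differently under this sketch.

```python
from difflib import SequenceMatcher

# Minimum similarity for each matching condition, from the table above.
THRESHOLDS = {
    "Identical": 1.00,
    "Nearly Identical": 0.90,
    "Similar": 0.75,
    "Somewhat": 0.50,
}


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1] (difflib ratio); OpsRamp's measure may differ."""
    return SequenceMatcher(None, a, b).ratio()


def correlates(a: str, b: str, condition: str) -> bool:
    """True if two alert-property values satisfy the selected matching condition."""
    return similarity(a, b) >= THRESHOLDS[condition]


def inference_subject(alert_subjects, primary_subject=None):
    """Use the configured primary-alert subject, else the first alert's subject."""
    return primary_subject or alert_subjects[0]


print(correlates("172.34.213", "172.34.213", "Identical"))         # True
print(correlates("172.34.210", "172.34.215", "Nearly Identical"))  # True (score 0.9)
print(correlates("172.34.210", "172.34.215", "Identical"))         # False
print(inference_subject(["Disk full on web-01", "Disk full on web-02"]))  # Disk full on web-01
```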

Co-occurrence based Correlation

Co-occurrence based correlation is a method of grouping alerts which often trigger together. OpsRamp forms a cluster (group of co-occurred alerts) using two sources of alert sequences:

  • Learning from existing data (historical alert sequences)

  • User-provided alert sequences

OpsRamp applies machine-learning to learn the existing and user-defined alert sequences and uses the learned pattern to drive the alert correlation. OpsRamp applies topology reinforcement to fine tune the correlation.

For more information about topology reinforced correlation, see Topology Reinforced Alert Correlation.

Correlate using Learned Alert Co-occurrence Sequences

This model requires no user input and is based on the historical occurrence of specific alert sequences. OpsRamp applies machine-learning to learn the existing alert sequence and uses the learned pattern to drive alert correlation. The learned models are applied against the incoming alert streams.

Alert sequences that occur frequently are grouped into an Inference. For example, suppose a learned pattern shows that CPU Utilization, CPU Stats, Disk Utilization, and Memory alerts trigger together. At run time, if alerts from this pattern trigger together, they form an Inference.

Important! Alerts are correlated only when the time gap between consecutive alerts is less than two minutes.

Notes:

  • The two-minute window is measured from the most recently triggered alert.

  • Any alert triggered after the two-minute window has elapsed is treated as a Raw alert.

Example

Below are a few sample alerts that illustrate the correlation process. Times are shown in HH:mm:ss format.

  • 10:00:00 AM: Alert A1 (CPU Utilization) triggers

  • 10:01:05 AM: Alert A2 (CPU Stats) triggers and correlates with A1 to form an Inference

  • 10:02:40 AM: Alert A3 (Disk Utilization) triggers and correlates with Inference

  • 10:04:30 AM: Alert A4 (Memory) triggers and correlates with Inference

  • 10:07:01 AM: Alert A5 (Disk Utilization) triggers and gets created as a Raw alert

In this example, alerts A1, A2, A3, and A4 correlate as an Inference because each alert triggers less than two minutes after the previous one. Alert A5, however, is treated as a Raw alert because it triggers more than two minutes after the previous alert (A4).
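The two-minute gap rule can be checked against the example above with a short sketch. It assumes, for simplicity, that every alert already belongs to the learned pattern, so only the time gap decides grouping; group_by_gap and MAX_GAP are hypothetical names.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=2)


def group_by_gap(alerts):
    """Split time-ordered (alert_id, timestamp) pairs into runs.

    An alert joins the current run if it triggers less than MAX_GAP after the
    previous alert; otherwise it starts a new run.
    """
    runs = []
    previous = None
    for alert_id, triggered_at in alerts:
        if previous is not None and triggered_at - previous < MAX_GAP:
            runs[-1].append(alert_id)
        else:
            runs.append([alert_id])
        previous = triggered_at
    return runs


def at(hh, mm, ss):
    # Arbitrary date; only the time of day matters for the gap check.
    return datetime(2024, 1, 1, hh, mm, ss)


sample = [
    ("A1", at(10, 0, 0)),   # CPU Utilization
    ("A2", at(10, 1, 5)),   # CPU Stats
    ("A3", at(10, 2, 40)),  # Disk Utilization
    ("A4", at(10, 4, 30)),  # Memory
    ("A5", at(10, 7, 1)),   # Disk Utilization
]
print(group_by_gap(sample))  # [['A1', 'A2', 'A3', 'A4'], ['A5']]
```

The first run corresponds to the Inference in the example, and A5 is left on its own as a Raw alert because it arrives 2 minutes 31 seconds after A4.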

Notes:

  • OpsRamp uses three months of alert data as training data.

  • Training runs every week and produces multiple alert patterns.

For instructions to create co-occurrence based correlation, see Create Co-occurrence based Correlation.

Correlate using User-defined Alert Co-occurrence Sequences

When existing alert data is limited, you can augment the learning by providing known alert sequences. At inference time, the system correlates alerts using these known (user-defined) alert sequences.

Prerequisite

Create a CSV file and add all possible alert sequences to the file. For instructions to create a CSV file for alert correlation, click here.

Topology Reinforced Alert Correlation

Topology plays a vital role in alert correlation. In an IT environment, resources are connected to each other. When a resource goes down, the downstream resources that depend on it are also affected and multiple alerts are triggered. In such cases, it can be challenging to identify the exact root cause of the issue.

OpsRamp provides a solution to fine tune the alert correlation using Topology:

Co-occurrence based correlation is a method of grouping alerts which often trigger together. In the co-occurrence based correlation model, OpsRamp applies machine-learning to correlate alerts on the basis of three factors:

  • Alerts on resources from the same site and region. OpsRamp considers siteName and regionName of resources in a cluster and creates an Inference only if the resources belong to the same site or region. Correlation of alerts from the same site and region happens automatically during the inference time.

  • Alerts of connected resources. The co-occurrence model learns the time-sequence pattern of metrics. Combining the learned metric sequence with topology helps determine whether the given alerts share the same root cause. At inference time, besides using machine learning to correlate the alerts, OpsRamp checks the resource topology and correlates alerts of connected resources. The final Inference is created on the basis of both co-occurring metrics and connected resources.

Note: To explicitly correlate the alerts of connected resources, you need to select the option Correlate alerts of connected resources.
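The two topology checks can be sketched as a post-processing step on a cluster produced by the co-occurrence model. This is an illustration under stated assumptions, not OpsRamp's implementation: refine_cluster, the info map of siteName and regionName values, the links list of topology connections, and the choice to leave a lone unconnected alert uncorrelated are all hypothetical.

```python
from collections import defaultdict


def same_site_or_region(resources, info):
    """True if all resources share one siteName or all share one regionName."""
    sites = {info[r]["siteName"] for r in resources}
    regions = {info[r]["regionName"] for r in resources}
    return len(sites) == 1 or len(regions) == 1


def connected_components(resources, links):
    """Split resources into groups connected through topology links among them."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in sorted(resources):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(n for n in graph[node] if n in resources and n not in seen)
        components.append(component)
    return components


def refine_cluster(cluster, info, links, correlate_connected):
    """cluster: (alert_id, resource) pairs from the co-occurrence model.

    Returns the alert-id groups that would become Inferences.
    """
    resources = {resource for _, resource in cluster}
    if not same_site_or_region(resources, info):
        return []
    if not correlate_connected:
        return [[alert_id for alert_id, _ in cluster]]
    groups = []
    for component in connected_components(resources, links):
        group = [alert_id for alert_id, resource in cluster if resource in component]
        if len(group) > 1:  # assumption: a lone alert is left uncorrelated
            groups.append(group)
    return groups


site = {"siteName": "DC-1", "regionName": "East"}
info = {"sw1": site, "esx1": site, "vm1": site, "printer1": site}
links = [("sw1", "esx1"), ("esx1", "vm1")]
cluster = [("A1", "sw1"), ("A2", "esx1"), ("A3", "vm1"), ("A4", "printer1")]
print(refine_cluster(cluster, info, links, correlate_connected=True))   # [['A1', 'A2', 'A3']]
print(refine_cluster(cluster, info, links, correlate_connected=False))  # [['A1', 'A2', 'A3', 'A4']]
```

Here correlate_connected plays the role of the Correlate alerts of connected resources option: when it is off, only the site and region check applies.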

Create Co-occurrence based Correlation

Perform the following steps to create co-occurrence based correlation:

  1. To learn co-occurring sequences from existing data: Select Co-occurrence based correlation.

  2. To use the user-defined alert sequences:

    1. Click Browse to provide a training file for the policy.

    2. Select the CSV file from a local folder. The CSV file is uploaded. Notes:

      • One client can upload only one CSV file.

      • Updating the file in one policy automatically updates it in all policies.

      • If you define multiple correlation policies, each policy uses the same training file.

  3. To correlate alerts using sequences learned from existing data: Turn on the Continuous Learning toggle. The system then continuously learns and picks up new patterns from the existing data. When Continuous Learning is ON, OpsRamp retrains the model every week. Notes:

    • Enabling the Continuous Learning option in one policy automatically enables it in all co-occurrence based correlation policies.

    • If a training file is updated or Continuous learning is enabled, then the ML model is re-trained immediately.

  4. To correlate alerts of connected resources: Select Correlate alerts of connected resources.

  5. Click Save. The inference model policy is created and appears on Inference Models page. The policy initiates the Training process.

Legends

Legend

Description

The status bar displays how much training has been completed.

  • Accuracy of the trained co-occurrence model is below 70%. In this case, the policy is temporarily disabled until the model’s accuracy rises above 70% after the next training. A common reason for low accuracy is that the alert data used for training lacks sequence patterns.

  • ML indicates a policy is based on Machine-Learning algorithm. This further helps to differentiate between a machine-learning model and a user-defined model.

Accuracy of trained co-occurrence model is above 70%, and the policy is used for alert correlation.

Edit Alert Correlation Policy

Update the details of an Alert Correlation Policy.

  1. Click All Clients and from the displayed list select a client.

  2. From the drop-down options, click Setup.

  3. From the left pane, click Alert Management menu and then click Alert Correlation.

  4. Click on the alert correlation policy name and then click Edit.

  5. Configure the details of the policy and then click Save.

Decorrelate an Alert

Decorrelate an alert if the alert should be treated as an individual alert. The decorrelated alerts then appear on the Alerts Browser page as an individual alert.

Note:

  • To decorrelate an alert, a user requires Alerts Manage permission. Click here for details on Alerts permission values.

  • A correlated alert in Critical or Warning state automatically gets decorrelated when an RCA alert is healed.

  1. Click Alerts.

  2. Click on the alert ID. Alert Details page appears.

  3. Click Correlated Alerts tab, select the alert and then click Decorrelate. A comment about the decorrelation appears in the Comments section of the RCA or Inference and also in the decorrelated alert.

OpsRamp processes incoming alerts and correlates them based on the patterns specified in the policies.

The correlated alerts appear on the Alerts Browser page as shown below:

Legends

Below table depicts the icon representation for the following alert types:

Legend

Description

Inference

RCA

Correlated Alert

Individual Alert

Alert Status

The following list shows the color used for each alert status:

  • Critical: Red

  • Warning: Orange

  • OK: Green

  • Observed: Blue

  • Info: Blue

Note: Observed and Info alert status are represented in Blue color.

Alert Status Transitions for Inference

Below are the scenarios that explain the status transitions for Inference.

Scenario 1: One or more alerts are in Critical.

If one or more alerts in an Inference are in Critical state, the Inference appears as Critical.

Scenario 2: One or more alerts are in Warning.

If at least one alert in an Inference is in Warning state and the remaining alerts are in OK state, the Inference appears as Warning.

Scenario 3: All alerts are in OK state.

In an Inference, if all alerts are in OK state, then the Inference appears as OK.
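The three scenarios amount to taking the most severe state among the correlated alerts. A minimal sketch, assuming only the OK, Warning, and Critical states covered above (SEVERITY_ORDER and inference_status are hypothetical names):

```python
# Ordered from least to most severe; only the states covered by the scenarios.
SEVERITY_ORDER = ["OK", "Warning", "Critical"]


def inference_status(alert_states):
    """Return the most severe state among an Inference's correlated alerts."""
    return max(alert_states, key=SEVERITY_ORDER.index)


print(inference_status(["OK", "Critical", "Warning"]))  # Critical
print(inference_status(["Warning", "OK", "OK"]))        # Warning
print(inference_status(["OK", "OK"]))                   # OK
```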

Alert Actions

An alert action can be performed only on the entire Inference, NOT on a single correlated alert. OpsRamp supports the following alert actions:

  • Acknowledge: Take ownership of an Inference to work on it further. Recipients of the Inference are notified of the acknowledgment, and further alert escalations are halted.

  • Create Incident: Create an incident from the Inference, assign the incident to users, and set its priority.

  • Attach Incident: Attach an existing incident to the Inference without updating the incident with the alert content.

  • Attach and Update Incident: Attach an existing incident and also update it with the alert content. This action is generally used to update the same incident with related alerts.

  • Suppress: Temporarily suppress an Inference. Once an Inference is suppressed, any new Inference of the same type appears as a new Inference and NOT as a duplicate.

  • Permanent Suppress: Permanently suppress an Inference. Once an Inference is permanently suppressed, no new Inferences of the same type appear on the Alerts Browser.

  • Close: Close an Inference. You can close an Inference only when it is healed.

Below are instructions to configure an alert action on an Inference:

  1. Click Alerts. Alerts Browser page appears.

  2. Click on the Inference. Alert details page appears.

  3. Click the expand icon next to Create Incident. The alert actions appear.

  4. Click on the required action and then configure the action details.

Alert Action Scenarios

Scenario 1: Reopen a closed Inference

The following scenario shows how alerts generated within a certain time span correlate to form an Inference, and how a healed Inference reopens when a new alert correlates with it.

  • 10:00 AM – Alert A1 triggers

  • 10:01 AM – Alert A2 triggers

  • 10:01 AM – Alert A2 correlates with A1 and forms an Inference

  • 10:02 AM – Alerts A1 and A2 are healed

  • 10:02 AM – Since A1 and A2 are healed, Inference is healed.

  • 10:03 AM – Alert A3 triggers and correlates with Inference

  • 10:03 AM – Inference is reopened and changes to Open state

Scenario 2: A suppressed Inference remains suppressed even if a new alert correlates with it during suppression.

  • 10:00 AM – Alert A1 triggers

  • 10:01 AM – Alert A2 triggers

  • 10:01 AM – Alert A2 correlates with A1 and forms an Inference

  • 10:02 AM – Alert A2 is healed

  • 10:03 AM – Inference is suppressed

  • 10:04 AM – Alert A3 triggers and correlates with Inference

In this scenario, Inference remains in the suppressed state even if new alerts correlate with the Inference during the suppression.
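The two scenarios can be modeled as a small state machine. This hypothetical Inference class only captures the behavior described above (healing, reopening, and staying suppressed); it is not OpsRamp's data model, and how a suppressed Inference behaves when its alerts heal is an assumption.

```python
class Inference:
    """Hypothetical model of the two scenarios above, not OpsRamp's implementation."""

    def __init__(self, *alert_ids):
        self.open_alerts = set(alert_ids)  # correlated alerts that are not yet healed
        self.suppressed = False
        self.state = "Open"

    def heal(self, alert_id):
        self.open_alerts.discard(alert_id)
        # The Inference heals once every correlated alert has healed
        # (assumption: a suppressed Inference keeps its Suppressed state).
        if not self.open_alerts and not self.suppressed:
            self.state = "Healed"

    def suppress(self):
        self.suppressed = True
        self.state = "Suppressed"

    def correlate(self, alert_id):
        self.open_alerts.add(alert_id)
        # Scenario 1: a healed Inference reopens when a new alert correlates.
        # Scenario 2: a suppressed Inference stays suppressed.
        if not self.suppressed:
            self.state = "Open"


# Scenario 1
s1 = Inference("A1", "A2")
s1.heal("A1")
s1.heal("A2")
print(s1.state)  # Healed
s1.correlate("A3")
print(s1.state)  # Open

# Scenario 2
s2 = Inference("A1", "A2")
s2.heal("A2")
s2.suppress()
s2.correlate("A3")
print(s2.state)  # Suppressed
```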

Alert Details

The alert details page displays the below information for a correlated alert.

  • Details: Displays details of the Inference.

  • Correlated Alerts: Displays alerts correlated with the parent alert.

  • Incidents: Displays the details of incidents attached to an Inference.

  • Matched Escalate Alert Policies: Click Escalate Alerts on the top header of the incident details page to view the escalate alert policies that match the alert, and the policy that automatically created the incident.

Escalate RCA or Inference as Notification

You can send notifications of an RCA or Inference to users or user groups on demand.

  1. Log into OpsRamp.

  2. Click All Clients and from the displayed list select a client.

  3. From the drop-down options, click Setup.

  4. From the left pane, click Alert Management>Alert Escalation.

  5. Click Add to create a new escalation policy.

  6. Provide name, description and select the organization whose users will receive notifications from this policy.

  7. Click Next: Resources.

  8. Select the resources whose alerts will match this policy.

    1. For RCA: Select the resources whose alerts should match this policy.

    2. For Inference: Select All Resources to match this policy.

  9. Click Next: Define Alert Conditions.

  10. Filter for the type of alerts that occur on the previously selected resources.

    1. For RCA: Select property Alert: Metric, select Operator and then provide the value.

    2. For Inference: Select property Alert: Subject, select Operator and then provide the value.

  11. Click Next: Define Escalation Rules.

  12. To send notifications to users:

    1. Manually: Select the option Escalate directly as needed.

      1. Click Select Users to select the users who will receive the notification.

      2. Select Users, User Groups, Roster, User Group (Distribution List) as the escalation contact.

    2. Automatically: Select the option Escalate automatically as follows.

      1. Select Escalate as Notification.

      2. Click Select Users to select the users who will receive the notification, select the Priority and then select time frequency for sending notifications.

        Note: For more details on automatic escalation of a notification, see Escalate as Notification in Alert Escalation.

  13. Click Review. A summary of all sections of the escalate alert policy is available for review. Edit any section as needed.

  14. Click Save.

An escalation notification includes the following fields:

  • Escalation – Name of the escalate alert policy.

  • Other users notified – List of other users in the other levels of escalation who received the escalation notification.

Escalate RCA or Inference as Incident

To escalate an Inference, you need to know the Inference subject. OpsRamp recommends using the Inference subject as the primary alert condition for escalating an Inference. If you do not know the Inference subject, follow the recommended best practice below.

Below is the recommended process flow to escalate an Inference as an Incident.

  1. Create an Alert Correlation Policy in Observed mode: Observed mode helps you view the potential Inferences created by the policy. For instructions, see Create Alert Correlation Policy.

  2. View the Inferences with Observed status on the Alerts page: Confirm that OpsRamp is creating the type of Inferences you want to use for escalation. Click here to view details of an Inference on the Alerts page.

  3. Note the Inference subject: The Inference (alert) subject is the primary alert condition for escalating an Inference. Take note of the Inference subject.

  4. Change the mode of the alert correlation policy: If the Inferences are created as required, change the Enabled mode of the alert correlation policy from Observed to ON. For instructions, see Change the state of Alert Correlation Policy.

  5. Create an Alert Escalation Policy: Create an alert escalation policy to escalate the Inference. OpsRamp provides two ways to escalate an Inference, described in Escalate Inference Manually and Escalate Inference Automatically.

  6. View the Incident: The Incident is displayed on the Inference details page in the Incidents tab. Click the Incidents tab to view the Incident attached to the Inference.

Note: If required, see Escalate Inference as Incident to specific users scenario to understand the use case for escalating an Inference as an Incident.

Escalate Inference Manually

Perform the following steps to manually create an Incident for an RCA/Inference:

  1. Click Alerts. Alerts Browser page is displayed.

  2. Click on RCA or Inference. Alert Details page is displayed.

  3. Click Create Incident. The Incident subject and description are filled in by default; you can edit these fields.

  4. To escalate Incident to:

    1. User group: Click Select User Group and select user groups from the drop-down.

    2. User: Click Select User and select the user from list of users.

  5. Select Category, Priority, and select the Due Date for the Incident.

  6. To attach an existing entity, click on the required entity name. For example, user can attach an existing Problem that relates to this Incident.

  7. To send an email notification to users, provide the email address in To and Cc fields.

You can now view the incident on the Alert Details page.

Escalate Inference Automatically

Perform the following steps to automatically create an Incident for an RCA/Inference:

  1. Follow the instructions from 1 to 9 in Escalate Alerts as Notifications.

  2. Select the option Escalate automatically as follows.

  3. Select Escalate As Incident to automatically create an incident from an alert and assign it to the desired user.

  4. Configure the properties of the incident that will be created when an RCA/Inference condition matches this policy.

  5. Click Review. A summary of all sections of the escalate alert policy is available for review. Edit any section as needed.

  6. Click Save.

    Note:

    • A new Incident is created for an alert if there is no open Incident existing for the alert.

    • For more details on automatic escalation of an Incident, see Escalate as Incident in Escalate automatically as follows section in Alert Escalation.

You can now view the incident on the Alert Details page.

Manage Alert Correlation Policy

Define Precedence

Determine the order of execution for Alert Correlation Policies. For example, if a VMware resource falls under both an Agent status Alert Correlation Policy and a Network outage Alert Correlation Policy, you can determine which policy executes first to correlate alerts from that resource.

Below are instructions to determine the precedence:

  1. Click All Clients and from the displayed list select a client.

  2. From the drop-down options, click Setup.

  3. From the left pane, click Alert Management>Alert Correlation. Alert Correlation Policy page is displayed.

  4. Drag the alert correlation policy to the appropriate row to adjust the order. The numbers in the Precedence column change accordingly.

Processed Inferences

Displays the number of Inferences associated with a policy. The processed Inference count applies only to Inference policies, NOT to RCA-based policies.

Change the state of Alert Correlation Policy

The following list describes the Enabled modes of an Alert Correlation Policy.

  • OFF: An alert correlation policy is created, but no alert correlation is performed.

  • Observed:

    • Observed mode lets you see the potential Inferences that the model would create, without creating real Inferences.

    • This mode lets you assess the potential alert volume reduction from an alert correlation policy before enabling it.

    • You can view Observed Inferences in the Alerts browser. They are indicated with Observed status.

    • You can only Close an Observed Inference; no other action is allowed. However, alerts correlated within an Observed Inference are independent alerts, and you can perform any action on them.

  • ON: An alert correlation policy is created and alert correlation is performed.

Follow these steps to change the Enabled mode of an Alert Correlation Policy:

  1. Log into OpsRamp.

  2. Click All Clients and from the displayed list select a client.

  3. From the drop-down options, click Setup.

  4. From the left pane, click Alert Management>Alert Correlation. The Alert Correlation Policy page is displayed with the list of all Alert Correlation Policies created.

  5. For the desired Alert Correlation Policy, change the Enabled status. The new status is applied.

Delete the Alert Correlation Policy

If required, you can delete an alert correlation policy from the system. After a policy is deleted, newly ingested alerts that match it are no longer correlated. Alert Correlation Policies are typically deleted in the following situations:

  1. The device/resource generating the alerts is unavailable.

  2. You do not want to correlate the alerts.

Follow these steps to delete the Alert Correlation Policy:

  1. Log into OpsRamp.

  2. Click All Clients and from the displayed list select a client.

  3. From the drop-down options, click Setup.

  4. From the left pane, click Alert Management>Alert Correlation. The Alert Correlation page is displayed with the list of all Alert Correlation Policies created.

  5. Select the checkbox of desired Alert Correlation Policy and from the top pane, click Delete. The system displays a confirmation message for the deletion of alert correlation policy.

  6. Click Yes to delete. The selected alert correlation policy gets deleted.

View Alert Correlation Policy Statistics

Inference Stats widget displays the status of events, alerts, and Inferences generated in the client.

  • Total events: Total number of events generated.

  • Total alerts: Total number of alerts generated. This includes RCA, Inference, correlated, and individual alerts.

  • Total Inferences: Total number of RCA and Inference generated.

  • Volume Optimized: Percentage by which alert volume is reduced through alert correlation.
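The exact formula behind Volume Optimized is not stated here. As a rough illustration only, the following Python sketch assumes that, without correlation, a user handles every correlated and individual alert, and that, with correlation, each set of correlated alerts surfaces as a single RCA or Inference. The function name and the formula are assumptions, not OpsRamp's calculation.

```python
def volume_optimized(correlated_alerts: int, individual_alerts: int, inferences: int) -> float:
    """Hypothetical Volume Optimized percentage (not OpsRamp's documented formula)."""
    # Without correlation, the user handles every correlated and individual alert.
    before = correlated_alerts + individual_alerts
    # With correlation, each group of correlated alerts surfaces as one RCA or Inference.
    after = inferences + individual_alerts
    if before == 0:
        # No alerts at all: nothing was reduced.
        return 0.0
    return 100.0 * (before - after) / before
```

Under these assumptions, 90 correlated alerts folded into 5 Inferences alongside 10 individual alerts give a reduction of 85%.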

Follow these steps to create an Inference Stats widget:

  1. Click Dashboard.

  2. Click Add Widget.

  3. In Predefined Widget, click Inference Stats. The Add Widget window is displayed.

  4. Configure widget preferences:

    1. Show: The duration for which alerts and Inferences are displayed. The default duration is Last 4 Hours.

    2. Refresh every: The interval, between 5 and 10 minutes, at which the widget refreshes to display recent alert data. The default refresh interval is 5 minutes.

    3. Provide a name for the widget. The default name is Inference Stats.

    4. The Inference Stats widget displays alert data in Chart style.

  5. Click Save. Inference Stats widget appears on the Dashboard.

Scenarios

Correlate alerts on specific resources

Scenario: An organization has a group of network resources consisting of Switch and Host resources. The Host resources depend on the Switch to operate. A network outage causes the Switch to go down; as a result, the dependent Host resources also go down and generate multiple alerts. The IT admin team struggles to analyze the problem and takes longer to fix the issue.

Solution: Analyze the metrics that monitor the network outage on the Switch and Host resources, and then provide those metrics in the policy definition. Alerts generated from these metrics are correlated to form an RCA alert.

  1. Create an Impact Dependency between the resources. Click here to view instructions to create an Impact Dependency.

  2. Filter the resources to which the Alert Correlation Policy should apply.

  3. Define the metrics on the resources that may generate alerts, and then define the time span. Alerts generated within the specified time span are correlated to form an RCA.
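The RCA steps above can be sketched in Python as follows. This is a simplified model, not OpsRamp's implementation: it assumes a single-level impact dependency (each host depends on one switch), hypothetical metric names, and that a downstream alert is correlated when it is raised within the time span of the upstream alert, either before or after it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    resource: str
    metric: str
    raised_at: datetime


@dataclass
class RcaGroup:
    root: Alert  # alert on the upstream resource
    correlated: list[Alert] = field(default_factory=list)  # downstream alerts


def build_rcas(alerts: list[Alert], depends_on: dict[str, str],
               metrics: set[str], span: timedelta) -> list[RcaGroup]:
    """Correlate downstream alerts with the upstream alert they depend on."""
    # Only alerts on the metrics defined in the policy take part.
    relevant = [a for a in alerts if a.metric in metrics]
    # Root candidates: alerts on resources that other resources depend on.
    upstream_resources = set(depends_on.values())
    groups = [RcaGroup(a) for a in relevant if a.resource in upstream_resources]
    for alert in relevant:
        parent = depends_on.get(alert.resource)
        if parent is None:
            continue  # not a downstream resource
        for group in groups:
            # Correlate when the alert falls within the span of the root alert.
            if group.root.resource == parent and abs(alert.raised_at - group.root.raised_at) <= span:
                group.correlated.append(alert)
                break
    # Only upstream alerts that actually absorbed downstream alerts form an RCA.
    return [g for g in groups if g.correlated]
```

With a 5-minute span, a host alert raised 2 minutes after the Switch alert joins the RCA, while one raised 30 minutes later stays separate.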

Correlate alerts due to an unexpected cause

Scenario: The DevOps team has just rolled out a new code update to an app running on multiple servers. The update has a bug that causes high memory utilization on each app instance, generating multiple Critical and Warning alerts. These alerts cause multiple issues across the infrastructure. The DevOps team receives alert after alert, which makes it difficult for the team to diagnose the problem.

Solution:

  1. Define an Alert Correlation Policy to correlate alerts that have similar content.

  2. Configure an alert condition on the Alert Source attribute to filter alerts that originate from the same app name.

  3. Alerts with that app name that are generated within the specified time span are correlated to form an Inference.
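A minimal Python sketch of these steps, assuming each alert is a dictionary with hypothetical `source` and `raised_at` keys, and that the time span is measured from the first alert in each Inference. OpsRamp's actual windowing rule is not described here, so treat this as illustrative only.

```python
from datetime import datetime, timedelta


def correlate_by_source(alerts: list[dict], app_name: str, span: timedelta) -> list[list[dict]]:
    """Group alerts from one app into Inferences; each alert has 'source' and 'raised_at' (datetime)."""
    # Alert condition: keep only alerts whose source matches the app name, oldest first.
    matching = sorted(
        (a for a in alerts if a["source"] == app_name),
        key=lambda a: a["raised_at"],
    )
    inferences: list[list[dict]] = []
    for alert in matching:
        # Start a new Inference when the alert falls outside the span
        # measured from the first alert of the current Inference.
        if not inferences or alert["raised_at"] - inferences[-1][0]["raised_at"] > span:
            inferences.append([alert])
        else:
            inferences[-1].append(alert)
    return inferences
```

The same approach fits the Agent status scenario that follows if the filter compares the alert metric name instead of the source and the span is set to one hour.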

Scenario: A customer restarts the Agent on VMware resources. As a result, multiple Agent status alerts are generated, causing a large amount of alert noise. To reduce the noise, the customer wants the Agent status alerts generated within 1 hour to appear as a single alert.

Solution: Define an Alert Correlation Policy to correlate Agent status alerts on VMware resources generated within a span of 1 hour. Provide the metric that monitors the Agent status of the resources.

  1. Using Native/Custom attributes, filter the VMware resources to which the Alert Correlation Policy should apply.

  2. Select the time span from the drop-down. Alerts generated within this time span are correlated.

  3. Configure an alert condition on Alert Metric to correlate alerts that match the metric name.

  4. Alerts generated within the specified time span are correlated to form an Inference.

Escalate an Inference as an Incident to specific users

Scenario: An organization has IT infrastructure consisting of compute devices. A job is scheduled to reboot the devices every Friday. During the reboot, alerts are generated on Uptime, Agent Status, and process slowdown. The Support team creates an Alert Correlation Policy to correlate these alerts, and OpsRamp creates an Inference from the correlated alerts. The Support team now wants to escalate the Inference with Critical status as an Incident to the IT administration team for quick resolution.

Solution: Create an alert escalation policy to escalate the Inference as an Incident. While defining the alert escalation policy, create a rule that escalates the Inference based on the Inference subject. Click here to view instructions to escalate an Inference.

For this scenario, in the Step 3: Define Alert and Resource Conditions section, select the filter Property Alert: Subject, select the required Operator, and provide the Inference subject as the value. For example, if the Inference subject is “CPU Uptime, Agent Status Down on Windows Server”, you can configure the escalation rule as shown in the screenshot below.
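The escalation rule can be thought of as a predicate on the Inference. In this Python sketch, the `Inference` class, the `should_escalate` function, and the `Equals`/`Contains` operators are hypothetical stand-ins for the options in the escalation policy UI; the Critical status check follows the scenario.

```python
from dataclasses import dataclass


@dataclass
class Inference:
    subject: str
    status: str


# Hypothetical stand-ins for operators offered in the escalation rule.
OPERATORS = {
    "Equals": lambda actual, expected: actual == expected,
    "Contains": lambda actual, expected: expected in actual,
}


def should_escalate(inference: Inference, operator: str, value: str, status: str = "Critical") -> bool:
    """Escalate only Inferences with the given status whose subject matches the rule."""
    return inference.status == status and OPERATORS[operator](inference.subject, value)
```

A rule using `Contains` with the value "Agent Status Down" matches the example subject, whereas `Equals` with the same value does not, because it requires the whole subject to match.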

Correlate alerts on multiple resources using a single policy

Scenario: A user managing a group of Windows and Linux devices wants to correlate alerts on the devices using a single policy and escalate them to the respective teams.

Solution: Define Alert Correlation Policy to correlate alerts based on the operating system.

  1. Provide a name for the Alert Correlation Policy.

  2. In the Filter Criteria section, select the attribute Operating System, the operator Regex, and provide the value Windows|Linux.

    Note: Click here for instructions on filtering resources using regular expressions.

  3. In the Policy Definition section, select Algorithm based correlation, select the Time Span, set the condition to Device Type has 100% matching content, and click Save.

    Note: See Escalate Alerts as Notifications or Escalate Alerts as Incident to view instructions to escalate alerts to respective teams.
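The policy above combines a regex resource filter with exact-match grouping. The Python sketch below illustrates that combination under stated assumptions: alerts are dictionaries with hypothetical `os` and `device_type` keys, the regex is applied with `re.search` (OpsRamp may anchor patterns differently), and the time span is omitted for brevity.

```python
import re
from collections import defaultdict


def correlate_by_device_type(alerts: list[dict], pattern: str = r"Windows|Linux") -> dict[str, list[dict]]:
    """Filter alerts by operating system regex, then group by identical device type."""
    # Filter Criteria: the Operating System value must match the regex somewhere.
    regex = re.compile(pattern)
    groups: defaultdict[str, list[dict]] = defaultdict(list)
    for alert in alerts:
        if regex.search(alert["os"]):
            # "Device Type has 100% matching content": only identical values share a group.
            groups[alert["device_type"]].append(alert)
    return dict(groups)
```

In this sketch, an alert from a Cisco IOS switch is excluded by the filter, while Windows and Linux servers with the same device type end up in one group.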

Appendix

Training File

Introduction

The training file is the input data from which machine learning learns alert patterns to drive alert correlation.

Create a Training File (CSV)

A CSV (comma-separated values) file stores data in a table-structured format. It is generally a plain-text file in which values are separated by commas.

The following instructions create a CSV file using Microsoft Excel 2010; other spreadsheet programs follow a similar process.

  1. Open a spreadsheet (Microsoft Excel or Google Sheets).

  2. Enter each alert sequence (metric names) on its own row, with one metric name per column, as shown below:

    Note: Each line represents one alert sequence.

  3. Click File and then click Save As.

  4. Select a local folder, provide a name for the sheet, select CSV UTF-8 (Comma delimited) and then click Save.
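If the alert sequences are already available programmatically, the same kind of file can be produced without a spreadsheet. This Python sketch uses the standard `csv` module to write one alert sequence per line, with one metric name per column. The metric names are hypothetical, and the sketch assumes the training file needs no header row.

```python
import csv
import io

# Hypothetical alert sequences: each inner list is one line (one alert sequence),
# with one metric name per column.
SEQUENCES = [
    ["system.ping.status", "agent.status", "system.uptime"],
    ["system.cpu.utilization", "system.memory.utilization"],
]


def to_csv_text(sequences: list[list[str]]) -> str:
    """Render alert sequences as CSV text, one sequence per line."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(sequences)
    return buffer.getvalue()


def write_training_file(path: str, sequences: list[list[str]]) -> None:
    """Save the sequences as a UTF-8 encoded CSV file."""
    # newline="" stops Python from translating the writer's \r\n line endings.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(to_csv_text(sequences))
```

Calling `write_training_file("training_file.csv", SEQUENCES)` writes the two example sequences as two comma-separated lines.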

Frequently Asked Questions

What is the relationship between partner-defined client scope policy and client-defined policy?

Partners can define an Alert Correlation Policy on behalf of a client, for a single client or for multiple clients. Each client can also define its own policy or use the partner-defined policy.

A partner-defined client-scope policy is automatically available to the clients it targets. A client can view an inherited policy but cannot edit it; however, the client can enable or disable it.
