Root cause CIs

For alerts associated with discovered business services or manual services, Service Analytics implements root cause analysis (RCA) to identify the CI that is the underlying root cause in an automated alert group.

If the Domain Support - Domain Extensions Installer plugin is activated, then RCA is domain aware. Alerts are analyzed within the context of the domain that business services or manual services belong to, and RCA for a business service or manual service runs on the MID Server that is in the same domain as the business service or manual service. If there is no MID Server for a specific domain, then the MID Server from the global domain is used.
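The fallback rule described above can be sketched as follows. This is a minimal illustration, not ServiceNow code: the MidServer record, the "global" domain label, and the pick_mid_server function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional

GLOBAL_DOMAIN = "global"  # assumed label for the global domain


@dataclass(frozen=True)
class MidServer:
    """Hypothetical record describing a MID Server and the domain it belongs to."""
    name: str
    domain: str


def pick_mid_server(service_domain: str, servers: list) -> Optional[MidServer]:
    """Prefer a MID Server in the service's own domain; otherwise fall back to the global domain."""
    for server in servers:
        if server.domain == service_domain:
            return server
    for server in servers:
        if server.domain == GLOBAL_DOMAIN:
            return server
    return None  # no suitable MID Server is available
```

With one MID Server in the global domain and one in a domain named "emea", a service in "emea" gets its own MID Server, while a service in any other domain gets the global one.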

Root cause analysis

The ongoing operation of an application can generate a large number of events and alerts, which can become overwhelming when problems arise. When the system experiences a problem, manually assessing the impact of the alerts and identifying the underlying cause can be slow and resource-intensive.

To identify underlying problems, RCA algorithms prioritize alerts, group them in the context of impacted services, and identify root cause CIs. The root cause CI is the CI on which the root alert for an incident originated and which subsequently caused other alerts to be generated.

RCA has these components:
RCA Learner
An offline job that runs once a day to process past alerts. It collects information about frequent alert patterns within a service context, and stores this information in the alert knowledge base. Based on past alerts and on the impact model, the RCA Learner creates a probability model that can be used to answer cause and effect queries.
Real Time Query
A scheduled job that runs every minute to group alerts and to update root cause CIs. It queries past lists of root cause recommendations to get the probability score for real-time alerts with respect to other open alerts within the service.

For discovered business services and for manual services, the Learner collects and analyzes data from past alerts. For new alerts, the Learner applies existing knowledge from similar past alerts, and continues to capture and analyze data from new alerts. As more alerts are encountered and resolved, the alert knowledge base grows and the precision of diagnosing the root cause CI improves.
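To make the idea of learning from past alerts concrete, here is a toy sketch. It is not the algorithm that Service Analytics uses, which this topic does not specify. It only counts how often each CI was the confirmed root cause when the same set of CIs alerted together within one service. The ToyLearner class and the shape of its input are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class ToyLearner:
    """Hypothetical stand-in for an alert knowledge base (not the ServiceNow implementation)."""

    def __init__(self) -> None:
        # Maps (service, set of alerting CIs) to a count of confirmed root cause CIs.
        self._patterns = defaultdict(Counter)

    def learn(self, service: str, alerting_cis: set, root_cause_ci: str) -> None:
        """Record one resolved alert group from the past."""
        self._patterns[(service, frozenset(alerting_cis))][root_cause_ci] += 1

    def query(self, service: str, alerting_cis: set) -> dict:
        """Return the observed share of each root cause CI for this alert pattern."""
        counts = self._patterns.get((service, frozenset(alerting_cis)))
        if not counts:
            return {}  # pattern never seen before, so nothing can be said
        total = sum(counts.values())
        return {ci: n / total for ci, n in counts.items()}
```

In this sketch, every additional resolved alert group passed to learn refines the shares returned by query, which mirrors how a growing alert knowledge base improves the precision of the diagnosis.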

Business services are discovered by Service Mapping and represented internally in the system by a service model. The service model of the business service is used for identifying CIs related to the root cause CI.

When the root cause CI is known, operators can create a single incident ticket and engage only the IT operator who is needed to expedite remediation. The IT operator can direct troubleshooting efforts toward removing the root cause problem and stopping the recurrence of undesirable events.

Setting RCA properties

By default, RCA is not applied. You can enable RCA and modify other RCA-related behavior by changing the settings of the sa_analytics.aggregation.include_service and sa_analytics.rca_enabled properties for Service Analytics.

RCA configurations

RCA uses an RCA configuration that filters and scopes the alerts to be analyzed. The base system includes pre-defined RCA configurations, but those might not be optimal in every environment. See RCA configurations for more information.

Confidence score

To help you decide where to invest troubleshooting effort, RCA algorithms calculate a confidence score for each identified root cause CI. The score is derived from the Learner data and expresses how certain the identification is. For example, a confidence score of 75% means that RCA is 75% certain that the CI is the root cause. If more than one root cause is possible, investigate the candidate with the highest score before the less likely ones.

By default, RCA groups are displayed regardless of their confidence score. To limit which groups are displayed, set the sa_analytics.rca.query_probabality_threshold property to the percentage that an RCA group confidence score must meet to be displayed. If a root cause CI has a confidence score that is lower than the specified percentage, Service Analytics does not treat that CI as the root cause.
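The effect of the threshold can be illustrated with a short sketch. The RootCauseCandidate record and the displayable_candidates function are hypothetical, and the default threshold of 0 is an assumption that models the default behavior of displaying every group. The sketch does not read the actual property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootCauseCandidate:
    """Hypothetical record for a candidate root cause CI."""
    ci_name: str
    confidence_pct: float  # confidence score as a percentage, 0-100


def displayable_candidates(candidates, threshold_pct=0.0):
    """Drop candidates below the threshold and order the rest from most to least likely."""
    kept = [c for c in candidates if c.confidence_pct >= threshold_pct]
    return sorted(kept, key=lambda c: c.confidence_pct, reverse=True)
```

Given candidates scored 40%, 75%, and 60%, a threshold of 60 keeps only the 75% and 60% candidates, in that order, so the most likely root cause is investigated first.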

Viewing root cause CIs

You can view root cause CIs, when one has been identified, in the following ways.

UI access | Description
Event Management dashboard - root cause CI | Displays root cause CIs highlighted in a business service map, and the relationships between the root cause CI, alerts, and related CIs in business services.
From a correlated alert group | Displays all automated alert groups and lets you drill down to view details about the alerts in the group, and the root cause CI if it exists. Double-click a group, and then click the Impacted Services tab to display the services and root cause CIs if applicable.