Operational Intelligence

Operational Metrics within Operational Intelligence learns from historical metric data and builds standard statistical models to project expected metric values, along with upper and lower control bounds. Operational Metrics then uses these projections to detect statistical outliers and to calculate anomaly scores. Operational Metrics is available when you activate the ITOM Metric Management (com.snc.sa.metric) plugin.

Metric data is collected by various data sources. For example, SCOM is a monitoring system that regularly collects metric data from the source environment. Operational Metrics captures the raw data from these monitoring systems, and uses event rules and the CMDB identification engine to map this data to existing CIs. Operational Metrics then analyzes the data to detect anomalies and to provide other statistical scores. By default, Operational Metrics is partially configured to use the SCOM data source.

After processing, metric statistics and charts are displayed in the Metric Explorer, and the Anomaly Map displays correlated scores for the CIs with the highest anomaly scores across a timeline.

Terms used with Operational Metrics

Source metric type
A metric such as '% Free Space' or 'Current Bandwidth' that can be measured by a data source for a CI. For each data source, you can choose which of the available metrics are processed. By default, about 380 source metric types are active for the SCOM data source.
Statistical model
Processes on the Metric MID Server learn statistical models for each metric from past metric data (up to four weeks old). Each statistical model is used to project the expected average behavior of the metric for the next 24 hours, to calculate upper and lower control bounds for the metric, and to identify anomalies.
Anomaly
Data that is outside the control bounds is considered a statistical outlier. These outliers are used to compute an anomaly score, which is a value from 0 to 10 that indicates the degree to which the metric behavior appears unlikely. Outliers are monitored, and when an anomaly score is above a minimum threshold, an anomaly alert is generated. Anomaly alerts are reported separately from the system's regular alerts.
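
The relationship between a statistical model, control bounds, outliers, and an anomaly score can be illustrated with a minimal Python sketch. This is not the Operational Metrics algorithm, which is not described here. The sketch assumes a simple mean plus or minus k standard deviations model, and the function names, the multiplier k, and the scoring formula (fraction of outliers scaled to the 0 to 10 range) are hypothetical.

```python
from statistics import mean, stdev


def control_bounds(history, k=3.0):
    """Return (expected, lower, upper) learned from past samples.

    Assumption: a plain mean +/- k * standard deviation model, used only
    for illustration. The product's actual models are not specified.
    """
    expected = mean(history)
    spread = stdev(history)
    return expected, expected - k * spread, expected + k * spread


def find_outliers(samples, lower, upper):
    """Return the samples that fall outside the control bounds."""
    return [x for x in samples if x < lower or x > upper]


def anomaly_score(samples, lower, upper):
    """Hypothetical score: share of outliers in the window, scaled to 0-10."""
    if not samples:
        return 0.0
    return 10.0 * len(find_outliers(samples, lower, upper)) / len(samples)


# Past data for one metric, then a new window of observations.
history = [10, 12, 11, 9, 10, 11, 10, 9]
expected, lower, upper = control_bounds(history)
window = [10, 11, 25, 9]
print(find_outliers(window, lower, upper))
print(anomaly_score(window, lower, upper))
```

In this sketch, only the value 25 falls outside the bounds learned from the history, so one of the four samples in the window is an outlier.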

Operational Metrics MID Server

Operational Metrics uses a dedicated MID Server to process the raw data, to build a statistical model, and to detect anomalies. The MID Server is configured with extensions that enable it to independently transmit metric data to the instance. The MID Server transmits the statistical model, and any anomalies above a specified score (4 by default), to the instance, where they are accessible to users.
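
The score threshold can be pictured as a simple filter, sketched below in Python. The record layout and the function name are hypothetical, and the sketch assumes that "above" means strictly greater than the threshold.

```python
DEFAULT_MIN_SCORE = 4  # default threshold stated in the text


def anomalies_to_transmit(anomalies, min_score=DEFAULT_MIN_SCORE):
    """Keep only anomalies whose score is above the threshold.

    Assumption: each anomaly is a dict with a numeric "score" key, and
    "above" is interpreted as strictly greater than min_score.
    """
    return [a for a in anomalies if a["score"] > min_score]


detected = [
    {"metric": "% Free Space", "score": 2.0},
    {"metric": "Current Bandwidth", "score": 4.0},
    {"metric": "% Free Space", "score": 7.5},
]
print(anomalies_to_transmit(detected))
```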

Metrics to CI mapping

The data that is collected on the MID Server is raw and does not relate to any specific CI in the CMDB. To be useful, the data goes through a normalization process that uses CMDB identification rules and event rules to uniquely identify CIs, and to map them to the raw data.

Records that map raw data to CIs are generated automatically and remain in effect for a length of time determined by the following properties:
  • sa.metric.map.with.ci.expiration.sec: Applies when a mapping to a CI was found. By default, the mapping is valid for five days.
  • sa.metric.map.without.ci.expiration.sec: Applies when a mapping to a CI was not found. By default, the mapping is valid for 24 hours.
When similar metric data arrives within that time period, the existing mapping is used to match the data to CIs. At the end of the time period, the metric-to-CI records expire. A change in the event rules also triggers an immediate expiration of the respective metric-to-CI records. The next time raw metric data arrives, it is normalized again. When Discovery adds or removes CIs, mappings are adjusted to reflect these changes at the next cycle.
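
The expiration behavior described above resembles a cache with two time-to-live values. The following Python sketch illustrates the idea only and is not how the platform stores metric-to-CI records. The class name, method names, and key format are hypothetical, and the constants are the stated defaults (five days and 24 hours) converted to seconds.

```python
import time

WITH_CI_TTL_SEC = 5 * 24 * 60 * 60   # five days, mapping to a CI was found
WITHOUT_CI_TTL_SEC = 24 * 60 * 60    # 24 hours, mapping to a CI was not found


class MetricToCiCache:
    """Hypothetical in-memory stand-in for metric-to-CI mapping records."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._records = {}  # key -> (ci_id or None, expiry timestamp)

    def put(self, key, ci_id):
        """Store a mapping. ci_id is None when no CI was identified."""
        ttl = WITH_CI_TTL_SEC if ci_id is not None else WITHOUT_CI_TTL_SEC
        self._records[key] = (ci_id, self._clock() + ttl)

    def get(self, key):
        """Return (hit, ci_id). A miss means the data must be normalized again."""
        record = self._records.get(key)
        if record is None:
            return False, None
        ci_id, expires_at = record
        if self._clock() >= expires_at:
            del self._records[key]
            return False, None
        return True, ci_id

    def expire(self, keys):
        """Expire records immediately, for example after an event rule change."""
        for key in keys:
            self._records.pop(key, None)


cache = MetricToCiCache()
cache.put("host1/% Free Space", "ci-123")
print(cache.get("host1/% Free Space"))
```

Passing the clock as a parameter is a design choice that keeps the sketch easy to exercise with a simulated time source instead of waiting for real records to expire.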