Event Management guidelines

Improve overall system performance and learn about efficient configuration practices by considering the following Event Management tips.

Table 1. Event Management system performance improvement tips:
Topic Details
Business rules
  • Avoid writing business rules for event tables [em_event] as they can result in performance degradation.
  • Business rules that are written for alert tables [em_alert] must be highly efficient or they may result in performance degradation. Instead of writing a business rule, consider whether it is more appropriate to write a job.
  • Async business rules that are written for alert tables are also not recommended.
Configure for large-scale environments
  • Set the Enable multi node event processing (event_processor_enable_multi_node) property to Yes.
  • Set the Number of scheduled jobs processing events (event_processor_job_count) property to 4.
  • If you are sending events from a custom source, verify that each event includes either a Message Key or the Source, Node, Type, and Resource fields.
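As a minimal sketch of the custom-source check above, the following function reports which identifying fields an event still lacks. The lowercase key names (message_key, source, node, type, resource) and the function name are assumptions made for illustration, not a documented payload format.

```python
# Key names are assumed to mirror the field labels in the guideline above.
REQUIRED_WITHOUT_KEY = ("source", "node", "type", "resource")


def missing_identity_fields(event):
    """Return the fields an event still needs; an empty list means it passes."""
    # A non-empty message key is sufficient on its own.
    if str(event.get("message_key") or "").strip():
        return []
    # Otherwise every one of the four identifying fields must be non-empty.
    return [f for f in REQUIRED_WITHOUT_KEY
            if not str(event.get(f) or "").strip()]


incomplete = missing_identity_fields(
    {"source": "OEM", "node": "db01", "type": "Disk"}
)
keyed = missing_identity_fields({"message_key": "OEM-db01-disk"})
```

Running a check like this in the custom source before events are sent makes gaps visible early, rather than after events have been processed.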
Latency issues for receiving events
Check the following settings:
  • Verify that the Bucket field in the Event [em_event] table is set to a value that is greater than zero (0).
  • Navigate to System Scheduler > Scheduled Jobs > Scheduled Jobs and search for Event Management - process events*.

    Check that the number of Event Management - process events* jobs matches the value of the Number of scheduled jobs processing events (event_processor_job_count) property. Verify that the State of each job is Running or Ready. If the state is Queued or Error, set the job state to Ready.
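The job check above can be expressed as a small routine over an exported list of scheduled jobs. This is a sketch only: the record shape (dictionaries with 'name' and 'state' keys), the numeric job-name suffixes, and the function name are hypothetical, and the code does not query an instance.

```python
HEALTHY_STATES = {"Running", "Ready"}
JOB_PREFIX = "Event Management - process events"


def check_process_jobs(jobs, expected_count):
    """Summarize missing jobs and jobs whose state should be reset to Ready."""
    matching = [j for j in jobs if j["name"].startswith(JOB_PREFIX)]
    needs_reset = [j["name"] for j in matching
                   if j["state"] not in HEALTHY_STATES]
    return {
        # How many jobs are absent compared with event_processor_job_count.
        "missing": max(expected_count - len(matching), 0),
        "needs_reset": needs_reset,
    }


jobs = [
    {"name": "Event Management - process events 0", "state": "Ready"},
    {"name": "Event Management - process events 1", "state": "Error"},
    {"name": "Some other job", "state": "Queued"},
]
report = check_process_jobs(jobs, expected_count=4)
```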

Archive events

Avoid changing the default retention time for events.

To retain events for a longer time, create a custom archive table and schedule a job that regularly copies new events from the Event [em_event] table to it.

ServiceNow does not recommend extending table rotation by adding more days.

Planning

  • Organize the configuration of event sources (filters, modules, and so on) as multiple parallel efforts rather than serially.
  • Validate processed event formats to ensure that the data that is parsed is aligned with desired results.
  • Test production events in a non-production environment. Integrate with non-production element managers and ServiceNow instances. If non-production element managers are not available, send events from element managers to both production and non-production environments.

Event integration

SNMP traps

  • It is preferable to use a monitoring tool to send SNMP traps rather than sending them directly from devices.
  • Upload MIBs prior to defining event rules to avoid having to rewrite the Event Rules.

Web service API

  • Using a web service API for integration can reduce the number of Event Rules needed. This avoids having to transform events (prepared data is sent in an event to ServiceNow).
  • Use dedicated credentials for Event Management integration. Optionally, designate credentials specific to each event source.
  • Use cURL instead of Python, Perl, and so on.

CloudWatch

  • Use dedicated credentials for integrating CloudWatch with ServiceNow.

Email

  • Use email only if the source has a low volume and other options, such as running a script or forwarding an SNMP trap, are not available.

Event rules

  • Write Event Rules to apply to the broadest possible set of events. Create more specific rules only as necessary, and give them a lower Order value.
  • Avoid writing Event Rules that apply only to a certain subset of events if a more general rule can achieve the same outcome.
  • When Event Rules are applied to events, no changes are made to the original event. All processing occurs in memory, so use the Processing Notes field or the Check Process of Event UI action link to troubleshoot.
  • If you change a rule or transform that has existing mapping rules, review and retest it with real or simulated events.
  • Establish a consistent naming convention. A common convention is: <customer acronym>.<Event Source>.<Description>. For example, ACME.OEM.Normalize.
  • Use the Order field to control which Event Rule runs if two Event Rules have similar conditions set.
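The naming convention and the role of the Order field can be illustrated with a short sketch. It assumes that rules are evaluated in ascending Order and that the first match wins; the rule dictionaries, the 'matches' callables, and the second rule name are hypothetical stand-ins for rule conditions.

```python
def follows_convention(name):
    """True if the name has three non-empty, dot-separated parts."""
    parts = name.split(".")
    return len(parts) == 3 and all(p.strip() for p in parts)


def first_matching_rule(rules, event):
    """Return the name of the lowest-Order rule whose condition matches."""
    for rule in sorted(rules, key=lambda r: r["order"]):
        if rule["matches"](event):
            return rule["name"]
    return None


rules = [
    # Broad rule: applies to every event from the source.
    {"name": "ACME.OEM.Normalize", "order": 200,
     "matches": lambda e: e["source"] == "OEM"},
    # Specific rule: lower Order, so it is considered first.
    {"name": "ACME.OEM.Tablespace", "order": 100,
     "matches": lambda e: e["source"] == "OEM" and e["type"] == "Tablespace"},
]

specific = first_matching_rule(rules, {"source": "OEM", "type": "Tablespace"})
general = first_matching_rule(rules, {"source": "OEM", "type": "CPU"})
```

Under these assumptions the specific rule handles tablespace events and everything else from the source falls through to the broad rule.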

Construct Event Rules using these guidelines

Table 2. Effective rules should be written to achieve:
Desired result Required activity
Effective de-duplication and efficient parallel event processing
  • Populate the Source, Node, Type, Resource, and Metric Name fields.
CI binding
  • Bind to host: populate the Node field and, optionally, CI identifiers.
  • Bind to application and/or device: populate the CI type field and the Additional Info fields.
Alert Correlation, using Service Analytics
  • Populate the Resource and Metric Name fields.
    Note: If a CI is also bound, Alert Correlation is improved.

Event Mapping rules

  • Ensure that the From field value exactly matches a string in the JSON in the additional_info field. This matching happens when a rule has been configured based on information in a MIB file. However, if the MIB file is not uploaded, the JSON for the SNMP trap then shows varbinds (variable binding) with dotted names, instead of the translated name in the MIB. The event field mapping rule then fails to be applied.
  • Establish a consistent naming convention. A common convention is: <Customer acronym>.<Event Source>.<Mapping target>. For example, ACME.OEM.Severity.
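The From field mismatch described above can be shown with a small check. This sketch assumes the From value is compared against the top-level keys of the additional_info JSON; the function name and the sample varbind (ifIndex and its dotted form) are illustrative only.

```python
import json


def mapping_source_present(additional_info, from_field):
    """True if from_field exactly matches a key in the additional_info JSON."""
    return from_field in json.loads(additional_info)


# With the MIB uploaded, the varbind appears under its translated name.
translated = json.dumps({"ifIndex": "3"})
# Without the MIB, the same varbind appears under a dotted name.
untranslated = json.dumps({"1.3.6.1.2.1.2.2.1.1": "3"})

found_with_mib = mapping_source_present(translated, "ifIndex")
found_without_mib = mapping_source_present(untranslated, "ifIndex")
```

A rule written against the translated name finds nothing in the second case, which is why the mapping rule is not applied.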

De-Duplication

  • The message_key field is used for De-Duplication. If reliable message key values are not provided with the source event, it is important to have a well-defined plan for constructing these identifiers.
  • If the message key is not defined, then the message key is <Source + Node + Type + Resource + Metric Name>.
  • The guideline is to have the event source populate these five fields and the message key out of the box (OOB). This enables a better distribution of event processing among instance workers and nodes.
  • If the source event does not have values for these fields, make sure to populate them using transform rules. This does not affect event processing, but is used for de-duplication.
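The message key fallback described above can be sketched as follows. The source states only that the default key combines the five fields; the underscore separator, the lowercase key names, and both function names are assumptions made for illustration.

```python
KEY_FIELDS = ("source", "node", "type", "resource", "metric_name")


def default_message_key(event, sep="_"):
    """Join the five identifying fields; the separator is an assumption."""
    return sep.join(str(event.get(f, "")) for f in KEY_FIELDS)


def group_by_message_key(events):
    """Group events that would de-duplicate into the same key."""
    groups = {}
    for event in events:
        key = event.get("message_key") or default_message_key(event)
        groups.setdefault(key, []).append(event)
    return groups


base = {"source": "OEM", "node": "db01", "type": "Disk",
        "resource": "C:", "metric_name": "free_pct"}
events = [
    dict(base),                            # no explicit key
    dict(base),                            # repeat of the same condition
    dict(base, message_key="custom-123"),  # explicit key from the source
]
groups = group_by_message_key(events)
```

The two events without an explicit key collapse into one group, while the event carrying its own message key stays separate.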

CI Binding

  • Where possible, always attempt to bind an alert to a CI.
    Note: Alerts, not events, are bound to CIs.
  • To bind a host, machine or any device with an IP, the guideline is to populate the event Node field with a unique host name, FQDN, IP or MAC address. If other identifiers are necessary to identify a host, then populate the ci_identifiers field with JSON that contains the CMDB field name and value to perform the match.
    Note: The ci_identifiers field must be populated by the source before the event is inserted.
  • The primary binding strategy is to use the Node field. If the Node field is not pre-populated in the event, it can be populated using Event Rules.
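As a sketch of the binding inputs described above, the following builds an event that carries a Node value and, optionally, a ci_identifiers JSON string of CMDB field names and values. The lowercase key names, the function name, and the sample CMDB field and values are illustrative assumptions.

```python
import json


def build_binding_fields(node, ci_identifiers=None):
    """Return the event fields used for CI binding."""
    event = {"node": node}
    if ci_identifiers:
        # ci_identifiers is carried as a JSON string of field name/value pairs.
        event["ci_identifiers"] = json.dumps(ci_identifiers, sort_keys=True)
    return event


# Node alone is the primary binding strategy.
by_node = build_binding_fields("db01.example.com")
# Extra identifiers are added only when Node is not enough to find the host.
by_identifiers = build_binding_fields(
    "db01.example.com", {"serial_number": "SN-0001"}
)
```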

Alert Rules

  • A scheduled job applies Alert Rules to new Alerts every 11 seconds. If an Alert Rule does not immediately start, allow 10–15 seconds before you start troubleshooting.
  • Use the Order field to control which Alert Rule runs if two Alert Rules have similar conditions set.

Task templates

  • Create a user called Event Management (or a similar name) so that the Created by field in a task template (for example, Incident) can be set to indicate that Event Management was the source of the task.
  • To perform any dynamic value assignment or to override OOB dynamic value assignment, use the EvtMgmtCustomIncidentPopulator script include.

Remediation

  • Always set orchestration workflow properties to the Remediation Task [em_remediation_task] table.
  • Use ECC Queue and Workflow > Live Workflow > All Contexts to find more detailed information on remediation activities.

Services and Dashboard

  • Use Service Groups to group business services into logical groups to reduce the number of services displayed on the Service Health dashboard.
  • Import manually built service maps. For a description of the conversion process, see Import a business service as a manual service.

Performance

Enable multi-node in production environments and set values based on the size of the deployment and expected event rate.

Operational Metric collector logs and files

The Operational Metric collector logs and files are located under the path $(MID_SERVER_DIR)/agent. Use these logs and files for troubleshooting and monitoring purposes.

Table 3. Location of Operational Metric collector logs and files:
Log or file                            Path
PowerShell metric collector log file   Logs/retrieve_metrics{connector instance ID}.log
PowerShell output file                 work/metrics/metrics_output_{connector instance ID}.txt
PowerShell input file                  work/metrics/parameters_{connector instance ID}.txt
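The locations in Table 3 can be assembled for a given connector with a short helper. This is a sketch: the function name, the example directory, and the connector ID are hypothetical, and the file name patterns are copied as they appear in the table.

```python
from pathlib import PurePosixPath


def metric_collector_paths(agent_dir, connector_id):
    """Build the Table 3 paths under the MID Server agent directory."""
    base = PurePosixPath(agent_dir)
    metrics = base / "work" / "metrics"
    return {
        "log": base / "Logs" / f"retrieve_metrics{connector_id}.log",
        "output": metrics / f"metrics_output_{connector_id}.txt",
        "input": metrics / f"parameters_{connector_id}.txt",
    }


# Hypothetical agent directory and connector instance ID.
paths = metric_collector_paths("/opt/mid/agent", "abc123")
output_file = str(paths["output"])
```

PurePosixPath is used here only to keep the result the same on every platform; on a Windows MID Server, PureWindowsPath would be the natural substitute.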

The performance of Operational Metrics can be checked in the MID Server log file when the mid.log.level MID Server parameter is set to debug.

Operational Metrics performance numbers are available in the sa_performance_statistics table. To display the performance numbers, filter the Performance Statistics list for Metric Collector.

Add custom fields to an event

Additional fields should be included in the Additional info field of the event. Additional fields should not be added to an event by adding a custom field to the event table [em_event]. For more information about how to include additional fields in events, see Populate custom alert fields.
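A minimal sketch of carrying extra data in the Additional info field, rather than in a custom column, is shown below. It assumes the field holds a JSON object serialized as a string under the key additional_info; the function name and the sample field names are hypothetical.

```python
import json


def with_custom_fields(event, custom):
    """Return a copy of the event with custom fields merged into additional_info."""
    existing = json.loads(event.get("additional_info") or "{}")
    existing.update(custom)
    return {**event, "additional_info": json.dumps(existing, sort_keys=True)}


event = {"source": "OEM", "node": "db01",
         "additional_info": json.dumps({"region": "emea"})}
enriched = with_custom_fields(event, {"team": "dba", "runbook": "RB-42"})
```

The original event is left unchanged, and any values already present in additional_info are preserved unless a custom field uses the same name.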