Starting with the Tokyo release, a new, optimized data collector is available. The log details for this data collector differ from the log details of the classic data collector.

Collection overview

The data collection job executes an SQL query for each indicator source that uses the data collector. The query is repeated for every collection period from the start date to the stop date, and then the queries run for the next indicator source. Each query execution step is documented in the data collection job log.
Figure 1. High-level view of data collection job
Flow chart showing the relation between the different levels of processes in a data collection job
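The overall control flow can be sketched as a pair of nested loops. This is a minimal, illustrative Python sketch; `run_collection_job`, `run_query`, and the log wording are assumptions for illustration, not ServiceNow APIs:

```python
# Sketch of the collection job's control flow: for each indicator source,
# run the source query once per collection period and log each step.
# All names here are illustrative placeholders, not platform APIs.
def run_collection_job(indicator_sources, periods, run_query, log):
    for source in indicator_sources:
        for period in periods:  # every period from start date to stop date
            log(f"Collecting for {period} on Indicator Source {source}")
            rows = run_query(source, period)
            log(f"Fetched {len(rows)} rows from Indicator Source {source}")
            log(f"Collection for {period} on Indicator Source {source} finished")
```

Each iteration of the inner loop corresponds to one "Collecting for … finished" pair in the job log.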

Starting and completing a collection job

Follow the data collection process through its log messages. The log messages for each subprocess are listed in separate tables.

Note: This log is for the optimized HSQL data collector, as described in Optimizing data collection. If the system property com.snc.pa.dc.hsql is set to "false", you are using the classic data collector; otherwise, you are using the optimized data collector. For more information about the classic data collector log, see Log details for classic data collector.

Steps iterated for each indicator source

Steps iterated for each collection period

Collection job step Example of resulting log entry
Start collecting for a date. Collecting for 20191103 on Indicator Source Incidents.Open
Fetch the following fields:
  • The unique fields of the breakdown sources (typically sys_id)
  • The indicator fields used in breakdown mappings
  • The table fields used in scripted breakdown mappings
  • The indicator source fields used in text analytics, if the text index is collected (typically includes short_description)
Fetching "short_description,sys_id, opened_at, assignment_group, description, priority, category" from "incident"
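The field list in the resulting log entry is effectively the union of the categories listed in this step. A minimal Python sketch; all field sets below are illustrative examples, not values read from the platform:

```python
# Sketch: the fields to fetch are the union of the categories listed above.
# All inputs are illustrative examples, not actual platform configuration.
breakdown_unique_fields = {"sys_id"}
breakdown_mapping_fields = {"assignment_group", "priority", "category"}
scripted_mapping_fields = {"opened_at"}
text_index_fields = {"short_description", "description"}  # only if text index is collected

fields_to_fetch = (breakdown_unique_fields
                   | breakdown_mapping_fields
                   | scripted_mapping_fields
                   | text_index_fields)
```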
Generate SQL based on the conditions that are specified in the indicator source.
Note: If the indicator source specifies Today in one of the conditions, Today is considered relative to the period for which the data collection job is executed. For example, the Incidents.New indicator source includes the condition [Opened][on][Today]. With days defined to start at 07:00:00, when data is collected for 2019-10-28, the job produces the following SQL statement.
SELECT task0.`sys_id` FROM task task0 WHERE task0.`sys_class_name` = 'incident' AND (task0.`opened_at` >= '2019-10-28 07:00:00' AND task0.`opened_at` <= '2019-10-29 06:59:59')
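The way the relative Today window is resolved can be sketched as follows. This is an illustrative Python sketch of the date arithmetic, not the platform's actual SQL generator; `today_window` is a hypothetical helper:

```python
# Sketch: resolving [Opened][on][Today] relative to the collected period,
# with days defined to start at 07:00:00. Illustrative only.
from datetime import datetime, timedelta

def today_window(collection_date: str, day_start: str = "07:00:00"):
    """Return the inclusive start/end of the 'day' for the collected date."""
    start = datetime.strptime(f"{collection_date} {day_start}", "%Y-%m-%d %H:%M:%S")
    end = start + timedelta(days=1) - timedelta(seconds=1)
    return start, end

start, end = today_window("2019-10-28")
sql = ("SELECT task0.`sys_id` FROM task task0 "
       "WHERE task0.`sys_class_name` = 'incident' "
       f"AND (task0.`opened_at` >= '{start}' AND task0.`opened_at` <= '{end}')")
```

With a 07:00:00 day start, "Today" for 2019-10-28 spans 2019-10-28 07:00:00 through 2019-10-29 06:59:59, matching the generated SQL above.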
Log the number of records fetched by the SQL query. Fetched 150 rows from Indicator Source Incidents.Open (facts table: incident)
Delete previous results. Deleted previous results 1 for Indicator: Impersonations
Run the map/reduce function to calculate indicator scores. See the table Steps of the map/reduce function
If text indexing is active and has been configured for the indicator source, the data collector stores the resulting text index. Storing Text Index for indicator source Incidents.Open
Bytes used by text index: 41,984 for: Incidents.Open
Objects are offloaded to the HSQL database whenever one of the following thresholds is exceeded:
  • Number of scores
  • Number of snapshots
  • Total number of Count Distinct values

Log the number of times the offload limit was exceeded for each category of object. Each time the limit is exceeded in one category, all objects are moved to HSQL temporary database storage. At the end of the map/reduce function, any remaining records are offloaded. The number of remaining records is usually less than the threshold, and this final offload is also logged.

Number of times offload limit crossed threshold for scores: 275

Number of times offload limit crossed threshold for snapshots: 0

Number of times offload limit crossed threshold for distinct values: 0

Offload batch of records which didn't cross threshold: 1
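The offload behavior described above amounts to a buffered flush: accumulate objects in memory, move the whole buffer to HSQL storage each time a threshold is crossed, and flush any remainder at the end. A minimal Python sketch under those assumptions; the threshold value and the `offload` hook are illustrative, not platform APIs:

```python
# Sketch: buffer objects in memory and flush ("offload") the whole buffer to
# HSQL temporary storage each time the count crosses the threshold; any
# remainder is offloaded once at the end. Names and values are illustrative.
def collect_with_offload(records, threshold, offload):
    buffer, crossings = [], 0
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= threshold:
            offload(buffer)            # all buffered objects move to HSQL storage
            buffer, crossings = [], crossings + 1
    if buffer:                         # final, usually sub-threshold batch
        offload(buffer)
    return crossings                   # logged as "Number of times offload limit crossed"

batches = []
n = collect_with_offload(range(10), threshold=4,
                         offload=lambda b: batches.append(len(b)))
# Two full batches of 4 cross the threshold; the final batch of 2 does not.
```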

Begin storing newly collected results for the indicator source. Storing collected results for indicator source Incidents.Resolved
Any indicator with a null value is logged. Stored 'Value when nil': 0.0 as score for indicator: Number of incidents deflected by knowledge
Finish storing results. Stored collected results
Log any indicator for which the data collector does not collect scores. Not collecting for Indicator: Summed age of open incidents with excluded Breakdown: Assignment Group
Finish collecting data for that indicator source for that period. Collection for 20200519 on Indicator Source Incidents.Open finished
Log the average time spent per record processing additional conditions that were set on indicators. Average time taken to process additional indicator condition for each record: 138 ms

Steps of the map/reduce function

Collection job step Example of resulting log entry
The map/reduce function runs. Applying map/reduce function for indicator source Incidents.Open
Warning if text indexing is active for the job but cannot run because of a missing or invalid text index configuration. Skipping text index collection as there is missing or invalid text index configuration
Process each record from the indicator source to calculate scores. Log messages track the running total of processed records.

Each time the number of resulting scores or snapshots exceeds a limit, all objects are offloaded to the HSQL database. The offloading is logged after the map/reduce function completes.

Processing for each record involves:

  • Execute or evaluate any scripts.
  • Validate indicator conditions.
  • Calculate the indicator score.
  • For each breakdown:
    1. Calculate the breakdown score or execute the breakdown script.
    2. Retrieve all breakdown unique values.
    3. Create or update the array for scores or snapshots. The array is: Indicator, Breakdown 1, Artifact 1, Breakdown 2, Artifact 2, Domain, Value, Array of [sys_id]
Processed 15.8% (3) records
Processed 31.6% (6) records
Processed 47.4% (9) records
Processed 63.2% (12) records
Processed 78.9% (15) records
Processed 94.7% (18) records
Processed 100% (19) records, Result Object size: 20 MB
The map/reduce function completes. Applied map/reduce function
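The per-record processing described above amounts to a map step (emit one entry per indicator/breakdown combination) and a reduce step (aggregate the value and collect sys_ids into the array). A minimal Python sketch, with the key shape following the array layout described above; record fields, values, and the Count aggregation are illustrative assumptions:

```python
# Sketch of the map/reduce score calculation. The key mirrors the array
# layout described above: (indicator, breakdown, artifact, domain).
# Record fields and values are illustrative, not platform data.
from collections import defaultdict

def map_reduce_scores(records, indicator):
    scores = defaultdict(lambda: {"value": 0, "sys_ids": []})
    for rec in records:                       # map: one emit per breakdown element
        key = (indicator, "Assignment Group", rec["assignment_group"], rec["domain"])
        entry = scores[key]                   # reduce: aggregate into the array
        entry["value"] += 1                   # Count aggregation for this sketch
        entry["sys_ids"].append(rec["sys_id"])
    return scores

records = [
    {"sys_id": "a1", "assignment_group": "Network", "domain": "global"},
    {"sys_id": "a2", "assignment_group": "Network", "domain": "global"},
    {"sys_id": "a3", "assignment_group": "Service Desk", "domain": "global"},
]
result = map_reduce_scores(records, "Incidents.Open")
```

Each entry in `result` corresponds to one score or snapshot array; when the number of entries crosses an offload threshold, the whole set is moved to HSQL storage as described earlier.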