Consider applying the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm to your clustering solution. HDBSCAN is available as an alternative to the default clustering algorithm, k-means.

Before you begin

Note: Configuring advanced settings on your ML solutions is optional. If you choose to configure any of these settings, make sure you're well informed regarding the technology you're enabling in the solution, and that your use case benefits from what the technology offers. For more information, see the Dive deeper with Clustering Advanced Parameters article on ServiceNow Community.
  • Create a clustering solution definition or use an existing one.
  • Role required: admin or ml_admin

About this task

You can apply the HDBSCAN algorithm to help the system identify data samples that aren't assigned to any cluster. For example, you can apply HDBSCAN to support Topic Discovery.
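To illustrate what "not assigned to any cluster" means, the following minimal sketch contrasts a k-means-style nearest-centroid assignment, which always picks a cluster, with a density-based pass that labels isolated samples as noise. It uses plain DBSCAN rather than HDBSCAN to stay short, and it is not the Predictive Intelligence implementation. The function names, the sample points, and the `eps` and `min_samples` values are assumptions chosen for illustration.

```python
import math

NOISE = -1  # label for samples that belong to no cluster


def nearest_centroid(point, centroids):
    """k-means-style assignment: every point gets the index of its closest centroid."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN: returns one label per point (0, 1, ... or NOISE)."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Two dense groups of four points each, plus one isolated sample at (5, 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

labels = dbscan(points, eps=1.5, min_samples=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]: the isolated sample stays unassigned.

forced = nearest_centroid((5, 5), [(0.5, 0.5), (10.5, 10.5)])
# forced == 0: a centroid-based assignment places the same sample in a cluster.
```

In a topic-discovery setting, the samples labeled `NOISE` correspond to records that do not fit any discovered topic.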

Predictive Intelligence implements the k-means algorithm by default in its clustering framework. HDBSCAN is similar to the DBSCAN clustering algorithm, except that it works from a minimum cluster size rather than a single fixed distance threshold, which can help deliver more stable and persistent clusters. For a summary of how HDBSCAN works, see this article. For a comparison between DBSCAN and HDBSCAN, see this article and this article.
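One building block that distinguishes HDBSCAN from DBSCAN in the published algorithm is the mutual reachability distance: the distance between two samples is raised to at least the core distance of each sample, which pushes samples in sparse regions farther away from everything else. The sketch below shows only that step, not the full algorithm (which goes on to build and condense a cluster hierarchy), and not the platform's implementation. The function names, the neighbor count `k`, and the convention of excluding the point itself when counting neighbors are assumptions.

```python
import math


def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor (the point itself excluded)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def mutual_reachability(points, a, b, k):
    """max(core distance of a, core distance of b, plain distance between a and b)."""
    return max(
        core_distance(points, a, k),
        core_distance(points, b, k),
        math.dist(points[a], points[b]),
    )


# A dense square of four samples and one sparse sample at (3, 3).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]

dense_pair = mutual_reachability(points, 0, 1, k=2)   # stays at the plain distance, 1.0
sparse_pair = mutual_reachability(points, 3, 4, k=2)  # larger than the plain distance
plain = math.dist(points[3], points[4])               # about 2.83
```

Here the pair inside the dense square keeps its original distance, while the distance from (1, 1) to the sparse sample grows from about 2.83 to about 3.61, because the sparse sample's own core distance dominates.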

Note: Clustering solutions trained with HDBSCAN do not support cluster updates. Updates on these solutions fail and the solutions are not logged in the ml_cluster_detail_table. Use DBSCAN or k-means training methods if you want to enable cluster updates.

Procedure

  1. Navigate to All > Predictive Intelligence > Clustering > Solution Definitions.
    Image: The Predictive Intelligence navigation path to the Clustering Definitions list view, where you create a new clustering solution definition.
  2. Select New.
  3. Complete the clustering solution definition form, or open an existing definition.
    In this example scenario, you create the hdbscan-sf clustering definition as shown in the image below. Configure the fields as follows:
    • Label: hdbscan-sf
    • Word Corpus: incident_wc, or any other word corpus that has incident record data. Starting with the Washington DC release, a word corpus is not needed, so this field does not appear.
    • Table: Incident [incident]
    • Fields: Short description
    • Update Frequency: Do not update
    • Stopwords: Default English Stopwords
    • Training Frequency: Every 30 days
    • Processing Language: English
    Image: An example clustering solution definition to which you apply the HDBSCAN parameter.
  4. Select Submit & Train.
  5. On the Advanced Solution Settings tab in the Related Links section of the trained form, select Solution Parameters from the picker, then select New.
    Image: Selecting the Solution Parameters option to create the parameter.
  6. Create a parameter record.
    1. In the Solution Parameters field, select the search icon.
    2. In the ML Solution Parameters screen, select Use HDBSCAN algo for clustering.
      Image: Creating the parameter record by selecting the search icon, and then selecting the HDBSCAN key's Short Description.
  7. Select Submit.
    The Advanced Solution Setting record appears with the HDBSCAN algorithm applied to the record. The User Inputs field is grayed out because it does not apply to this algorithm.
    Image: The new Solution Parameter record created from the values you just assigned.
  8. Select Submit.
    Result: HDBSCAN is configured for your clustering solution. Its solution parameter appears on the Advanced Solution Settings tab of your clustering solution definition form.
    Image: The HDBSCAN solution parameter listed on the clustering solution definition form after you submit the record.
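As a quick review aid, the example values from step 3 and the cluster-update restriction noted in About this task can be captured in a small checklist. This is a hypothetical sketch only: the dictionary keys mirror the form labels, and `check_definition` is an illustrative helper, not a ServiceNow API. Word Corpus is omitted because the field does not appear from the Washington DC release onward.

```python
# Hypothetical record of the example form values; keys mirror the form labels.
EXAMPLE_DEFINITION = {
    "Label": "hdbscan-sf",
    "Table": "Incident [incident]",
    "Fields": ["Short description"],
    "Update Frequency": "Do not update",
    "Stopwords": "Default English Stopwords",
    "Training Frequency": "Every 30 days",
    "Processing Language": "English",
}


def check_definition(definition, algorithm):
    """Return a list of problems; an empty list means no conflict was found.

    Encodes one rule from this topic: solutions trained with HDBSCAN do not
    support cluster updates, so Update Frequency should be "Do not update".
    """
    problems = []
    if algorithm == "HDBSCAN" and definition.get("Update Frequency") != "Do not update":
        problems.append(
            "HDBSCAN does not support cluster updates; "
            "set Update Frequency to 'Do not update' or train with DBSCAN or k-means."
        )
    return problems


issues = check_definition(EXAMPLE_DEFINITION, "HDBSCAN")
# issues == [] because the example definition already disables updates.
```

A definition that enables updates would produce one problem entry for HDBSCAN and none for k-means or DBSCAN.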