Group similar records into clusters so you can address them collectively or identify patterns.

Before you begin

Role required: ml_admin or admin
Important: In the Yokohama release, models in the classification, clustering, and similarity frameworks use Workflow solutions. These are pre-trained, so a word corpus isn't needed for your new solutions. When your existing solutions with a word corpus are re-trained after upgrading, they become Workflow solutions, and the Word Corpus field is removed from the form.

About this task

In this example procedure, you're creating a solution to identify a major incident by grouping similar incidents that have occurred recently.

For information on using encrypted training data, see Data Encryption in Predictive Intelligence.

Procedure

  1. Ensure that you are in the application scope that you want for your solution definition, then navigate to All > Predictive Intelligence > Clustering > Solution Definitions.
  2. On the Clustering Definitions list, select New.
  3. On the Clustering Definition form, configure the fields according to the following guidance.
  4. Select the appropriate context menu option or button for your solution definition.
    Option | Description
    Save or Save & Train | Save your solution definition record so you can return to it later, or save it and submit it for training.
    Submit or Submit & Train | Create your solution definition record and submit it, or submit it and train it.
  5. If you submitted the solution for training, select OK on the Training Activation window to confirm.

Result

The system trains the solution and notifies you in real time when the training completes.

A treemap plot appears on the Cluster Visualization tab of your Clustering Solution Definition form. The plot shows the clusters the system formed for your solution in descending order, from the top-left corner to the bottom-right corner. Each treemap node is labeled with its Cluster Concept, which is built from the top words in the cluster and shows you the most prominent content in that cluster.
Note: The Cluster Concept displays the top words from the processed input data, in the data's language. Depending on the language, the Cluster Concept may show words in their root form, so some words can appear truncated.
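The idea of labeling a cluster by its top words can be illustrated with a short Python sketch. It assumes simple lowercase whitespace tokenization, a tiny hand-written stop-word list, and raw term counts; it is not how Predictive Intelligence actually builds the Cluster Concept, and the `cluster_concept` function and `STOP_WORDS` set are hypothetical names used only for this example.

```python
from collections import Counter

# Hypothetical stop-word list, for this illustration only.
STOP_WORDS = {"the", "a", "an", "is", "to", "on", "of", "and", "for", "in", "not"}

def cluster_concept(texts, top_n=3):
    """Return the top_n most frequent non-stop-words across a cluster's records.

    Illustrative sketch: lowercase whitespace tokenization and raw counts,
    with no stemming (a real system may reduce words to their root form).
    """
    counts = Counter()
    for text in texts:
        for word in text.lower().split():
            word = word.strip(".,;:!?")  # drop trailing punctuation
            if word and word not in STOP_WORDS:
                counts[word] += 1
    # most_common orders by count; ties keep first-seen order.
    return " ".join(word for word, _ in counts.most_common(top_n))

records = [
    "Email server is down",
    "Cannot send email, server timeout",
    "Email server not responding",
]
print(cluster_concept(records))
```

With these three sample records, "email" and "server" appear in every record, so they lead the label, followed by the first one-off word.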
Each node is colored from red to green depending on the cluster quality for that node. The Select Group filter appears only when you select the Use Group By and Group By fields on your Clustering Definition form. When you point to a cluster, you can see its Groupby value, Cluster Count, and Records in Groupby.
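To make the red-to-green coloring concrete, the following sketch linearly interpolates an RGB color from a quality score. The 0 to 100 quality scale, the linear interpolation, and the `quality_color` helper are assumptions for illustration; the product's actual quality scale and palette may differ.

```python
def quality_color(quality, low=0.0, high=100.0):
    """Map a cluster quality score to a hex color from red (low) to green (high).

    Assumes a 0-100 quality scale and linear interpolation (illustrative only).
    """
    # Clamp the score into [low, high], then normalize it to [0, 1].
    q = min(max(quality, low), high)
    t = (q - low) / (high - low)
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"

for score in (10, 50, 95):
    print(score, quality_color(score))
```

Clamping keeps out-of-range scores at the pure red or pure green ends instead of producing invalid color values.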
Figure 1. Cluster visualization example

To open a cluster, click it or select it from the Show All Groups filter.

Inside a cluster grouping, you can filter the results further with the two slide bars for cluster size and cluster quality. To navigate back up a level, click the Back button, which appears only when a clustering hierarchy is present. When you point to a cluster at this level, its Purity field percentile values appear along with its Cluster Concept, Quality, and Size values.
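Conceptually, the two slide bars act as inclusive range filters on cluster size and cluster quality. The sketch below models that behavior in Python; the `Cluster` record, its field names, the sample values, and the `filter_clusters` helper are all hypothetical and do not correspond to actual Predictive Intelligence tables or fields.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    # Hypothetical record shape; field names are illustrative, not product fields.
    concept: str
    size: int
    quality: float

def filter_clusters(clusters, size_range, quality_range):
    """Keep clusters whose size and quality both fall within inclusive ranges,
    mimicking the size and quality slide bars. Input order is preserved."""
    min_size, max_size = size_range
    min_q, max_q = quality_range
    return [
        c for c in clusters
        if min_size <= c.size <= max_size and min_q <= c.quality <= max_q
    ]

clusters = [
    Cluster("email server down", 42, 88.0),
    Cluster("vpn login fail", 7, 91.5),
    Cluster("printer jam", 15, 40.0),
]
print([c.concept for c in filter_clusters(clusters, (10, 100), (50, 100))])
```

Here only the first cluster passes both filters: the VPN cluster is too small and the printer cluster's quality is too low.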
Figure 2. Cluster group example
When you click a cluster node, its ML cluster details appear in a list view format.
Figure 3. Cluster details page
The ML Cluster Detail list shows the record IDs for the clustering solution, along with the Groupby node, ML solution name, and ranking percentage for each record.

What to do next

  • Review the solution output on the Solution Statistics tab of your solution. If you aren't satisfied with the clustering results, adjust the values in your solution definition and retrain the solution until the results meet your needs.

  • Review the Cluster Summary tab for a list view of the cluster IDs, quality, size, and Groupby values.
    Figure 4. Cluster Summary example
    The Cluster Summary table lists all clusters by cluster ID, quality, size, and Groupby value.
  • On the Cluster Updates tab, review the summary of changes to clusters for each cluster update interval you configured in the solution definition.
    Figure 5. Cluster updates example
    The Cluster Updates tab shows how many records were reassigned to other clusters, how many were newly created, how many were assigned to new clusters, and the total number of records clustered.