Build a collection of words and phrases that functions as the vocabulary the system uses to compare your instance records based on their textual similarity. You can think of the word corpus as a dictionary that you want your machine-learning system to understand.

Before you begin

Role required: admin or ml_admin
Important: In the Washington DC release, models in the classification, clustering, and similarity frameworks use Workflow solutions. These are pre-trained, so a word corpus isn't needed for your new solutions.

After upgrading, your existing solutions with a word corpus become Workflow solutions the next time they are retrained. The Word Corpus field is also removed from the form.

The following information is provided for legacy context.

About this task

The primary purpose of a word corpus is to infer textual data for training your NLU model. To use a word corpus in a solution, you must specify it for training when you define the solution. A trained word corpus can be reused across solutions and capabilities.

You can use a word corpus to help compare similar record text in a table or across multiple tables. A word corpus can also be helpful in other scenarios, such as clustering, where you group similar records together for data analysis, reuse, or review. The items you add to your corpus should be specific to your company and your industry so you can reuse it in other similarity or clustering solutions and apply it to various use cases.
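
To make the dictionary analogy concrete, here is a minimal sketch in plain Python. It is not Predictive Intelligence code and does not reflect how the product builds a corpus internally. It only illustrates why corpus items should reflect your own company and industry: terms that never appear in the corpus text cannot contribute to a comparison. The function names (`tokenize`, `build_vocabulary`, `known_terms`) and the sample strings are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(documents, min_count=1):
    """Return the set of tokens that occur at least min_count times overall."""
    counts = Counter(token for doc in documents for token in tokenize(doc))
    return {token for token, n in counts.items() if n >= min_count}

def known_terms(text, vocabulary):
    """Keep only the tokens of text that the vocabulary contains, in order."""
    return [token for token in tokenize(text) if token in vocabulary]

# Hypothetical sample text standing in for company-specific records.
corpus_text = [
    "VPN client fails to connect after password reset",
    "Reset the VPN client profile to restore connectivity",
]
vocabulary = build_vocabulary(corpus_text)

# Tokens absent from the corpus text (such as "citrix") are dropped here.
print(known_terms("Citrix VPN reset is failing", vocabulary))
```

Raising `min_count` in this sketch keeps only terms that recur across the sample text, which is one simple way to think about favoring vocabulary that is characteristic of your records.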

In this example procedure, you're working on incident records and you want to locate relevant knowledge base (KB) articles that could provide resolutions to those incident cases. Your goal here is to create a word corpus that you can apply to a new similarity solution that compares active incidents to published KB articles.
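
The idea of comparing active incidents to published KB articles can be sketched outside the platform as follows. This is an illustrative assumption, not the algorithm the similarity framework uses: it weights words with a smoothed TF-IDF scheme and ranks articles by cosine similarity. The article identifiers (`KB-A`, `KB-B`), the sample text, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict of token -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    # Document frequency: in how many texts each token appears.
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF keeps every weight positive, even for common tokens.
        vectors.append({
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_articles(incident, articles):
    """Return (score, article id) pairs, most similar article first."""
    ids = list(articles)
    vectors = tfidf_vectors([incident] + [articles[i] for i in ids])
    incident_vec, article_vecs = vectors[0], vectors[1:]
    scored = [(cosine(incident_vec, v), i) for i, v in zip(ids, article_vecs)]
    return sorted(scored, reverse=True)

# Hypothetical incident description and KB article text.
incident = "User cannot connect to VPN after password reset"
articles = {
    "KB-A": "How to reconnect the VPN client after a password reset",
    "KB-B": "Ordering a replacement laptop battery",
}
for score, article_id in rank_articles(incident, articles):
    print(f"{article_id}: {score:.2f}")
```

In this sketch the article that shares vocabulary with the incident ranks first, and an article with no words in common scores zero.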

Procedure

  1. Navigate to All > Predictive Intelligence > Word Corpus.
  2. In the Word Corpus form, click New.
  3. Configure these fields according to the following guidance.
    Field | Description
    Name | A unique title that references the contents of your corpus. For example, in this use case you could enter a name such as Active Incidents and Published KBs, as the name indicates the tables that your corpus will mine to help create your solution.
    Active | Select this check box if you're creating more than one word corpus at a time and you plan to configure their detail components later. Otherwise, leave it blank because you can select it in a later step.
  4. Select Submit.
  5. In the Word Corpus list view, locate your new word corpus and click its Name value to open the record.
  6. In the Word Corpus Contents section, click New.
  7. In the Word Corpus Content form, configure these fields per the following guidance to define a content component for your word corpus.
  8. Select Submit.
  9. In the Word Corpus Contents section, select New.
  10. Configure these fields according to the following guidance to define a second content component for your word corpus.
  11. Select Submit.

    Your two word corpus content components appear on the word corpus form.

    This image shows the two content components you've created for your word corpus.
  12. Select Update.

Result

The completed word corpus you created appears on the word corpus form and is available for use in your similarity and clustering solution definition forms.

When you select Update, the system validates the added components and completes the corpus creation process.

What to do next

Create a solution in the appropriate framework. For more information, see the links in the Related Content panel on this page.