Defining Model Properties

  1. Under Primary Stages / Deployed Stages / Machine Learning, click the K-Means Clustering stage and drag it onto the canvas, then place it where you want it in the dataflow and connect it to the other stages. The input stage must be the data source that contains the input variable fields for your model. An output stage is required only if you select the Score input data option on the Basic Options tab, but you may also connect one to capture your output independently of the Machine Learning Model Management tool.
  2. Double-click the K-Means Clustering stage to open the K-Means Clustering Options dialog box.
  3. Enter a Model name if you do not want to use the default name.
  4. Optional: Check the Overwrite box to overwrite the existing model with new data.
  5. Enter the Number of clusters for your model if you do not want to use the default (5).
  6. Optional: Enter a Description of the model.
  7. Click Include for each field whose data you want added to the model.
  8. Use the Model Data Type drop-down to specify whether each input field is used as a numeric, categorical, or datetime field.
  9. Click OK to save the model and configuration, or continue to the next tab.
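
As a rough way to review these settings before entering them, the steps above can be sketched as a small Python record. This is an illustration only, not the product's API or file format: the class name, attribute names, placeholder model name, and the checks in validate() are all assumptions. Only the default of 5 clusters and the three model data types come from the steps.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout: this mirrors the dialog's fields for
# illustration and is not the product's API or configuration format.
ALLOWED_TYPES = {"numeric", "categorical", "datetime"}  # step 8


@dataclass
class KMeansModelProperties:
    model_name: str = "KMeansClustering"  # assumed placeholder, not the real default name
    overwrite: bool = False               # step 4: overwrite an existing model
    number_of_clusters: int = 5           # step 5: default given in the steps
    description: str = ""                 # step 6: optional
    # steps 7 and 8: included field name -> model data type
    included_fields: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of problems; an empty list means no problem was found.

        These checks are assumptions made for this sketch, not documented
        product rules.
        """
        problems = []
        if not self.model_name.strip():
            problems.append("model name is empty")
        if self.number_of_clusters < 1:
            problems.append("number of clusters must be at least 1")
        if not self.included_fields:
            problems.append("no input fields are included")
        for name, data_type in self.included_fields.items():
            if data_type not in ALLOWED_TYPES:
                problems.append(f"field {name!r} has unknown type {data_type!r}")
        return problems


# Example settings with made-up model and field names.
props = KMeansModelProperties(
    model_name="CustomerSegments",
    number_of_clusters=4,
    included_fields={
        "Age": "numeric",
        "Region": "categorical",
        "SignupDate": "datetime",
    },
)
print(props.validate())
```

Writing the choices down this way makes it easier to spot a field with no data type or an unintended cluster count before you click OK.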