Configuring Basic Options

Enter the maximum Number of trees in your model.

Enter the Maximum depth, which is the maximum number of levels you want each tree in your model to contain.

Enter the Minimum rows, which is the minimum number of rows (or records) required in each leaf node of a tree.

Enter the Number of bins numeric, which is the number of bins the histogram builds for numeric columns before the split is made at the best point.

Enter the Number of bins top level, which is the minimum number of bins to use at the root level.

Enter the Number of bins categorical, which is the maximum number of bins the histogram builds for categorical columns before the split is made at the best point.
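The role of the bin count can be illustrated with a minimal sketch of histogram-based split finding on one numeric column. This is a general illustration and not the product's implementation: the function name `best_histogram_split`, the equal-width binning, and the squared-error criterion are all assumptions made for the example.

```python
def best_histogram_split(values, targets, n_bins):
    """Return the bin edge that minimizes squared error of a two-way split.

    Hypothetical helper: equal-width bins and a squared-error criterion
    are assumptions, not the documented behavior of the product.
    """
    lo, hi = min(values), max(values)
    if lo == hi or n_bins < 2:
        return None  # a constant column or a single bin offers no split
    width = (hi - lo) / n_bins

    # Aggregate count, sum, and sum of squares of the target per bin.
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    sq_sums = [0.0] * n_bins
    for v, t in zip(values, targets):
        b = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[b] += 1
        sums[b] += t
        sq_sums[b] += t * t

    def sse(n, s, q):
        # Sum of squared errors around the mean, computed from aggregates.
        return q - s * s / n if n else 0.0

    total_n, total_s, total_q = sum(counts), sum(sums), sum(sq_sums)
    best_edge, best_cost = None, float("inf")
    left_n, left_s, left_q = 0, 0.0, 0.0
    # Only the n_bins - 1 interior edges are candidate split points.
    for b in range(n_bins - 1):
        left_n += counts[b]
        left_s += sums[b]
        left_q += sq_sums[b]
        right_n = total_n - left_n
        if left_n == 0 or right_n == 0:
            continue  # a split must leave rows on both sides
        cost = sse(left_n, left_s, left_q) + sse(right_n, total_s - left_s, total_q - left_q)
        if cost < best_cost:
            best_cost = cost
            best_edge = lo + (b + 1) * width
    return best_edge


edge = best_histogram_split(
    [1, 2, 3, 4, 5, 6, 7, 8],
    [0, 0, 0, 0, 10, 10, 10, 10],
    n_bins=4,
)
print(edge)
```

In this sketch, a larger bin count produces more candidate split points, which can find a better split at the cost of more computation.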

Check Sample rate and enter the proportion of the rows to be used as a sample in each tree. This can be a value from 0.0 to 1.0.

Check Column sample rate per tree and enter the column sampling rate for each tree. This can be a value from 0.0 to 1.0.

Check Columns at each level and enter the relative change of the column sampling rate for every level. Valid values range from 1.0 to the number of selected input predictors. The default is 1.0.
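As a rough illustration of how the two per-tree rates limit what each tree sees, the following sketch draws a row sample and a column sample for a single tree. The function name `sample_for_tree`, the rounding, sampling without replacement, and keeping at least one row and one column are assumptions made for the example; the product may sample differently. The per-level change set by Columns at each level is not modeled here.

```python
import random


def sample_for_tree(n_rows, columns, sample_rate, col_sample_rate_per_tree, rng):
    """Pick the row indices and column names one tree would train on.

    Hypothetical helper: rounding and the floor of one row and one column
    are assumptions, not documented product behavior.
    """
    for name, rate in (
        ("sample_rate", sample_rate),
        ("col_sample_rate_per_tree", col_sample_rate_per_tree),
    ):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {rate}")

    n_sampled_rows = max(1, round(sample_rate * n_rows))
    n_sampled_cols = max(1, round(col_sample_rate_per_tree * len(columns)))

    # Sample without replacement so no row or column is drawn twice.
    row_ids = sorted(rng.sample(range(n_rows), n_sampled_rows))
    cols = sorted(rng.sample(columns, n_sampled_cols))
    return row_ids, cols


rows, cols = sample_for_tree(
    n_rows=10,
    columns=["age", "income", "region", "tenure"],
    sample_rate=0.8,
    col_sample_rate_per_tree=0.5,
    rng=random.Random(42),
)
print(len(rows), "rows and", len(cols), "columns sampled for this tree")
```

Under these assumptions, a Sample rate of 0.8 on 10 rows gives each tree 8 rows, and a Column sample rate per tree of 0.5 on 4 predictors gives each tree 2 columns.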

Check Score input data to add a column for the model prediction (score) to the input data.

Specify a value between 1 and 100 as the Percentage for training data when the input data is randomly split into training and test data samples.

Enter the value of 100 minus the amount you entered as the Percentage for training data as the Percentage for test data.

Check Seed for sampling to ensure that the data is split into training and test data the same way each time you run the dataflow. Uncheck this field to get a random split each time you run the flow.
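The relationship between the two percentages and the seed can be sketched as follows. This is a minimal illustration using Python's standard `random` module, not the product's splitting routine; the function name `split_train_test` and the shuffle-then-slice approach are assumptions.

```python
import random


def split_train_test(rows, train_pct, seed=None):
    """Randomly split rows into training and test samples.

    Hypothetical helper: a fixed seed reproduces the same split on every
    run, while seed=None gives a different split each time.
    """
    if not 1 <= train_pct <= 100:
        raise ValueError("train_pct must be between 1 and 100")
    test_pct = 100 - train_pct  # the test percentage is the complement

    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n_train = round(len(shuffled) * train_pct / 100)
    return shuffled[:n_train], shuffled[n_train:], test_pct


train_a, test_a, test_pct = split_train_test(range(100), train_pct=80, seed=7)
train_b, test_b, _ = split_train_test(range(100), train_pct=80, seed=7)
print(len(train_a), len(test_a), test_pct)
print(train_a == train_b)  # same seed, same split
```

With 80 as the training percentage, the test percentage is 20, and repeating the call with the same seed returns identical samples.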

Click OK to save the model and configuration or continue to the next tab.