Configuring Basic Options
-
Enter the maximum Number of trees in your model.
-
Enter the Maximum depth, the maximum number of levels
you want your model to contain.
-
Enter the Minimum rows—the minimum number of rows (or
records) you want your model to contain.
-
Enter the Number of bins numeric, the number of bins you
want the histogram to build for numeric columns before splitting at the best point.
-
Enter the Number of bins top level—the minimum number of
bins you want at the root level.
-
Enter the Number of bins categorical, the maximum number
of bins you want the histogram to build for categorical columns before splitting at the best point.
-
Check Sample rate and enter the fraction of the rows
to be used as a sample in each tree. This can be a value from 0.0 to 1.0.
-
Check Column sample rate per tree and enter the column
sampling rate for each tree. This can be a value from 0.0 to 1.0.
-
Check Columns at each level and enter the relative
change of the column sampling rate for every level. Valid values range from 1.0
to the number of selected input predictors. The default is 1.0.
-
Check Score input data to add a column for the model
prediction (score) to the input data.
-
Specify a value between 1 and 100 as the Percentage for training
data when the input data is randomly split into training and test
data samples.
-
Enter 100 minus the value you entered in the previous step as the
Percentage for test data.
-
Check Seed for sampling to ensure that the data is split
into training and test data the same way each time you run the
dataflow. Uncheck this field to get a random split each time you run the
dataflow.
-
Click OK to save the model and configuration or continue
to the next tab.
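
The sampling and split options above can be illustrated with a short Python sketch. It is not the product's implementation. The function names (split_rows, sample_for_tree) and the rounding behavior are assumptions made for illustration only. It shows how a fixed seed makes the training/test split repeatable, how the test percentage is 100 minus the training percentage, and how row and column sampling rates between 0.0 and 1.0 reduce the data each tree sees.

```python
import random


def split_rows(rows, train_pct, seed=None):
    """Randomly split rows into (train, test). train_pct is 1 to 100.

    Hypothetical helper: a fixed seed gives the same split on every run,
    while seed=None gives a different random split each time.
    """
    if not 1 <= train_pct <= 100:
        raise ValueError("train_pct must be between 1 and 100")
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_pct / 100)
    return shuffled[:cut], shuffled[cut:]


def sample_for_tree(rows, columns, sample_rate, col_sample_rate, rng):
    """Pick the rows and columns that a single tree would be built from.

    Hypothetical helper: both rates are fractions from 0.0 to 1.0, and at
    least one row and one column are always kept (an assumption).
    """
    n_rows = max(1, round(len(rows) * sample_rate))
    n_cols = max(1, round(len(columns) * col_sample_rate))
    return rng.sample(rows, n_rows), rng.sample(columns, n_cols)


rows = list(range(100))
train_pct = 70
test_pct = 100 - train_pct  # Percentage for test data

# Same seed, same split on every run.
train, test = split_rows(rows, train_pct, seed=42)
train_again, _ = split_rows(rows, train_pct, seed=42)

# Half of the training rows and half of the columns for one tree.
tree_rows, tree_cols = sample_for_tree(
    train, ["a", "b", "c", "d"], 0.5, 0.5, random.Random(1)
)
```

Passing seed=None to split_rows corresponds to unchecking Seed for sampling: each run then shuffles the rows differently.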