Machine Learning Module

Regularization

You can now apply a Regularization type to Linear Regression and Logistic Regression models to help manage overfitting. You can select from the LASSO, Ridge Regression, or Elastic Net types. You can fine-tune regularization with these new fields, which are available in both stages:

Value of alpha
Value of lambda
Search for optimal value of lambda
Stop early
Maximum lambdas to search
Maximum active predictors

For more information, see "Configuring Advanced Options" for Linear Regression or Logistic Regression in the Machine Learning Guide.
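
As a rough illustration of how alpha and lambda typically interact, the sketch below computes an elastic net penalty under the widely used glmnet-style convention. That convention is an assumption here, not something these notes confirm for the stages: alpha = 1 gives a pure LASSO penalty, alpha = 0 gives a pure Ridge penalty, and lambda scales the overall strength. The function name `elastic_net_penalty` is hypothetical. See the Machine Learning Guide for the exact definitions the stages use.

```python
def elastic_net_penalty(coefficients, alpha, lam):
    """Penalty term added to the model loss (glmnet-style convention, assumed)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if lam < 0.0:
        raise ValueError("lambda must be non-negative")
    l1 = sum(abs(b) for b in coefficients)   # LASSO part: sum of absolute values
    l2 = sum(b * b for b in coefficients)    # Ridge part: sum of squares
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


coefs = [0.5, -2.0, 0.0]
lasso = elastic_net_penalty(coefs, alpha=1.0, lam=0.1)   # L1 term only
ridge = elastic_net_penalty(coefs, alpha=0.0, lam=0.1)   # L2 term only
mixed = elastic_net_penalty(coefs, alpha=0.5, lam=0.1)   # blend of both
print(lasso, ridge, mixed)
```

Under this convention, a larger lambda shrinks coefficients more strongly, which is why searching over several lambda values can help find a balance between underfitting and overfitting.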

Model Metrics Port

The new Model Metrics output port lets you attach a data output stage to send model assessment metrics to a data file. This information helps you compare models generated within and outside of Spectrum™ Technology Platform and perform other data processing tasks on the metrics. Alternatively, you can add an inspection point and run inspection on the dataflow to view these metrics in Enterprise Designer. This new output port is available for the following stages:

K-Means Clustering
Linear Regression
Logistic Regression
Principal Component Analysis
Random Forest Classification
Random Forest Regression
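
Once metrics are written to a data file, comparing models can be done with a few lines of scripting. The sketch below assumes the metrics were exported as delimited text. The column names (`model_name`, `rmse`), the sample rows, and the helper `best_model` are all hypothetical and exist only for illustration; the actual fields produced by the port may differ.

```python
import csv
import io


def best_model(rows, metric, higher_is_better=True):
    """Return the row with the best value for the given metric column."""
    scored = [r for r in rows if r.get(metric) not in (None, "")]
    if not scored:
        raise ValueError(f"no rows contain metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: float(r[metric]))


# Made-up sample standing in for an exported metrics file.
# Column names and values are hypothetical.
sample = io.StringIO(
    "model_name,rmse\n"
    "linear_ridge,4.2\n"
    "linear_lasso,3.9\n"
    "external_model,4.0\n"
)
rows = list(csv.DictReader(sample))

# For an error metric such as RMSE, lower is better.
winner = best_model(rows, "rmse", higher_is_better=False)
print(winner["model_name"])
```

To read a real file instead, replace the `io.StringIO` sample with `open(path, newline="")` and adjust the column names to match the file.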

Machine Learning Model Management Updates

The Model Assessment page in Machine Learning Model Management has the following updates in this release:

Model Type—A new field that indicates the type of model that was created by the dataflow.
Creation Time—A new field that indicates the date and time the model was created.
Value of lambda—Linear Regression and Logistic Regression contain this new metric, which controls the amount of regularization applied.
K-Means Clustering—Figures for the following three fields on the Model Summary tab of the Model Detail page now come from training data:
- Within cluster sum of squares
- Total sum of squares
- Between cluster sum of squares
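
The three K-Means figures above are related by a standard identity: the total sum of squares equals the within cluster sum of squares plus the between cluster sum of squares. The sketch below shows the textbook computation on a tiny made-up dataset. It is not the product's implementation, and the function name `sum_of_squares` is hypothetical.

```python
def sum_of_squares(points, labels):
    """Return (within, between, total) sums of squares for labelled points."""
    dims = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand_mean = mean(points)

    # Group points by their assigned cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    within = 0.0
    between = 0.0
    for members in clusters.values():
        centroid = mean(members)
        # Spread of each cluster's points around its own centroid.
        within += sum(sq_dist(p, centroid) for p in members)
        # Spread of centroids around the grand mean, weighted by cluster size.
        between += len(members) * sq_dist(centroid, grand_mean)

    # Spread of all points around the grand mean.
    total = sum(sq_dist(p, grand_mean) for p in points)
    return within, between, total


# Illustrative data only: two clusters of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
labels = [0, 0, 1, 1]
within, between, total = sum_of_squares(points, labels)
print(within, between, total)
```

A high between cluster value relative to the total suggests well-separated clusters on the data the figures were computed from, which in this release is the training data.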