5  Model

Train model

Check out the user manual for more details about this module’s features.
Use the Model >> model module to create a machine learning-based predictive model. Models are useful for identifying how well the data can classify groups (classification) or predict continuous variables (regression). Variable importance and coefficient weights are useful for determining the contribution of each variable to the model's performance.

Methods

Specify the variable to be predicted by using the data menu and selecting regression (continuous numeric variable) or classification (factor or discrete variable). Next, select the variables used to build the model and optionally filter the choices based on previous analyses. The following example shows how to select only significant variables based on statistical analysis.

Use the model menu to specify the modeling algorithm. Use tune model to specify how to tune the model hyperparameters; the auto option creates a grid, of the size set by grid size, from the possible hyperparameter options for the selected algorithm. Select the metric used to evaluate model quality and to choose the optimal hyperparameters. Use the optimize for option to show relevant information for the training or test data (recommended). Note that you can select multiple modeling algorithms, calculate them together, and choose the optimal model among them.
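As a rough illustration of what tuning over a grid involves, the Python sketch below builds a grid and picks the best combination by a metric. It assumes that grid size means the number of candidate values per hyperparameter, which may not match the module's behavior. The hyperparameter names (`alpha`, `depth`), the helpers `build_grid` and `select_best`, and the toy scoring function are all hypothetical and stand in for a real cross-validated score.

```python
from itertools import product

def build_grid(ranges, grid_size):
    """Return every combination of `grid_size` evenly spaced values per hyperparameter.

    `ranges` maps a hyperparameter name to a (low, high) tuple.
    """
    axes = {}
    for name, (low, high) in ranges.items():
        if grid_size == 1:
            axes[name] = [low]
        else:
            step = (high - low) / (grid_size - 1)
            axes[name] = [low + i * step for i in range(grid_size)]
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]

def select_best(grid, score_fn, higher_is_better=True):
    """Score every candidate and return the best (params, score) pair."""
    scored = [(params, score_fn(params)) for params in grid]
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[1])

# Toy metric standing in for a cross-validated score (assumption for illustration).
def toy_score(params):
    return -((params["alpha"] - 0.5) ** 2) - ((params["depth"] - 3) ** 2)

grid = build_grid({"alpha": (0.0, 1.0), "depth": (1, 5)}, grid_size=3)
best_params, best_score = select_best(grid, toy_score)
```

Because the grid grows multiplicatively with each added hyperparameter, a larger grid size increases training time quickly.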

Use the train control menu to specify the proportion of the data used to train and test the model. Optionally, choose to manually define the cross-validation options used during model training and scoring.
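The split and cross-validation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the helper names `train_test_split` and `k_fold`, the 80% training proportion, and the five folds are chosen only for the example.

```python
import random

def train_test_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

def k_fold(indices, k):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# Hold out 20% of the samples for testing, then cross-validate on the rest.
train_idx, test_idx = train_test_split(n_samples=100, train_fraction=0.8)
folds = list(k_fold(train_idx, k=5))
```

Cross-validation here only ever sees the training indices, so the held-out test samples stay untouched until the final scoring.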

Select calculate and view the analysis methods and results.

Explore and plot

The performance plot compares model performance and training time.

The model plot shows the cross-validated model scores for a variety of metrics.

The importance plot ranks all features based on their contribution to the model’s performance.

The confusion (matrix) plot shows predicted vs. actual sample classifications. Use show metric to plot counts or percent of correctly and incorrectly classified samples.
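To make the counts and percent views concrete, here is a small Python sketch of a confusion matrix. The helper names and the sample labels are invented for illustration, and computing percentages within each actual class (row-wise) is an assumption; the module may normalize differently.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Count samples for each (actual, predicted) label pair.

    Rows are actual classes, columns are predicted classes.
    """
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def to_percent(matrix):
    """Convert counts to row percentages (percent of each actual class)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * c / total if total else 0.0 for c in row])
    return result

# Made-up labels for eight samples in two classes.
actual = ["a", "a", "a", "b", "b", "b", "b", "a"]
predicted = ["a", "a", "b", "b", "b", "a", "b", "a"]
counts = confusion_matrix(actual, predicted, labels=["a", "b"])
percent = to_percent(counts)
```

The diagonal entries hold the correctly classified samples; everything off the diagonal is a misclassification.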

The classification plot is used to create a precision vs. recall curve or a receiver operating characteristic (ROC) curve, which is summarized by the area under the curve (AUC).
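The quantities behind these curves can be sketched in Python for a two-class problem. The function names, scores, and the 0.5 threshold are hypothetical; the AUC is computed with the standard pairwise-ranking definition rather than whatever method the module uses internally.

```python
def precision_recall(actual, scores, threshold):
    """Precision and recall for the positive class (label 1) at one threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(actual, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted scores for six samples.
actual = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
precision, recall = precision_recall(actual, scores, threshold=0.5)
auc = roc_auc(actual, scores)
```

Sweeping the threshold over all observed scores and recording each precision and recall pair traces out the precision vs. recall curve.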

Report

Create a report to save all methods and results.

Save

Save the results for later analyses. Use cache models to save specific model results. Unselect the cache models option if you want to create predictive models for different objectives for the same data set.

Feature selection

Use the Model >> select module to carry out feature selection and identify the variables that best predict the variable of interest. Note: first use the Model >> model menu to create a model and initialize the training and test data.
Specify the variable to be predicted by using the data menu and selecting regression (continuous numeric variable) or classification (factor or discrete variable). Next, choose the candidate variables to select from.

Use the optimize menu to select the algorithm used to rank the variables' predictive power. The initial subset defines the minimum and maximum number of variables to pick. The metric specifies how to score each model during the variable selection procedure.

The validate menu can be used to specify the model cross-validation used to calculate the metric for each subset of variables.

Select calculate and view the analysis methods and results.

Explore and plot

The overall plot shows the model performance metric selected in optimize for each subset of iteratively selected variables. Use the best subset function to define the set of variables with the best model performance.
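One way to picture iterative subset scoring and the best-subset choice is the Python sketch below. It assumes a greedy forward selection, which may differ from the ranking algorithm chosen in optimize. The variable names, the `GAINS` table, and the toy scoring function are invented and stand in for a cross-validated metric; `min_size` and `max_size` play the role of the minimum and maximum subset sizes.

```python
def forward_selection(variables, score_fn, max_size):
    """Greedily add the variable that most improves the score; record each subset."""
    selected, history = [], []
    remaining = list(variables)
    while remaining and len(selected) < max_size:
        best_var = max(remaining, key=lambda v: score_fn(selected + [v]))
        selected = selected + [best_var]
        remaining.remove(best_var)
        history.append((list(selected), score_fn(selected)))
    return history

def best_subset(history, min_size=1):
    """Return the highest-scoring (subset, score) with at least `min_size` variables."""
    eligible = [entry for entry in history if len(entry[0]) >= min_size]
    return max(eligible, key=lambda entry: entry[1])

# Toy metric: hypothetical per-variable gains minus a penalty per variable.
GAINS = {"x1": 0.30, "x2": 0.20, "x3": 0.02, "x4": 0.01}

def toy_score(subset):
    return sum(GAINS[v] for v in subset) - 0.05 * len(subset)

history = forward_selection(list(GAINS), toy_score, max_size=4)
subset, score = best_subset(history)
```

The `history` list corresponds to what the overall plot displays: one score per subset size, from which the best-scoring subset is picked.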

The importance plot shows the feature importance of all variables and highlights the optimal subset.

Report

Create a report to save all methods and results.

Save

Save all feature selection results.