4 Multivariate

Check out the user manual for more details about this module’s features.

PCA

Use the Multivariate >> PCA module to get an overview of the major modes of variance in the data.

Methods

Specify the number of components (components summarize trends across all variables), the scaling method used to transform the columns, the decomposition `method` used to calculate the PCA, and the `cross-validation` method used to evaluate the model's ability to reconstruct the original data.
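To make these choices concrete, the sketch below shows one common combination: unit-variance scaling of the columns followed by PCA through singular value decomposition. It is an illustration only and not the module's own code. The function names `autoscale` and `pca_svd`, the random data, and the choice of two components are assumptions made for the example, and cross-validation is left out.

```python
import numpy as np

def autoscale(X):
    """Center each column to mean 0 and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - mean) / std

def pca_svd(X, n_components):
    """PCA by SVD of an already centered matrix.

    Returns sample scores, variable loadings, and the variance
    (eigenvalue) of each retained component.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    variance = s**2 / (X.shape[0] - 1)
    return scores, loadings, variance[:n_components]

# Made-up data: 20 samples (rows) and 5 variables (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

Xs = autoscale(X)
scores, loadings, variance = pca_svd(Xs, n_components=2)
```

Here `scores` has one row per sample and `loadings` one row per variable, which matches the two kinds of plots described later in this section.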

Select `calculate` and view the analysis methods and results.

Explore and plot

Create a `cumulative screeplot` to view the cumulative (sum) variance explained for the selected components. Note that the first two principal components, which define the principal plane, explain the major modes of variance in the data. However, to better reconstruct the full data, select enough components to reproduce at least 80% of the variance (as denoted by the eigenvalues or cross-validated q2).
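The 80% guideline can be checked numerically. The sketch below computes the cumulative fraction of variance from the eigenvalues and reports the smallest number of components that reaches a threshold. The helper names and the random data are assumptions for illustration, and the calculation uses eigenvalues only, not the cross-validated q2.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Cumulative fraction of total variance explained by successive components."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    eigenvalues = s**2 / (X.shape[0] - 1)
    return np.cumsum(eigenvalues) / eigenvalues.sum()

def components_for_threshold(cumulative, threshold=0.80):
    """Smallest number of components whose cumulative variance reaches the threshold."""
    return int(np.searchsorted(cumulative, threshold) + 1)

# Made-up data: 30 samples and 8 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))

cumulative = cumulative_explained_variance(X)
n_needed = components_for_threshold(cumulative, threshold=0.80)
```

Plotting `cumulative` against the component number gives the same kind of curve as the cumulative scree plot.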

Create a plot of sample (row) `scores` to view how the samples are distributed along the selected components.

Create a `diagnostic` plot to get an overview of extreme (leverage) and intermediate (DmodX) sample outliers. Leverage identifies samples that are outliers within the principal plane, while DmodX identifies samples that lie far from the principal plane in the direction orthogonal to it. Samples with both high leverage and high DmodX may need to be removed to improve the model fit of projection methods such as PCA.
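As a rough illustration of the two diagnostics, the sketch below computes a leverage value and a residual distance for each sample. It uses simplified textbook definitions: leverage is the sum of squared scores, each scaled by its component's total sum of squares, and the residual distance is the root mean squared residual of each row. The DmodX reported by the module may be normalized differently, so treat the function `pca_diagnostics`, the data, and the planted outlier as assumptions for the example.

```python
import numpy as np

def pca_diagnostics(X, n_components):
    """Return per-sample leverage and a simplified DmodX-like residual distance."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # sample scores
    P = Vt[:n_components].T                      # variable loadings

    # Leverage: distance from the model center inside the principal plane.
    leverage = np.sum(T**2 / np.sum(T**2, axis=0), axis=1)

    # Residual distance: what is left of each row after projecting
    # it onto the retained components.
    E = Xc - T @ P.T
    dof = X.shape[1] - n_components
    residual_distance = np.sqrt(np.sum(E**2, axis=1) / dof)
    return leverage, residual_distance

# Made-up data: 30 samples and 6 variables, with sample 0 shifted
# far away from the rest so that it acts as a planted outlier.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 6))
X[0] += 10.0

leverage, residual_distance = pca_diagnostics(X, n_components=2)
most_extreme = int(np.argmax(leverage))
```

Plotting `leverage` against `residual_distance` gives a diagnostic view similar in spirit to the one described above, with suspect samples toward the upper right.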

Create a `loadings` plot to view variable contributions to sample `scores`. Use the labels to specify variable names and text size. Variables with large loadings on the x-axis or y-axis have the largest contribution to defining sample scores on the x-axis and y-axis, respectively. For example, the following plot suggests that as sample scores move left on the x-axis they contain larger amounts of 1,5-anhydroglucitol. Variable loadings can be colored based on variable meta data. For example, we can view which variables displayed significant differences for `class` based on the statistical test (shown in green).
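The idea that large loadings mark the most influential variables can also be expressed as a simple ranking. The sketch below sorts variables by the absolute size of their loading on one component. The function `top_loadings`, the variable names, and the loading values are all made up for illustration.

```python
import numpy as np

def top_loadings(loadings, names, component=0, n=3):
    """Return the n variables with the largest absolute loading on a component."""
    order = np.argsort(-np.abs(loadings[:, component]))[:n]
    return [(names[i], float(loadings[i, component])) for i in order]

# Hypothetical loadings: 3 variables (rows) on 2 components (columns).
names = ["var_a", "var_b", "var_c"]
loadings = np.array([
    [0.1, 0.9],
    [-0.8, 0.1],
    [0.5, -0.4],
])

x_axis_drivers = top_loadings(loadings, names, component=0, n=2)
y_axis_drivers = top_loadings(loadings, names, component=1, n=2)
```

The sign of each loading is kept in the result because it indicates the direction in which higher values of the variable move a sample along that axis.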

Create a `biplot`, which overlays sample `scores` and variable `loadings`. This plot can be used to get an overview of the variable abundance among samples as encoded by the PCA. For example, diabetic samples generally have high amounts of 1,5-anhydroglucitol (the reverse can be said of non-diabetic samples) and diabetic samples have high amounts of 2-hydroxybutanoic acid (the reverse can be said of non-diabetic samples).

Report

Create a report to save methods and results.

Save

Save the results for later analyses. This will save the principal component `scores`, `sample leverage` and `DmodX`.