Use the PCA method to calculate the PCA model and the cross-validation method to evaluate the model's ability to reconstruct the original data. Calculate the model and view the analysis methods and results.
Use the cumulative scree plot to view the cumulative (summed) variance explained by the selected components. Note that the first two principal components, which define the principal plane, capture the major modes of variance in the data. However, to reconstruct the full data well, retain enough components to reproduce at least 80% of the variance (as indicated by the eigenvalues or the cross-validated Q2).
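The cumulative variance shown in the scree plot and a cross-validated Q2 can both be sketched in a few lines of NumPy. This is a minimal sketch, not the tool used in this tutorial. The function names are hypothetical. The data are assumed to be a samples × variables array that only needs mean-centering; add scaling if your workflow autoscales. Q2 is computed with an element-wise leave-one-sample-out scheme, in which each held-out value is predicted from that sample's other variables. The tutorial's cross-validation method may use a different scheme, so the numbers need not match.

```python
import numpy as np


def cumulative_variance(X):
    """Cumulative fraction of total variance explained by each successive component."""
    Xc = X - X.mean(axis=0)                    # mean-center each variable
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    eig = s ** 2                               # proportional to the covariance eigenvalues
    return np.cumsum(eig) / eig.sum()


def n_components_for(X, threshold=0.8):
    """Smallest number of components whose cumulative variance reaches `threshold`."""
    cum = cumulative_variance(X)
    return min(int(np.searchsorted(cum, threshold)) + 1, len(cum))


def q2_ekf(X, k):
    """Cross-validated Q2 for a k-component PCA (leave one sample out, predict element-wise)."""
    n, p = X.shape
    press, ss = 0.0, 0.0
    for i in range(n):
        train = np.delete(X, i, axis=0)
        mean = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        P = Vt[:k].T                           # loadings fitted without sample i
        x = X[i] - mean                        # held-out sample, centered with the training mean
        for j in range(p):
            keep = np.arange(p) != j
            # Estimate the scores from every variable except j, then predict variable j.
            t, *_ = np.linalg.lstsq(P[keep], x[keep], rcond=None)
            press += (x[j] - P[j] @ t) ** 2
        ss += np.sum(x ** 2)                   # error of the zero-component (mean-only) model
    return 1.0 - press / ss


rng = np.random.default_rng(0)
# Hypothetical data: 30 samples x 6 variables driven by 2 latent factors plus noise.
X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(30, 6))
print(cumulative_variance(X).round(3))
print("components for 80%:", n_components_for(X, 0.8))
print("Q2 for k = 1, 2, 3:", [round(q2_ekf(X, k), 3) for k in (1, 2, 3)])
```

Q2 typically rises steeply while added components capture real structure. It levels off, or drops, once they start fitting noise. This gives a second way to choose the number of components alongside the 80% rule.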
Use the diagnostic plot to get an overview of extreme (leverage) and intermediate (DmodX) sample outliers. Leverage identifies samples that lie far out within the principal plane. DmodX identifies samples that lie far from the principal plane, i.e. samples with a large orthogonal distance to it. Samples with both high leverage and high DmodX may need to be removed to improve the fit of projection methods such as PCA.
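Leverage and DmodX can be sketched as follows. The sketch assumes mean-centered data and one common set of definitions. Leverage is taken as the diagonal of the hat matrix built from the scores. DmodX is taken as each sample's residual standard deviation off the model plane, divided by the pooled residual standard deviation. Software packages differ in normalization factors and cut-offs, and all function names here are hypothetical.

```python
import numpy as np


def pca_fit(X, k):
    """Mean-center X and return (mean, scores T, loadings P) for k components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                  # p x k loadings
    T = Xc @ P                    # n x k scores
    return mean, T, P


def leverage(T):
    """Diagonal of the hat matrix T (T'T)^-1 T': how far out a sample sits within the plane."""
    return np.einsum("ij,jk,ik->i", T, np.linalg.inv(T.T @ T), T)


def dmodx(X, mean, T, P):
    """Residual SD of each sample off the model plane, relative to the pooled residual SD."""
    n, p = X.shape
    k = P.shape[1]
    E = (X - mean) - T @ P.T                              # orthogonal (unexplained) residuals
    s_i = np.sqrt((E ** 2).sum(axis=1) / (p - k))
    s0 = np.sqrt((E ** 2).sum() / ((n - k - 1) * (p - k)))
    return s_i / s0


rng = np.random.default_rng(1)
W = rng.normal(size=(2, 8))                               # hypothetical loadings of 2 latent factors
X = rng.normal(size=(40, 2)) @ W + 0.05 * rng.normal(size=(40, 8))
X[0] = np.array([6.0, 6.0]) @ W                           # far out *within* the plane
X[1] += 2.0 * rng.normal(size=8)                          # far *off* the plane

mean, T, P = pca_fit(X, k=2)
h = leverage(T)
d = dmodx(X, mean, T, P)
print("highest leverage:", int(h.argmax()), " highest DmodX:", int(d.argmax()))
```

In the synthetic data, sample 0 is placed far out along the model plane and sample 1 is pushed off it. The first should therefore stand out on leverage and the second on DmodX.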
Use the loadings plot to view variable contributions to the sample scores. Use the labels to specify variable names and text size. Variables with large loadings on the x-axis or y-axis contribute most to defining sample scores on the x-axis and y-axis, respectively. For example, the loadings plot suggests that samples whose scores lie further left on the x-axis contain larger amounts of 1,5-anhydroglucitol. Variable loadings can also be colored based on variable metadata. For example, we can view which variables showed significant differences between classes based on the statistical test (shown in green).
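Each score is a weighted sum of the centered variables, score = (x - mean) · loading. A variable with a large negative loading on PC1 therefore pushes samples that contain more of it to the left. The sketch below ranks variables by absolute loading and assigns plot colors from a significance flag. The variable names, p-values and the 0.05 cut-off are hypothetical placeholders, not outputs of the tutorial's statistical test.

```python
import numpy as np


def loadings(X, k=2):
    """Loadings (p x k) of a k-component PCA on mean-centered X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def top_contributors(P, names, component, n=3):
    """Variables with the largest absolute loading on one component, with signed loadings."""
    order = np.argsort(-np.abs(P[:, component]))[:n]
    return [(names[i], float(P[i, component])) for i in order]


rng = np.random.default_rng(3)
X = rng.normal(size=(20, 5))
X[:, 0] *= 10.0                                       # variable 0 dominates the variance, hence PC1
names = [f"var{j}" for j in range(5)]                 # hypothetical variable names
pvals = np.array([0.001, 0.40, 0.03, 0.20, 0.75])     # hypothetical test results
colors = np.where(pvals < 0.05, "green", "grey")      # significant variables drawn in green

P = loadings(X, k=2)
print(top_contributors(P, names, component=0))
print(dict(zip(names, colors.tolist())))
```

The sign of each component is arbitrary. Flipping a column of loadings together with the matching column of scores gives the same model. "Left" and "right" therefore only have meaning relative to the particular plot being read.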
The biplot overlays sample scores and variable loadings. This plot gives an overview of variable abundance among samples as encoded by the PCA. For example, diabetic samples generally have high amounts of 1,5-anhydroglucitol (the reverse can be said of non-diabetic samples), and diabetic samples have high amounts of 2-hydroxybutanoic acid (the reverse can be said of non-diabetic samples).
Scores, sample leverage and DmodX.
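Reading a biplot rests on the fact that, with k components, the centered data are approximated by the product of scores and loadings, X - mean ≈ T Pᵀ. The entry t_i · p_j is therefore the model's estimate of how far sample i lies above (positive) or below (negative) the mean of variable j. This is what a sample lying in the direction of a loading arrow indicates. The dot product does not depend on how a given plot splits the singular values between scores and loadings. The following is a minimal NumPy sketch with hypothetical two-group data; the function names are illustrative only.

```python
import numpy as np


def biplot_coords(X, k=2):
    """Scores T, loadings P and variable means for a k-component PCA of X."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    T = U[:, :k] * s[:k]   # sample scores (points in the biplot)
    P = Vt[:k].T           # variable loadings (arrows in the biplot)
    return T, P, mean


def encoded_deviation(T, P):
    """(T @ P.T)[i, j] = t_i . p_j: model estimate of sample i's deviation from variable j's mean."""
    return T @ P.T


rng = np.random.default_rng(2)
# Hypothetical groups: group "b" is higher in variable 0 and lower in variable 1.
a = rng.normal(size=(15, 4))
b = rng.normal(size=(15, 4)) + np.array([3.0, -3.0, 0.0, 0.0])
X = np.vstack([a, b])

T, P, mean = biplot_coords(X, k=2)
D = encoded_deviation(T, P)
print("group a, var0/var1:", D[:15, :2].mean(axis=0).round(2))
print("group b, var0/var1:", D[15:, :2].mean(axis=0).round(2))
```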