Hello! I have one question for you. PCA is usually not a validated algorithm, because it is not a supervised learning method. How does PCA use cross-validation, and why?

Validation for PCA is done by projecting new (validation) samples onto a PCA model. The explained (or residual) variance curve can then be evaluated for both calibration and validation samples as different numbers of principal components are included: the calibration curve shows the explained variance for the samples used to build the model, while the validation curve shows the explained variance for new samples projected onto it. This is useful for making a more informed selection of the optimal number of principal components and for assessing the quality of the PCA model, for example, for detecting potential outliers in the calibration set or for checking whether the model is applicable to new samples (the calibration and validation curves should follow a similar trend, and the validation explained variance should ideally not be much lower than that of the calibration set).
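A minimal NumPy sketch of this idea (not Unscrambler itself; the data are synthetic, and the helper name `explained_variance` is made up for illustration): fit loadings on calibration samples only, project the validation samples onto them, and compare the two explained-variance curves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 latent factors observed through 8 variables, plus noise.
mixing = rng.normal(size=(3, 8))
X = rng.normal(size=(60, 3)) @ mixing + 0.1 * rng.normal(size=(60, 8))
X_cal, X_val = X[:40], X[40:]

# Center with the calibration mean only; validation samples are
# projected onto the model, never used to fit it.
mean = X_cal.mean(axis=0)
Xc, Xv = X_cal - mean, X_val - mean

# Loadings come from an SVD of the calibration data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

def explained_variance(Xm, Vt, n_pc):
    """Percent variance explained after projecting onto the first n_pc PCs."""
    P = Vt[:n_pc].T                      # loadings, shape (n_vars, n_pc)
    residual = Xm - Xm @ P @ P.T         # reconstruction residual
    return 100.0 * (1.0 - (residual ** 2).sum() / (Xm ** 2).sum())

for k in range(1, 5):
    print(f"{k} PCs: calibration {explained_variance(Xc, Vt, k):5.1f} %,"
          f" validation {explained_variance(Xv, Vt, k):5.1f} %")
```

With three real factors in the data, both curves should flatten out after three PCs, and the validation curve should track the calibration curve closely.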

Do I understand correctly that validation samples are not used to build the PCA model (neither when computing the residual matrix nor as a stopping criterion in the NIPALS algorithm)? Cross-validation samples do not change the scores plot, right?

We are only talking about PCA used to produce PCs, for example, as input vectors for neural networks. In this case validation does not change the values? It only helps to determine the number of PCs, right?

The idea behind validation is that you use samples not used for building the model to test the model, so in theory validation does not affect the calculation of model parameters, such as scores.

In the case of cross-validation, the calibration samples are divided into groups. Multiple calibration rounds occur: at each round one group is left out of model building and used for testing, and the group left out is alternated from round to round. The values returned by the samples in the round where they served as the test set (be they explained variance, residuals, etc.) are used for validation purposes, i.e. to test and assess the model. The final model is calibrated using all samples. More information is available in the Help menu of Unscrambler: Help -> Contents -> Validation.
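The segmented cross-validation described above can be sketched as follows (plain NumPy, and one naive variant where held-out rows are simply projected onto the training loadings; Unscrambler's exact scheme may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(40, 6))

n_segments, max_pc = 4, 5
segments = np.array_split(np.arange(len(X)), n_segments)
press = np.zeros(max_pc)                 # predicted residual sum of squares per PC count

# One calibration round per segment: fit on the rest, test on the held-out group.
for held_out in segments:
    train = np.setdiff1d(np.arange(len(X)), held_out)
    m = X[train].mean(axis=0)
    _, _, Vt = np.linalg.svd(X[train] - m, full_matrices=False)
    Xv = X[held_out] - m
    for k in range(1, max_pc + 1):
        P = Vt[:k].T
        press[k - 1] += ((Xv - Xv @ P @ P.T) ** 2).sum()

# The final model is calibrated on ALL samples; the per-segment models
# above only produce validation statistics and are then discarded.
mean = X.mean(axis=0)
_, _, Vt_final = np.linalg.svd(X - mean, full_matrices=False)
scores = (X - mean) @ Vt_final.T         # this is what the scores plot shows

print("cross-validated residual variance per #PCs:",
      press / ((X - mean) ** 2).sum())
```

Note that `scores` is computed from the final all-samples model only; the cross-validation loop never feeds back into it.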

Which of the following does the program calculate?
Case 1: the PC values (scores) are averaged over the cross-validation segments (for example, 4 segments), or
Case 2: the PC values come from a PCA of all samples, and the cross-validation results are used only as validation statistics?
That was the question.

## Thread Reply


Which option is correct?

We are only talking about PCA, not PCR or PLS.

Case 2 is correct: the final model (and thus the scores) is calculated from all samples, and the cross-validation results are used only as validation statistics.
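To make Case 2 concrete, here is a small sketch under the same assumptions as above (plain NumPy, synthetic data, and a hypothetical helper `cv_press`): changing the cross-validation segmentation changes only the validation statistics, while the scores of the final all-samples model are computed once and never touched.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(30, 5))
Xc = X - X.mean(axis=0)

# Final model: a single PCA on all calibration samples. These scores are
# what you would pass on, e.g. as neural-network inputs.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T

def cv_press(X, segments, n_pc):
    """Cross-validated residual sum of squares for one segmentation."""
    total = 0.0
    for held_out in segments:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        m = X[train].mean(axis=0)
        _, _, V = np.linalg.svd(X[train] - m, full_matrices=False)
        P = V[:n_pc].T
        Xv = X[held_out] - m
        total += ((Xv - Xv @ P @ P.T) ** 2).sum()
    return total

# Different segmentations yield different validation statistics ...
stat_3 = cv_press(X, np.array_split(np.arange(30), 3), 2)
stat_5 = cv_press(X, np.array_split(np.arange(30), 5), 2)

# ... but neither run touches the final model, so the scores are unchanged.
print(stat_3, stat_5, scores.shape)
```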