# Principal Component Analysis (PCA)

## PCA model

Principal component analysis (PCA) is often used to reduce the dimensionality of a data set while the reduced data still explain most of the variance in the original data (Guo, Wang & Louie, 2004). Its main function is to convert a number of interrelated variables into a smaller set of uncorrelated variables. These new variables, called principal components (PCs), are linear combinations of the original variables (Jackson, 2005). The PCA model can be expressed as (Miller et al., 2002):

$$Z_{ik} = \sum_{j=1}^{p} G_{ij} H_{jk}, \qquad i = 1, 2, \ldots, m; \quad k = 1, 2, \ldots, n \tag{1}$$

where $G_{ij}$ is the correlation of compound $i$ with factor $j$, and $H_{jk}$ is the relative impact of the $j$th factor for the total contaminant of the $k$th…
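Equation (1) can be checked numerically. The following is a minimal sketch, not the cited authors' procedure: it assumes that $k$ indexes samples (the truncated definition does not confirm this), uses synthetic data, and builds $G$ and $H$ from the eigendecomposition of the correlation matrix. The sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 50  # m compounds (variables), n samples: illustrative sizes only

# Synthetic data with compounds in rows and samples in columns (not from the source).
raw = rng.normal(size=(m, n))
raw[1] += 0.8 * raw[0]  # make two compounds correlated

# Standardize each compound to zero mean and unit variance.
Z = (raw - raw.mean(axis=1, keepdims=True)) / raw.std(axis=1, ddof=1, keepdims=True)

# Correlation matrix of the compounds and its eigendecomposition,
# sorted so the largest eigenvalue comes first.
R = Z @ Z.T / (n - 1)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# G[i, j]: correlation of compound i with factor j (loadings).
G = eigvecs * np.sqrt(eigvals)
# H[j, k]: unit-variance score of factor j on sample k (assumed meaning of H).
H = (eigvecs / np.sqrt(eigvals)).T @ Z

# With all m factors retained, the sum in Equation (1) reproduces Z.
Z_full = G @ H

# With only p < m factors, the same sum gives a reduced approximation.
p = 2
Z_approx = G[:, :p] @ H[:p, :]
rel_error = np.linalg.norm(Z - Z_approx) / np.linalg.norm(Z)
print(f"relative error with p={p} factors: {rel_error:.3f}")
```

Keeping every factor reconstructs the standardized data; truncating the sum at $p$ factors is what yields the dimensionality reduction described above.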
Standardizing the original data before PCA makes it easier to see how the variables influence the analysis (Thurston & Spengler, 1985). Outliers should be excluded before PCA is applied to the standardized data, and a sensitivity analysis should be used to identify which samples to remove. Suspected outliers are removed from the data set one at a time until the PCA result stabilizes, that is, until excluding further samples has very little effect on the results (Swietlicki et al., 1996).

## Outputs

The output of PCA in MATLAB includes the following: “coeff”, “score”, “latent”, “tsquared”, and “explained”. The “coeff” output is a matrix of the principal component coefficients, also called loadings. If the input matrix is n-by-p, then the coefficient matrix is p-by-p. Each column of the coefficient matrix holds the coefficients of one principal component, and the columns are placed in descending order of component variance. The “score” output is a matrix of the principal component scores, which represent the input matrix in the new principal component space. The score output represents the projection of the
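
The outputs named above can be approximated in NumPy. This is a rough sketch under stated assumptions, not MATLAB's implementation: the function name `pca_outputs` is hypothetical, the data are synthetic, the input is assumed to have more rows than columns and full column rank, and the signs of the coefficient columns may differ from MATLAB's.

```python
import numpy as np

def pca_outputs(X):
    """NumPy analogues of coeff, score, latent, tsquared, explained for an n-by-p X.

    Hypothetical helper for illustration; assumes n > p and full column rank.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center each column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coeff = Vt.T                                 # p-by-p, one component per column
    score = Xc @ coeff                           # data expressed in component space
    latent = s**2 / (n - 1)                      # component variances, descending
    tsquared = np.sum(score**2 / latent, axis=1) # Hotelling's T-squared per row
    explained = 100 * latent / latent.sum()      # percent of total variance
    return coeff, score, latent, tsquared, explained

# Synthetic n-by-p input with very different column scales (not from the source).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) * [1.0, 5.0, 0.5, 2.0]

# Standardize the columns first, as the text recommends.
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

coeff, score, latent, tsquared, explained = pca_outputs(Xz)
print(coeff.shape, score.shape)
print(np.round(explained, 1))
```

Because the columns are standardized, the component variances in `latent` sum to the number of variables, and multiplying `score` by the transpose of `coeff` recovers the standardized input.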