--- title: "How to use FactorAssumptions" author: "Jose Eduardo Storopoli" date: "7/16/2019" references: - id: hair2018 title: Multivariate data analysis author: - family: Hair given: Joseph F. - family: Black given: William C. - family: Babin given: Barry J. - family: Anderson given: Rolph E. edition: 8th ed. publisher: Cengage Learning type: book issued: year: 2018 output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{How to use FactorAssumptions} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{FactorAssumptions} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` # FactorAssumptions Set of Assumptions for Factor and Principal Component Analysis Description:Tests for Kaiser-Meyer-Olkin (KMO) and communalities in a dataset. It provides a final sample by removing variables in a iterable manner while keeping account of the variables that were removed in each step. ## What is KMO and Communalities? *Factor Analysis* and *Principal Components Analysis* (PCA) have some precautions and assumptions to be observed (@hair2018). The first one is the KMO (Kaiser-Meyer-Olkin) measure, which measures the proportion of variance among the variables that can be derived from the common variance, also called systematic variance. KMO is computed between 0 and 1. Low values (close to 0) indicate that there are large partial correlations in comparison to the sum of the correlations, that is, there is a predominance of correlations of the variables that are problematic for the factorial/principal component analysis. @hair2018 suggest that individual KMOs smaller than 0.5 be removed from the factorial/principal component analysis. Consequently, this removal causes the overall KMO of the remaining variables of the factor/principal component analysis to be greater than 0.5. The second assumption of a valid factor or PCA analysis is the communality of the rotated variables. The commonalities indicate the common variance shared by factors/components with certain variables. Greater communality indicated that a greater amount of variance in the variable was extracted by the factorial/principal component solution. For a better measurement of factorial/principal component analysis, communalities should be 0.5 or greater (@hair2018). ## Loading an example dataset First we will load an example dataset `bfi` from `psych` and load the package `FactorAssumptions` ```{r bfi, message=FALSE} library(FactorAssumptions, quietly = T, verbose = F) bfi_data <- bfi #Remove rows with missing values and keep only complete cases bfi_data <- bfi_data[complete.cases(bfi_data),] head(bfi_data) ``` ## Performing the KMO Assumptions First we will perform the $KMO > 0.5 assumption$ for all individuals variables in the dataset with the `kmo_optimal_solution` function ```{r KMO} kmo_bfi <- kmo_optimal_solution(bfi_data, squared = FALSE) ``` Note that the `kmo_optimal_solution` outputs a list: 1. the final solution as `df` 2. removed variables with $invidual KMO < 0.5$ as `removed` 3. Anti-image covariance matrix as `AIS` 4. Anti-image correlation matrix as `AIR` In our case none of the variables were removed due to low individual KMO values ```{r removed_kmo} kmo_bfi$removed ``` ## Performing the Communalities Assumptions The parallel analysis of `bfi` data suggests seven factors we will then perform the assumptions for all $individual communalities > 0.5$ with the argument `nfactors` set to 7. We can use either the values `principal` or `fa` functions from `psych` package for argument `type` as desired: * `principal` will perform a *Principal Component Analysis* (PCA) * `fa` will perform a *Factor Analysis* *Note*: we are using the `df` generated from the `kmo_optimal_solution` function *Note 2*: the default of rotation employed by the `communalities_optimal_solution` is `varimax`. You can change if you want. ```{r communalities} comm_bfi <- communalities_optimal_solution(kmo_bfi$df, type = "principal", nfactors = 7, squared = FALSE) ``` Note that the `communalities_optimal_solution` outputs a list: 1. the final solution as `df` 2. removed variables with $invidual communalities < 0.5$ as `removed` 3. A table with the communalities loadings from the variables final iteration as `loadings` 4. Results of the final iteration of either the `principal` or `fa` functions from `psych` package as `results` In our case 3 variables were removed in an iterable fashion due to low individual communality values. And they are listed from the lowest communality that were removed until rendered an optimal solution. ```{r removed_comm} comm_bfi$removed ``` And finally we arrive at our final principal components analysis rotated matrix. You can export it as a CSV with `write.csv` or `write.csv2` ```{r final_solution} comm_bfi$results ``` # References