My colleagues and I are reviewing the methodology of our Index, and as part of that we are exploring the use of principal component analysis (PCA) to weight the variables in the Index. We have assembled our dataset, which is an unbalanced panel with country as the panel variable and year (1990-2015) as the time variable.
We want to estimate a number of principal component models to see which principal components are present in the data. For example, in the area of health we want to see which principal components underlie variables such as life expectancy, infant mortality, immunisations and cardiovascular disease rates.
However, can we estimate a principal component model on the pooled panel dataset? Or does the panel structure imply dependence between observations, meaning that we should instead run our PCA on a cross-section of countries (with each variable averaged over time)?
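To make the two options concrete, here is a minimal Python/NumPy sketch on simulated data. The number of countries, the years each country is observed, and the four indicator columns are all hypothetical placeholders, not our actual Index data. The sketch runs a correlation-matrix PCA once on the pooled country-year panel and once on the time-averaged cross-section so the eigenvalues and loadings can be compared side by side.

```python
import numpy as np

def pca_loadings(X):
    """Correlation-matrix PCA: eigenvalues (descending) and loadings (columns = PCs).
    Rows of X are observations, columns are variables."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise each variable
    corr = np.cov(Z, rowvar=False)                     # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def align_signs(loadings):
    """Eigenvector signs are arbitrary; flip each PC so the first variable loads positively."""
    signs = np.sign(loadings[0, :])
    signs[signs == 0] = 1.0
    return loadings * signs

def time_average(X, country_ids):
    """Collapse an unbalanced panel to one row per country (mean over available years)."""
    return np.vstack([X[country_ids == c].mean(axis=0) for c in np.unique(country_ids)])

# Hypothetical simulated panel: 30 countries, each observed for 10 to 26 of the years 1990-2015.
rng = np.random.default_rng(0)
n_countries, n_vars = 30, 4
true_weights = np.array([1.0, -0.9, 0.7, -0.6])  # hypothetical: e.g. life expectancy, infant mortality, ...
rows, ids = [], []
for c in range(n_countries):
    n_years = rng.integers(10, 27)   # unbalanced: number of years differs by country
    level = rng.normal()             # persistent country-level factor (source of within-country correlation)
    for _ in range(n_years):
        rows.append(level * true_weights + rng.normal(scale=0.5, size=n_vars))
        ids.append(c)
X = np.array(rows)
country_ids = np.array(ids)

# Option 1: PCA on the pooled country-year observations.
ev_pooled, load_pooled = pca_loadings(X)
# Option 2: PCA on the time-averaged cross-section of countries.
X_cross = time_average(X, country_ids)
ev_cross, load_cross = pca_loadings(X_cross)

print("pooled:", X.shape[0], "obs, PC1 variance share", round(ev_pooled[0] / ev_pooled.sum(), 2))
print("cross-section:", X_cross.shape[0], "obs, PC1 variance share", round(ev_cross[0] / ev_cross.sum(), 2))
print("PC1 loadings (pooled):       ", np.round(align_signs(load_pooled)[:, 0], 2))
print("PC1 loadings (cross-section):", np.round(align_signs(load_cross)[:, 0], 2))
```

Note that pooling changes the number of rows, not the estimator: both options compute the same eigendecomposition of a sample correlation matrix, so the question is really whether the pooled correlation matrix is the one we want to decompose.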
My fear is that, because observations from the same country will be correlated across years, the model will produce inaccurate principal components and variable loadings. Is this a valid fear?
Using the time dimension of the dataset obviously increases the number of observations, and therefore the number of variables we can include in the model. However, I do not want to gain observations and testable variables this way if it produces inaccurate results.
Any advice you may be able to offer would be very much appreciated. Apologies if any of the above is unclear; I would be happy to provide more information if helpful.