*The Dow Chemical Company*
**Proposed Project for the
MSU Industrial Math Students**

Variable Selection in Time Series Modeling Projects with Large Numbers of Leading Economic Indicators

We have found that certain public indices are good predictors of future economic activity, as measured in certain ways. These indices are usually aggregates of many different economic time series, such as unemployment and economic growth. The underlying series can number in the hundreds or even thousands and can be reported on many different time scales (weekly, monthly, etc.). We would like to analyze these underlying series to determine which are the most relevant for predicting future economic activity.

This is a time series "data mining" problem for which the open literature offers few methodologies. Two approaches are of interest, one unsupervised and one supervised. The unsupervised approach to be examined is variable "reduction," which involves methods such as Similarity (Leonard, Lee (2008)) and potentially traditional PCA and cluster analysis (VARCLUS, SAS Institute (2008)). The supervised approach to be examined is variable "selection," which involves methods such as Similarity or traditional variable selection methods drawn from non-time series data mining best practices (SAS EM, SAS Institute (2008)).

The traditional data mining approach is very time consuming because a "poor man's" approach to modeling time series data has to be adopted: first differences are taken on all X's, then a multitude of lags are taken on the X's, and only then are the traditional data mining variable selection approaches applied. Finding the most appropriate approaches for time series variable reduction and then variable selection is the key deliverable requested herein. Various large time series data sets will be provided as test cases.
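To make the "poor man's" approach concrete, the following is a minimal Python sketch of its three steps: first-difference each X, build several lags of each differenced series, and then screen the resulting candidates against the target. Everything in it is an illustrative assumption rather than part of the proposal: the function names, the synthetic random-walk indicators, the choice of three lags, and in particular the use of absolute Pearson correlation as a simple stand-in for a real data mining variable selection step. The target is assumed to be already stationary.

```python
import math
import random


def first_difference(series):
    """Return x[t] - x[t-1]; the first entry is None (no predecessor)."""
    return [None] + [b - a for a, b in zip(series, series[1:])]


def lag(series, k):
    """Shift a series k steps later so position t holds the value from t - k."""
    return [None] * k + series[: len(series) - k]


def pearson(a, b):
    """Pearson correlation over positions where both series are observed."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    var_u = sum((u - mean_u) ** 2 for u, _ in pairs)
    var_v = sum((v - mean_v) ** 2 for _, v in pairs)
    return cov / math.sqrt(var_u * var_v)


def rank_candidates(indicators, target, max_lag):
    """Score every lagged first difference by |correlation| with the target.

    Correlation screening is an assumed placeholder for the variable
    selection method; it is not the method named in the proposal.
    """
    scores = {}
    for name, series in indicators.items():
        diffed = first_difference(series)
        for k in range(1, max_lag + 1):
            scores[f"{name}_d1_lag{k}"] = abs(pearson(lag(diffed, k), target))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic test case (hypothetical data): four random-walk indicators.
random.seed(0)
n = 300
indicators = {}
for name in ["x0", "x1", "x2", "x3"]:
    level, walk = 0.0, []
    for _ in range(n):
        level += random.gauss(0, 1)
        walk.append(level)
    indicators[name] = walk

# The target is driven by the change in x0 two periods earlier, plus noise.
dx0 = first_difference(indicators["x0"])
target = [None if v is None else v + random.gauss(0, 0.1) for v in lag(dx0, 2)]

ranking = rank_candidates(indicators, target, max_lag=3)
for candidate, score in ranking[:3]:
    print(candidate, round(score, 3))
```

Even in this toy setting, 4 indicators and 3 lags produce 12 candidate variables. With hundreds or thousands of series and a multitude of lags the candidate set grows in proportion, which is why this route is time consuming and why variable reduction ahead of selection is of interest.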