Department of Mathematics

General Motors

Data mining to determine customer satisfaction over time

Proposer/liaison Eleanor M. Feit, GM North American Product Development.

Each year, General Motors surveys thousands of new vehicle owners, asking them to rate their satisfaction with their new vehicles. In addition to rating their overall satisfaction, survey respondents rate specific aspects of their vehicle such as “Braking performance under normal conditions” or “Quietness inside the vehicle.” Once we receive the survey responses, we typically average the ratings for each vehicle to determine whether GM’s customers are more or less satisfied than customers of competitive vehicles. The data collected from these surveys is archived and is readily available going back to 1997.

We would like to mine this data to identify trends in customer satisfaction over time. We would like to be able to answer questions such as “How satisfied do we expect customers of Lexus RX300 to be two years from now? Based on past trends, will customers be more satisfied than they are now?” If we could answer these questions, we would be in a much better position to know how well GM’s luxury sport utility vehicle will need to perform to compete with the RX300.
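As a starting point only, the forecasting question could be framed as extrapolating a straight-line trend through a vehicle's yearly average ratings. The sketch below assumes one mean rating per model year and the 0-4 rating scale; the function names and the choice of ordinary least squares are illustrative assumptions, not a method GM prescribes.

```python
def fit_linear_trend(years, ratings):
    """Ordinary least-squares slope through (year, mean rating) points.

    Returns (slope, mean_year, mean_rating) so that forecasts can be
    computed relative to the centre of the data, which is numerically stable.
    """
    n = len(years)
    if n < 2 or n != len(ratings):
        raise ValueError("need at least two (year, rating) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all observations fall in the same year")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ratings))
    return sxy / sxx, mean_x, mean_y


def forecast_rating(years, ratings, target_year, scale=(0.0, 4.0)):
    """Extrapolate the fitted line to target_year, clipped to the rating scale."""
    slope, mean_x, mean_y = fit_linear_trend(years, ratings)
    raw = mean_y + slope * (target_year - mean_x)
    low, high = scale
    return min(high, max(low, raw))


# Hypothetical yearly averages for one vehicle (not real survey data).
example_years = [1999, 2000, 2001, 2002]
example_ratings = [2.8, 2.9, 3.0, 3.1]
two_years_out = forecast_rating(example_years, example_ratings, 2004)
```

A straight line is the simplest possible model; with only a handful of model years per vehicle, a team would likely want to compare it against alternatives and report the uncertainty of the forecast rather than a single number.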

We would also like to identify significant “step function” improvements in satisfaction that occurred on competitive vehicles in the past. For instance, if we were able to identify that satisfaction with braking on the Civic significantly improved between model years 1999 and 2000, then we could ask engineers to review design changes that occurred in those years and try to identify what about the design made it more satisfying to the customer.
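One simple way to screen for such step changes is to compare the individual ratings from consecutive model years and flag pairs whose means differ by much more than sampling noise would explain. The sketch below uses Welch's t statistic for that comparison; the data layout (a dictionary from model year to a list of ratings), the function names, and the threshold of 3.0 are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(earlier, later):
    """Welch's t statistic for the change in mean rating (later minus earlier)."""
    if len(earlier) < 2 or len(later) < 2:
        raise ValueError("each model year needs at least two ratings")
    std_err = sqrt(variance(earlier) / len(earlier) + variance(later) / len(later))
    if std_err == 0:
        raise ValueError("ratings have zero variance in both years")
    return (mean(later) - mean(earlier)) / std_err


def find_step_changes(ratings_by_year, threshold=3.0):
    """Return (previous_year, year, t) for consecutive years with |t| >= threshold."""
    years = sorted(ratings_by_year)
    steps = []
    for prev, curr in zip(years, years[1:]):
        t = welch_t(ratings_by_year[prev], ratings_by_year[curr])
        if abs(t) >= threshold:
            steps.append((prev, curr, t))
    return steps


# Hypothetical ratings for one attribute of one vehicle (not real survey data).
example = {
    1998: [2, 3, 2, 3, 2, 3],
    1999: [2, 3, 3, 2, 2, 3],
    2000: [4, 4, 3, 4, 4, 3],
}
steps = find_step_changes(example)  # flags only the 1999 -> 2000 pair
```

A positive t indicates an improvement. Because the same screen would run over hundreds of nameplates and many satisfaction areas, a fixed threshold will produce some false alarms by chance, so flagged pairs are better treated as candidates for engineering review than as confirmed changes.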

While both of these problems may sound simple at the outset, data analysis of the automotive market is always problematic because of the tremendous complexity of our products. While there are just over 300 nameplates (e.g. Pontiac Vibe) in the US market, there are numerous variations of each nameplate, with different engines, transmissions, and other features that could significantly impact customer satisfaction. Sometimes the differences between two variants of the same nameplate are more significant than the differences between two competitive nameplates.

This analysis is also potentially affected by demographic skews that may exist in the survey data. For example, we know that men and women who purchase the same vehicle may tend to rate it differently. Women who purchased the PT Cruiser in the second quarter of 2002 rated it 3.03 on average on a scale of 0-4, whereas men rated it 2.71. Given that men and women who purchase the same car rate it differently, perhaps we should adjust the average ratings for vehicles that are purchased predominantly by men or by women. Other adjustments based on age or other demographics may also be needed.
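One candidate adjustment is direct standardization: reweight each group's average rating to a common reference mix of buyers, so that vehicles with different buyer demographics are compared on the same footing. The sketch below uses the PT Cruiser group averages quoted above; the 60/40 buyer mix and the 50/50 reference mix are hypothetical values chosen only to illustrate the arithmetic, and the function name is an assumption.

```python
def weighted_mean_rating(group_means, group_shares):
    """Average of per-group mean ratings, weighted by each group's share.

    Shares need not sum to 1; they are normalised by their total.
    """
    if set(group_means) != set(group_shares):
        raise ValueError("group_means and group_shares must cover the same groups")
    total = sum(group_shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum(group_means[g] * group_shares[g] for g in group_means) / total


# Group averages from the PT Cruiser example in the text.
pt_cruiser_means = {"women": 3.03, "men": 2.71}

# Hypothetical buyer mix for this vehicle (an assumption, not survey data).
observed_mix = {"women": 0.60, "men": 0.40}

# Hypothetical reference mix applied to every vehicle being compared.
reference_mix = {"women": 0.50, "men": 0.50}

raw_average = weighted_mean_rating(pt_cruiser_means, observed_mix)
adjusted_average = weighted_mean_rating(pt_cruiser_means, reference_mix)
```

Under these assumed mixes the raw average is 2.902 and the standardized average is 2.87, so the vehicle's apparent advantage from a female-skewed buyer base shrinks by about 0.03 points. The same function extends to age bands or other demographics by adding groups, provided each group has enough respondents for its mean to be reliable.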

Because the survey data is proprietary, we will supply the team with a masked version. For instance, we may not reveal the names of the specific satisfaction areas. We will also make some hard performance data available to the team, such as acceleration times.

The ideal project deliverable would be a forecasting model that predicts trends in customer satisfaction and a method for sorting through a large number of vehicles to identify significant changes in customer satisfaction over time. GM uses Excel, Access and Minitab as its primary data analysis software, so any delivered analysis tools should be compatible with that software.

(This summary was prepared by Eleanor M. Feit.)
