Department of Applied Statistics, Operations Research and
Quality
Polytechnic University of Valencia, Spain
Date:
Wednesday, October 29, 2004.
Time:
3:30pm
Address:
John Hodgins Engineering Building
Room:
326H
NOTE TIME AND LOCATION CHANGE
TITLE:
Multivariate Statistical Process Control with Missing Data
Using Principal Component Analysis
ABSTRACT:
This talk addresses the problem of using future multivariate
observations with missing data to estimate latent variable scores from an
existing PCA model. This is a critical issue in Multivariate Statistical
Process Control (MSPC) schemes where the process is continuously
interrogated based on an underlying PCA model. Several methods for
estimating the scores of new individuals with missing data are presented.
The basis for each method and the expressions for the score estimators,
and the covariance matrices of the estimation errors are developed. These
methods can be seen as different ways to impute values for the missing
variables. The efficiency of the methods is studied through simulations
based on an industrial data set. Missing data increase the
uncertainty associated with the monitoring statistics, which reduces the
capability of the model to monitor new observations against the normal
operating condition (NOC) control limits. The second goal of this talk is
to discuss how to characterise the uncertainty that missing data add to the
statistics employed for process monitoring: residuals, squared prediction
error (SPE), scores, Hotelling's T², contributions of the process variables to
residuals, SPE and scores, and contributions of the individual scores to
Hotelling's T². This added uncertainty provides useful information to
decide the appropriate action to be taken in process monitoring: to use
the estimated monitoring statistics as usual, to try to recover key
unmeasured variables or to shut down the monitoring scheme and wait until
the new observation is available. Several methods are introduced. Several
industrial data sets are used to illustrate the performance of the methods
in diagnosing different situations, identifying the variables that generate
the most uncertainty in each monitoring statistic and the variables responsible
for possible out-of-control events when the new observation has missing
data.
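As a rough illustration of the score estimation problem described in the abstract, the sketch below uses one approach from the missing-data literature, often called projection to the model plane: the loading rows of the measured variables are regressed on the measured values by least squares. It is not necessarily one of the methods presented in the talk, and the function names (fit_pca, estimate_scores, impute) are hypothetical, chosen only for this example.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit a PCA model on complete training data via SVD (illustrative helper)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T  # loadings P: variables x components

def estimate_scores(x_new, mean, P):
    """Least-squares score estimate using only the measured variables.

    Missing entries of x_new are marked with NaN (an assumption of this sketch).
    """
    observed = ~np.isnan(x_new)
    P_obs = P[observed]                      # loading rows of measured variables
    x_obs = x_new[observed] - mean[observed]
    t_hat, *_ = np.linalg.lstsq(P_obs, x_obs, rcond=None)
    return t_hat

def impute(x_new, mean, P, t_hat):
    """Fill the missing entries with the values implied by the estimated scores."""
    x_filled = x_new.copy()
    missing = np.isnan(x_new)
    x_filled[missing] = mean[missing] + P[missing] @ t_hat
    return x_filled

# Usage with synthetic data (not the industrial data sets from the talk).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5)) + 3.0
mean, P = fit_pca(X_train, n_components=2)
x_new = X_train[0].copy()
x_new[1] = np.nan                            # pretend the second sensor failed
t_hat = estimate_scores(x_new, mean, P)
x_completed = impute(x_new, mean, P, t_hat)
```

The estimate becomes unreliable when the loading rows of the measured variables are nearly collinear, which is one way to see why missing data inflate the uncertainty of the monitoring statistics.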
About the Speaker
Dr. Alberto Ferrer completed his undergraduate and graduate studies at the
Polytechnic University of Valencia, Spain, where he is currently a
Full Professor in the Department of Applied Statistics, Operations Research
and Quality. He teaches undergraduate and graduate courses in applied
statistics. His areas of research include integration of statistical and
engineering process control, experimental designs and dispersion effects,
and multivariate statistical process control. Dr. Ferrer has extensive
consulting experience with a variety of Spanish companies from different
industrial sectors (parts and process industries). He has been an active participant in the
foreign educational cooperation projects of the Government of Spain with
Latin America; this has taken him to Nicaragua, El Salvador, Mexico and Peru,
where he has trained statisticians and engineers. Dr. Ferrer has given talks
at several international conferences as well as at conferences in Spain.
References
Arteaga, F.; Ferrer, A. (2002): Dealing with missing data in MSPC:
several methods, different interpretations, some examples. Journal
of Chemometrics 16, 408-418.
Nelson, P.P.C. (2002): Treatment of missing measurements in PCA and
PLS models, M. Eng. Thesis. Department of Chemical Engineering, McMaster
University. Hamilton, Ontario, Canada.
Nelson, P.P.C.; Taylor, P.A.; MacGregor, J.F. (1996):
Missing data methods in PCA and PLS: Score
calculations with incomplete observations. Chemometrics and Intelligent
Laboratory Systems 35, 45-65.