ML060400223
| ML060400223 | |
| Person / Time | |
|---|---|
| Site: | Summer |
| Issue date: | 12/31/2004 |
| From: | Rasmussen B, Shankar R Edan Engineering Corp, Electric Power Research Institute |
| To: | Office of Nuclear Reactor Regulation |
| References | |
| LAR 05-0677, RC-06-0024 1003361 | |
| Download: ML060400223 (172) | |
Text
On-Line Monitoring of Instrument Channel Performance
Volume 1: Guidelines for Model Development and Implementation

WARNING: Please read the License Agreement on the back cover before removing the wrapping material.
On-Line Monitoring of Instrument Channel Performance
Volume 1: Guidelines for Model Development and Implementation
1003361
Final Report, December 2004
EPRI Project Manager: R. Shankar

EPRI
- 3412 Hillview Avenue, Palo Alto, California 94304
- PO Box 10412, Palo Alto, California 94303, USA
- 800.313.3774
- 650.855.2121
- askepri@epri.com
- www.epri.com
DISCLAIMER OF WARRANTIES AND LIMITATION OF LIABILITIES THIS DOCUMENT WAS PREPARED BY THE ORGANIZATION(S) NAMED BELOW AS AN ACCOUNT OF WORK SPONSORED OR COSPONSORED BY THE ELECTRIC POWER RESEARCH INSTITUTE, INC. (EPRI). NEITHER EPRI, ANY MEMBER OF EPRI, ANY COSPONSOR, THE ORGANIZATION(S) BELOW, NOR ANY PERSON ACTING ON BEHALF OF ANY OF THEM:
(A) MAKES ANY WARRANTY OR REPRESENTATION WHATSOEVER, EXPRESS OR IMPLIED, (I)
WITH RESPECT TO THE USE OF ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT, INCLUDING MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR (II) THAT SUCH USE DOES NOT INFRINGE ON OR INTERFERE WITH PRIVATELY OWNED RIGHTS, INCLUDING ANY PARTY'S INTELLECTUAL PROPERTY, OR (III) THAT THIS DOCUMENT IS SUITABLE TO ANY PARTICULAR USER'S CIRCUMSTANCE; OR (B)
ASSUMES RESPONSIBILITY FOR ANY DAMAGES OR OTHER LIABILITY WHATSOEVER (INCLUDING ANY CONSEQUENTIAL DAMAGES, EVEN IF EPRI OR ANY EPRI REPRESENTATIVE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES) RESULTING FROM YOUR SELECTION OR USE OF THIS DOCUMENT OR ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT.
ORGANIZATION(S) THAT PREPARED THIS DOCUMENT
EPRI
Edan Engineering Corp.
ORDERING INFORMATION Requests for copies of this report should be directed to EPRI Orders and Conferences, 1355 Willow Way, Suite 278, Concord, CA 94520, (800) 313-3774, press 2 or internally x5379, (925) 609-9169, (925) 609-1310 (fax).
Electric Power Research Institute and EPRI are registered service marks of the Electric Power Research Institute, Inc. EPRI. ELECTRIFY THE WORLD is a service mark of the Electric Power Research Institute, Inc.
Copyright © 2004 Electric Power Research Institute, Inc. All rights reserved.
CITATIONS

This report was prepared by

Edan Engineering Corporation
900 Washington St., Suite 830
Vancouver, WA 98660
Principal Investigator: E. Davis

EPRI I&C Center
Kingston Fossil Plant
714 Swan Pond Road
Harriman, TN 37748
Principal Investigator: B. Rasmussen

This report describes research sponsored by EPRI.
The report is a corporate document that should be cited in the literature in the following manner:
On-Line Monitoring of Instrument Channel Performance, Volume 1: Guidelines for Model Development and Implementation, EPRI, Palo Alto, CA: 2004. 1003361.
REPORT SUMMARY

Background
The On-Line Monitoring (OLM) Group operates under the Instrumentation and Control (I&C)
Nuclear Program. At present, two programs operate under this group (each separately funded under individual subscription) to address the needs of power plants with regard to instrument monitoring, instrument calibration reduction/extension, and sensor validation: the Instrument Monitoring and Calibration (IMC) Users Group, formed in 2000, and the On-Line Monitoring Implementation Users Group, formed in 2001.
EPRI's strategic role in on-line monitoring is to facilitate its implementation and cost-effective use in numerous applications at power plants. EPRI has sponsored an on-line monitoring implementation project at multiple nuclear plants specifically intended to install and use on-line monitoring technology. The goal is to apply on-line monitoring to all types of power plant applications and to document all aspects of the implementation process in a series of EPRI deliverables. These deliverables will cover installation, modeling, optimization, and proven cost-benefit.
EPRI will continue to foster the development of on-line monitoring technology and its application via the IMC Users Group. Through this group, on-line monitoring as a key technology will continue to be supported technically as its use grows throughout the industry.
The EPRI IMC Users Group will also continue to support generic technical issues (such as providing implementation guidance for calibration reduction of safety-related instrumentation) associated with on-line monitoring.
This report is the first in a three-volume set. On-Line Monitoring of Instrument Channel Performance, Volume 2: Model Examples, Algorithm Details, and Reference Information contains more detailed descriptions of the empirical modeling algorithms, specific examples and results of developed models, and further evaluations of the software used in this project. On-Line Monitoring of Instrument Channel Performance, Volume 3: Applications to Nuclear Power Plant Technical Specification Instrumentation provides an overview of how to extend calibration intervals by the use of on-line monitoring, describes the technical specification changes that are recommended to extend calibration intervals, addresses measurement and estimation uncertainty, provides guidance regarding on-line monitoring acceptance criteria, and addresses software verification and validation criteria for on-line monitoring applied to technical specification-related instruments.
Objectives
- To provide technical information regarding on-line monitoring as a calibration extension and performance-monitoring tool and to address data acquisition, data quantity, and data quality issues related to modeling
- To provide guidance regarding evaluating and responding to identified failures
- To provide an overview of the Multivariate State Estimation Technique (MSET) and the SureSense monitoring software
- To provide technical information describing software installation and setup for on-line monitoring and to address data management and interface issues
- To explain the steps and actions necessary to implement an on-line monitoring system and to list the steps and actions necessary to declare that a model is ready for use

Approach
This report provides detailed information regarding the application of on-line monitoring to nuclear plant instrument systems. MSET is described because it was the basis for the EPRI OLM implementation project from 2000 through 2003. Recent modifications to the SureSense software (supplied by Expert Microsystems, Inc. for use in this project) have introduced an alternative technique, the Expert State Estimation Engine (ESEE), with at least equivalent capabilities. The ESEE model was used for all model development and implementation work in 2004. Issues related to the implementation and use of on-line monitoring systems are presented in this report to enable users to assess plant-specific needs and the limitations of the techniques.
Results
Industry and EPRI experience at several plants has shown on-line monitoring to be very effective in identifying out-of-calibration instrument channels or potential equipment degradation problems. The results have been very encouraging. Substantial progress has been made over the course of this multiyear project.
EPRI Perspective
EPRI's strategic role in on-line monitoring is to facilitate its implementation and use in numerous applications at power plants. On-line monitoring of instrument channels provides increased information about the condition of monitored channels through accurate, more frequent evaluation of each channel's performance over time. This type of performance monitoring offers an alternative to traditional time-directed calibration. EPRI is committed to the development and implementation of on-line monitoring as a tool for extending calibration intervals and evaluating instrument performance.
Keywords
Calibration
Condition monitoring
Instrumentation and control
Maintenance
Nuclear plant operations and maintenance
Signal validation
ACKNOWLEDGEMENTS

EPRI, the EPRI I&C Center, and Edan Engineering recognize the following individuals for their contributions to this project. Their time and attention in support of this project are greatly appreciated.
| Name | Organization |
|---|---|
| Randy Bickford | Expert Microsystems, Inc. |
| David Carroll | South Carolina Electric and Gas |
| Pat Colgan | Exelon Corporation |
| Steve Dixon | Exelon Corporation |
| William Drendall | AmerGen Energy |
| Dave Hooten | Carolina Power and Light |
| Jerry Humphreys | CANUS Corporation |
| Aaron Hussey | EPRI |
| Robert Kennedy | Exelon Corporation |
| Calvin C. King Jr. | Public Service Electric and Gas Co. |
| Vo Lee | Expert Microsystems, Inc. |
| Hubert Ley | Argonne National Laboratory |
| David Lillis | British Energy |
| Edwina Liu | Expert Microsystems, Inc. |
| Connie Love | Tennessee Valley Authority |
| Adrian Miron | Argonne National Laboratory |
| Karl Nesmith | Tennessee Valley Authority |
| Mike Norman | Tennessee Valley Authority |
| Ken Olenginski | Exelon Corporation |
| Steve Orme | British Energy |
| Keith Pierce | Public Service Electric and Gas Co. |
| Jeff Richardson | British Energy |
| Richard Rusaw | South Carolina Electric and Gas |
| Larry Straub | Exelon Corporation |
| Bill Turkett | South Carolina Electric and Gas |
| Tom Wei | Argonne National Laboratory |
| Bill Winters | Exelon Corporation |
| John Yacyshyn | Exelon Corporation |
| Chenggang Yu | Argonne National Laboratory |
| Nela Zavaljevski | Argonne National Laboratory |
| Jack Ziegler | Exelon Corporation |
CONTENTS

1 INTRODUCTION  1-1
  1.1 Report Purpose  1-1
  1.2 Report Applicability  1-2
  1.3 Report Audience  1-3
  1.4 Considerations Before Starting Model Development  1-3
  1.5 On-Line Monitoring Overview  1-4
  1.6 EPRI's Role in On-Line Monitoring  1-5
  1.7 Terminology Used in This Report  1-6
    1.7.1 Channel, Sensor, and Signal  1-6
    1.7.2 Modeling Terms  1-7
2 DATA MANAGEMENT  2-1
  2.1 On-Line Monitoring System Architecture  2-1
    2.1.1 Off-Line Batch Mode Using Historical Data  2-2
    2.1.2 On-Line Batch Mode Using Current Data  2-3
    2.1.3 Real-Time Mode  2-4
  2.2 Data File Naming  2-5
  2.3 Data Storage Format  2-6
  2.4 Data File Configuration Management  2-8
3 OVERVIEW OF THE MULTIVARIATE STATE ESTIMATION TECHNIQUE, SURESENSE ON-LINE MONITORING SOFTWARE, AND MODEL DEVELOPMENT  3-1
  3.1 Overview  3-1
  3.2 MSET Software Functions  3-2
  3.3 Functional Overview and General Capabilities of SureSense  3-3
  3.4 Model Development Overview  3-4
4 ON-LINE MONITORING PROCEDURES  4-1
  4.1 Software Use  4-1
    4.1.1 Procedure  4-1
    4.1.2 Personnel Training  4-1
  4.2 Model Development and Evaluation Procedures  4-1
    4.2.1 Model Documentation  4-2
    4.2.2 Periodic Model Evaluation  4-2
5 SIGNAL SELECTION  5-1
  5.1 Signal Selection - Where to Start  5-1
    5.1.1 Signals as Part of a Model  5-1
    5.1.2 Where to Start in Signal Selection for a Model  5-2
  5.2 Model Size Considerations  5-4
  5.3 The Importance of Correlation  5-4
    5.3.1 Why Correlation Matters  5-4
    5.3.2 Why Correlated Parameters Might Not Appear to Correlate  5-6
    5.3.3 Correlation Equation and Meaning  5-9
    5.3.4 Example of an Uncorrelated Model  5-11
  5.4 Planning for Phases (Submodels)  5-12
    5.4.1 What Is a Submodel?  5-12
    5.4.2 Power Level as a Phase Determiner  5-12
    5.4.3 Choosing a Power Signal  5-13
    5.4.4 Phases and Signal Selection  5-14
6 DATA QUANTITY AND QUALITY  6-1
  6.1 Data Quantity  6-1
    6.1.1 Quantity of Data - Sample Frequency  6-1
    6.1.2 Quantity of Data - How Much Historical Data to Acquire  6-2
  6.2 Dealing with Bad Data  6-3
    6.2.1 The Effect of Bad Data on the Estimation Process  6-3
    6.2.2 Examples of Data Problems  6-5
    6.2.3 Removing Bad Data From Data Sets  6-18
    6.2.4 Data Limit Filters  6-20
  6.3 Data Archive Historian and Its Effect on Data Quality  6-23
    6.3.1 Problem Statement  6-23
    6.3.2 The Effect of a Data Archive Historian on Signal Correlation  6-23
    6.3.3 The Effect of a Data Archive Historian With Bad Data  6-27
    6.3.4 Dealing With a Data Archive Historian  6-28
7 INITIAL TRAINING AND ESTIMATION  7-1
  7.1 Training and Estimation Methods - Technical Overview  7-1
    7.1.1 Training  7-1
    7.1.2 Estimation  7-8
  7.2 How to Train a Model  7-10
    7.2.1 Selecting the Initial Training Data  7-10
    7.2.2 Evaluating the Initial Training Data Adequacy  7-11
    7.2.3 Evaluating Training Adequacy  7-13
    7.2.4 Retraining the Model  7-17
8 FAULT DETECTION AND ALARM RESPONSE  8-1
  8.1 Fault Detection - Technical Overview  8-1
    8.1.1 Background  8-1
    8.1.2 Mean Tests  8-2
    8.1.3 Variance Tests  8-3
    8.1.4 Unique Probability Density Functions  8-3
    8.1.5 Applying Conditional Probability to Failure Declaration  8-4
  8.2 Failure Evaluation  8-4
    8.2.1 Summary of How Failures Are Determined  8-4
    8.2.2 Recommended Response to an Identified Failure  8-4
    8.2.3 Failures Identified Because of Data Acquisition Problems  8-6
    8.2.4 Operation Outside the Training Space  8-8
    8.2.5 Adjusting Phase-Determiner Settings for Transients  8-13
    8.2.6 Occasional Outliers  8-17
    8.2.7 Using Threshold Settings for Overly Sensitive Alarms  8-18
    8.2.8 Incorrect Initial Training  8-21
    8.2.9 Equipment Operating States Not Covered by Available Training Data  8-23
    8.2.10 Inadequate Initial Training Within the Defined Operating Space  8-25
9 OPERATING IN ON-LINE MODE  9-1
  9.1 Modes of Operation  9-1
  9.2 Making the Transition From Batch Mode to On-Line Mode  9-2
    9.2.1 Training as a Batch Operation  9-3
    9.2.2 Periodic On-Line Monitoring  9-4
    9.2.3 True On-Line Monitoring  9-5
    9.2.4 Look-Back Functions  9-6
  9.3 Data Bridge Description  9-7
    9.3.1 General Description  9-7
    9.3.2 EPRI On-Line Monitoring Implementation Project Data Bridge  9-8
    9.3.3 Data Bridge Programming and Setup  9-10
  9.4 On-Line Monitoring System Operation  9-13
    9.4.1 Producing Run Results  9-13
    9.4.2 Evaluating Run Results  9-17
  9.5 Using the Microsoft Windows Scheduler  9-23
10 DECLARING THAT A MODEL IS READY FOR USE  10-1
  10.1 Completing the Model  10-2
  10.2 Automating Data Acquisition  10-2
  10.3 Anticipating Failure Alarms  10-2
11 REFERENCES  11-1
A GLOSSARY  A-1
LIST OF FIGURES

Figure 1-1 Instrument Channel in Terms of On-Line Monitoring  1-7
Figure 2-1 On-Line Monitoring System  2-3
Figure 2-2 Typical Plant Data Archiving and Retrieval System  2-7
Figure 3-1 SureSense Operation  3-4
Figure 3-2 Model Development Overview  3-5
Figure 5-1 Typical SureSense Model  5-1
Figure 5-2 Steam System Signals Modeled for One Steam Generator  5-2
Figure 5-3 Example of Highly Correlated Data - Turbine Pressure and Reactor Power  5-5
Figure 5-4 Example of Poorly Correlated Data - Steam Generator Level and Reactor Power  5-6
Figure 5-5 Typical Correlation of Reactor Power to Turbine Pressure - Entire Range  5-7
Figure 5-6 Typical Correlation of Reactor Power to Turbine Pressure - 100 Percent Power  5-8
Figure 5-7 Correlation Values  5-10
Figure 5-8 Circulating Water Pump Discharge Pressure Model  5-11
Figure 6-1 Observations and Corresponding Estimates With Acceptable Training Data  6-4
Figure 6-2 Observations and Corresponding Estimates With Two Bad Data Points  6-5
Figure 6-3 Bad Data Acquisition  6-6
Figure 6-4 Data Lockup for One Channel  6-7
Figure 6-5 Example of Data Lockup in an Analysis  6-8
Figure 6-6 Data Lockup for All Channels  6-9
Figure 6-7 Data Lockup for Two Channels  6-10
Figure 6-8 Missing Data  6-11
Figure 6-9 Incorrect Data Values - Several Signals  6-12
Figure 6-10 Incorrect Data Values - One Signal  6-13
Figure 6-11 Incorrect Data Value - One Point  6-14
Figure 6-12 Incorrect Data Values - Unreasonable Change for One Redundant Channel  6-15
Figure 6-13 Incorrect Data Values - Unreasonable Change for a Second Redundant Channel  6-16
Figure 6-14 Loss of Significant Digits  6-17
Figure 6-15 Effect of Loss of Significant Digits  6-18
Figure 6-16 Example of Conditional Format Feature  6-19
Figure 6-17 Typical Example of Compressed Historian Data  6-24
Figure 6-18 Historian Data Example - Flow Signal  6-25
Figure 6-19 Sample Data With Random Variation Included  6-26
Figure 6-20 Sample Data After Data Compression and Subsequent Archive Retrieval  6-26
Figure 6-21 Data Interpolation Errors  6-27
Figure 7-1 Estimation Error as a Function of Vector Specification  7-5
Figure 7-2 Small Residual Results - Sensitive Fault Detection  7-7
Figure 7-3 Larger Residual Results - Less Sensitive Fault Detection  7-8
Figure 7-4 Reactor Power - Constant for an Extended Period  7-11
Figure 7-5 Large Difference in Redundant Flow Measurements  7-12
Figure 7-6 Process Values Relatively Unchanged Over the Operating Cycle  7-14
Figure 7-7 Process Value Change During End-of-Cycle Low-Power Operation  7-15
Figure 7-8 Steam Generator Level During Two Months of Low-Power Operation  7-16
Figure 7-9 Routine Changes in Process Values  7-17
Figure 8-1 Mean Disturbance Magnitude Tests  8-2
Figure 8-2 Variance Disturbance Magnitude Tests  8-3
Figure 8-3 Typical Drift Behavior  8-5
Figure 8-4 Residual Plot Showing Drift Significance Upon Failure Alarm  8-6
Figure 8-5 Extreme Example of Data Acquisition Error - Stuck Data  8-7
Figure 8-6 Extreme Example of Data Acquisition Error - Data Stuck for Almost One Year  8-7
Figure 8-7 Test Data Outside the Training Range - SureSense Result  8-9
Figure 8-8 Test Data Outside the Training Range - Actual Data  8-9
Figure 8-9 Test Data Outside the Training Range - System Operation  8-10
Figure 8-10 Tank Level Variation  8-11
Figure 8-11 Test Data Outside the Training Range - Equipment Repair  8-12
Figure 8-12 Test Data Outside the Training Range - MSET Results  8-12
Figure 8-13 Test Data Outside the Training Range - Data Acquisition Card Replacement  8-13
Figure 8-14 Signal Behavior During a Transient  8-14
Figure 8-15 One Transient During an Extended Period of Operation  8-15
Figure 8-16 Example of Routine Transients Exceeding the Training Space  8-16
Figure 8-17 Short-Term Outlier  8-17
Figure 8-18 Threshold Settings Applied to Residual Plot  8-18
Figure 8-19 Recalibrated Sensor - Model Trained on Out-of-Calibration Data  8-22
Figure 8-20 Recalibrated Sensor - Initially Out of Calibration  8-23
Figure 8-21 Many Possible Operating States  8-24
Figure 8-22 Example of a Pump That Almost Always Runs  8-25
Figure 8-23 Unexpected Change in Estimate  8-26
Figure 8-24 Inadequate Training for Pump On and Off Conditions  8-27
Figure 9-1 Reference Implementation for Periodic On-Line Monitoring  9-4
Figure 9-2 Directory Structure for a SureSense Data Bridge Implementation  9-10
Figure 9-3 Setting Up a Data Set to Store Run Results  9-15
Figure 9-4 SureSense User Interface Options for Monitoring  9-17
Figure 9-5 Monitor Window Options for Each Signal  9-18
Figure 9-6 Sample Run Information for a Signal  9-19
Figure 9-7 Sample Observation Estimate Plot for a Signal  9-20
Figure 9-8 Sample Residual Plot for a Signal  9-21
Figure 9-9 Option to Run a Data Set If Not Enabled for On-Line Monitoring  9-21
Figure 9-10 Model Run Summary Report  9-22
Figure 9-11 Notification That the Selected Model Is Not Currently Trained  9-23
Figure 9-12 Microsoft Windows Task Scheduler  9-24
Figure 9-13 Microsoft Windows Scheduled Task Wizard Introduction  9-24
Figure 9-14 Select the SureSense Command File (*.BAT) Using the Browse Button  9-25
Figure 9-15 Name the Scheduled Task and Select the Run Frequency  9-25
Figure 9-16 Provide Additional Scheduling Details  9-26
Figure 9-17 Provide Microsoft Windows Login Information  9-26
Figure 9-18 Review and Install the Scheduled Task  9-27
LIST OF TABLES

Table 1-1 On-Line Monitoring Implementation Preliminary Checklist  1-4
Table 4-1 Instrument Channel Description  4-4
Table 7-1 Level Example Vectors Selected by MinMax  7-2
Table 8-1 Instrument Channel Uncertainty Sources  8-20
Table 10-1 Checklist for Determining That a Model Is Ready for Use  10-1
1 INTRODUCTION

1.1 Report Purpose

This report discusses the actual application of on-line monitoring (OLM) to nuclear plant instrument systems. A considerable amount of application and development work has been completed by the EPRI On-Line Monitoring Implementation Project. The experience gained has been documented here for the benefit of EPRI members.
This report, the first volume of a three-volume set, provides an overview of the EPRI on-line monitoring project activities and definitions for the majority of the terminology used to describe on-line monitoring and its implementations. The report discusses data management issues related to implementations and describes various modes of operation possible for on-line monitoring.
Overviews of on-line monitoring and the software product used throughout this project are presented. This report is devoted mainly to presenting the various tasks that must be completed to prepare models for and implement an on-line monitoring system, including data preparation, signal selection, model training and evaluation, model deployment, and model retraining. Data quality, data quantity, fault-detection techniques, and alarm response mechanisms are related issues that are also discussed. An extensive glossary of on-line monitoring terms is provided in Appendix A.
The second volume of this three-volume set, On-Line Monitoring of Instrument Channel Performance, Volume 2: Model Examples, Algorithm Details, and Reference Information [1],
serves mainly as a reference to this first volume. It contains detailed descriptions of the Multivariate State Estimation Technique (MSET) and the instrument calibration and monitoring program for redundant channels. These two algorithms are discussed in detail because they were the primary tools used under the EPRI on-line monitoring projects. Numerous examples are presented for models that were developed for the participants of this project. Model maintenance (or retraining) is also demonstrated to illustrate the process of updating models when they require modifications to their training datasets. Finally, a recent software product developed specifically for cleaning data files and removing bad data prior to developing on-line monitoring models is reviewed and demonstrated.
The third volume, On-Line Monitoring of Instrument Channel Performance, Volume 3:
Applications to Nuclear Power Plant Technical Specification Instrumentation [2], builds on the groundwork presented in the first two volumes and discusses on-line monitoring applications specifically for safety-related technical specification instrumentation at nuclear power plants.
The report presents recommendations for the safety-related channels that are suitable for model deployment along with the related issue of single-point monitoring. The U.S. Nuclear Regulatory Commission's (NRC's) safety evaluation report [4], which reviews on-line monitoring for nuclear power applications, is provided for reference. The results from a detailed uncertainty analysis performed on MSET and an additional summary of previous results obtained for the instrument calibration and monitoring program for redundant sensors are also provided. Verification and validation studies of both MSET and the SureSense on-line monitoring software are discussed, along with a software acceptance test procedure for MSET. Additional discussions are provided regarding redundant vs. nonredundant empirical modeling techniques as applied to safety-related instrumentation.
The objectives of this first report include the following:
- To provide technical information regarding on-line monitoring as a calibration extension and performance-monitoring tool and to address data acquisition, data quantity, and data quality issues related to modeling
- To provide guidance regarding evaluating and responding to identified failures
- To provide an overview of MSET and the SureSense monitoring software
- To provide technical information describing software installation and setup for on-line monitoring and to address data management and interface issues
- To explain the steps and actions necessary to implement an on-line monitoring system and to list the steps and actions necessary to declare that a model is ready for use

1.2 Report Applicability

This report addresses specific issues and considerations applicable to any on-line monitoring system. The EPRI On-Line Monitoring Implementation Project applied MSET to nuclear plant instrument systems, and the modeling guidelines directly describe an MSET approach. Argonne National Laboratory (ANL) originally developed the MSET approach for nuclear power applications. The SureSense software contains an optional MSET toolkit that was selected for use by the EPRI On-Line Monitoring Implementation Project members. In some cases, SureSense software features are used to illustrate certain aspects related to modeling. In addition, the latest version of the SureSense software (Version 2.0) incorporates a proprietary empirical model, the Expert State Estimation Engine (ESEE). Version 2.0 of SureSense has been used for all applications and implementations in the last year. The guidelines presented here apply in general to both empirical models, MSET and the newer proprietary algorithm, and the software can easily convert models from one model type to the other.
This report is not intended to serve as a software user's guide. Instead, it addresses modeling issues at a higher level and rarely discusses specific model settings. The EPRI On-Line Monitoring Implementation Project has produced a separate report [3] that should be reviewed for SureSense-specific considerations. For other software systems, refer to the specific supplier's user's guide.
In summary, the modeling guidelines provided in this topical report apply directly to MSET, and the examples have been illustrated with the SureSense software. However, many of the principles of modeling described here can be applied to other on-line monitoring methods. In particular, issues associated with data quality, model training, failure detection, and retraining apply to most empirical model-based on-line monitoring methods.

(SureSense is a trademark of Expert Microsystems, Inc.)
1.3 Report Audience

This report is primarily intended for instrumentation and control engineers and technicians at nuclear plants. The examples provided in this report apply directly to nuclear plant systems, but the concepts of on-line monitoring can be applied to virtually any application involving signal analysis and validation.
Readers of this report are assumed to have the following skills:
- A basic knowledge of nuclear plant protection and control instrumentation
- A general understanding of statistics and statistical analysis methods (although this report has been prepared in a manner that minimizes the in-depth discussion of underlying statistical methods)
The readers of this report are not expected to have detailed knowledge of on-line monitoring theory. Accordingly, this report attempts to maintain a balance between providing too much versus too little information or technical detail. Much of the information provided has not been previously assembled or published in a comprehensive and instructive format.
1.4 Considerations Before Starting Model Development

On-line monitoring is conceptually simple: send plant computer data to the software, and it will identify any drifting channels. The reality is that data management, software model configuration, and subsequent failure detection require some effort and knowledge. Before starting model development with an on-line monitoring software package, the following steps are recommended:
- Use a very capable personal computer for model development. Managing and evaluating the amount of data recommended in this effort requires substantially more computing power than the tasks typically performed on most engineers' computers. Beginning with an older computer that has inadequate memory will be a frustrating experience. Everything possible should be done to ensure that the computer does not hinder the modeling and analysis effort.
- Test the data acquisition method that will be used by obtaining large data files containing 1-minute sample rate data. Historical data will be required to develop and test the model; these data are typically obtained from a data archive. Models cannot be developed or tested unless historical data are readily retrievable. Some nuclear plants might have outdated local area networks (LANs), and accessing data across a LAN can become the limiting part of data acquisition. Ensure that personnel from the computer services department are involved and understand the quantity of data that will be managed.
- Determine the intended purpose of on-line monitoring at your facility. On-line monitoring for calibration optimization might require a different set of signals than on-line monitoring for performance monitoring or equipment condition monitoring. Fault evaluation techniques also might differ depending on the purpose.
- Decide what models to develop. Review On-Line Monitoring of Instrument Channel Performance, Volume 2: Model Examples, Algorithm Details, and Reference Information [1]
for examples of the typical models developed by the EPRI On-Line Monitoring Implementation Project. For each model selected, determine which signals have accessible data archived by a computer system.
- Understand the importance of training a model for proper performance. Determine response methods to identify failures. The body of this report addresses these topics in detail. Initial training of the project team will be helpful.
The previously mentioned steps assume that the desired on-line monitoring software has been identified and obtained. Table 1-1 summarizes the recommended steps to take before starting an on-line monitoring program.
Table 1-1
On-Line Monitoring Implementation Preliminary Checklist

| Item | Recommendation | Ready? |
|---|---|---|
| Computer Equipment | | |
| 1. | Obtain good computers for model development personnel. | |
| 2. | Test the data acquisition method with sample data. | |
| 3. | Verify that the data file storage location is acceptable. | |
| Determine Project Goals and Identify Models | | |
| 4. | Confirm the intended purpose and users of on-line monitoring. | |
| 5. | Identify the models to be developed. | |
| 6. | Set up a project plan with achievable milestones. | |
| Training | | |
| 7. | Train project personnel. | |
1.5 On-Line Monitoring Overview

On-line monitoring is an automated method of monitoring instrument performance and assessing instrument calibration without disturbing the monitored channels while the plant is operating. In the simplest implementation, redundant channels are monitored by comparing each individual channel's indicated measurement to a calculated best estimate of the actual process value, referred to as the parameter estimate or estimate. By monitoring each channel's deviation from the parameter estimate, an assessment of each channel's calibration status can be made. An on-line monitoring system can also be referred to as a signal validation system or data validation system.
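To make the redundant-channel comparison concrete, the following minimal sketch computes a parameter estimate from hypothetical redundant level channels and flags any channel whose deviation exceeds an assumed acceptance criterion. The channel names, readings, and limit are illustrative only; a production system such as ICMP or MSET calculates the estimate with more sophisticated, consistency-weighted methods and applies formal statistical fault-detection tests.

```python
# Minimal illustration of redundant-channel deviation monitoring.
# Channel names, readings, and the deviation limit are hypothetical.
readings = {
    "LT-101A": 64.1,  # indicated level, percent of span
    "LT-101B": 64.3,
    "LT-101C": 65.9,
}
deviation_limit = 1.0  # percent of span; plant-specific acceptance criterion

# A simple average stands in for the calculated parameter estimate.
parameter_estimate = sum(readings.values()) / len(readings)

for channel, value in readings.items():
    deviation = value - parameter_estimate
    status = "INVESTIGATE" if abs(deviation) > deviation_limit else "OK"
    print(f"{channel}: indicated={value:.2f}%  deviation={deviation:+.2f}%  {status}")
```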
Several different implementations of on-line monitoring for nuclear plant systems currently exist.
Examples include the EPRI Instrument Calibration and Monitoring Program (ICMP), the ANL Multivariate State Estimation Technique (MSET), and the Organization for Economic Cooperation and Development (OECD) Halden Reactor Project PEANO (Process Evaluation and Analysis by Neural Operators). Some plants currently implement on-line monitoring in addition to their traditional calibration programs to provide additional performance assessment, troubleshooting, and maintenance planning capabilities.
Electricité de France (EDF) plants have received approval from the French nuclear safety authority to use on-line monitoring as a basis for extending calibration intervals. Additionally, the NRC has issued a safety evaluation report authorizing the application of on-line monitoring as a calibration extension tool [4].
Due to the ease with which data acquisition and analysis of instrument channel data can be performed, on-line monitoring of instrument channels is possible and practical. In essence, on-line monitoring provides a proactive and beneficial approach to performing periodic instrument surveillances. It accomplishes the surveillance or monitoring aspect of calibration by comparison between redundant or correlated instrument channels and with independent estimates of the plant parameter of interest. It does not replace the practice of instrument adjustments; instead, it provides a performance-based approach for determining when instrument adjustment is necessary as compared to a traditional time-directed calibration approach.
1.6 EPRI's Role in On-Line Monitoring

EPRI's strategic role in on-line monitoring is to facilitate its implementation and cost-effective use in numerous applications at power plants. To this end, EPRI has sponsored an on-line monitoring implementation project at multiple nuclear plants specifically intended to install and use on-line monitoring technology. The EPRI on-line monitoring implementation project serves two purposes:
- To apply on-line monitoring to all types of power plant applications
- To document all aspects of the implementation process in a series of EPRI deliverable reports

These reports cover installation, modeling, optimization, and proven cost-benefits. The following EPRI reports resulted from this project:
- On-Line Monitoring of Instrument Channel Performance, Volume 1: Guidelines for Model Development and Implementation (this report) addresses all aspects of modeling for on-line monitoring applications and their implementation. This report describes model development, data quality issues, training requirements, retraining criteria, responding to failure alarms, and declaring that a model is ready for use.
- On-Line Monitoring of Instrument Channel Performance, Volume 2: Model Examples, Algorithm Details, and Reference Information [1] presents detailed model examples, empirical algorithm details, and further evaluations of the software utilized during this project.
- On-Line Monitoring of Instrument Channel Performance, Volume 3: Applications to Nuclear Power Plant Technical Specification Instrumentation [2] addresses on-line monitoring for safety-related applications and the NRC's safety evaluation report [4] for on-line monitoring.
Topics include technical specifications, uncertainty analysis, procedures and surveillances, MSET application considerations, and miscellaneous technical considerations. Nuclear Energy Plant Optimization (NEPO) projects related to software verification and validation and uncertainty analysis provide input to this report.
- SureSense Diagnostic Monitoring Studio User's Guide, Version 2.0 [3] provides detailed guidance in the application of SureSense for nuclear plant systems. This report is updated periodically as a result of user feedback or software revisions.
- On-Line Monitoring Cost-Benefit Guide [6] discusses the expected costs and benefits of on-line monitoring. Direct, indirect, and potential benefits are covered. The project participants' experiences with on-line monitoring are included.
EPRI fosters development of on-line monitoring technology and its application via the Instrument Monitoring and Calibration (IMC) Users Group. Through this group, on-line monitoring will continue to be supported as a key technology as its use grows throughout the industry. Finally, the EPRI IMC Users Group will continue to support generic technical issues associated with on-line monitoring such as providing implementation guidance for calibration reduction of safety-related instrumentation.
1.7 Terminology Used in This Report

Appendix A provides a glossary of terms used in this report. Some terms require additional clarification in support of using this report. The following sections explain key terms.
1.7.1 Channel, Sensor, and Signal

The terms channel, sensor, and signal are often used almost interchangeably in this report, but there is an important distinction between the three terms. The sensor is the device that measures the process value. The sensor and associated signal conditioning equipment are referred to as the instrument channel or channel. The electrical output from the channel is the signal. Figure 1-1 shows the relationship between the three terms for a safety-related channel. A non-safety-related channel might not have the isolator or bistable as shown.
Figure 1-1 Instrument Channel in Terms of On-Line Monitoring

For a non-safety-related channel, there might be little in the way of signal conditioning because the sensor is the only real monitored device. More complex measurements might contain several signal conditioning modules. This report will usually refer to the channel rather than the sensor in terms of what is monitored. Although other industry documents and published papers often discuss on-line monitoring using the term sensor, it is the channel (or some portion of the channel) that is actually monitored. The discussion provided in the following sections will frequently refer to sensor drift because the sensor is usually the most common source of drift, but any portion of the channel might actually be the cause of drift.
The on-line monitoring system does not know the layout of the channel; it receives only a digitized signal from the plant computer or from a historical file. Although the instrument channel is typically producing a milliampere or voltage output, the signal acquired by the on-line monitoring system is often scaled into the expected process units such as pressure, temperature, or percent. When this report refers to signals, it means the scaled or unscaled digitized output signals from the monitored channels.
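As a generic illustration (the symbols below are not taken from this report), a transmitter's 4-20 mA output is commonly converted to engineering units by linear scaling between the channel's lower and upper range values (LRV and URV):

```latex
x_{\mathrm{eng}} \;=\; \mathrm{LRV} \;+\; \frac{I - 4\ \mathrm{mA}}{20\ \mathrm{mA} - 4\ \mathrm{mA}}\,\bigl(\mathrm{URV} - \mathrm{LRV}\bigr)
```

For example, a 12 mA signal on a 0-100 percent level channel would be read by the monitoring system as 50 percent.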
1.7.2 Modeling Terms

The term model is used to describe the group of signals that have been collected together for the purpose of signal validation and analysis. Depending on the context, model might refer to only the selected group of signals, or it might also include the various settings defined by the on-line monitoring system that are necessary to optimize the performance of the signal validation procedure. In the context of on-line monitoring, model does not refer to some functional relationship between model elements defined by a set of equations.
The term vector is used to describe the observed values for all of the signals in the model at a particular instant in time. For example, if the signal data is contained in a spreadsheet, a single row of data for a particular time would be a vector.
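In notation (introduced here only for illustration), an observation vector for a model containing n signals at time t can be written as:

```latex
\mathbf{X}(t) = \bigl[\,x_1(t),\; x_2(t),\; \dots,\; x_n(t)\,\bigr]
```

where x_i(t) is the observed value of signal i; this is simply the spreadsheet-row example above written symbolically.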
The term domain is used to describe the operating states that form the basis for training a model.
The domain contains a range for each signal in the model, and it also defines different operating states within that range. For example, a domain for a pressure sensor might cover a range of 800-1200 psig (5516-8274 kPa). Within this range, there might be several distinct operating states associated with different equipment lineups or plant power levels.
The term estimate is used to describe the best estimate or approximation of the actual process or sensor value calculated by the on-line monitoring system. The term residual refers to the mathematical difference between an observed value and the corresponding estimate for that observation. The residual is important because fault detection is often based on the behavior of the residual.
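Using the same illustrative notation, if the estimate calculated for signal i at time t is denoted x̂_i(t), the residual is simply the observation minus the estimate:

```latex
r_i(t) = x_i(t) - \hat{x}_i(t)
```

The fault-detection tests described in Section 8 (for example, the mean and variance tests) operate on the sequence of residuals rather than on the raw signal values.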
2 DATA MANAGEMENT

On-line monitoring is a data-intensive process, and the amount of data to be managed should be understood in advance. Section 2 discusses various aspects of data-file management. This information applies to either a batch mode or an on-line mode of operation.
2.1 On-Line Monitoring System Architecture

On-line monitoring can be applied in various modes ranging from batch mode using data files to a real-time mode receiving a continuous data stream. The term on-line has not been well defined in the industry. Originally, the term on-line was used to indicate that signal validation was performed while the plant was operating at power, without regard to how the data were acquired.
In most cases, the on-line monitoring method has actually been off-line in terms of data acquisition, meaning that data are accumulated in batch files for processing.
The analysis procedures within the monitoring system treat all signals as if they originate from an on-line data collection system. The on-line monitoring system is not concerned with the layout of the channel; it simply receives a digitized time series of signals from the plant computer, a historical file, or another data acquisition system. Although the instrument channel is typically producing a milliampere or voltage output, the signals acquired by the on-line monitoring system are often scaled into the expected process units such as pressure, temperature, or percent.
The EPRI publication On-Line Monitoring of Instrument Channel Performance [5] defines the following possible options for an on-line monitoring system:
- An automated system that performs data acquisition and analysis essentially continuously in real time at a specified sample rate
- An automated system that performs data acquisition and analysis at discrete specified intervals
- An automated system that is normally off and is manually activated to perform data acquisition and analysis at a set interval (at least quarterly)
- A manual system in which data are acquired manually on at least a quarterly interval and entered manually into a computer program for the purpose of analysis

The differences between these options most often involve the degree of automation in the data acquisition step, including the method of data collection and the frequency of the data analysis. Most of these options actually operate in a batch mode in which data are accumulated in stored files. The differences in these options relate to the locations of the data files and to how frequently data files are generated and evaluated. For typical nuclear plant applications, either option 2 or option 3 will usually be used.
The typical on-line monitoring system consists of the following building blocks:
- Separate off-line computer hardware on which the system resides.
- Communications hardware and software to obtain data from the plant process computer, plant data historian, or other source if the data are automatically acquired. Manual data acquisition is also possible.
- The on-line monitoring software that analyzes, displays, and archives the data and presents results interactively in graphs and reports.
2.1.1 Off-Line Batch Mode Using Historical Data

The term batch mode means that data files are stored in some location and are accessed by the on-line monitoring system. The term off-line batch mode applies if the data files are manually extracted from an archive or must be manually specified by the user. Depending on the frequency of data collection, batch mode might be used to evaluate the previous quarter, month, week, or day. Notice that a system operating in batch mode is evaluating specified data files that cover some period of time; batch mode is not receiving a real-time data stream.
Even if the intent is to apply on-line monitoring in a true on-line mode, certain modeling and analysis functions are generally best performed in a batch operation, including the following:
- Training - The data used for training require careful review and cleanup. Training data must be error-free and must properly characterize the normal operating states that will be monitored. The data used for training must be kept available for retraining if certain model settings are modified.
- Model development - Historical data files are used to evaluate model performance as part of the model development process. Some of the historical data is used for training, and some data are typically used for verification and performance assessment.
- Look back - After model development is complete and the model has been placed in service, it can be useful to review the historical performance of a signal or group of signals when evaluating current signal performance. It is often easier to maintain historical data in pre-configured files for ready access and comparison. Extracting data for an extended time period from a plant data historian or other archive can be time-consuming.
The effort to complete these functions for a model is about the same regardless of the selected on-line monitoring approach; all models require historical data for training, retraining, model development, and historical performance analysis. The differences in the on-line monitoring mode of operation show up after the model has been developed, tested, and placed in service, and they relate primarily to how subsequent data are acquired and tested. In the case of an off-line batch model, the user must periodically extract data for each model, typically at monthly intervals, involving a recurring maintenance cost. In its simplest implementation (in which no changes are made to the existing data archive software), extracting data typically involves the following steps:
- Extract the previous month of data into a separate file for each model.
- If the data are extracted in text format, optionally convert each extracted file into a binary format for improved performance during analysis. Required formats are specific to the on-line monitoring software application used.
- Link the data file to the model and run the model.
If there are many models, this can be a time-consuming process. Therefore, some level of automation is preferable. The next section describes an automated approach to data acquisition and file management.
2.1.2 On-Line Batch Mode Using Current Data

All users initially start with an off-line batch mode of operation as they develop models and learn how to use the on-line monitoring system software. As the models are placed in service, the transition from off-line to on-line mode of operation should be considered. Figure 2-1 shows a typical system architecture for a true on-line monitoring system, which includes a more efficient approach to data management.
Figure 2-1 On-Line Monitoring System (monitoring station, engineering server, and modeling station)
The system shown in Figure 2-1 operates as follows:
- No changes are made to the plant computer software and its associated data historian. Plant process data are acquired from the plant computer and archived just as they were prior to installing an on-line monitoring system.
- Data are acquired from the data historian using an extraction routine (referred to in Figure 2-1 as a data bridge) that might be unique to the plant-specific computer configuration. At periodic intervals, data are downloaded from the data historian and stored in the appropriate format for each model. For example, an on-line monitoring system might acquire the previous day's data for each model and store it in a designated location.
- The on-line monitoring software combines the new data with previously acquired data, automatically runs each model with the new data included, and stores the run results. The latest run results are available on demand by plant personnel.
In terms of power plant requirements, this description represents a true on-line monitoring system. It should be noted that the models are still technically operating in a batch mode because the models run on historical files that include up to the previous day's data. This is not real time, but it is close enough for most purposes (especially for calibration reduction and performance monitoring). The key differences between this periodic on-line approach and the off-line batch mode described in Section 2.1.1 occur in the following automated steps:
- Data acquisition - An engineer or technician does not manually extract data files.
- Data file formatting - File conversions are handled automatically.
- Data file linking - The model automatically links to the latest dataset.
- Model run - The model runs automatically and stores the run results in a specified location for ready retrieval by any user.
Making the transition from off-line batch mode to on-line mode usually requires some unique programming that can be provided by either the software supplier or by plant personnel. The benefits of on-line mode are considered important enough that it is the recommended approach for most users.
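As a rough illustration of the "unique programming" involved, the sketch below outlines a scheduled data-bridge script that pulls the previous day's data from the historian and appends it to each model's data file. The historian query function, model list, and file layout are hypothetical placeholders for the plant-specific interface, not any particular vendor's API.

```python
# Illustrative data-bridge sketch: run once per day (for example, from a
# scheduled task) to append yesterday's data to each model's dataset.
# `query_historian` is a hypothetical placeholder for the plant-specific
# extraction routine; it is assumed to return rows of timestamped values.
import csv
from datetime import date, timedelta

MODELS = {
    "HP_FW_HEATERS": ["P2214A", "P2215A", "T2362A"],   # model name -> computer points
}

def query_historian(points, start, end):
    raise NotImplementedError("plant-specific data archive call goes here")

def run_daily_bridge(base_dir="olm_data"):
    end = date.today()
    start = end - timedelta(days=1)
    for model, points in MODELS.items():
        rows = query_historian(points, start, end)
        path = f"{base_dir}/{model}-{start:%Y-%m}.csv"
        with open(path, "a", newline="") as f:
            csv.writer(f).writerows(rows)   # append to the current month's file
```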
2.1.3 Real-Time Mode
Real-time on-line monitoring is unnecessary for most, if not all, nuclear plant applications, especially considering the dedicated computing overhead required to operate numerous models simultaneously in a real-time environment. The NASA space shuttle is one example of a true real-time on-line monitoring application where the software used in this project has been employed. In this case, a continuous data stream is sampled and processed through the signal validation software throughout the period from just before launch until the completion of launch.
For the few minutes that it takes to reach orbit, a large quantity of data is evaluated as it is received to ensure that the mission critical sensors are providing valid data to the flight computers.
Nuclear plant requirements are quite different from the space shuttle in terms of the requirements for data acquisition. Rather than operating for only a few minutes during launch, a power plant might operate at near 100-percent power for an entire operating cycle of 18-24 months. During this extended period of operation, many sensors will experience only small process changes, essentially monitoring the process about a single point. Setting up on-line monitoring software to operate in a real-time mode can be done, but it involves an additional level of complexity that is not needed for typical plant applications because the software's output will not be used in real time.
2.2 Data File Naming
When on-line monitoring is applied to its fullest potential, there will be many data files stored in the computer system and linked to the various models. When many data files are stored in a common location, the file-naming convention becomes important. The following naming convention works well and is recommended:
Unit Number (if needed) - Model Name - Year - Month - Portion of Month
Unit number refers to the power plant designation (or designation for some monitored asset). Model name refers to the specific diagnostic software model used to monitor some asset or system. The dates refer to the period of operation during which the data were acquired. The following three examples illustrate typical naming conventions:
- Unit 1 Vessel Level-2002-01
- RPS-2001-10
- OTSG-A-2001-12
If this naming convention is used, all files for a single model will be located as a group within a directory, sorted by year and month. Missing months or time periods are easily recognized. If data cover portions of a month (such as the first 10 days in one file, the next 10 days in a second file, and the last 10 days in a third file), it is recommended that the three files be combined into a single file for the month.
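As a minimal illustration of this convention, the sketch below builds file names from a unit designation, model name, and acquisition month. The helper name, file extension, and use of Python are illustrative assumptions rather than part of the recommended practice.

```python
# Illustrative sketch: compose data file names using the recommended
# "Unit - Model - Year - Month" convention so files sort by year and month.
def olm_file_name(model, year, month, unit="", extension="csv"):
    parts = ([unit] if unit else []) + [model, f"{year:04d}", f"{month:02d}"]
    return "-".join(parts) + "." + extension

# Example usage
print(olm_file_name("Vessel Level", 2002, 1, unit="Unit 1"))  # Unit 1-Vessel Level-2002-01.csv
print(olm_file_name("RPS", 2001, 10))                          # RPS-2001-10.csv
```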
Notice that this naming convention implies that the data for all signals in a given model will be acquired collectively as a group. This is the preferred approach. Some commercial software packages such as SureSense readily support separating signal data. However, separating signal data for a single model for a specified period of time into multiple files complicates data cleanup and review. This can limit the applicability of the data for use with other software applications.
Also, the removal of bad data within separate files must be carefully managed to prevent files with time-stamp simultaneity inconsistencies.
It should also be noted that when using this naming convention, data for several models are not contained in a single file. For example, it might be easier during data retrieval to extract the data for five models at once. It is nevertheless strongly recommended that a single data file contain the data for only a single model. While combined data files are readily supported in some commercial software packages such as SureSense, combining the data can actually complicate other aspects
of data management. For example, suppose the data for six models, each containing 25 signals, are simultaneously extracted and stored in a single file. If data are acquired at a 1-minute rate, this file will contain up to 44,600 rows of data for 150 columns of signals-a total of 6,690,000 data points. Files will likely be in a spreadsheet format to allow review and editing as necessary.
The size of each file will be about 60 megabytes, which is time-consuming to manage and difficult to review. These files are also treated as text files by spreadsheet programs. Text files take longer to read because each data access has to read through the entire file to read the data of interest. In addition, when sorting through historical data files, there are often cases where certain signal data are not valid. When reviewing a large data set for all models simultaneously, removing periods of data due to specific invalid signal values will result in the simultaneous removal of these data from all other signals regardless of their quality. Based on the experience to date, each file should be kept as small and simple as possible.
2.3 Data Storage Format
On-line monitoring is a data-intensive activity. The overall objective is to acquire plant data and to process the acquired data to determine whether it is indicative of a healthy, normal state of the system or, alternatively, of some degraded or unhealthy state of the system. Power plants are characterized by large numbers of subsystems and signals, many of which are important targets of on-line monitoring. Data acquisition rates for these large plants are reasonably high. As a result, very large data sets will be managed by either the on-line monitoring system or its operators. In most cases, a high level of on-line monitoring system automation will be preferred.
When implementing an on-line monitoring system, plant-specific data issues must be addressed and resolved. One such issue is the format of the acquired data. Computers preferentially process data in binary format, while human operators prefer text or graphical formats.
Human operators often process measurement data in the form of text. As a result, many plants have established data extraction utilities to extract operating data in a text-based format, often to ASCII text files. The signal data are typically acquired by the plant computer, stored in a data archive, and extracted on demand by a user as shown in Figure 2-2. These existing procedures will typically provide the initial data extraction capability for on-line monitoring. What is often required is a final automation step to perform the extraction from the data archive. Some on-line monitoring systems such as SureSense will provide a ready means to implement this data archive connection either on-demand by the user or via a regularly scheduled data extraction service program (also known as a data bridge in the case of the SureSense software).
Figure 2-2 Typical Plant Data Archiving and Retrieval System (plant data acquisition system, plant data server, and engineering station)
Although text files are readable by humans, they are extremely inefficient for use with a computer program. The program must read each character individually and interpret the series of characters in order to determine the number represented. Further, the space required for each value is unknown until the value has been read. This reduces file access speed when attempting to read data in the middle of the file. In contrast, computers can read binary format files with greater efficiency and speed and can readily access any data value in the file immediately and directly. Therefore, it is desirable to store plant data in a binary format for efficient use by the on-line monitoring system.
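To make the text-versus-binary trade-off concrete, the sketch below converts a comma-delimited extract into a NumPy binary file. The column layout, file names, and choice of NumPy's format are assumptions for illustration only; this is not the SDF format discussed later in this section.

```python
# Illustrative sketch (assumed CSV layout: numeric columns only, for example
# elapsed time in seconds followed by signal values, with one header row).
import numpy as np

def csv_to_binary(csv_path, npy_path):
    # Parsing text is the slow step that a binary file avoids on every
    # subsequent read.
    data = np.genfromtxt(csv_path, delimiter=",", skip_header=1)
    np.save(npy_path, data)          # fixed-width binary, directly seekable

def load_binary(npy_path):
    # Any row or column can be sliced without re-parsing the whole file.
    return np.load(npy_path)
```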
The SureSense software used by the EPRI On-Line Monitoring Implementation Project provides a general-purpose data interface that can be readily adapted to any input data format. It can be used to convert a set of data in a plant-specific format into one of several standard formats that can be readily shared between the project participants without further translation or processing.
It is recommended that initial implementation work be performed using one of these standard formats. After developing an initial set of models, a plant-specific data acquisition interface can be implemented if supported by the capabilities of the user's on-line monitoring software. The SureSense software uses a plug-in data reader capability to enable a highly adaptable and low-cost interface to any plant data system. This interface is discussed in Section 9.3 of this report, which describes the data bridge implementation.
Most users will already have text format data available from their plant data system. One common format for text-based data is the Microsoft Excel comma delimited file (CSV) format.
An advantage to this format is that Microsoft Excel can be used to examine and characterize the data. The SureSense software includes a plug-in data reader for CSV files. SureSense users should consult the user's manual for further details.
Standards for binary format files are typically specific to the industry and the software. The EPRI On-Line Monitoring Implementation Project has selected a simple nonproprietary format known as the signal data file (SDF) format. SDF files provide fast and efficient data access. The SureSense software comes with a plug-in data reader for the SDF file format. SureSense users should consult the user's manual for further details.
2.4 Data File Configuration Management
Data configuration management is an important element of any on-line monitoring implementation program. The following items require configuration management for an effective on-line monitoring implementation in a nuclear power plant environment:
- On-line monitoring model definition files, including any supporting phase-determiner plug-ins and time-formatter plug-ins
- On-line monitoring model training data files, including any supporting data-reader plug-ins.
- On-line monitoring software acceptance test files and acceptance test results
On-line monitoring configuration management should follow procedures currently in place for document and data management at the plant.
3
OVERVIEW OF THE MULTIVARIATE STATE ESTIMATION TECHNIQUE, SURESENSE ON-LINE MONITORING SOFTWARE, AND MODEL DEVELOPMENT
The EPRI on-line monitoring implementation project selected MSET as the base algorithm to use for the on-line monitoring project. Argonne National Laboratory (ANL) developed MSET for nuclear power plant applications. This section provides a brief overview of MSET and a description of the software used during this project that contains a commercial implementation of MSET. Volume 2 of this series of reports [1] contains more detailed information describing the mathematical basis of MSET. Volume 3 [2] provides additional information regarding the verification and validation of both MSET and the SureSense software. MSET is an empirical modeling program that can be used for on-line monitoring. Because it is not a user-friendly product by itself, it requires a commercial implementation that provides all of the necessary user interfaces, graphic capabilities, and data management aspects. During the course of this project, Expert Microsystems, Inc., provided a commercial implementation with the SureSense software product.
It is important to note that there are alternative methods that can be used for on-line monitoring; however, to review all of these methods is beyond the scope of this report. The focus of this project was implementation. With that focus in mind, a method was selected and initially reviewed. The greater effort then followed to understand on-line monitoring as a whole and the implementation of systems at participating plant sites.
3.1 Overview
MSET is a software-based tool for on-line monitoring that was specifically developed by ANL for nuclear power applications. MSET is general in scope and can also improve performance, prevent downtime, and reduce operating costs for fossil power plants and many other industrial applications. MSET is a statistical modeling technique that learns a high-fidelity model of a process or apparatus from a sample of its normal operating data. Once built, the model produces an accurate estimate for each observed signal, given a new data observation from the process or equipment. Each estimated signal is compared to its actual signal counterpart using a highly sensitive fault-detection procedure to statistically determine whether the actual signal agrees with the learned model. MSET produces estimates for all signals included in the model.
To utilize MSET, the user starts by collecting sensor-generated data from the process under consideration that bound all normally expected operational states. These data are used by the MSET system to establish the domain of normal process operation (that is, MSET is trained to recognize normal behavior) and are used in the monitoring phase to identify abnormal behavior.
During monitoring, sensor data are read by MSET. An estimate of the current state of the process is determined by comparing the measured sensor data with that obtained during training. The difference between this estimate and the measurement is calculated. This difference or residual error is then analyzed by a statistically based hypothesis test that determines if the process is operating normally or abnormally. If an abnormal condition is detected, the initial diagnostic step identifies the cause as either a degraded sensor or an operational change in the process. When a degraded sensor is identified, MSET uses the estimated value of the signal from this sensor to provide a highly accurate virtual sensor that can be used to replace the function of the faulted sensor in the MSET estimation process.
3.2 MSET Software Functions
MSET consists of 1) a pattern recognition system that provides empirically estimated values of all monitored signals and 2) a statistically based hypothesis test that compares the estimated signal values with the measured values to detect the development of incipient faults. MSET consists of three essential modules and a number of supporting modules. The essential modules are the following:
- A training algorithm for selecting and characterizing a subset of representative data from sensors during normal operation of the system. The training module is used to produce a training matrix of operating data that ideally encompasses all expected normal operating states of the system.
- A system state estimation module for parameter estimation. This module is used to calculate estimates for all signals in the model.
- A statistically based fault-detection algorithm. This module is used to detect abnormal disturbances in the monitored signals by examination of the difference between the estimated and the measured signal values.
MSET provides a high-fidelity estimate of the expected response of an asset's data signals by using advanced pattern recognition techniques to measure the similarity between the signals within a learned domain of operation. The learned patterns or relationships among the signals (that is, the training data) are used to identify the operating state that most closely corresponds with the current measured set of signals. By quantifying the relationship between the current and the learned states, MSET estimates the current expected response of the signals.
The difference between a signal's estimated value and its measured value is used as the indicator for sensor and equipment faults. The sequential probability ratio test (SPRT) technique provides ANL's basis for detecting statistical changes in the sensor signals at the earliest possible time, including usable information regarding the type and location of a disturbance. The SPRT technique provides a superior surveillance tool because it is sensitive not only to disturbances in signal mean, but also to very subtle changes in the statistical quality (variance, skewness, and bias) of the monitored signals. Instead of threshold or control limits, the SPRT technique utilizes user-specified false-alarm and missed-alarm probabilities, allowing the user to control the likelihood of missed or false alarms. For sudden gross failures of sensors or system components, the SPRT can annunciate the disturbance as fast as a conventional threshold limit check.
However, for slow degradation that evolves over a long time period (such as gradual decalibration in a sensor, wear-out or buildup of a radial rub in rotating machinery, loss-of-time constant degradation in a pressure transmitter, or change-of-gain failure without a change in signal mean), the SPRT can indicate the onset of a disturbance long before it would be apparent by a visual inspection of the strip chart or CRT signal traces and well before conventional threshold limit checks would be tripped.
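As an illustration of the SPRT decision logic described above, the sketch below tests a stream of residuals for a positive mean shift. The Gaussian residual assumption, the parameter values, and the reset-on-decision behavior are simplifying assumptions for this sketch and do not reproduce the specific ANL or SureSense implementation.

```python
import math

def sprt_mean_shift(residuals, sigma, shift, alpha=0.001, beta=0.001):
    """Minimal SPRT sketch for detecting a positive mean shift in residuals.

    H0: residuals ~ N(0, sigma^2); H1: residuals ~ N(shift, sigma^2).
    alpha and beta are the user-specified false-alarm and missed-alarm probabilities.
    """
    upper = math.log((1.0 - beta) / alpha)   # decide H1 (fault) at or above this
    lower = math.log(beta / (1.0 - alpha))   # decide H0 (normal) at or below this
    llr, decisions = 0.0, []
    for x in residuals:
        # Log-likelihood ratio increment for a Gaussian mean-shift test
        llr += (shift * x - 0.5 * shift ** 2) / sigma ** 2
        if llr >= upper:
            decisions.append("fault")
            llr = 0.0                        # restart the test after a decision
        elif llr <= lower:
            decisions.append("normal")
            llr = 0.0
        else:
            decisions.append("continue")
    return decisions
```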
3.3 Functional Overview and General Capabilities of SureSense
The SureSense software provides the option of a commercial implementation of the MSET software licensed from ANL for use in the power industry. SureSense provides numerous additional capabilities that build upon the ANL algorithms to significantly enhance the usability and diagnostic performance of the MSET procedures. The EPRI on-line monitoring implementation project is using SureSense for its on-line monitoring efforts.
The SureSense Diagnostic Monitoring Studio automates the production of application-specific software modules that reliably detect signal data faults and equipment malfunctions. These real-time capable on-line diagnostic modules enable improved safety, reduced operations and maintenance costs, and optimal performance for a wide range of systems and processes.
SureSense diagnostic monitoring is applicable to any process monitoring system where time-critical functions depend on sensor input or where unexpected process interruptions due to sensor and equipment failures or false alarms are unsafe or uneconomical.
A signal fault is defined as any failure in the data path that corrupts the data signal, thereby providing erroneous information to the process monitoring system. A signal validation module will also detect abnormal operating conditions for the monitored process or equipment.
SureSense uses several advanced predictive modeling techniques that calibrate a high-fidelity model of a process or apparatus from a sample of its normal operating data. Once built, the models provide an accurate estimate for each observed signal, given a new data observation from the process or equipment. Each estimated or virtual signal is compared to its actual signal counterpart using a highly sensitive fault-detection procedure to determine statistically whether the actual signal agrees with the calibrated model. Inconsistencies, if any, between the observed signals and their corresponding estimates are used to detect a wide variety of equipment problems (as well as to verify that such problems are not present).
The SureSense software automates the production of application-specific signal validation and equipment condition monitoring modules. The software user's guide [3] describes the operation and use of the development environment. The development environment is used to design a diagnostic monitoring model, to perform automated model training from historical operating data, and to verify and evaluate the resulting model. The development environment can then be used directly for on-line process surveillance or the software can be configured for inclusion in another process monitoring application.
In a process monitoring environment, the validation algorithm samples on-line signal data values and uses these observed values as the input to a parameter estimation module. The parameter estimation module produces an estimate of the signal value for each observed signal value. The difference between the observed and estimated signal values provides a residual error value. The fault-detection procedure uses an advanced statistical technique to determine whether the residual error value is uncharacteristic of the process and, thereby, indicative of a signal or process fault. Finally, a fault decision is made using a conditional probability analysis of a series of fault-detection results in order to reduce the potential for single observation false alarms.
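The conditional probability analysis itself belongs to the software, but the general idea of suppressing single-observation false alarms can be sketched with a simple m-of-n voting filter over recent fault-detection results. The window size and threshold below are arbitrary illustrative choices, not the SureSense algorithm.

```python
from collections import deque

def fault_decision_stream(detections, window=10, required=6):
    """Declare a fault only when `required` of the last `window` single-sample
    detections are positive; a simplified stand-in for a conditional
    probability analysis of a series of fault-detection results."""
    recent = deque(maxlen=window)
    decisions = []
    for hit in detections:          # hit is True/False from the per-sample test
        recent.append(bool(hit))
        decisions.append(sum(recent) >= required)
    return decisions

# Example: isolated detections are ignored; a persistent run trips the decision.
print(fault_decision_stream([0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]))
```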
Figure 3-1 illustrates the SureSense estimation and fault-detection procedure.
Figure 3-1 SureSense Estimation and Fault-Detection Procedure (training and monitoring data flow)
3.4 Model Development Overview
Model is a general term used to describe a group of signals that have been collected for the purpose of signal validation and analysis. Typically, the model will contain the information necessary to estimate the expected state of the monitored asset or system, given an observation of its signals. The model includes the specifications for acquiring the incoming data as well as the various settings necessary to optimize the performance of the signal validation and predictive condition monitoring procedures. The model includes control settings determining how the model is trained to estimate the expected behavior and the individual signal settings for the purpose of identifying abnormal signal behavior.
Figure 3-2 illustrates one view of model development, which includes the following basic steps:
- Select the parameters to evaluate as a group, and confirm adequate correlation between the selected parameters.
- Acquire training and verification data. Ensure that training data are free of abnormal signals and abnormal operating states. Remove bad data as necessary.
- Select estimation and fault-detection settings for initial analysis.
- Evaluate the model using the training and verification data. Adjust the estimation and fault-detection settings as needed to optimize performance. Acquire additional training data, if needed, to bound the operating space.
- Evaluate and test fault-detection settings in detail. Evaluate model sensitivity to false alarms and missed failure detection.
- Continue testing new data as the data are acquired. Evaluate the calibration status of each validated signal.
Figure 3-2 Model Development Overview
4
ON-LINE MONITORING PROCEDURES
Section 4 provides general on-line monitoring procedures to assist with the plant-specific on-line monitoring implementation process. These procedures should be used as guidance for items to consider during plant-specific procedure development.
4.1 Software Use
4.1.1 Procedure
On-line monitoring software tends to be quite complex. The underlying algorithms can be especially difficult to understand. For this reason, the software user's manual should provide adequate background information to facilitate a basic understanding of the principles of operation as well as detailed user's instructions. The SureSense Diagnostic Monitoring Studio User's Guide [3] was developed in support of the EPRI On-Line Monitoring Implementation project and has been periodically updated as required to reflect software revisions or users' requests for the duration of this project. This user's manual has been designed to serve as the software use procedure at a nuclear plant.
4.1.2 Personnel Training
The software used for on-line monitoring typically implements a complex set of algorithms. The principles behind modeling and software use are not intuitive to most users. Furthermore, the application of on-line monitoring ultimately affects the on-site instrument calibration program, which also requires some training and preparation of plant personnel. The plant should conduct various training classes as part of the implementation process. Typical training classes include:
- Detailed software use training for those personnel who will develop and maintain models
- General software use training for those personnel who might review on-line monitoring system results
- I&C training to explain how on-line monitoring affects the instrument calibration program, including how to assess the need for calibration and how to evaluate alarms as they occur
4.2 Model Development and Evaluation Procedures
The following report sections provide guidance regarding how to develop, maintain, and evaluate the in-service models.
4.2.1 Model Documentation
In the context used here, the term model refers to the following items:
- The selected signals that have been grouped for the purpose of signal validation and analysis
- The various settings defined by the on-line monitoring method that are necessary to optimize the performance of the signal validation procedure
- The data used for training, including any filtering of the data
The term model documentation refers to the configuration management of the completed model. For each model, the following items should be documented:
- The file name and file date of the completed model as approved for use, including any revision information.
- A list of the instrument channels covered by the model, including which channels the instrument calibration program has designated as calibrate-on-demand based on the model run results.
- Model settings that form the basis for model operation. These model settings include the definition of signal, data set, phase determiner, and estimation settings.
- The file name and date of any plug-ins used by the model.
- Training files and file dates for those files that form the training basis of the model.
4.2.2 Periodic Model Evaluation
After a model has been placed in service, plant personnel will evaluate it periodically. This evaluation is intended to accomplish the following objectives:
- Confirm that the instrument channels included in the model are performing properly
- Identify any channels that appear to have drifted beyond acceptable limits
- Identify any equipment condition monitoring concerns, depending on whether the model was designed to function as a condition monitoring tool
- Determine whether the model appears to be adequately trained for the most recent data
- Confirm that model settings continue to be adequate for the model and plant operating state
4.2.2.1 Example Model Evaluation Procedure
The following provides a typical procedure for periodic model evaluation. This procedure is based on a single model and the instrument channels evaluated by that model. The model used to illustrate the procedure (HP FW HEATERS) contains various instrument channels in the high-pressure feedwater heater system.
1.0 Purpose
1.1 To confirm that instrument channels included within the scope of the on-line monitoring program are operating within acceptable limits
1.2 To verify that the on-line monitoring system for the evaluated model is operating normally and does not require modification to the model settings
The procedure applies to the HP FW HEATERS model, which contains the instrument channels in the high-pressure feedwater heater system as shown in Table 4-1.
Table 4-1 Instrument Channel Description

| Instrument Channel Description | Tag Number | Computer Point |
|---|---|---|
| MFW Pump A Pressure | PT 3-66 | P2214A |
| MFW Pump B Pressure | PT 3-80 | P2215A |
| MFW Pump A Outlet Temperature | TE 3-68 | T2362A |
| MFW Pump B Outlet Temperature | TE 3-82 | T2363A |
| MFW Pump Outlet Header Temperature | TE 3-2 | T2364A |
| MFW Pump A Outlet Flow | FT 3-70 | F2250A |
| MFW Pump B Outlet Flow | FT 3-84 | F2251A |
| FW Heater A1 Inlet Temperature | TE 3-6 | T2240A |
| FW Heater B1 Inlet Temperature | TE 3-8 | T2260A |
| FW Heater C1 Inlet Temperature | TE 3-16 | T2241A |
| FW Heater A1 Outlet Temperature | TE 3-18 | T2261A |
| FW Heater B1 Outlet Temperature | TE 3-26 | T2242A |
| FW Heater C1 Outlet Temperature | TE 3-28 | T2262A |
| FW Heater 1 Outlet Pressure | PT 3-34 | P2273A |
| SG 1A Inlet Flow | FT 3-35A | F0403A |
| SG 1B Inlet Flow | FT 3-35B | F0404A |
| SG 2A Inlet Flow | FT 3-48A | F0423A |
| SG 2B Inlet Flow | FT 3-48B | F0424A |
| SG 3A Inlet Flow | FT 3-90A | F0443A |
| SG 3B Inlet Flow | FT 3-90B | F0444A |
| SG 4A Inlet Flow | FT 3-103A | F0463A |
| SG 4B Inlet Flow | FT 3-103B | F0464A |
| SG 1 Inlet Temperature | TE 3-36 | T0418A |
| SG 2 Inlet Temperature | TE 3-49 | T0438A |
| SG 3 Inlet Temperature | TE 3-91 | T0458A |
| SG 4 Inlet Temperature | TE 3-104 | T0478A |
| SG 1 Inlet Pressure | PT 3-37 | P0403A |
| SG 2 Inlet Pressure | PT 3-50 | P0423A |
| SG 3 Inlet Pressure | PT 3-92 | P0443A |
| SG 4 Inlet Pressure | PT 3-105 | P0463A |
2.0 Plant Status
2.1 Normally operating
Note that the phase determiner used to partition the model into submodels is often defined based on reactor power level. If the plant is shut down or operating at a low power level, signal validation might not be performed while in a state excluded by a phase determiner.
3.0 Prerequisites
3.1 The on-line monitoring system software is operational.
3.2 The test data for the evaluated model are available. The test data typically contain plant operating data for the most recent time period.
Note that the period of time evaluated by this procedure is plant specific. Some plants might choose to operate in an on-line mode in which data are available on a near real-time basis. Other plants might choose to operate in a batch mode and evaluate the last quarter, last month, or some other period of data.
3.3 The performer of this procedure has been trained to use the on-line monitoring software and evaluate the test results.
4.0 Procedure
4.1 Start the on-line monitoring software and open the HP FW HEATERS model.
4.2 Ensure that the model used for evaluation is the approved model. Verify the following initial conditions:
- The model file name, date, size, and revision number are correct.
- The model is trained for use and has been trained on the file(s) specified by the model documentation.
- The model settings are correct in accordance with the model documentation.
- Acceptance limits have been specified for the instrument channels in the model.
4.3 Locate the test file containing the most recent data. If necessary, link this file to the model, and run the model using this latest test data.
Note that, depending on the plant method of on-line monitoring implementation, the file might be a manually acquired batch file, an automatically acquired batch file, or an automatically acquired and run file.
4.4 Upon completion of the monitoring run, review the run results. Identify any channels that were identified as failed during the run.
4.5 Review the observation estimate plot and the residual plot for each channel identified as failed during the run. Classify the identified failure into one of the following categories:
The channel has drifted beyond acceptable limits. A channel recalibration will be necessary.
The channel shows evidence of some drift, but a review of the residual plot confirms that the drift is not significant. A channel recalibration should not be necessary.
Alarms were generated because of a plant or system operating transient for which the model was inadequately trained. If channel performance is acceptable before and after the transient, recalibration should not be necessary.
The model is not adequately trained for the plant operating state. Model settings or model retraining with additional data might be necessary.
Note that Section 8 provides additional guidance regarding alarm assessment. It is assumed in this procedure that the performer has been trained to recognize when model settings require adjustment.
4.6 Initiate a work order to recalibrate any instrument channels that have drifted beyond acceptable limits.
4.7 Initiate a request to update the model if alarms were generated because of inadequate training for the plant or system-operating state.
5
SIGNAL SELECTION
Section 5 describes the starting point for model development-signal selection. Technical issues associated with signal selection are discussed, including the criteria for deciding how large a model should be. On-Line Monitoring of Instrument Channel Performance, Volume 2 [1]
provides numerous examples of models that have been developed using the recommended criteria.
5.1 Signal Selection - Where to Start
5.1.1 Signals as Part of a Model
The term model refers to the procedural settings in combination with the selected signals that are collectively evaluated and validated as a group. Figure 5-1 shows a typical on-line monitoring (OLM) group model of a pressurized water reactor (PWR) steam system as displayed by the SureSense software.
Figure 5-1 Typical SureSense Model
The SureSense model shown in Figure 5-1 includes the following signals:
- Reactor power (used to partition the model into submodels based on power level)
- RCS hot-leg temperature
- Steam generator level
- Steam pressure
- Steam flow and feedwater flow
- Turbine impulse pressure
Figure 5-2 shows the location of these typical signals in the steam system.
Figure 5-2 Steam System Signals Modeled for One Steam Generator
5.1.2 Where to Start in Signal Selection for a Model
The MSET estimation procedure presumes a moderate to high level of correlation between signals included in the model. This is intuitive when one considers that the MSET estimates are based on the information provided by the signals in the model. Therefore, information from related signals is required. Nonrelated signals will have independent or mildly dependent variations and cannot be monitored accurately with MSET. The MSET training step includes a procedure to build a training matrix from the historical operating data that characterizes the
operating space. In the MSET monitoring step, the training matrix is used to generate estimates based on a weighted combination of the reference patterns most closely matching each new observation.
In cases such as the typical steam system model shown in Figure 5-1, most signals have a strong correlation. Some signals (such as steam generator level) are strongly correlated as a group but might have little or no correlation to the other signals depending on the plant design. As a model is developed, the concept of correlation should be specifically considered. If all signals in a model have little or no correlation to one another, the selected training vectors might model only noise in the signals rather than some actual correlated process relationship or pattern. This will be the limiting consideration when the model does not contain redundant signals.
The first models developed should be simple ones such as the typical steam system model illustrated here. This recommendation is provided for the following reasons:
- The first models will test the data acquisition method and the quality of the retrieved data.
Based on the experience to date, this rarely goes smoothly the first time.
- When developing the first model, the user is simultaneously learning several complex parts of model development including how the software works, the problems with data acquisition, how to review large amounts of data for acceptable quality, how process parameters vary over time, how to train and retrain the model, and how to evaluate identified failures. The first models should be simple; otherwise, the model's complexity might further confuse the overall progress.
- Steam system models tend to contain redundant signals for most, if not all, parameters. Data quality issues are more readily identifiable if redundant signals are available for direct comparison.
Refer to the reference list (Section 11) for publications listing examples of numerous models that have been developed. These examples best illustrate the approach to signal selection. Locate comparable instruments for a given model on the plant's piping and instrument drawings, and confirm that the selected instruments have accessible computer points. If there are additional sensors that might be beneficial to include in the model, expand the model to include these signals. It is easier to remove signals later than it is to add signals to an existing model.
5.2 Model Size Considerations
Determine the model size as part of the signal selection. Although there are no strict limits, consider the following:
- Signal validation processing time varies with the square of the model size. Very large data files are also time consuming to manage. For these reasons, models containing hundreds of signals are discouraged simply because of the limitations of signal processing and data handling. Even if a very large correlated group of signals can be identified, improved performance might be achieved by separating the signals into smaller groups for the purpose of signal validation. Typical models might be as few as 3 signals to as many as 80 signals; models containing less than 30 signals will be most common. Optimal model size is generally between 5 and 25 signals.
- Numerical modeling using large data files requires more computer memory than typical business software applications. For very large models or data files, available computer memory can become the limiting factor.
- Model retraining and maintenance requirements favor smaller models. If a sensor is recalibrated or replaced or if the process-operating characteristics change over time, it will likely be necessary to retrain the model. As the model becomes larger, it might require retraining more frequently simply because it contains more signals and, therefore, represents a more complex operating state space.
5.3 The Importance of Correlation
This section discusses the importance of correlation in developing models for on-line monitoring. SureSense has the ability to compute correlations and a series of other basic statistics that aid in assessing the validity of the current model.
5.3.1 Why Correlation Matters
Correlation is important in a model because the estimates are based on the learned behavior or correlation patterns among a group of signals. Implicit in this learning method is an assumption that drift or failure in one signal can be detected, based on the normal behavior of the remaining signals. This is an important point because the estimate has to distinguish between valid process changes and sensor drift or failure. A process change will be reflected in several correlated parameters, whereas a sensor drift will occur independently within a set of correlated parameters.
For this reason, the signals combined in a model should be checked for correlation as part of model development.
Figure 5-3 shows a real example of highly correlated data. It can be seen that there is a clear relationship between turbine first-stage pressure and reactor power. As power increases, turbine first-stage pressure directly increases in a virtually linear manner. The square of the correlation coefficient is 0.9993, indicative of a high linear correlation. Drift in one pressure signal can be detected by the continued normal behavior of the other pressure signals as well as by reactor power and other similarly correlated signals.
Figure 5-3 Example of Highly Correlated Data - Turbine Pressure and Reactor Power (regression fit y = 0.712x + 8.1475, R² = 0.9993; axes: pressure in psig versus power in MWe; 1 psi = 6.894757 kPa)
Figure 5-4 shows a real example of steam generator level data, which has a poor correlation to reactor power in this case. As can be seen, steam generator level remains virtually constant at about 61 percent regardless of power level. Notice that the square of the correlation coefficient is 0.00003, indicative of no linear correlation between level and power. Drift in one steam generator level signal will not be predicted with changes in power. The learned behavior is that the steam generator level is about 61 percent regardless of the state of most other signals, with only the other redundant level signals providing any useful information. This does not necessarily mean that steam generator level is an unlikely candidate for on-line monitoring, but it does mean that its fault-detection capability will rely more on the joint response of the redundant channels than of other uncorrelated signals.
Figure 5-4 Example of Poorly Correlated Data - Steam Generator Level and Reactor Power (regression fit y = -3E-06x + 61.495, R² = 3E-05; axes: level in percent versus power in MWe)
Correlation at some level is presumed in the MSET training procedure. The MSET MinMax training procedure (Section 7.1.1 describes the MinMax algorithm) establishes the boundaries of the presumed operating state space by selecting the training data vectors containing the smallest and largest value for each signal channel. The MSET vector-ordering training procedure adds to the MinMax vectors by selecting vectors within the operating state space that are approximately evenly distributed across this space. When MSET computes an estimate, it assigns greater weight to the training matrix vectors closest to the new observation vector. The assumption of correlation is inherent in this method. In other words, it is presumed that an estimate can be determined based on the training vectors closest to the observation vector. If all signals in a model were completely uncorrelated, the model would effectively be trained on noise, meaning that the training vectors closest to an observation might actually have little or no relevance to the actual observed values.
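A minimal sketch of the MinMax selection step described above is shown below; it simply keeps the observation vectors that contain each signal's smallest and largest training value. The vector-ordering step and the weighting used during estimation are not shown, and the array layout (rows as observations, columns as signals) is an assumption for illustration.

```python
import numpy as np

def minmax_training_vectors(training_data):
    """Select the observation vectors holding each signal's min and max value.

    training_data: 2-D array, rows = observations, columns = signals.
    Returns the de-duplicated set of boundary vectors (the MinMax matrix).
    """
    idx = set()
    for col in range(training_data.shape[1]):
        idx.add(int(np.argmin(training_data[:, col])))
        idx.add(int(np.argmax(training_data[:, col])))
    return training_data[sorted(idx), :]

# Example: 5 observations of 3 correlated signals
obs = np.array([[1.0, 10.0, 100.0],
                [2.0, 19.5, 210.0],
                [3.0, 30.2, 290.0],
                [2.5, 24.8, 260.0],
                [1.5, 15.1, 150.0]])
print(minmax_training_vectors(obs))
```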
5.3.2 Why Correlated Parameters Might Not Appear to Correlate
The data used to check for correlation should be reviewed. Even if there is a known physical relationship between parameters, the data might not always show a correlation. In cases such as this, additional data might be needed. The following sections provide examples of such potential data issues.
5.3.2.1 Small State Space
Figure 5-5 shows an example of highly correlated data-reactor power and turbine first-stage pressure vary in a virtually linear manner over the entire power range. The square of the correlation coefficient is 0.996, indicative of the high correlation.
Figure 5-5 Typical Correlation of Reactor Power to Turbine Pressure - Entire Range (regression fit y = 0.8748x + 13.99; axes: turbine impulse pressure in percent versus reactor power in percent)
The method by which correlation measures are calculated will not produce the same result if only a portion of the range is evaluated. As an example, the graph provided in Figure 5-5 contains 2645 points. If the highest 645 points alone are plotted, the result is shown in Figure 5-6. Notice that these data points are in a very small range, all close to 100 percent power, and the square of the correlation coefficient is only 0.0995-indicative of a negligible correlation. This example illustrates the value of engineering knowledge in the modeling procedures. Signals in a particular model should be initially selected based on their known physical relationships. Next, data that adequately characterize the known relationship should be gathered and included in the model training procedures. As this example illustrates, it is preferable if the training data include normal operating points having correlated variations in these signals rather than data for a single operating point.
Figure 5-6 Typical Correlation of Reactor Power to Turbine Pressure - 100 Percent Power (regression fit y = 0.3338x + 66.93, R² = 0.0995; axes: turbine impulse pressure in percent versus reactor power in percent)
If the data deviation from the regression line is small with respect to the range of the evaluated data, the correlation coefficient can be quite high as shown in Figure 5-5. If the data deviation from the regression line is large with respect to the range of the evaluated data, the correlation coefficient will be low as shown in Figure 5-6. This is a common problem with nuclear plant data. Most plants operate at or near 100 percent power for an extended period. Data extracted during this period will show little variation over a range, and any correlation analysis is effectively evaluating the correlation in the noise of the data. In these cases, it will be necessary to acquire additional data that cover a wider range if possible.
5.3.2.2 Data Archive Historian
Some data historian systems apply data compression techniques to minimize archive file size. If the signal value does not vary outside a specified range (referred to as the data compression limit, factor, or tolerance, depending on the computer system), the value is assumed to be unchanged. When the value eventually exceeds the data compression limit, the signal value is updated in storage. When the data are later extracted from archive, a linear interpolation routine might be used to derive intermediate points between the recorded values. By this approach, a significant reduction in file storage size can be realized for archived data. Unfortunately, the data historian creates false data between the stored points, which can adversely affect any signal correlation analysis. Section 6.3 discusses this problem in detail.
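The sketch below illustrates the mechanism described above: a deadband-style compression that stores a value only when it moves outside a tolerance, followed by linear interpolation on retrieval. The tolerance value and function names are illustrative assumptions; actual historians use more sophisticated algorithms.

```python
def compress(samples, deadband):
    """Store an (index, value) pair only when the value moves outside the
    deadband around the last stored value (simplified historian behavior)."""
    stored = [(0, samples[0])]
    for i, v in enumerate(samples[1:], start=1):
        if abs(v - stored[-1][1]) > deadband:
            stored.append((i, v))
    return stored

def retrieve(stored, n):
    """Reconstruct n samples by linear interpolation between stored points;
    the interpolated values are the 'false data' the text warns about."""
    out = []
    for (i0, v0), (i1, v1) in zip(stored, stored[1:]):
        for i in range(i0, i1):
            out.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    out.append(stored[-1][1])
    out.extend([stored[-1][1]] * (n - len(out)))  # hold last value to the end
    return out

raw = [100.0, 100.2, 100.1, 100.3, 101.5, 101.6, 101.4, 103.0]
print(retrieve(compress(raw, deadband=1.0), len(raw)))
```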
5.3.3 Correlation Equation and Meaning
The discussion of correlation is usually based on an assumed linear correlation between signals. The linear correlation coefficient is given by:

$$ r = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n \, \sigma_x \sigma_y} $$

where
r = the correlation coefficient
$\sigma_{xy}$ = the covariance of x and y
$\sigma_x$, $\sigma_y$ = the standard deviations of x and y
$(x_i, y_i)$ = the pairs of x and y values
$\bar{x}$, $\bar{y}$ = the sample means of x and y

The calculated value of r is an indicator of how well the points $(x_i, y_i)$ fit a straight line, and r can range from -1 to +1. If the absolute value of r is close to 1, the points lie close to a straight line; if the absolute value of r is close to 0, the points are not correlated. The following interpretation of r is often used:
- If 0 < |r| < 0.3, there is little or no linear relationship between x and y.
- If 0.3 < |r| < 0.7, there might be a weak linear relationship between x and y.
- If 0.7 < |r| < 1.0, there is a basis to claim some type of linear relationship between x and y.
Figure 5-7 shows how the value of r will vary with the relationship between x and y.
Figure 5-7 Correlation Values (panels illustrate r = +1, perfect positive correlation; r = -1, perfect negative correlation; positive r, y increases as x increases; negative r, y decreases as x increases; and r near 0, no linear relationship between x and y)
When applied to samples, the correlation coefficient is often expressed as Pearson's sample correlation coefficient, in which the sample standard deviations are used and the denominator is adjusted by n-1 rather than n. Statistics textbooks can provide additional information regarding the interpretation of the correlation coefficient.
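The sample correlation coefficient is straightforward to compute; the sketch below, using NumPy, mirrors the formula given above and can be used to screen candidate signal pairs before building a model. The synthetic signals and the comment thresholds simply echo the interpretation ranges listed earlier and are guidance, not hard limits.

```python
import numpy as np

def sample_correlation(x, y):
    """Pearson sample correlation coefficient between two signal histories."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Example: a strongly correlated pair (|r| > 0.7) versus an uncorrelated pair.
power = np.linspace(20, 100, 50)
pressure = 0.87 * power + 14.0 + np.random.normal(0, 0.5, 50)
level = 61.0 + np.random.normal(0, 0.3, 50)    # flat signal, like SG level
print(sample_correlation(power, pressure))      # close to +1
print(sample_correlation(power, level))         # near 0
```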
Notice that an assumed linear relationship is often used for evaluation purposes because the associated statistic is relatively easy to calculate and understand. If the actual relationship between signals is correlated in some nonlinear relationship, as is often the case, this correlation test will indicate a lower level of correlation than actually exists. Also, the correlation coefficient does not by itself imply any sort of functional relationship between the two parameters; it only indicates the degree to which the two parameters are linearly correlated. For these reasons, engineering knowledge of the physical relationship between a group of signals should also be considered.
In addition, it is important that there is more than one signal in a model that has a strong correlation with any other signal (such that a given signal's estimate is not based solely on a correlation with only one other signal). This is a cautionary issue when developing models with four or five signals, where the average correlation might be significantly influenced by a single high correlation.
5.3.4 Example of an Uncorrelated Model
Figure 5-8 shows a small model for circulating water pump discharge pressures. When this model was first developed, it was not initially recognized that the discharge pressure signals for each of the four circulating water pumps did not sense a common pressure. In this context, the discharge pressure for each circulating water pump is entirely uncorrelated to the other pump discharge pressures. Figure 5-8 also shows two signals for each pump. Actually, these are two different computer points for the same physical sensor-they are not independent measurements even though they appear slightly different in value. Thus, there is no correlation between pumps because the two signals for each pump are actually from the same sensor. The correlation analysis readily shows this. This is an example of a model that looked promising at first.
However, it was concluded upon further review that it was not a suitable candidate for on-line monitoring.
Figure 5-8 Circulating Water Pump Discharge Pressure Model
5.4 Planning for Phases (Submodels)
5.4.1 What Is a Submodel?
Some software applications such as SureSense will readily allow a single physical model to be divided into multiple submodels based on the various operating states (also referred to as phases) of the monitored asset. Experience has shown that dividing a model into submodels for certain distinct operating phases can significantly improve the model's diagnostic performance.
Submodels are defined regions of operation that are separately trained in a model; signal validation can be performed on each phase independent of the signal behavior in other phases.
This is an important feature to consider when process values can change considerably over the entire domain (range) of system operation. Phases in this context are very simply defined as regions where a given signal's value is between an upper and a lower limit.
For a given domain, typical phases that might be defined include:
- Power level - Submodels might be established for 100 percent power, 90 percent power, or other power levels.
- Ambient temperature - Process variations might be a function of ambient temperature so that a change in ambient conditions will shift the signal validation model into a different phase.
- System pressure - The correlated group of instruments in the model might behave differently at low pressure rather than at high pressure.
- Seasonal - As a variation on ambient temperature, it might be preferable to create a summer model and a winter model.
- Equipment lineups - There might be operating states that vary depending on which pumps are running. This will often be a consideration during low-power operation.
5.4.2 Power Level as a Phase Determiner
Many parameters in a power plant are directly correlated to power, either strongly or weakly.
Virtually all parameters in a power plant are affected by power level even if they are not correlated to power. For this reason, power level is a natural signal to include in a model for the purpose of defining and detecting the plant's operating phases.
The application of phases divides the model into separate submodels. Each submodel is separately trained and evaluated. For example, consider the use of several submodels based on reactor electric power in which the measured signal has units of percent. Suppose that the principal operating modes (or phases) are near 100 percent power, 90 percent power, or about 70 percent power. For this operating behavior, the phases might be defined as follows:
- OPERATING_100: >98 percent power
- OPERATING_90: Between 90 and 98 percent power
- OPERATING_70: Between 60 and 80 percent power
- NONOPERATING: <10 percent power
- OPERATINGOTHER: Any other power level
For the purpose of signal validation, only the 100 percent, 90 percent, and 70 percent power levels might be validated. The model would then be trained with three submodels based on the power level. As input signals are processed, the power level would first be checked to determine which submodel applied. The input data would then be evaluated using the appropriate model training matrix and settings. Furthermore, the sensitivity and other settings can be individually tailored to optimize the performance of each submodel. During the period that the power level is outside the trained regions, the phase determiner would identify the phase state as either NONOPERATING or OPERATINGOTHER. If these two regions were not trained for signal validation, no fault-detection processing would be performed.
The use of phases is particularly useful because they enable the model to ignore data outside the regions of diagnostic interest or regions with insufficient training data. By this approach, failure alarms caused by data outside of the training range can be minimized. Regions with insufficient training data can be quite common for a power plant model. For example, a typical nuclear plant might operate above 90 percent power for more than 99 percent of the time that it is at power (an assertion that is based on data provided by the participants in the EPRI On-Line Monitoring Implementation Project). A typical plant might operate at better than 99 percent power for up to 98 percent of the time that it is at power. This means that there is an enormous amount of data near 100 percent power operation, but typically very little data for low-power and intermediate states. This is further complicated by equipment lineups that can vary at low power levels for which it might take years to observe suitable training data for each possible operating state.
There is also the possibility of physical changes to this equipment, necessitating that new data be collected. The application of phases allows signal validation to be performed for virtually all of the period at power while excluding low-power states that have little or no available training data.
5.4.3 Choosing a Power Signal
Different types of power signals are readily available at each plant:
- Nuclear instrumentation signals
- Calculated reactor power (expressed either in megawatts or as a percent)
- Electric power output (expressed either in megawatts or as a percent)
The preferred power signal to use might depend on the signals in the model. The calculated reactor power signal might be preferred for models involving primary system signals. The electric power output signal might be preferred for models involving secondary system signals, mainly because the generated electric power will vary with some secondary system parameters.
5.4.4 Phases and Signal Selection
The application of phases affects signal selection in that the data for the phase determiner must also be acquired and verified for training. There will be very few models that should not initially include a power signal, even if there is not an obvious correlation between power and the signals in the model. As part of signal selection, it is important to think through the concept of phases and to choose signals that might have potential value as a phase determiner. It is always easier to remove signals from a model than it is to add signals later.
6 DATA QUANTITY AND QUALITY
The model's learned definitions of both the normal operating states and the normal signal behavior are established during training. The model's learned definition of its own parameter estimation performance is also established during training. Therefore, bad training data will be learned as normal and can significantly degrade fault-detection sensitivity. Actual operating states not included in the training data will often cause many or all signals to annunciate faults after the signals exceed the boundaries of the trained state space. For these reasons, the quantity and quality of training data warrant special care in the model development process.
The data used for training and testing should be of the best available quality. In this context, the term data quality refers to several attributes including:
- Training data should be representative of the signals' expected behavior under expected monitoring conditions.
- Bad data should be removed from training data sets so that the model does not learn incorrect operating states or bound a larger state space than is reasonable. In this instance, bad data refers to erroneous data recorded by the data acquisition system that is beyond the range of what is physically possible.
- Abnormal process variations should be evaluated for removal from the training data set. Even if the process variations really did occur, the associated data might not be desirable for training. Any data contained in the training data sets will be learned as normal behavior.
Hence, an abnormal process included in the training data will result in the assumption that the given abnormal process is an expected phenomenon.
- If operating in batch mode, bad data should be removed from monitored data sets to minimize verification efforts due to fault-detection events that might occur. It should be recognized that bad data can produce false alarms or fault-detection events when operating in on-line mode.
The following sections address various aspects of data quality that should be considered when developing a model.
6.1 Data Quantity
6.1.1 Quantity of Data - Sample Frequency
It is important to ensure that the training data set contains adequate data for each operating phase to be validated and that the data are representative of the evaluated operating conditions. A phase is a defined region of operation. Phases are used to separate the model into submodels (refer to Section 5.4 for more information). If a particular phase does not appear to have sufficient data for training, monitoring for that phase should not be enabled so that detection events are not annunciated during these phases.
Data files should generally contain data sampled at a frequency consistent with normal process variations-for large systems such as power plants, sampling once every minute is recommended. If data are obtained at a frequency that is too low (such as once every hour for a power plant), some data filtering and analysis options will not be meaningful or appropriate.
Although monitoring data might not need to be acquired at the same sampling frequency as training data, it is recommended that all data should be acquired at the same frequency for the following reasons:
- As a model is initially developed, it is not always known how much training data will be needed. In some models, the first month of data might be adequate for training. In other models, it might take several months to acquire data that covers the expected operating space.
Subtle process changes might occur over time that prompt retraining with additional data.
- If equipment or sensors are repaired, recalibrated, or replaced, it might be necessary to retrain the model with either new data or additional data.
- Data acquired at a 1-minute rate are typically frequent enough to allow data filtering by an averaging or median-select approach (refer to Section 6.4; a minimal filtering sketch follows this list). The dynamics of typical power plant systems do not usually cause large fluctuations in signals over a period of a few minutes while at steady-state conditions. However, wider fluctuations are possible as the sampling interval is extended, which can eliminate potential data filtering options.
- The method by which signals are identified as faulted depends on the number of alarms received over a set of sequential measurements. As a signal drifts outside the expected range and the initial alarms are generated, it generally requires additional time points before the signal is declared faulted. If signals are acquired at longer intervals (such as every hour), it might take one day or more for signal failures to be identified. Higher sampling frequencies increase the likelihood of early fault detection.
- When a problem is identified with a monitored channel, the user will typically want to evaluate the monitored data in more detail. By having monitored data available at a relatively high data acquisition frequency, the user can more easily evaluate the potential problem. For example, the user might initially monitor available data sampled at a 15-minute rate.
However, if a problem is identified, the user might instead look more closely at the results by reprocessing the data at a 1-minute rate.
- The method by which data are acquired should be consistent. Attempting to acquire monitoring data at a rate different than that used for the training data increases the likelihood of data processing inconsistencies. Note that even if monitoring data are acquired at a higher rate (such as every minute), it need not be validated at that frequency.
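As a simple illustration of the averaging or median-select filtering mentioned in the list above, 1-minute samples can be collapsed to a coarser validation interval. This is a generic sketch in Python using NumPy, not any vendor's filtering algorithm; the 15-minute window is only an example value.

```python
import numpy as np


def downsample(values, window=15, method="median"):
    """Collapse 1-minute samples into one value per window.

    'values' is a 1-D array of consecutive 1-minute readings; each block of
    'window' samples is reduced to its median (less sensitive to isolated
    bad points) or its mean. Any partial block at the end is dropped.
    """
    values = np.asarray(values, dtype=float)
    n = (len(values) // window) * window
    blocks = values[:n].reshape(-1, window)
    if method == "median":
        return np.median(blocks, axis=1)
    return blocks.mean(axis=1)


# Example: one hour of 1-minute readings reduced to four 15-minute values.
readings = 970.0 + np.random.normal(0.0, 0.5, size=60)
print(downsample(readings, window=15))
```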
6.1.2 Quantity of Data - How Much Historical Data to Acquire
Model development includes testing a model for adequate performance using actual plant data. Even if a plant operates at about 100 percent power for an entire operating cycle, process values can change over time. For this reason, it is important to test the model with as much historical data as possible so that the model response to these process changes can be anticipated. The use of data from the previous operating cycle up to the present is recommended to evaluate the model's performance. In some cases, this might mean acquiring up to 24 months of data.
Although this is a large amount of data, it allows the model to be thoroughly tested and evaluated during development. The response to fault alarms that might occur in the future associated with different equipment lineups, periodic transients, or minor process changes (temporary or permanent) can be determined before deploying the model for in-service use. Acquiring this amount of data will also identify any data acquisition issues or problems.
6.2 Dealing with Bad Data
Bad data is a general term that covers several types of data problems and erroneous data that will likely be encountered. In this context, the presence of bad data is usually caused by failures in the data acquisition system or by programming limitations in the manner by which data are archived. Whenever possible, bad data should be removed from files to minimize evaluation and fault-detection errors that might occur. Bad data absolutely must be removed from training data files so that the model is not trained to treat the bad data as normal system behavior. In the following subsections, the most common types of bad data are presented. The second volume of this three-volume report series [1] presents an evaluation of a stand-alone software product that was designed to aid in data preprocessing efforts related to OLM model development.
6.2.1 The Effect of Bad Data on the Estimation Process
The model's learned definitions of both normal operating states and normal signal behavior are established during training. The model's learned definition of its own parameter estimation performance is also established during training. Bad data that are allowed to remain in the data provided for training will affect the calculation of the estimate for each corresponding observation. Figure 6-1 shows an example of a model that has been trained with acceptable data.
Although this particular signal has a relatively high noise content, the estimates (shown as red triangles) track the observations (shown as blue crosses) reasonably well.
Figure 6-1
Observations and Corresponding Estimates With Acceptable Training Data (signal LT-042-1N004A; 1 in. = 25.4 mm)

As shown in Figure 6-1, this level signal ranges from about 32-37 inches (812.8-939.8 mm) over a two-year period. To demonstrate how bad data can affect the quality of the training process, two vectors were added to this model, each containing bad data for this signal. One bad data point had a value of 5 inches (127 mm), and the second bad data point had a value of 55 inches (1,397 mm), which ensures that these two vectors will be selected by the MinMax training method. Figure 6-2 shows the result. It should be noted that the estimates no longer track the observations well. The two outlying bad data points have influenced the estimation calculations for this signal, resulting in poor model performance for the signal.
Figure 6-2
Observations and Corresponding Estimates With Two Bad Data Points (signal LT-042-1N004A; 1 in. = 25.4 mm)

Bad data must be removed from training data sets. Otherwise, incorrect data values will adversely affect both estimation and fault detection in the model.
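The reason the two injected outliers are guaranteed to become training vectors can be seen from a simplified sketch of MinMax selection: for each signal, the observation vectors containing that signal's minimum and maximum values are retained. The code below is an illustrative approximation of the concept, not the exact algorithm of any particular MSET implementation.

```python
import numpy as np


def minmax_select(training_matrix):
    """Simplified MinMax training vector selection.

    For each signal (column), keep the observation (row) holding that
    signal's minimum value and the row holding its maximum value. A bad
    point such as the 5 in. or 55 in. value above would therefore always
    end up in the selected training vectors.
    """
    data = np.asarray(training_matrix, dtype=float)
    rows = set()
    for col in range(data.shape[1]):
        rows.add(int(np.argmin(data[:, col])))
        rows.add(int(np.argmax(data[:, col])))
    return data[sorted(rows), :]
```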
6.2.2 Examples of Data Problems
Figure 6-3 shows an example of bad data acquisition. The values are clearly in error and are well outside of any expected signal range. If the data are left in the training data set, the MSET training vectors will select these values as part of the expected operating state space boundaries.
This type of data problem can identify failures with the data acquisition modules.
Figure 6-3
Bad Data Acquisition

Figure 6-4 shows an example of data lockup for one channel. Because of a data acquisition problem, one signal is frozen at a value of 40,000, and its actual value is unknown. This data should not be used for training because it can degrade the estimation results.
Figure 6-4
Data Lockup for One Channel

Data lockup creates constant-valued signals that should not be included in any training data. The MSET model training algorithms are intolerant of signals that are constant valued, whether these arise from data lockup or from signals that are truly unchanging over time. Constant-valued signals cannot be effectively correlated to any other signal's behavior because such a signal is clearly independent of changes in other signals. Constant-valued signals in the training data also introduce numerical instabilities in the MSET training procedures.
If a signal is stuck for an extended period, it will be necessary to remove it from the model until satisfactory data are available (in one instance, two signals were stuck for several months because dummy signals had been inadvertently inserted in place of the actual signals). Figure 6-5 shows an example of a stuck channel during an on-line monitoring analysis. It should be noted that the estimate (shown as red triangles) continues to predict where the value should be even though the observations are stuck for an extended period. This is the usual result when a stuck data value occurs during monitoring. This behavior can be identified by an OLM system through the use of the SPRT variance test (see Section 8). While it is essential that stuck data should not be included in the training data, stuck data occurring during monitoring will be readily identified, and the corresponding instrument channel can be investigated to determine the cause of the error.
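As a screening aid for the constant-valued and stuck-signal problems described above, simple checks such as the following can flag signals that never change over a candidate training set and mark time windows in which a signal appears frozen. These are illustrative checks only; they are not the SPRT variance test referred to above, and the window length and tolerance are arbitrary example values.

```python
import numpy as np


def constant_columns(training_matrix, tol=1e-9):
    """Return the indices of signals (columns) that never change; such
    signals should be removed from the model before MSET training."""
    data = np.asarray(training_matrix, dtype=float)
    return [col for col in range(data.shape[1]) if np.ptp(data[:, col]) < tol]


def stuck_samples(values, window=30, tol=1e-6):
    """Flag samples belonging to any 'window'-long run whose spread is
    essentially zero (a crude frozen-data indicator for one signal)."""
    values = np.asarray(values, dtype=float)
    stuck = np.zeros(len(values), dtype=bool)
    for start in range(len(values) - window + 1):
        if np.ptp(values[start:start + window]) < tol:
            stuck[start:start + window] = True
    return stuck
```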
Figure 6-5
Example of Data Lockup in an Analysis (APRM 1 observations and estimates)

Figure 6-6 shows an example of data lockup for all channels. Because of a data acquisition problem, all channels are frozen, and the actual signal values are unknown. These data should not be used at all. The data immediately before and after the frozen data are often also suspect.
Figure 6-6
Data Lockup for All Channels

Figure 6-7 shows an example of data lockup for two channels. Because of a data acquisition problem, two signals are frozen, and their actual value is unknown. These data should not be used for training because they can degrade the estimation results.
Figure 6-7
Data Lockup for Two Channels

Missing data will cause fault alarm problems with the model in operation. Removing these occurrences from the data sets before use will reduce the occurrence of alarms. Figure 6-8 shows an example of this problem type. Note that if these data were used during monitoring, the missing data would result in annunciated alarms. While thorough data cleanup is required for training, it is not a requirement for monitoring. Bad data that are included in the monitoring data will result in alarms from the OLM system that can then be investigated to determine the source of the alarm. Although it is not a requirement, data cleanup of monitoring data sets will ensure that the number of alarms is reduced. This discussion is intended to address data cleanup concerns for a real-time implementation. It is not the case that every bad data point needs to be removed prior to monitoring; however, it is the case that these bad data points will produce alarms from the OLM system. Note that removing bad data from the training data is still required.
Figure 6-8
Missing Data

Sometimes the data values are incorrect but might appear to be within a reasonable range. Figure 6-9 shows four steam-flow signals as well as the total steam-flow signal. The total steam flow should be the sum of the four individual signals. Throughout the period shown, reactor power was constant at about 100 percent; other correlated signals behaved normally. As can be seen, three of the four steam flow signals are fluctuating because of a problem with the data acquisition system. The total steam-flow signal is nearly constant during this period and is at the expected value. These erroneous individual steam-flow signals must be removed from any training data sets to prevent the model from learning this type of behavior as normal or expected.
Figure 6-9
Incorrect Data Values - Several Signals

Sometimes, a single signal fluctuates wildly because of a data acquisition problem. Figure 6-10 shows an example in which a steam pressure signal (normally about 970 psig/6688 kPa) varies from -280 psig to over 2000 psig (-1931 kPa to over 13790 kPa) in a 30-minute period during which reactor power was constant. This is not a sensor problem; it is a data acquisition problem.
Notice also that another signal (FW-FT-7) is stuck during this time period.
Figure 6-10
Incorrect Data Values - One Signal

Data files will occasionally contain a single bad data point for no apparent reason. Figure 6-11 shows an example in which steam pressure is about 985 psig (6791 kPa), and the value for one measurement is instead stored as -4.99 psig (-34.4 kPa). This erroneous value cannot remain in training data sets because it will be selected in the training data as representative of normal behavior. Some on-line monitoring applications such as SureSense provide user-configurable training data quality filters that can detect and remove some of these problem types from the training data automatically.
Figure 6-11
Incorrect Data Value - One Point

Data quality problems are not always apparent at first review. Figure 6-12 shows an example in which one redundant channel experiences a step-change drop from 541°F to 91°F (283°C to 33°C). The other redundant channels show no significant change. The new values for TE-001-101B are incorrect and are probably caused by a data archive or data extraction error. Notice that another signal is stuck during this period. Figure 6-13 shows a later result when the frozen channel clears from a slightly different value; it takes a step change from 542°F to more than 10,000°F (283°C to more than 5538°C). This is truly terrible data that requires careful screening before use. Some on-line monitoring applications such as SureSense provide user-configurable training data quality filters that can detect and remove some of these problem types from the training data automatically.
Figure 6-12
Incorrect Data Values - Unreasonable Change for One Redundant Channel
Figure 6-13
Incorrect Data Values - Unreasonable Change for a Second Redundant Channel

Signal data should have a resolution of several decimal places. The method by which data are stored in archives or obtained from archives sometimes results in truncating the signal data, resulting in integer data as shown in Figure 6-14. Figure 6-15 shows the effect of this truncation on the fault-detection analysis. MSET continues to produce estimates (shown as red triangles) between the integer data values (shown as blue crosses), which results in spurious fault-detection alarms. In general, the estimation results cannot be more accurate than the data used for training.
Figure 6-14
Loss of Significant Digits
Figure 6-15
Effect of Loss of Significant Digits (signal PT-001-103B; 1 psi = 6.894757 kPa)

6.2.3 Removing Bad Data From Data Sets
The previous section provides several examples of bad data in actual data sets. Bad data must be removed from any files used for training; it is beneficial to remove bad data from historical data sets used for verification testing. Data representing actual sensor problems or component failures must also be removed from the training data. Our intention is not to train the system to treat bad data or signal failures as normal. Regardless of whether the data are bad or represent an actual sensor or equipment problem, the data do not belong in the training data set.
It is recommended that the following points be considered and followed for the identification and removal of bad data:
- Many users will generate their data sets in a form readily accessible by Microsoft Excel or similar spreadsheet programs. The Microsoft Excel conditional format tool can be used to identify missing data or data that are outside of an expected range. It is necessary to highlight the data to be checked and to specify the conditional format criteria. This approach is valuable for data sets containing blanks or bad data that are well outside of the expected operating range. Figure 6-16 shows an example of the conditional format feature. In this example, any cells with a value of less than 1.0 are highlighted in yellow, and any blank cells are highlighted in red.
Figure 6-16
Example of Conditional Format Feature

- The data set should be reviewed for data lockup or frozen data. This check is usually performed visually (although it could be checked using a macro that calculates a running standard deviation). Frozen data left in the training set might be selected as part of the training vectors. The estimation technique would then treat this frozen data as normal behavior. Any rows containing frozen data should be removed. The rows just before and just after the data that were removed should be reviewed because these rows might also have problems. The value of this check will vary from plant to plant, depending on how the computer system acquires and stores data. In some computer systems, data lockup is readily identifiable because all signals randomly vary in value each minute, and the computer stores the measured value at the sampling frequency. In other computer systems, data values are updated in storage only as values vary by more than some specified amount from the last reading. In these systems, most or all signals tend to remain unchanged for some period of time and then experience a step change to the next value.
- After removing obviously bad data, basic statistics for the remaining data should be reviewed to identify any other potential problems. Additional rows of data might be removed by this screening.
- It might sometimes be difficult to distinguish between bad data and actual problems with sensors or other signal conditioning components. Bad data will usually "heal" itself at some point, whereas failed sensors rarely recover.
- Data limits should be applied in the SureSense signal definition settings to keep data that are outside of the possible range of sensor operation from being considered in the training set.
- The training data should be tested after the model has been trained to review the data quality.
If necessary, additional outliers in the training data set should be removed. Even if outlying measurements appear to be real data rather than bad data caused by data acquisition errors, the following question should be asked: Should the model be trained on this data, thereby allowing the model to treat this data as normal behavior?
- The basic data statistics should be reviewed to check for data that are outside the expected range. Note that this check is distinctly different from using the data limits described previously. Simply stated, this review is intended to identify the presence of data that physically do not make sense. For example, a reactor coolant system temperature of 200°F (93°C) is not possible if the reactor is operating at 100 percent power. Steam system temperature cannot be 10,000°F (5538°C). Similarly, an emergency water storage tank (such as a refueling water storage tank) probably will not have a low level when the plant is operating at power. Turbine first-stage pressure will not be negative when the reactor is at 100 percent power. These and other examples have occasionally been found in the data obtained from the plant computer archive.
- The data should be reviewed for unreasonable signal value step changes. For example, tank level cannot instantaneously change from full to empty, and reactor power cannot step change from 25 to 100 percent power. This check works best with data acquired at a relatively high frequency such as every minute. If the data are acquired at longer intervals, larger step changes in the data might well be possible.
Bad data must be removed from training data sets. A question that is frequently asked is whether bad data also need to be removed from historical data sets used for testing. Bad data in testing sets degrade the model performance and hinder the evaluation of the model. Each time the model is used, the location of the bad data must be remembered. Questions regarding the bad data will continue to come up with each review. It is recommended that all historical data files be cleaned up before use regardless of whether they are used for training or testing.
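The review steps described in this section can also be scripted. The sketch below, written with pandas, drops observations that contain blanks, values outside a physically reasonable range, or implausibly large step changes between consecutive samples. The signal names, limits, and deltas shown are placeholders for illustration; actual values must come from plant-specific engineering judgment.

```python
import pandas as pd


def screen_training_data(df, limits, max_delta):
    """Drop rows that fail simple reasonableness checks.

    'df' holds one column per signal; 'limits' maps signal names to
    (low, high) tuples of physically possible values; 'max_delta' maps
    signal names to the largest credible change between consecutive rows.
    """
    bad = df.isna().any(axis=1)                     # blanks / missing data
    for signal, (low, high) in limits.items():
        bad |= (df[signal] < low) | (df[signal] > high)
    for signal, delta in max_delta.items():
        bad |= df[signal].diff().abs() > delta
    return df[~bad]


# Illustrative call with placeholder limits for a level and a pressure signal:
# clean = screen_training_data(data, {"LT-042-1N004A": (20.0, 45.0)},
#                              {"PT-001-103B": 50.0})
```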
6.2.4 Data Limit Filters
SureSense can also be used to automatically identify signal data that are outside of a reasonable minimum or maximum range, that exhibit regions of abnormally high or low standard deviation, or that exhibit unreasonably large positive or negative changes in value between successive observations. These tasks are collectively handled through the use of data limit filters.
Data limit filters for each signal provide an automated method for screening out bad data. The following four types of data limit filters are provided in SureSense:
- Range - Minimum and maximum reasonable observed signal value
- Delta - Maximum positive and negative change in sequential observed signal values
- Noise - Minimum and maximum standard deviation of observed signal values
- Not Present - Not present or invalid signal as marked by the data source

Each filter type is specific to both a signal and a phase (each of which is optionally specified by the user during signal definition). For each observation used in training or on-line monitoring, the limit-filter procedure is performed on its specified signal. The limit-filter procedures return results for each signal, indicating the outcome of each type of limit test. A FAIL result is returned if the signal fails the test.
Limit filters can be selectively enabled or disabled for both training and on-line monitoring. The net effect of these filters is to exclude any data outside the specified limits from training or to declare a signal fault during on-line monitoring if enabled. The following information describes each of the limit filters:
- MIN_RANGE - A lower limit on the value of the signal. User-definable constants are the minimum reasonable value (Minimum Value) expected for the signal and the number (Filter Interval) of consecutive single-cycle alarms required to generate a FAIL indicator. To return a FAIL indicator for each and every observation within which the lower reasonableness limit is violated, the interval must be set to 1. To filter spurious limit violations and return FAIL only when the signal stays out of range for multiple observations, set the interval to an integer greater than 1.
- MAX_RANGE - An upper limit on the value of the signal. User-definable constants are the maximum reasonable value (Maximum Value) expected for the signal and the number (Filter Interval) of consecutive single-cycle alarms required to generate a FAIL indicator. To return FAIL for each and every observation within which the upper reasonableness limit is violated, the interval must be set to 1. To filter spurious limit violations and return FAIL only when the signal stays out of range for multiple observations, the interval must be set to an integer greater than 1.
- MAX_POS_DELTA - An upper limit on the positive change in value for a signal between two consecutive observations. The user-definable constant is the limit value (Positive Limit).
After the first observation is acquired in a new applicable phase, the difference between the current observation and the previous observation is computed. Missing values are ignored, and the most recent previous value is used to compute the change in value for the current observation. The delta value (current - previous) is compared to the absolute value of the filter's positive change limit value. If the delta value is greater than the absolute value of the limit, a FAIL condition is returned.
- MAX_NEG_DELTA - An upper limit on the negative change in value for a signal between two consecutive observations. The user-definable constant is the limit value (Negative Limit).
After the first observation is acquired in a new applicable phase, the difference between the current observation and the previous observation is computed. Missing data values are ignored, and the most recent previous value in the current phase is used to compute the change in value for the current observation. The delta value (current - previous) is compared to the negative of the absolute value of the filter's negative limit value. If the delta value is less than the maximum negative limit, a FAIL condition is returned.
- MAX_NOISE - An upper limit on the standard deviation of a time series of signal values.
User-definable constants are the standard deviation limit value (Maximum Standard Deviation Limit), the number of consecutive observations used to compute the standard deviation (Standard Deviation Window Size), the delay on entering a new phase (New Phase Delay Interval) before filter application is initiated, and an option control flag (Remove Trend Line). The algorithm computes the standard deviation of a time series of observations for the signal after adjusting for an optional line fit to the data. If the current standard deviation value exceeds the Maximum Standard Deviation Limit, a FAIL condition is reported.
- MIN_NOISE - A lower limit on the standard deviation of a time series of signal values. User-definable constants are the standard deviation limit value (Minimum Standard Deviation Limit), the number of consecutive observations used to compute the standard deviation (Standard Deviation Window Size), the delay on entering a new phase (New Phase Delay Interval) before filter application is initiated, and an option control flag (Remove Trend Line).
The algorithm computes the standard deviation of a time series of observations for the signal after adjusting for an optional line fit to the data. If the current standard deviation value is less than the Minimum Standard Deviation Limit, a FAIL condition is reported. The operation of the settings is the same as for the MAX_NOISE filter. If Remove Trend Line is checked, the standard deviation is calculated with respect to a line fit to the data in the window. Otherwise, it is calculated with respect to the mean of the data in the window.
- NOT_PRESENT - Checks whether the signal value matches the Not Present value exactly. If a match is found, the value is treated as a flag that the signal is not present or has been marked as invalid by the data source. If the signal is not present, the observation containing the Not Present value receives special handling during processing. The entire observation will be ignored during training. During monitoring or the various analysis routines, those procedures that depend on the Not Present value will be modified or skipped.
Limits provide an additional method of screening data to ensure that the best possible data quality is used. If a signal value is outside its specified range, that vector of data (the observation) is excluded from training if the filter is enabled. During on-line monitoring, a signal-specific fault is declared when a limit filter FAIL condition is detected.
If data limits are set too tightly, a significant amount of data can be excluded from training. The data limits are intended to exclude obviously bad data and should be established on the basis of the full range of reasonably expected data values for each of the defined phases. Similarly, if data limits are set too tightly, false alarms might occur during on-line monitoring.
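To make the filter-interval and noise-filter behavior concrete, the sketch below implements a range check that reports FAIL only after a specified number of consecutive out-of-limit observations, together with a windowed standard deviation calculation with optional trend-line removal in the spirit of the MAX_NOISE and MIN_NOISE filters. This is an illustration of the concepts described above, not SureSense code; the class and parameter names are invented for the example.

```python
import numpy as np


class RangeFilter:
    """Range check with a consecutive-violation interval, as described for
    the MIN_RANGE/MAX_RANGE filters: isolated spurious violations are
    suppressed, and FAIL is returned only after 'interval' hits in a row."""

    def __init__(self, minimum, maximum, interval=1):
        self.minimum, self.maximum, self.interval = minimum, maximum, interval
        self._count = 0

    def check(self, value):
        if value < self.minimum or value > self.maximum:
            self._count += 1
        else:
            self._count = 0
        return "FAIL" if self._count >= self.interval else "OK"


def window_std(values, remove_trend=False):
    """Standard deviation of a window of observations, optionally measured
    about a fitted straight line instead of the window mean."""
    values = np.asarray(values, dtype=float)
    if remove_trend:
        x = np.arange(len(values))
        slope, intercept = np.polyfit(x, values, 1)
        values = values - (slope * x + intercept)
    return float(np.std(values))


# Example: pressure limited to 0-1200 psig, FAIL after 3 consecutive hits.
f = RangeFilter(0.0, 1200.0, interval=3)
print([f.check(v) for v in (985.0, -4.99, -4.99, -4.99)])  # OK, OK, OK, FAIL
```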
6.3 Data Archive Historian and Its Effect on Data Quality
6.3.1 Problem Statement
Some data acquisition systems apply a data compression technique, included as part of a data historian, to minimize archive file size. If the signal value does not vary outside a specified range (depending on the computer system, referred to as the data compression limit, factor, or tolerance), the value is assumed to be unchanged. When the value eventually exceeds the data compression limit, the signal value is updated in storage. When the data are later extracted from archive, a linear interpolation routine is used to derive intermediate values between the recorded values. By this approach, a significant reduction in file storage size can be realized for archived data.
This data compression technique has an adverse effect on data quality with respect to on-line monitoring. The following list highlights the main problems:
- The linear interpolation routine generates artificial data between recorded data points, which affects the apparent correlation between signals. In some cases, signals that are known to have a high physical correlation can have data with effectively no apparent correlation.
Section 5.3 explains why correlation is important in an on-line monitoring system.
- Training data can consist of a combination of real data and artificial data with a degraded correlation between signals. This degrades the overall training quality.
- Analysis of training and test data can be no more sensitive than the applied compression tolerance. In some cases, the compression limit has been set at several percent. This can cause a substantial reduction in the on-line monitoring system sensitivity.
6.3.2 The Effect of a Data Archive Historian on Signal Correlation
A data historian reduces the size of the archive data file by the following process:
1. A tolerance limit is specified for each signal.
2. Data are acquired by the computer system at some frequency.
3. The current data point is compared to the previous data point for that signal. If the difference between the data points exceeds the specified tolerance, the latest data point is stored. If the difference between the data points is less than the specified tolerance, the latest data point is not stored.
4. Data are added to the archive by this process only if the change in a data point exceeds the tolerance. When data are later retrieved, data for the missing sample times are generated by a linear interpolation between whatever data were stored. The generated data are artificial because they are, at best, estimates of the real value based on a simple linear assumption about the data behavior.
Figure 6-17 shows an example of data archive historian data. Notice that random fluctuations have been filtered out of the data and the signal varies linearly between data points. Referring to Figure 6-17, notice the change in power level from April 6 to April 7. It appears that power was carefully increased from 99.9 to 100.2 percent in a linear manner over this one-day period.
Actually, power was so constant at near 100 percent during this period that only two data points were stored; the rest of the data were artificially generated.
Figure 6-17
Typical Example of Compressed Historian Data (power in percent versus time)

Figure 6-18 shows another example of historian archive data. Notice that the signal varies linearly between data points, clearly indicative of a linear interpolation routine in a data historian. Notice also that the compression limit is several percent, which adversely affects the possible model accuracy.
Figure 6-18
Historian Data Example - Flow Signal

An on-line monitoring method such as MSET depends on the behavioral pattern observed between signals. Included in this pattern recognition is an assumption that a model can be developed based on the learned behavior of a group of correlated signals. For example, if a model consists of four signals that have a positive linear correlation, the model would be trained (hopefully) on data that show how the signals tend to perform as a group. MSET develops an estimate for one signal based on the observed values of the other signals.
The historian data compression technique has the effect of degrading the correlation between signals, even if the raw data are highly correlated. The degree to which the correlation is degraded depends on the specified compression limit. As the limit is made larger, fewer data are stored in the archive with the result that most of the data retrieved from the archive are artificially generated by the linear interpolation routine.
As an example, Figure 6-19 shows three signals that have a high linear correlation. Data for the three signals were created so that the signals are correlated with some noise content by the following equations:
Signal 1 = sin(t) + random number
Signal 2 = sin(t) + random number
Signal 3 = sin(t) + random number

Figure 6-19 shows the result in which the three signals vary in a sinusoidal manner together with noise added by the random number. With the noise content, the correlation coefficient for these signals is about 0.7.
Figure 6-19
Sample Data With Random Variation Included

The data historian compressed the data in Figure 6-19 by specifying a compression limit of 1.2.
With this tolerance, the original 300 data points were reduced by about two-thirds, leaving just over 100 data points. A linear interpolation routine was applied to the gaps in the data to retrieve the stored data. The results are shown in Figure 6-20. By generating artificial data, the correlation coefficient for these signals is reduced from about 0.7 to 0.3. With actual plant operating data acquired from such a data historian, it is common to find data correlations between physically correlated signals to be near zero.
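The experiment described above can be approximated with a short script: generate sinusoidal signals with added noise, apply a dead-band compression with a tolerance of 1.2, reconstruct the record by linear interpolation, and compare correlation coefficients before and after. The exact values obtained depend on the noise realization and the compression details, so the sketch below illustrates the direction of the effect rather than reproducing the report's figures exactly.

```python
import numpy as np


def deadband_then_interp(t, values, tol):
    """Dead-band compress a record, then linearly interpolate it back, as
    a historian does on storage and retrieval (simplified sketch)."""
    kept_t, kept_v = [t[0]], [values[0]]
    for ti, vi in zip(t[1:], values[1:]):
        if abs(vi - kept_v[-1]) > tol:
            kept_t.append(ti)
            kept_v.append(vi)
    return np.interp(t, kept_t, kept_v)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 300)
sig1 = np.sin(t) + rng.normal(0.0, 0.5, t.size)
sig2 = np.sin(t) + rng.normal(0.0, 0.5, t.size)

print("raw correlation:     ", np.corrcoef(sig1, sig2)[0, 1])
print("after 1.2 dead-band: ",
      np.corrcoef(deadband_then_interp(t, sig1, 1.2),
                  deadband_then_interp(t, sig2, 1.2))[0, 1])
```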
Figure 6-20
Sample Data After Data Compression and Subsequent Archive Retrieval
6.3.3 The Effect of a Data Archive Historian With Bad Data

Anomalies can occur in the data acquisition system that result in bad or missing data. A common problem occurs with data dropouts in which the signal value falls to zero for some period of time. Data dropouts can also occur in which the value falls to some low level, but still greater than zero. Data spikes can also occur.
Figure 6-21 shows an example of a data dropout. Three values are recorded at different times-a normal value, an anomalous zero value, and the normal value again. The period in which the signal was unavailable is almost two days. Notice that the historian linear interpolation routine generates artificial data down to zero followed by artificial data up again to the normal value.
This type of data error is readily observable in plots of the data by its characteristic slanted "V" shape.
Figure 6-21
Data Interpolation Errors

Data dropouts in a historian are particularly detrimental to on-line monitoring. If the type of data shown in Figure 6-21 is left in training data sets, the on-line monitoring system will train the model to recognize this incorrect behavior as normal.
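Dropouts of this kind are usually easy to screen out before training. The following sketch is a hypothetical pre-screening step; the signal values, time index, and thresholds are illustrative assumptions rather than part of any on-line monitoring product. It flags samples that fall near zero or deviate implausibly far from a rolling median so that they can be removed from a candidate training data set.

    import numpy as np
    import pandas as pd

    # Hypothetical one-minute flow data with a simulated dropout to zero
    idx = pd.date_range("2005-04-05", periods=4 * 1440, freq="min")
    flow = pd.Series(75 + np.random.default_rng(1).normal(0, 0.5, idx.size), index=idx)
    flow.loc["2005-04-06":"2005-04-07 12:00"] = 0.0  # simulated dropout period

    def flag_dropouts(x, low_limit=5.0, window="6h", deviation_limit=10.0):
        """Flag values near zero or far from the local (rolling) median."""
        rolling_median = x.rolling(window, min_periods=1).median()
        return (x < low_limit) | ((x - rolling_median).abs() > deviation_limit)

    bad = flag_dropouts(flow)
    clean = flow[~bad]  # candidate training data with the dropout removed
    print(f"{int(bad.sum())} of {flow.size} samples flagged as dropouts")

A screening step of this kind does not replace the visual review described above; it simply makes the obvious slanted-V dropouts easier to find in large files.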
6.3.4 Dealing With a Data Archive Historian

As shown in the previous sections, a data archive historian can degrade the natural correlation between signals by generating artificial data between stored data points. The following steps are recommended for systems that utilize this data compression feature:
- Disable the historian, and allow the system to acquire data at some specified frequency.
- If the historian cannot be disabled, lower the compression limit to the smallest possible setting for each signal. Evaluate the data after reducing all tolerance settings, and determine if the correlation between signals is acceptable.
Models are generally created in batch mode using historical data. Recognize that archived data will be subject to the problems just described. Hopefully, the models will perform well enough with this historical data to allow model settings to be defined adequately.
7 INITIAL TRAINING AND ESTIMATION

The MSET method is based on developing a training data matrix from which process value estimates are calculated for comparison with corresponding data observations. Although they are treated separately, initial training, fault detection, and retraining are directly related. The quality of the initial training data affects how faults are identified and influences the number of false alarms. Fault identification during model development subsequently leads to retraining in which expected operating states not covered by the initial training are captured. In other words, the evaluation of failure alarms often identifies operating states not adequately described by the data used for initial training.
Section 7 describes initial training (including how estimates are developed). Section 8 continues this discussion by describing fault detection, the evaluation of identified faults, and retraining.
7.1 Training and Estimation Methods - Technical Overview

7.1.1 Training

After a comprehensive and error-free set of training data has been assembled, training algorithms are used to build a diagnostic model of the selected signals. Training is a two-step procedure that begins with the development of an MSET parameter estimation model followed by calibration of the fault detectors.
Training characterizes the expected behavior of signals in a model using historical operating data. Specified data files are used for training; MSET selects those observation vectors from the files that best define the expected operating space. An observation vector is a complete set of signal data values for a given point in time. For example, if the data are contained in a typical spreadsheet, an observation vector is one row of data.
Two types of training methods are currently available in MSET:
- MinMax
- Vector ordering

The following sections describe each of these methods.
7.1.1.1 MinMax Training

The MSET training procedure evaluates the training data and selects a subset of the observations that are determined to best characterize expected operation for the selected signals. The MinMax procedure is used to identify and select the observations containing the minimum and maximum observed values for each included signal. The minimum and maximum observed values define the boundaries of the valid operating range of the MSET model. The selected observations are placed in the MSET training matrix, which is also known as the D-matrix (from its mathematical description).
MinMax builds the smallest possible trained model and, by itself, is suitable only for a model having a very limited operating state space (in fact, no models described in this report use MinMax alone for training). The MinMax option bounds the operating state space represented in the training data by selecting the extreme value vectors for each included signal. Thus, two vectors are selected for each signal: the vector containing the smallest value and the vector containing the largest value. In many instances, the selected vectors for other signals will be the same; when one signal is at its extreme value, other signals might also be at their extreme values. Therefore, MinMax selects between 2 and 2n vectors, where n is the number of included signals. Typically, the MinMax option chooses a number of vectors that is greater than the number of signals but less than twice the number of signals.
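A minimal sketch of the MinMax selection rule is given below, assuming the training data are arranged as a NumPy array with one row per observation vector and one column per signal. The function name and data are illustrative assumptions; this is not the vendor implementation.

    import numpy as np

    def minmax_select(training_data):
        """Return the unique observation vectors that contain the minimum or
        maximum observed value of any signal (between 2 and 2n vectors)."""
        rows = set()
        for col in range(training_data.shape[1]):  # one column per signal
            rows.add(int(np.argmin(training_data[:, col])))
            rows.add(int(np.argmax(training_data[:, col])))
        return training_data[sorted(rows)]  # the MinMax portion of the D-matrix

    # Example: 2998 observations of four level signals
    rng = np.random.default_rng(2)
    data = 60 + rng.normal(0, 1, (2998, 4))
    print(minmax_select(data).shape)  # typically between (4, 4) and (8, 4)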
The method by which MinMax selects training vectors and subsequently affects estimation is important to understand. Suppose the training data used in a model contain 2998 vectors for four signals. If MinMax is used for training, eight vectors might be selected for training as shown in Table 7-1. (From the possible 2998 vectors, the minimum and maximum vector for each signal is highlighted in yellow and green, respectively.)
Table 7-1
Level Example Vectors Selected by MinMax

Signal   Vector #1  Vector #2  Vector #3  Vector #4  Vector #5  Vector #6  Vector #7  Vector #8
LT-01    62.950     62.963     63.174     63.439     63.590     63.590     63.351
LT-02    62.611     62.598     62.722     62.976     63.200     63.164     62.888
LT-03    61.728     61.716     61.653     61.806     62.080     62.030     61.816
LT-04    58.738     58.725     58.798     58.486     58.735     58.848     58.926

The following observations are important to consider here:
- Out of 2998 vectors available for training, MinMax selected only the eight vectors that bounded the signals in the data set. By this method, the boundary of the operating state space is mapped, but the interior states are not characterized at all.
- Regardless of whether the operating space covers a small or a large region, MinMax defines only the extreme boundaries of the region. If the operating space is large, the estimates produced from the MinMax training vectors will likely include significant uncertainty.
MinMax selects between 2 and 2n vectors, where n is the number of included signals. For highly correlated signals, one would expect that one signal is at its maximum (or minimum) value at the same instant that the other signals are also at their maximum (or minimum) values. This line of thought would lead one to expect that the number of vectors selected would routinely be close to the minimum number of two. This turns out not to be the case for typical power plant data. For the models presented in the appendices with n signals, the number of vectors selected by MinMax tends to vary anywhere from n to 1.5n vectors. There are enough random variations in the data that multiple minimum and maximum vectors are selected. Oddly enough, this seems to be relatively insensitive to the size of the training file. For example, a boiling water reactor (BWR) steam system model contains 39,566 vectors of data in the training file for the 19 signals; the MinMax algorithm selected 33 vectors. If the data in the training set are sampled at every 150 points, the number of available vectors is reduced to only 263 vectors of data, yet MinMax still selected 32 vectors.
MinMax is not recommended as the sole training method. In the following section, a training method is described that uses MinMax to define the operating state space boundaries, then fills in the training matrix with additional vectors from the interior operating states.
7.1.1.2 Vector Ordering Training

After MinMax vector selection is complete, the training procedure selects a number of additional points that best characterize the model's operating states between the minimum and maximum limits. The method used to fill in the additional states is a statistical procedure known as the vector ordering technique. The procedure begins by first ordering the training data observations based on the weighted value of each vector (described in the following paragraph). The procedure then selects observations at equal intervals using a spacing criterion (excluding those vectors previously selected by the MinMax procedure) to fill the training matrix with the user-specified number of unique observation vectors.
The vector ordering technique operates as described here.
First, the algorithm calculates an ordered vector $E = [E_1, E_2, \ldots, E_P]$ such that the elements of the vector are sorted in ascending magnitude for each of the $P$ observations in the training data. The elements of $E$ are the weighted values of the individual observation vectors in the training data. The weighted values are computed as the square root of the sum of the squares of the signal data values contained in an observation vector, or

$$E_j = \sqrt{\sum_{i=1}^{n} X_{ij}^{2}}$$

where $X_{ij}$ is the value of signal $i$ within observation vector $j$ and $E_j$ is the calculated weighted value for vector $j$.
Note: One consideration for the vector ordering procedure is the average value of each signal in the model. If the input data have been normalized, each signal will have approximately equal weight in the vector ordering method. If the input data have not been normalized, the larger-valued signals will have greater influence in the vector ordering method. This feature is software specific and should be discussed with the software provider. As a rule of thumb, data should be normalized prior to performing vector ordering to eliminate these effects.
The number of vectors chosen depends upon a user-specified spacing parameter, $F$, which ranges between 0 and 1. The selection procedure begins by selecting the column vector that corresponds to element $E_1$. It then loops through each element $j$ in the vector $E$. The algorithm finds the next vector element $E_j$ that satisfies the equation:

$$E_j - E_{prev} > F \, (E_P - E_1)$$

where $E_{prev}$ is the element of the vector $E$ that was previously selected (initially $E_{prev} = E_1$).
The value of F determines the number of vectors that are selected. SureSense automatically computes the value of F that will result in a training matrix of the user-specified size. All vectors selected are compared to those vectors previously selected by the MinMax procedure. Only those vectors that were not already included by the MinMax procedure are added to the training matrix.
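The selection rule above can be illustrated with the following sketch, assuming normalized training data in a NumPy array with one row per observation vector. The spacing fraction F is applied directly rather than being solved for a target training matrix size, and the MinMax vectors are not merged in, so this is only an approximation of the behavior described for the software.

    import numpy as np

    def vector_order_select(training_data, spacing_fraction):
        """Select vectors whose weighted values are spaced at least
        F * (E_P - E_1) apart along the ordered weighted-value axis."""
        weights = np.sqrt((training_data ** 2).sum(axis=1))  # E_j for each vector
        order = np.argsort(weights)
        e_sorted = weights[order]
        selected = [order[0]]                 # start from the smallest weighted value
        last_e = e_sorted[0]
        span = e_sorted[-1] - e_sorted[0]     # E_P - E_1
        for idx, e in zip(order[1:], e_sorted[1:]):
            if e - last_e > spacing_fraction * span:
                selected.append(idx)
                last_e = e
        return training_data[selected]

    rng = np.random.default_rng(3)
    data = rng.normal(0, 1, (1000, 4))        # hypothetical normalized data
    print(vector_order_select(data, spacing_fraction=0.05).shape)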
The final step in the MSET model training procedure is a matrix inversion operation that computes a similarity matrix (also known as the G-inv matrix) from the training matrix. The similarity matrix is used during monitoring to compute a measure of similarity or overlap between a new observation and the observation vectors stored in the training matrix. The matrix inversion operation is a computationally intensive procedure that prepares the MSET model for on-line monitoring use. The matrix inversion operation is required only once during the training step.
In summary, vector ordering begins with MinMax and fills in the operating space with a user-defined number of additional training vectors. For example, the previous section provided an example for four signals in which MinMax selected eight vectors. If 20 vectors are specified for the vector ordering option, the eight MinMax vectors will be selected as well as 12 additional vectors. The additional training vectors are intended to be as evenly spaced as possible across the operating state space so that the estimation technique is likely to have training vectors available for pattern matching that are reasonably close to observations to be validated.
For steady-state operation, the model might not need a large number of vectors in the training matrix. However, for a larger operating space, a correspondingly larger number of vectors might be needed to cover the operating space adequately.
The training matrix size (number of training vectors selected) is related to the estimation uncertainty or modeling error. Using too few vectors reduces the estimation accuracy because of inadequate coverage of the operating state space. Using too many vectors "overfits" the model to include noise from the training data. Figure 7-1 shows an example of the estimation error as a function of the number of user-specified training vectors. There is usually a point of diminishing returns beyond which the model starts including noise in the training data. In this example, the point of diminishing returns occurs at approximately five times the MinMax size. The observation processing time varies with the square of the number of vectors, which provides additional motivation for not specifying too many vectors.
Figure 7-1
Estimation Error as a Function of Vector Specification

The training matrix size also affects the susceptibility to overfitting and spillover, as described here.
Overfitting is a tendency for the estimate to follow a signal disturbance. Models exhibiting overfitting typically contain groups of poorly correlated signals or a limited state space with a large number of training vectors. Overfitting delays fault detection (reduces sensitivity) because the estimate follows the observations for some period of time. Overfitting can be minimized by improving the level of correlation among the signals in the model or by reducing the number of training vectors so as to minimize the modeling of noise. Another common approach to preventing overfitting is regularization. While some empirical models incorporate differing levels of regularization, the standard MSET algorithm does not.
Spillover is a tendency for the estimate of one signal to follow a disturbance in a second highly correlated signal. Models exhibiting spillover typically contain groups of highly correlated signals or very few signals. Spillover is the most common cause of false alarm problems and can be minimized by desensitizing the model for the affected signals. Methods of minimizing spillover effects include adjusting model settings, adding more signals to the model, or possibly increasing the number of training vectors.
Typically, more vectors should be specified for transient data than for steady-state data. Consider the following guidelines for the training matrix size (the software user's guide provides additional information):
- 4-10 times the MinMax size for transient data with a large operating state space
- 3-6 times the MinMax size for steady-state data with a large operating state space
- 2-4 times the MinMax size for steady-state data with a relatively small operating state space
7.1.1.3 Fault Detector Calibration as the Final Step in Training

Training results are stored in an MSET parameter estimation model that is used during system monitoring to estimate the expected values of the signals. However, before the fault detectors can be used, they must be calibrated for the expected predictive performance of the model. This is accomplished by processing the training vectors that were not included in the training matrix through the model to produce estimates for the signals. The differences between the signals and the estimates are used to compute the residual error values (also referred to as the residuals) over the range of operation. The residuals for the unselected training data are used to calibrate the fault detectors to accommodate the expected modeling error.
The fault detector calibration procedure operates the model as if in monitoring mode with the unselected training data vectors as input. Each vector is sequentially processed, and the residuals are computed and stored. For large training data sets, this can be a memory-intensive procedure.
The fault detector calibration procedures are applied to calculate the expected statistical properties of the training data residuals for each signal included in the model. These statistical properties implicitly capture the various uncertainties in the model. Uncertainty or noise in the underlying signal, modeling errors, and noise in the measured process will all be represented in the training data residuals. These uncertainties can affect the resulting sensitivity of fault detection.
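As a simple illustration of this calibration step, the sketch below derives residual statistics from unselected training vectors and turns them into detection parameters. The data, the plus-or-minus four-sigma band, and the variable names are hypothetical placeholders used only to show the flow of the calculation; the actual software uses the SPRT-based detectors described in Section 8.

    import numpy as np

    def calibrate_fault_detector(observations, estimates):
        """Characterize the expected residual distribution using training data
        that were not placed in the training matrix."""
        residuals = observations - estimates
        stats = {"mean": residuals.mean(), "std": residuals.std(ddof=1)}
        # A simple band around the training residual mean stands in here for
        # the statistically derived fault detector settings.
        stats["high_limit"] = stats["mean"] + 4 * stats["std"]
        stats["low_limit"] = stats["mean"] - 4 * stats["std"]
        return stats

    rng = np.random.default_rng(4)
    obs = 700 + rng.normal(0, 1.0, 5000)   # hypothetical pressure observations
    est = 700 + rng.normal(0, 0.3, 5000)   # hypothetical model estimates
    print(calibrate_fault_detector(obs, est))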
Figure 7-2 shows an example in which the residuals vary only about 0.15 percent of the sensor's calibrated span (the inner set of horizontal lines represent +/-1 percent of span). In this example, fault detection can be quite sensitive. Figure 7-3 shows a different example in which the residuals vary by up to +/-1 percent of span, which can result in less-sensitive fault detection.
These examples illustrate that fault detection sensitivity is directly related to the underlying uncertainty in both the measured and predicted signal data values. Thus, it is not realistic to expect that on-line data errors in a signal with expected operating noise of 1 percent could be detected at a signal error magnitude of 1 percent or less.
1 psi = 6.894757 kPa

Figure 7-2
Small Residual Results - Sensitive Fault Detection
Figure 7-3
Larger Residual Results - Less Sensitive Fault Detection

As discussed in Section 8, the fault-detection technique is used to distinguish between a signal that is operating normally and a signal that is operating abnormally. This is accomplished by statistically evaluating a series of new residual data values to determine whether the series is characteristic or uncharacteristic of the residual data distribution determined in the training step.
As long as the signal characteristics remain the same (including the uncertainties), the fault detectors will judge the signal to be operating normally.
7.1.2 Estimation

During monitoring, a new observation of the model's signals is acquired and is compared to the previously trained MSET model to estimate the expected values of the signals. The estimation procedure is accomplished by comparing the new observation to the previously learned examples stored in the training matrix. Similarity between the current observation and the learned examples is computed using multivariable pattern matching techniques. The weighted combination of the most similar learned examples is used to compute the estimated signal values for the current set of observations. Those examples most similar to the current observation are heavily weighted; those that are dissimilar are negligibly weighted.
There are several nonlinear similarity operators and methods that can be used to perform the multivariable pattern matching and parameter estimation calculations. For example, depending on the toolkit selections included in the user's version of the software, SureSense provides a number of these alternatives. For general use in the government and power markets, three similarity operators developed by ANL are provided in a single toolkit: the Bounded Angle Ratio Test (BART) operator, the Vector Pattern Recognizer (VPR) operator, and the Vector Similarity Evaluation Technique (VSET) operator. U.S. government users can also request the ANL-developed System State Analyzer (SSA) operator. The Universal Process Modeling (UPM) operator and toolkit is a proprietary alternative to the ANL toolkit. The UPM toolkit is available in all fields of use and can be provided separately or in combination with the ANL toolkit. The latest release of SureSense (Version 2.0) includes additional options for empirical estimators: the ESEE and the Parity Space Averaging (PSA) toolkit.
For power plant applications, the following methods are the most commonly used:
- BART - For most applications, the BART operator is used as the principal estimation method. The algorithm defines the similarity between two signals as a function of the angle formed by drawing lines from each measurement to a reference point. The domain of measurements collected during normal operation of the sensor defines a number line for the signal data. The reference point conceptually lies above the number line such that a line drawn through the median and the reference point is perpendicular to the number line, and its height is set so that the angle between the line from the reference point to the minimum value and the line from the reference point to the maximum value is 90 degrees. The user-defined domain extension extends the number line past the minimum value and past the maximum value by the domain extension amount, expressed in (max - min) units. This has the effect of increasing the height of the reference point.
- VSET - The VSET algorithm defines the similarity of two vectors as a function of the ratio between the Euclidean distance between the vectors and the sum of the root sum squared (RSS) values of the vectors.
- VPR - The VPR algorithm compares two vectors and defines their similarity as a function of the inverse of the Euclidean distance between the vectors.
Regardless of which similarity operator is chosen, the similarity calculations basically follow a similar procedure. The procedure begins by computing a measure of the similarity of each new observation relative to each of the training observations stored in the training matrix. While the means by which similarity is computed will differ, the end result is the identification of the most similar vectors contained in the training matrix. The derived measure of similarity is then used to construct an estimate of the signal values based on a weighted combination of the observations in the training matrix. The specific approach taken to construct an estimate based on a weighted combination of the observations in the training matrix will again depend on the method chosen.
The k-nearest neighbors approach provides a simple example of how a pattern matching and weighting method are used to compute an estimate. The weighted value of each new observation vector is computed as described in the previous section for vector ordering. The k vectors in the training matrix having the most similar weighted value are selected as the k-nearest neighbors for the new observation. This completes the pattern-matching step.
A simple weighting scheme combines the values for a specific signal in each of the k-nearest neighbor vectors to produce the estimate. The estimated signal values are derived by computing the weighted average of the k stored values using a linear combination with inverse similarity weights, or:

$$X_{est} = \frac{\sum_{j=1}^{k} X_{ij} / \left| E_{obs} - E_j \right|}{\sum_{j=1}^{k} 1 / \left| E_{obs} - E_j \right|}$$

where

X_est = the estimated signal value
E_obs = the weighted value of the current observation vector
X_ij = the training signal value for signal i in training vector j
E_j = the weighted value of the selected training observation vector

Notice that the individual observation values are not used in the estimation method illustrated here. The weighted value (the square root of the sum of the squares) of all observations, or E_obs, is used as the link between the vector of observations and the data that are stored in the training matrix.
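A minimal sketch of this k-nearest-neighbor estimate is shown below, assuming a NumPy training matrix with one row per stored vector and using the weighted value E as the similarity measure. The small constant that guards against division by zero when an observation's weighted value exactly matches a stored vector is an added assumption, not part of the description above.

    import numpy as np

    def knn_estimate(training_matrix, observation, signal_index, k=5, eps=1e-9):
        """Estimate one signal from the k training vectors whose weighted
        values are closest to the weighted value of the new observation."""
        e_train = np.sqrt((training_matrix ** 2).sum(axis=1))    # E_j for stored vectors
        e_obs = np.sqrt((observation ** 2).sum())                 # E_obs for the observation
        nearest = np.argsort(np.abs(e_train - e_obs))[:k]         # pattern-matching step
        weights = 1.0 / (np.abs(e_train[nearest] - e_obs) + eps)  # inverse similarity weights
        values = training_matrix[nearest, signal_index]
        return float((weights * values).sum() / weights.sum())

    rng = np.random.default_rng(5)
    training = rng.normal(100, 5, (200, 4))   # hypothetical training matrix
    new_obs = rng.normal(100, 5, 4)           # hypothetical new observation
    print(knn_estimate(training, new_obs, signal_index=0))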
7.2 How to Train a Model

7.2.1 Selecting the Initial Training Data

Initial training represents a milestone in model development. The signals for a model have been selected, the preliminary model settings have been specified, and the data have been acquired for training and verification testing. The model is trained with some portion of this data for the purpose of evaluating the model's settings and its response to various conditions. This is referred to as initial training. Early in model development, the various system operating states might not be known or fully understood. As a system's normal operating variations become better understood, it is common to add more training data to the model to characterize these additional operating states. This addition of training data to the initial training data is one form of retraining.
Initial training data are expected to bound most but not all of the possible operating state space.
There are likely to be valid operating states for which the model was not initially trained. Even if a large amount of data are made available for training, this does not guarantee that all possible operating states have been covered. For example, Section 6.1 discusses data quantity in terms of sample frequency. Generally, a 1-minute sample rate is recommended, which can result in very large data files. Although a 1-minute sample rate can create as many as 44,640 vectors of data in a single month, what is important for training is not the quantity of data as much as the operating states described by the data. Suppose that data for two months were selected for training. This amount of data might contain over 80,000 vectors of data. However, if process values do not vary during this period, only a single operating state might be defined. Figure 7-4 shows an example in which power is virtually constant at about 100 percent power for two months. If these data are used for training, the 100 percent power level is defined quite well, but no other power levels are included. It is important to assess the data selected for training and to understand the limitations.
Figure 7-4
Reactor Power - Constant for an Extended Period

7.2.2 Evaluating the Initial Training Data Adequacy

The data used for training must be error-free and should represent the normal expected behavior and variation of the signals contained in the model. With regard to data quality, the following four attributes are important for the historical operating data used to train the model:
- The data should contain observations from all modes and ranges of operation that are to be considered the normal domain of operation. MSET will produce a high-fidelity model of this domain of operation and will determine any other modes and ranges of operation to be abnormal.
- The data should contain one or more signals that are reasonably well-correlated to each signal included in the model. Correlated behavior between the included signals is essential for effective modeling using the MSET procedures.
- The data should not contain any operating anomalies, sensor failures, or equipment failures that would be considered abnormal operation. If included, the modeling procedures will learn these conditions as normal behavior.
- The data should not contain any signals that are constant valued over the range of operation.
These provide no diagnostic information and will introduce singularities (resulting in numerical instabilities) into some of the mathematical procedures.
These criteria are prerequisites to characterize normal operation properly and completely.
Recommended procedures for acquiring a comprehensive and error-free set of training data are provided in Section 6.
In addition to verifying acceptable data quality, the data set should be reviewed to confirm that it adequately represents the desired calibrated state of the evaluated sensors. For example, if two redundant sensors with identical calibrated spans are monitoring the same process, one should expect that the measurements from the sensors should be about the same. Figure 7-5 shows an example in which identical flow transmitters each monitor reactor coolant system flow, yet the two sensors display a difference of approximately 9 percent in measured values; something is probably not right with the data being used for training.
Figure 7-5
Large Difference in Redundant Flow Measurements
Verifying adequate data quality is fairly straightforward, although it can be tedious. Verifying that the data used for training represent an in-calibration condition is more difficult and requires a careful review of the data. The presence of redundant signals helps by providing a means of direct comparison for some signals. The algorithm used by EPRI's ICMP is easy to set up and run in a spreadsheet program. It provides one method of independently assessing the calibrated state of the sensor data used for training. On-Line Monitoring of Instrument Channel Performance, Volume 2 [1] provides an overview of the ICMP algorithm.
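One simple way to make such a review systematic is to compare each redundant channel to the median of the group, as in the sketch below. This is only an illustrative consistency check with an arbitrary 2-percent-of-span limit; it is not the ICMP algorithm, which is described in Volume 2 [1].

    import numpy as np

    def check_redundant_channels(data, span, limit_pct=2.0):
        """Flag channels whose average deviation from the channel median
        exceeds limit_pct percent of span."""
        reference = np.median(data, axis=1)  # robust estimate of the process value
        flags = {}
        for i in range(data.shape[1]):
            deviation_pct = 100.0 * np.abs(data[:, i] - reference).mean() / span
            flags[f"channel_{i + 1}"] = (round(deviation_pct, 2), deviation_pct > limit_pct)
        return flags

    # Hypothetical redundant flow channels; channel 3 reads about 9 percent low
    rng = np.random.default_rng(6)
    flows = np.column_stack([91 + rng.normal(0, 0.2, 1000) for _ in range(3)])
    flows[:, 2] -= 9.0
    print(check_redundant_channels(flows, span=100.0))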
7.2.3 Evaluating Training Adequacy Most nuclear plants are on an 18-month or 24-month fuel cycle. For a given model, it is unlikely that the signals remain constant over the entire fuel cycle. Signal values can change for a variety of reasons such as the following:
- System or process changes that occur over time
- Seasonal variations
- Operational changes such as operating at less than 100 percent power
- Equipment lineup changes
- Instrument drift

The data used for training should be compared to historical data to evaluate how well a model can be trained and allow subsequent signals to remain within the training space. The best method of evaluating how well the initial training will cover all possible operating states is to obtain historical verification testing data for an extended period of operation and to compare it to the training data. Figure 7-6 shows an example of the desired result; the data appear to remain essentially constant valued over a two-year period. Notice that main steam pressure varied from about 905 to 915 psig (6240-6309 kPa), which covers a range of about 0.83 percent of span for this sensor. With respect to this signal, the model has an excellent chance of performing well without requiring retraining.
1 psi = 6.894757 kPa

Figure 7-6
Process Values Relatively Unchanged Over the Operating Cycle

Figure 7-7 shows a different example in which a steam generator level transmitter signal tended to remain almost constant until near the end of the operating cycle, when extended low-power operation resulted in a wide variation in the level signal. If the model was trained using data from only the beginning of the operating cycle, the model training will be ineffective during this new operating state. Figure 7-8 shows the final two months of power operation in more detail. It should be noted that this particular model was trained for this condition because the observations and the estimates are close together.
Figure 7-7
Process Value Change During End-of-Cycle Low-Power Operation
Figure 7-8
Steam Generator Level During Two Months of Low-Power Operation

Figure 7-9 shows an example of recirculation pump flow in a BWR. There are routine changes in flow that appear as transients even though power is constant. Historical data are particularly important for evaluating a model such as this one. Obtaining training data that include all operating states can be challenging.
Figure 7-9
Routine Changes in Process Values

Historical data have an important role in model development because they readily show the typical operating states that might occur. The plots shown in the previous figures illustrate the types of variations that can often be encountered.
7.2.4 Retraining the Model

Training, fault detection, and retraining are closely related. When a model is initially trained, it often performs well for some period of time until alarms are generated as the operating space starts deviating outside of the initial training boundaries. The first indication that a model is operating in an untrained space is usually a significantly increased number of signal failure alarms. When this occurs, it is time to consider the need for retraining the model.
7.2.4.1 Retraining Terminology

The selection of the training matrix represents an essential step in the training of a model. When the training matrix is changed, the model has been retrained. This might be necessary for the following reasons:
- Certain model settings are modified. Changing estimator settings, changing the number of signals, adjusting data screening limit filters, or modifying phase determiner submodel partitioning definitions will require retraining. This is referred to as retraining for settings, and it optimizes model performance for a given set of training data.
- The data used for training are modified. If the pool of historical data used for training is modified, the vector selection for the training matrix will likely change even if the model settings are unchanged. This is referred to as retraining for operating space.
This section discusses issues associated with retraining for operating space. Instances where retraining for settings is undertaken are usually to further optimize model performance through fine tuning of fault-detection settings or phase-determiner settings.
7.2.4.2 Retraining Philosophy and Limitations

At some point during model development or a model's life, a need for retraining will be identified. Any of the following data-related conditions can warrant retraining:
- The data files used for initial training contained bad data that had to be removed. This is not considered retraining in the context of this discussion. Instead, the model was inadequately trained initially. With cleaned-up data files, the model is again initially trained.
- Signals are added to or removed from the model, which requires a change in the training matrix to accommodate these signals. Again, this is not considered retraining in the context of this discussion. Instead, the model has been modified by adding or removing signals and their corresponding data. With the data for these signals included or excluded, the model is again initially trained.
- Typical initial training data will characterize most, but not all, of the possible operating space. Depending on the model, it might be necessary to retrain with additional data that expand the operating space. Based on models developed at nuclear plants, operating outside the training space at some point during an operating cycle is common. One of the following situations can occur when a new operating state is outside of the trained space:
The new operating space might be a permanent change in the system operation. The model will require retraining.
The new operating space might represent a valid operating state that occasionally occurs and can continue in this space for an extended period. Once again, the model will require retraining.
The new operating space might be a short-term transient that rarely occurs. In this case, it might be preferable not to train for this event and allow spurious fault alarms during the event. Accounting for these transients in the training data set can desensitize the model, which is not desirable for rare transients. Additionally, notification that the process has moved to a possible transient condition may be useful.
Depending on the plant design, it might be difficult to accommodate all possible combinations of equipment operation in the training data. For example, a nuclear plant operating at low power might run one, two, or three feedwater pumps in various combinations and pump speeds. At lower power levels, there can be many different combinations of valid system operating states, and it is unlikely that the training data will ever adequately cover all of these states. A phase determiner should be used to exclude states with little or no data available for training. One possibility is to exclude all low-power data and focus mainly on the 100-percent power region. Notice that this approach does not require retraining. It simply excludes operating data for which the model has not been trained from fault-detection processing.
- For nuclear plant systems, some models might have a finite life before requiring retraining.
The following are examples:
In some cases, subtle changes in only a few sensors in a model can have the appearance of drift when actually small process changes are occurring over time. These are the hardest cases to evaluate. After confirming that instrument performance is acceptable, it might be necessary to retrain with all new data or to retrain with supplemental data.
If the sensors or signal conditioning equipment associated with a model are recalibrated, there might be sufficient shift in the signal output to cause spurious fault alarms. It might be necessary to retrain with all new data.
As can be seen from the previous examples, retraining for operating space has a specific definition in model development. Starting with a model adequately trained with good-quality data, retraining for operating space either expands the operating space or redefines the operating space. If retraining expands the operating space, supplemental data were probably added to the existing training data. If the operating space is redefined, the existing training data were probably replaced with new training data. If a model is not trained for all possible operating states, such as some transients, additional guidance should be provided to users regarding the interpretation of fault alarms during these conditions.
8 FAULT DETECTION AND ALARM RESPONSE

Section 8 describes fault detection, provides guidance on how to assess identified failures, and discusses model retraining.
8.1 Fault Detection - Technical Overview

8.1.1 Background

Fault detectors operate on model residuals, where a residual is the difference between the observed value and the estimate for a signal. The residuals for each monitored signal are used as the indicators for sensor and equipment faults. Instead of using simple threshold limits to detect fault indications (that is, declaring a fault when a signal's residual value exceeds a preset threshold), the fault-detection procedure employs statistical hypothesis testing techniques to determine whether the residual error value is uncharacteristic of the learned model and, therefore, indicative of a sensor or equipment fault. This technique is a superior surveillance tool because it is sensitive not only to disturbances in the signal mean but also to very subtle changes in the statistical quality of the signals.
While changes in the residual mean, variance, skewness, and kurtosis can be monitored in this way, the only two tests that are regularly used in OLM applications are the mean and variance tests. For sudden gross failure of a sensor or item of equipment, the fault-detection method will annunciate the disturbance as fast as a conventional threshold limit check. However, for slow degradation, the procedure can detect the onset of the disturbance long before it would be apparent with conventional threshold limits. The fault-detection setup allows the user to specify false alarm and missed alarm probabilities, thereby allowing some control over the likelihood of false alarms or missed detection.
In general, the fault-detection procedure is accomplished by first establishing the expected statistical distribution of the residual values when the model is operating normally. This step is accomplished during the MSET model training procedure. After an MSET model is trained, the remaining (unselected) training data observations are processed through the model to characterize the expected distribution of the residual values.
Having characterized the expected distribution of the residual values when the model is operating normally, fault detection identifies those conditions that deviate from the learned MSET model. In operation, a time series of residual values is evaluated to determine whether the series of values is characteristic of the expected distribution (the null hypothesis) or, alternatively, of some other specified distribution. The following four possible fault types are considered:
- The residual mean has shifted high (positive mean test)
- The residual mean has shifted low (negative mean test).
- The residual variance has increased (nominal variance test).
- The residual variance has decreased (inverse variance test).
Each of the four fault-detection tests is a binary hypothesis test. The residual signal is analyzed to determine whether the signal is consistent with normal behavior for each test. When a decision about current residual signal behavior is reached (either that the signal is behaving normally or abnormally), the decision is reported, and the test continues analyzing the data from the signal.
A user-configurable setting (referred to as the system disturbance magnitude setting in SureSense) is used to set the sensitivity for selecting between the expected (null-type) distribution and a fault-type distribution. This controls the crossover point at which a disturbance in the residual values is deemed uncharacteristic of the system normal operating states.
8.1.2 Mean Tests

The positive mean test and the negative mean test detect changes in the signal average value and are most commonly used for problems related to drift and calibration error as well as complete failures (open- or short-circuit failures). The system's disturbance magnitude setting controls the point at which a deviation in the signal mean value will be deemed abnormal. Figure 8-1 illustrates the change in the mean value required to produce a fault-detection event using the positive mean test and the negative mean test.
Figure 8-1
Mean Disturbance Magnitude Tests
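A minimal sketch of a sequential probability ratio test for a positive shift in the residual mean is shown below. The residual standard deviation, disturbance magnitude, and alarm probabilities are hypothetical inputs chosen for illustration; they stand in for the calibrated residual statistics and the system disturbance magnitude setting described above, and the sketch is not the implementation used by the software.

    import numpy as np

    def sprt_positive_mean(residuals, sigma, disturbance, alpha=0.01, beta=0.1):
        """Sequential test of normal residual mean (0) against a shifted mean
        (+disturbance); returns the sequence of decisions reached."""
        upper = np.log((1 - beta) / alpha)   # decide in favor of the fault hypothesis
        lower = np.log(beta / (1 - alpha))   # decide in favor of normal behavior
        llr, decisions = 0.0, []
        for r in residuals:
            # Log-likelihood ratio increment for Gaussian residuals
            llr += (disturbance / sigma ** 2) * (r - disturbance / 2.0)
            if llr >= upper:
                decisions.append("alarm")
                llr = 0.0
            elif llr <= lower:
                decisions.append("normal")
                llr = 0.0
        return decisions

    rng = np.random.default_rng(7)
    healthy = rng.normal(0.0, 0.5, 200)    # residuals with no drift
    drifting = rng.normal(1.0, 0.5, 200)   # residuals with a +1.0 mean shift
    print(sprt_positive_mean(healthy, sigma=0.5, disturbance=1.0).count("alarm"))
    print(sprt_positive_mean(drifting, sigma=0.5, disturbance=1.0).count("alarm"))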
8.1.3 Variance Tests

The nominal variance test detects an increase in the signal's variance and is useful for detecting cable damage or loose connectors that can cause signal spiking in an otherwise normal signal.
The inverse variance test detects a reduction in the signal's variance, which might be caused by a loss of response-type failure with a nonresponsive signal remaining in the normal expected range. In addition, stuck data occurring during monitoring will trip an inverse variance test. The system disturbance magnitude setting controls the point at which a deviation in the signal variance value will be deemed abnormal. Figure 8-2 illustrates the change in the variance value required to produce a fault-detection event using the nominal variance test and the inverse variance test.
Figure 8-2
Variance Disturbance Magnitude Tests

8.1.4 Unique Probability Density Functions

Real-world data often have thicker tails (more outlying data) than would be predicted by a normal (Gaussian) distribution. A limitation of the SPRT technique is an underlying assumption that the residual data distribution is characterized by a normal probability density function. While many real-world models give residuals that are nearly normal, most have thicker tails (greater numbers of outlying points). These thicker tails can cause an undesirable increase in the number of false alarms that the SPRT procedure generates.
An adaptive sequential probability (ASP) fault-detection method is available in SureSense that can have fewer false alarms by fitting the data to a unique (but still approximately normal) probability density function (PDF). By allowing the PDF to contain more outlying data than would be expected for a purely normal distribution, the false alarm rate could be reduced.
ASP determines the PDF during training by evaluating the residuals of the training data. The residuals typically exhibit near-normal behavior with somewhat more outlying data than would be predicted by a standard normal distribution. This approach requires a large amount of training data to ensure that the unique PDF is properly described; at least 10,000 training data points are recommended. This amount of data is usually easy to obtain if the data sampling is performed at a 1-minute rate.
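The effect of heavier-than-normal tails can be illustrated by comparing a threshold derived from a Gaussian assumption with one derived from the empirical quantiles of the training residuals, as in the sketch below. The Student-t residuals and the target false alarm rate are illustrative assumptions, and the sketch only shows why a data-derived distribution reduces false alarms; it is not the ASP algorithm itself.

    import numpy as np

    rng = np.random.default_rng(8)
    # Student-t residuals: approximately normal, but with heavier tails
    train = rng.standard_t(df=4, size=20000)
    monitor = rng.standard_t(df=4, size=20000)

    target_false_alarm = 1e-3
    gaussian_limit = train.std(ddof=1) * 3.29  # two-sided 99.9% limit for a true Gaussian
    empirical_limit = np.quantile(np.abs(train), 1 - target_false_alarm)

    for name, limit in [("gaussian", gaussian_limit), ("empirical", empirical_limit)]:
        rate = np.mean(np.abs(monitor) > limit)
        print(f"{name:9s} limit = {limit:.2f}, observed false-alarm rate = {rate:.4f}")

With heavy-tailed residuals, the Gaussian-based limit produces a false-alarm rate well above the target, while the limit taken from the empirical distribution of the training residuals stays close to it.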
8.1.5 Applying Conditional Probability to Failure Declaration

Occasional false alarms are an inevitable consequence of any statistically based fault-detection test. For this reason, SureSense uses a conditional probability analysis of a series of fault-detection results to distinguish between real and false alarms. The SureSense Multicycle Event Filter (MEF) requires that a certain number of alarms occur in a short sequence of results before a failure is declared. In the MEF technique, each new decision reached by a fault-detection test is treated as a new piece of evidence about the state of the signal. The conditional probability of failure for the signal is updated on the basis of the new evidence. The conditional probability of failure is compared to a predefined limit. For probabilities below the limit, the signal is declared to be operating correctly even if occasional alarms are generated. For probabilities above the limit, the signal is declared to be abnormal. The MEF technique improves on a conventional multicycle voting approach by allowing the user to explicitly control the statistical confidence level used in the final fault decision.
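A minimal sketch of a multicycle filter in this spirit is shown below. It updates a failure probability with Bayes' rule as each fault-detection decision arrives and declares a failure only when that probability exceeds a limit. The prior, detection, and false-alarm probabilities are hypothetical placeholders, not the settings or the exact calculation used by the SureSense MEF.

    def multicycle_filter(decisions, p_fail_prior=0.01, p_alarm_if_failed=0.9,
                          p_alarm_if_ok=0.05, declare_limit=0.95):
        """Update P(failed) after each alarm/no-alarm decision; declare a
        failure once the conditional probability exceeds declare_limit."""
        p_failed = p_fail_prior
        declarations = []
        for alarm in decisions:  # True means a fault-detection alarm this cycle
            p_evidence_failed = p_alarm_if_failed if alarm else 1 - p_alarm_if_failed
            p_evidence_ok = p_alarm_if_ok if alarm else 1 - p_alarm_if_ok
            numerator = p_evidence_failed * p_failed
            p_failed = numerator / (numerator + p_evidence_ok * (1 - p_failed))
            declarations.append(p_failed > declare_limit)
        return declarations

    # Isolated alarms do not declare a failure; a run of alarms does.
    print(multicycle_filter([False, True, False, False, True, False]))
    print(multicycle_filter([True, True, True, True, True, True]))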
8.2 Failure Evaluation

8.2.1 Summary of How Failures Are Determined

As data observations are processed through a model, estimates for the observations are calculated. Each residual (the difference between an observation and its corresponding estimate) is then evaluated by a series of statistical tests and, if appropriate, an alarm is generated.
However, an alarm is only the first part of a failure determination. As described in Section 8.1, failure declaration is a two-step procedure involving the following steps:
- If a series of residuals presents a probability ratio that trips one of the SPRT tests, an alarm will be generated for that signal.
- Depending on the model settings, a failure declaration will be made based on the number of alarms generated within a sequence of observations. For example, it might take five alarms out of a series of 10 observations to achieve a failure declaration.
Even if everything is working well, occasional alarms will occur because of random variations in the data or minor process variations. This is why a single alarm is not considered a failure. It takes a certain number of alarms within a short sequence before a failure will be declared.
8.2.2 Recommended Response to an Identified Failure

Based on the experience to date, instruments used in nuclear plants are generally well behaved with only occasional occurrences of unacceptable drift or failure. Accordingly, most failures identified by the on-line monitoring system are not actual instrument failures. They often represent operating states that have not been adequately covered by the training data. Sometimes, the failure alarms are caused by overly sensitive fault-detection settings. The following summarizes the most likely causes of alarms or failure declarations:
- Data acquisition problems resulting in bad data
- Operating outside the training space-permanent or temporary changes in process values
- Operating outside the training space-short-term transients
- Overly sensitive fault detection

The EPRI On-Line Monitoring Implementation Project has developed dozens of models for hundreds of sensors at a variety of nuclear plants. Significant instrument drift or failure has rarely been observed in the models developed to date. Even if an instrument channel is drifting, the fault-detection method can be very sensitive, often producing failure alarms long before the drift is significant. Figure 8-3 shows an example of typical drift behavior (the observations shown as blue crosses, and the estimates as red triangles). It is apparent that this pressure transmitter is slowly drifting low and that it has drifted low by about 1.5 percent over a period of three months. Figure 8-4 provides the residual plot for this signal. The failure was first identified when the drift was about -3.0 psig (-20.7 kPa) or about 0.35 percent of span.
1 psi = 6.894757 kPa

Figure 8-3
Typical Drift Behavior
1 psi = 6.894757 kPa

Figure 8-4
Residual Plot Showing Drift Significance Upon Failure Alarm
8.2.3 Failures Identified Because of Data Acquisition Problems Data acquisition problems will cause occasional failure alarms. Data quality problems are often readily identifiable, particularly if there are redundant signals to allow direct comparison of the data. In extreme cases, a signal data quality problem might persist for months in archived data.
Figures 8-5 and 8-6 show examples of data that were stuck for several months. In these examples, the actual process signals are not stuck; instead, the data stored in the data archive are incorrect. As a result, the data provided to the on-line monitoring system are incorrect and will be identified as such by a failure alarm.
1 psi = 6.894757 kPa

Figure 8-5
Extreme Example of Data Acquisition Error - Stuck Data
Figure 8-6
Extreme Example of Data Acquisition Error - Data Stuck for Almost One Year

In the cases shown in Figures 8-5 and 8-6, these stuck signals were left in the model, and it was recognized that the archived data were erroneous. The signals were not removed from the model so that signal validation could be performed again when acceptable data were obtained. Notice that the data storage problem in both examples was eventually corrected.
Not all failure alarms will be caused by long-term data problems as shown here. Many data-related problems will involve random spurious data values, occasional data dropouts, or other short-term data errors. Occasional short-term failure alarms caused by data acquisition problems can be handled in various ways that might include the following actions:
- Annotate the model with a note explaining that the identified failures are caused by bad data or a data acquisition problem. This approach leaves the bad data in historical data sets.
- Remove the bad data from the data set so that they do not continue to cause failure alarms whenever the data are rerun. This is the preferred approach for data files that are developed or archived for use as model training or verification testing data sets. Bad data must be removed from training data sets; otherwise, the model will be trained to recognize the bad data as normal.
- Annotate the database with data quality information that specifies the bad data as invalid.
This is the preferred approach when the plant's operating database is used as the permanent repository for the data in question.
8.2.4 Operation Outside the Training Space

The estimation procedure depends on the quality and range of the training data. MSET cannot produce an estimate significantly outside the range of data stored in the training matrix.
However, not every system operating state will be known or available in the data when developing a model. It is not unusual for some operating states to shift signals to outside the initial training boundaries. In these cases, these signals will be appropriately identified as failed by the model because they are outside the region defined as "normal and expected behavior."
The model likely needs additional training data to describe these new operating states. For some models this will rarely happen; other models will have multiple operating states, and it might be difficult to initially train for all possible states. The following sections show instances in which model retraining might be necessary.
8.2.4.1 Permanent Change in Operating Space - Normal Process Changes

If the operating space shifts significantly, multiple failure alarms might be generated as shown in Figure 8-7. It should be noted that almost all steam pressure channels are identified as failed.
This is a SureSense display in which the failed signals are highlighted in yellow rather than red (where yellow indicates that a group of signals are outside the training space). This type of fault-detection behavior indicates that the model is operating outside its training space, which is clearly shown for the actual data (refer to Figure 8-8). Notice that the model is trained for a steam pressure of about 980 psig (6757 kPa), but average pressure for all signals permanently increased to 985 psig (6791 kPa) during the evaluated time period. The signals are not faulty; the model is inadequately trained for the new operating condition. The model should be retrained with additional data that reflect this operating state.
Figure 8-7
Test Data Outside the Training Range - SureSense Result (Simple Steam System Model from Steam Generator to Feedline)

Figure 8-8
Test Data Outside the Training Range - Actual Data (1 psi = 6.894757 kPa)
8.2.4.2 Permanent Change in Operating Space - System Operation Changes
The initial training of a model is usually based on the best available understanding of how the system varies over time. Hopefully, the training data set bounds all normally expected operating states. However, there can be periods of plant operation during which system process parameters are constantly changing even if power is nearly constant. One example is the end-of-cycle coast-down period for a BWR. Figure 8-9 shows a typical example. The flow for this system is essentially constant until the end-of-cycle coast down. If the model was trained to recognize only the earlier period of system operation as normal, failure alarms would start as soon as the system operation changes.
Figure 8-9
Test Data Outside the Training Range - System Operation

Despite the best efforts to acquire training data that represent normal system operation, system operating changes can occur that invalidate all previous training data. As an example, consider the water storage tank level shown in Figure 8-10. This plot shows normal behavior in which the tank level exhibits daily variations as the tank heats up during the day and cools down at night. Over a period of one to two weeks, tank level slowly falls to just below 98 percent, at which time operators refill the tank to almost 98.5 percent.
Figure 8-10
Tank Level Variation

Suppose operators change the way in which the tank level is controlled. For example, they might choose to let the tank level fall to 97 percent before refilling and then refill to over 99 percent. If this happens, the range of operation will have almost quadrupled from the range initially used for training. Once it is recognized that this has happened, it will be necessary to retrain the model with data that describe the new operating space. This type of change can occur with any system in which operators have some control over how the system is operated, including any adjustable control system.
8.2.4.3 Permanent Change in Operating Space - Equipment Repair
The process values that define a normal operating space can change because of an equipment repair or replacement. Figure 8-11 shows an example in which a reactor coolant pump impeller was repaired. Before the repair, measured flow was about 78 percent of span. After the repair, the measured flow increased to about 84 percent of span. As can be seen, there is an immediate step change in the measured flow. Figure 8-12 shows that MSET cannot produce an estimate anywhere near the new operating state (the observations are shown as blue crosses, and the estimates as red triangles); the estimates are flat-topped at the limit of the data used for training. The only approach that can be taken here is to retrain with data reflecting the new operating condition.
Figure 8-11
Test Data Outside the Training Range - Equipment Repair

Figure 8-12
Test Data Outside the Training Range - MSET Results

Figure 8-13 shows a more subtle type of equipment repair or replacement (the observations are shown as blue crosses, and the estimates are shown as red triangles). Engineers wanted to monitor a signal at a higher data acquisition frequency, and they replaced the data acquisition
card in the plant computer. The old card sampled at a 2-second rate, and the new card sampled at a 0.1-second rate. Several other signals are also passed through this module. Unfortunately, there is an obvious change in signal noise before and after the card replacement that cannot be explained by the higher sample frequency alone. Suddenly, relatively clean data appear to be very noisy and routinely wander outside of the training range. The model recognizes this behavior as abnormal in comparison to its training data. In this example, the data acquisition card should probably be restored to its original configuration.
Figure 8-13
Test Data Outside the Training Range - Data Acquisition Card Replacement (Before and After Input Module Change)

8.2.5 Adjusting Phase-Determiner Settings for Transients
Section 5.4 explains the advantages of using submodels that are established by partitioning the overall model along the boundaries of specific operating states. The procedure used to define and partition these operating states is referred to as a phase determiner, and its result determines the phase in which the system is operating at any given time. For example, reactor power is commonly used to segregate high-power operation (near 100 percent power) from lower power operation. Even when a phase determiner is applied to a model, abnormal conditions might occur near the phase boundary between the submodels. Figure 8-14 shows an example in which turbine first-stage pressure operates for an extended period at about 735 psig (5068 kPa) but drops to almost 725 psig (4999 kPa) during a short transient (which is outside of the range of the data used for training).
Figure 8-14
Signal Behavior During a Transient

Figure 8-15 shows another example in which a single transient occurs during an entire year of operation. The model was not trained to recognize this transient as normal behavior and, therefore, declared the channel as failed during the transient. Immediately after the transient, the channel returned to its expected operating state. In this instance, there is little benefit in training the model to recognize this transient behavior.
Figure 8-15
One Transient During an Extended Period of Operation

Short-term transients can occur that cause signal values to vary outside of the training space. As these transients occur, signal failures will appropriately be identified because the model does not recognize this operating state. Figure 8-16 shows another example of short-term transients.
Notice that there are periodic spikes in the flow rate as pumps are adjusted. These flow spikes are normal and expected events. The training data included the first two spikes because this is a known and expected operating state. It should be noted, however, that the next three spikes exceed the limits of the training data, thereby resulting in alarms.
Figure 8-16
Example of Routine Transients Exceeding the Training Space

One of the three following approaches can be taken here:
- Retrain the model with additional data from the largest observed transient so that future smaller transients are not treated as signal failures.
- Do not retrain the model with additional data, and accept that failure declarations might occur during these transients. This is generally the preferred approach for short-term transients (as shown in Figure 8-16) when signal behavior after each transient appears normal with little deviation. The channel in this example appears to be operating normally with no problems.
- Define a separate phase (using a phase determiner) for the transient behavior. This would cause transient data to be partitioned from the other data, effectively turning off all fault-detection processing while the system was in the transient phase and eliminating the false alarms.
If the transient rarely occurs, the preferred approach is to take no action. If the transient routinely occurs, retraining to include the transient or excluding the transient using the phase determiner is probably the preferred approach. Transients and the signal behavior should be reviewed at the phase boundaries. In some cases, the phase determiner should be modified to exclude untrained transients such as this.
8.2.6 Occasional Outliers
Occasional data spikes or dips will occur that defy any explanation. Some of these occurrences are probably data acquisition problems, although the specific cause is often difficult to determine. Figure 8-17 shows an example in which two redundant steam pressure transmitters simultaneously dip from the normal operating range of about 830 psig (5723 kPa) to approximately 810 psig (5585 kPa). This change is sufficient to cause both channels to be simultaneously declared as failed. This dip occurred only once during the operating cycle. A review of other correlated data shows that no transient occurred during this period and that the data are most likely a partial data dropout caused by the data acquisition system.
Figure 8-17
Short-Term Outlier

If an identified failure "recovers," the problem was probably not a sensor or instrumentation problem. What is important in terms of monitoring calibration and assessing drift is whether the channel behavior changes over time. Referring to Figure 8-17, it is apparent that a short transient of some sort simultaneously affected both redundant channels, with a full recovery to normal conditions within a short period. Any failures identified during this period are not actual instrument failures; they are a consequence of either inadequate training for this event or data acquisition problems.
The signal behavior shown in Figure 8-17 is acceptable before and after the event. Rather than attempting to train the model to recognize an unexplained transient that only happened once, it is
recommended that no action be taken. Rare and unexplained events will occur in real-world data.
The model's recognition of these events as abnormal is an expected and desirable result. It is preferable to evaluate such events on a case-by-case basis rather than to include these rare cases in the training data and, therefore, risk desensitizing the model to failure events in general.
8.2.7 Using Threshold Settings for Overly Sensitive Alarms

8.2.7.1 Overview of Threshold Settings
Fault detection is based on a comparison of the difference or deviation between each observation and its corresponding estimate (referred to as the residual). Depending on 1) the signal's fault-detection settings and 2) the variance of the training data residual values, fault detection can be very sensitive, resulting in alarm generation for deviations of less than 0.1 percent. For many applications, this is more sensitive than necessary. It might be preferable to include simple threshold settings as a fault-detection tool.
Threshold settings are applied directly to the residuals for an analysis. Figure 8-18 shows an example of threshold settings applied to a signal. Two levels of drift have been specified: an allowable drift of ±1 percent and a maximum drift of ±1.5 percent. As the sensor starts drifting low, the residuals become increasingly more negative, eventually reaching the allowable drift limit.
Figure 8-18
Threshold Settings Applied to Residual Plot (Residual Plot Showing Allowable and Maximum Limits)
It should be noted that the channel shown in Figure 8-18 was identified as failed when the residual was only about -0.2 percent. This type of residual plot provides another tool to assess the urgency of any required corrective action. As shown in Figure 8-18, drift is occurring at a slow rate, which can allow for preplanning of any calibration activity.
Although threshold settings are less sensitive than statistically based fault-detection tools, it is recommended that the maximum acceptable drift be specified for each signal validated by the model. This will allow the user to retrieve residual plots during monitoring. Whenever fault alarms are received, a residual plot (such as the one shown in Figure 8-18) can provide additional insight into the severity of the detected drift.
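The check itself is simple to apply alongside the statistical fault detectors. The following is a minimal, hypothetical sketch (not part of any on-line monitoring product); the class and method names are illustrative, and the ±1.0 percent allowable and ±1.5 percent maximum limits simply mirror the Figure 8-18 example.

/** Illustrative residual threshold check patterned on the limits shown in Figure 8-18. */
public class ResidualThresholds {

    enum DriftStatus { NORMAL, EXCEEDS_ALLOWABLE, EXCEEDS_MAXIMUM }

    /** Classifies one residual (observation minus estimate, in percent of span). */
    static DriftStatus classify(double residual, double allowable, double maximum) {
        double magnitude = Math.abs(residual);
        if (magnitude >= maximum) {
            return DriftStatus.EXCEEDS_MAXIMUM;
        } else if (magnitude >= allowable) {
            return DriftStatus.EXCEEDS_ALLOWABLE;
        }
        return DriftStatus.NORMAL;
    }

    public static void main(String[] args) {
        double allowable = 1.0;                         // percent, allowable drift
        double maximum = 1.5;                           // percent, maximum drift
        double[] residuals = {-0.2, -0.6, -1.1, -1.6};  // hypothetical slowly drifting channel

        for (double r : residuals) {
            System.out.printf("residual %+.2f%% -> %s%n", r, classify(r, allowable, maximum));
        }
    }
}

In this sketch, a residual of -0.2 percent is reported as normal even though the statistical detector in the example above had already flagged the channel, which illustrates why threshold settings are the less sensitive of the two tools.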
8.2.7.2 Establishing Threshold Limits
Threshold limits are useful for evaluating fault detection during model development, but they should be adjusted as necessary before the model is placed in service. As part of model completion, threshold limits should be established based on the drift allowances of the monitored instrument channels. For safety-related channels, the threshold limits depend on the allowances specified in the corresponding set point study, if applicable. For non-safety-related channels, the threshold limits depend on the allowed channel variation for acceptable system performance.
Table 8-1 shows the typical elements of uncertainty that are considered important for most instrument loops. The uncertainty terms associated with the sensor are usually of the most interest. Rack-mounted signal-conditioning equipment, which usually performs quite well and often has a corresponding lower contribution to the channel uncertainty, should be considered whenever additional modules are in the instrument circuit.
Table 8-1
Instrument Channel Uncertainty Sources

| Uncertainty Term | Present in On-Line Monitoring Path? | Present in Safety-Related Trip Path? | Included in Sensor Calibration? |
|---|---|---|---|
| Process measurement effect (PME) | X | X | |
| Process element accuracy (PEA) | X | X | |
| Sensor reference accuracy (SRA) | X | X | X |
| Sensor drift (SD) | X | X | X |
| Sensor temperature effect (STE) (normal variation) | X | X | X (partial) |
| Sensor pressure effect (SPE) | X | X | |
| Sensor vibration (SV) | X | X | |
| Sensor M&TE accuracy (SMTE) | X | X | X |
| Isolator reference accuracy (IRA) | X | | |
| Isolator drift (ID) | X | | |
| Isolator temperature effect (ITE) | X | | |
| Isolator M&TE accuracy (IMTE) | X | | |
| Computer input A/D accuracy (A/D) | X | | |
| Bistable reference accuracy (BRA) | | X | |
| Bistable drift (BD) | | X | |
| Bistable temperature effect (BTE) | | X | |
| Bistable M&TE accuracy (BMTE) | | X | |

With regard to on-line monitoring, the sensor uncertainty elements can be grouped according to whether they are associated with 1) process/environmental effects or 2) calibration effects. The uncertainty elements associated with process/environmental effects explain why redundant channels might not display the same value. There is some random variation in the measurements caused by these uncertainty elements. The uncertainty elements associated with calibration effects represent specifically what an on-line monitoring program is evaluating, and it is these terms that should relate directly to the specified threshold limits.
Suppose the following uncertainty values are provided (or are considered allowable) for a non-safety-related channel:
SRA = ±0.5%, sensor reference accuracy
SD = ±1.5%, sensor drift
SMTE = ±0.25%, measurement and test equipment uncertainty
The combined uncertainty of these values can be calculated as follows:
SU = ±√(SRA² + SD² + SMTE²)
SU = ±√(0.5² + 1.5² + 0.25²) = ±1.6%
The uncertainty of the MSET estimate is a consideration because it can also affect the drift allowance for the channel. For example, if the estimate uncertainty is +/-0.25 percent, this uncertainty calculation would be adjusted as follows:
SU = ±√(SRA² + SD² + SMTE² - MSET²)
SU = ±√(0.5² + 1.5² + 0.25² - 0.25²) = ±1.58%
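When threshold limits are being established for many channels, the same root-sum-square combination can be scripted. The short sketch below simply reproduces the two calculations above; the class, method, and variable names are illustrative assumptions, not part of any on-line monitoring product.

/** Reproduces the example root-sum-square channel uncertainty calculations. */
public class ChannelUncertainty {

    /** Root-sum-square of the sensor terms, optionally crediting the estimate uncertainty. */
    static double combined(double sra, double sd, double smte, double estimateUncertainty) {
        return Math.sqrt(sra * sra + sd * sd + smte * smte
                - estimateUncertainty * estimateUncertainty);
    }

    public static void main(String[] args) {
        double sra = 0.5;    // sensor reference accuracy, percent
        double sd = 1.5;     // sensor drift, percent
        double smte = 0.25;  // measurement and test equipment uncertainty, percent

        System.out.printf("SU without estimate uncertainty: %.2f%%%n",
                combined(sra, sd, smte, 0.0));   // about 1.60 percent
        System.out.printf("SU with 0.25%% estimate uncertainty: %.2f%%%n",
                combined(sra, sd, smte, 0.25));  // about 1.58 percent
    }
}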
For this channel, threshold limits might be established as follows:
- Allowable = 1.2 percent (arbitrarily selected)
- Maximum = 1.58 percent

The EPRI On-Line Monitoring Implementation Project has coordinated with two Department of Energy (DOE) Nuclear Energy Plant Optimization (NEPO) projects to review the estimation of uncertainty for various plant models and data types. Results from these related efforts are provided in On-Line Monitoring of Instrument Channel Performance, Volume 3 [2].
Safety-related instruments have additional considerations for uncertainty that have been documented elsewhere under the tasks of this project [5].
8.2.8 Incorrect Initial Training
Figure 8-19 illustrates another type of potential problem in which a model was trained with one steam flow sensor already out of calibration (the observations are shown as blue crosses, and the estimates are shown as red triangles). The model has been trained to recognize the erroneously low signal of this steam flow sensor as normal. Eventually, the out-of-calibration condition was identified by conventional methods, and the transmitter was recalibrated. After calibration, the sensor was identified as failed because the model was trained with bad data. The model should be retrained with new data after the calibration. The prior training data must be excluded because they incorrectly identify the out-of-calibration condition as normal.
Figure 8-19
Recalibrated Sensor - Model Trained on Out-of-Calibration Data

Figure 8-20 shows a minor example of an incorrectly trained model (the observations are shown as blue crosses, and the estimates are shown as red triangles). This temperature sensor varied from other redundant sensors by about four degrees, but the model was trained to recognize this behavior as normal. After recalibration, this sensor matched the signals from the other redundant channels, but the model routinely declared this channel as failed because it was not trained for this new operating condition.
Figure 8-20
Recalibrated Sensor - Initially Out of Calibration

8.2.9 Equipment Operating States Not Covered by Available Training Data
There are likely to be some models for which it will be difficult to train for all possible operating states. Figure 8-21 provides a simple example. Depending on the system requirements, either one, two, or three pumps might be running.
Figure 8-21
Many Possible Operating States

Referring to Figure 8-21, the following possible system flow conditions should be considered:
- Low flow-only one pump is running. Three different operating states are possible.
- Medium flow-two pumps are running. Three different operating states are possible.
- High flow-all three pumps are running. This is probably the easiest operating state to model.
As shown, there are at least seven different operating states if the three pumps are individually instrumented. This becomes even more complex if the pumps can operate at variable speed.
Some plants preferentially operate two specific pumps with a third pump off. Figure 8-22 shows an example of a condensate booster pump that almost always runs. While running, a pump bearing temperature measurement usually indicates about 150°F (65.6°C). When the pump motor is turned off, the temperature drops to about 75°F (24°C). During an 18-month period, this pump was stopped once for only two days. For the non-operating state, it is unlikely that adequate training data will be available for such an infrequent operating configuration.
Figure 8-22
Example of a Pump That Almost Always Runs

Uncommon operating states are a real issue for training and operating a model. The important question to be resolved is which operating states will be validated and which will not. For many models, it is unlikely that adequate training data can be acquired to train the model on all possible operating states. For that reason, phase determiners should always be considered so that untrained operating states are excluded from signal validation and subsequent failure alarms.
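To make the phase-determiner concept concrete, the sketch below partitions operation by the number of running pumps so that observations from untrained configurations can be excluded from fault detection. It is only an illustration under assumed conditions: the phase names, the flow-based running test, and the threshold value are hypothetical, and an actual model would use the partitioning features of the on-line monitoring software rather than custom code.

/** Hypothetical phase determiner that assigns an operating phase by counting running pumps. */
public class PumpPhaseDeterminer {

    enum Phase { ALL_STOPPED, ONE_PUMP, TWO_PUMPS, THREE_PUMPS }

    /** A pump is treated as running if its discharge flow exceeds a minimum threshold. */
    static Phase determinePhase(double[] pumpFlows, double minRunningFlow) {
        int running = 0;
        for (double flow : pumpFlows) {
            if (flow > minRunningFlow) {
                running++;
            }
        }
        switch (running) {
            case 0:  return Phase.ALL_STOPPED;
            case 1:  return Phase.ONE_PUMP;
            case 2:  return Phase.TWO_PUMPS;
            default: return Phase.THREE_PUMPS;
        }
    }

    public static void main(String[] args) {
        double minRunningFlow = 500.0;                 // assumed flow threshold for "running"
        double[] observation = {4300.0, 4310.0, 12.0}; // two pumps running, one off

        Phase phase = determinePhase(observation, minRunningFlow);
        System.out.println("Operating phase: " + phase);
        // Fault detection would be enabled only for phases covered by adequate training data.
    }
}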
8.2.10 Inadequate Initial Training Within the Defined Operating Space
The MSET training vector selection method is not perfect. Even when adequate training data are available to define an operating space, estimation can be influenced by the combination of observations compared to the corresponding vectors contained in the training matrix. Figure 8-23 shows an example in which the observations (blue crosses) and the estimates (red triangles) track together very well for an extended period, followed by the estimates jumping while the observations remain almost unchanged. This change in the estimate was prompted by a small dip in the observations, and it took some time before the estimates again tracked with the observations. The vectors selected for training do not adequately cover the operating space in this particular region, and additional vectors should be specified for the training matrix.
Figure 8-23
Unexpected Change in Estimate

Figure 8-24 shows a more dramatic example of a pump bearing temperature sensor (the observations are shown as blue crosses, and the estimates are shown as red triangles). The model was trained to recognize both running and off conditions. As trained, the model has trouble differentiating between the two states because of the influence of other signals in the model. The model needs to be retrained with additional training vectors. Or, if the pump is seldom off, it should probably be trained either 1) with a phase determiner based on the pump state or 2) without the pump-off data so that it does not try to perform fault detection in both of the two possible states using a single model.
Figure 8-24
Inadequate Training for Pump On and Off Conditions
9
OPERATING IN ON-LINE MODE

An on-line monitoring system can provide timely assurance of signal data validity and equipment condition to a plant operator. The means of implementing an on-line monitoring system and the frequency at which the plant operator evaluates the monitoring system's results might vary significantly from application to application. In limited cases, it might be required to provide real-time monitoring with essentially continuous reporting to the operator. More commonly, on-line monitoring can be optimally implemented in a less-than-real-time mode. On-line monitoring systems can readily satisfy both real-time and less-than-real-time on-line monitoring requirements, thereby making them suitable for a wide variety of applications.
9.1 Modes of Operation
This section provides an overview of what is meant by on-line monitoring and describes the process by which a user would make the transition from a manual batch mode to an on-line mode of operation.
On-Line Monitoring of Instrument Channel Performance [5] defines the following possible options for an on-line monitoring system:
- An automated system that performs data acquisition and analysis essentially continuously in real time at the system-specified sample rate
- An automated system that performs data acquisition and analysis at discrete specified intervals
- An automated system that is normally off and is manually activated to perform data acquisition and analysis at a set interval (at least quarterly)
- A manual system in which data are acquired manually on at least a quarterly interval and entered manually into a computer program for the purpose of analysis

The differences among these options relate primarily to the degree of automated signal acquisition and the frequency of data collection and analysis. All but the first of the options are actually operating in a batch mode in which data are acquired and stored in files prior to performing the analysis.
The EPRI On-Line Monitoring Implementation Project has considered what is meant by on-line monitoring and has developed the following descriptions:
- Off-line - The software operates only in batch mode with training and testing data files prepared separately and evaluated on user command. This describes most on-line monitoring methods implemented to date.
- Periodic on-line - The data files are automatically updated with the latest information each time the software is run. This type of operation is effectively real-time and on-line in that it acquires the latest available data when the user accesses the model; however, this mode does not mean that each model is running continuously in the background. Also, data continue to be stored in data files rather than acquired in real time.
- Real-time on-line - The software is always running and sampling data at a specified frequency. Upon failure detection, the system notifies the user by some method. Upon notification, the user then opens a software viewer to review the cause of the failure alarm.
- Custom on-line - On-line monitoring as described previously with a unique user interface developed by the power plant. The software might be embedded in another application and might be inherently subservient to the host application.
The degree of automation in the on-line monitoring method can vary substantially among users.
Some users will operate quite successfully in a batch mode and never operate in an on-line mode.
It is possible, though arguably less cost effective, to meet most nuclear plant sensor and equipment condition-monitoring objectives while operating in batch mode. Other users will operate in a periodic on-line mode to optimize the method of fault detection, alarm, and graphical display of results.
On-line monitoring can provide an early indication of deteriorating sensors and equipment. With advanced warning of impending problems, plant personnel can take corrective action during periods of planned downtime, increasing the productive availability of the monitored equipment.
To accomplish these objectives, the on-line monitoring system must integrate smoothly into the daily workflow of the I&C engineers and technicians, must be easy to operate, and should require only modest levels of specialized training.
In one EPRI project implementation, the on-line monitoring software was operated autonomously on a nightly basis so that the prior day's monitoring results were available each day for evaluation by the plant's I&C technicians. This periodic on-line monitoring approach eliminates the need for an I&C technician to manage or wait for the data extraction and diagnostic processing, thereby enabling the technicians to most efficiently complete their reviews on a daily basis. The approach integrates smoothly into the daily workflow of personnel involved in on-line monitoring at a nuclear power plant.
9.2 Making the Transition From Batch Mode to On-Line Mode
The primary difference between on-line mode and batch mode is the nature by which the plant data are acquired and processed. An on-line system will typically implement some form of direct or automated connection to the plant data historian. Further, the on-line system will typically
implement automated processing of the acquired data. A batch system will typically accomplish these objectives via several manual steps under the direction of the user.
Most users will initially operate their monitoring system in batch mode. Batch mode is typically a file-based mode of operation with a manually implemented interface to the plant data archive.
Batch mode is useful for evaluating software capabilities, establishing plant-specific requirements, training software users, developing plant-specific monitoring models, and performing model acceptance testing. Several important roles for batch mode operation are described here. The remainder of this section will discuss on-line modes of operation.
9.2.1 Training as a Batch Operation
On-line monitoring relies on a user-provided set of historical operating data (known as the training data) to learn its internal model of the normal operation of the monitored signals and equipment. There are two important attributes of the data used to train a model. First, the data should contain all modes and ranges of operation that are to be considered normal operation of the monitored signals and equipment. Second, the data should not contain any operating anomalies, sensor failures, or equipment failures that would be considered as abnormal operation of the monitored signals and equipment. These criteria are prerequisites for training an effective model for use in on-line monitoring.
Ultimately, the quality of the estimates depends on the fidelity of the training data. For this reason, the training data require a careful review prior to use for model training. At present, this is accomplished by a combination of automated screening techniques and engineering analysis. Data files evaluated in batch mode are generally more appropriate for this type of work than are on-line data sources. The experience at all participating power plants to date is that all training data files require an evaluation to identify and remove bad data.
In many cases, it might not be desirable to include all operating states (such as normal but infrequent transients) in the on-line monitoring model. In this case, the data for the unmodeled operating states should be removed from the training data. Some on-line monitoring software (such as SureSense) can provide the means to perform this type of operating state data partitioning automatically, without requiring manual modification of the training data files.
In all cases, it is important to maintain a configuration-controlled archive of the training data used as the basis for training an on-line monitoring model. The training data might be required for periodic model updating (retraining) and for model verification or acceptance testing. Data files are preferred over on-line sources for the purposes of archiving the cleaned-up and approved training data.
These considerations mean that training data will normally be contained in data files and the training process will likely be performed as a batch operation. Retraining considerations also suggest that new training data might be added to, rather than replace, existing training data.
9.2.2 Periodic On-Line Monitoring
Periodic on-line monitoring will be the most common mode of implementation for most power plant systems. The frequency at which the monitoring results are evaluated will depend on the criticality of the system, the rate at which the monitored failure modes can progress, and the availability of personnel resources. For example, an on-line monitoring program for sensor calibration reduction requires a monitoring frequency of at least once per calendar quarter.
Software available for on-line monitoring will generally automate the periodic evaluation task to the extent that much higher frequencies (such as daily evaluation intervals) are cost effective.
An implementation of periodic on-line monitoring was completed as part of the EPRI project and is described here. The implementation was based on a Microsoft SQL Server plant data historian system in combination with the SureSense on-line monitoring software. The implementation included a data bridge for periodic data extraction and management (as described in Section 9.3).
The configuration of the overall periodic on-line monitoring system is shown in Figure 9-1.
Figure 9-1
Reference Implementation for Periodic On-Line Monitoring (plant data server, data bridge, engineering server running SureSense, and designer and monitoring stations)

In the reference implementation, the on-line monitoring procedure followed this general approach:
- Plant operating data are acquired and archived to the SQL Server database with no change to existing plant software or procedures.
- The data bridge software runs automatically under a scheduler to extract the local data required for analysis by the on-line monitoring models. The data bridge is automatically run nightly during periods of low user loading on the network and database.
- The data bridge software automatically manages the local data archive to maintain a user-defined look-back history of plant operating data for the analysis.
- The on-line monitoring software runs automatically under a scheduler to perform a nightly evaluation of the current local plant operating data for each enabled model and automatically stores the run results in combination with the local data.
- The user accesses the daily run results at any time through either a designer interface or a monitoring interface. The designer interface is password protected and provides model design and modification features. The monitoring interface is also password protected and permits viewing of results and reports, but it does not permit model modifications.
This automated approach ensures that the user will spend a minimal amount of time reviewing the daily results. Experience shows that the data extraction from the plant data historian is the time-consuming step, often requiring several minutes for extraction of data for a 24-hour period at a 1-minute sampling rate. By precompiling the look-back data set and monitoring results, this implementation provides the user with nearly instantaneous access to the results and reports.
When numerous models are placed into service, the accumulated time savings provided by instantaneous access to the results is appreciable.
In this implementation, the on-line monitoring software further provides the capability for a designer to query the SQL Server database directly and to acquire data for any valid time interval and signal subset. This capability uses the same database interfaces as the data bridge. Generally, data extraction time delays associated with the SQL Server are not problematic for the infrequent queries made by a design- or analysis-oriented user. Thus, the designer can perform a periodic assessment over any time interval of interest. This is considered an important feature for performing an operability assessment after a failure event has been identified.
An alternative approach supported by SureSense and other on-line monitoring software is to maintain the system in an always-on, real-time mode with a dedicated computer, a dedicated data connection, and a local data archive for look-back capability. This approach is outlined in the following section.
9.2.3 True On-Line Monitoring
The on-line monitoring software available today is readily capable of operating in a true on-line mode. Rather than evaluating data contained in a data file, the software receives data directly from the plant computer or data historian as a continuous data stream, sampled at a specified frequency. For example, the on-line monitoring software can acquire data in the following ways:
- Read data directly from one or more data files
- Read data directly from a data acquisition system in real time as a data stream
- Read data directly from a plant data historian in real time as a data stream
- Read data directly from any other network or internet-accessible source in real time as a data stream
- Read data simultaneously from any combination of the above
SureSense also provides an autocode compiler option that can produce stand-alone embeddable software capable of performing these data acquisition modes when combined with a user's custom software application.
Several technical decisions affect the method by which on-line monitoring is accomplished. In a real-time on-line mode, operating data would be evaluated as they are acquired. A real-time mode implies that the system is either continuously or regularly checked for alarm conditions.
Certain performance-critical systems might merit this level of attention; however, a greater number of models can be adequately monitored on a daily or less frequent basis.
A real-time mode of operation is conceptually simple, but the method of implementation will require careful planning. The system will require dedicated computing resources and data connections. The acceptability of increasing the network load during heavy traffic periods should be confirmed. Procedures for continuous evaluation, results storage, and real-time notification to personnel should be established. Finally, procedures or software capability to capture a look-back window for engineering evaluation of a fault event should be provided. The ability to provide simultaneous look-back and engineering analysis capability while maintaining the real-time on-line monitoring function should be considered.
9.2.4 Look-Back Functions
An on-line monitoring system could function by evaluating a data stream (observation by observation) and identifying failed or degraded sensors as the failures occur. For some industries, a simple declaration of the failure might well be adequate. However, nuclear plant users will generally want to review historical data whenever a failure declaration is made. This historical review is referred to as a look-back function and is an important part of any on-line monitoring system. Specifically, the purpose is to:
- Evaluate current data in the context of recent historical data to verify the failure event
- Determine if the failure is a real instrument or equipment failure or if it is the result of inadequate model training for a new but valid operating mode of the equipment
- Identify the point of onset and character of the failure (which might be important for operability assessments)
Historical data are always contained in a data file or database, not sampled in an on-line fashion.
This means that look-back capability is by its nature a batch mode of operation even in the most automated system. Some on-line monitoring systems will include a local look-back data cache to enhance the efficiency of their data plotting and analysis functions. This is deemed to be a highly desirable feature for most nuclear plant applications. Look-back periods on the order of 60-90 days are recommended. This is usually a long enough window to verify a failure event and to perform an operability assessment when a failure event occurs.
9.3 Data Bridge Description

9.3.1 General Description
When implementing an on-line monitoring system, plant-specific data issues must be addressed and resolved. A key issue is the level of automation and the efficiency of the on-line monitoring interface to the plant data archive. Several options that have been previously discussed include the following:
- An always-on, automated system that performs data acquisition and analysis continuously in real time at a specified sample rate
- A periodic automated system that performs data acquisition and analysis at discrete specified intervals
- An automated system that is normally off and is manually activated to perform data acquisition and analysis at a set interval (at least quarterly)
- A manual system in which data are acquired manually on at least a quarterly interval and entered manually into a computer program for the purpose of analysis

In power plant applications, the second or third approach is generally preferred. A data bridge can be an important element of the implementation approach in either case. A data bridge is defined as a middleware software application that manages the data transfer between the plant data historian and the on-line monitoring software application. A well-written data bridge will minimize or eliminate the need to modify existing plant data acquisition and data archiving systems and procedures.
Plant data are typically stored in one of a number of proprietary plant data historian software databases. Many such data historian databases are in use in the power generating industry today.
All modern plant data historians provide a connectivity layer that enables data access by other software applications such as an on-line monitoring system. An open database connectivity (ODBC) driver or an application-programming interface (API) typically provides access to data stored in the database. It is important to understand the capabilities and limitations of the plant data historian software when planning for an on-line monitoring system implementation. Some data historian vendors provide connectivity modules as part of the base software, while others might charge separately for these modules. Data historian access module cost might be an important factor in budgeting for a project.
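As a simple illustration of such a connectivity layer, the sketch below queries a historian table through JDBC. It is a hypothetical example only: the connection URL, driver availability, table name, and column names are assumptions, and an actual implementation would use the historian vendor's driver or API and the plant's own schema.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

/** Illustrative historian query through a JDBC connection; all names are assumed. */
public class HistorianQuery {

    public static void main(String[] args) throws Exception {
        // The data source name, table, and columns below are hypothetical.
        String url = "jdbc:odbc:PlantArchive";
        String sql = "SELECT SampleTime, PointValue FROM PointHistory "
                   + "WHERE PointName = ? AND SampleTime >= ? ORDER BY SampleTime";

        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement stmt = conn.prepareStatement(sql)) {

            stmt.setString(1, "FT-1-28A");                                   // computer point of interest
            stmt.setTimestamp(2, Timestamp.valueOf("2004-01-01 00:00:00"));  // start of look-back window

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getTimestamp(1) + "  " + rs.getDouble(2));
                }
            }
        }
    }
}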
A data bridge functions as a connecting layer between the plant-specific data historian and the on-line monitoring software. The data bridge can be internal to the on-line monitoring software or can be operated as a stand-alone program. The SureSense software used in the EPRI project enables both approaches; however, there are advantages to each that should be considered.
The primary advantage of an internal data bridge between the plant data historian and the on-line monitoring software occurs during model training and acceptance testing. At this point, the designer will evaluate model performance over various historical periods. Implementing an
internal data bridge enables this development work to proceed with minimal human intervention to accomplish data gathering. The need to store and manage redundant copies of the data in multiple formats is also eliminated.
The downside of a direct connection between the plant data historian and the on-line monitoring software is that queries made to the plant data historian can be relatively slow to complete.
Extracting large data sets can become unproductive if the queries made to the data historian take too long. A stand-alone data bridge can be used to mitigate this processing overhead and make optimal use of engineering resources.
A stand-alone data bridge provides a software tool to automatically extract the required information from the plant data historian and to format the data for highly efficient access by the on-line monitoring software. The data bridge can be provided in any number of configurations and capabilities dependent on the on-line monitoring software selected.
9.3.2 EPRI On-Line Monitoring Implementation Project Data Bridge
The data bridge used by the EPRI On-Line Monitoring Implementation Project is a stand-alone software program that provides an automated means to extract plant data from any data archive with a data access interface. The bridge automatically connects to the data archive, extracts model-specific data from the archive, and updates a local data file in binary signal data file (SDF) format. The bridge will automatically maintain a moving window of data beginning with the most recent data and extending for a user-specified look-back time interval. The bridge can be run in a scheduled and unattended mode. It is typically run when other network and database traffic is low.
Data extraction will typically be performed in a regularly scheduled fashion (such as once per day). The software can be configured to run unattended and will require human intervention only if the desired run behaviors need to be changed. The steps in the data extraction are listed here; a simplified sketch of the moving-window logic follows the list:
- 1. Select the set of signals to be acquired from the plant data historian.
- 2. Select the start and stop time for the data, or alternatively, select the current or initial time and a look-back interval.
- 3. Determine the overlap between the requested data interval and any previously extracted local data file.
- 4. Determine the time interval required from the plant data historian to update the local data file.
- 5. Connect to the plant data historian, and verify availability of the requested data.
- 6. Query the data historian for the requested data.
- 7. Time synchronize the newly acquired data with data contained in any previously extracted local data file.
- 8. Update the local data file with the requested data.
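The overlap determination in steps 3 and 4 is the key to keeping the historian query small. The sketch below summarizes that logic under stated assumptions; it is not the actual data bridge code, the class and method names are hypothetical, and a real implementation must also handle time synchronization and data quality as described in steps 5 through 8.

/** Simplified sketch of the data bridge's moving-window update logic. */
public class LookBackWindow {

    /** Computes the start time of the historian query needed to bring a local archive up to date.
     *  Times are expressed in the historian's native units (for example, days). */
    static double queryStartTime(double currentTime, double lookBack, Double lastArchivedTime) {
        double windowStart = currentTime - lookBack;
        if (lastArchivedTime == null || lastArchivedTime < windowStart) {
            // No usable overlap with the local file: the full look-back window must be extracted.
            return windowStart;
        }
        // Overlap exists: only the data newer than the local archive are requested.
        return lastArchivedTime;
    }

    public static void main(String[] args) {
        double now = 38000.0;   // hypothetical "days since 1900" timestamp
        double lookBack = 90.0; // 90-day look-back window

        // First run: no local archive exists yet, so the full 90 days of data are requested.
        System.out.println("first run query start:   " + queryStartTime(now, lookBack, null));

        // Nightly runs: only the data recorded since the last extraction are requested, and the
        // merged file is then trimmed back to the most recent 90-day window.
        System.out.println("nightly run query start: " + queryStartTime(now, lookBack, 37999.0));
    }
}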
A reference data bridge implementation was completed as part of the EPRI project. The implementation was based on a Microsoft SQL Server plant data historian system in combination with the SureSense on-line monitoring software. The implementation accomplished a periodic on-line monitoring capability. The following requirements were specified for the reference data bridge implementation:
- Operating data will be extracted nightly from the plant's SQL Server database for SureSense processing.
- Operating data extraction will be an automatically scheduled batch procedure.
- Operating data will be extracted separately for any number of SureSense models.
- Extracted operating data will be saved to the engineering server in binary SDF format.
- Extracted operating data will include the data from the preceding 90 days; however, only the most recent day of data will be required to update a file archive on the engineering server.
- Extracted operating data will always overwrite the previous day's data in the same user-defined file location.
- SureSense operating data analysis will be performed nightly for any number of user-defined SureSense models.
- SureSense operating data analysis will be an automatically scheduled batch procedure.
- SureSense operating data analysis will include the data from the preceding 90 days.
- SureSense operating data analysis results will be saved to the engineering server in a binary format compatible with the SureSense data visualization and reporting tools.
- Operating data analysis results will always overwrite the previous day's results in the same user-defined file location.
- Operating data analysis results will be available for each model after processing.
The reference data bridge application was configured according to these requirements. The following describes the reference implementation; however, variations on the implementation might be made to best accommodate plant-specific data processing environments.
The data bridge was implemented as a console application that performs data extraction from the SQL Server and saves the data in SDF file format to the engineering server. The data bridge implements the SureSense universal data access package to ensure compatibility with the SureSense application and to minimize development costs. The universal data access package is a plug-in data input and output software module that controls the data flow through the data bridge to the on-line monitoring application. Because the plant-specific data source characteristics are isolated in a plug-in software module, the universal data access package enables the data bridge and on-line monitoring software to connect to virtually any network-accessible data source in virtually any data format. A single plug-in module enables plant data system connectivity for both the data bridge software and for on-line monitoring system designers. SureSense users should consult their user documentation for more information about configuring these types of data access plug-ins.
The SureSense application was installed into a configuration-controlled directory under system administrator control. The installation created the default directory structure shown in Figure 9-2.
The SureSense data bridge was installed in the top-level application directory for versions 1.4.1 and higher. The application directory and all subdirectories were assigned read-only user permissions; however, the system administrator was given full permission.
Figure 9-2
Directory Structure for a SureSense Data Bridge Implementation

9.3.3 Data Bridge Programming and Setup
The data bridge console application executable is named SDF_Bridge.jar. The application requires a single argument: the name of a resource file. An example resource file named resource.txt is provided with the installation. The application might be run manually from the command line, but more typically will be run under the control of a scheduler. In the reference implementation, the Microsoft Windows scheduler was used. The SDFBridge.bat file provides an example of an executable command file suitable for use with a scheduler.
The SureSense data bridge can access data from virtually any network-accessible data source using a built-in class loader that dynamically loads a data reader plug-in for each required data source. The data source location and the required reader plug-in for the data source are specified in the data bridge resource file (further described in the following paragraphs). The data bridge uses the plug-in to access the data source, to acquire the necessary data, and to reformat the data
to a local archive in SDF binary format. The data bridge resource file contains a series of data extraction run instructions wherein each extraction run is specified as shown here:
DATA SET
READER ODBC
SOURCE jdbc:odbc:PlantArchive
OUTPUT .\\Data\\ModelARecent.sdf
SAMPLE 6.9444444444E-4
LOOKBACK 90.0
NAMELIST Signal 1, Signal 2, Signal 3

Each entry in the resource file is made on a separate line. The DATA SET keyword is a delimiter that defines the beginning of each new data extraction specification.
The second entry begins with the READER keyword and defines the name of the data reader plug-in used to access the data source. This must be the name of a class file located in the application's \\PlugIns\\reader\\ subdirectory. In this example, the reader plug-in is ODBC.class.
Note that a different reader plug-in can be specified for each data extraction run.
The third entry begins with the SOURCE keyword and defines the name of the network-accessible data source. This could be the file path, database name, or COM/DCOM object name.
If a database or other on-line source is used, the source must be registered with the operating system to be accessible. Note that a different data source can be specified for each data extraction run.
The fourth entry begins with the OUTPUT keyword and specifies the output file path name for the data extraction. The application's \\Data\\ directory is the recommended location for the output files. Note that the data bridge will check for the existence of this file and will automatically merge the existing file with newly extracted data to minimize the plant data historian's processing overhead. As an example, consider the case where the data bridge runs daily with a 90-day look-back window. The first time the data bridge runs, the output file does not exist. The data bridge will run and extract the full 90-day period of data to the SDF file. The second time the data bridge runs, it will acquire only the necessary missing data (for one day) to fill in the time period from the current time back to the last time recorded in the previously extracted SDF file. It will then merge the data sets to update the SDF file for the most current 90-day look-back period. Note that a different output file should be specified for each data extraction run.
The fifth entry begins with the SAMPLE keyword and specifies the sampling interval for the data expressed in the native units of the data source. In this example, the SQL Server time units are based on days elapsed since January 1, 1900. Thus, a specification of 6.9444444444E-4 days represents a 1-minute sampling interval. Note that a different sampling interval can be specified for each data extraction run.
The sixth entry begins with the LOOKBACK keyword and specifies the look-back window interval for the data expressed in the native units of the data source. In this example, the look-back window is 90 days. Note that a different look-back interval can be specified for each data extraction run.
The seventh entry begins with either the NAMELIST keyword or the NAMETABLE keyword and specifies the computer point names to be extracted in the current run. If the NAMELIST keyword is used, a comma-delimited list of names must be provided after the keyword. The comma-delimited list can extend over multiple lines. If the NAMETABLE keyword is used, the database table name for the name list must be provided after the keyword. Note that a different computer point name list can be specified for each data extraction run.
The computer points extracted will generally be coordinated with the computer points required as inputs to one or more on-line monitoring models. The recommended approach is to extract the computer points for each model into an individual SDF file. In other words, it is recommended that a separate extraction run be made for each model. However, it is also acceptable to place all computer points required by all models into a single extracted file.
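As an illustration of the one-extraction-run-per-model approach, a resource file with two runs might look like the following; the signal names and output file names are hypothetical:

DATA SET
READER ODBC
SOURCE jdbc:odbc:PlantArchive
OUTPUT .\Data\Model_A_Recent.sdf
SAMPLE 6.9444444444E-4
LOOKBACK 90.0
NAMELIST Signal 1, Signal 2, Signal 3

DATA SET
READER ODBC
SOURCE jdbc:odbc:PlantArchive
OUTPUT .\Data\Model_B_Recent.sdf
SAMPLE 6.9444444444E-4
LOOKBACK 90.0
NAMELIST Signal 4, Signal 5, Signal 6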
As mentioned previously, it might be necessary to register the on-line data source with the operating system. For example, to register an ODBC data source on a Microsoft Windows platform, the ODBC Data Source Administrator is used. On a Windows platform the ODBC Data Source Administrator, "Data Sources (ODBC)," can be run from the Control Panel by clicking the Start button and then Settings, Control Panel. A new data source name (DSN) can be added, assuming that there is an ODBC driver for the database of interest. If there is not an ODBC driver for the database of interest, a driver needs to be installed before running the ODBC Data Source Administrator again. The facility's system administrator should be contacted for details for specific computer, network, and operating system environments.
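Where it is unclear whether a data source name has been registered correctly, a short connectivity test can be helpful. The following minimal sketch assumes a hypothetical DSN named PlantArchive and an older Java runtime that still includes the JDBC-ODBC bridge driver; it is not part of the SureSense or data bridge software:

import java.sql.Connection;
import java.sql.DriverManager;

// Minimal ODBC DSN connectivity check (illustrative only).
public class DsnCheck {
    public static void main(String[] args) throws Exception {
        // Load the JDBC-ODBC bridge driver shipped with older Java runtimes.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        // "PlantArchive" is a hypothetical DSN registered with the ODBC Data Source Administrator.
        Connection conn = DriverManager.getConnection("jdbc:odbc:PlantArchive");
        System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductName());
        conn.close();
    }
}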
After registering the data source and configuring the data bridge resource file, the data bridge can be run manually by entering the equivalent of the following on the command line:
java -cp .;.\SDF_Bridge.jar SDFBridge resource.txt

A command file should be configured with the necessary information and used to run the data bridge periodically under an automatic program scheduler. A sample command file named SDFBridge.bat is provided with the SureSense software. The Microsoft Windows operating system provides a suitable scheduler; however, any scheduler can be used. Each time the data bridge is run, it will update the SDF file archive for each data set specified in the resource file.
The data bridge manages the middleware interface between the on-line monitoring application and the plant data historian. Experience shows that data extraction from the data historian is the slowest step in a typical on-line monitoring implementation. The process of evaluating the data using the on-line monitoring software is typically much faster. Nonetheless, a 90-day archive of 1-minute interval data contains 129,600 observations of multiple signals. This is slightly more than 1 megabyte of data per signal or about 20 megabytes of data for a typical 20-signal model.
On a typical desktop computer, it can take from tens of seconds to several minutes to process this amount of data through a model. For this reason, the on-line monitoring data processing step was also automated in the reference implementation so that users are provided with near instantaneous access to the analysis results.
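As a quick check of the data volume figures quoted above: 90 days × 24 hours × 60 minutes = 129,600 observations per signal; assuming roughly 8 bytes per stored value, this is about 1 megabyte per signal, or on the order of 20 megabytes for a 20-signal model.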
9.4 On-Line Monitoring System Operation

The software used for the reference implementation can be operated in several modes with several types of user interface. The reference implementation takes advantage of two modes of SureSense operation: the unattended mode and the monitoring mode.
In the unattended mode, the SureSense software is operated as a command line process without a graphical user interface. This is the same SureSense software provided to users. However, operating in unattended mode without the overhead of a graphical user interface makes the processing more efficient. To invoke the unattended mode, a run control file name is simply added as the command line argument when launching the SureSense application. The SureSense command interpreter will open the named run control file and will operate based on the run specifications contained there. In the reference implementation, the unattended mode is used with a program scheduler to produce the daily run results.
In monitoring mode, the SureSense software is operated with a restricted graphical user interface that allows the user to access and review run results but that does not allow modifications to the underlying model or data set specifications. Monitoring mode is the preferred method of deployment in the reference implementation where it is used to evaluate the daily run results.
The software also offers a designer mode, which is more fully described in the SureSense Diagnostic Monitoring Studio User's Guide [3]. The designer mode will typically be used to prepare and evaluate models prior to deployment for on-line monitoring. As part of model preparation, the model must be specifically configured for on-line operation. The designer mode might also be used to perform detailed engineering analysis of the run results for operability assessment after a fault event has been detected.
9.4.1 Producing Run Results

The SureSense program can be scheduled to run automatically at predetermined intervals and to archive monitoring run results for periodic review by technicians and engineers. This is the most popular approach because few organizations require or desire to dedicate resources for continuous monitoring with immediate notification of results. The periodic on-line monitoring approach is easily configured and is the subject of this section.
The following steps apply to automating periodic on-line monitoring data evaluation:
- 1. Open the model in designer mode, and configure the data set(s) that will be monitored for on-line operation. It is not necessary to enable all data sets for on-line monitoring. It might be preferable to set up a dedicated data set to be used for on-line monitoring. This step requires designer privileges.
- 2. Set up the run control files for use with an automated scheduling tool such as the Task Scheduler accessory available in Microsoft Windows NT, 2000, and XP Professional. The run control files are described later in this report.
- 3. Set up the automated scheduling tool such as the Task Scheduler accessory available in Microsoft Windows NT, 2000, and XP Professional.
- 4. Review and plot the results using either the designer mode or the monitoring mode user interfaces after the scheduler has run the program.
The SureSense monitoring interface is designed to use automatically stored run results for a model. Run results are based on the analysis of a specific data set. Automatic archiving of the run results for that data set is accomplished by checking the Create Result Files checkbox in the SureSense Edit Data Set window. Figure 9-2 shows the checkbox location. By setting up the data set to archive run results, any subsequent run of that data set will store the run results for later use. It is also recommended that the Auto Select check box in the Edit Data Set window be enabled. This will automatically select the first and last available time points for the data set.
It should be noted that a designer must set up the data set to store run results using the designer interface. The modified model project must be saved for the changes to be effective.
Apply the changes to the data set and Save the model to enable on-line monitoring. When this initial setup has been completed, the data set will refresh its dedicated archive with new results each time the Model>Data Set combination is run by a user or by an automated scheduler. It is recommended that this data set be run to initialize the archive by selecting Run from the main system window menu. Select Monitor from the submenu and select the modified data set to run in the Run Director window.
Figure 9-3 Setting Up a Data Set to Store Run Results

The run control files consist of a control file and a resource file. The control file is typically a batch command or .BAT file when running under Microsoft Windows operating systems. The resource file is a simple text file that specifies which models and data sets are to be run when the command file is executed. Any number of resource files can be run from a single command file.
However, in this discussion it is assumed that a single resource file lists the models and data sets of interest. Contact the software provider or system administrator for instructions on setting up more sophisticated run control scenarios. Standard Microsoft Windows syntax is used in the following examples, but the concepts are equally applicable to other operating systems.
The command file will typically contain one or more command lines of the following format:
C:\PROGRA~1\JavaSoft\jre\1.3.1_03\bin\java -cp .;.\SDMSvl4Om.jar SDMS RUNFILE.txt

where
- C:\PROGRA~1\JavaSoft\jre\1.3.1_03\bin\java is the path to the Java virtual machine.
- -cp .;.\SDMSvl4Om.jar SDMS specifies the SureSense executable (presumes the command file is in the SureSense home directory denoted as .\).
- RUNFILE.txt specifies the path to the resource file (presumes the resource file is also in the SureSense home directory).
A sample command file can be found in the SureSense home directory with the name Unattended.bat. The name unattended means that SureSense will run automatically and without a graphical user interface. The presence of a command line argument specifying the resource file name instructs SureSense to run without the user interface.
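As an illustrative sketch only, a minimal command file might contain the lines below; the SureSense home directory (shown here as C:\SureSense) and the Java installation path are assumptions that should be adjusted for the local machine:

@echo off
rem Change to the SureSense home directory so the relative paths resolve.
cd /d C:\SureSense
rem Run SureSense in unattended mode against the resource file.
C:\PROGRA~1\JavaSoft\jre\1.3.1_03\bin\java -cp .;.\SDMSvl4Om.jar SDMS RUNFILE.txt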
The resource file is a plain text file and will typically contain multiple lines with the following format:
RUNFILE
USER Scheduler Daily
MODEL "./Projects/Model_A.svm" "Model_A_Recent"
MODEL "./Projects/Model_B.svm" "Model_B_Recent"

where
- RUNFILE is a keyword that instructs the SureSense command interpreter that this resource file implements an unattended run.
- USER is a keyword that instructs the SureSense command interpreter to expect the next two tokens to contain a user name and a password.
- Scheduler is a valid user name for a user with Monitor privileges. Only monitoring runs can be performed in unattended mode.
- Daily is a valid password for the monitoring user.
- MODEL is a keyword that instructs the SureSense command interpreter to expect the next two tokens to contain a model name and a data set name.
- "./Projects/Model_A.svm" is the path name to a SureSense model file, which is always contained in quotes and can include spaces as necessary for the correct path.
- "Model_A_Recent" is the data set name within the specified SureSense model file, which is always contained in quotes and can include spaces as necessary for the data set.
Any number of MODEL instructions can be listed in the resource file. Each instruction should be listed on a separate line in the file. The resource file should be saved as a plain text file. A sample resource file can be found in the SureSense home directory with the name RUNFILE.txt.
It should be noted that the specified model must be trained and the data set enabled for creating results files to run successfully in unattended mode. The unrun.log file in the SureSense home directory should be reviewed periodically to ensure that the models and data sets are running as expected. The .BAT file should be run manually prior to implementing the scheduler to be sure that its behavior is acceptable.
9.4.2 Evaluating Run Results

Models that have been previously developed are available for monitoring using either the SureSense designer interface or the monitoring interface. The monitor interface is provided for users to view run results without having to use the full designer interface. Figure 9-4 shows the arrangement.
Figure 9-4 SureSense User Interface Options for Monitoring (model tree for Level Example -> Training File, signals LT-01 through LT-04)

The monitoring interface has the following desirable features:
- Not all personnel require detailed knowledge regarding how to design and maintain on-line monitoring models. The simpler monitoring interface allows personnel to access and view run results without needing extensive training in how to operate and navigate through the SureSense designer interface.
- The monitor interface is designed to support automated on-line monitoring in which data are collected, the model is updated, and run results are provided automatically at predefined times.
- The monitor interface has been designed to support remote user access. If desired, monitoring activities can be centralized for several facilities (with the monitor interface providing local access to on-line monitoring results).
- The monitoring interface displays results for a single data set within the model. When the model is first selected, SureSense automatically selects a default data set for the display. The preferred default is the first monitoring data set for which automated results file archiving is enabled. If no such data set is found, the first monitoring data set will be the second default choice. Open a different data set for display by selecting Data Set from the Monitor menu item.
As shown in Figure 9-5, three options are provided for each signal:
- Run information
- Observation estimate plot
- Residual plot

Figure 9-5 Monitor Window Options for Each Signal (each signal in the Level Example model expands to Run Info, Observation & Estimation, and Residual & Limit items)

Each of these options is obtained by double clicking on the desired item with the right or left button of the mouse. Figure 9-6 shows a typical signal report. Figure 9-7 shows a sample observation estimate plot, and Figure 9-8 shows a sample residual plot.
Properties of Signal LT-02 in Level Example:
Parameter Name: Level #2    Unit: PC    Component: Tank Level    Port: Internal
Allowable Error: 1.5    Maximum Error: 2.0    Confidence: 0.9975
Residual Moving Avg Points: 1    Validated: True

Monitoring performance results for OPERATING phase:
Number of Observations: 1477
MinValue 61.9546    MaxValue 63.1367    AvgValue 62.6814    StdDevVal 0.2504
MinEstimate 62.6072    MaxEstimate 63.1449    AvgEstimate 62.9719    StdDevEst 0.0549
RMSError 0.3970    MaxError 1.0329    AvgError 0.2947    StdDevErr 0.2662
Alarm Total: 1061, of type Pos Alarm Type: 0, Neg Alarm Type: 1061
Failure Total: 1103, of type Pos Failure Type: 0, Neg Failure Type: 1103
Failure History:
Time                 Type      Value      Estimate   Error
02/01/01 06:04:59    VAR-NOR   62.9253    62.9819    -0.0602
02/01/01 06:13:59    NEARNEG   62.8349    62.9434    -0.1121

Figure 9-6 Sample Run Information for a Signal
Figure 9-7 Sample Observation Estimate Plot for a Signal (LT-02 observation and estimation versus time for Level Example: Test File)
Figure 9-8 Sample Residual Plot for a Signal (LT-02 residual versus time for Level Example: Test File)

These reports and plots will display very quickly if the selected data set is enabled for on-line monitoring. If the selected data set is not enabled for on-line monitoring, the result might still be available. However, the data set will have to be processed first to generate the results information. In this case, a Data Analysis Required dialog (similar to the one shown in Figure 9-9) will be displayed, requesting permission to connect to the data sources and run the data set prior to displaying the requested results. If the operator responds Yes, the specified data set will run in the background and display the requested information. Depending on the model or data set size, it might take several minutes to complete a run.

Figure 9-9 Option to Run a Data Set If Not Enabled for On-Line Monitoring
Run summary information is also provided for the entire model. By selecting Summary from the Report menu, the Run Summary report is displayed as shown in Figure 9-10. If the selected data set is not enabled for on-line monitoring, the summary might still be available; however, the data set must be processed first to generate the summary information.
It should be noted that a SureSense model cannot be modified, trained, or saved from within the monitor window. A designer must have previously trained and saved the model before it can be used to display or generate monitoring results from within the monitor window. If results are requested from an untrained model, a message similar to the one shown in Figure 9-11 will be displayed.
Monitoring Results for Level Example -> Test File
Result Summary for OPERATING:
Data Points Processed: 1477
Average Processing Time: 1.9973 msec
Normalized RMS Error Percent: 0.1764 %
Single Cycle Alarm Total: 1064 alarms, consisting of Pos Alarm Type: 1, Neg Alarm Type: 1063
Failure Decision Total: 1103 failures, consisting of Pos Decision Type: 0, Neg Decision Type: 1103

Summary by Signal Parameter:
SignalName   MinValue   MinEstimate   RMSError   RMSError%
LT-01        62.8702    62.9672       0.0217     0.0342%
LT-02        61.9546    62.6072       0.3970     0.6305%
LT-03        61.4515    61.6583       0.0144     0.0232%
LT-04        58.5115    58.5236       0.0086     0.0147%

SignalName   MaxValue   MaxEstimate   MaxError   NumAlarms
LT-01        63.6775    63.6194       0.0951     2
LT-02        63.1367    63.1449       1.0329     1061
LT-03        62.0803    62.0601       0.2054     1
LT-04        59.0244    59.0235       0.0313     0

SignalName   AvgValue   AvgEstimate   AvgError   NumFailures
LT-01        63.4374    63.4347       0.0171     0
LT-02        62.6814    62.9719       0.2947     1103
LT-03        61.8914    61.8908       0.0107     0
LT-04        58.7707    58.7679       0.0068     0

SignalName   StdDevVal   StdDevEst   StdDevErr   Time1stFail
LT-01        0.0797      0.0705      0.0133      -
LT-02        0.2504      0.0549      0.2662      02/01/01 06:04:59
LT-03        0.0668      0.0590      0.0096      -
LT-04        0.0845      0.0867      0.0052      -

Figure 9-10 Model Run Summary Report
Figure 9-11 Notification That the Selected Model Is Not Currently Trained

9.5 Using the Microsoft Windows Scheduler

Any program scheduler can be used to initiate the command files described in the previous sections. Microsoft Windows provides a scheduler accessory with its Windows NT, 2000, and XP Professional operating systems. The following discussion illustrates the Windows 2000 scheduler. The NT and XP schedulers are nearly identical.
To set up the Windows scheduler, the following steps should be performed:
- 1. Select Programs from the Start menu, select Accessories, select System Tools, and select Scheduled Tasks. This will bring up the Task Scheduler window, similar to the one shown in Figure 9-12.
- 2. Double click on the Add Scheduled Task icon. This will start the Scheduled Task Wizard, similar to the one shown in Figure 9-13.
- 3. Click Next and select Browse in the window that is similar to the one shown in Figure 9-14.
Browse to the command file (for example, Unattended.bat), select it, and choose Open.
- 4. Name the scheduled task, and select the frequency for running the task in the window similar to Figure 9-15. Choose Next.
- 5. Enter the start time and date information in the next window similar to Figure 9-16. Choose Next.
- 6. Enter the operating system user name and password, if necessary, in the window similar to Figure 9-17. This is not the SureSense user name or password but the user's valid login to the Windows operating system. The facility's system administrator should be contacted for this information if necessary. Choose Next.
- 7. Review the information in the final window, similar to Figure 9-18, then choose Finish. This task will now appear in the Task Scheduler window (Figure 9-12).
This procedure creates a scheduled task that will run SureSense in unattended mode as directed by the Scheduled Task Wizard. The facility's system administrator should be consulted for further details about configuring the Microsoft Windows Task Scheduler.
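Where a scripted alternative to the wizard is preferred, and assuming a Windows version that includes the schtasks utility (Windows XP and later), an equivalent daily task can also be created from the command line; the task name, file path, and start time below are illustrative, and the exact time format accepted by /st varies by Windows version:

schtasks /create /tn "SureSense Daily Run" /tr "C:\SureSense\Unattended.bat" /sc daily /st 06:00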
Figure 9-12 Microsoft Windows Task Scheduler

Figure 9-13 Microsoft Windows Scheduled Task Wizard Introduction

Figure 9-14 Select the SureSense Command File (*.BAT) Using the Browse Button

Figure 9-15 Name the Scheduled Task and Select the Run Frequency

Figure 9-16 Provide Additional Scheduling Details

Figure 9-17 Provide Microsoft Windows Login Information

Figure 9-18 Review and Install the Scheduled Task
10 DECLARING THAT A MODEL IS READY FOR USE

Model development is prone to continual refinement and tinkering to optimize performance. In order for the model to be useful, however, it must eventually be placed into service. Section 10 addresses the steps necessary to declare that a model is ready for use, and Table 10-1 provides a short checklist to consider. The following sections discuss each of the items listed in Table 10-1.
Some of the recommendations are model specific, and other recommendations apply equally to the entire on-line monitoring system.
Table 10-1
Checklist for Determining That a Model Is Ready for Use

Item   Recommendation                                    Ready?

Completing the Model
  1.   Confirm the signal selection.
  2.   Confirm the adequacy of training data.
  3.   Verify training adequacy.
  4.   Verify fault-detection settings.
  5.   Declare the model ready.

Automating Data Acquisition
  6.   Set up ongoing data acquisition.
  7.   Assign responsibility for model testing.

Anticipating Failure Alarms
  8.   Establish an alarm response.
  9.   Determine retraining goals.
10.1 Completing the Model

This topical report provides detailed modeling guidelines, including recommendations for approaching model development. At some point, each model's behavior will be adequate so that it can be relied on as an on-line monitoring tool. The following activities should be considered as part of the model completion process:
- Confirm the signal selection. Remove uncorrelated signals, and determine if other signals should be added to the model. Removing signals from an existing model is easy. Adding signals is more difficult because historical data will have to be acquired for these signals and the model must be retrained for the new configuration. For this reason, it is always better to evaluate the signal selection carefully before acquiring data.
- Confirm the accuracy of training data. Bad data should not be allowed to remain in training data sets even if model performance appears acceptable.
- Verify training adequacy. Test the model using historical data, and confirm that the system operating space is adequately bounded by the training space. Document any instances in which the training space does not bound transients. Evaluate the settings of the phase determiner and estimator as part of the training adequacy.
- Verify fault-detection settings. Fault detection should not be too sensitive. Occasional alarms should be evenly distributed across all signals in the training data sets.
- After completing the model, declare the model ready.
10.2 Automating Data Acquisition

As the model is placed in service, new data should be tested periodically using the model.
Consider the following:
- Set up ongoing data acquisition. Set up the method by which data will be periodically made available to the model. This assumes that the model is operating in batch mode with data files periodically extracted from the plant computer's data storage system.
- Assign responsibility for model testing. Determine how new data will be tested and how the results will be evaluated.
10.3 Anticipating Failure Alarms

Signal (sensor) failures will be periodically identified by the on-line monitoring system. Based on the models developed to date, few of the identified failures will represent actual instrument drift or failure. More likely, the identified failures will represent process system excursions beyond the defined training space. Consider the following:
- Establish alarm response. In many cases, fault detection might be more sensitive than required by the instrumentation. Evaluate failure alarms, and determine if corrective action is required. Corrective action might involve instrument calibration or repair, or it might require changes to the on-line monitoring system model.
- Determine retraining goals. Decide under which conditions to retrain the model rather than accepting the periodic failure alarms.
11 REFERENCES
- 1. On-Line Monitoring of Instrument Channel Performance, Volume 2: Model Examples, Algorithm Details, and Reference Information, EPRI, Palo Alto, CA: 2004. 1003579.
- 2. On-Line Monitoring of Instrument Channel Performance, Volume 3: Applications to Nuclear Power Plant Technical Specification Instrumentation, EPRI, Palo Alto, CA: 2004. 1007930.
- 3. SureSense Diagnostic Monitoring Studio User's Guide, Version 2.0, Expert Microsystems, Inc., Orangevale, CA: 2004.
- 4. "Application of On-Line Performance Monitoring to Extend Calibration Intervals of Instrument Channel Calibrations Required by the Technical Specifications," On-Line Monitoring of Instrument Channel Performance. EPRI Topical Report TR-104965. Safety Evaluation by the Office of Nuclear Reactor Regulation, Project No. 669, U.S. Nuclear Regulatory Commission, July 2000.
- 5. On-Line Monitoring of Instrument Channel Performance, EPRI, Palo Alto, CA: 2000. 1000604.
- 6. On-Line Monitoring Cost Benefit Guide, EPRI, Palo Alto, CA: 2003. 1006777.
A GLOSSARY

This glossary provides definitions for technical terms used in the report or otherwise applied to on-line monitoring. Abbreviations used in the body of the report are also included in the glossary.
A accuracy (reference) - In process instrumentation, a number or quantity that defines a limit that error should not exceed when a device is used under specified operating conditions. Error represents the difference between the measured value and the standard or ideal value.
adaptive sequential probability - An inference procedure for determining a signal alarm condition based on the derived probability density function for the training data.
adjustment - The activity of physically adjusting a device to leave it in a state in which its performance characteristics are within acceptable limits.
ANL - Argonne National Laboratory.
ANN - Artificial neural network.
API - Application programming interface. A set of software functions or methods externally callable by another software program.
as-found - The condition in which a channel, or portion of a channel, is found after a period of operation and prior to any calibration.
as-left - The condition in which a channel, or portion of a channel, is left after calibration or surveillance check.
ASP - Adaptive sequential probability.
B B&W - Babcock and Wilcox.
BART - Bounded angle ratio test.
Bayesian Belief Network - A mathematical method of specifying the probabilistic relationships among events. In the context of on-line monitoring, a belief network is an expression of the probabilistic knowledge of a system and its operation.
Bayesian conditional probability - An inference procedure for determining signal failure based on a preceding number of alarms.
Bayesian sequential probability - An inference procedure for determining a signal alarm condition based on the derived probability density function for the training data.
BCP - Bayesian conditional probability.
Belief Network - See Bayesian Belief Network.
BSP - Bayesian sequential probability.
BWR - Boiling water reactor.
C calibration - The process of adjustment, as necessary, of the output of a device such that it responds within a specified tolerance to known values of input.
calibration interval - The elapsed time between the initiation or successful completion of calibrations or calibration checks on the same instrument, channel, instrument loop, or other specified system or device.
calibration (time-directed) - The calibration of an instrument at specified time intervals, without regard for the existing calibrated state of the instrument.
channel - An arrangement of components and modules as required to generate a single protective action signal when required by a generating station condition, a control signal, or an indication function.
channel calibration (typical Technical Specification definition) - The adjustment, as necessary, of the channel so that it responds within the required range and accuracy to known input. The channel calibration shall encompass the entire channel, including the required sensor, alarm, interlock, display, and trip functions. The channel calibration might be performed by means of any series of sequential, overlapping calibrations or total channel steps so that the entire channel is calibrated.
channel check - The qualitative assessment, by operator observation, of channel behavior during operation and includes, where possible, comparison of the channel indication to other indications from other redundant channels measuring the same parameter.
confidence interval - An interval that contains the population mean to a given probability.
CMP - Configuration management plan.
CSV - Comma delimited file format.
D DDD - Design description document.
D/P - Differential pressure.
D-matrix - The matrix of vectors selected by the MSET training process. These vectors represent the model in terms of its recognition of "normal" system behavior. Also, referred to as the training matrix or the process memory matrix.
desired value - A measurement value with no error existing.
deviation - The difference between the parameter estimate and the monitored signal (more commonly referred to as the residual).
DOE - Department of Energy.
domain - The operating states that form the basis for training a model.
drift - An undesired change in output over a period of time, which is unrelated to the input, environment, or load.
E EDF - Electricité de France.
error - The undesired algebraic difference between a value that results from measurement and a corresponding true value.
ESFAS - Engineered Safeguards Features Actuation System.
estimate - The best estimate of the actual process value; used interchangeably with parameter estimate.
F field calibration - Performing the activities of surveillance and adjustment using an external reference source.
flat-topping - The tendency for an estimate to follow a signal disturbance to the upper or lower limit of the data used for training and remain at that limit.
FRD - Functional requirements document.
G Gaussian probability - Normal probability.
GUI - Graphical user interface.
I ICMP - Instrument Calibration and Monitoring Program.
IMC - Instrument monitoring and calibration.
Initial training - See Training.
instrument channel - An arrangement of components and modules as required to generate a single protective action or indication signal that is required by a generating station condition. A channel loses its identity where single protective action signals are combined.
L linear - A straight-line relationship between one variable and another. When used to describe the output of an instrument, it means that the output is proportional to the input.
loop - See channel.
M M&TE - Measuring (or measurement) and test equipment.
margin - An additional allowance added to the instrument channel uncertainty to allow for unknown uncertainty components. The addition of margin moves the set point further away from the analytical limit or nominal process limits.

mean - The average value of a random sample or population. For n measurements of x, where i ranges from 1 to n, the mean is given by

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
median - The value of the middle number in an ordered set of numbers. Half the numbers have values that are greater than the median and half have values that are less than the median. If the data set has an even number, the median is the average of the two middle numbers.
MinMax - An algorithm that extracts vectors that bound a vector space defined by training data.
(See vector ordering).
model - The group of signals that have been collected for an analysis.
module - Any assembly of interconnecting components that constitutes an identifiable device, instrument, or piece of equipment. A module can be removed as a unit and replaced with a spare.
It has definable performance characteristics that permit it to be tested as a unit. A module can be a card, a drawout circuit breaker, or another subassembly of a larger device provided it meets the requirements of this definition.
monitoring - The activity of evaluating instrument channel performance to determine that it is performing within acceptable performance limits.
MSES - Multivariate State Estimation Studio.
MSET - Multivariate State Estimation Technique.
N NEPO - Nuclear Energy Plant Optimization.
noise - An unwanted component of a signal or variable. It causes a fluctuation in a signal that tends to obscure its information content.
nonlinear - A relationship between two or more variables that cannot be described as a straight line. When used to describe the output of an instrument, it means that the output is of a different magnitude than the input.
normal distribution - The density function of the normal random variable X, with mean $\mu$ and variance $\sigma^2$, is

$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
normalized - A term indicating that the data values for a group of disparate signals have been modified so that all signals have approximately equal weight in an analysis.
NRC - Nuclear Regulatory Commission.
0 ODBC - Open database connectivity.
OLM - On-line monitoring.
OLMS - On-line monitoring system.
on-line monitoring - An automated method of monitoring instrument performance and assessing instrument calibration while the plant is operating.
operating space - A defined region of operation.
operating state - A defined region of operation, often established by power level or equipment lineup. Often used interchangeably with operating space.
overfitting - The tendency for the estimate to follow a signal disturbance.
P parameter estimate - The best estimate of the actual process value.
pattern recognition - The ability of a system to match large amounts of input information simultaneously and generate a categorical or generalized output.
PDF - Probability density function.
PEANO - Process Evaluation and Analysis by Neural Operators.
phase - A defined region of operation used to separate the model into submodels.
phase determiner - A software function used to determine which phase applies to a set of observations.
population - The totality of the observations with which we are concerned.
probability density function - An expression of the distribution of probability for a continuous function. The probability contained within a given interval can vary from 0 to 1 and is expressed by:
$P(a < X < b) = \int_a^b f(X)\,dX$

PWR - Pressurized water reactor.
R random - Describing a variable whose value at a particular future instant cannot be predicted exactly but can only be estimated by a probability distribution function.
range - The difference between the minimum and maximum value in a set of data.
reference accuracy - A number or quantity that defines the limit that errors will not exceed when the device is used under reference operating conditions.
residual - The difference between the observation and the corresponding estimate for that observation. Also known as the residual error.
retraining - Any change made to the set of data originally selected as representative of system normal and expected behavior.
retraining for operating space - Retraining caused by modifying the data used for training. If the pool of data made available for training is modified, the vector selection for the training matrix will likely change, even if the model settings are unchanged.
retraining for settings - Retraining caused by adjusting model settings. Changing estimator settings, changing the number of signals, adjusting data limit filters, or modifying phase-determiner definitions for validation will require retraining, which optimizes model performance for a given set of training data.
RPS - Reactor Protection System.
RTD - Resistance temperature detector.
S S/G - Steam generator.
safety limit - A limit on an important process variable that is necessary to reasonably protect the integrity of physical barriers that guard against the uncontrolled release of radioactivity.
sample - A subset of a population.
SDF - Signal data file.
SDM - Signal disturbance magnitude.
SDMS - SureSense Diagnostic Monitoring System.
sensor - The portion of a channel that responds to changes in a plant variable or condition and converts the measured process variable into an electric or pneumatic signal.
set point - See trip set point.
signal - The output data from a channel.
signal conditioning - One or more modules that perform further signal conversion, buffering, isolation, or mathematical operations on the signal as needed.
span - The region for which a device is calibrated and verified to be operable.
spillover - The tendency for the estimate of one signal to follow a disturbance in a second highly correlated signal.
SPRT - Sequential probability ratio test (used with MSET to determine if a process is operating normally or abnormally).
SQL - Structured query language.
staggered test basis - Testing of one of the systems, subsystems, channels, or other designated components during the interval specified by the surveillance frequency, so that all systems, subsystems, channels, or other designated components are tested during n surveillance frequency intervals, where n is the total number of systems, subsystems, channels, or other designated components in the associated function.
standard deviation (population) - A measure of how widely values are dispersed from the population mean and is given by

$\sigma = \sqrt{\dfrac{n\sum x^2 - \left(\sum x\right)^2}{n^2}}$

standard deviation (sample) - A measure of how widely values are dispersed from the sample mean and is given by

$s = \sqrt{\dfrac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}}$

state space - The operating states that form the basis for training a model.
steady-state - A characteristic of a condition, such as a value, rate, periodicity, or amplitude, exhibiting only a negligible change over an arbitrary long period of time.
SureSense - A commercially supported implementation of the MSET software originally developed by Argonne National Laboratory.
surveillance - The activity of checking a device to determine if it is operating within acceptable limits.
surveillance interval - The elapsed time between the initiation or successful completion of a surveillance or surveillance check on the same instrument, channel, instrument loop, or other specified system or device.
T
test interval - See calibration interval.
time-directed calibration - See calibration (time-directed).
training - For a pattern recognition system such as MSET, the selected vectors that describe the operating state for normal and expected behavior.
training matrix - The matrix of vectors selected by the MSET training process. These vectors represent the model in terms of its recognition of "normal" system behavior. Also, referred to as the D-matrix or the process memory matrix.
trip set point - A predetermined value at which a bistable device changes state to indicate that the quantity under surveillance has reached the selected value.
U uncertainty - The amount to which an instniment channel's output is in doubt (or the allowance made therefore) due to possible errors either random or systematic that have not been corrected for. The uncertainty is generally identified within a probability and confidence level.
V V&V - Verification and validation.
variance (population) - A measure of how widely values are dispersed from the population mean and is given by

$\sigma^2 = \dfrac{n\sum x^2 - \left(\sum x\right)^2}{n^2}$
variance (sample) - A measure of how widely values are dispersed from the sample mean and is given by

$s^2 = \dfrac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}$

vector (of signals) - All data observations for a single time step. For example, if the data are contained in a spreadsheet, a single row of data is a vector.
vector ordering - An algorithm that adds representative vectors from the inner regions of a vector space to produce a more accurate process model. Vector ordering is differentiated from the MinMax algorithm in that it describes the interior of a space whereas the MinMax algorithm bounds the vector space.
vector pattern recognizer - An MSET estimation technique. The algorithm compares two vectors and defines their similarity as a function of the inverse of the Euclidean distance between the vectors.
vector similarity evaluation technique - An MSET estimation technique. The algorithm defines the similarity of two vectors as a function of the ratio between the Euclidean distance between the vectors and the sum of the root sum square (RSS) values of the vectors.
VPR - Vector Pattern Recognizer.
VSET - Vector Similarity Evaluation Technique.