ML23256A010: Difference between revisions



=Text=
{{#Wiki_filter:IAEA Workshop on Assessment and Reduction of Vulnerabilities to Common Cause Failures in Instrumentation and Control Systems in Nuclear Power Plants 18-22 September 2023 State-of-the-art approaches to reduce the potential for CCF in I&C systems Conditions to avoid the need for diversity in design Sushil Birla Senior Technical Advisor U.S. Nuclear Regulatory Commission Office of Nuclear Regulatory Research The views expressed herein are those of the author and do not represent an official position of the U.S. NRC.
{{#Wiki_filter:State-of-the-art approaches to reduce the potential for CCF in I&C systems Conditions to avoid the need for diversity in design Sushil Birla Senior Technical Advisor U.S. Nuclear Regulatory Commission Office of Nuclear Regulatory Research The views expressed herein are those of the author and do not represent an official position of the U.S. NRC.


Terminology & Scope
To assimilate knowledge from outside the NPP industry and avoid ambiguity
* Sources of definitions are broader than NPP-specific standards.
* Context in focus: Operating power reactor protection systems.
* Focus: Hazards from (systemic) common causes:
  - Rooted in engineering deficiencies
  - That may degrade the redundancy and defense-in-depth characteristics
Hazard: potential for harm through the degradation of a safety function allocated to the object under analysis
Examples of sources: ISO/IEC/IEEE 24765; ISO/IEC Systems & Software SQuaRE series; ISO/IEC 15026


Meaning of state-of-the-art in this presentation
* State-of-the-art: Capability demonstrated in leading-edge implementations; not yet scaled up
* State-of-the-practice: Best-in-class; best practices, e.g., as seen in industry consensus standards
* Current practice: As seen in many organizations


Reference Framework for Assurance
[Figure: reference model from IEEE Std 1012, driven by requirements from NPP Safety Analysis.]
* System Development phases: Plans, Concept, Requirements, Architecture, Detailed design, Implementation, Testing
* Verification Validation (V&V), one activity per phase: Vp, Vc, Vr, Va, Vdd, Vi, Vt
* Safety Engineering, one hazard analysis per phase (labels as shown on the slide): HAp, HAc, HAr, HAr, HAdd, HAi, HAi


Identifying elements to reduce the uncertainty space
Deficiencies in:
* Hazard identification (all phases)
* Requirements specification
* Architectural specification
* Detailed design specification
* Implementation (coding)
* Verification
Conditions to reduce associated uncertainties:
* Conditions on methods and tools
* Logical integration of all the evidence
* Reducing inconsistencies in judgment


[Figure: Quality of Design shown against Prevention and Mitigation, marking the current state, the desired state, and fault tolerance in design.]
Changes needed to prevent CCF:
* Objective evaluation criteria
* Paradigm
* State of practice
* Competence
* Culture


Defect-prevention through Refinement
[Figure: successive refinement steps from abstraction to concretion, and from declarative (what) to imperative (how).]
* Requirements
* Architecture
* Detailed design
* Implementation


Refinement: A key preventative technique
Development phase, with the constraint on language for that phase:
* Requirements: Domain-specific controlled natural language
* Architecture: Domain-specific architecture modeling language
* Detailed design: Domain-specific design specification language
* Implementation: Domain-specific coding/programming language
Each refinement step from one phase to the next is semantically compatible.
Reduce defects by reusing composable assets: see IEEE Std 1517; ISO/IEC 26550 family
* Problem space: Domain modeling
* Solution space: Domain engineering


Approaches and Methods for Improved Requirements
For each approach or method: the source of uncertainty it addresses (i.e., method effectiveness in addressing uncertainty; benefit), followed by additional explanation.
* Restricted or Constrained or Controlled Natural Language
  - Benefit: Improved specification: reduced ambiguity; improved consistency and verifiability.
  - Additional explanation: Supports auto-generation of verification conditions (e.g., test cases). Example of enabler: Web Ontology Language (OWL).
* Behavior Specification
  - Benefit: As above. The finite state machine (FSM) paradigm supports stepwise refinement and flow-down of behavior specification. Facilitates hazard analysis (e.g., STPA).
  - Additional explanation: Also supports auto-generation of verification conditions (e.g., test cases).
* Domain Specific Specialization
  - Benefit: Higher quality of specification and V&V. Enables re-use of pre-verified building blocks, specific to an application domain.
  - Additional explanation: Example: ReqSpec in AADL.
* Theorem Proving
  - Benefit: Provides proof that a conclusion can be inferred deductively from a proven chain of premises. Improved V&V.
  - Additional explanation: Can also be used to identify gaps in the chain of premises. Example: Used as a module in SCR.
* Model-Checking
  - Benefit: Checks that the flow-down is correct and consistent.
* Review, Walkthrough, and Inspection (RWI)
  - Benefit: Fills the gaps in requirements V&V uncovered by analytical methods.
Sources: IEC 61508, IEC 62279
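As a minimal sketch of how an FSM-based behavior specification can support auto-generation of verification conditions, the Python below derives one test case per specified transition (transition coverage). The states, events, and the `TRANSITIONS` table are hypothetical illustrations; they are not taken from the presentation, from any plant design, or from the cited standards.

```python
from collections import deque

# Hypothetical behavior specification: (state, event) -> next state.
TRANSITIONS = {
    ("MONITORING", "setpoint_exceeded"): "TRIPPED",
    ("MONITORING", "self_test_failed"): "FAULT",
    ("TRIPPED", "operator_reset"): "MONITORING",
    ("FAULT", "repair_confirmed"): "MONITORING",
}
INITIAL_STATE = "MONITORING"


def shortest_event_path(target):
    """Breadth-first search for the shortest event sequence from the initial state to target."""
    queue = deque([(INITIAL_STATE, [])])
    visited = {INITIAL_STATE}
    while queue:
        state, events = queue.popleft()
        if state == target:
            return events
        for (src, event), dst in TRANSITIONS.items():
            if src == state and dst not in visited:
                visited.add(dst)
                queue.append((dst, events + [event]))
    return None  # target is unreachable from the initial state


def generate_transition_tests():
    """Return one test case per specified transition (transition coverage)."""
    tests = []
    for (src, event), dst in TRANSITIONS.items():
        setup = shortest_event_path(src)
        if setup is None:
            continue  # an unreachable source state is itself a finding to review
        tests.append({"events": setup + [event], "expected_state": dst})
    return tests


def run(events):
    """Execute an event sequence against the specification and return the final state."""
    state = INITIAL_STATE
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state
```

Each generated case pairs an event sequence with the state the specification expects, so the same table serves as both the specification and the test oracle. In practice the cases would be run against the implementation rather than against the table itself.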


Approaches to Address Uncertainties Introduced in Methods used for Requirements
For each approach or method: the uncertainties it introduces, followed by approaches to address them.
* Restricted Natural Language
  - Uncertainties introduced: Comprehensibility is traded off for disambiguation.
  - Approach: Annotations in the model bridge the gap.
* Behavior Specification
  - Uncertainties introduced: Potential for semantic inconsistency across different FSM modeling environments.
  - Approach: Constrain the interacting environments to eliminate the inconsistencies, specific to the domain of interest.
* Domain Specific Specialization
  - Uncertainties introduced: When a new application does not fit in a predefined domain, adaptation may degrade model validity.
  - Approach: RWI by expert team.
* Theorem Proving
  - Uncertainties introduced: Language transformations required to enable its use may introduce hard-to-find semantic inconsistencies.
  - Approach: Use in combination with domain-specific environments for specification, validation and verification. RWI by expert team.
* Model-Checking
  - Uncertainties introduced: Model-checking is highly dependent on the fidelity of the model to reality.
  - Approach: As for theorem proving.
* Review, Walkthrough, Inspection
  - Uncertainties introduced: Human fallibility.
  - Approach: Independent RWI by expert team.


Approaches and Methods used for Architecture
For each approach or method: the source of uncertainty it addresses (i.e., method effectiveness in addressing uncertainty; benefit), followed by additional explanation.
* Theorem Proving
  - Benefit: See table for requirements.
  - Additional explanation: See table for requirements.
* Prevention through Modeling Constraints
  - Benefit: Limits the uncertainty space by preventing or detecting incorrect or unverifiable constructs in the models.
* Model-Checking
  - Benefit: See table for requirements.
* Review, Walkthrough, and Inspection (RWI)
  - Benefit: See table for requirements.
Sources: IEC 61508, IEC 62279


Approaches to Address Uncertainties Introduced in Methods used for Unit Verification 1/2
For each approach or method: the uncertainties it introduces, followed by approaches to address them.
* Correct by construction
  - Uncertainties introduced: Undetected incorrect transformations.
  - Approach: Independent V&V (IV&V) of the integrated tool suite, libraries, other reusable assets, and development environment(s); IV&V of semantic consistency across interfaces.
* Safe subset of the programming language
  - Uncertainties introduced: Programmer may use features outside the safe subset.
  - Approach: For the safe subset language and the tools enforcing usage within the safe subset: IV&V, pre-certification, configuration control, change control.
* Model-checking
  - Uncertainties introduced: See prior slide.
  - Approach: See prior slide.
* Static analysis
  - Uncertainties introduced: Does not discover faults which occur only during execution. Requires source code.
  - Approach: Complement with model-checking.
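A tool that enforces usage within a safe language subset can be sketched as a static check over the syntax tree. The example below is illustrative only: it uses Python's standard `ast` module, and the rule set (`FORBIDDEN_NODES`, `FORBIDDEN_CALLS`) is a hypothetical stand-in, not a subset defined by the presentation or by any standard.

```python
import ast

# Hypothetical safe-subset rules: syntax node types disallowed in this sketch.
FORBIDDEN_NODES = (ast.While, ast.Lambda, ast.Global, ast.Try)
# Hypothetical list of built-in calls treated as outside the subset.
FORBIDDEN_CALLS = ("eval", "exec")


def find_violations(source):
    """Return sorted (line, description) pairs for constructs outside the safe subset."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FORBIDDEN_NODES):
            violations.append((node.lineno, type(node).__name__))
        elif (isinstance(node, ast.Call)
              and isinstance(node.func, ast.Name)
              and node.func.id in FORBIDDEN_CALLS):
            violations.append((node.lineno, "call to " + node.func.id))
    return sorted(violations)
```

As the slide notes for static analysis in general, a check of this kind needs the source code and cannot discover faults that occur only during execution.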


Schedulability
* Need to verify that the workload fits within the available resources
* Workload
  - Computing (esp. in microprocessor-based platforms)
  - Communication (esp. in serial networks)
  - Typically, cyclic, requiring accurate periodicity
  - Typically, many tasks requiring different amounts of resources
* Resources (typically, shared across different tasks)
  - Time (computing; communication)
  - Space (memory)
* In general, high computational complexity
  - Constraints can be applied to reduce the complexity
  - Example: Programmable Logic Controllers (PLCs)
For more information, see
* RIL-1101 Appendix I
* H. KOPETZ, Simplicity is complex, Springer (2019)
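The presentation does not name a specific schedulability method. As one illustration of how constraints reduce the complexity of the check, the sketch below assumes independent periodic tasks on a single processor with rate-monotonic priorities, for which the Liu and Layland utilization bound gives a sufficient (not necessary) test. The task set `TASKS` and its timing values are hypothetical.

```python
def utilization(tasks):
    """Total processor utilization: sum of worst-case execution time / period."""
    return sum(wcet / period for wcet, period in tasks)


def rm_utilization_bound(n):
    """Liu and Layland bound for n independent periodic tasks under rate-monotonic priorities."""
    return n * (2 ** (1.0 / n) - 1)


def passes_sufficient_test(tasks):
    """True if the task set is guaranteed schedulable by the utilization bound.

    A False result is inconclusive: the bound is sufficient, not necessary.
    """
    return utilization(tasks) <= rm_utilization_bound(len(tasks))


# Hypothetical cyclic workload: (worst-case execution time, period) in milliseconds.
TASKS = [(2.0, 10.0), (5.0, 50.0), (10.0, 100.0)]
```

A task set that fails this test is not necessarily unschedulable; it would need a more exact analysis, which is where the computational complexity noted above begins to grow.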


Conditions to Reduce Uncertainties Associated with Tools - Examples
: 1. The development environment is qualified and certified for the domain of usage.
: 2. The development environment is maintained under configuration management (as a set).
: 9. V&V tools are qualified by people independent of the developers and users of these tools.
: 10. V&V tools are qualified using methods that are independent of the methods implemented by these tools.


Summary: Concepts to support Assurability
* Abstraction
* Refinement
* Disambiguation
* Domain modeling; domain engineering
* Compositionality
* Schedulability
For practicable guidance, see R. Hite, et al, SYMPLE: A complexity-aware approach for realizing verifiable FPGA-based digital I&C for safety critical applications https://www.ans.org/pubs/proceedings/article-49775/


Some known limitations
* Validating results of hazard analysis
  - Did it really identify all causes that could degrade the safety function?
* Validating assumptions about the environment of the safety system, e.g.:
  - Conditions of operation and maintenance
  - Configuration control change impact analysis
* Qualifying suite of tools from different sources
  - Libraries
  - Underlying languages
* Infrastructure for independent V&V


Reasoning Model to support performance-based evaluation (based on the Toulmin model 1)
[Figure: Premise / Evidence leads through Reasoning to an Assertion.]
* A theoretical or causal model is the basis for an inference rule, which is used in the reasoning.
* Qualifiers (Strength; Condition) are attached to the assertion.
* Influences on validity of proposition: Doubts/Defeaters; Rebuttals.
1 Toulmin, S., The Uses of Argument, Cambridge, UK: Cambridge University Press, 1958


Judgment
Decide:
The safety claim is satisfied unconditionally (i.e., the residual uncertainty has an insignificant effect on the safety claim).
* No one can find any uncontrolled hazard with the potential to degrade the performance of the safety function
* No one can find any unmitigated "defeater"
The safety claim is not satisfied with the given evidence.
The defeaters are identified and associated with the respective sub-claims.
The safety claim does not hold.
* Fallacies in logic.
* Deficiencies in evidence.
The state-of-the-art can support consistent judgment based on objective, scientific evidence and logical reasoning
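The association of defeaters with sub-claims can be sketched as a small data structure. The classes and function names below are hypothetical and are not part of the presentation or of the Toulmin model itself. The check covers only defeaters that have been recorded, so it approximates the "no one can find any unmitigated defeater" criterion and does not establish it.

```python
from dataclasses import dataclass, field


@dataclass
class Defeater:
    description: str
    mitigated: bool = False


@dataclass
class Claim:
    statement: str
    evidence: list = field(default_factory=list)
    defeaters: list = field(default_factory=list)
    subclaims: list = field(default_factory=list)


def unmitigated_defeaters(claim):
    """Collect (claim statement, defeater) pairs that remain unmitigated, depth-first."""
    found = [(claim.statement, d) for d in claim.defeaters if not d.mitigated]
    for sub in claim.subclaims:
        found.extend(unmitigated_defeaters(sub))
    return found


def satisfied_unconditionally(claim):
    """True only when no unmitigated defeater is recorded anywhere in the argument."""
    return not unmitigated_defeaters(claim)
```

Returning the pairs rather than a bare yes or no keeps each open defeater tied to the sub-claim it affects, which is what a reviewer needs in order to decide whether the deficiency lies in the logic or in the evidence.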


Economics!
[Figure: measures arranged from engineering time to run time.]
* Preventative (engineering time): Prevent hazard; Prevent propagation; Verify. Potential to decrease intrinsic cost.
* Reactive (run time): Monitor, Detect, Intervene; Diverse redundancy. Cost increases.


Acronyms & Abbreviations 1/2
* AADL: Architecture Analysis and Design Language
* CCF: Common cause failure
* Dev: Development
* Engrg: Engineering
* DI&C: Digital Instrumentation and Control
* EPRI: Electric Power Research Institute
* esp.: Especially
* FSM: Finite state machine
* HAp: Hazard analysis of plans
* HAr: Hazard analysis of requirements
* HAa: Hazard analysis of architecture
* HAdd: Hazard analysis of detailed design
* HAi: Hazard analysis of implementation
* HAt: Hazard analysis of testing (including test specifications and oracles)
* IAEA: International Atomic Energy Agency
* I&C: Instrumentation and Control
* IEC: International Electrotechnical Commission
* IEEE: Institute of Electrical and Electronics Engineers
* ISO: International Organization for Standardization
* IV&V: Independent Verification and Validation
* NPP: Nuclear Power Plant
* NRC: U.S. Nuclear Regulatory Commission
* OWL: Web Ontology Language
* RIL: Research Information Letter
* RPS: Reactor Protection System
* RWI: Review, Walkthrough, and Inspection


Acronyms & Abbreviations 2/2
* R&D: Research and Development
* Reqmts: Requirements
* SCR: Software Cost Reduction (set of techniques for designing software systems)
* spec: specification
* SQuaRE: Systems and Software Quality Requirements and Evaluation
* STPA: System-Theoretic Process Analysis (method of hazard analysis)
* Std: Standard
* V&V: Verification and Validation
* Vp: V&V of plans
* Vr: V&V of requirements
* Va: V&V of architecture
* Vdd: V&V of detailed design
* Vi: V&V of implementation
* Vt: V&V of testing (including test specifications and oracles)


Discussion Supporting slides
4. Degrees or levels of safety qualification.
5. Monitoring & monitored elements of the system.
6. Shared resources, e.g., equipment for monitoring or servicing.
* Analysis of the system demonstrates that unintended behavior is not possible.
* Interaction across different sources of uncertainty is avoided.
State-of-the-art methods enable satisfaction of these conditions


Examples: Identifying hazards controlling conditions
RIL-1101 Table 1: Considerations in broadly evaluating hazard analysis
Each contributory hazard (ID H-0-n) is followed by the conditions that reduce the hazard space (ID H-0-nGi).
* H-0-6: Hazard controls needed to satisfy system constraints (which prevent hazards) are inadequate.
  - H-0-6G1: Hazard controls are identified and validated to be correct, complete, and consistent. [H-0-7G1]
* H-0-7: Flow-down to verifiable requirements and constraints is inadequate.
  - H-0-7G1: Requirements and constraints [H-0-6G1] are formulated and validated to be correct, complete, consistent.
* H-0-11: Required control action is degraded.
  - H-0-11G1: Each required control action is analyzed for ways in which it can lead to a hazard, e.g. (where ~ stands for the control action):
: 1. ~ not provided when needed
: 2. ~ provided when not needed
: 3. ~ provided at incorrect time
: 4. ~ provided too long
: 5. ~ provided too short
: 6. ~ is intermittent
: 7. ~ interferes with another ..
: 8. ~ exhibits Byzantine behavior
: 9. Incorrect state transition occurs
: 10. Incorrect input value
Sources: RIL-1101; RIL-1002
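The guide phrases listed for H-0-11G1 lend themselves to a systematic checklist: pair every required control action with every phrase so that no combination is skipped during review. The sketch below is a hypothetical illustration. The function name and the example control action are assumptions, and only the complete phrases that apply to a control action (items 1 to 6 and 8) are included; item 7 is truncated on the slide and is left out rather than completed.

```python
# Guide phrases from the H-0-11G1 list; each is applied to a control action.
GUIDE_PHRASES = [
    "not provided when needed",
    "provided when not needed",
    "provided at incorrect time",
    "provided too long",
    "provided too short",
    "is intermittent",
    "exhibits Byzantine behavior",
]


def hazard_prompts(control_actions):
    """Pair every control action with every guide phrase, in list order."""
    prompts = []
    for action in control_actions:
        for phrase in GUIDE_PHRASES:
            prompts.append(action + " " + phrase)
    return prompts
```

For example, `hazard_prompts(["Reactor trip signal"])` yields seven prompts, the first being "Reactor trip signal not provided when needed". Each prompt is a question for the analyst, not a finding.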
 
Examples: Controlling causes of hazards from complexity
Each contributory hazard (ID H-S-n) is followed by the conditions that reduce the hazard space (ID H-S-nGi).
* 1: The system is not sufficiently verifiable and understandable ... considerations and criteria are not formulated at the beginning of the development lifecycle; therefore, corresponding architectural constraints are not formalized and checked.
  - 1G1: Verifiability is a required property, flowing down from the system to its most finely grained constituents.
  - 1G1.1: Verifiability is checked at every phase, at every level of integration, before the next phase.
  - 1.1G1.1: The behavior is unambiguously specified (incl. unexpected inputs) at every level of integration.
  - 1.1G1.2: The flow-down (from composition to decomposition) ensures that:
: 1. Allocated behaviors satisfy the behavior specified at the next higher level.
: 2. Unspecified behavior does not occur.
  - 1.1G1.3: System behavior is composed of element behaviors such that, when all elements are verified individually, their compositions may also be considered verified; no unspecified behavior emerges.
  - 1.1G1.4: Development follows a refinement process.
* 1.1.1: Unanalyzed/unanalyzable conditions exist, e.g. unknown/unwanted system states.
  - 1.1.1G1: Static analyzability: System is statically analyzable.
: 1. All states, including fault conditions, are known.
: 2. All fault states that lead to failure modes are known.
: 3. The safe-state space of the system is known.
* 1.2
* 1.3
* 2: Comprehensibility: System behavior not interpreted correctly/consistently by its users [H-S-1].
  - 2G1: Behavior is completely and explicitly specified.
  - 2G3: Behavior is understood or interpreted completely, correctly, consistently, and unambiguously.
  - 2G6: The architecture is specified such that it is unambiguously interpretable by the community of its users (e.g., reviewers, architects, designers, implementers), that is, the people and the tools they use.
Source: RIL-1101


27}}
27}}

Latest revision as of 08:54, 25 November 2024

State-of-the-art Approaches to Reduce the Potential for CCF in I&C Systems Conditions to Avoid the Need for Diversity in Design
ML23256A010
Person / Time
Issue date: 09/18/2023
From: Sushil Birla
NRC/RES/DE
To:
Sushil Birla 301-415-2311
References
Download: ML23256A010 (27)


Text

State-of-the-art approaches to reduce the potential for CCF in I&C systems Conditions to avoid the need for diversity in design Sushil Birla Senior Technical Advisor U.S. Nuclear Regulatory Commission Office of Nuclear Regulatory Research The views expressed herein are those of the author and do not represent an official position of the U.S. NRC.

IAEA Workshop on Assessment and Reduction of Vulnerabilities to Common Cause Failures in Instrumentation and Control Systems in Nuclear Power Plants 18-22 September 2023

Terminology & Scope

To assimilate knowledge from outside the NPP industry and avoid ambiguity:

  • Sources of definitions are broader than NPP-specific standards.
  • Context in focus: Operating power reactor protection systems.
  • Focus: Hazards from (systemic) common causes:
      • Rooted in engineering deficiencies
      • That may degrade the redundancy and defense-in-depth characteristics
  • Hazard: potential for harm through the degradation of a safety function allocated to the object under analysis

Examples of sources: ISO/IEC/IEEE 24765; ISO/IEC Systems & Software SQuaRE series; ISO/IEC 15026

Meaning of state-of-the-art in this presentation

  • State-of-the-art: Capability demonstrated in leading-edge implementations; not yet scaled up.
  • State-of-the-practice: Best-in-class; best practices, e.g., as seen in industry consensus standards.
  • Current practice: As seen in many organizations.

Reference Framework for Assurance Plans

[Figure: reference model from IEEE Std 1012. System development phases: Concept, Requirements, Architecture, Detailed design, Implementation, Testing. Safety engineering: requirements from NPP safety analysis and hazard analysis activities (HAp, HAc, HAr, HAdd, HAi). Verification and validation (V&V) activities: Vp, Vc, Vr, Va, Vdd, Vi, Vt.]

Identifying elements to reduce the uncertainty space

Conditions to reduce associated uncertainties (* = all phases), together with:

  • Conditions on methods and tools
  • Logical integration of all the evidence
  • Reducing inconsistencies in judgment

Deficiencies in:

  • Hazard identification*
  • Requirements specification
  • Architectural specification
  • Detailed design specification
  • Implementation (coding)
  • Verification

[Figure labels: Quality of Design; Prevention; Mitigation; Fault tolerance in Design; Desired state; Current state; Changes needed to prevent CCF; Objective evaluation criteria; Paradigm; State of practice; Competence; Culture]

Defect-prevention through Refinement

[Figure: refinement through the phases Requirements, Architecture, Detailed design, Implementation. Axis labels: Abstraction to Concretion; Declarative (what) to Imperative (how).]

Refinement: A key preventative technique

Constraints on language for each development phase:

  • Requirements: Domain-specific controlled natural language
  • Architecture: Domain-specific architecture modeling language
  • Detailed design: Domain-specific design specification language
  • Implementation: Domain-specific coding/programming language

Each refinement from one phase to the next is semantically compatible.

Problem space: Domain modeling. Solution space: Domain engineering.

Reduce defects by reusing composable assets: see IEEE Std 1517; ISO/IEC 26550 family.

Approaches and Methods for Improved Requirements

For each approach or method: the source of uncertainty addressed by the method (i.e., method effectiveness in addressing uncertainty) and its benefit, followed by additional explanation.

  • Restricted or Constrained or Controlled Natural Language
      • Addresses / benefit: Improved specification: reduced ambiguity; improved consistency and verifiability.
      • Explanation: Supports auto-generation of verification conditions (e.g., test cases). Example of enabler: Web Ontology Language (OWL).
  • Behavior Specification
      • Addresses / benefit: As above.
      • Explanation: The finite state machine (FSM) paradigm supports stepwise refinement and flow-down of behavior specification. Also supports auto-generation of verification conditions (e.g., test cases). Facilitates hazard analysis (e.g., STPA).
  • Domain Specific Specialization
      • Addresses / benefit: Higher quality of specification and V&V.
      • Explanation: Enables re-use of pre-verified building blocks, specific to an application domain. Example: ReqSpec in AADL.
  • Theorem Proving
      • Addresses / benefit: Provides proof that a conclusion can be inferred deductively from a proven chain of premises. Improved V&V.
      • Explanation: Can also be used to identify gaps in the chain of premises. Example: Used as a module in SCR.
  • Model-Checking
      • Addresses / benefit: Checks that the flow-down is correct and consistent.
  • Review, Walkthrough, and Inspection (RWI)
      • Addresses / benefit: Fills the gaps in requirements V&V uncovered by analytical methods.

Sources: IEC 61508, IEC 62279
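The claim that an FSM-based behavior specification supports auto-generation of verification conditions can be illustrated with a minimal sketch. The two-state trip logic, its input names, and all function names below are hypothetical and are not taken from the presentation or from any cited standard. The sketch checks that every (state, input) pair is specified, then derives one test per transition.

```python
from collections import deque

# Hypothetical trip-logic FSM for illustration: (state, input) -> next state.
STATES = {"NORMAL", "TRIPPED"}
INPUTS = {"high", "ok", "reset"}
INITIAL = "NORMAL"
TRANSITIONS = {
    ("NORMAL", "high"): "TRIPPED",
    ("NORMAL", "ok"): "NORMAL",
    ("NORMAL", "reset"): "NORMAL",
    ("TRIPPED", "high"): "TRIPPED",
    ("TRIPPED", "ok"): "TRIPPED",
    ("TRIPPED", "reset"): "NORMAL",
}


def unspecified_pairs(transitions, states, inputs):
    """List (state, input) pairs with no specified behavior."""
    return sorted((s, i) for s in states for i in inputs if (s, i) not in transitions)


def shortest_paths(transitions, initial):
    """Breadth-first search: shortest input sequence reaching each reachable state."""
    paths = {initial: []}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for (src, inp), dst in sorted(transitions.items()):
            if src == state and dst not in paths:
                paths[dst] = paths[state] + [inp]
                queue.append(dst)
    return paths


def transition_tests(transitions, initial):
    """One test per reachable transition: (input sequence, expected final state)."""
    paths = shortest_paths(transitions, initial)
    return [
        (paths[src] + [inp], dst)
        for (src, inp), dst in sorted(transitions.items())
        if src in paths
    ]


def run(transitions, initial, inputs):
    """Execute the specification on an input sequence and return the final state."""
    state = initial
    for inp in inputs:
        state = transitions[(state, inp)]
    return state
```

In practice the generated sequences would be run against the implementation rather than against the specification itself; `run` is included here only so the sketch is self-checking.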

Approaches to Address Uncertainties Introduced in Methods used for Requirements

  • Restricted Natural Language
      • Uncertainty introduced: Comprehensibility is traded off for disambiguation.
      • Approach to address it: Annotations in the model bridge the gap.
  • Behavior Specification
      • Uncertainty introduced: Potential for semantic inconsistency across different FSM modeling environments.
      • Approach to address it: Constrain the interacting environments to eliminate the inconsistencies, specific to the domain of interest.
  • Domain Specific Specialization
      • Uncertainty introduced: When a new application does not fit in a predefined domain, adaptation may degrade model validity.
      • Approach to address it: RWI by expert team.
  • Theorem Proving
      • Uncertainty introduced: Language transformations required to enable its use may introduce hard-to-find semantic inconsistencies.
      • Approach to address it: Use in combination with domain-specific environments for specification, validation and verification. RWI by expert team.
  • Model-Checking
      • Uncertainty introduced: Model-checking is highly dependent on the fidelity of the model to reality.
      • Approach to address it: As for theorem proving.
  • Review, Walkthrough, Inspection
      • Uncertainty introduced: Human fallibility.
      • Approach to address it: Independent RWI by expert team.

Approaches and Methods used for Architecture

For each approach or method: the source of uncertainty addressed by the method (i.e., method effectiveness in addressing uncertainty) and its benefit, followed by additional explanation.

  • Theorem Proving: See table for requirements.
  • Prevention through Modeling Constraints: Limits the uncertainty space by preventing or detecting incorrect or unverifiable constructs in the models.
  • Model-Checking: See table for requirements.
  • Review, Walkthrough, and Inspection (RWI): See table for requirements.

Sources: IEC 61508, IEC 62279

Approaches to Address Uncertainties Introduced in Methods used for Unit Verification 1/2

  • Correct by construction
      • Uncertainty introduced: Undetected incorrect transformations.
      • Approach to address it: Independent V&V (IV&V) of the integrated tool suite, libraries, other reusable assets, and development environment(s). Semantic consistency across interfaces.
  • Safe subset of the programming language
      • Uncertainty introduced: Programmer may use features outside the safe subset.
      • Approach to address it: For the safe subset language and the tools enforcing usage within the safe subset: IV&V, pre-certification, configuration control, change control.
  • Model-checking
      • Uncertainty introduced: See prior slide.
      • Approach to address it: See prior slide.
  • Static analysis
      • Uncertainty introduced: Does not discover faults which occur only during execution. Requires source code.
      • Approach to address it: Complement with model-checking.

Schedulability

Need to verify that the workload fits within the available resources.

  • Workload
      • Computing (esp. in microprocessor-based platforms)
      • Communication (esp. in serial networks)
      • Typically cyclic, requiring accurate periodicity
      • Typically, many tasks requiring different amounts of resources
  • Resources (typically shared across different tasks)
      • Time (computing; communication)
      • Space (memory)
  • In general, high computational complexity
      • Constraints can be applied to reduce the complexity
      • Example: Programmable Logic Controllers (PLCs)

For more information, see RIL-1101 Appendix I; H. Kopetz, Simplicity is Complex, Springer (2019)
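As a rough illustration of checking that a cyclic workload fits the available computing time, the sketch below applies the classic Liu and Layland utilization bound for rate-monotonic scheduling. This technique is an assumption chosen for illustration; the presentation does not name a specific schedulability test. It applies only to independent periodic tasks on a single processor with deadlines equal to periods, and the task figures are hypothetical.

```python
def utilization(tasks):
    """Total processor utilization; each task is (worst-case execution time, period)."""
    return sum(wcet / period for wcet, period in tasks)


def rm_bound(n):
    """Liu and Layland bound n * (2**(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def passes_rm_test(tasks):
    """Sufficient (not necessary) test: True means the task set is schedulable
    under rate-monotonic priorities; False is inconclusive unless utilization > 1."""
    return utilization(tasks) <= rm_bound(len(tasks))


# Hypothetical task sets, in milliseconds: (WCET, period).
light_load = [(1, 4), (1, 5), (2, 10)]
overload = [(2, 4), (2, 5), (2, 10)]

print(passes_rm_test(light_load))
print(passes_rm_test(overload))
```

Because the bound is only sufficient, a task set that fails it may still be schedulable; an exact analysis (or a constrained execution model such as a fixed PLC scan cycle) would be needed to decide such cases.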

Conditions to Reduce Uncertainties Associated with Tools - Examples

1. The development environment is qualified and certified for the domain of usage.
2. The development environment is maintained under configuration management (as a set).
3. Restrictions for safe use of a tool are identified and enforced.
4. Semantics are preserved in information exchanged across tools used in system development.
5. The architectural description method is unambiguous.
6. Methods and languages used to describe, represent, or specify architectures support unambiguous transformation across development phases and across dissimilar elements from different sources.
7. Automation used for the creation of a work product is independent of automation used for the V&V of that work product.
8. The developers of a work product are different from those performing its V&V.
9. V&V tools are qualified by people independent of the developers and users of these tools.
10. V&V tools are qualified using methods that are independent of the methods implemented by these tools.
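Conditions 7 and 8 above lend themselves to a simple mechanical check on project records. The sketch below is illustrative only: the `WorkProduct` record, its field names, and the example names are hypothetical, and comparing identifiers is a crude proxy for the independence the conditions call for (for example, two differently named tools may still share a library).

```python
from dataclasses import dataclass, field


@dataclass
class WorkProduct:
    """Hypothetical record of who and what produced and verified a work product."""
    name: str
    developers: set = field(default_factory=set)
    verifiers: set = field(default_factory=set)
    creation_tools: set = field(default_factory=set)
    vv_tools: set = field(default_factory=set)


def independence_violations(wp):
    """Return a list of violated independence conditions for one work product."""
    violations = []
    shared_tools = wp.creation_tools & wp.vv_tools
    if shared_tools:
        violations.append(f"condition 7: shared automation {sorted(shared_tools)}")
    shared_people = wp.developers & wp.verifiers
    if shared_people:
        violations.append(f"condition 8: shared personnel {sorted(shared_people)}")
    return violations


# Hypothetical example data.
trip_logic = WorkProduct(
    name="trip logic design",
    developers={"team_a"},
    verifiers={"team_b"},
    creation_tools={"code_generator_x"},
    vv_tools={"model_checker_y"},
)
print(independence_violations(trip_logic))
```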

Summary: Concepts to support Assurability

  • Abstraction
  • Refinement
  • Disambiguation
  • Domain modeling; domain engineering
  • Compositionality
  • Schedulability

For practicable guidance, see R. Hite, et al., SYMPLE: A complexity-aware approach for realizing verifiable FPGA-based digital I&C for safety critical applications, https://www.ans.org/pubs/proceedings/article-49775/

Some known limitations

  • Validating results of hazard analysis: Did it really identify all causes that could degrade the safety function?
  • Validating assumptions about the environment of the safety system, e.g.:
      • Conditions of operation and maintenance
      • Configuration control; change impact analysis
  • Qualifying suite of tools from different sources
      • Libraries
      • Underlying languages
  • Infrastructure for independent V&V

Reasoning Model to support performance-based evaluation (based on the Toulmin model [1])

[Figure labels: Assertion; Premise / Evidence; Reasoning; Inference rule; Theoretical or causal model; Basis for; Used in; Qualifiers (Strength; Condition); Rebuttals; Doubts/Defeaters; Influences on validity of proposition]

[1] Toulmin, S., The Uses of Argument, Cambridge, UK: Cambridge University Press, 1958

Judgment

Decide:

  • The safety claim is satisfied unconditionally (i.e., the residual uncertainty has an insignificant effect on the safety claim).
      • No one can find any uncontrolled hazard with the potential to degrade the performance of the safety function.
      • No one can find any unmitigated "defeater".
  • The safety claim is not satisfied with the given evidence.
      • The residual uncertainty is so great that the safety claim cannot be supported.
      • The defeaters are identified and associated with the respective sub-claims.
  • The safety claim does not hold.
      • Fallacies in logic.
      • Deficiencies in evidence.

The state-of-the-art can support consistent judgment based on objective, scientific evidence and logical reasoning.
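The three judgment outcomes can be sketched as a small data structure in which defeaters are attached to the sub-claims they challenge. This is a simplified reading for illustration, not the author's evaluation procedure: the class names, the `refutes` flag (used here to mark a fallacy in logic or a deficiency in evidence), and the example claim are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Judgment(Enum):
    SATISFIED = "safety claim is satisfied unconditionally"
    NOT_SUPPORTED = "safety claim is not satisfied with the given evidence"
    DOES_NOT_HOLD = "safety claim does not hold"


@dataclass
class Defeater:
    description: str
    refutes: bool = False    # assumption: marks a logic fallacy or evidence deficiency
    mitigated: bool = False


@dataclass
class Claim:
    text: str
    defeaters: list = field(default_factory=list)
    subclaims: list = field(default_factory=list)


def open_defeaters(claim):
    """Collect (sub-claim text, defeater) for every unmitigated defeater in the tree."""
    found = [(claim.text, d) for d in claim.defeaters if not d.mitigated]
    for sub in claim.subclaims:
        found.extend(open_defeaters(sub))
    return found


def judge(claim):
    """Map the open defeaters of a claim tree to one of the three outcomes."""
    remaining = open_defeaters(claim)
    if not remaining:
        return Judgment.SATISFIED
    if any(d.refutes for _, d in remaining):
        return Judgment.DOES_NOT_HOLD
    return Judgment.NOT_SUPPORTED


# Hypothetical claim tree.
doubt = Defeater("Scope of hazard analysis not validated")
top = Claim(
    "Trip function is performed on demand",
    subclaims=[Claim("Requirements are complete", defeaters=[doubt])],
)
print(judge(top).value)
```

Keeping each defeater attached to its sub-claim mirrors the slide's point that defeaters are identified and associated with the respective sub-claims.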

Economics!

[Figure labels: Engineering time; Run time; Monitor; Detect; Intervene; Diverse redundancy; Prevent hazard; Prevent propagation; Verify; Reactive; Preventative; Cost increases; Potential to decrease intrinsic cost]

Acronyms & Abbreviations

  • AADL: Architecture Analysis and Design Language
  • CCF: Common cause failure
  • Dev: Development
  • Engrg: Engineering
  • DI&C: Digital Instrumentation and Control
  • EPRI: Electric Power Research Institute
  • esp.: Especially
  • FSM: Finite state machine
  • HAp: Hazard analysis of plans
  • HAr: Hazard analysis of requirements
  • HAa: Hazard analysis of architecture
  • HAdd: Hazard analysis of detailed design
  • HAi: Hazard analysis of implementation
  • HAt: Hazard analysis of testing (including test specifications and oracles)
  • IAEA: International Atomic Energy Agency
  • I&C: Instrumentation and Control
  • IEC: International Electrotechnical Commission
  • IEEE: Institute of Electrical and Electronics Engineers
  • ISO: International Organization for Standardization
  • IV&V: Independent Verification and Validation
  • NPP: Nuclear Power Plant
  • NRC: U.S. Nuclear Regulatory Commission
  • OWL: Web Ontology Language
  • R&D: Research and Development
  • Reqmts: Requirements
  • RIL: Research Information Letter
  • RPS: Reactor Protection System
  • RWI: Review, Walkthrough, and Inspection
  • SCR: Software Cost Reduction (set of techniques for designing software systems)
  • spec: specification
  • SQuaRE: Systems and Software Quality Requirements and Evaluation
  • STPA: System-Theoretic Process Analysis (method of hazard analysis)
  • Std: Standard
  • V&V: Verification and Validation
  • Vp: V&V of plans
  • Vr: V&V of requirements
  • Va: V&V of architecture
  • Vdd: V&V of detailed design
  • Vi: V&V of implementation
  • Vt: V&V of testing (including test specifications and oracles)

Discussion Supporting slides

Examples: Constraints on Architecture to prevent interference

[ID#] Unintended interactions between a system, device or other element (internal or external to a safety system) that cause adverse effects on a safety function are avoided (Controls H-SA-3 in Table ).

  • Interactions are limited provably to those required for the safety functions.
  • Interactions and interconnections that cannot be completely verified are avoided, eliminated, or prevented.
  • Freedom from interference (including fault propagation) is assured provably across:

      1. Lines of defense or protection barriers.
      2. Redundant divisions of the DI&C system.
      3. Elements intended to be diverse.
      4. Degrees or levels of safety qualification.
      5. Monitoring & monitored elements of the system.
      6. Shared resources, e.g., equipment for monitoring or servicing.

  • Analysis of the system demonstrates that unintended behavior is not possible.
  • Interaction across different sources of uncertainty is avoided.
  • The architecture precludes unwanted interactions and unwanted or unknown hidden couplings or dependencies.
  • Specified information exchanges or communications occur in safe ways.

State-of-the-art methods enable satisfaction of these conditions
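One way to picture the first and last of these constraints is as checks on a directed interaction graph of the architecture. The sketch below is a hedged illustration under assumed names: the element names, the `maintenance_bus` coupling, and the function names are hypothetical. It flags interactions that are not in the required set and detects a hidden path between two redundant divisions.

```python
def unintended_interactions(actual, required):
    """Interactions present in the architecture but not required for the safety function."""
    return sorted(set(actual) - set(required))


def reachable(edges, start):
    """All elements reachable from start by following directed interactions."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def coupled(edges, a, b):
    """True if a can influence b or b can influence a, directly or transitively."""
    return b in reachable(edges, a) or a in reachable(edges, b)


# Hypothetical architecture: two redundant divisions feeding a voter.
REQUIRED = [
    ("sensor_a", "division_a"),
    ("sensor_b", "division_b"),
    ("division_a", "voter"),
    ("division_b", "voter"),
]
# The as-designed system adds a maintenance bus that links the divisions.
ACTUAL = REQUIRED + [
    ("division_a", "maintenance_bus"),
    ("maintenance_bus", "division_b"),
]

print(unintended_interactions(ACTUAL, REQUIRED))
print(coupled(ACTUAL, "division_a", "division_b"))
```

A graph check of this kind only covers interactions that are modeled; couplings through shared physical resources or the environment have to be brought into the model before they can be detected.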

Examples: Identifying hazards controlling conditions

RIL-1101 Table 1: Considerations in broadly evaluating hazard analysis. Contributory hazards (ID H-n-mm) and the conditions that reduce the hazard space (ID H-0-i):

  • H-0-6: Hazard controls needed to satisfy system constraints (which prevent hazards) are inadequate.
      • 6G1: Hazard controls are identified and validated to be correct, complete, and consistent. [H-0-7G1]
  • H-0-7: Flow-down to verifiable requirements and constraints is inadequate.
      • 7G1: Requirements and constraints [H-0-6G1] are formulated and validated to be correct, complete, consistent.
  • H-0-11: Required control action is degraded.
      • 11G1: Each required control action (~) is analyzed for ways in which it can lead to a hazard, e.g.:
          1. ~ not provided when needed
          2. ~ provided when not needed
          3. ~ provided at incorrect time
          4. ~ provided too long
          5. ~ provided too short
          6. ~ is intermittent
          7. ~ interferes with another..
          8. ~ exhibits Byzantine behavior
          9. Incorrect state transition occurs
          10. Incorrect input value

Sources: RIL-1101; RIL-1002
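The guide phrases in condition 11G1 can be expanded mechanically into a worksheet of candidate hazardous conditions, one row per control action and phrase. The sketch below is a minimal illustration: the control action names are hypothetical, only phrases 1 through 8 (those written with "~") are expanded, and phrase 7 is reproduced exactly as truncated in the slide.

```python
# Guide phrases 1-8 from the slide; "~" stands for the control action.
# Phrase 7 is truncated in the source and is reproduced as-is.
GUIDE_PHRASES = [
    "~ not provided when needed",
    "~ provided when not needed",
    "~ provided at incorrect time",
    "~ provided too long",
    "~ provided too short",
    "~ is intermittent",
    "~ interferes with another..",
    "~ exhibits Byzantine behavior",
]


def hazard_worksheet(control_actions):
    """Return one (action, candidate hazardous condition) row per action and phrase."""
    return [
        (action, phrase.replace("~", action))
        for action in control_actions
        for phrase in GUIDE_PHRASES
    ]


# Hypothetical control actions, for illustration only.
rows = hazard_worksheet(["Reactor trip signal", "Safety injection actuation"])
for _, condition in rows[:3]:
    print(condition)
```

Each generated row is only a prompt for analysis; whether the condition can actually lead to a hazard still has to be judged by the analyst.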

Examples: Controlling causes of hazards from complexity

Contributory hazards (ID H-S-) and the conditions that reduce the hazard space (ID H-S-):

  • 1: The system is not sufficiently verifiable and understandable... considerations and criteria are not formulated at the beginning of the development lifecycle; therefore, corresponding architectural constraints are not formalized and checked.
      • 1G1: Verifiability is a required property, flowing down from the system to its most finely grained constituents.
      • 1G1.1: Verifiability is checked at every phase, at every level of integration, before the next phase.
      • 1.1G1.1: The behavior is unambiguously specified (incl. unexpected inputs) at every level of integration.
      • 1.1G1.2: The flow-down (from composition to decomposition) ensures that:
          1. Allocated behaviors satisfy the behavior specified at the next higher level.
          2. Unspecified behavior does not occur.
      • 1.1G1.3: System behavior is composed of element behaviors such that when all elements are verified individually, their compositions may also be considered verified; no unspecified behavior emerges.
      • 1.1G1.4: Development follows a refinement process.
  • 1.1.1: Unanalyzed/unanalyzable conditions exist, e.g., unknown/unwanted system states.
      • 1.1.1G1: Static analyzability: System is statically analyzable.
          1. All states, including fault conditions, are known.
          2. All fault states that lead to failure modes are known.
          3. The safe-state space of the system is known.
  • 1.2
  • 1.3
  • 2: Comprehensibility: System behavior not interpreted correctly/consistently by its users [H-S-1].
      • 2G1: Behavior is completely and explicitly specified.
      • 2G3: Behavior is understood or interpreted completely, correctly, consistently, and unambiguously.
      • 2G6: The architecture is specified such that it is unambiguously interpretable by the community of its users (e.g., reviewers, architects, designers, implementers), that is, the people and the tools they use.

Source: RIL-1101
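Condition 1.1.1G1 (all states known, safe-state space known) can be illustrated on a small finite model by enumerating reachable states. This is a sketch under assumed names: the state names and the `analyze` function are hypothetical. It reports reachable states that were never declared and reachable states from which no safe state can be reached.

```python
from collections import deque


def reachable_states(transitions, initial):
    """Breadth-first enumeration of every state reachable from initial (inclusive)."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def analyze(transitions, initial, declared, safe):
    """Return (undeclared reachable states, reachable states with no path to a safe state)."""
    reach = reachable_states(transitions, initial)
    undeclared = reach - declared
    no_safe_exit = {s for s in reach if not (reachable_states(transitions, s) & safe)}
    return undeclared, no_safe_exit


# Hypothetical model: state -> possible next states.
DECLARED = {"RUN", "SENSOR_FAULT", "TRIP"}
SAFE = {"TRIP"}
MODEL = {
    "RUN": ["RUN", "SENSOR_FAULT"],
    "SENSOR_FAULT": ["TRIP"],
    "TRIP": ["TRIP"],
}
# Same model with an undeclared state that can never reach the safe state.
FLAWED = {**MODEL, "RUN": ["RUN", "SENSOR_FAULT", "HANG"], "HANG": ["HANG"]}

print(analyze(MODEL, "RUN", DECLARED, SAFE))
print(analyze(FLAWED, "RUN", DECLARED, SAFE))
```

Explicit enumeration like this is feasible only when the state space is small, which is one reason the conditions on complexity matter for analyzability.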

Examples: Controlling causes of hazards from interference

Contributory hazards (ID H-SA-) and the conditions that reduce the hazard space (ID H-SA-):

  • 3: A system, device, or other element (external or internal to a safety system) might affect a safety function adversely through unintended interactions caused by some combination of deficiencies, disorders, malfunctions, or oversights.
      • 3G2: Interactions and interconnections that preclude complete V&V are avoided, eliminated, or prevented.
      • 3G3: Freedom from interference is assured provably across:
          1. Lines of defense.
          2. Redundant divisions of system.
          3. Degrees of safety qualification.
          4. Monitoring & monitored elements of the system.
      • 3G4: Analysis of the system demonstrates that unintended behavior is not possible.
          1. Interaction across different sources of uncertainty is avoided.
          2. The architecture precludes unwanted interactions, unwanted or hidden couplings.
          3. Specified information exchanges or communications occur in safe ways.
      • 3G6: Constraints are identified for such contributing hazards from the environment as EMI;
      • 3G7: The impact of dependency-affecting change is analyzed to demonstrate no adverse effect.
  • 4: [H-SA-3G4]: A function, whose execution is required at a particular time, cannot be performed as required because of interference through sharing of some resource it needs.
      • 4G1: Analysis of the execution-behavior of the system proves that such interference will not occur. For example, worst-case execution time is guaranteed.
  • 5: Timing constraints are not correctly specified and not correctly allocated.
      • 5G1

Source: RIL-1101
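Condition 4G1 can be illustrated with a static, time-partitioned schedule in which each function owns a fixed slot on the shared processor. The sketch below is an assumption made for illustration, not a method taken from RIL-1101: the slot tuple layout, function names, and numbers are hypothetical. It checks that each function's worst-case execution time fits its slot, that slots stay within the frame, and that no two slots overlap.

```python
def slot_violations(slots, frame_length):
    """Check a static schedule. Each slot is (name, start, length, wcet) in the same time unit."""
    problems = []
    ordered = sorted(slots, key=lambda slot: slot[1])
    for name, start, length, wcet in ordered:
        if wcet > length:
            problems.append(f"{name}: WCET {wcet} exceeds slot length {length}")
        if start < 0 or start + length > frame_length:
            problems.append(f"{name}: slot lies outside the frame")
    # After sorting by start time, comparing neighbours is enough to detect
    # whether any overlap exists in the schedule.
    for (name_a, start_a, length_a, _), (name_b, start_b, _, _) in zip(ordered, ordered[1:]):
        if start_a + length_a > start_b:
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


# Hypothetical 10 ms frame.
GOOD = [("trip_logic", 0, 4, 3), ("diagnostics", 4, 3, 2), ("comms", 7, 3, 3)]
BAD = [("trip_logic", 0, 4, 5), ("diagnostics", 3, 3, 2)]

print(slot_violations(GOOD, 10))
print(slot_violations(BAD, 10))
```

The check is only as good as the worst-case execution times fed into it; establishing those bounds is the harder part of the analysis.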

Approaches and Methods used for Unit Verification 2/2

For each approach or method: the source of uncertainty addressed by the method (i.e., method effectiveness in addressing uncertainty) and its benefit, followed by additional explanation.

  • Black box testing
      • Addresses / benefit: Used when information internal to the unit is not available.
      • Explanation: Automation & combinatorial testing have extended the test coverage, but coverage of the fault space is not assured to be complete.
  • White box testing
      • Addresses / benefit: Enables coverage of fault space when best-practice V&V methods have been used in preceding phases of development.
      • Explanation: IV&V agent requires access to unit-internal information.
  • Review, Walkthrough and Inspection (RWI)
      • Addresses / benefit: Fills the gaps in V&V uncovered by analytical methods.

Sources: IEC 61508, IEC 62279, etc.
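The remark on combinatorial testing can be made concrete by measuring pairwise (2-way) coverage of a test suite. The sketch below uses hypothetical parameter names and is an illustration of the general technique, not of any tool named in the standards cited above. It shows four tests covering every pair of values for three binary inputs, where exhaustive testing would need eight.

```python
from itertools import combinations, product


def all_pairs(params):
    """Every pair of (parameter, value) assignments across two distinct parameters."""
    pairs = set()
    for a, b in combinations(sorted(params), 2):
        for value_a, value_b in product(params[a], params[b]):
            pairs.add(((a, value_a), (b, value_b)))
    return pairs


def covered_pairs(tests):
    """Pairs of (parameter, value) assignments exercised by at least one test."""
    pairs = set()
    for test in tests:
        for a, b in combinations(sorted(test), 2):
            pairs.add(((a, test[a]), (b, test[b])))
    return pairs


def pairwise_coverage(params, tests):
    """Fraction of all value pairs exercised by the test suite."""
    needed = all_pairs(params)
    return len(needed & covered_pairs(tests)) / len(needed)


# Hypothetical unit with three binary inputs.
PARAMS = {"channel_a": [0, 1], "channel_b": [0, 1], "bypass": [0, 1]}
SUITE = [
    {"channel_a": 0, "channel_b": 0, "bypass": 0},
    {"channel_a": 0, "channel_b": 1, "bypass": 1},
    {"channel_a": 1, "channel_b": 0, "bypass": 1},
    {"channel_a": 1, "channel_b": 1, "bypass": 0},
]

print(pairwise_coverage(PARAMS, SUITE))
```

Full pairwise coverage says nothing about faults triggered only by three-way or higher interactions, which is consistent with the slide's caution that coverage of the fault space is not assured to be complete.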
