ML23256A010
ML23256A010 | |
Person / Time | |
---|---|
Issue date: | 09/18/2023 |
From: | Sushil Birla NRC/RES/DE |
To: | |
Sushil Birla 301-415-2311 | |
Text
IAEA Workshop on Assessment and Reduction of Vulnerabilities to Common Cause Failures in Instrumentation and Control Systems in Nuclear Power Plants
18-22 September 2023
State-of-the-art approaches to reduce the potential for CCF in I&C systems
Conditions to avoid the need for diversity in design
Sushil Birla, Senior Technical Advisor
U.S. Nuclear Regulatory Commission, Office of Nuclear Regulatory Research
The views expressed herein are those of the author and do not represent an official position of the U.S. NRC.
Terminology & Scope
To assimilate knowledge from outside the NPP industry and avoid ambiguity:
- Sources of definitions are broader than NPP-specific standards.
- Context in focus: Operating power reactor protection systems.
- Focus: Hazards from (systemic) common causes:
  - Rooted in engineering deficiencies
  - That may degrade the redundancy and defense-in-depth characteristics
- Hazard: potential for harm through the degradation of a safety function allocated to the object under analysis
Examples of sources: ISO/IEC/IEEE 24765; ISO/IEC Systems & Software SQuaRE series; ISO/IEC 15026
Meaning of state-of-the-art in this presentation
- State-of-the-art: Capability demonstrated in leading-edge implementations; not yet scaled up
- State-of-the-practice: Best-in-class; best practices, e.g., as seen in industry consensus standards
- Current practice: As seen in many organizations
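Reference Framework for Assurance
[Figure: reference model from IEEE Std 1012. Requirements from the NPP safety analysis feed system development, which proceeds through the phases Plans, Concept, Requirements, Architecture, Detailed design, Implementation, and Testing. Each phase is paired with a Verification and Validation (V&V) activity (Vp, Vc, Vr, Va, Vdd, Vi, Vt) and, under Safety Engineering, a hazard analysis activity (HAp, HAc, HAr, HAa, HAdd, HAi, HAt).]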
Identifying elements to reduce the uncertainty space
Deficiencies in the following, with conditions to reduce the associated uncertainties:
- Hazard identification*
- Requirements specification
- Architectural specification
- Detailed design specification
- Implementation (coding)
- Verification
+ Conditions on methods and tools
+ Logical integration of all the evidence
+ Reducing inconsistencies in judgment
* = all phases
[Figure: Quality of Design (vertical axis), rising from the current state to the desired state. Labels: Prevention; Mitigation; Fault tolerance in design.]
Changes needed to prevent CCF:
- Objective evaluation criteria
- Paradigm
- State of practice
- Competence
- Culture
Defect-prevention through Refinement
[Figure: refinement proceeds from abstraction to concretion through Requirements (declarative: what), Architecture, Detailed design, and Implementation (imperative: how).]
Refinement: A key preventative technique
Development Phase | Constraints on language for each phase |
---|---|
Requirements | Domain-specific controlled natural language |
Architecture | Domain-specific architecture modeling language |
Detailed design | Domain-specific design specification language |
Implementation | Domain-specific coding/programming language |
Each phase is a refinement of the one before it, and the languages of adjacent phases are semantically compatible.
Reduce defects by reusing composable assets: see IEEE Std 1517; ISO/IEC 26550 family
- Problem space: Domain modeling
- Solution space: Domain engineering
Approaches and Methods for Improved Requirements
Approaches and Methods | Source of Uncertainty Addressed by this Method (i.e., method effectiveness in addressing uncertainty); Benefit | Additional Explanation |
---|---|---|
Restricted or Constrained or Controlled Natural Language | Improved specification: reduced ambiguity; improved consistency and verifiability. | Supports auto-generation of verification conditions (e.g., test cases). Example of enabler: Web Ontology Language (OWL). |
Behavior Specification | As above. The finite state machine (FSM) paradigm supports stepwise refinement and flow-down of behavior specification. Facilitates hazard analysis (e.g., STPA). | Also supports auto-generation of verification conditions (e.g., test cases). |
Domain Specific Specialization | Higher quality of specification and V&V. Enables re-use of pre-verified building blocks, specific to an application domain. | Example: ReqSpec in AADL. |
Theorem Proving | Provides proof that a conclusion can be inferred deductively from a proven chain of premises. Improved V&V. | Can also be used to identify gaps in the chain of premises. Example: Used as a module in SCR. |
Model-Checking | Checks that the flow-down is correct and consistent. | |
Review, Walkthrough, and Inspection (RWI) | Fills the gaps in requirements V&V uncovered by analytical methods. | |
Sources: IEC 61508, IEC 62279
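The table notes that an FSM behavior specification supports auto-generation of verification conditions such as test cases. A minimal sketch of that idea follows, assuming a hypothetical three-state trip-logic FSM; the state names, event names, and helper functions are illustrative and do not come from the presentation or any cited standard.

```python
from collections import deque

# Hypothetical trip-logic FSM (an assumption for illustration):
# (state, event) -> next state. Pairs not listed are unspecified behavior.
TRANSITIONS = {
    ("NORMAL", "setpoint_exceeded"): "TRIPPED",
    ("NORMAL", "sensor_fault"): "DEGRADED",
    ("DEGRADED", "setpoint_exceeded"): "TRIPPED",
    ("DEGRADED", "sensor_restored"): "NORMAL",
    ("TRIPPED", "manual_reset"): "NORMAL",
}


def shortest_event_path(start, target):
    """Breadth-first search for the shortest event sequence from start to target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, events = queue.popleft()
        if state == target:
            return events
        for (src, event), dst in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, events + [event]))
    return None  # target is unreachable from start


def transition_coverage_tests(initial="NORMAL"):
    """Generate one test case per specified transition (transition coverage)."""
    tests = []
    for (src, event), dst in TRANSITIONS.items():
        prefix = shortest_event_path(initial, src)
        if prefix is not None:
            tests.append({"events": prefix + [event], "expected_final_state": dst})
    return tests


def run(events, initial="NORMAL"):
    """Execute an event sequence; an unspecified (state, event) pair raises KeyError."""
    state = initial
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


if __name__ == "__main__":
    for test in transition_coverage_tests():
        print(test["events"], "->", test["expected_final_state"])
```

Raising an error on an unspecified (state, event) pair, rather than ignoring it, is a design choice of this sketch: it makes gaps in the behavior specification visible during test execution.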
Approaches to Address Uncertainties Introduced in Methods used for Requirements
Approaches and Methods | Uncertainties Introduced | Approaches to Address Uncertainties Introduced |
---|---|---|
Restricted Natural Language | Comprehensibility is traded off for disambiguation. | Annotations in the model bridge the gap. |
Behavior Specification | Potential for semantic inconsistency across different FSM modeling environments. | Constrain the interacting environments to eliminate the inconsistencies, specific to the domain of interest. |
Domain Specific Specialization | When a new application does not fit in a predefined domain, adaptation may degrade model validity. | RWI by expert team. |
Theorem Proving | Language transformations required to enable its use may introduce hard-to-find semantic inconsistencies. | Use in combination with domain-specific environments for specification, validation and verification. RWI by expert team. |
Model-Checking | Model-checking is highly dependent on the fidelity of the model to reality. | As for theorem proving. |
Review, Walkthrough, Inspection | Human fallibility. | Independent RWI by expert team. |
Approaches and Methods used for Architecture
Approaches and Methods | Source of Uncertainty Addressed by this Method (i.e., method effectiveness in addressing uncertainty); Benefit | Additional Explanation |
---|---|---|
Theorem Proving | See table for requirements | See table for requirements |
Prevention through Modeling Constraints | Limits the uncertainty space by preventing or detecting incorrect or unverifiable constructs in the models. | |
Model-Checking | See table for requirements | |
Review, Walkthrough, and Inspection (RWI) | See table for requirements | |
Sources: IEC 61508, IEC 62279
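To illustrate what model-checking contributes at the architecture level, here is a minimal explicit-state sketch that explores every reachable state of a small model and returns a counterexample trace if an invariant fails. The two-division model, its single-failure assumption, and the `common_cause` transition are all hypothetical choices made for this example; real model checkers work on far richer models.

```python
from collections import deque


def successors(state, allow_common_cause):
    """Hypothetical two-division model; each division is 'ok' or 'failed'.

    Assumption: an independent failure occurs only while the other division
    is ok (a single-failure assumption), and a failed division can be repaired.
    """
    a, b = state
    nxt = []
    if a == "ok" and b == "ok":
        nxt.append(("fail_a", ("failed", b)))
        nxt.append(("fail_b", (a, "failed")))
        if allow_common_cause:
            # A shared defect defeats both divisions in one step.
            nxt.append(("common_cause", ("failed", "failed")))
    if a == "failed":
        nxt.append(("repair_a", ("ok", b)))
    if b == "failed":
        nxt.append(("repair_b", (a, "ok")))
    return nxt


def check_invariant(initial, invariant, allow_common_cause):
    """Return None if the invariant holds in all reachable states,
    otherwise the shortest action trace leading to a violation."""
    queue = deque([(initial, [])])
    visited = {initial}
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        for action, nxt in successors(state, allow_common_cause):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, trace + [action]))
    return None


def at_least_one_division_ok(state):
    return "ok" in state


if __name__ == "__main__":
    start = ("ok", "ok")
    print(check_invariant(start, at_least_one_division_ok, allow_common_cause=False))
    print(check_invariant(start, at_least_one_division_ok, allow_common_cause=True))
```

The result is only as good as the model: if the `common_cause` transition is left out, the check passes, which mirrors the point made elsewhere in the slides that model-checking depends on the fidelity of the model to reality.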
Approaches to Address Uncertainties Introduced in Methods used for Unit Verification 1/2
Approaches and Methods | Uncertainties Introduced | Approaches to Address Uncertainties Introduced |
---|---|---|
Correct by construction | Undetected incorrect transformations. | Independent V&V (IV&V) of: integrated tool suite; libraries; other reusable assets; development environment(s). Semantic consistency across interfaces. |
Safe subset of the programming language | Programmer may use features outside the safe subset. | For safe subset language and the tools enforcing usage within the safe subset: IV&V; pre-certification; configuration control; change control. |
Model-checking | See prior slide. | See prior slide. |
Static analysis | Does not discover faults which occur only during execution. Requires source code. | Complement with model-checking. |
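The safe-subset row can be made concrete with a small static check that flags constructs outside an agreed subset. The sketch below uses Python's standard `ast` module; the particular rules in `FORBIDDEN_NODES` and `FORBIDDEN_CALLS` are assumptions chosen for illustration, not a subset defined by the presentation or by any standard.

```python
import ast

# Hypothetical safe-subset rules (assumptions for illustration only).
FORBIDDEN_NODES = {
    ast.While: "unbounded 'while' loop",
    ast.Try: "exception handling",
    ast.Lambda: "lambda expression",
    ast.Global: "global statement",
}
FORBIDDEN_CALLS = {"eval", "exec"}


def safe_subset_violations(source):
    """Return a sorted list of (line number, reason) for constructs outside the subset."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        for node_type, reason in FORBIDDEN_NODES.items():
            if isinstance(node, node_type):
                violations.append((node.lineno, reason))
        # Flag direct calls to forbidden built-ins such as eval(...).
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in FORBIDDEN_CALLS):
            violations.append((node.lineno, f"call to '{node.func.id}'"))
    return sorted(violations)


if __name__ == "__main__":
    sample = "x = 0\nwhile x < 3:\n    x = eval('x + 1')\n"
    for line, reason in safe_subset_violations(sample):
        print(f"line {line}: {reason}")
```

As the table says of static analysis in general, a check like this needs the source code and says nothing about faults that appear only during execution, and the checker itself would need IV&V before its results could be relied on.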
Schedulability
- Need to verify that the workload fits within the available resources
- Workload
  - Computing (esp. in microprocessor-based platforms)
  - Communication (esp. in serial networks)
  - Typically, cyclic, requiring accurate periodicity
  - Typically, many tasks requiring different amounts of resources
- Resources (typically, shared across different tasks)
  - Time (computing; communication)
  - Space (memory)
- In general, high computational complexity
  - Constraints can be applied to reduce the complexity
  - Example: Programmable Logic Controllers (PLCs)
For more information, see
- RIL-1101 Appendix I
- H. KOPETZ, Simplicity is complex, Springer (2019)
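As one illustration of how constraints reduce the complexity of a schedulability check, the sketch below applies the classical Liu and Layland utilization bound for rate-monotonic scheduling. This technique is swapped in for illustration and is not named in the slides; it assumes independent, preemptible, periodic tasks with deadlines equal to their periods, and the task sets shown are hypothetical.

```python
def utilization(tasks):
    """Total processor utilization for tasks given as (worst-case execution time, period)."""
    return sum(wcet / period for wcet, period in tasks)


def rm_bound(n):
    """Liu and Layland bound: n periodic tasks are schedulable under
    rate-monotonic priorities if utilization <= n * (2**(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def passes_rm_test(tasks):
    """Sufficient (not necessary) test: True means the workload fits;
    False means this test is inconclusive and exact analysis is needed."""
    return utilization(tasks) <= rm_bound(len(tasks))


if __name__ == "__main__":
    # Hypothetical cyclic task sets, times in milliseconds.
    light = [(1, 10), (2, 20), (5, 50)]
    heavy = [(5, 10), (8, 20)]
    for name, tasks in (("light", light), ("heavy", heavy)):
        print(name, round(utilization(tasks), 3), round(rm_bound(len(tasks)), 3),
              passes_rm_test(tasks))
```

The check covers only computing time on one processor. Communication time and memory would need their own budgets, and the worst-case execution times fed into it are themselves a source of uncertainty.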
Conditions to Reduce Uncertainties Associated with Tools - Examples
1. The development environment is qualified and certified for the domain of usage.
2. The development environment is maintained under configuration management (as a set).
3. Restrictions for safe use of a tool are identified and enforced.
4. Semantics are preserved in information exchanged across tools used in system development.
5. The architectural description method is unambiguous.
6. Methods and languages used to describe, represent, or specify architectures support unambiguous transformation across development phases and dissimilar elements from different sources.
7. Automation used for the creation of a work product is independent of automation used for the V&V of that work product.
8. The developers of a work product are different from those performing its V&V.
9. V&V tools are qualified by people independent of the developers and users of these tools.
10. V&V tools are qualified using methods that are independent of the methods implemented by these tools.
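Conditions 7 and 8 lend themselves to a simple automated screen over project records. The sketch below flags work products whose creation and V&V share a tool or a person; the record layout and all names in it are hypothetical, and passing the screen is only a necessary condition, since tools with different names can still share libraries or underlying methods.

```python
def independence_violations(work_products):
    """Return (work product, reason) pairs where creation and V&V overlap."""
    violations = []
    for wp in work_products:
        shared_tools = set(wp["creation_tools"]) & set(wp["vv_tools"])
        if shared_tools:  # condition 7: independent automation
            violations.append((wp["name"], f"shared automation: {sorted(shared_tools)}"))
        shared_people = set(wp["developers"]) & set(wp["vv_staff"])
        if shared_people:  # condition 8: different people
            violations.append((wp["name"], f"shared staff: {sorted(shared_people)}"))
    return violations


if __name__ == "__main__":
    # Hypothetical project records.
    records = [
        {"name": "trip logic design", "creation_tools": ["modeler_x"],
         "vv_tools": ["checker_y"], "developers": ["alice"], "vv_staff": ["bob"]},
        {"name": "generated code", "creation_tools": ["codegen_z"],
         "vv_tools": ["codegen_z", "analyzer_w"], "developers": ["carol"],
         "vv_staff": ["carol", "dave"]},
    ]
    for name, reason in independence_violations(records):
        print(f"{name}: {reason}")
```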
Summary: Concepts to support Assurability
- Abstraction
- Refinement
- Disambiguation
- Domain modeling; domain engineering
- Compositionality
- Schedulability
For practicable guidance, see R. Hite, et al., SYMPLE: A complexity-aware approach for realizing verifiable FPGA-based digital I&C for safety critical applications, https://www.ans.org/pubs/proceedings/article-49775/
Some known limitations
- Validating results of hazard analysis
  - Did it really identify all causes that could degrade the safety function?
- Validating assumptions about the environment of the safety system, e.g.:
  - Conditions of operation and maintenance
  - Configuration control change impact analysis
- Qualifying suite of tools from different sources
  - Libraries
  - Underlying languages
  - Infrastructure for independent V&V
Reasoning Model to support performance-based evaluation (based on the Toulmin model¹)
[Figure: a premise or evidence is used in reasoning to support an assertion. The reasoning applies an inference rule whose basis is a theoretical or causal model. The assertion carries qualifiers (strength; condition). Influences on the validity of the proposition are doubts/defeaters and rebuttals.]
¹ Toulmin, S., The Uses of Argument, Cambridge, UK: Cambridge University Press, 1958
Judgment
Decide:
- The safety claim is satisfied unconditionally (i.e., the residual uncertainty has an insignificant effect on the safety claim).
  - No one can find any uncontrolled hazard with the potential to degrade the performance of the safety function
  - No one can find any unmitigated "defeater"
- The safety claim is not satisfied with the given evidence.
  - The residual uncertainty is so great that the safety claim cannot be supported.
  - The defeaters are identified and associated with the respective sub-claims.
- The safety claim does not hold.
  - Fallacies in logic.
  - Deficiencies in evidence.
The state-of-the-art can support consistent judgment based on objective, scientific evidence and logical reasoning.
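The reasoning model and the judgment options above can be sketched as a small data structure: a claim with evidence, defeaters, and sub-claims, plus a rule that reports unmitigated defeaters against the sub-claim they belong to. The class names, the treatment of a leaf claim without evidence as a deficiency, and the example claims are all assumptions of this sketch rather than a method given in the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Defeater:
    description: str
    mitigated: bool = False


@dataclass
class Claim:
    statement: str
    evidence: list = field(default_factory=list)
    defeaters: list = field(default_factory=list)
    sub_claims: list = field(default_factory=list)


def open_issues(claim):
    """Collect (claim statement, issue) pairs, depth first."""
    found = [(claim.statement, d.description)
             for d in claim.defeaters if not d.mitigated]
    # Assumption: a leaf claim with no evidence is a deficiency in evidence.
    if not claim.sub_claims and not claim.evidence:
        found.append((claim.statement, "no supporting evidence"))
    for sub in claim.sub_claims:
        found.extend(open_issues(sub))
    return found


def judge(claim):
    """Return a verdict and the issues associated with their sub-claims."""
    issues = open_issues(claim)
    if issues:
        return "not satisfied with the given evidence", issues
    return "satisfied", []


if __name__ == "__main__":
    doubt = Defeater("environment assumption not validated")
    top = Claim(
        "Trip function performs on demand",
        sub_claims=[
            Claim("Requirements are complete",
                  evidence=["hazard analysis report"], defeaters=[doubt]),
            Claim("Implementation refines design", evidence=["proof log"]),
        ],
    )
    print(judge(top))
    doubt.mitigated = True
    print(judge(top))
```

The sketch only records whether a defeater has been mitigated. Deciding that it has, and weighing the strength of the evidence, remains the reviewer's judgment.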
Economics!
[Figure: cost comparison of engineering-time and run-time measures.
- Engineering time (preventative): Prevent; Verify. Potential to decrease intrinsic cost.
- Run time (reactive): Monitor; Detect; Intervene; Prevent hazard propagation; Diverse redundancy. Cost increases.]
Acronyms & Abbreviations 1/2
- AADL: Architecture Analysis and Design Language
- CCF: Common cause failure
- Dev: Development
- Engrg: Engineering
- DI&C: Digital Instrumentation and Control
- EPRI: Electric Power Research Institute
- esp.: Especially
- FSM: Finite state machine
- HAp: Hazard analysis of plans
- HAr: Hazard analysis of requirements
- HAa: Hazard analysis of architecture
- HAdd: Hazard analysis of detailed design
- HAi: Hazard analysis of implementation
- HAt: Hazard analysis of testing (including test specifications and oracles)
- IAEA: International Atomic Energy Agency
- I&C: Instrumentation and Control
- IEC: International Electrotechnical Commission
- IEEE: Institute of Electrical and Electronics Engineers
- ISO: International Organization for Standardization
- IV&V: Independent Verification and Validation
- NPP: Nuclear Power Plant
- NRC: U.S. Nuclear Regulatory Commission
- OWL: Web Ontology Language
- RIL: Research Information Letter
- RPS: Reactor Protection System
- RWI: Review, Walkthrough, and Inspection
Acronyms & Abbreviations 2/2
- R&D: Research and Development
- Reqmts: Requirements
- SCR: Software Cost Reduction (set of techniques for designing software systems)
- spec: specification
- SQuaRE: Systems and Software Quality Requirements and Evaluation
- STPA: System-Theoretic Process Analysis (method of hazard analysis)
- Std: Standard
- V&V: Verification and Validation
- Vp: V&V of plans
- Vr: V&V of requirements
- Va: V&V of architecture
- Vdd: V&V of detailed design
- Vi: V&V of implementation
- Vt: V&V of testing (including test specifications and oracles)
Discussion
Supporting slides
Examples: Constraints on Architecture to prevent interference
[ID#] Unintended interactions between a system, device or other element (internal or external to a safety system) that cause adverse effects on a safety function are avoided (Controls H-SA-3 in Table ).
- Interactions are limited provably to those required for the safety functions.
- Interactions and interconnections that cannot be completely verified are avoided, eliminated, or prevented.
- Freedom from interference (including fault propagation) is assured provably across:
  1. Lines of defense or protection barriers.
  2. Redundant divisions of the DI&C system.
  3. Elements intended to be diverse.
  4. Degrees or levels of safety qualification.
  5. Monitoring & monitored elements of the system.
  6. Shared resources, e.g., equipment for monitoring or servicing.
- Analysis of the system demonstrates that unintended behavior is not possible.
- Interaction across different sources of uncertainty is avoided.
- The architecture precludes unwanted interactions and unwanted or unknown hidden couplings or dependencies.
- Specified information exchanges or communications occur in safe ways.
State-of-the-art methods enable satisfaction of these conditions
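The first condition, that interactions are limited to those required for the safety functions, can be screened mechanically once the architecture is captured as a list of connections. The sketch below compares declared connections against a required set and separately flags links between redundant divisions; the element names, division assignments, and connection list are hypothetical, and couplings that are absent from the model (for example, a shared power supply) will not be found this way.

```python
def unintended_interactions(actual, required):
    """Connections present in the architecture but not required by the safety functions."""
    return sorted(set(actual) - set(required))


def cross_division_links(connections, division):
    """Connections that join elements assigned to different redundant divisions."""
    return sorted(
        (src, dst) for src, dst in connections
        if division[src] is not None
        and division[dst] is not None
        and division[src] != division[dst]
    )


if __name__ == "__main__":
    # Hypothetical architecture: two divisions feeding a voter (division None).
    division = {"sensor_A": 1, "logic_A": 1, "sensor_B": 2, "logic_B": 2, "voter": None}
    required = {
        ("sensor_A", "logic_A"), ("sensor_B", "logic_B"),
        ("logic_A", "voter"), ("logic_B", "voter"),
    }
    actual = required | {("logic_A", "logic_B")}  # an extra inter-division data link
    print(unintended_interactions(actual, required))
    print(cross_division_links(actual, division))
```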
Examples: Identifying hazard-controlling conditions
RIL-1101 Table 1: Considerations in broadly evaluating hazard analysis
Contributory hazard ID | Description | Condition ID | Conditions that reduce the hazard space |
---|---|---|---|
H-0-6 | Hazard controls needed to satisfy system constraints (which prevent hazards) are inadequate. | H-0-6G1 | Hazard controls are identified and validated to be correct, complete, and consistent. [H-0-7G1] |
H-0-7 | Flow-down to verifiable requirements and constraints is inadequate. | H-0-7G1 | Requirements and constraints [H-0-6G1] are formulated and validated to be correct, complete, consistent. |
H-0-11 | Required control action is degraded. | H-0-11G1 | Each required control action is analyzed for ways in which it can lead to a hazard (examples listed below). |
Examples for H-0-11G1, where ~ stands for the required control action:
1. ~ not provided when needed
2. ~ provided when not needed
3. ~ provided at incorrect time
4. ~ provided too long
5. ~ provided too short
6. ~ is intermittent
7. ~ interferes with another ..
8. ~ exhibits Byzantine behavior
9. Incorrect state transition occurs
10. Incorrect input value
Sources: RIL-1101; RIL-1002
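The analysis of each required control action against a fixed set of failure patterns can be partly mechanized by generating the questions an analyst has to answer. The sketch below crosses a list of control actions with the first six patterns from the list above; the control action names are hypothetical, and the output is a worksheet of prompts, not a hazard analysis.

```python
# Failure patterns taken from items 1 to 6 of the list above.
GUIDE_PHRASES = [
    "not provided when needed",
    "provided when not needed",
    "provided at incorrect time",
    "provided too long",
    "provided too short",
    "is intermittent",
]


def hazard_prompts(control_actions):
    """Return one prompt per (control action, failure pattern) pair."""
    return [f"{action}: {phrase}"
            for action in control_actions
            for phrase in GUIDE_PHRASES]


if __name__ == "__main__":
    # Hypothetical control actions of a protection system.
    for prompt in hazard_prompts(["reactor trip signal", "bypass permissive"]):
        print(prompt)
```

Enumerating the prompts guards against skipping a pattern, but it does not address the limitation noted earlier in the slides: whether the analysis has identified all causes that could degrade the safety function.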
Examples: Controlling causes of hazards from complexity
Contributory hazard ID (H-S-) | Description | Condition ID (H-S-) | Conditions that reduce the hazard space |
---|---|---|---|
1 | The system is not sufficiently verifiable and understandable. | 1G1 | Verifiability is a required property, flowing down from the system to its most finely grained constituents. |
 | | 1G1.1 | Verifiability is checked at every phase, at every level of integration, before the next phase. |
 | ... considerations and criteria are not formulated at the beginning of the development lifecycle; therefore, corresponding architectural constraints are not formalized and checked. | 1.1G1.1 | The behavior is unambiguously specified (incl. unexpected inputs) at every level of integration. |
 | | 1.1G1.2 | The flow-down (from composition to decomposition) ensures that: (1) allocated behaviors satisfy the behavior specified at the next higher level; (2) unspecified behavior does not occur. |
 | | 1.1G1.3 | System behavior is composed of element behaviors such that when all elements are verified individually, their compositions may also be considered verified; no unspecified behavior emerges. |
 | | 1.1G1.4 | Development follows a refinement process. |
1.1.1 | Unanalyzed/unanalyzable conditions exist, e.g., unknown/unwanted system states. | 1.1.1G1 | Static analyzability: System is statically analyzable. (1) All states, including fault conditions, are known. (2) All fault states that lead to failure modes are known. (3) The safe-state space of the system is known. |
1.2 | | | |
1.3 | | | |
2 | Comprehensibility: System behavior not interpreted correctly/consistently by its community of users [H-S-1]. | 2G1 | Behavior is completely and explicitly specified. |
 | | 2G3 | Behavior is understood or interpreted completely, correctly, consistently, and unambiguously. |
 | | 2G6 | The architecture is specified such that it is unambiguously interpretable by the users (e.g., reviewers, architects, designers, implementers), that is, the people and the tools they use. |
Source: RIL-1101
Examples: Controlling causes of hazards from interference
Contributory hazard ID (H-SA-) | Description | Condition ID (H-SA-) | Conditions that reduce the hazard space |
---|---|---|---|
3 | A system, device, or other element (external or internal to a safety system) might affect a safety function adversely through unintended interactions caused by some combination of deficiencies, disorders, malfunctions, or oversights. | 3G2 | Interactions and interconnections that preclude complete V&V are avoided, eliminated, or prevented. |
 | | 3G3 | Freedom from interference is assured provably across: (1) lines of defense; (2) redundant divisions of system; (3) degrees of safety qualification; (4) monitoring & monitored elements of the system. |
 | | 3G4 | Analysis of the system demonstrates that unintended behavior is not possible. (1) Interaction across different sources of uncertainty is avoided. (2) The architecture precludes unwanted interactions, unwanted or hidden couplings. (3) Specified information exchanges or communications occur in safe ways. |
 | | 3G6 | Constraints are identified for such contributing hazards from the environment as EMI; |
 | | 3G7 | The impact of dependency-affecting change is analyzed to demonstrate no adverse effect. |
4 | [H-SA-3G4]: A function, whose execution is required at a particular time, cannot be performed as required because of interference through sharing of some resource it needs. | 4G1 | Analysis of the execution-behavior of the system proves that such interference will not occur. For example, worst-case execution time is guaranteed. |
5 | Timing constraints are not correctly specified and not correctly allocated. | 5G1 | |
Source: RIL-1101
Approaches and Methods used for Unit Verification 2/2
Approaches and Methods | Source of Uncertainty Addressed by this Method (i.e., method effectiveness in addressing uncertainty); Benefit | Additional Explanation |
---|---|---|
Black box testing | Used when information internal to the unit is not available. | Automation & combinatorial testing have extended the test coverage, but coverage of the fault space is not assured to be complete. |
White box testing | Enables coverage of fault space when best-practice V&V methods have been used in preceding phases of development. | IV&V agent requires access to unit-internal information. |
Review, Walkthrough and Inspection (RWI) | Fills the gaps in V&V uncovered by analytical methods. | |
Sources: IEC 61508, IEC 62279, etc.