State-of-the-art Approaches to Reduce the Potential for CCF in I&C Systems Conditions to Avoid the Need for Diversity in Design
	ML23256A010
Person / Time
Issue date:	09/18/2023
From:	Sushil Birla; NRC/RES/DE
To:	;
	Sushil Birla 301-415-2311
References
	Download: ML23256A010 (27)

State-of-the-art approaches to reduce the potential for CCF in I&C systems
Conditions to avoid the need for diversity in design

Sushil Birla
Senior Technical Advisor
U.S. Nuclear Regulatory Commission
Office of Nuclear Regulatory Research

The views expressed herein are those of the author and do not represent an official position of the U.S. NRC.

IAEA Workshop on Assessment and Reduction of Vulnerabilities to Common Cause Failures in Instrumentation and Control Systems in Nuclear Power Plants 18-22 September 2023

Terminology & Scope

To assimilate knowledge from outside the NPP industry and to avoid ambiguity, sources of definitions are broader than NPP-specific standards.

Context in focus: Operating power reactor protection systems.

Focus: Hazards from (systemic) common causes:
Rooted in engineering deficiencies
That may degrade the redundancy and defense-in-depth characteristics

Hazard: potential for harm through the degradation of a safety function allocated to the object under analysis.

Examples of sources: ISO/IEC/IEEE 24765; ISO/IEC Systems & Software SQuaRE series; ISO/IEC 15026

Meaning of state-of-the-art in this presentation

State-of-the-art: Capability demonstrated in leading-edge implementations; not yet scaled up.
State-of-the-practice: Best-in-class; best practices, e.g., as seen in industry consensus standards.
Current practice: As seen in many organizations.

Reference Framework for Assurance

[Figure: System development phases (Plans, Concept, Requirements, Architecture, Detailed design, Implementation, Testing), driven by requirements from NPP safety analysis. Safety engineering labels: HAp, HAc, HAr, HAdd, HAi. Verification and validation (V&V) labels: Vp, Vc, Vr, Va, Vdd, Vi, Vt. Reference model from IEEE Std 1012.]

Identifying elements to reduce the uncertainty space

Conditions to reduce associated uncertainties:
Conditions on methods and tools
Logical integration of all the evidence
Reducing inconsistencies in judgment

Deficiencies in:
Hazard identification*
Requirements specification
Architectural specification
Detailed design specification
Implementation (coding)
Verification

[Figure labels: = all phases; Prevention; Mitigation; Fault tolerance in design; Quality of design; Desired state; Current state; Changes needed to prevent CCF; Objective evaluation criteria; Paradigm; State of practice; Competence; Culture]

Defect-prevention through Refinement

[Figure: Refinement proceeds through Requirements, Architecture, Detailed design, and Implementation, from abstraction (declarative: what) to concretion (imperative: how).]

Refinement: A key preventative technique

Development phase: Constraints on language for each phase
Requirements: Domain-specific controlled natural language
Architecture: Domain-specific architecture modeling language
Detailed design: Domain-specific design specification language
Implementation: Domain-specific coding/programming language

Each refinement from one phase to the next is semantically compatible.

Reduce defects by reusing composable assets: see IEEE Std 1517; ISO/IEC 26550 family
Problem space: Domain modeling
Solution space: Domain engineering

Approaches and Methods for Improved Requirements

For each approach or method: the source of uncertainty addressed by the method (i.e., method effectiveness in addressing uncertainty) and its benefit, followed by additional explanation.

Restricted or Constrained or Controlled Natural Language
Addresses: Improved specification: reduced ambiguity; improved consistency and verifiability.
Explanation: Supports auto-generation of verification conditions (e.g., test cases). Example of enabler: Web Ontology Language (OWL).

Behavior Specification
Addresses: As above.
Explanation: The finite state machine (FSM) paradigm supports stepwise refinement and flow-down of behavior specification. Also supports auto-generation of verification conditions (e.g., test cases). Facilitates hazard analysis (e.g., STPA).

Domain Specific Specialization
Addresses: Higher quality of specification and V&V.
Explanation: Enables re-use of pre-verified building blocks, specific to an application domain. Example: ReqSpec in AADL.

Theorem Proving
Addresses: Provides proof that a conclusion can be inferred deductively from a proven chain of premises. Improved V&V.
Explanation: Can also be used to identify gaps in the chain of premises. Example: Used as a module in SCR.

Model-Checking
Addresses: Checks that the flow-down is correct and consistent.

Review, Walkthrough, and Inspection (RWI)
Addresses: Fills the gaps in requirements V&V uncovered by analytical methods.

Sources: IEC 61508, IEC 62279
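As an illustration of how an FSM behavior specification can support auto-generation of test cases, the following is a minimal sketch, not a method taken from the presentation. The trip-logic states and events, and the names `TRANSITIONS`, `generate_transition_tests`, and `run_fsm`, are hypothetical. The sketch generates one test per specified transition (transition coverage), which is only one of several possible coverage criteria.

```python
from collections import deque

# Hypothetical trip-logic behavior specification: (state, event) -> next state.
TRANSITIONS = {
    ("NORMAL", "setpoint_exceeded"): "TRIPPED",
    ("NORMAL", "self_test_failed"): "FAULT",
    ("TRIPPED", "operator_reset"): "NORMAL",
    ("FAULT", "repair_confirmed"): "NORMAL",
}
INITIAL_STATE = "NORMAL"


def shortest_event_paths(transitions, initial):
    """Breadth-first search: shortest event sequence from initial to each reachable state."""
    paths = {initial: []}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for (src, event), dst in transitions.items():
            if src == state and dst not in paths:
                paths[dst] = paths[state] + [event]
                queue.append(dst)
    return paths


def generate_transition_tests(transitions, initial):
    """Build one test per transition: reach the source state, fire the event, expect the target."""
    paths = shortest_event_paths(transitions, initial)
    tests = []
    for (src, event), dst in transitions.items():
        if src in paths:  # transitions out of unreachable states cannot be exercised
            tests.append({"events": paths[src] + [event], "expected_state": dst})
    return tests


def run_fsm(transitions, initial, events):
    """Reference execution of the specification for a sequence of events."""
    state = initial
    for event in events:
        state = transitions[(state, event)]
    return state


if __name__ == "__main__":
    for test in generate_transition_tests(TRANSITIONS, INITIAL_STATE):
        print(test)
```

In practice the generated event sequences would be applied to the implementation under test, with `run_fsm` serving only as the oracle derived from the specification.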

Approaches to Address Uncertainties Introduced in Methods used for Requirements

For each approach or method: the uncertainties it introduces, followed by approaches to address them.

Restricted Natural Language
Uncertainties introduced: Comprehensibility is traded off for disambiguation.
Approach: Annotations in the model bridge the gap.

Behavior Specification
Uncertainties introduced: Potential for semantic inconsistency across different FSM modeling environments.
Approach: Constrain the interacting environments to eliminate the inconsistencies, specific to the domain of interest.

Domain Specific Specialization
Uncertainties introduced: When a new application does not fit in a predefined domain, adaptation may degrade model validity.
Approach: RWI by expert team.

Theorem Proving
Uncertainties introduced: Language transformations required to enable its use may introduce hard-to-find semantic inconsistencies.
Approach: Use in combination with domain-specific environments for specification, validation and verification. RWI by expert team.

Model-Checking
Uncertainties introduced: Model-checking is highly dependent on the fidelity of the model to reality.
Approach: As for theorem proving.

Review, Walkthrough, Inspection
Uncertainties introduced: Human fallibility.
Approach: Independent RWI by expert team.

Approaches and Methods used for Architecture

For each approach or method: the source of uncertainty addressed by the method (i.e., method effectiveness in addressing uncertainty) and its benefit, followed by additional explanation.

Theorem Proving
Addresses: See table for requirements.
Explanation: See table for requirements.

Prevention through Modeling Constraints
Addresses: Limits the uncertainty space by preventing or detecting incorrect or unverifiable constructs in the models.

Model-Checking
Addresses: See table for requirements.

Review, Walkthrough, and Inspection (RWI)
Addresses: See table for requirements.

Sources: IEC 61508, IEC 62279

Approaches to Address Uncertainties Introduced in Methods used for Unit Verification 1/2

For each approach or method: the uncertainties it introduces, followed by approaches to address them.

Correct by construction
Uncertainties introduced: Undetected incorrect transformations.
Approach: Independent V&V (IV&V) of the integrated tool suite, libraries, other reusable assets, and development environment(s). Semantic consistency across interfaces.

Safe subset of the programming language
Uncertainties introduced: Programmer may use features outside the safe subset.
Approach: For the safe subset language and the tools enforcing usage within the safe subset: IV&V; pre-certification; configuration control; change control.

Model-checking
Uncertainties introduced: See prior slide.
Approach: See prior slide.

Static analysis
Uncertainties introduced: Does not discover faults which occur only during execution. Requires source code.
Approach: Complement with model-checking.

Schedulability

Need to verify that the workload fits within the available resources.

Workload:
Computing (esp. in microprocessor-based platforms)
Communication (esp. in serial networks)
Typically cyclic, requiring accurate periodicity
Typically many tasks, requiring different amounts of resources

Resources (typically shared across different tasks):
Time (computing; communication)
Space (memory)

In general, high computational complexity.
Constraints can be applied to reduce the complexity. Example: Programmable Logic Controllers (PLCs)

For more information, see RIL-1101 Appendix I; H. Kopetz, Simplicity is Complex, Springer (2019)
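To make the workload-versus-resources check concrete, here is a minimal sketch under strong simplifying assumptions that are not stated in the presentation: every task runs once per fixed scan cycle (as in a PLC-style cyclic executive), worst-case execution times and static memory sizes are known, and there is no preemption or dynamic allocation. The task names, numbers, and the function `check_scan_cycle` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet_ms: float   # worst-case execution time per scan (assumed known)
    memory_kb: int   # statically allocated memory (assumed known)


def check_scan_cycle(tasks, scan_period_ms, memory_capacity_kb):
    """Check that the summed workload fits within the time and space budgets of one scan."""
    time_needed = sum(task.wcet_ms for task in tasks)
    space_needed = sum(task.memory_kb for task in tasks)
    return {
        "time_margin_ms": scan_period_ms - time_needed,
        "memory_margin_kb": memory_capacity_kb - space_needed,
        "schedulable": time_needed <= scan_period_ms
        and space_needed <= memory_capacity_kb,
    }


# Hypothetical workload for illustration only.
TASKS = [
    Task("read_inputs", wcet_ms=2.0, memory_kb=64),
    Task("trip_logic", wcet_ms=3.0, memory_kb=128),
    Task("write_outputs", wcet_ms=1.5, memory_kb=32),
]

if __name__ == "__main__":
    print(check_scan_cycle(TASKS, scan_period_ms=10.0, memory_capacity_kb=256))
```

Constraining the design to a single fixed cycle is what reduces the analysis to two sums; workloads with differing periods or preemption need a more general schedulability analysis.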

Conditions to Reduce Uncertainties Associated with Tools - Examples

1. The development environment is qualified and certified for the domain of usage.

2. The development environment is maintained under configuration management (as a set).

3. Restrictions for safe use of a tool are identified and enforced.

4. Semantics are preserved in information exchanged across tools used in system development.

5. The architectural description method is unambiguous.

6. Methods and languages used to describe, represent, or specify architectures support unambiguous transformation across development phases and across dissimilar elements from different sources.

7. Automation used for the creation of a work product is independent of automation used for the V&V of that work product.

8. The developers of a work product are different from those performing its V&V.

9. V&V tools are qualified by people independent of the developers and users of these tools.

10. V&V tools are qualified using methods that are independent of the methods implemented by these tools.

Summary: Concepts to support Assurability

Abstraction
Refinement
Disambiguation
Domain modeling; domain engineering
Compositionality
Schedulability

For practicable guidance, see R. Hite, et al., SYMPLE: A complexity-aware approach for realizing verifiable FPGA-based digital I&C for safety critical applications, https://www.ans.org/pubs/proceedings/article-49775/

Some known limitations

Validating results of hazard analysis: Did it really identify all causes that could degrade the safety function?

Validating assumptions about the environment of the safety system, e.g.:
Conditions of operation and maintenance
Configuration control
Change impact analysis

Qualifying suite of tools from different sources:
Libraries
Underlying languages

Infrastructure for independent V&V

Reasoning Model to support performance-based evaluation

[Figure, based on the Toulmin model (1). Labels: Premise / Evidence; Reasoning; Assertion; Inference rule; Theoretical or causal model; Basis for; Used in; Influences on validity of proposition; Rebuttals; Qualifiers (Strength; Condition); Doubts/Defeaters]

(1) Toulmin, S., The Uses of Argument, Cambridge, UK: Cambridge University Press, 1958

Judgment

The safety claim is satisfied unconditionally (i.e., the residual uncertainty has an insignificant effect on the safety claim).
No one can find any uncontrolled hazard with the potential to degrade the performance of the safety function.
No one can find any unmitigated "defeater".

The safety claim is not satisfied with the given evidence.
The residual uncertainty is so great that the safety claim cannot be supported.
The defeaters are identified and associated with the respective sub-claims.

The safety claim does not hold.
Fallacies in logic.
Deficiencies in evidence.

Decide

The state-of-the-art can support consistent judgment based on objective, scientific evidence and logical reasoning.
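One way to picture how defeaters attached to sub-claims could drive the three judgment outcomes is the sketch below. It is an illustrative simplification, not the author's evaluation procedure: the classes `Claim` and `Defeater`, the flag `refutes_claim`, and the decision rule in `judge` are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Judgment(Enum):
    SATISFIED = "safety claim is satisfied"
    NOT_SATISFIED = "safety claim is not satisfied with the given evidence"
    DOES_NOT_HOLD = "safety claim does not hold"


@dataclass
class Defeater:
    description: str
    mitigated: bool = False
    # Assumption: True marks a fallacy in logic or a deficiency in evidence
    # that invalidates the argument rather than merely leaving it open.
    refutes_claim: bool = False


@dataclass
class Claim:
    assertion: str
    evidence: list = field(default_factory=list)
    defeaters: list = field(default_factory=list)
    sub_claims: list = field(default_factory=list)


def all_defeaters(claim):
    """Collect defeaters from the claim and, recursively, from its sub-claims."""
    found = list(claim.defeaters)
    for sub_claim in claim.sub_claims:
        found.extend(all_defeaters(sub_claim))
    return found


def judge(claim):
    """Hypothetical decision rule: any open refuting defeater, then any open defeater."""
    open_defeaters = [d for d in all_defeaters(claim) if not d.mitigated]
    if any(d.refutes_claim for d in open_defeaters):
        return Judgment.DOES_NOT_HOLD
    if open_defeaters:
        return Judgment.NOT_SATISFIED
    return Judgment.SATISFIED
```

For example, a top-level claim whose only sub-claim carries an unmitigated, non-refuting defeater would be judged `NOT_SATISFIED` until that defeater is mitigated.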

Economics!

[Figure labels: Engineering time; Run time; Monitor; Detect; Intervene; Diverse redundancy; Prevent hazard; Prevent propagation; Verify; Reactive; Preventative; Cost increases; Potential to decrease intrinsic cost]

Acronyms & Abbreviations 1/2

AADL: Architecture Analysis and Design Language
CCF: Common cause failure
Dev: Development
Engrg: Engineering
DI&C: Digital Instrumentation and Control
EPRI: Electric Power Research Institute
esp.: Especially
FSM: Finite state machine
HAp: Hazard analysis of plans
HAr: Hazard analysis of requirements
HAa: Hazard analysis of architecture
HAdd: Hazard analysis of detailed design
HAi: Hazard analysis of implementation
HAt: Hazard analysis of testing (including test specifications and oracles)
IAEA: International Atomic Energy Agency
I&C: Instrumentation and Control
IEC: International Electrotechnical Commission
IEEE: Institute of Electrical and Electronics Engineers
ISO: International Organization for Standardization
IV&V: Independent Verification and Validation
NPP: Nuclear Power Plant
NRC: U.S. Nuclear Regulatory Commission
OWL: Web Ontology Language
RIL: Research Information Letter
RPS: Reactor Protection System
RWI: Review, Walkthrough, and Inspection

Acronyms & Abbreviations 2/2

R&D: Research and Development
Reqmts: Requirements
SCR: Software Cost Reduction (set of techniques for designing software systems)
spec: specification
SQuaRE: Systems and Software Quality Requirements and Evaluation
STPA: System-Theoretic Process Analysis (method of hazard analysis)
Std: Standard
V&V: Verification and Validation
Vp: V&V of plans
Vr: V&V of requirements
Va: V&V of architecture
Vdd: V&V of detailed design
Vi: V&V of implementation
Vt: V&V of testing (including test specifications and oracles)

Discussion

Supporting slides

Examples: Constraints on Architecture to prevent interference

[ID#] Unintended interactions between a system, device or other element (internal or external to a safety system) that cause adverse effects on a safety function are avoided (Controls H-SA-3 in Table ).

Interactions are limited provably to those required for the safety functions.

Interactions and interconnections that cannot be completely verified are avoided, eliminated, or prevented.

Freedom from interference (including fault propagation) is assured provably across:

1. Lines of defense or protection barriers.
2. Redundant divisions of the DI&C system.
3. Elements intended to be diverse.
4. Degrees or levels of safety qualification.
5. Monitoring & monitored elements of the system.
6. Shared resources, e.g., equipment for monitoring or servicing.

Analysis of the system demonstrates that unintended behavior is not possible.

Interaction across different sources of uncertainty is avoided.

The architecture precludes unwanted interactions and unwanted or unknown hidden couplings or dependencies.

Specified information exchanges or communications occur in safe ways.

State-of-the-art methods enable satisfaction of these conditions.
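As a small illustration of checking an architecture description for unwanted couplings across redundant divisions, the sketch below searches the declared interconnections for any path from an element of one division to an element of another. It is a minimal sketch under assumptions: the element names, the `links` representation, and the function `cross_division_paths` are hypothetical, and the check covers only declared links, so it cannot by itself prove freedom from interference in a real system.

```python
from collections import deque


def reachable(links, start):
    """Return every element reachable from start by following directed links."""
    graph = {}
    for src, dst in links:
        graph.setdefault(src, set()).add(dst)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cross_division_paths(links, divisions):
    """List (source, target) pairs where one division can influence another."""
    owner = {elem: name for name, elems in divisions.items() for elem in elems}
    violations = []
    for element, division in owner.items():
        for target in reachable(links, element):
            if target in owner and owner[target] != division:
                violations.append((element, target))
    return sorted(violations)


# Hypothetical two-division architecture feeding a common voter.
DIVISIONS = {"A": {"sensor_A", "logic_A"}, "B": {"sensor_B", "logic_B"}}
LINKS = [
    ("sensor_A", "logic_A"), ("logic_A", "voter"),
    ("sensor_B", "logic_B"), ("logic_B", "voter"),
]

if __name__ == "__main__":
    print(cross_division_paths(LINKS, DIVISIONS))                            # no coupling declared
    print(cross_division_paths(LINKS + [("logic_A", "logic_B")], DIVISIONS))  # coupling reported
```

The same reachability check could be applied to the other boundaries listed above, such as monitoring versus monitored elements, by changing the grouping passed as `divisions`.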

Examples: Identifying hazards controlling conditions

RIL-1101 Table 1: Considerations in broadly evaluating hazard analysis

Contributory hazards and the conditions that reduce the hazard space. IDs below carry the prefix H-0-.

H-0-6: Hazard controls needed to satisfy system constraints (which prevent hazards) are inadequate.
H-0-6G1: Hazard controls are identified and validated to be correct, complete, and consistent. [H-0-7G1]

H-0-7: Flow-down to verifiable requirements and constraints is inadequate.
H-0-7G1: Requirements and constraints [H-0-6G1] are formulated and validated to be correct, complete, consistent.

H-0-11: Required control action is degraded.
H-0-11G1: Each required control action is analyzed for ways in which it can lead to a hazard, e.g.:
1. ~ not provided when needed
2. ~ provided when not needed
3. ~ provided at incorrect time
4. ~ provided too long
5. ~ provided too short
6. ~ is intermittent
7. ~ interferes with another..
8. ~ exhibits Byzantine behavior
9. Incorrect state transition occurs
10. Incorrect input value

Sources: RIL-1101; RIL-1002
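The systematic analysis of each required control action can be sketched as a worksheet generator that pairs every control action with each listed deviation, so that no combination is skipped. This is a minimal sketch, assuming hypothetical control-action names and a hypothetical `disposition` field; item 7 of the list is omitted because its wording is truncated in the source, and items 9 and 10 are omitted because they are not phrased as deviations of a control action.

```python
from itertools import product

# Deviations taken from the "~" items in the list (item 7 omitted: truncated in the source).
DEVIATIONS = (
    "not provided when needed",
    "provided when not needed",
    "provided at incorrect time",
    "provided too long",
    "provided too short",
    "is intermittent",
    "exhibits Byzantine behavior",
)


def build_worksheet(control_actions):
    """Create one row per (control action, deviation) pair, initially unanalyzed."""
    return [
        {
            "control_action": action,
            "deviation": deviation,
            "prompt": f"{action} {deviation}",
            "disposition": "UNANALYZED",  # hypothetical status field
        }
        for action, deviation in product(control_actions, DEVIATIONS)
    ]


def open_items(worksheet):
    """Rows that still need an analyst's disposition."""
    return [row for row in worksheet if row["disposition"] == "UNANALYZED"]


if __name__ == "__main__":
    # Hypothetical control actions for illustration only.
    sheet = build_worksheet(["reactor trip signal", "safety injection actuation"])
    for row in sheet:
        print(row["prompt"])
```

The generator only enumerates prompts; judging whether each deviation can lead to a hazard remains an analyst's task.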

Examples: Controlling causes of hazards from complexity

Contributory hazards and the conditions that reduce the hazard space. IDs below carry the prefix H-S-.

1: The system is not sufficiently verifiable and understandable... considerations and criteria are not formulated at the beginning of the development lifecycle; therefore, corresponding architectural constraints are not formalized and checked.

1G1: Verifiability is a required property, flowing down from the system to its most finely grained constituents.

1G1.1: Verifiability is checked at every phase, at every level of integration, before the next phase.

1.1G1.1: The behavior is unambiguously specified (incl. unexpected inputs) at every level of integration.

1.1G1.2: The flow-down (from composition to decomposition) ensures that:
1. Allocated behaviors satisfy the behavior specified at the next higher level.
2. Unspecified behavior does not occur.

1.1G1.3: System behavior is composed of element behaviors such that, when all elements are verified individually, their compositions may also be considered verified; no unspecified behavior emerges.

1.1G1.4: Development follows a refinement process.

1.1.1: Unanalyzed/unanalyzable conditions exist, e.g., unknown/unwanted system states.

1.1.1G1: Static analyzability: System is statically analyzable.
1. All states, including fault conditions, are known.
2. All fault states that lead to failure modes are known.
3. The safe-state space of the system is known.

1.2

1.3

2: Comprehensibility: System behavior is not interpreted correctly/consistently by its users [H-S-1].

2G1: Behavior is completely and explicitly specified.

2G3: Behavior is understood or interpreted completely, correctly, consistently, and unambiguously.

2G6: The architecture is specified such that it is unambiguously interpretable by the community of its users (e.g., reviewers, architects, designers, implementers), that is, the people and the tools they use.

Source: RIL-1101
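The static-analyzability condition, that all states (including fault states) and the safe-state space are known, can be illustrated with a small reachability check over a state-transition model. This is a sketch under assumptions: the model, the state names, and the function `analyze_state_space` are hypothetical, and the result is only as good as the fidelity of the model to the real system.

```python
def reachable_from(transitions, start):
    """Return the set of states reachable from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for (src, _event), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def analyze_state_space(transitions, initial, known_states, safe_states):
    """Report reachable states that are unknown or that cannot reach a safe state."""
    reachable = reachable_from(transitions, initial)
    return {
        "unknown_states": sorted(reachable - known_states),
        "no_path_to_safe_state": sorted(
            state for state in reachable
            if not (reachable_from(transitions, state) & safe_states)
        ),
    }


# Hypothetical model: (state, event) -> next state.
MODEL = {
    ("OPERATING", "fault_detected"): "DEGRADED",
    ("OPERATING", "demand"): "SAFE_SHUTDOWN",
    ("DEGRADED", "fault_cleared"): "OPERATING",
    ("DEGRADED", "fault_persists"): "SAFE_SHUTDOWN",
}
KNOWN = {"OPERATING", "DEGRADED", "SAFE_SHUTDOWN"}
SAFE = {"SAFE_SHUTDOWN"}

if __name__ == "__main__":
    print(analyze_state_space(MODEL, "OPERATING", KNOWN, SAFE))
    # A transition into an unspecified state is reported as a finding.
    defective = dict(MODEL)
    defective[("DEGRADED", "watchdog_glitch")] = "HUNG"
    print(analyze_state_space(defective, "OPERATING", KNOWN, SAFE))
```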

Examples: Controlling causes of hazards from interference

Contributory hazards and the conditions that reduce the hazard space. IDs below carry the prefix H-SA-.

3: A system, device, or other element (external or internal to a safety system) might affect a safety function adversely through unintended interactions caused by some combination of deficiencies, disorders, malfunctions, or oversights.

3G2: Interactions and interconnections that preclude complete V&V are avoided, eliminated, or prevented.

3G3: Freedom from interference is assured provably across:
1. Lines of defense.
2. Redundant divisions of system.
3. Degrees of safety qualification.
4. Monitoring & monitored elements of the system.

3G4: Analysis of the system demonstrates that unintended behavior is not possible.
1. Interaction across different sources of uncertainty is avoided.
2. The architecture precludes unwanted interactions, unwanted or hidden couplings.
3. Specified information exchanges or communications occur in safe ways.

3G6: Constraints are identified for such contributing hazards from the environment as EMI.

3G7: The impact of dependency-affecting change is analyzed to demonstrate no adverse effect.

4 [H-SA-3G4]: A function, whose execution is required at a particular time, cannot be performed as required because of interference through sharing of some resource it needs.

4G1: Analysis of the execution-behavior of the system proves that such interference will not occur. For example, worst-case execution time is guaranteed.

5: Timing constraints are not correctly specified and not correctly allocated.

5G1

Source: RIL-1101

Approaches and Methods used for Unit Verification 2/2

For each approach or method: the source of uncertainty addressed by the method (i.e., method effectiveness in addressing uncertainty) and its benefit, followed by additional explanation.

Black box testing
Addresses: Used when information internal to the unit is not available.
Explanation: Automation & combinatorial testing have extended the test coverage, but coverage of the fault space is not assured to be complete.

White box testing
Addresses: Enables coverage of fault space when best-practice V&V methods have been used in preceding phases of development.
Explanation: IV&V agent requires access to unit-internal information.

Review, Walkthrough and Inspection (RWI)
Addresses: Fills the gaps in V&V uncovered by analytical methods.

Sources: IEC 61508, IEC 62279, etc.
