While acquainting themselves with a new plant’s operations, a controls engineer delved into the alarm summary to gather insights. What they discovered was staggering: The system was inundated with hundreds of alarms. Some were being triggered more than 50 times a day while others remained in a constant state of alarm for weeks at a time.
Further inquiry revealed this scenario was considered normal within the day-to-day operations of the plant. The plant staff were comfortable with the state of their system and the volume of alerts.
However, as an outsider, the controls engineer felt overwhelmed and struggled to effectively navigate the system. The disconnect between experiences prompted the engineer to consider how to bridge this gap in understanding and propose a feasible solution to address the issue.
Alarming serves a critical function within a SCADA system, alerting operations (either audibly or visually) to items that need to be addressed, such as process deviations, abnormal conditions, or equipment malfunctions. However, an alarm system's function is often compromised when alarms occur too frequently, leading to habituation, desensitization, and complacency among operators. This overstimulation diminishes the significance of alarms and hinders the system’s ability to draw attention to genuine emergencies.
The classic tale “The Boy Who Cried Wolf” illustrates the consequences of a poorly performing alarm system. In the story, the boy repeatedly raises false alarms with the townspeople. Over time, the people become indifferent to them. Consequently, when a real wolf threatens the safety of the town, the people no longer heed the boy’s cries, and the sheep suffer the consequences. Unfortunately, comparable situations are not uncommon in SCADA systems, where an operator’s desensitization to authentic alarms endangers operations and threatens safety.
Because situations like the one described above are unfortunately common in SCADA systems, alarm system health is best approached holistically.
Crises can be averted by conducting an objective assessment of an alarm system’s state and addressing areas of concern. This proactive stance allows users to identify potential issues before they escalate, enabling timely intervention and effective implementation.
Through systematic evaluation and targeted action, organizations can mitigate risks and ensure continued safety, efficiency and reliability of critical industrial processes.
With a well-defined process, an unbiased perspective of alarm system health can be achieved.
This four-step process identifies, addresses and maintains the health of an alarm system:
Analysis: Use standards to measure the alarm system’s health. This serves as a baseline for the evaluation, removing subjectivity.
Review: Review the results with a multidisciplinary group (e.g., programmers, operations personnel and engineers). During this phase:
Act: Implement solutions based on the results of the alarm analysis and review steps.
Repeat: Run the analysis, review and act steps again, and keep repeating the cycle for ongoing improvement and maintenance of alarm system health.
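The analysis step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes the alarm journal can be exported as a list of events with a tag name, an activation time and a clear time. The `AlarmEvent` structure, the tag names and the thresholds are all hypothetical. The 50-per-day and seven-day limits simply echo the opening anecdote; they are not taken from IEC 62682 and should be replaced with the targets the team adopts.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlarmEvent:
    """One alarm activation exported from the alarm journal (hypothetical layout)."""
    tag: str
    activated: datetime
    cleared: Optional[datetime]  # None means the alarm is still active


def daily_activation_counts(events):
    """Count activations per (tag, calendar day)."""
    counts = Counter()
    for event in events:
        counts[(event.tag, event.activated.date())] += 1
    return counts


def frequent_alarms(events, max_per_day):
    """Tags that exceeded max_per_day activations on at least one day."""
    counts = daily_activation_counts(events)
    return sorted({tag for (tag, _day), n in counts.items() if n > max_per_day})


def standing_alarms(events, as_of, min_duration):
    """Tags with at least one activation that stayed active for min_duration or longer."""
    standing = set()
    for event in events:
        # Alarms that never cleared are measured up to the time of the analysis.
        end = event.cleared if event.cleared is not None else as_of
        if end - event.activated >= min_duration:
            standing.add(event.tag)
    return sorted(standing)


# Illustrative data: one tag that activates 60 times in a day, one that never clears.
start = datetime(2024, 1, 1)
events = [
    AlarmEvent("LT-101_HI",
               start + timedelta(minutes=10 * i),
               start + timedelta(minutes=10 * i + 1))
    for i in range(60)
]
events.append(AlarmEvent("PT-202_LO", start, None))
as_of = start + timedelta(days=21)

print(frequent_alarms(events, max_per_day=50))            # ['LT-101_HI']
print(standing_alarms(events, as_of, timedelta(days=7)))  # ['PT-202_LO']
```

Keeping the thresholds as parameters lets the same script be rerun with whatever limits the chosen standard or site policy specifies, which supports the repeat step.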
Regarding this process, it’s important to keep in mind the following:
An effective way to assess the health of an alarm system is to measure it against a standard. This helps remove subjectivity and mitigate habituation, also known as “tuning out.” One such standard for alarm systems is IEC 62682, a set of guidelines developed by the International Electrotechnical Commission (IEC, 2022).
This standard offers recommendations for the design, implementation, operation and management of industrial alarm systems. It delineates principles for alarm management covering aspects like design, prioritization and documentation, with the aim of enhancing safety, efficiency and situational awareness in industrial settings. Adherence to IEC 62682 facilitates the establishment of best practices for alarm systems, providing direction on items such as the following:
With the alarm analysis process outlined, the next step is reviewing the results, which should be conducted with a multidisciplinary group. The group should include programmable logic controller (PLC) programmers, human-machine interface (HMI) developers, engineers, operators and others who could provide a valuable perspective.
The group’s combined perspectives, expertise and insights lead to more thorough results than individual efforts.
As listed above, the analysis results are broken down into various measurables. The group should set priorities based on need. When meeting, consider that smaller, more frequent meetings tend to be more effective; there’s no need to tackle the whole task at once. Keep in mind that no single solution fits every alarm problem. Different alarm issues will require different approaches and solutions to create a healthy alarm system.
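One way to keep each review meeting small is to rank alarms by their share of the total alarm load and bring only the top few to the table. The sketch below shows the idea. It assumes the analysis produces a flat list of activated tag names; the function name `top_contributors` and the sample tags are hypothetical.

```python
from collections import Counter


def top_contributors(activations, top_n):
    """Return (tag, count, share_of_total) for the top_n most frequent tags."""
    counts = Counter(activations)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(tag, n, n / total) for tag, n in counts.most_common(top_n)]


# Illustrative data: one entry per alarm activation in the review period.
activations = ["LT-101_HI"] * 60 + ["FT-303_LO"] * 30 + ["PT-202_LO"] * 10

for tag, n, share in top_contributors(activations, top_n=2):
    print(f"{tag}: {n} activations, {share:.0%} of total")
# LT-101_HI: 60 activations, 60% of total
# FT-303_LO: 30 activations, 30% of total
```

A ranking like this is only a starting point for discussion. The group may still prioritize a rarely triggered alarm if operations considers it more urgent.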
The next crucial step is to translate the insights gathered into actionable strategies and ensure the identified concerns are effectively addressed. The actions taken in this stage will vary, but some examples are offered below for guidance and inspiration.
The following are commonly found deficiencies, along with actions that can be taken to resolve them:
Alarm system health is crucial for the efficient and safe operation of industrial processes controlled by SCADA systems. Adopting a holistic approach to assessing, reviewing and acting upon alarm system issues helps organizations to mitigate risks and ensure the continued reliability of critical processes.
Leveraging standards such as IEC 62682 provides a framework for evaluating alarm system performance and identifying areas for improvement. Collaboration within multidisciplinary teams fosters diverse perspectives and leads to more comprehensive solutions.
Continuous monitoring and periodic re-evaluation are essential for maintaining alarm system health over time. By following a structured process and remaining proactive in addressing alarm system challenges, organizations can enhance operational efficiency, promote safety, and safeguard workers and facilities against potential disruptions.