Why OT Incidents Are Often Diagnosed Backwards

By Muhammad Ali Khan, ICS/OT Cybersecurity Specialist (AAISM | CISSP | CISA | CISM | CEH | ISO27001 LI | CHFI | CGEIT | CDCP)
The Fundamental Misstep: Starting With the Wrong Question
In OT and ICS environments, incident response rarely begins with the question it should. Instead of asking what actually changed in the system, investigations often start by looking for a known cyberattack pattern, a signature, or a familiar IT-style failure. This backward approach is one of the most persistent reasons OT incidents drag on for weeks, get misclassified, or are quietly written off as “operational issues” rather than recognized as security failures.
OT systems rarely fail loudly when something goes wrong. They fail subtly, gradually, and often in ways that look like normal process instability. When a production line slows down, a turbine trips unexpectedly, or sensor values begin drifting, the immediate assumption is almost always mechanical wear, calibration issues, or operator error. A cyber cause is considered last, if at all. By the time security teams are involved, the original indicators are often gone, logs have rolled over, and the system has already been “fixed” in ways that destroy evidence.
Outcome-Driven Thinking Masks the Real Cause
Another reason OT incidents are diagnosed backwards is the dominance of outcome-based thinking. Investigations focus on the visible failure: the shutdown, the safety trip, the batch deviation, or the quality loss. Teams then work backward from that outcome, searching for a single root cause that neatly explains everything.
In OT environments, this mindset is misleading. Many cyber incidents do not directly cause the final failure. Instead, they introduce small deviations, such as timing shifts, logic changes, or intermittent communications, that slowly accumulate until the system crosses a safety or reliability threshold. By the time the failure becomes visible, the original cyber trigger may be several steps removed, making it easy to overlook or dismiss.
Legacy Operational Bias Against Cyber Causes
There is a deep cultural bias in OT environments toward physical explanations. Engineers are trained to trust the process first. If a valve misbehaves, the valve is blamed. If a controller behaves unpredictably, firmware, hardware, or wiring becomes the focus. For decades, this approach made sense because physical faults were far more common than cyber manipulation.
Modern OT systems, however, are software-defined, remotely accessed, and interconnected. Digital changes can now produce physical effects that are indistinguishable from mechanical faults. When teams rely on outdated mental models, cyber-induced behavior is interpreted through a purely mechanical lens, leading to misdiagnosis and false confidence.
IT-Centric Incident Models Don’t Fit OT Reality
Traditional cybersecurity frameworks further reinforce backward diagnosis. IT incident response is built around clear indicators: malware alerts, authentication failures, lateral movement, and data exfiltration. OT incidents rarely produce these signals.
Instead, OT cyber events manifest as anomalies that sit below security thresholds but above operational tolerance. A PLC scan cycle slows slightly, historian timestamps drift, or an HMI displays stale data. None of these necessarily trigger security alarms. When investigators begin by asking “where is the malware?”, they miss the reality that many OT attacks involve no malware at all, only logic manipulation, configuration abuse, or timing interference.
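To make the idea of an anomaly that sits "below security thresholds but above operational tolerance" concrete, here is a minimal Python sketch. It assumes scan-cycle times have already been exported as plain numbers in milliseconds; the function name, the sample values, the 50 ms alarm limit, and the z-score cutoff are all hypothetical and would need tuning for a real process.

```python
from statistics import mean, stdev


def flag_subthreshold_drift(baseline_ms, observed_ms, alarm_ms, z_limit=3.0):
    """Return (index, value, z) for samples that deviate from the baseline
    but are still below the alarm limit, so no alarm would have fired."""
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    if sigma == 0:
        # A perfectly flat baseline cannot be scored with a z-score.
        return []
    flagged = []
    for index, value in enumerate(observed_ms):
        z = (value - mu) / sigma
        if abs(z) > z_limit and value < alarm_ms:
            flagged.append((index, value, round(z, 1)))
    return flagged


# Hypothetical scan-cycle times (ms) from a known-good period and from today.
baseline = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
observed = [10.1, 10.0, 11.5, 11.8, 12.1]

suspicious = flag_subthreshold_drift(baseline, observed, alarm_ms=50.0)
for index, value, z in suspicious:
    print(f"sample {index}: {value} ms (z={z}) is abnormal but under the alarm limit")
```

A check like this does not say why the scan cycle slowed; it only surfaces the deviation early enough for someone to ask the question.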

How Incident Reporting Hides Cyber Events
OT incidents are usually documented as reliability, safety, or maintenance issues. A relay misoperation, compressor surge, or unexpected trip is recorded in operational terms, often without any reference to digital influence.
When security teams review these reports later, they see no explicit cyber indicators and assume there was no security component. Over time, organizations internalize the belief that they have “never had a cyber incident,” when in reality those incidents were simply classified under different names.
The Silent Role of Organizational Incentives
Labeling an incident as cyber-related often brings regulatory scrutiny, audits, and uncomfortable questions about network design and access control. As a result, there is an unspoken incentive to explain incidents in non-cyber terms whenever possible.
Backward diagnosis allows organizations to settle on a familiar and less disruptive explanation early. Once a mechanical or human-error narrative is accepted, challenging it requires effort, evidence, and political capital that many teams are reluctant to spend.
Recovery First, Evidence Last
Industrial environments prioritize restoration over investigation. When production is disrupted, the primary goal is to get systems running again. Controllers are rebooted, logic is restored from backups, configurations are rolled back, and temporary workarounds are applied.
While these actions stabilize operations, they also erase forensic evidence. By the time a formal review occurs, the system no longer reflects the state in which the incident occurred. Investigators are left reconstructing events from partial logs and assumptions, reinforcing backward conclusions.
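One way to reduce this loss is to capture a quick record of the current state before anything is rebooted or rolled back. The following Python sketch assumes the controller project, configuration exports, and logs have already been copied to a directory on an engineering workstation; the directory layout, file names, and manifest format are hypothetical, and this is not a substitute for a proper forensic procedure.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def snapshot_evidence(source_dir, manifest_path):
    """Write a manifest with the SHA-256 hash and size of every file under
    source_dir, so the pre-recovery state can be verified later."""
    source = Path(source_dir)
    entries = []
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        entries.append({
            "file": str(path.relative_to(source)),
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            "bytes": path.stat().st_size,
        })
    manifest = {
        "captured_utc": datetime.now(timezone.utc).isoformat(),
        "source": str(source),
        "files": entries,
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


# Demonstration with a throwaway directory standing in for an exported project.
sample_logic = b"RUNG 1: XIC start OTE motor\n"
with tempfile.TemporaryDirectory() as workdir:
    export_dir = Path(workdir) / "plc_export"
    export_dir.mkdir()
    (export_dir / "logic.txt").write_bytes(sample_logic)
    manifest = snapshot_evidence(export_dir, Path(workdir) / "manifest.json")
    print(manifest["files"][0]["file"], manifest["files"][0]["sha256"][:12])
```

Keeping the manifest and the copied files away from the system being restored means a later review can compare the recovered state with what was actually running during the incident.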
Why This Pattern Is So Dangerous
Backward diagnosis prevents organizations from seeing patterns. When each incident is treated as an isolated operational failure, broader campaigns remain invisible. Repeated unexplained anomalies across different assets or sites may be related, but without a security-first perspective, they are never correlated.
This blind spot allows attackers to operate below detection thresholds, causing persistent disruption without ever triggering a coordinated cyber response.
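Correlation of this kind does not need sophisticated tooling to get started. The Python sketch below assumes incident reports from several sites have been normalized into simple records with a site, a symptom label, and a timestamp; the record fields, site names, symptom labels, and the 30-day window are hypothetical choices for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate_incidents(incidents, window=timedelta(days=30), min_sites=2):
    """Return (symptom, sites) pairs where the same symptom was reported
    at min_sites or more distinct sites within the given time window."""
    by_symptom = defaultdict(list)
    for incident in incidents:
        by_symptom[incident["symptom"]].append(incident)

    clusters = []
    for symptom, group in by_symptom.items():
        group.sort(key=lambda incident: incident["time"])
        start = 0
        for end in range(len(group)):
            # Shrink the window from the left until it fits the time limit.
            while group[end]["time"] - group[start]["time"] > window:
                start += 1
            sites = {incident["site"] for incident in group[start:end + 1]}
            if len(sites) >= min_sites:
                clusters.append((symptom, sorted(sites)))
                break  # report each symptom once
    return clusters


# Hypothetical records, each originally filed as a separate operational issue.
incidents = [
    {"site": "plant-a", "symptom": "stale HMI data", "time": datetime(2024, 3, 1)},
    {"site": "plant-c", "symptom": "compressor surge", "time": datetime(2024, 3, 4)},
    {"site": "plant-b", "symptom": "stale HMI data", "time": datetime(2024, 3, 8)},
    {"site": "plant-a", "symptom": "stale HMI data", "time": datetime(2024, 9, 20)},
]

clusters = correlate_incidents(incidents)
for symptom, sites in clusters:
    print(f"'{symptom}' seen at {', '.join(sites)} within 30 days")
```

A match here is not proof of a campaign. It is a prompt to look at events that would otherwise have been closed out separately.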
Rethinking How OT Incidents Should Be Investigated
Correcting this problem requires a shift in mindset. Instead of starting with what failed, investigations must start with what changed. What behavior deviated from normal? When did it happen? Who or what could make that change?
This approach does not assume a cyberattack, but it treats digital influence as a primary possibility rather than an afterthought. It also demands closer collaboration between operations, engineering, and security teams, with shared access to logs, configurations, and process data.
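The "what changed" question can be asked very literally when a known-good baseline exists. As a minimal sketch, the Python below compares two configuration snapshots represented as dictionaries; the parameter names and values are hypothetical, and a real comparison would work on exported controller projects, firmware versions, or network configurations.

```python
_MISSING = object()  # sentinel so that a stored value of None is not confused with "absent"


def diff_config(baseline, current):
    """Return a list of (key, kind, before, after) for every parameter that
    was added, removed, or modified relative to the baseline."""
    changes = []
    for key in sorted(baseline.keys() | current.keys()):
        before = baseline.get(key, _MISSING)
        after = current.get(key, _MISSING)
        if before is _MISSING:
            changes.append((key, "added", None, after))
        elif after is _MISSING:
            changes.append((key, "removed", before, None))
        elif before != after:
            changes.append((key, "modified", before, after))
    return changes


# Hypothetical snapshots: the last approved baseline and the running state.
baseline = {"PID1.setpoint": 72.0, "PID1.kp": 1.2, "timer_T4.preset_ms": 500}
current = {
    "PID1.setpoint": 72.0,
    "PID1.kp": 1.2,
    "timer_T4.preset_ms": 650,
    "rung_17.bypass": True,
}

changes = diff_config(baseline, current)
for key, kind, before, after in changes:
    print(f"{kind}: {key} ({before} -> {after})")
```

Each reported change then leads to the other two questions: when it was made, and which accounts, workstations, or remote sessions had the ability to make it.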
The Real Question OT Teams Need to Ask
OT incidents continue to be diagnosed backwards because failure in industrial systems does not look like a breach. Until organizations stop asking “what broke?” and start asking “what was altered?”, many of the most serious OT cyber incidents will remain misclassified, misunderstood, and destined to repeat.