Sunday, July 27, 2025

Do we still need to worry about a big cyberattack on the power grid?


Note from Tom 7/27: Kevin Perry, retired Chief CIP Auditor of the SPP Regional Entity and co-leader of the NERC Standards Drafting Team that drafted CIP versions 2 and 3, clarified my amateurish electrical engineering musings at three different points in this post. I have created a footnote for each of his observations.

Last Friday, the typically free-flowing meeting of a group I lead, the OWASP SBOM Forum, got onto the subject of grid cyberattacks. One member posted a link in the chat to a 2020 Wired article about the Aurora Generator Test, conducted at Idaho National Laboratory (INL) in 2007.

I remember the uproar created by the Aurora test and how it changed the popular perception of the power grid. Previously, the grid had mostly been perceived as something that’s sturdy and stable, but not very interesting. After the Aurora test, it was increasingly considered to be something that’s quite interesting, but at the same time highly vulnerable to cyberattacks. People started thinking that grid attacks were such a big threat that it was almost inevitable that one would cause a huge outage that would put us back in the Stone Age.

The Aurora test was dramatic (after all, who doesn’t love seeing a large machine explode?), but the full story is more complicated than that. In fact, the Aurora test fell far short of demonstrating that the grid is highly vulnerable to cyberattacks; the grid is actually far less vulnerable than almost any other part of our critical infrastructure.

Here’s some background: As you may know, the power grid is based on alternating current (AC). That means the voltage at any point in the grid (say, a certain point on a power line) swings back and forth between a positive and a negative peak a fixed number of times per second. That number, referred to as the frequency of the grid, is approximately 60 cycles per second (60 hertz, or Hz) in the US and 50 Hz in Europe.
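To make “frequency” concrete, here is a minimal sketch that models an idealized grid voltage as a pure sine wave, v(t) = V_peak · sin(2πft), and computes how long one cycle lasts. Real grid waveforms are never perfectly sinusoidal, and the 170 V peak value is an arbitrary illustrative assumption.

```python
import math

def instantaneous_voltage(t: float, v_peak: float, freq_hz: float) -> float:
    """Voltage of an idealized sinusoidal AC source at time t (in seconds)."""
    return v_peak * math.sin(2 * math.pi * freq_hz * t)

def period_seconds(freq_hz: float) -> float:
    """Time taken to complete one full cycle."""
    return 1.0 / freq_hz

# Illustrative peak value only (assumption), not a property of any real grid.
V_PEAK = 170.0

for label, f in (("US", 60.0), ("Europe", 50.0)):
    T = period_seconds(f)
    print(f"{label}: {f:.0f} Hz -> one cycle every {T * 1000:.2f} ms")
# US: 60 Hz -> one cycle every 16.67 ms
# Europe: 50 Hz -> one cycle every 20.00 ms
```

A quarter of the way through a cycle the voltage reaches its positive peak, and halfway through it is back at zero on its way to the negative peak.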

Generators that run on fossil fuels (coal, natural gas, oil, etc.) produce AC power.[i] However, a generator can’t be connected to the grid if its frequency doesn’t closely match the grid’s, since the generator can be damaged if that happens. Very small deviations are usually acceptable, but even a frequency of 59 or 61 Hz might be unacceptable.[ii]

This is why most generators are protected by a device called a protective relay, installed between the generator and the line that connects it to the grid. The relay senses the generator’s frequency and compares it to the grid’s frequency (which also varies, but normally by very little). If the difference exceeds some predetermined value, the relay commands a circuit breaker to open (disconnect) the line, and it keeps the breaker open until the difference comes back within the tolerable range. The relay and breaker are normally installed in a switching yard outside the generating facility.
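The reconnection decision the relay makes can be sketched as a simple “synchronism check”: allow the breaker to close only when the generator’s frequency, voltage, and phase angle are all close to the grid’s (footnote [ii] explains why phase angle matters). This is a hypothetical illustration, not the logic of any real relay; the measurement fields and every threshold value below are assumptions chosen for readability, not actual relay settings.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    freq_hz: float     # frequency of the voltage waveform
    voltage_pu: float  # voltage magnitude, in per-unit of nominal
    angle_deg: float   # phase angle of the voltage waveform, in degrees

# Hypothetical tolerances for illustration only (not real relay settings).
MAX_SLIP_HZ = 0.1
MAX_VOLTAGE_DIFF_PU = 0.05
MAX_ANGLE_DIFF_DEG = 10.0

def angle_difference_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in the range [0, 180]."""
    diff = (a - b) % 360.0
    return min(diff, 360.0 - diff)

def permit_breaker_close(gen: Measurement, grid: Measurement) -> bool:
    """Return True only if the generator is close enough to the grid to reconnect."""
    slip_ok = abs(gen.freq_hz - grid.freq_hz) <= MAX_SLIP_HZ
    voltage_ok = abs(gen.voltage_pu - grid.voltage_pu) <= MAX_VOLTAGE_DIFF_PU
    angle_ok = angle_difference_deg(gen.angle_deg, grid.angle_deg) <= MAX_ANGLE_DIFF_DEG
    return slip_ok and voltage_ok and angle_ok

grid = Measurement(freq_hz=60.0, voltage_pu=1.0, angle_deg=0.0)
print(permit_breaker_close(Measurement(60.02, 1.01, 3.0), grid))    # True: nearly in sync
print(permit_breaker_close(Measurement(60.02, 1.01, 120.0), grid))  # False: 120 degrees out of phase
```

Because every check must pass, disabling or misconfiguring even one of them (as happened in the test) is enough to let an out-of-phase reconnection through.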

The Aurora attack starts by causing the relay to open the breaker. When that happens, the generator, with its load suddenly removed, speeds up (like a car engine revving when you press the clutch pedal while keeping your foot on the gas) and drifts out of sync with the grid. Normally, the relay would prevent reconnection until the generator and the grid were synchronized again. However, the Aurora attack forces reconnection anyway. This applies a huge amount of torque to the generator shaft, causing physical damage; the whole cycle is repeated until the generator stops working due to the damage. It’s almost the equivalent of throwing a car that’s driving on a highway into reverse without first coming to a stop.
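Why does the disconnected generator get out of sync so quickly? Any difference in frequency (the “slip”) makes the generator’s voltage waveform advance against the grid’s by 360 degrees per second for every 1 Hz of difference. The sketch below computes that drift under a simplifying assumption of constant frequencies; a real generator keeps accelerating after its load is removed, and the 60.5 Hz value is hypothetical. The 120-degree target comes from Kevin Perry’s description of the test in footnote [ii].

```python
def phase_drift_deg(gen_freq_hz: float, grid_freq_hz: float, seconds: float) -> float:
    """Phase angle (0 to 360 degrees) the generator has drifted ahead of the grid."""
    slip_hz = gen_freq_hz - grid_freq_hz
    return (360.0 * slip_hz * seconds) % 360.0

def seconds_to_reach(angle_deg: float, gen_freq_hz: float, grid_freq_hz: float) -> float:
    """Time for a free-running generator to drift angle_deg out of phase with the grid."""
    slip_hz = gen_freq_hz - grid_freq_hz
    if slip_hz <= 0:
        raise ValueError("this sketch assumes the generator runs faster than the grid")
    return angle_deg / (360.0 * slip_hz)

# Hypothetical example: once disconnected, the generator speeds up to 60.5 Hz.
print(round(seconds_to_reach(120.0, 60.5, 60.0), 3))  # 0.667
print(phase_drift_deg(60.5, 60.0, 1.0))               # 180.0
```

Even a half-hertz slip puts the generator a third of a cycle out of phase in well under a second, which is why an attacker who can control the breaker has plenty of opportunities to reconnect at a damaging angle.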

The attack can be executed either by purely cyber means (in which case the attacker could be located remotely) or by a combination of physical and cyber means (in which case someone needs to be onsite to perform certain required physical actions, even if there is also a remote cyber attacker).

Of course, the INL test used purely cyber means. However, as this excellent article by Schweitzer Engineering Laboratories (SEL)[iii] describes, a number of conditions must be met before a purely cyber attack can succeed. For example, several protection measures that would normally be expected in the relay were missing during the test (one, called “synchronism check on the tie breaker”, had previously been in place on the relay but was disabled before the test).

In addition, as described on page 3 of the article, the test attack would only have succeeded in the real world if several obvious security lapses had occurred. For example, the data in a communications channel had to be left unencrypted (an unlikely occurrence today, although probably more likely in 2007), and the attackers had to breach that channel. The attackers also needed to know one or two passwords controlling access to the protective relay settings. Finally, in real life the relay would have notified the SCADA operator, who can “see” all the relays, of the change in access privileges to its settings, which would presumably have led to discovery of the attack.[iv]
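The argument here is that a purely cyber Aurora attack is a chain: it works only if every one of these lapses occurs at once, so any single control that holds breaks the chain. A minimal sketch of that AND logic follows; the condition names paraphrase the paragraph above, and the True/False values are made up for illustration.

```python
# Hypothetical scenario: names paraphrase the lapses described above; values are invented.
preconditions = {
    "communications channel left unencrypted": False,
    "attacker has breached the channel": True,
    "attacker knows the relay setting password(s)": True,
    "settings-access alert to SCADA goes unnoticed": True,
}

def purely_cyber_attack_succeeds(conditions: dict[str, bool]) -> bool:
    """The attack needs every precondition to hold; one failed precondition blocks it."""
    return all(conditions.values())

# List the controls that held, i.e. the preconditions the attacker failed to achieve.
blocking = [name for name, met in preconditions.items() if not met]
print(purely_cyber_attack_succeeds(preconditions))  # False
print(blocking)  # ['communications channel left unencrypted']
```

This is the basic idea behind layered defenses: the defender only needs one layer to hold, while the attacker needs all of them to fail.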

In other words, the likelihood that a purely cyber attack based on Aurora would succeed in a real-world situation is small, especially today, 18 years after the test. This is partly because of the publicity that resulted from the test: cybersecurity practices in the power industry are much stronger than they were then. The NERC CIP cybersecurity standards only began to take effect in 2009; the voluntary NERC standard in effect at the time of the test, called Urgent Action 1200, didn’t apply to generation. Thus, the fact that the test was run and widely publicized, even though it was flawed, undoubtedly resulted in increased grid security.

It’s possible that a physical Aurora attack (which would have to be conducted by someone positioned at the “tie breaker” in the switching yard) would have a better chance of succeeding, but that obviously requires the attackers to get into the switching yard. Switching yards and generating plants are usually under heavy security, although probably not if the generator is a small one like the 2.25-megawatt diesel generator used in the test at INL. Of course, a successful attack on a 2.25 MW generator is unlikely to cause much disturbance in the power grid. Unless the attackers have managed to bribe an employee of the company that operates the generator to let one of them accompany the employee into the yard, it’s very unlikely they could ever be in a position to carry out the physical attack.[v]

Thus, I can safely say that nobody needs to stay up late at night worrying that their lights won’t work the next morning because of an Aurora attack on the generating facility that powers their neighborhood. In fact, an attack on any single generator, even one in a huge plant like the Grand Coulee Dam (the largest power source in North America), is unlikely to lead to anything more than a local outage of a couple of hours; it certainly won’t cause a cascading outage like the 2003 Northeast blackout. This is because a great deal of redundancy is built into the grid, so that no single generation failure (or even two or three simultaneous failures) can have a serious impact, or even any impact at all.
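The redundancy argument resembles what grid planners call contingency analysis: the system should keep serving load after losing its largest elements (the “N-1” criterion, with stricter variants covering multiple simultaneous losses). The toy check below only compares remaining generating capacity with load; it ignores transmission limits, ramp rates, and stability, which real planning studies must consider, and the unit sizes and load are invented for illustration.

```python
def survives_outages(unit_capacities_mw: list[float], load_mw: float, k: int) -> bool:
    """Toy capacity check: can the remaining units cover load after losing the k largest?"""
    remaining = sorted(unit_capacities_mw, reverse=True)[k:]
    return sum(remaining) >= load_mw

# Hypothetical system: made-up unit sizes (MW) and load, for illustration only.
units = [1200, 900, 800, 800, 600, 500, 400, 400, 300, 300]
load = 3000

for k in range(5):
    print(k, survives_outages(units, load, k))
# prints True for k = 0 through 3, and False for k = 4
```

Losing the largest units first is the worst case for this simple check, so a system that passes it for k = 3 survives any three simultaneous unit failures, at least in terms of raw capacity.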

So what’s an out-of-work grid attacker to do? If he wants to have a big impact, he needs to physically attack multiple strategic high-voltage transmission substations simultaneously: at least 7-8 of them, although more would be better. However, he would have to know exactly which ones to attack (I’ve heard that between 7 and 15 substations would need to be attacked to have a serious impact on the grid, but don’t expect me to publish a list of them if I ever run across one).

Moreover, the attackers would need to conduct a Metcalf-style physical attack (like the 2013 sniper attack on PG&E’s Metcalf substation in California) on each substation simultaneously, while avoiding the mistakes made by the Metcalf attackers (who have never been identified, let alone caught). Note that there’s no way to launch a purely cyber attack on a substation. The only good definition of “substation” that I know of (there is no NERC Glossary definition) is “a bunch of expensive equipment surrounded by a fence”[vi]; there’s no central piece of equipment like the generator in a generating plant, and the devices in a substation usually aren’t on a single network.

Most importantly, the attacker would first need to go back in time at least 5-6 years, to before the CIP-014 standard for physical security of substations (which was developed in response to the Metcalf attack) came into effect. That’s because CIP-014 is probably one of the most effective NERC standards ever developed.

In fact, despite initial concerns that only a couple hundred substations would be declared in scope for CIP-014, well over 1,000 were declared. (NERC says there are about 25,000 Bulk Electric System (BES) substations in all. BES substations are all either low or medium impact, and only a subset of the medium impact substations are in scope for CIP-014.) The power industry was clearly worried that the Metcalf attack was just a test run for The Big One, so it invested a lot of money in CIP-014 compliance, including measures like ballistic barriers around substations.

In short, I think it’s close to impossible for a cyber or physical attack based on Aurora, or frankly any other cyber vulnerability, to succeed in causing an outage of any size, let alone a cascading outage. There has never been an outage of any size caused by a cyberattack in North America.

If you want to worry about a grid attack that would have a huge impact, I suggest you read about EMP (electromagnetic pulse) attacks, in which a nuclear weapon is detonated high in the atmosphere (typically tens to hundreds of kilometers up) above the US. Such an attack could conceivably fry most of the large transformers on which the grid depends, which take a year or so to replace. However, an attack isn’t the only possible cause: a massive solar storm (or the explosion of a large meteor, like the one that did in the dinosaurs) could produce similar devastation.

In fact, the real cause for worry is any prolonged (say, more than two weeks) and widespread (say, over several states) outage, whatever its cause (for example, a hurricane more massive than Sandy). Such an outage could result in hundreds or thousands of deaths and a breakdown of civil order. The minuscule likelihood that it could be caused by a cyber or physical attack doesn’t mean it’s a waste of time and money to harden the grid against such attacks. After all, risk = likelihood × impact, and a small likelihood times an unimaginably large impact[vii] still yields a high risk.
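The risk formula is simple enough to show with a couple of numbers. In this minimal sketch, the likelihoods (per year) and impacts (in arbitrary units) are invented purely to show the arithmetic; they are not estimates of any real grid risk.

```python
def risk(likelihood: float, impact: float) -> float:
    """Risk as the product of likelihood and impact."""
    return likelihood * impact

# Invented numbers: likelihood per year, impact in arbitrary units.
frequent_minor = risk(likelihood=0.5, impact=10)
rare_catastrophic = risk(likelihood=0.001, impact=1_000_000)

print(f"frequent, minor event:     {frequent_minor:.1f}")
print(f"rare, catastrophic event:  {rare_catastrophic:.1f}")
```

With these made-up values, the rare, catastrophic event scores 200 times higher than the frequent, minor one, which is the point of the paragraph: a tiny likelihood does not make the risk negligible when the impact is enormous.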

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve often been told that I should either accept advertising or put up a paywall and charge a subscription fee, or both. However, I really don’t want to do either of these things. It would be great if everyone who appreciates my posts could donate a $20 (or more) “subscription fee” once a year. Will you do that today?

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] Renewable power generating devices like wind turbines and solar panels produce direct current (DC), which is converted to AC by a device called an inverter before being sent out over the power grid. The Aurora vulnerability does not apply to such devices.

Kevin Perry pointed out that wind turbines and solar panels operate differently from an electrical point of view: “Solar panels produce DC power, which has to go through an inverter to be delivered to the grid.  However, the wind turbine, which is a rotating machine, is an induction motor that generates AC power.  But, the speed of the turbine, which varies with wind speed, affects frequency, so the generated power goes through a rectifier to convert it to DC and then an inverter to convert it back to AC at a stable frequency.  It is essentially the same way an uninterruptible power supply works.”

[ii] Kevin corrected what I said in this paragraph by saying, “The generator is not sync’d based on frequency.  It is sync’d based on the phase angle of the generator versus the grid.  If I recall, and I don’t profess to be an engineer, the phase angle, also called phase shift, is the time lag between voltage and current, whereas frequency is the number of cycles per second, or how many times the current changes direction per second.  The protection relay is normally set to prevent connection to the grid unless the phase angle is within a couple degrees of zero.  The relay was deliberately misconfigured by the hack to change the connection phase angle to 120 degrees, which causes the worst torque.”

[iii] SEL is the largest manufacturer of electronic relays worldwide; the relay used in the test may have been theirs. While I have always found SEL staff members to be honest and above board, the fact that the authors were SEL employees should be kept in mind when reading the article.

[iv] After pointing out that good security practices on the protective relay would have prevented the test from succeeding, the article continues with a discussion of recommended controls for the generator that also were not in place during the test.

[v] Regarding this paragraph, Kevin stated, “As you noted, an intruder can manually cause the damage by operating the breaker panel T handle in the substation control house, not at the breaker itself.  But, it doesn’t have to be the generator switchyard.  Any station along the generator lead line offers a connect/disconnect point.”

[vi] This is my definition, in case you hadn’t guessed.

[vii] Ted Koppel published a good book about this problem, Lights Out, in 2015. It’s ostensibly about what would happen if a cyberattack caused a prolonged, widespread outage, but it’s really about what would happen whatever the cause of the outage; it’s frightening, but well researched and an easy read.

 

 
