Note from Tom 7/27: Kevin Perry, retired Chief CIP Auditor
of the SPP Regional Entity and co-leader of the NERC Standards Drafting Team that
drafted CIP versions 2 and 3, clarified my amateurish electrical engineering
musings at three different points in this post. I have created a footnote for
each of his observations.
Last Friday, the typically free-flowing meeting of a group I
lead, the OWASP SBOM Forum,
got onto the subject of grid cyberattacks. One of the members of the group posted a link in the chat to a 2020 Wired
article about the 2007 Aurora Generator Test at Idaho National Laboratory (INL).
I remember the uproar created by the Aurora test and how it
changed the popular perception of the power grid. Previously, the grid had mostly
been perceived as something that’s sturdy and stable, but not very interesting.
After the Aurora test, it was increasingly considered to be something that’s quite
interesting, but at the same time highly vulnerable to cyberattacks. People
started thinking that grid attacks were such a big threat that it was almost
inevitable that one would cause a huge outage that would put us back in the Stone
Age.
The Aurora test was dramatic; after all, who doesn’t
love seeing a large machine explode? However, the full
story is a little more complicated than that. The Aurora test fell far
short of demonstrating that the grid is highly vulnerable to cyberattacks. In
fact, the grid is far less vulnerable than almost any other part of our
critical infrastructure.
Here’s some background: As you may know, the power grid is
based on alternating current (AC). That means the voltage at any point in the
grid (say, a certain point on a power line) swings back and forth between its maximum and minimum
values a certain number of times per second. That number, called the frequency of
the grid, is approximately 60 Hertz (cycles per second) in the US and 50 Hertz in Europe.
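For readers who like to see the arithmetic, here is a minimal Python sketch of an idealized AC voltage. It assumes a perfect sine wave with a hypothetical peak of 1.0 (in arbitrary units) and a frequency fixed at exactly 60 or 50 Hz; the function names are made up for this illustration, and real grid voltage is neither perfectly sinusoidal nor perfectly constant in frequency.

```python
import math

def instantaneous_voltage(v_peak: float, freq_hz: float, t_s: float) -> float:
    """Voltage of an idealized sinusoidal AC supply at time t_s (in seconds)."""
    return v_peak * math.sin(2 * math.pi * freq_hz * t_s)

def period_ms(freq_hz: float) -> float:
    """Length of one full cycle of the waveform, in milliseconds."""
    return 1000.0 / freq_hz

for region, freq in (("US", 60.0), ("Europe", 50.0)):
    print(f"{region}: {freq:.0f} Hz, one full cycle every {period_ms(freq):.2f} ms")

# A quarter of the way through a 60 Hz cycle (1/240 s), the voltage is at its peak.
print(instantaneous_voltage(1.0, 60.0, 1 / 240))
```

At 60 Hz a full cycle lasts about 16.67 milliseconds; at 50 Hz it lasts 20.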
Generators that run on fossil fuels (coal, natural gas, oil,
etc.) produce AC power.[i]
However, a generator can’t be connected to the grid unless its frequency closely
matches that of the grid; otherwise it can be damaged. Very small
deviations are usually acceptable, but even a frequency of 59 or 61 Hertz might be
unacceptable.[ii]
This is why most generators are protected by a device called
a protective relay, which is installed between the generator and the line that
connects it to the grid. The relay senses the frequency of the generator and
compares it to the grid’s frequency (which also varies, but normally only
slightly). If the difference exceeds a predetermined value, the relay commands
a circuit breaker to open (disconnect) the line until the difference comes back
within the tolerable range. The relay and breaker are normally installed in a
switching yard outside the generating facility.
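The relay’s job can be sketched as a simple “may I close the breaker?” check. The Python below is a minimal sketch, not SEL’s or any other vendor’s actual relay logic: the Measurement class, the close_permitted function, and all three threshold values are hypothetical. Following Kevin’s correction in footnote [ii], it checks the phase angle as well as the frequency difference (often called slip), and it also compares voltage magnitudes, as synchronism-check functions commonly do.

```python
from dataclasses import dataclass

@dataclass
class Measurement:
    freq_hz: float     # electrical frequency
    phase_deg: float   # voltage phase angle relative to a common reference
    voltage_pu: float  # voltage magnitude in per unit (1.0 = nominal)

# Hypothetical settings for illustration only; real settings are chosen
# by protection engineers for each specific machine.
MAX_SLIP_HZ = 0.1
MAX_ANGLE_DEG = 10.0
MAX_VOLTAGE_DIFF_PU = 0.05

def angle_difference_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in the range 0..180 degrees."""
    diff = (a - b) % 360.0
    return min(diff, 360.0 - diff)

def close_permitted(gen: Measurement, grid: Measurement) -> bool:
    """True only if the generator is close enough to the grid to close the breaker safely."""
    slip = abs(gen.freq_hz - grid.freq_hz)
    angle = angle_difference_deg(gen.phase_deg, grid.phase_deg)
    dv = abs(gen.voltage_pu - grid.voltage_pu)
    return slip <= MAX_SLIP_HZ and angle <= MAX_ANGLE_DEG and dv <= MAX_VOLTAGE_DIFF_PU

grid = Measurement(60.00, 0.0, 1.00)
print(close_permitted(Measurement(60.02, 3.0, 1.01), grid))    # nearly in sync
print(close_permitted(Measurement(60.02, 120.0, 1.01), grid))  # 120 degrees out of phase
print(close_permitted(Measurement(61.00, 2.0, 1.00), grid))    # 1 Hz slip
```

The second call is the situation the Aurora test engineered: the frequencies are nearly identical, but the generator is 120 degrees out of phase with the grid, so a correctly configured check refuses to close. Disabling the check, or loosening its angle setting as footnote [ii] describes, removes exactly this safeguard.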
The Aurora attack starts by causing the relay to open the
line; when that happens, the generator speeds up (like a car engine revving when you
press the clutch pedal while your foot is still on the gas) and gets out of sync with the grid. Normally, the relay
would prevent reconnection until the generator and the grid were synchronized
again. However, the Aurora attack forces reconnection anyway. This applies a
huge amount of torque to the generator shaft, which causes
physical damage. The whole cycle is repeated until the generator stops working. It’s almost the equivalent of throwing a car traveling on a
highway into reverse without first coming to a stop.
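To see how quickly a free-running generator slides out of step, here is a rough Python sketch. It assumes, purely for illustration, that after the breaker opens the generator runs at a constant 0.5 Hz faster than the grid; in reality the speed keeps changing, and the names phase_drift_deg and seconds_to_reach are hypothetical. The phase angle between generator and grid grows by 360 degrees for every cycle of frequency difference, i.e. by 360 × slip degrees per second.

```python
def phase_drift_deg(slip_hz: float, seconds: float) -> float:
    """Phase angle (0..360 degrees) the generator has drifted from the grid after
    `seconds`, assuming a constant frequency difference of `slip_hz`."""
    return (360.0 * slip_hz * seconds) % 360.0

def seconds_to_reach(angle_deg: float, slip_hz: float) -> float:
    """Time for a constant slip to first open up the given phase angle."""
    return angle_deg / (360.0 * slip_hz)

SLIP_HZ = 0.5  # hypothetical: generator running 0.5 Hz fast after the breaker opens

for t in (0.25, 0.5, 2 / 3, 1.0):
    print(f"t = {t:.2f} s: {phase_drift_deg(SLIP_HZ, t):.0f} degrees out of phase")
print(f"120 degrees reached after {seconds_to_reach(120.0, SLIP_HZ):.2f} s")
```

Under that assumption the generator is already 120 degrees out of phase with the grid (the angle mentioned in footnote [ii]) about two-thirds of a second after disconnecting, which illustrates why even a brief disconnection can leave the generator badly out of phase when the breaker recloses.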
The attack can be executed either by purely cyber means (in
which case the attacker could be located remotely) or by a combination of
physical and cyber means (in which case someone needs to be onsite to perform
certain required physical actions, even if there is also a remote cyber
attacker).
Of course, the INL test used entirely cyber means. However,
as this excellent article
by Schweitzer Engineering Laboratories (SEL)[iii]
describes, a number of conditions need to be met before an attack using purely
cyber means can succeed. For example, several protection measures in the relay
were missing during the test, even though they would normally be expected to be
in place. One of them, called “synchronism check on the tie breaker,” had been in place on
the relay but was disabled before the test.
In addition, as described on page 3 of the article, the test
attack would only have succeeded in the real world if several obvious security
lapses had occurred. For example, data in a communications channel had to
be left unencrypted (unlikely today, although probably more
likely in 2007), and the attackers had to breach that channel. The
attackers also needed to know one or two passwords controlling access to the
protective relay settings. Finally, in real life the relay would have notified
the SCADA operator, who can “see” all relays, of the change in access
privileges to its settings, presumably leading to discovery of the attack.[iv]
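One way to see why stacking prerequisites makes a purely cyber attack unlikely is to multiply their probabilities. The Python below is an illustration only: the probabilities are invented, not estimates from the SEL article or anywhere else, and multiplying them assumes the conditions are independent, which real attack steps often are not.

```python
import math

# Hypothetical per-attempt probabilities for each condition described above.
# These numbers are invented for illustration only.
preconditions = {
    "synchronism check disabled": 0.05,
    "communications channel unencrypted": 0.2,
    "channel breached by attacker": 0.1,
    "relay password(s) obtained": 0.1,
    "SCADA notification missed": 0.2,
}

def joint_probability(probs: dict[str, float]) -> float:
    """Probability that every condition holds, assuming they are independent."""
    return math.prod(probs.values())

p = joint_probability(preconditions)
print(f"all conditions met: {p:.6f} (about 1 in {round(1 / p):,})")
```

Even with fairly generous odds for each step, requiring all of them at once drives the combined chance down to about 1 in 50,000 in this made-up example.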
In other words, the likelihood that a purely cyber attack
based on Aurora would succeed in a real-world situation is small, especially
today, 18 years after the test. This is partly because of the publicity that
resulted from the test: cybersecurity practices in the power industry are much
stronger than they were then. In fact, the NERC CIP cybersecurity
standards didn’t start coming into effect until 2009, and the voluntary NERC standard
in effect at the time of the test, called Urgent Action 1200, didn’t
apply to generation. Thus, the fact that the test was run and was widely
publicized, even though it was flawed, undoubtedly resulted in increased grid
security.
It’s possible that a physical Aurora attack (which would
have to be conducted by someone positioned at the “tie breaker” in the
switching yard) might have a better chance of succeeding, but that obviously requires
the attackers to get into the switching yard. Switching yards and generating plants
are usually under heavy security, although probably not if the generator is a
small one like the 2.25-megawatt diesel generator used in the test at INL. Of
course, a successful attack on a 2.25 MW generator is unlikely to cause much
disturbance in the power grid. Unless the attackers have managed to bribe an
employee of the company that operates the generator to let one
of them accompany the employee into the yard, it’s very unlikely they could
ever be in a position to carry out the physical attack.[v]
Thus, I can safely say that nobody needs to stay up late at
night worrying that the next morning their lights won’t work due to an Aurora
attack on the generation facility that powers their neighborhood. In fact, any
attack on a single generator, even a generator in a huge plant like the Grand
Coulee Dam, the largest power source in North America, is unlikely to lead to
anything more than a local outage lasting a couple of hours; it certainly won’t
cause a cascading outage like the 2003 Northeast blackout. This is because
the grid has all sorts of redundancy built in, so that no single
generation failure, or even two or three simultaneous failures, can have a serious
impact, or even any impact at all.
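A toy Python sketch of the “lose the biggest units and see what’s left” idea follows. It is deliberately crude: the unit sizes and the 3,500 MW load are hypothetical, and the function only compares generating capacity with demand, ignoring transmission limits, ramp rates, and frequency response, all of which real reliability planning studies model in detail.

```python
# Hypothetical generating units in an imaginary region, in megawatts.
UNITS_MW = [1200.0, 1100.0, 900.0, 800.0, 800.0, 600.0, 500.0, 400.0, 300.0, 300.0]
LOAD_MW = 3500.0  # hypothetical demand

def survives_loss_of(units_mw: list[float], load_mw: float, k: int) -> bool:
    """True if the remaining capacity still covers the load after the k largest
    units trip. Capacity-only check: no transmission or dynamics are modeled."""
    if k <= 0:
        remaining = sum(units_mw)
    else:
        remaining = sum(sorted(units_mw)[:-k])  # drop the k largest units
    return remaining >= load_mw

for k in range(5):
    print(f"lose largest {k} unit(s): load still served = {survives_loss_of(UNITS_MW, LOAD_MW, k)}")
```

In this invented region, losing the largest one, two, or even three units still leaves enough capacity to serve the load; only a fourth simultaneous loss leaves a shortfall.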
So what’s an out-of-work grid attacker to do? If he wants to
have a big impact, he needs to physically attack multiple strategic high-voltage
transmission substations simultaneously: at least 7 or 8 of them, although
more would be better. However, the attacker would have to know exactly which
ones to attack. (I’ve heard that between 7 and 15 substations would
need to be attacked to have a serious impact on the grid. But don’t expect
me to publish a list of them, if I ever run across one.)
Moreover, the attackers would need to simultaneously conduct
a Metcalf-style physical
attack on each substation, while avoiding the mistakes that the
Metcalf attackers (who have never been identified, let alone caught) made. In
fact, the only good definition of “substation” that I know of (there is
no NERC Glossary definition) is “a bunch of expensive equipment surrounded by a
fence.”[vi]
That definition also shows there’s no way to launch a purely cyber attack on a substation:
there’s no central piece of equipment like the generator in a generating
plant, and the devices in a substation usually aren’t on a single network.
Most importantly, the attacker would first need to go back
in time at least 5-6 years, to before the CIP-014 standard for physical security
of substations (which was developed in response to the Metcalf attack) came into
effect. That’s because CIP-014 is probably one of the most effective NERC
standards ever developed.
In fact, although there were initial concerns that only a couple
hundred substations would be declared in scope for CIP-014, well over 1,000
were declared. (NERC says there are about 25,000 Bulk Electric System (BES)
substations in all. BES substations are all either low or medium impact, and only
a subset of the medium impact substations are in scope for CIP-014.) The power industry was
clearly worried that the Metcalf attack was just a test run for The Big One, so
it invested a lot of money in CIP-014 compliance, including measures like
ballistic barriers around substations.
In short, I think it’s close to impossible for a cyber or
physical attack based on Aurora, or frankly any other cyber vulnerability, to
succeed in causing an outage of any size, let alone a cascading outage. There
has never been an outage of any size caused by a cyberattack in North America.
If you want to worry about a grid attack that would have a
huge impact, I suggest you read about EMP attacks,
in which a nuclear weapon is detonated high in the atmosphere, tens or even hundreds of miles above the US. Such an
attack could conceivably fry most of the large transformers on which the grid
depends, and which take a year or so to replace. A massive solar
storm (or the impact of a large asteroid, like the one that did in the
dinosaurs) could produce similar devastation.
In fact, the real cause for worry is any prolonged (say, more
than two weeks) and widespread (say, over several states) outage, no matter
what its cause (for example, a hurricane more massive than Sandy). This could result in
hundreds or thousands of deaths and a breakdown of civil order. The fact
that there is such a minuscule likelihood that this could be caused by a cyber
or physical attack doesn’t mean it’s a waste of time and money to harden the
grid against such attacks. After all, risk = likelihood × impact. A small
likelihood times an unimaginably large impact[vii]
still yields a high risk.
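The arithmetic behind risk = likelihood × impact is simple enough to show in a few lines of Python. The two scenarios and their dollar figures are hypothetical, picked only to show how a very unlikely but enormous event can outweigh a likely but modest one; they are not estimates of real grid risk.

```python
def risk(likelihood: float, impact: float) -> float:
    """Expected loss: probability of the event (per year) times its cost."""
    return likelihood * impact

# Hypothetical figures, chosen only to illustrate the arithmetic.
routine = risk(likelihood=0.10, impact=1_000_000)                # 10% chance of a $1M event
catastrophe = risk(likelihood=0.001, impact=1_000_000_000_000)   # 0.1% chance of a $1T event

print(f"routine event:     ${routine:,.0f} expected per year")
print(f"catastrophic event: ${catastrophe:,.0f} expected per year")
```

In this made-up comparison the one-in-a-thousand catastrophe carries an expected cost of $1 billion a year, ten thousand times the $100,000 of the routine event.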
My blog is more popular than
ever, but I need more than popularity to keep it going. I’ve often been told
that I should either accept advertising or put up a paywall and charge a
subscription fee, or both. However, I really don’t want to do either of these
things. It would be great if everyone who appreciates my posts could donate a $20 (or more) “subscription fee” once a year. Will you
do that today?
If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
[i] Renewable
power generating devices like wind turbines and solar panels produce direct
current (DC). The DC is converted to AC before being sent out over the
power grid; the device that performs this conversion is an inverter. The Aurora
vulnerability does not apply to such devices.
Kevin Perry pointed out that wind turbines and solar panels
operate differently from an electrical point of view: “Solar panels produce DC
power, which has to go through an inverter to be delivered to the grid.
However, the wind turbine, which is a rotating machine, is an induction
motor that generates AC power. But, the speed of the turbine, which
varies with wind speed, affects frequency, so the generated power goes through
a rectifier to convert it to DC and then an inverter to convert it back to AC
at a stable frequency. It is essentially the same way an uninterruptible
power supply works.”
[ii]
Kevin corrected what I said in this paragraph by saying, “The generator is not
sync’d based on frequency. It is sync’d based on the phase angle of the
generator versus the grid. If I recall, and I don’t profess to be an
engineer, the phase angle, also called phase shift, is the time lag between
voltage and current, whereas frequency is the number of cycles per second,
or how many times the current changes direction per second. The
protection relay is normally set to prevent connection to the grid unless the
phase angle is within a couple degrees of zero. The relay was
deliberately misconfigured by the hack to change the connection phase angle to
120 degrees, which causes the worst torque.”
[iii]
SEL is the largest manufacturer of electronic relays worldwide; the relay used
in the test may have been theirs. While I have always found SEL staff members
to be honest and above board, the fact that the authors were SEL employees should
be kept in mind when reading the article.
[iv] After
pointing out that good security practices on the protective relay would have
prevented the test from succeeding, the article continues with a discussion of recommended
controls for the generator that also were not in place during the test.
[v] Regarding
this paragraph, Kevin stated, “As you noted, an intruder can manually cause the
damage by operating the breaker panel T handle in the substation control house,
not at the breaker itself. But, it doesn’t have to be the generator
switchyard. Any station along the generator lead line offers a
connect/disconnect point.”
[vi]
This is my definition, in case you hadn’t guessed.
[vii] Ted Koppel put out a good book about
this problem, Lights Out, in 2016. It's supposedly about what would happen if a
cyberattack caused a prolonged, widespread outage, but it's really about what
would happen no matter what the cause of the outage; it’s frightening, but well
researched and an easy read.