August 14 is the 20th anniversary of the 2003 Northeast blackout. This event had a profound impact on the world we live in today in many different ways; one of them was mandatory regulation of the power industry. Here is my summary of what happened:
·
The Wikipedia article states,
“The blackout's proximate cause was a software
bug in the alarm system at the control room of FirstEnergy.”
Frankly, this is a stupid statement. It’s kind of like saying that the cause of
World War II was the German invasion of Poland. There’s a big difference between
a triggering event and a cause. This bug was a triggering event.
·
The causes were multiple:
An important one was that compliance with the NERC reliability
standards was voluntary at the time. FirstEnergy, as well as probably other
utilities, had violated the NERC requirements for trimming trees under high-voltage
transmission lines. Since heavy loading causes lines to heat up and sag, and
load levels were high on that hot day, the lines sagged into treetops,
which caused them to be tripped by their protective relays. When one line shorted
out, its load was automatically redistributed to other lines, which themselves
began to sag, encountered trees, and tripped. And so on.
·
Finally, when the last
major line into the area tripped, northern Ohio became a virtual black hole for
electric power, trying to suck in as much power as possible from all its
neighbors. In turn, they tried to get all the power they could from their
neighbors – and voilà, the cascading outage began. Within six minutes
after the last line failed, the event was over. Much of the Northeastern US
(excluding New England), Detroit and eastern Michigan, and most of Ontario up to
Hudson Bay had blacked out. 508 generating units at 265 power plants shut down
during those six minutes.
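The cascade mechanism above can be sketched with a toy model – purely illustrative, since real power flow follows Kirchhoff's laws rather than even redistribution, and the line names, loads, and capacities here are invented:

```python
# Toy cascading-failure sketch. When a line exceeds its capacity, its
# protective relay trips it and its load spreads to the surviving lines,
# which may then overload in turn. All names and numbers are hypothetical.

def cascade(lines):
    """lines: dict of name -> {"load": MW, "cap": MW}.
    Trip overloaded lines, redistribute their load evenly over the
    survivors, and repeat until no line is overloaded. Returns the
    list of tripped lines in order."""
    tripped = []
    while True:
        over = [n for n, l in lines.items() if l["load"] > l["cap"]]
        if not over:
            return tripped
        for n in over:
            shed = lines.pop(n)["load"]  # relay trips the line
            tripped.append(n)
            if lines:  # survivors absorb the lost line's load
                share = shed / len(lines)
                for l in lines.values():
                    l["load"] += share

grid = {
    "Line A": {"load": 110, "cap": 100},  # sags into a tree and trips
    "Line B": {"load": 80,  "cap": 100},
    "Line C": {"load": 85,  "cap": 100},
}
print(cascade(grid))  # → ['Line A', 'Line B', 'Line C']
```

One initial trip is enough to overload both remaining lines, so all three go down – the same positive-feedback dynamic that took out 508 units in six minutes.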
·
But lack of tree
trimming wasn’t the only cause. What made the blackout much worse than it had
to be was that the protective relays protecting many generating units from grid
instability were (in retrospect) set to trigger at a much lower level of
instability than necessary. NERC’s monumental six-volume study
of the blackout (which I skimmed through once, not that I could have understood
most of it anyway) pointed out that (again in retrospect) many of those units would
not have had to shut down if the settings had been more forgiving.
·
The resulting total
blackout required activation of “blackstart plans” in many areas. Since most
generating units require some amount of external power to start up, the
blackstart plans lay out a complicated path for using the small amount of
power still available (e.g., power from hydroelectric plants or from
generating units that have diesel-powered emergency generators) to
energize lines one by one, then more generating units, then more lines, etc. –
until the generation in the area covered by the blackstart plan is fully
operational.
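The alternating restore-a-line, restart-a-unit sequence described above can be sketched as a dependency walk. This is a hedged simplification – the unit names and dependency map are invented, and real blackstart plans involve frequency control, synchronization, and much more:

```python
# Sketch of blackstart sequencing: begin with self-starting units
# (hydro or diesel-backed), then repeatedly bring online anything whose
# station power can now be fed by something already live.
# All names and dependencies below are hypothetical.

def blackstart(self_starters, fed_by):
    """self_starters: set of units that can start with no external power.
    fed_by: dict mapping each dead unit or line to the units/lines that
    can supply its startup power. Returns a feasible restoration order."""
    live = set(self_starters)
    order = list(self_starters)
    progress = True
    while progress:
        progress = False
        for element, feeders in fed_by.items():
            if element not in live and live & set(feeders):
                live.add(element)   # startup power is available: energize it
                order.append(element)
                progress = True
    return order

# A hydro plant energizes a line, which lets a thermal plant restart,
# which in turn can energize the next line.
print(blackstart(
    {"hydro_1"},
    {"line_A": ["hydro_1"], "coal_1": ["line_A"], "line_B": ["coal_1"]},
))  # → ['hydro_1', 'line_A', 'coal_1', 'line_B']
```

The design point is the same as in a real plan: restoration is inherently sequential, because each step only becomes possible once the previous step has put power where it is needed.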
·
However, all of this
could have been avoided if there had been better visibility into what was going
on in Ohio. The software bug in the FirstEnergy control center caused the
alarms – which would otherwise have been flashing deep red – to be suppressed.
The operators in the control center saw nothing but green screens and were
happy as clams…until they weren’t. But even that might not have mattered, had a
technician at what was called at the time the Midwest Independent Transmission
System Operator (now the Midcontinent ISO) not left for lunch after fixing a
problem with the “state estimator” system.
·
That system would have
detected the growing divergence between electricity load and supply in northern
Ohio (exacerbated by a couple of major plant outages in Cleveland) had it been
active. However, the technician forgot to turn the system back on before he left.
When it was finally turned back on, it immediately became clear that northern Ohio
was going to hell in a handbasket very quickly.
·
On the other hand, it’s
not clear that the human beings monitoring these systems would have reacted
correctly if they knew how bad things were. To end the imbalance between load
and supply, they probably would have had to black out the entire city of
Cleveland and the surrounding area. Would they have been authorized to do this
and if not, would they have been able to get that authorization quickly enough
to do any good, given how rapidly the imbalance between load and supply was
increasing? Of course, Cleveland and many other cities ended up being blacked
out, so it clearly would have been better if the system operators had taken
this step.
Of course, all these causes (and
others besides them) were correctable and since have been addressed with policies,
standards, etc. But that required something else: mandatory standards for
utilities to follow. That’s what Section
215 of the Energy Policy Act (EPAct) of 2005 provided. It required the Federal
Energy Regulatory Commission (FERC) to enforce mandatory reliability standards
for the electric power industry.
To facilitate this, Section 215
ordered FERC to engage an “electric reliability organization” (ERO) to draft mandatory
reliability standards as well as audit compliance with them, all under FERC’s
oversight. Of course, there wasn’t much question that NERC was the only
organization that could possibly be the ERO, and FERC chose them.
Section 215 also ordered that
mandatory cybersecurity standards be developed for the power sector. At that
time, the main NERC cybersecurity effort was a set of voluntary requirements
called Urgent Action 1200, which applied just to large control centers. NERC
was working on an update called Urgent Action 1300, but in 2006 NERC rolled
that effort into development of version 1 of the CIP standards.
So the CIP standards, which were
(I believe) the first cybersecurity standards that applied to Operational
Technology (OT), outside of perhaps military standards, were very much a
consequence of the Northeast blackout. While I have certainly expressed my
disagreement with many aspects of CIP over the years (you might go back and
read the at least 400 posts I’ve written on CIP, starting with my first post on
this blog in early 2013), and while I continue to believe that compliance with
the CIP standards is much more expensive than it should be, there’s no question
in my mind that the standards have done a great job of securing the North
American power grid against cyberattacks.
Moreover, I think all the NERC
reliability standards (which cover all aspects of operating the power grid,
including – thanks to the blackout – relay settings) have proven very important,
not just the CIP standards. While local outages happen all the time (with
squirrels being one of the main causes), there has not been a cascading outage
since 2003, or even just a large outage that wasn’t caused by a natural
disaster like a hurricane. The dedicated people of NERC (and the six NERC
Regional Entities that explain and audit the standards) deserve all our thanks.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would
like to comment on what you have read here, I would love to hear from you.
Please email me at tom@tomalrich.com.