Last Monday, I wrote a post
about the comments that Kevin Perry, former Chief CIP Auditor of SPP Regional
Entity, made on this
post, which discussed why it actually made sense for Colonial Pipeline to
shut down their operations, due to a loss of their billing system in a
ransomware attack.
The whole point of that post, as well as the previous posts
I’d written on Colonial (which, by the way, you can find by searching on the
main page of my blog; until last summer there was no way to search the blog,
which made it hard for me to find previous references to a subject and close
to impossible for readers to do so. I was quite glad when search was added) –
starting with this one – was that a “purely IT” incident can affect the OT
network, even if there’s no direct connection between the two networks.
In the case of Colonial, the loss of their billing system
meant that they couldn’t track who put how much gasoline into their pipeline
and when, and who withdrew how much and when. For an electric utility, the loss
of this capability wouldn’t require shutting down power transmission and
distribution, since the utility can always bill for power used later (i.e. the
meters will keep operating); and if the utility can’t bill later for some
reason, they still need to provide power, because they’re…well, a utility.
But Colonial doesn’t own the gasoline in their pipeline;
they’re transporting it, just as a mover transports your household goods to a
new city. If the mover loses your goods
on the way, they’re on the hook for the entire value of those goods. By the
same token, if Colonial keeps shipping gasoline while their billing system is
down, they’ll literally lose track of what any one shipper has put into the pipeline,
and will end up owing every shipper the entire value of their gasoline.
In last Monday’s post, I started by saying there were three
questions that needed to be answered:
1. How can we identify systems that don’t directly
control operations, yet can have a huge impact on operations just the same (i.e.,
IT systems that perform functions required for operations)? And when we’ve
identified them, what measures can we take to protect them better than other
systems on the IT network that clearly have no direct operational impact, like
say the systems that run the utility’s retirement plan?
2. Should those systems be regulated by OT-focused
cybersecurity compliance regimes, such as the dreaded…(and here I cross myself,
despite not being Catholic)…NERC CIP?
3. Or maybe we need to go beyond all this talk
about regulation and protecting systems, and think about what the real problem
might be?
To summarize what I think Kevin said in that post, he answered
the first question by in effect saying, “Any system on the IT network whose
loss or misuse can impact operations, like Colonial’s billing system, should
be protected like OT systems are, including being isolated from other IT
systems.”
Kevin answered the second question by in effect saying, “Any
system whose loss or misuse can affect Bulk Electric System operations within
15 minutes (essentially, the BES Cyber Asset definition) should be classified
as a BES Cyber System (BCS) and placed within the Electronic Security Perimeter
(if the asset at which it’s installed is classified as Medium or High impact).”
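Kevin’s rule can be sketched as a simple decision function. To be clear, this is a toy illustration, not NERC’s official classification logic; the field names, impact ratings, and return strings are my own assumptions, loosely based on the BES Cyber Asset definition paraphrased above.

```python
from dataclasses import dataclass

@dataclass
class System:
    name: str
    impacts_bes_within_15_min: bool  # loss/misuse affects BES operations within 15 minutes
    asset_impact_rating: str         # impact rating of the asset where it's installed:
                                     # "High", "Medium", or "Low"

def classify(system: System) -> str:
    """Toy sketch of the classification rule Kevin describes: a system whose
    loss or misuse can affect BES operations within 15 minutes is a BES Cyber
    System, and at High or Medium impact assets it belongs inside the
    Electronic Security Perimeter (ESP)."""
    if not system.impacts_bes_within_15_min:
        return "IT network (general cybersecurity practices apply)"
    if system.asset_impact_rating in ("High", "Medium"):
        return "BES Cyber System: install inside the ESP"
    return "BES Cyber System at a Low impact asset (no ESP required)"

# The mistake Kevin saw: a historian used for real-time monitoring was treated
# as an ordinary IT system and left outside the ESP.
historian = System("historian", impacts_bes_within_15_min=True,
                   asset_impact_rating="Medium")
print(classify(historian))  # BES Cyber System: install inside the ESP
```

The point of the sketch is that the classification turns on the 15-minute impact test, not on which network the system happens to sit on today.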
An example he gave was a mistake he saw more than
once in his ten-year NERC CIP auditing career: a NERC entity didn’t classify
its historian as a BCS and installed it on the IT network, not in the ESP.
However, in the cases Kevin described, the historian was used for real-time
monitoring purposes, and therefore should have been classified as a BCS – and
installed in the ESP to begin with.
This is stretching what Kevin said a little, but one might
draw the implication that, if a system’s loss or misuse doesn’t directly impact
the process being controlled (for an electric utility subject to the NERC CIP
standards, the smooth and uninterrupted operation of the BES; for Colonial
Pipeline, the smooth and uninterrupted transport of refined products like
gasoline through their pipeline system), then a) it’s OK to install
it on the IT network, and b) it doesn’t need to be subject to special
regulation, beyond a general obligation to follow good cybersecurity practices.
However, there are two cases I can identify in which the
shutdown of the IT network directly required shutting down OT, even though
there were no systems on the IT network that directly impacted the process
being controlled by OT. One case is from
2018, when a serious ransomware attack on a very large electric utility’s
IT network required shutting down the control centers as well – even though the
ransomware never spread there.
The other case was cited by Tim Conway of SANS in a webinar
earlier this year (which was quoted in Utility Dive). In 2017, the NotPetya
malware brought down the entire global operations of the Danish shipping giant
Maersk. NotPetya was based on the Petya ransomware, except that NotPetya
didn’t even bother to save the encryption key after encrypting the victim’s
systems – it simply threw the key away. The purpose of NotPetya was to cause
havoc, pure and simple. And it did: about $10 billion worth of havoc – for
which, naturally, Russia has never been held accountable. Do you notice a pattern
here?
Tim pointed out in the webinar (reported in this
post) that no operational systems like cranes were affected by the attack on
Maersk’s IT network. However, because of the loss of its IT systems, Maersk no
longer knew what was in the containers it was shipping – meaning it really
couldn’t guarantee that a container shipped to Company A was actually picked up
by the correct recipient, rather than somebody else. This is very close to the
situation that Colonial Pipeline faced when they lost their billing system. In
both cases, the company shut down operations (although in the case of Maersk, operations
were down for two weeks, vs less than a week for Colonial. On the other hand,
given the devastation that Maersk suffered, the fact that it only took them two
weeks to get up and running again isn’t much short of a miracle).
In other words, these two cases show us that the security of
the IT network can be essential to the correct operation of the OT network, and
– at least in the case of a complete loss of the IT network, as happened with
Maersk and the utility in the 2018 incident – some IT incidents can require
shutting OT down, even when there’s no particular system on the IT network
whose loss requires the OT shutdown (as was the case with Colonial).
So we’re fooling ourselves if we think our OT network
is protected from all disturbances on the IT network just because we may have made
it impossible for an attacker to penetrate the OT network from IT – just as
the French were fooling themselves when they built the Maginot Line after World
War I to prevent another German invasion, even though there was no way the
Germans could have crossed the Line itself to enter France (in 1940, they
simply went around it). And this is just as true of Electronic Security
Perimeters. True, CIP-005 R1 and R2 provide formidable protections against an
“invasion” that comes through the IT network. But they don’t protect against
all compromises, especially ones whose effects reach OT without ever crossing
the ESP, as in the 2018 ransomware case.
So is the solution to apply the full NERC CIP requirements
to IT systems, as well as OT systems? God forbid! I wouldn’t wish the current
NERC CIP requirements – in all their prescriptive glory – on my worst enemy.
However, if and when the NERC CIP standards are rewritten as risk-based, and when
there are important changes made to NERC’s CIP compliance regime (as I
discussed in this webinar
in 2019), then it will be possible to regulate both IT and OT systems, but in
different ways, commensurate with the risks posed by both types of systems.
To go back to my three original questions, Kevin and I answered
the first two. But what about the third? That is, instead of just talking about
regulating and protecting IT vs OT systems, maybe we need to think beyond that
silo? What’s the real problem we need to address?
Fortunately, there’s someone who thinks about what the real
problems are: Tim Roxey, who has appeared in this blog before.
He replied to the same post that Kevin did, saying (in the inimitable English
dialect known as Roxey-speak):
I was in Whole Foods a couple of weeks ago. Heavy storms moving in but I was in
underground parking.
I’m pushing about my cart when an announcement comes over the speakers. Please
all shoppers stop shopping. We have lost our cash registers due to lightning in
the area.
Me thinks. I have cash. I’m good.
Me thinks wrongly. Somehow the Point Of Sale device can’t process the sales in
cash cuz the credit side is down.
Harumph.
No, it was the people and a branch point in their processing that broke.
We are so dependent on our “usual processes” that we fail to see the
alternatives.
Colonial failed as well.
If you are CIKR then this is Wrong. Be CIKR AND operate as such.
This was of course quite interesting, but it wasn’t…how can
I say this?…definitive. So I wrote back to Tim and asked him two
questions: “Do you think some sort of regulation of these systems is necessary?
Or are you saying that changing the utility’s (or pipeline company’s) whole modus
operandi is required to fix these problems?”
Tim replied:
Actually if we look at this differently, we see
opportunity.
Apply regulations that address People, Processes, and
technology. Stop concerning ourselves with IT/OT as the technology of
applicability. If you can have the People pull the plug because their
Processes (Recovery) or Technology (IT bleeding into OT) has led to a condition
of uncertainty (The function of CEO is RISK) then the regulations were not so
much fantastic.
The regs in Colonial Pipeline simply do not exist. Their
Issue was IT not OT and hence most NERC Regs would not apply even if they
existed in TSA world.
Requiring Baseline Regulations that hit all three factors:
- the People that operate inside
- Processes that control CI Functions that employ
- Technology to perform the Critical Infrastructure
functions (National Security Functions)
Good Regulations address all three.
Bottom line – Regulations tend towards baselines. Centers of
excellence (Think INPO) tend towards Ceilings of excellent performance (best
practices). Ceilings tend to include a better, more mature understanding of
Risk. Not just the usual Vulnerabilities, Threats and Consequences stuff but
also internal risks of how the People and Processes Parts and Technology parts
interact. The People being unduly influenced by their knowledge of the
processes (or lack thereof) and the misunderstandings of the technology (IT
really can touch OT) leads to enough uncertainty that conservative calls to pay
Ransom are made.
As with all oracular statements (i.e. statements that a true
oracle
makes. And no, that’s not Larry Ellison), these are subject to many
interpretations. I’ve reproduced Tim’s exact words (with a couple minor grammar
corrections), so that each of us can draw our own interpretation from them.
Here’s mine:
· You’re missing the boat if you focus all of your attention on the question of
IT vs. OT. That’s not the issue.
· The real issue – for both cyber regulations and best practices – is people,
processes, and technologies. Get those right, and you won’t have to worry about
IT vs. OT.
· Don’t just pay attention to PPT in three silos, but look at how people,
processes and technologies actually interact – as in the case of Whole Foods,
where a needless dependence of cash payment systems on credit card payment
systems made it impossible for this Whole Foods store to sell anything at all.
· And just as important, make sure that people understand how the processes and
technologies actually work, since for example a belief that OT exists safe
behind its Maginot Line defenses can lead to a pretty rude awakening, just like
in France in 1940.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. Nor
are they shared by the National Telecommunications and Information Administration’s
Software Component Transparency Initiative, for which I volunteer as co-leader
of the Energy SBOM Proof of Concept. If you would
like to comment on what you have read here, I would love to hear from you.
Please email me at tom@tomalrich.com.