Lew Folkerth of the RF Region and Chris Holmquest of SERC wrote a great article recently titled “The emerging risk of not using cloud services” (my emphasis); I linked to it in this post. It eloquently made the point that, far from the cloud being too risky for NERC entities to use, the risk of not using the cloud is currently greater – and growing all the time. This is because more and more software and service providers (especially security service providers) are announcing that in the coming years they will offer only a cloud-based version of their product, or at least will no longer implement new features in their on-premises version. Both reliability and security will suffer as a result.
One sentence in the article stood out for me: “New
Reliability Standards will be required, and those standards will need to be
risk-based.” The article goes on to say, “There are too many variables in cloud
environments to be able to write prescriptive standards for these cases.”
Of course, I completely agree with this statement, but how
will this work in practice? Fortunately, these won’t be the first risk-based
standards. The honor of being the first entirely risk-based standard goes to
CIP-013, which came into effect in 2020. If any of you were reading this blog
regularly at that time, you may remember I was a huge fan of the fact that
CIP-013 is a risk-based standard; I was sure it would be very successful and
make everyone in NERC (both NERC entities and NERC auditors) love the idea of
risk-based standards. But it didn’t quite work out like I’d hoped. Here’s my
story:
I had been writing about CIP-013 from the beginning,
when FERC issued Order 829
in July 2016. I participated in, and wrote about, the drafting team’s efforts
to develop the standard. And I celebrated in 2018 when FERC approved
CIP-013. Plus, I was the first person (that I know of) who advocated for pushing back the compliance date for CIP-013 when it became clear in March 2020 that Covid-19 was going to require the electric power industry – as well as almost every other industry – to drastically alter how it conducted its business.
Not coincidentally, I worked with some NERC entities to understand
what CIP-013 compliance means. In the process of doing that, I developed what seemed
to be a clear interpretation of what CIP-013 required (I wrote about this
interpretation in a number of posts in 2017-2020). Here is my summary of the
requirements of CIP-013 (it doesn’t matter whether you’re looking at version 1
or 2 of the standard; the only important difference between the two is greater
applicability in v2):
1. CIP-013-2 R1 requires the NERC entity to “…develop one or more documented supply chain cyber security risk management plan(s) for high and medium impact BES Cyber Systems…” R1 goes on to list, in R1.1 and 1.2, items that must be included in the plan.
2. R1.1 requires that the plan “…identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services resulting from: (i) procuring and installing vendor equipment and software; and (ii) transitions from one vendor(s) to another vendor(s).”
3. R1.2 lists six risks (or more specifically, mitigations for risks) that must be included in the plan. These six risks had been cited by FERC at different places within Order 829 as items that needed to be included in the plan. The Standards Drafting Team that developed CIP-013 gathered them all into one Requirement Part: R1.2.
4. R2 requires the NERC entity to implement the plan they’ve developed in R1.
5. R3 requires the NERC entity to review and update the plan every 15 months.
That’s it. The entire Standard
(minus the Measures) fits on a single page. When it was developed, I marveled
at the sheer simplicity of CIP-013.
Of course, the heart of CIP-013 is
R1, since that describes the plan that’s required. I interpreted R1 (and still
do) to mean that the NERC entity must do the following:
1. Identify supply chain cybersecurity risks to the Bulk Electric
System resulting from the BES Cyber Systems the entity may procure. Where
should the NERC entity look to find these risks? Of course, there are lots of lists; the NATF Criteria are one that is especially relevant to the BES. But the entity doesn’t have to confine itself to a preconceived list. One way to identify risks is simply to read the news.
Here’s an example of that: Right after CIP-013 came into effect in 2020, the SolarWinds attack was discovered. By that point, the attackers had been present in SolarWinds’ development network for 15 months. During that time, they carefully prepared and tested their malware before infecting seven releases of the Orion platform. They even started with a proof of concept of their malware design – a benign piece of code – to see whether it could infect the platform; that test succeeded. I fully expect the attackers to publish a case study of this engineering triumph someday.
Surely, an insecure development network at a software supplier is a big risk. One mitigation for that risk would be requiring your software suppliers to fill out the Attestation Form that CISA recently released for compliance with Executive Order 14028.
2. Rate each risk as high or low, based on its likelihood and
impact. In this case, estimating impact is easy: BES Cyber Systems are classified
as such precisely because the impact of their loss, compromise, etc. is high.
This means the impact of any supply chain attack on BCS will always be high. Therefore,
the only real variable is likelihood. To rate each risk, the entity must simply
ask, “Is the likelihood high or low that this risk will be realized?” If
likelihood is high, risk is high; if it’s low, risk is low as well.
Fortunately, if you just divide
likelihood into high and low levels, estimating it is easy. For example, someone
may point out that, if even a small meteorite crashed into your relay supplier’s
factory, the factory might be incinerated; that would of course mean your
organization would need to find another supplier of relays. That’s a huge
impact, but what’s the likelihood? Probably less than the likelihood of being
struck by lightning on a sunny day. This is a low risk.
Once you’ve rated your risks as high or low, you then need to focus on the high ones; obviously, those are the only ones that need to be mitigated. But are you obligated to mitigate every high risk? No. An important principle of risk management is that no organization has unlimited resources available for risk mitigation. Your organization needs to decide which of the high risks it can afford to mitigate, and focus on just those. This also means you should assess your vendors based only on the risks you’re trying to mitigate: in your questionnaires, you shouldn’t ask a vendor about a risk if you don’t care what their answer is. You’re just wasting your time and the vendor’s. (I’ve sketched this rating-and-selection logic in a short example after this list.)
3. Once you have developed a list of risks you wish to
mitigate, you need to add to that list the six risks in R1.2 (if you haven’t already
identified them as important risks independently). You need to do this, not
because these are the most significant supply chain cybersecurity risks to the
BES (although they are all important risks), and certainly not because they’re the
only supply chain risks to the BES. The SDT included those risks in R1.2
because FERC had mandated them at various disconnected places in Order 829. In
other words, the SDT was saying, “We want you to identify risks you think are
important and mitigate them. But, since FERC wants these six risks to be in
your plan, you need to make sure you include them as well.”
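To make that interpretation concrete, here is a minimal, purely illustrative sketch in Python of the rating-and-selection logic described above. Nothing like this is required or suggested by CIP-013 or the SDT; the risk names, likelihood judgments, and the from_r1_2 flag are all my own assumptions, and a real plan would document the reasoning behind each judgment rather than just the result.

```python
from dataclasses import dataclass

@dataclass
class SupplyChainRisk:
    name: str                # hypothetical risk description
    likelihood_high: bool    # the entity's judgment: is realization likely?
    from_r1_2: bool = False  # is this one of the six risks FERC mandated (R1.2.1 - R1.2.6)?

# Hypothetical risks an entity might identify; names are illustrative only.
risks = [
    SupplyChainRisk("Supplier's software development network is insecure", True),
    SupplyChainRisk("Meteorite destroys the relay supplier's factory", False),
    SupplyChainRisk("Vendor remote access session is hijacked", True, from_r1_2=True),
]

def rate(risk: SupplyChainRisk) -> str:
    # The impact of losing or compromising a BES Cyber System is always high,
    # so the rating reduces to the likelihood judgment.
    return "high" if risk.likelihood_high else "low"

# Focus on the high risks the organization can afford to mitigate, and always
# include the six R1.2 risks, regardless of the entity's own rating.
plan = [r for r in risks if rate(r) == "high" or r.from_r1_2]

for r in plan:
    tag = " (required by R1.2)" if r.from_r1_2 else ""
    print(f"{r.name}: rated {rate(r)}{tag}")
```

Running this would keep the two high-likelihood risks (one of them also required by R1.2) and drop the meteorite scenario, which mirrors the reasoning above: impact is taken as fixed, so the only real question is likelihood, and the plan covers the high risks plus the mandated six.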
Given my interpretation of
CIP-013, how did I think it would be audited? It seemed quite logical to me:
a) R1.1 would be audited based on how good a job the entity did of “identifying and assessing” risks. If they had made an honest effort to determine at least some of the most important supply chain cyber risks to the BES, that would be fine.
b) R1.2 would be audited based on whether the entity included the six risks in R1.2.1 – R1.2.6 in its plan.
c) R2 would be audited based on how well the entity implemented its plan – i.e., whether it took steps to mitigate all the risks it had said it would mitigate in the plan.
d) R3 would be audited based on whether the entity had reviewed its plan every 15 months, and whether they had honestly taken steps to fix any problems or deficiencies they found in the plan.
During the runup to CIP-013
implementation in 2017-2020, I wrote a number of posts on what CIP-013 means,
in which I elaborated on the above logic. Frankly, I thought that logic was so
compelling that it would be widely adopted by NERC entities. After all, why
would the CIP-013 drafting team tell NERC entities to develop a plan to “identify
and assess” risks to the BES if they didn’t mean it?
But I was wrong. From what I’ve
heard, there are few NERC entities that have interpreted CIP-013 to be about
anything more than R1.2.1 – R1.2.6. And now, I wonder why I ever thought
otherwise. After all, if NERC entities have learned anything from their 15 or
so years of experience with NERC CIP compliance, it’s that they need to keep
their “compliance footprint” as small as possible. That is, they need to keep their heads down and never stray beyond the strictest possible interpretation of the standards. Doing anything more than what’s strictly required doesn’t win you any Brownie points; in fact, it might leave you with a completely avoidable violation – an “own goal”, if you will.
However, I’m not blaming NERC
entities for this situation. I’m also not blaming NERC, and certainly not the
auditors. I’m blaming these two facts:
First, the standard, which I had
admired for its pristine simplicity, was in retrospect too simple. Instead of
simply telling NERC entities to “identify and assess” risks, R1 should have
given them suggestions on how to do that within R1 itself. For example, R1.1
might have included a set of ten or so “areas of risk” that must be identified in
the plan, e.g. “vendor remote access”, “software development process”, “secure shipment”,
etc. The entity would be required to scrutinize each of these areas for risks that
they should add to their plan. In some cases, they would be justified in
ignoring one of those areas entirely; for example, if they don’t allow vendor
remote access at all, they obviously don’t need to worry about securing their
vendor remote access system.
Doing this would also have given
the auditors something to hang their hat on when they audited the entity for
CIP-013 compliance, other than simply determining whether the entity had done a
good job of developing their plan. Instead, they could have verified that the
entity examined each of the ten areas and made a conscious effort to determine
whether there were important risks for them in each area. Since the standard didn’t take that approach, NERC entities focused entirely on the six items that were clearly required by CIP-013 R1: the six risks in R1.2.
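Here’s a minimal sketch, again purely illustrative, of what that kind of coverage check might look like if R1.1 had enumerated specific areas of risk. The area names, the example entries, and the idea of recording a documented “N/A” are my own assumptions, not anything found in CIP-013 or in NERC guidance; the point is simply that both the entity and the auditor would have something concrete to verify.

```python
# Hypothetical "areas of risk" that a more detailed R1.1 might have enumerated.
RISK_AREAS = [
    "vendor remote access",
    "software development process",
    "secure shipment",
    "patch and vulnerability notification",
    "vendor personnel screening",
]

# For each area, the entity records either the risks it identified or a
# documented reason the area doesn't apply. Entries below are illustrative.
assessment = {
    "vendor remote access": "N/A - vendor remote access is not permitted",
    "software development process": ["supplier's development network is insecure"],
    "secure shipment": ["hardware tampered with in transit"],
    "patch and vulnerability notification": ["vendor does not disclose vulnerabilities promptly"],
    "vendor personnel screening": [],   # examined, but nothing recorded yet
}

# An auditor (or the entity itself) can check that every area was addressed:
# each must have either identified risks or a documented N/A justification.
for area in RISK_AREAS:
    entry = assessment.get(area)
    addressed = isinstance(entry, str) or bool(entry)
    status = "covered" if addressed else "NOT addressed - needs review"
    print(f"{area}: {status}")
```

In this sketch, the auditor isn’t judging whether the entity found the “right” risks in each area; they’re only verifying that each area was consciously examined and that any exclusion was justified – exactly the kind of hook the requirement never provided.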
Thus, the first lesson to be
learned from the CIP-013 experience is that, in the world of prescriptive requirements
like CIP-010 R1 (configuration management) and CIP-007 R2 (patch management), handing
a blank slate to both the entity and the auditor and saying “You can figure
this out for yourself” – which is unfortunately what the SDT did[i] – is asking for trouble.
Second and more importantly, NERC
auditors in general (i.e. for all the standards, not just the CIP standards)
aren’t trained to judge how well an entity has assessed and mitigated risks;
they’re trained to determine whether the entity did or didn’t do X. While I’m sure some of
them, especially CIP auditors, understand risk very well (if for no other
reason than that almost every other mandatory cybersecurity standard is based
on risk), for many of them it’s a foreign concept. NERC needs to develop methods
for auditing risk-based requirements, not just prescriptive ones, and then
train the auditors on those methods.
Of course, fixing these two
problems won’t be easy. But if NERC CIP is going to make a successful
transition to the cloud, these two problems will need to be addressed.
Are you a vendor of current or
future cloud-based services or software that would like to figure out an
appropriate strategy for the next few years, as well as beyond that? Or are you
a NERC entity that is struggling to understand what your current options are
regarding cloud-based software and services? Please drop me an email so we can
set up a time to discuss this!
Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
[i] Having participated in many of that SDT’s discussions, I know why they made
this mistake: FERC had given NERC a strict deadline to develop and approve the
new supply chain security standard. The SDT couldn’t afford to add any provisions
to CIP-013 that might stir up controversy and result in extra ballots being
necessary (although there were controversies anyway). In other words, FERC’s deadline
backfired spectacularly. This points to a big problem with NERC’s standards
development process, at least when it comes to cybersecurity: you can have a comprehensive
standard that takes a long time to approve, or you can have a minimal standard
that gets approved relatively quickly. But you can’t have both.