I hope it’s
obvious by now that I firmly believe
CIP-013 is a standard for risk management, as stated in the Purpose (found in
Section 3 of the standard): “To mitigate cyber security risks to the reliable
operation of the Bulk Electric System (BES) by implementing security controls
for supply chain risk management of BES Cyber Systems.” In other words, CIP-013
requires NERC entities to mitigate supply chain risks to the BES by managing them.
You manage
the risks by developing a plan to identify the most important risks in your
environment and then mitigate them. As I mentioned in my last
post, this inherently admits that there are a lot of less important risks
that you simply won't mitigate at all, since nobody has an infinite budget for
this. But as I'll discuss below, taking this approach in theory guarantees that
you will mitigate the most supply chain security risk possible.
However (and
I realize this will come as shocking news to you), I have to say there isn't
100% agreement with my opinions in the NERC community, even though nobody has
ever come up to me on the street and said, "Hey, Alrich! You know, you're full
of s___ on CIP-013," or even sent me an email to that effect. At the same time, nobody has
ever provided me with a full description of an alternative way to comply with
CIP-013, one that follows the wording of the standard (I've heard and/or read a couple of
methodologies that pretty much pretended
R1.1 doesn't exist).
So I know that
many and probably most NERC entities won't follow what I'm saying about CIP-013
compliance (which I flatter myself to say is very close to what Lew Folkerth of
RF is saying). By the way, you can now get Lew's two articles on CIP-013 – without having
to download the whole 13 MB newsletter – by going here
and here.
Lew says there will be a third article in this month's newsletter, which should
be posted any day on RF's website. You can sign up for notification of those
newsletters by going to RF's home page and
scrolling to the bottom.
But I have thought about how most entities
will probably comply with CIP-013, and I admit it certainly won't be a disaster
for the grid: they'll follow best practices. And where will they find those?
Well…everywhere. NIST 800-161, NIST 800-171, NIST 800-53, and the
white papers produced by APPA/NRECA, NAGF, UTC and others can all be thought of
as collections of best practices. But since even one of the NIST documents alone provides more
best practices than even the largest utility could ever
implement, how does the utility decide which ones it will adopt and which ones
it will ignore? Of course, the reason for this question is what I mentioned in
the second paragraph above: no utility (or any organization, for that matter)
has an unlimited budget for implementing best practices for supply
chain risk management, or for any other worthy goal.
And here’s
the difference between risk management and best practices: If you decide to
address supply chain security (and/or CIP-013 compliance, since in the case of
CIP-013, security and compliance are literally the same thing) using a risk
management approach, you will in theory ensure that every dollar or hour of
staff time that you spend on supply chain security will yield the maximum
possible return, which means reduction in supply chain security risk. In other
words, if you follow the risk management approach, you are sure to achieve the
most bang for the buck. This is because you will rank your risks by their
importance (i.e. their degree of risk, which I define as likelihood plus
impact) and only mitigate the most important ones.
If you take
the best practices approach, you have no way of being sure that you are getting
the greatest possible risk reduction for the resources you expend. This is
because the return from mitigation (and best practices are of course
mitigations for particular threats, although the threats aren’t usually
explicitly stated in documents like NIST 800-53) depends entirely on the degree
of risk posed by the threats that you’re mitigating. You could spend exactly
the same amount of time and money mitigating a very serious threat (say one
that has a very high impact and a moderate likelihood) as you would on a much
less serious threat (one which also has a very high impact, but which is very
unlikely to be realized), yet in the first case you would be reducing a lot of
risk, while in the second case you would be reducing very little risk. And the
problem with taking the best practices approach is that you don’t have any
structured way to distinguish between the two cases, because you’re just applying
mitigations that you like for one reason or another; you aren’t explicitly
considering the degree of risk posed by the two threats that you’re mitigating.
For example,
let’s take two very serious supply chain security threats:
- The threat that you will install a software product on a
BES Cyber System that includes a piece of third party code containing an
undisclosed back door. A malicious third party learns of this back door
and exploits it to cause damage to the BES.
- The threat that you will purchase a BCS whose motherboard
contains a chip into which a back door has been inserted. A malicious
third party learns of this back door and exploits it to cause damage to
the BES.
What is the
degree of risk posed by each of these threats? The impact of the threat if
realized is the same in each case: high. It doesn’t matter whether an attacker gains
control of a particular BCS using a hardware back door or a software back door.
What they can do once they gain control is exactly the same.
But what
about the likelihood? I'd say it's high in the case of the first threat, since
it has happened multiple times; there have been a number of documented cases
of software back doors. But how about the second threat? It has certainly been
talked about a lot, and was the subject of a big Bloomberg article at the end
of last year. But the article has been widely doubted, as well as denied by a
couple of the "victims" mentioned in it, including Apple and Amazon. There may
have been a successful attack that I haven't heard of, but in any case I think
this threat has a low likelihood of being realized.
Since I
believe the risk of a security threat being realized is the sum of its
likelihood and impact, and if we assign values of 1/2/3 to low/medium/high
respectively, the risk score of the first threat is 3 + 3 = 6, while the
risk score of the second threat is 1 + 3 = 4. Clearly, the first threat poses a significantly
higher risk than the second.
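This scoring scheme is simple enough to sketch in a few lines of Python. The 1/2/3 mapping and the two example threats are taken from the discussion above; the threat labels are just shorthand names I've chosen for illustration:

```python
# Score threats using the scheme from the post: risk = likelihood + impact,
# with low/medium/high mapped to 1/2/3.
LEVELS = {"low": 1, "medium": 2, "high": 3}

def risk_score(likelihood: str, impact: str) -> int:
    """Return the combined risk score for one threat."""
    return LEVELS[likelihood] + LEVELS[impact]

# The two example threats: (likelihood, impact).
threats = {
    "software back door in third-party code": ("high", "high"),
    "rogue chip on a motherboard": ("low", "high"),
}

# Rank the threats from highest to lowest risk, so the most important
# ones can be mitigated first.
ranked = sorted(
    ((risk_score(lk, im), name) for name, (lk, im) in threats.items()),
    reverse=True,
)
for score, name in ranked:
    print(f"{name}: {score}")
# software back door in third-party code: 6
# rogue chip on a motherboard: 4
```

Ranking the scored list is exactly the step the best practices approach skips: it applies mitigations without ever producing the scores that would order them.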
But what
does it cost to mitigate each of these threats? For the first threat, my guess
is most utilities will just use the mitigation of only buying BCS software from
a trusted vendor; they will trust their vendor to incorporate third-party
software only from sources they trust, who
won't plant back doors in their products. Of course, there are more expensive
mitigation steps they can take, like purchasing various software vendor risk
services, doing penetration testing to find back doors, or perhaps signing up
for aDolus, a really interesting service
I first learned about last week.[i] These
all have a fairly moderate cost.
On the other
hand, for the second threat, the sky’s the limit when you talk about mitigation
cost. You can install an electron microscope and look for traces on the board
or changed features of the chip that might give away that it’s different from the
normal one (and of course, you have to do this for every chip on the
motherboard, since there’s no way of knowing beforehand which one might be a
counterfeit). You could also have a team fly to the country of origin of the motherboard,
examine component inventories in the factory, inspect the factories where the
components are made, etc. And if this were a very high-likelihood threat as
well as very high-impact, this might be justified.
My point in
discussing cost is that the cost of mitigating the
second threat is very unlikely to be any less than the cost of mitigating the first, and is
very likely to be far higher. But even if the costs were the same, the fact that
the risk being mitigated is higher for the first threat than for the second
means that the risk reduction achieved would be greater. And this means the
return on the investment of mitigating the first threat is higher than the
return on mitigating the second.
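The return-on-investment point can be made concrete with a short sketch. The risk scores follow the likelihood-plus-impact scheme above; the dollar figure is purely hypothetical, chosen only to show that even at identical cost the returns differ:

```python
# Compare "bang for the buck" of two mitigations: risk reduced per
# dollar spent. Risk scores use the post's likelihood + impact scheme;
# the cost is a made-up illustrative number, and for simplicity each
# mitigation is assumed to remove its threat's risk entirely.
def return_per_dollar(risk_reduced: int, cost: float) -> float:
    """Risk reduction achieved per dollar of mitigation spending."""
    return risk_reduced / cost

software_roi = return_per_dollar(risk_reduced=6, cost=50_000)  # vendor vetting
hardware_roi = return_per_dollar(risk_reduced=4, cost=50_000)  # chip inspection

# Even at identical cost, the software mitigation returns more risk
# reduction per dollar (6/50,000 vs. 4/50,000), and any extra cost for
# the hardware mitigation only widens the gap.
assert software_roi > hardware_roi
```

If the chip-inspection mitigation costs more than the vendor-vetting one, as the post argues it almost certainly would, its return per dollar falls even further behind.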
Yet I have had
discussions with people who think that mitigating the second threat is as
important as mitigating the first one; if you asked them for best practices,
they would probably recommend you invest as much time and/or money in finding
rogue chips as you do in verifying that your software vendor vets their vendors of software components
carefully.
Of course,
it's unlikely that you will throw a lot of resources at mitigating a threat
that has low likelihood or low impact or both. But think about it: this shows
you're at least on some level doing the risk analysis anyway! There is simply
no way you could determine, merely from examining the mitigation itself,
whether it mitigates a high-risk or a low-risk threat; the mitigation itself
has no risk score. You might get lucky and subconsciously perform the risk
analysis, then decide that your "gut feel" was that it was much better to spend
your money on verifying that your software vendor has a good handle on their
supply chain than on verifying that no chips had been substituted on a
motherboard. But your gut feel might very well have told you just the opposite,
especially if you'd just finished reading the Bloomberg article.
To sum up, I
really don’t think just following a list of supply chain security best
practices (if you can find a realistic list targeted to supply chains for control
systems for electric utilities, which I haven’t seen yet) is going to lead you
astray. It’s certainly much better than not doing anything at all for supply
chain security (or CIP-013 compliance). But you’ll never be able to get as good
a return on your investment in risk mitigation as you would if you explicitly
considered risk from the start. This is why I think the risk management
approach is much better than the best practices approach. And it’s also why
FERC ordered
NERC to develop a risk-based standard in 2016.
Any opinions expressed in this blog post are strictly mine
and are not necessarily shared by any of the clients of Tom Alrich LLC.
If you would like to comment on what you have read here, I
would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that
if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or
challenges like what is discussed in this post – especially on compliance with
CIP-013. To discuss this, you can email me at the same address.
[i]
Full disclosure: After he showed me the product and I became very interested in
it, the founder of this company – a longtime friend – mentioned the idea of my
providing consulting services to them. I’m not exactly sure now if there’s
anything I can really do to help them, and I don’t want to spoil a friendship
by taking money and not providing value, so I’m not sure what will come of
this. But I don’t want to hide this fact.
Tom - nice one (and thanks for the mention of UTC). Agree completely. When dealing with utilities, "best practices" are a form of intellectual laziness. It's a cliche that no two are alike, but it's true. When I ask our member utilities what services they prioritize with their telecoms, there is no pattern, other than teleprotection comes first. So I've migrated to "best questions", which is probably a cute way of saying... do the risk analysis!
Thanks, Bob. I like your perspective!