Thursday, April 18, 2019

CIP-013: Risk Management vs. Best Practices



I hope it’s obvious by now that I firmly believe CIP-013 is a standard for risk management, as stated in the Purpose (found in Section 3 of the standard): “To mitigate cyber security risks to the reliable operation of the Bulk Electric System (BES) by implementing security controls for supply chain risk management of BES Cyber Systems.” In other words, CIP-013 requires NERC entities to mitigate supply chain risks to the BES by managing them.

You manage the risks by developing a plan to identify the most important risks in your environment, and then mitigating those. As I mentioned in my last post, this inherently admits that there are a lot of less-important risks that you simply won’t mitigate at all, since nobody has an infinite budget for this. But as I’ll discuss below, taking this approach in theory guarantees that you will mitigate the most supply chain security risk possible.

However (and I realize this will come as shocking news to you), I have to say there isn’t 100% agreement with my opinions in the NERC community, even though nobody has ever come up to me on the street and said “Hey, Alrich! You know, you’re full of s___ on CIP 13”, or even sent me an email to that effect. Then again, nobody has ever provided me with a full description of an alternative way to comply with CIP-013, one that follows the wording (I’ve heard and/or read a couple of methodologies that pretty much pretended R1.1 doesn’t exist).

So I know that many and probably most NERC entities won’t follow what I’m saying about CIP-013 compliance (which I flatter myself to say is very close to what Lew Folkerth of RF is saying). BTW, you can now get Lew’s two articles on CIP 13, without having to download the whole 13 MB newsletter, by going here and here. Lew says there will be a third article in this month’s newsletter, which should be posted any day on RF’s website. You can sign up to get notification of those newsletters by going to RF’s home page and scrolling to the bottom.

But I have thought about how most entities will probably comply with CIP-013, and I admit it certainly won’t be a disaster for the grid: They’ll follow best practices. And where will they find those? Well…everywhere. You can think of NIST 800-161, NIST 800-171, NIST 800-53, the white papers produced by APPA/NRECA, NAGF, UTC and others…all as best practices. But since even just one of the NIST documents contains far more best practices than even the largest utility could ever implement, how does the utility decide which ones it will adopt and which ones it will ignore? Of course, the reason for this question is what I mentioned in the second paragraph above: No utility (or any organization, for that matter) has an unlimited budget available for implementing best practices for supply chain risk management, or for that matter any other worthy goal.

And here’s the difference between risk management and best practices: If you decide to address supply chain security (and/or CIP-013 compliance, since in the case of CIP-013, security and compliance are literally the same thing) using a risk management approach, you will in theory ensure that every dollar or hour of staff time that you spend on supply chain security will yield the maximum possible return, which means reduction in supply chain security risk. In other words, if you follow the risk management approach, you are sure to achieve the most bang for the buck. This is because you will rank your risks by their importance (i.e. their degree of risk, which I define as likelihood plus impact) and only mitigate the most important ones.

If you take the best practices approach, you have no way of being sure that you are getting the greatest possible risk reduction for the resources you expend. This is because the return from mitigation (and best practices are of course mitigations for particular threats, although the threats aren’t usually explicitly stated in documents like NIST 800-53) depends entirely on the degree of risk posed by the threats that you’re mitigating. You could spend exactly the same amount of time and money mitigating a very serious threat (say one that has a very high impact and a moderate likelihood) as you would on a much less serious threat (one which also has a very high impact, but which is very unlikely to be realized), yet in the first case you would be reducing a lot of risk, while in the second case you would be reducing very little risk. And the problem with taking the best practices approach is that you don’t have any structured way to distinguish between the two cases, because you’re just applying mitigations that you like for one reason or another; you aren’t explicitly considering the degree of risk posed by the two threats that you’re mitigating.

For example, let’s take two very serious supply chain security threats:

  1. The threat that you will install a software product on a BES Cyber System that includes a piece of third party code containing an undisclosed back door. A malicious third party learns of this back door and exploits it to cause damage to the BES.
  2. The threat that you will purchase a BCS with a motherboard, into which a chip containing a back door has been inserted. A malicious third party learns of this back door and exploits it to cause damage to the BES.

What is the degree of risk posed by each of these threats? The impact of the threat if realized is the same in each case: high. It doesn’t matter whether an attacker gains control of a particular BCS using a hardware back door or a software back door. What they can do once they gain control is exactly the same.

But what about the likelihood? I’d say it’s high in the case of the first threat, since it’s happened multiple times. There have been a number of cases of software back doors. But how about the second threat? It has certainly been talked about a lot, and was the subject of a big Bloomberg article at the end of last year. But the article has been widely doubted, as well as denied by a couple of the “victims” mentioned in it, including Apple and Amazon. There may have been a successful attack that I haven’t heard of, but in any case I think this threat has a low likelihood of being realized.

Since I believe the risk of a security threat being realized is the sum of the likelihood and impact, and if we assign values of 1/2/3 to low/medium/high respectively, this means the risk score of the first threat is 6, while the risk score of the second threat is 4. Clearly, the first threat poses a significantly higher risk than the second.
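
To make the arithmetic concrete, here is a minimal Python sketch of the scoring just described. The 1/2/3 values and the additive formula come from the paragraph above; the names LEVELS and risk_score are made up for illustration and aren't part of CIP-013 or any tool.

```python
# Mapping described above: low/medium/high -> 1/2/3.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_score(likelihood: str, impact: str) -> int:
    """Return likelihood + impact, the additive score used in this post."""
    return LEVELS[likelihood] + LEVELS[impact]


# Threat 1: software back door (high likelihood, high impact).
software_back_door = risk_score(likelihood="high", impact="high")

# Threat 2: rogue chip on the motherboard (low likelihood, high impact).
hardware_back_door = risk_score(likelihood="low", impact="high")

print(software_back_door, hardware_back_door)  # 6 4
```

The likelihood and impact ratings themselves are judgment calls; the code only adds them up.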

But what does it cost to mitigate each of those threats? For the first threat, my guess is most utilities will just use the mitigation of only buying BCS software from a trusted vendor; they will trust their vendor to only incorporate third-party software from sources they trust, who won’t plant back doors in their products. Of course, there are more expensive mitigation steps they can take, like purchasing various software vendor risk services, doing penetration testing to find back doors or perhaps signing up for aDolus, a really interesting service I first learned about last week.[i] These all have a fairly moderate cost.

On the other hand, for the second threat, the sky’s the limit when you talk about mitigation cost. You can install an electron microscope and look for traces on the board or changed features of the chip that might give away that it’s different from the normal one (and of course, you have to do this for every chip on the motherboard, since there’s no way of knowing beforehand which one might be a counterfeit). You could also have a team fly to the country of origin of the motherboard, examine component inventories in the factory, inspect the factories where the components are made, etc. And if this were a very high-likelihood threat as well as very high-impact, this might be justified.

My point in discussing cost is that it’s very unlikely that the cost of mitigating the second threat is any less than the cost of mitigating the first threat, and it’s very likely to be far higher. But even if the costs are the same, the fact that the risk being mitigated is higher in the first threat than it is in the second means that the risk reduction achieved will be greater. And this means the return on the investment of mitigating the first threat is higher than the return on the second threat.
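
The comparison can be sketched roughly in Python. This rests on two assumptions that go beyond what I said above: the risk score (likelihood plus impact) stands in for the amount of risk a mitigation removes, and "return" is that score divided by cost. The cost figures are hypothetical relative units, not real estimates, and the Threat class is just for illustration.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    risk_score: int  # likelihood + impact, as scored in this post
    cost: float      # hypothetical relative cost units, not real figures

    def return_on_mitigation(self) -> float:
        # Assumption: the risk score is a proxy for the risk reduced.
        return self.risk_score / self.cost


# Equal (hypothetical) costs, to isolate the effect of the risk score.
threats = [
    Threat("hardware back door (rogue chip)", risk_score=4, cost=1.0),
    Threat("software back door (third-party code)", risk_score=6, cost=1.0),
]

# Rank threats so the best return per unit of cost comes first.
ranked = sorted(threats, key=lambda t: t.return_on_mitigation(), reverse=True)

for t in ranked:
    print(f"{t.name}: return {t.return_on_mitigation():.1f}")
```

With equal costs the software threat ranks first. Raising the cost of the hardware mitigation, which is the more likely real-world case, only widens the gap.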

Yet I have had discussions with people who think that mitigating the second threat is as important as mitigating the first one; if you asked them for best practices, they would probably recommend you invest as much time and/or money in finding rogue chips as you do in verifying that your software vendor vets their vendors of software components carefully.

Of course, it’s unlikely that you will throw a lot of resources at mitigating a threat that has low likelihood or low impact or both. But think about it: This shows you’re at least on some level doing the risk analysis anyway! There is simply no way you could determine, merely from examining the mitigation itself, whether it mitigates a high-risk or a low-risk threat. The mitigation itself has no risk score. You might get lucky and subconsciously perform the risk analysis, then decide that your “gut feel” was that it was much better to spend your money on verifying that your software vendor has a good handle on their supply chain, than it was in verifying that no chips had been substituted on a motherboard. But your gut feel might very well have told you just the opposite, especially if you’d just finished reading the Bloomberg article.

To sum up, I really don’t think just following a list of supply chain security best practices (if you can find a realistic list targeted to supply chains for control systems for electric utilities, which I haven’t seen yet) is going to lead you astray. It’s certainly much better than not doing anything at all for supply chain security (or CIP-013 compliance). But you’ll never be able to get as good a return on your investment in risk mitigation as you would if you explicitly considered risk from the start. This is why I think the risk management approach is much better than the best practices approach. And it’s also why FERC ordered NERC to develop a risk-based standard in 2016.


Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or challenges like what is discussed in this post – especially on compliance with CIP-013. To discuss this, you can email me at the same address.

[i] Full disclosure: After he showed me the product and I became very interested in it, the founder of this company (a longtime friend) mentioned the idea of my providing consulting services to them. I’m not exactly sure now if there’s anything I can really do to help them, and I don’t want to spoil a friendship by taking money and not providing value, so I’m not sure what will come of this. But I don’t want to hide this fact.

2 comments:

  1. Tom - nice one (and thanks for the mention of UTC). Agree completely. When dealing with utilities, "best practices" are a form of intellectual laziness. It's a cliche that no two are alike, but it's true. When I ask our member utilities what services they prioritize with their telecoms, there is no pattern, other than teleprotection comes first. So I've migrated to "best questions", which is probably a cute way of saying... do the risk analysis!

  2. Thanks, Bob. I like your perspective!
