Friday, April 12, 2019

CIP-013 and acceptance of risk



A longtime principle of risk management (for financial risks, general business risks, security risks, etc.) is acceptance of risk. To accept a risk is to acknowledge that it exists, but at the same time to decide that the costs of mitigating the risk outweigh the expected loss if the risk is realized. This is simply an acknowledgement that there are too many risks to make it worthwhile even to attempt to mitigate them all. Of course, this is especially true in cyber security, where the number of risks is very large, if not infinite.

The team that drafted NERC CIP version 1 was composed of cyber security professionals who understood this principle very well. So they made sure that the draft of CIP v1 that was submitted to FERC made liberal use of acceptance of risk. Many requirements were written so that the entity could either perform what was required or “accept the risk” and not do anything. This seemed to the team to be an eminently reasonable approach.

However, it didn’t seem at all reasonable to FERC. They ordered that the “acceptance of risk” language be removed from the standards; this was done in CIP v2. Their reasoning was very simple: There is no way a single entity can accept risk on behalf of all the entities that are part of the Bulk Electric System.

FERC’s reasoning made sense in the context of all of the standards that had so far been developed by NERC (the so-called “693” or “Operations and Planning” standards). These standards are almost all based on the laws of physics: You do or don’t do something (like trimming trees regularly under transmission lines), and the next thing you know, there’s a cascading outage that blacks out a large swath of the US and Canada. Or you make one wrong move in a key substation, and the next moment there’s a disturbance that takes out most of South Florida and is felt in less than a second in Canada. By not doing what they should have been doing, the utilities involved were essentially accepting risk on behalf of the entire North American power grid, but the rest of the grid had no chance to weigh in on whether they should do this or not. They just felt the consequences.

So FERC applied that reasoning to CIP v1, probably the first mandatory cyber security standard for the power grid of any country or region of the world. And why shouldn’t they? As everyone involved with developing CIP said at the time (and as many in the NERC community still say), the CIP standards are just basic best practices. Any NERC entity that doesn’t follow them leaves a hole that’s just waiting for some adversary to walk through.

But the big problem with taking a “best practices” approach to cyber security is that it assumes the set of cyber threats that need to be mitigated is the same for all entities and for all times. Specifically, it assumes that a) no significant new risks will appear over time, and all entities face the same threat landscape; and, if this assumption doesn’t prove to be true, that b) the standards will be flexible enough to incorporate these new or variable threats.

Well, guess what? Assumption a) isn’t true for cyber threats. New threats appear all the time, and different NERC entities face different threats, to different degrees. As for assumption b), we all know too well that the NERC standards development framework makes it close to impossible to incorporate new cyber threats into the standards in less than a few years. For example, phishing and ransomware have been serious threats for many years, yet there isn’t even any discussion of developing new requirements to deal with them. And the threat posed by malware-infected laptops has been well known since the late 1990s, yet when did a requirement that addressed this threat come into effect? 2017.

This situation has strained the current NERC CIP standards to close to the breaking point, with entities spending huge amounts of time and money complying with certain very prescriptive CIP requirements, but then being starved of resources to deal with other cyber threats that aren’t addressed in CIP at all.[i] FERC knew this when they wrote Order 829 in 2016, which ordered NERC to develop a supply chain cyber security standard that was risk-based and not “one size fits all”. While they didn’t say it this way, this seems to me to be the reasoning they followed:

  1. It was clear to FERC, as it’s clear now, that supply chain is by far the biggest area of cyber risk faced by electric utilities, or almost any other industry, today. Think Target, Stuxnet, NotPetya, and the current Russian attacks on the US power industry – all came, or are coming, through the supply chain. FERC felt it was necessary for CIP to address supply chain risks as soon as possible and as thoroughly as possible.
  2. At the same time, FERC knew that the US power industry was struggling mightily just to comply with the existing CIP standards, in large part because they aren’t risk-based and only allow for one type of variation among individual entities: the impact level of particular assets on the BES (and even then, only in three very broad categories). Asking the industry to take on a huge new burden for supply chain security was impossible.
  3. The only way that the burden of the new standard would be manageable would be if it were risk-based, with the entity itself determining the most important risks for it to mitigate, as well as how to mitigate them.
  4. If the entity does this, it will be able to allocate its limited risk mitigation budget (and who has an unlimited budget?) so that every dollar or hour spent on supply chain cyber risk mitigation reduces cyber risk by the maximum possible amount. The alternative is a set of mostly prescriptive CIP requirements (as in CIP-002 through -011) that decide for the entity which risks it needs to address, as well as exactly what it needs to do to mitigate each of them. There is no room for variation among entities or over time. This will inevitably result in less total risk being mitigated than if the entity decides for itself what risks it will mitigate, and how it will mitigate them.

This is why CIP-013 requires the entity to do just three things: a) Develop a supply chain cyber security risk management plan (R1[ii]); b) Implement the plan (R2); and c) Review the plan every 15 months (R3). If the entity follows the wording of R1 (and it isn’t that easy!), it will be sure to get the most “bang for the buck” in allocating its limited budget to address supply chain cyber risk. And if the entity follows the wording of R3, it will be sure to make adjustments to its plan over time, so that it continues to address the most important supply chain cyber risks, using the most current mitigations for those risks.

Where does acceptance of risk come in here? It’s very simple: There’s no way that risk management can work unless the entity can accept some risks without mitigating them. The entity has to decide what the biggest threats it faces are, determine the risk posed by each of those threats, and line the threats up in a big spreadsheet, ranked by their degree of risk. Then the entity needs to start at the top, decide which threats (risks) it can mitigate, and draw a line under the last one of these. The entity will mitigate all of the risks above the line, and it will accept all of the risks below the line.[iii]
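The ranking-and-cutoff procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method prescribed by CIP-013 or by any auditor: the scoring rule (likelihood times impact), the cost figures, the budget, and the four example threats are all hypothetical, chosen only to show how the line gets drawn.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One row of the (hypothetical) supply chain risk spreadsheet."""
    name: str
    likelihood: float       # assumed: probability estimate, 0.0 to 1.0
    impact: float           # assumed: impact on the BES, 1 to 10 scale
    mitigation_cost: float  # assumed: dollars or staff hours to mitigate

    @property
    def score(self) -> float:
        # Assumed scoring rule; CIP-013 does not prescribe one.
        return self.likelihood * self.impact


def draw_the_line(risks: list[Risk], budget: float) -> tuple[list[Risk], list[Risk]]:
    """Rank risks by score and split them at the budget line.

    Starting from the highest-scoring risk, each risk is marked for
    mitigation while its cost still fits in the remaining budget. The
    line is drawn at the first risk that does not fit: that risk and
    everything ranked below it are accepted.
    """
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    mitigate: list[Risk] = []
    spent = 0.0
    for i, risk in enumerate(ranked):
        if spent + risk.mitigation_cost > budget:
            return mitigate, ranked[i:]  # everything from here down is accepted
        mitigate.append(risk)
        spent += risk.mitigation_cost
    return mitigate, []  # the budget covered every risk


# Hypothetical threats and figures, for illustration only.
risks = [
    Risk("Compromised vendor software update", 0.3, 9, 40),
    Risk("Vendor remote access misuse", 0.4, 8, 30),
    Risk("Counterfeit hardware", 0.1, 7, 50),
    Risk("Vendor fails to disclose a vulnerability", 0.5, 5, 20),
]

mitigate, accept = draw_the_line(risks, budget=80)
print("Mitigate:", [r.name for r in mitigate])
print("Accept:  ", [r.name for r in accept])
```

With a budget of 80, the two highest-scoring risks fit; mitigating the third would push spending to 90, so the line falls there and the remaining two risks are accepted. The sketch deliberately stops at the first risk that doesn't fit, matching the spreadsheet description, rather than skipping ahead to cheaper, lower-ranked risks; footnote [iii] hints at further refinements such as partial mitigation.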

Of course, FERC never said that the supply chain standard should allow acceptance of risk, and CIP-013 doesn’t use those words at all. But I contend that it doesn’t have to. It makes no sense to talk about risk management if you aren’t accepting some risks – since, in the cyber security domain and even more so in the supply chain cyber security domain, there are close to an infinite number of risks. In complying with CIP-013, the entity is not only allowed to accept some risks, it is required to do so; otherwise, it’s literally impossible to comply with the standard without bankrupting the utility. And that doesn’t exactly do a whole lot for grid reliability.

If you’re wondering whether I’m the only person saying this, I’m not. Lew Folkerth of RF, the only person within the wider NERC organization who has written about how to comply with CIP-013, has written two great articles for RF’s bi-monthly newsletter (if you email me, I can send you both articles in PDF form; otherwise, you’ll have to download two 13 MB newsletter files). I’ve written about these in four posts, starting with this one (the series hasn’t ended, since Lew is promising a third article, to appear in the newsletter that will be released this month. And you thought the Mueller report was going to cause the most excitement when it’s released this month? Not in the Alrich household, I can tell you that!).

In the first of Lew’s two articles, he says, regarding auditing of CIP-013 R1.1: “You will need to be able to show an audit team that you have identified possible supply chain risks to your high and medium impact BES Cyber Systems, assessed those risks, and put processes and controls in place to address those risks that pose the highest risk to the BES.” (my emphasis) If you think about it, you’ll realize that, in saying that the auditors will look to see that you’ve addressed the highest supply chain cyber risks, Lew’s also saying that the auditors aren’t going to expect you to mitigate risks that aren’t among the highest.

In his second article, Lew makes this more explicit when he says: “You can’t address all risks, so you will need to prioritize the risks you will address.” He goes on to describe a process almost exactly like the one I described above, including identifying threats (risks), assigning risk scores to each one, then planning to mitigate the threats with the highest scores. Amen.

Most sermons open with a reading of Scripture. But it seems this one has ended with it.


Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or challenges like what is discussed in this post – especially on compliance with CIP-013. To discuss this, you can email me at the same address.


[i] I would love to refer to a post for more information on what I’ve just said, but it’s scattered around lots of posts. However, I did write an article on this topic for a UK security journal (print only), that I’m allowed to distribute in a PDF file. If you’d like me to send that to you, send me an email at the address above.

[ii] R1.2.1 through R1.2.6 list six specific risks that must be mitigated in the plan. They are there because FERC specifically ordered each of those to be included, when they wrote Order 829. You can think of these as the most important risks to be mitigated, but certainly not the only ones. The entity is in charge of deciding what other risks it will mitigate, consistent with its budget.

[iii] This is a simplified description, since it may be possible for the entity to mitigate the same amount of risk, or even more, by only partially mitigating each of the risks at the top of the spreadsheet, yet mitigating more of them than it could if each risk had to be either fully mitigated or not mitigated at all. But this idea is too complicated to discuss in this post.
