Thursday, January 7, 2021

Accepting risk in CIP-013


Note from Tom: I am close to finishing my book “Supply chain cybersecurity for critical infrastructure”. By far the hardest chapter to write (and the longest) was the one on accepting risk. This sounds offhand like a fairly simple idea, but I can assure you it isn’t when you start digging into it. Below is the appendix to the chapter on accepting risk, which deals specifically with accepting risk in CIP-013. I especially want to thank Kevin Perry for providing important comments on this document.

I apologize that this post includes a lot of references to other chapters, as well as terms that are undefined because you haven’t seen my chapter on definitions. I expect the book itself will be out within two months, so these won’t be mysteries for too much longer. However, I’ll be glad to answer any questions you want to email me. 

If your organization is a NERC Entity subject to compliance with CIP-013-1, you may have heard that acceptance of risk is not allowed in NERC CIP standards in general, including CIP-013-1. In fact, the authors have heard that at least one NERC Regional Entity has stated they will not allow an Entity to state in their CIP-013-1 supply chain cybersecurity risk management Plan that they will accept Risks, in circumstances like those discussed earlier in this Chapter.

The Region in question was probably basing this opinion on a statement FERC made in January 2008, when they issued Order 706[i]. They stated that “acceptance of risk” is not allowed when it comes to the NERC CIP standards. FERC issued the order while approving the CIP version 1 standards, which used the words “or accept the risk” in multiple requirements (of course, this is a common practice in cybersecurity standards and frameworks).

In the order, FERC pointed out that it is not possible for a NERC Entity to accept a Risk that applies to the Bulk Electric System, since – as was discussed earlier in this Chapter – no single organization “owns” the BES. In fact, the BES is a public good, so it is effectively “owned” by all electricity users in the US and Canada. The references to acceptance of risk were removed in version 2 of the CIP standards.

The authors agree it was appropriate for FERC to order that acceptance of risk be removed as an option for the CIP version 1 standards. We say this because in CIP v1, all of the requirements were prescriptive, as opposed to risk-based. In other words, each requirement in CIP version 1 (and most of the requirements in CIP-003-6 through CIP-011-2 today) mandated that the NERC Entity perform a certain set of steps.

While it is true that these steps were all intended to mitigate a particular Risk or Risks (depending on the requirement), it is also true that, in a prescriptive requirement, the NERC Entity is not provided any other option than to perform those steps. They certainly cannot accept the Risk and not comply with the requirement at all. Even if the Entity believes there is a better way to mitigate the Risk(s) addressed by those steps, they cannot propose to use it. They have to follow the prescribed steps – which is why we call these prescriptive requirements.

The reason there are prescriptive CIP requirements at all is that the Standards Drafting Team (SDT) that drafted them decided that a) a particular risk is important enough that it needs to be mitigated, and b) the particular set of steps prescribed constitutes the single best way to mitigate that risk.

One CIP requirement started out prescriptive in CIP version 1 but changed to risk-based when version 5 was implemented, so it furnishes a good example for this discussion. We are talking about the CIP requirement that addresses the Risk that a Critical Cyber Asset/BES Cyber System will be infected with malware and thus have a negative impact on the BES. Of course, this is the anti-malware requirement.

In CIP versions 1-3, the anti-malware requirement was CIP-007-1 R4. It prescribed that antivirus software should be deployed on all Critical Cyber Assets.[ii] This requirement was quite appropriate for devices running Windows™ or Linux operating systems on Intel-standard hardware.

However, the requirement was not at all appropriate for devices like routers, switches, and electronic relays, which are almost always provided as sealed devices on which the user cannot install software such as antivirus tools. It was also not appropriate for programmable logic controllers (PLCs) and similar devices, which run neither Windows nor Linux. For these devices, the user was required to apply to their Region for a Technical Feasibility Exception, a time-consuming process for both the NERC Entity and their Regional Entity.[iii]

The need for TFEs was greatly reduced when version 5 of the CIP standards came into effect on July 1, 2016. The SDT that drafted CIP v5 took pains to eliminate any reference to technical feasibility (explicit or implicit) from as many requirements as possible. Because the need to comply with the anti-malware requirement was by far the biggest cause of NERC Entities having to create TFEs in the first place, that requirement became “ground zero” for this effort.

The solution the SDT came up with was quite elegant: they drafted a new risk-based requirement, CIP-007-5 R3, that simply said the NERC Entity must “Deploy method(s) to deter, detect, or prevent malicious code.” This requirement is still found in the current CIP-007-6.

Because the new requirement did not specify a particular Mitigation for the Risk of malware infection, it allowed the NERC Entity to determine for itself what is the best method for deterring, detecting or preventing malware infection in the case of each device covered by CIP; moreover, that method could be different for every BES Cyber System. While this new requirement did not specifically mention the word “risk”, in the authors’ opinion it is a risk-based requirement.

Of course, for Windows and Linux systems on which software can be loaded by the user, there is little dispute that use of antivirus software (or in some cases application whitelisting software[iv]) is the best Mitigation for the Risk of malware infection. Thus, if a NERC Entity tried to tell their CIP auditor that they deployed a different Mitigation on a Windows device, they would most likely have some serious explaining to do. They would need to demonstrate why the alternative mitigates more risk than antivirus software would.

However, for devices that cannot run antivirus software, the NERC Entity can use risk to determine what is the appropriate mitigation for the Risk of malware infection. For example, if device A is deployed on a network that has no communication with the internet and if physical access is tightly restricted, the Likelihood that malware will infect this device is much lower than it would be for device B, which is deployed on a network with internet access and less rigorous physical access restrictions.

If we assume for the moment that the best way[v] to mitigate the Risk of malware infection, in cases where antivirus software cannot be deployed on a device, is to dedicate an intrusion detection system (IDS) to the device, it should not be hard to justify the decision to deploy the IDS in front of device B, but not device A.

Of course, if the IDS were deployed in front of device A, the Likelihood of malware infection would be somewhat lower than it would be if it weren’t deployed. However, given the substantial cost of deploying that IDS, it should be easy to argue that this cost would not be justified by the small amount of risk that would be mitigated. In other words, the Entity that makes that argument has decided to accept the remaining risk to device A, rather than incur the cost of deploying an IDS. If the prescriptive anti-malware requirement were still in effect, the NERC Entity would not have this choice, and therefore would not be able to accept the risk. Risk acceptance is only possible when complying with risk-based requirements.
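The cost-versus-risk reasoning above can be sketched numerically. This is a toy illustration only: the likelihood values, impact figure, and IDS cost below are invented assumptions, not anything drawn from the CIP standards.

```python
# Toy cost/benefit check for deploying an IDS in front of a device.
# All numbers are illustrative assumptions, not CIP guidance.

def risk_mitigated(likelihood_before: float, likelihood_after: float,
                   impact: float) -> float:
    """Risk reduction, using Risk = Likelihood x Impact."""
    return (likelihood_before - likelihood_after) * impact

IMPACT = 1_000_000  # assumed dollar impact of a malware-driven BES event

# Device A: isolated network, tight physical access -> likelihood already low
device_a = risk_mitigated(0.02, 0.01, IMPACT)   # roughly 10,000 of risk mitigated
# Device B: internet-reachable network, looser physical access
device_b = risk_mitigated(0.30, 0.05, IMPACT)   # roughly 250,000 of risk mitigated

IDS_COST = 50_000
print(f"IDS for device A worthwhile? {device_a > IDS_COST}")  # False
print(f"IDS for device B worthwhile? {device_b > IDS_COST}")  # True
```

The exact numbers matter far less than the comparison: the same IDS, at the same cost, mitigates much more risk in front of device B than device A, which is what justifies accepting the residual risk to device A.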

Two cases for accepting risk in CIP-013-1

While there are at least three other individual CIP requirements that are also risk-based (CIP-011-1 R1, CIP-010-2 R4 and CIP-003-6 R2), CIP-013-1 is the first CIP standard that is entirely risk-based[vi]. The authors contend that, in complying with CIP-013-1, acceptance of risk by the NERC Entity is permitted. There are two cases in which your organization can accept risk, while complying with CIP-013-1.

First case: CIP-013-1 R1.1 requires the NERC Entity to “identify and assess” supply chain cybersecurity Risks to the BES. In Chapter xx “Identifying Risks”, we describe the process that we recommend your organization follow to identify Risks. We state that if, in your best judgment, there is a low Likelihood that a particular Risk will be found in any Supplier or Vendor’s environment – and that it is unlikely this situation will change – you should leave that Risk out of your supply chain cybersecurity risk management Plan[vii]. The reason for this is that we consider a Risk that has a low Likelihood of being realized to already be mitigated, since the purpose of mitigation is to reduce that Likelihood from high to low.

In that Chapter, we used the example of the Risk that a Supplier or Vendor’s network is attached to the internet with no firewall in place. Because it is highly unlikely that any organization has been attached to the internet without a firewall in the last 10-15 years, there is no need to include this Risk in your Plan. Doing so would require extra work on your part (for example, asking a question based on this Risk in every Supplier/Vendor questionnaire) but would not result in any additional risk mitigation. There is always some residual Likelihood that a Risk will be realized even after it has been mitigated; in this case, there are probably a few organizations somewhere in the world that still do not have firewalls, but the Likelihood is so low that this Risk can be ignored. By deciding not to apply mitigation to this Risk, you are in fact accepting the residual risk.

There are a huge number of supply chain cybersecurity Risks, some quite fanciful (such as the Risk that a foreign invasion of the US will result in a Supplier not being able to ship its Products to your organization, meaning a critical BES Cyber System component will not be available to replace an identical component that has failed), and some not (as in the case of almost all of the other Risks discussed in this book). Given that even a Risk with low Likelihood of being realized still has a Likelihood that is greater than zero, you will always leave some residual Likelihood[viii] whenever you decide not to include one of those Risks in your plan.

However, as far as your CIP-013-1 R1.1 supply chain cybersecurity risk management Plan is concerned, it is better not even to mention that, by leaving some Risks out of your plan, you are in effect accepting some residual Risk. There is no need at all to say anything about why you left Risks out of your Plan in the first place. If you tried to itemize every Risk you left out of your Plan and explain why you left it out, you would probably spend the rest of your career doing that.

Second case: The second case of accepting a Risk comes up because there will always be some Risks that, in your judgment, do have a high Likelihood of being realized in at least one Supplier’s or Vendor’s environment. For example, you may believe it is likely that at least one software Supplier of yours has not separated their development network from their IT network, meaning an attacker who penetrated the IT network could easily pivot into the development network.[ix] And even if you are sure that all of your current software Suppliers have implemented this Mitigation, it is possible that a future Supplier will not have implemented it. This is the kind of Risk that you should list in your supply chain cybersecurity risk management Plan.

For every Risk in your Plan, you should regularly assess each of your Suppliers and Vendors to determine the Likelihood that this Risk is present in their environment – in other words, you need to include a question in your questionnaire about whether they have separated the two networks. If the Supplier has not separated their IT and development networks, their Likelihood Score is high for this Risk. If they have separated them, it is low. This process was discussed in Chapter xx “Assessing Supplier/Vendor Risks”.
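The questionnaire-to-score step just described can be sketched in a few lines. The question ID, wording, and the simple two-level (low/high) scale below are all hypothetical examples; your Plan may use a different scale or scoring method.

```python
# Minimal sketch: derive a per-risk Likelihood Score from questionnaire answers.
# The question ID and the yes/no mapping are hypothetical examples.

RISK_QUESTIONS = {
    "dev_network_separation": "Have you separated your IT and software "
                              "development networks?",
}

def likelihood_score(risk_id: str, answers: dict) -> str:
    """A 'yes' to the mitigating question means low Likelihood; otherwise high."""
    answer = answers.get(risk_id, "no")  # treat a missing answer as 'no'
    return "low" if answer == "yes" else "high"

supplier_answers = {"dev_network_separation": "no"}
print(likelihood_score("dev_network_separation", supplier_answers))  # high
```

Treating a missing answer as “no” is a deliberately conservative choice: a Supplier that declines to answer should not get credit for the Mitigation.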

But it is during the Procurement Risk Assessment (PRA), which is discussed in Chapter xx “Procurement Risk Assessments”, that accepting a Risk may be necessary. There are two situations in which you can do this:

Situation 1: You begin the PRA by going through all of the Risks that apply to this Procurement (i.e. all of the Supplier/Vendor Risks that apply, and any Entity Risks that apply) and identifying those that have a low Likelihood Score for the Supplier and/or Vendor involved with this Procurement; you remove these Risks from consideration in the PRA. The most common reason why a Supplier/Vendor Risk would have a low Likelihood Score is that the response they gave to the corresponding question in their most recent questionnaire indicated there was a low Likelihood that the Risk was present in their environment.

For example, using the example just discussed of network separation, if a software supplier indicates it has separated its IT and software development networks, their Likelihood Score for this Risk will be low, and you do not need to consider this Risk during your PRA. However, as we discussed in the first example of accepting risk, even a Risk with a low Likelihood Score still has a greater than zero Likelihood of being present in the Supplier’s or Vendor’s environment. Therefore, in deciding not to consider low Likelihood Risks in the PRA, you are essentially deciding to accept the small amount of residual risk present in each of these Risks, despite their low Likelihood (although once again, there is no reason to mention this in your CIP-013-1 Plan).
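Situation 1 amounts to a simple filter over the Risks in scope for the Procurement. A minimal sketch of the idea follows; the risk names and scores are invented for illustration.

```python
# Keep only the Risks whose Likelihood Score is high for this Supplier/Vendor;
# low-Likelihood Risks drop out of the PRA (their residual risk is accepted).
# Risk names and scores below are hypothetical.

pra_risks = {
    "dev_network_separation": "low",   # supplier says networks are separated
    "no_mfa_for_remote_access": "high",
    "unsigned_patches": "high",
}

risks_to_mitigate = [r for r, score in pra_risks.items() if score == "high"]
accepted_residual = [r for r, score in pra_risks.items() if score == "low"]

print(risks_to_mitigate)   # ['no_mfa_for_remote_access', 'unsigned_patches']
print(accepted_residual)   # ['dev_network_separation']
```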

Situation 2: It is also possible that you will discover, as you start your PRA, that the Supplier or Vendor was assigned a high Likelihood Score for one or more Risks in their most recent assessment. In these cases, as discussed in Chapter xx “Procurement Risk Assessments”, you need to identify one or more Mitigations that can be applied to each of these Risks during the Procurement of the Product or Service, the Installation of the Product, and/or the Use of the Service.

To continue with our previous example, a software Supplier may have indicated in their most recent assessment that they have not separated their IT and development networks. Moreover, your organization’s efforts to get them to agree to mitigate this Risk – as described in Chapter xx “Contract Language” and Chapter xx “Other means of obtaining Supplier or Vendor commitment” – may have been unsuccessful, at least so far. This is the reason why the Supplier still has a high Likelihood Score for this Risk.

This means the Likelihood that the software Product you are going to purchase in this Procurement was developed using a network that was not separated from the Supplier’s IT network is high. Of course, that means the Product is more likely to contain malware or a backdoor than a Product that was developed on a network that was separated from the Supplier’s IT network.

Given that it is not possible to get the Supplier to separate their networks now – and also given that it would not make the Product you are procuring any safer if they did this, since obviously the Product was developed in the past, perhaps years ago – the question now becomes: What Mitigation(s) can we apply during the Procurement, Installation and/or Use of this Product to lower the Likelihood that the Supplier’s lack of network separation will result in one of our BES Cyber Systems being compromised, leading to a negative Impact on the Bulk Electric System?

There are a number of Mitigations that you might apply (and you might decide to apply all of them). One Mitigation to apply during Procurement could be to require the Supplier to conduct a thorough security scan and test of the Product before sending it to you. Another Mitigation, applied during Installation of the software, could be for your organization to do this scanning and testing after downloading the Product but before installing it on your production network (of course, you would do this on a test network).[x] A Mitigation you can apply during Use of the Product is to conduct regular security scans once it is installed, since it is very possible that a vulnerability included in the Product when you received it was not identified until long afterwards.

However, suppose you decide that none of these Mitigations will completely mitigate the Risk posed by the lack of separation between the Supplier’s IT and development networks. You might decide this because you believe the Supplier could have been penetrated by attackers like the Russian attackers that penetrated the SolarWinds development network, and that those attackers might have installed new malware like the Sunburst malware (the malware installed by the Russians in a patch for SolarWinds’ Orion Product)[xi]. If, like Sunburst, the new malware was based on a “zero-day” vulnerability[xii], scanning the Product, by you or the Supplier, will not detect it.

In fact, if you seriously believe there is a high Likelihood that the Product you are going to procure could contain malware based on a zero-day vulnerability that has been planted by an attacker who intends to utilize the malware to penetrate your network after the Product has been installed (i.e. an attacker has installed a backdoor in the Product), the only good Mitigation is to stop the Procurement altogether and perhaps look for a different Supplier.

However, suppose that, when you suggest stopping the Procurement, it is made clear to you that this is an unacceptable Mitigation; the Procurement will go on, whether or not you say it is a high-risk one. It is likely that you will be asked to produce some proof that there really is a backdoor in the Product; of course, you will not have proof, and whatever reasons you have for your suspicions may not hold water with your superiors.

In that case, you have a choice: a) apply whatever Mitigations you can (it will certainly not hurt to scan and test the Product) and accept whatever risk remains; or b) look for another job. If you choose the first option, you need to document your acceptance of the residual risk, both for general risk management purposes and specifically for CIP-013-1 compliance purposes.

In your document, you need to provide the following information:

1.      A description of the Risk you are accepting;

2.      The original degree of risk, before you applied any Mitigations (this will almost always be “high”);

3.      The Mitigations you applied or will apply during Procurement, Installation and/or Use of the Product, and how they will reduce the original degree of risk; and

4.      The degree of residual risk that you are accepting (i.e. your subjective estimate of whether you are accepting a low, moderate, or high amount of risk).
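The four items above map naturally onto a small record. Here is a sketch of one way to capture them; the class and field names are our own, not anything mandated by CIP-013-1.

```python
from dataclasses import dataclass, field

@dataclass
class RiskAcceptanceRecord:
    """Documentation for a Risk accepted during a PRA (items 1-4 above)."""
    risk_description: str            # 1. the Risk being accepted
    original_risk: str               # 2. degree of risk before any Mitigations
    mitigations: list = field(default_factory=list)  # 3. Mitigations applied,
                                     #    with how each reduces the risk
    residual_risk: str = "low"       # 4. subjective residual risk accepted

record = RiskAcceptanceRecord(
    risk_description="Supplier has not separated IT and development networks",
    original_risk="high",
    mitigations=["Supplier pre-shipment security scan and test",
                 "Scan/test on our test network before installation",
                 "Regular security scans during Use of the Product"],
    residual_risk="moderate",
)
print(record.residual_risk)  # moderate
```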

Note that there might be financial considerations here as well. In other words, suppose that, during a PRA, you decide that for one high-Likelihood Risk there is only one possible Mitigation, which is very expensive. When you bring this up to your superiors, you are told this is a non-starter, meaning you need to just accept this Risk and move on. As in the previous example, since you can never prove to anyone that a Risk really does have a high Likelihood, there is no way to win this argument. Again you are faced with the choice of accepting the Risk or looking for another job.

If you choose to accept the Risk, you need to prepare the same documentation we just discussed. And in the documentation, you should be honest that the reason your organization cannot apply the one Mitigation that will address this Risk is financial. Remember, this is a valid reason when we are talking about compliance with a risk-based requirement.

No organization has infinite resources available to it, meaning no organization can fully mitigate every Risk that it faces. There will always be cases where a particular Risk simply cannot be fully mitigated without inadequately mitigating one or more other Risks. What is important is that you weigh the different Risks against each other as much as possible, so that in general you are allocating your scarce resources in a way that mitigates the greatest possible amount of risk.

Of course, in cybersecurity – as opposed to, say, finance – it is never possible to quantify these calculations to any degree of accuracy. Many of the calculations will simply be based on gut feel. But even that is better than not doing any calculations at all. Not even trying to prioritize your Mitigations by the amount of Risk mitigated is likely to lead to misallocation of your resources, so that large amounts are thrown at mitigating Risks that are not too important, while important Risks receive inadequate mitigation resources.
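Even rough, gut-feel numbers allow a basic prioritization. The sketch below greedily funds the Mitigations with the best risk-mitigated-per-dollar ratio under a fixed budget; all of the figures (and the Mitigation names) are invented for illustration.

```python
# Greedy allocation of a limited mitigation budget, ranked by estimated
# risk mitigated per unit cost. All values are subjective, illustrative estimates.

mitigations = [
    # (name, estimated risk mitigated, cost)
    ("Vendor questionnaire program", 40, 10),
    ("Pre-installation software scanning", 60, 25),
    ("Dedicated IDS for legacy relays", 15, 50),
]

budget = 40
funded = []
# Highest risk mitigated per unit cost first
for name, risk, cost in sorted(mitigations, key=lambda m: m[1] / m[2],
                               reverse=True):
    if cost <= budget:
        funded.append(name)
        budget -= cost

print(funded)  # ['Vendor questionnaire program', 'Pre-installation software scanning']
```

Note that the unfunded Mitigation is exactly the kind of case discussed above: by not funding it, the organization is accepting the residual risk it would have mitigated, and should document that decision.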

This is the second situation in which your organization can accept risk while conducting a Procurement Risk Assessment in compliance with CIP-013-1. Note that, unlike in the first situation – which just involved accepting Risks that already had a low Likelihood – you need to provide documentation for what you did.

Is it time to review your CIP-013 R1 plan? Remember, you can change it at any time, as long as you document why you did that. If you would like me to give you suggestions on how the plan could be improved, please email me.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[ii] In CIP versions 1-4, the types of devices being protected were Critical Cyber Assets. Starting in CIP version 5, they became BES Cyber Assets, which the NERC Entity groups into BES Cyber Systems. 

[iii] When the CIP version 1 standards came into effect, the Technical Feasibility Exception (TFE) process was not in place, but some of the requirements referred to the idea of technical feasibility. For example, CIP-007-1 R4 read “The Responsible Entity shall use anti-virus software and other malicious software (“malware”) prevention tools, where technically feasible, to detect, prevent, deter, and mitigate the introduction, exposure, and propagation of malware on all Cyber Assets within the Electronic Security Perimeter(s)” (our emphasis). 

Initially, the presence of the words “where technically feasible” (which appeared in other CIP version 1 requirements besides this one) meant that compliance with this requirement, for a Critical Cyber Asset that was not capable of running antivirus software, was fairly easy: The NERC Entity would simply need to declare in the audit that running antivirus software on the device was not technically feasible. 

However, FERC was not happy with this situation. They pointed out that, just because a device was not capable of running antivirus software, this did not mean that the Risk of malware infection miraculously disappeared. The Risk was still present and needed to be mitigated in some way. Therefore, they ordered NERC to develop the TFE process, and to require it in any case where a NERC Entity could not strictly comply with a CIP requirement due to technical infeasibility (this applied to all CIP requirements that included the “where technically feasible” phrase, as well as a few others). 

In the TFE process, the NERC Entity, upon realizing that it is not possible to comply with the strict language of a CIP requirement (e.g. it is not possible to run antivirus software on a particular device, as required by the anti-malware requirement), must apply to their Regional Entity for a TFE. In applying for the TFE, they need to provide two items. The first is evidence that it is indeed not possible to comply with the strict wording of the requirement in question, for the system in question. In the case of the anti-malware requirement, this meant the Entity had to provide evidence that it was not possible to run antivirus software on the device (usually, this evidence was a letter to that effect, written by the manufacturer). 

The other item the Entity needs to provide, in applying for a TFE, is a mitigation plan showing alternative steps the Entity will take to mitigate the Risk addressed by the requirement. In the case of the anti-malware requirement, these included alternative steps to “detect, prevent, deter, and mitigate” malware on the device. 

The mitigation plan needs to be regularly updated, and the TFE has to be renewed, until a) the Risk has been mitigated; b) the device is retired; or c) the manufacturer determines that it is now possible to run antivirus software on the device after all. 

TFEs quickly became probably the biggest sore spot for NERC Entities when it came to the CIP standards. It is no exaggeration to say that some larger electric utilities easily invested many thousands of person-hours in developing and maintaining TFEs. Moreover, the Regional Entities had to hire or contract many staff members in order to handle the huge volume of paperwork required to process TFEs. 

Because of these problems, the SDT that drafted the CIP version 5 standards made it a goal to eliminate any reference (explicit or implicit) to technical feasibility in any of the requirements. A few of the current CIP requirements still include words like “where technically feasible”. This means that, if the NERC Entity is unable to comply with the strict wording of the requirement in the case of a particular device, they still need to apply for a TFE. 

However, the number of these requirements has been drastically reduced. This has in practice led to a significant reduction in the number of TFEs required, a cause of great satisfaction on the part of both NERC Entities and the NERC Regions. 

[iv] Application whitelisting (AWL) is a technology in which only certain pre-defined software versions are allowed to execute on a system; all other software is blocked. Since malware always requires loading different software, it is blocked by AWL – and because AWL compares the hash value of the software to a predetermined value before allowing it to execute, it will not be fooled by malware that simply takes the same name as a piece of software already loaded. However, managing AWL is cumbersome, since even patching existing software requires making a change to the “whitelist”. AWL is best suited for systems that perform a fixed function that rarely changes, like industrial control systems. 

[v] Of course, there are other Mitigations that would be less expensive than this one, although they would also probably be less effective. 

[vi] Some NERC CIP compliance professionals would argue that CIP-014-2 is also a risk-based standard. The authors agree it has some risk-based characteristics, but there are other characteristics (especially in how it has been audited during the 4-5 years it has been in effect) that do not seem to be risk-based at all. However, we admit that neither of us has worked closely with the standard, so we are not qualified to judge it one way or the other. 

[vii] Note that the supply chain risk management methodology discussed in this book classifies Risks into Supplier/Vendor Risks and Entity Risks. We are here only discussing the former, but our arguments apply equally to Entity Risks. On the other hand, it is fairly unlikely that there are Entity Risks that cannot be mitigated, because mitigation is almost entirely in the Entity’s control, unlike mitigation of Supplier/Vendor Risks. For this reason, it is unlikely that any organization would need to accept an Entity Risk.   

[viii] The phrase “residual Likelihood” may cause some consternation, since normally one hears about residual risk. Since Risk = Likelihood × Impact, why have we chosen not to discuss residual Risk? We have made that choice for the same reason that we give Likelihood Scores, not Risk Scores, when we are assessing Risks that apply to Vendors, Suppliers or the Entity itself: When dealing with supply chain security of critical infrastructure, we believe that Impact should always be considered to be high. 

It would be very hard – and in many cases impossible – to specify what exactly the Impact is of say a backdoor being implanted in a device during development, since the exploit of that backdoor could have a very different Impact depending on time of day, season of the year, where the device was installed at the time, etc. Therefore, the prudent course is always to assume the Impact of a supply chain compromise of critical infrastructure hardware or software is high. That being the case, only Likelihood of compromise varies, and that is all that needs to be tracked. The Risk is high when Likelihood is high, and low when Likelihood is low. 

However, for purposes of CIP-013-1 compliance, it would be an excellent idea not to confuse the auditors by talking about acceptance of Likelihood. Simply call it acceptance of risk and be done with it. 

[ix] While, as of the writing of this Chapter, it is not known how the Russian attackers were able to compromise the SolarWinds patch development process, it is possible that lack of network separation could have been the reason. In any case, maintaining strict separation (with separate authentication) between the IT and development networks (just as NERC Entities are required to have strict separation between their IT and their BES networks by CIP-005-6 R1 and R2) is one important Mitigation for the Risk of compromise of the Supplier’s development network, and should be required of software developers. 

[x] Of course, this step is required for software being installed on High impact BES Cyber Systems, EACMS and Protected Cyber Assets, per CIP-010-3 R3.3. 

[xi] For a further discussion of the SolarWinds attacks of 2020 and their implications for the supply chain cybersecurity risk management framework discussed in this book, see Chapter xx “The SolarWinds attacks”. 

[xii] A zero-day vulnerability is a vulnerability that was clandestinely discovered by the attackers or another “dark” organization, which has not been revealed to the Supplier of the Product(s) in which the vulnerability is present. Thus, being up-to-date on patches for all software will not save your organization in this case, since none of your Suppliers will have developed a patch for it yet. There is no good way for any end user organization (including a very sophisticated one) to identify a zero-day vulnerability. That is what makes these vulnerabilities so dangerous.

 
