Note from Tom: I am close to
finishing my book “Supply chain cybersecurity for critical infrastructure”. By far
the hardest chapter to write (and the longest) was the one on accepting risk.
This sounds offhand like a fairly simple idea, but I can assure you it isn’t
when you start digging into it. Below is the appendix to the chapter on
accepting risk, which deals specifically with accepting risk in CIP-013. I
especially want to thank Kevin Perry for providing important comments on this
document.
I apologize that this post
includes a lot of references to other chapters, as well as terms that are
undefined because you haven’t seen my chapter on definitions. I expect the book
itself will be out within two months, so these won’t be mysteries for too much
longer. However, I’ll be glad to answer any questions you want to email me.
If your organization is a NERC Entity subject to compliance
with CIP-013-1, you may have heard that acceptance of risk is not allowed in
NERC CIP standards in general, including CIP-013-1. In fact, the authors have
heard that at least one NERC Regional Entity has stated they will not allow an Entity
to state in their CIP-013-1 supply chain cybersecurity risk management Plan
that they will accept Risks, in circumstances like those discussed earlier in
this Chapter.
The Region in question was probably basing this opinion on a
statement FERC made in January 2008, when they issued Order 706[i].
They stated that “acceptance of risk” is not allowed when it comes to the NERC
CIP standards. FERC issued the order while approving the CIP version 1
standards, which used the words “or accept the risk” in multiple requirements (of
course, this is a common practice in cybersecurity standards and frameworks).
In the order, FERC pointed out that it is not possible for a
NERC Entity to accept a Risk that applies to the Bulk Electric System, since –
as was discussed earlier in this Chapter – no single organization “owns” the
BES. In fact, the BES is a public good, so it is effectively “owned” by all electricity
users in the US and Canada. The references to acceptance of risk were removed in
version 2 of the CIP standards.
The authors agree it was appropriate for FERC to order that
acceptance of risk be removed as an option for the CIP version 1 standards. We
say this because in CIP v1, all of the requirements were prescriptive, as
opposed to risk-based. In other words, each requirement in CIP version 1 (and
most of the requirements in CIP-003-6 through CIP-011-2 today) mandated that
the NERC Entity perform a certain set of steps.
While it is true that these steps were all intended to
mitigate a particular Risk or Risks (depending on the requirement), it is also
true that, in a prescriptive requirement, the NERC Entity is not provided any
other option than to perform those steps. They certainly cannot accept the Risk
and not comply with the requirement at all. Even if the Entity believes there is
a better way to mitigate the Risk(s) addressed by those steps, they cannot
propose to use it. They have to follow the prescribed steps – which is why we
call these prescriptive requirements.
The reason there are prescriptive CIP requirements at all is
that the Standards Drafting Team (SDT) that drafted them decided that a) a
particular risk is important enough that it needs to be mitigated, and b) the
particular set of steps prescribed constitutes the single best way to mitigate
that risk.
One CIP requirement started out prescriptive in CIP version 1
but changed to risk-based when version 5 was implemented, so it furnishes a
good example for this discussion. We are talking about the CIP requirement that
addresses the Risk that a Critical Cyber Asset/BES Cyber System will be infected
with malware and thus have a negative impact on the BES. Of course, this is the
anti-malware requirement.
In CIP versions 1-3, the anti-malware requirement was CIP-007-1
R4. It prescribed that antivirus software should be deployed on all Critical
Cyber Assets.[ii] This
requirement was quite appropriate for devices running Windows™ or Linux
operating systems on Intel-standard hardware.
However, the requirement was not at all appropriate for devices
like routers, switches, and electronic relays, which are almost always provided
in a sealed box and can never be updated by the user. It was also not
appropriate for programmable logic controllers (PLCs) and similar devices,
which do not run either Windows or Linux. For these devices, the user was
required to apply to their Region for a Technical Feasibility Exception, a time-consuming
process for both the NERC Entity and their Regional Entity.[iii]
The need for TFEs was greatly reduced when version 5 of the
CIP standards came into effect on July 1, 2016. The SDT that drafted CIP v5 took
pains to eliminate any reference to technical feasibility (explicit or implicit)
from as many requirements as possible. Because the need to comply with the
anti-malware requirement was by far the biggest cause of NERC Entities having
to create TFEs in the first place, that requirement became “ground zero” for this
effort.
The solution the Standards Drafting Team (SDT) came up with
was quite elegant: They drafted a new risk-based requirement, CIP-007-5 R3,
that simply said the NERC Entity must “Deploy method(s) to deter, detect, or
prevent malicious code.” This requirement is still found in the current
CIP-007-6.
Because the new requirement did not specify a particular
Mitigation for the Risk of malware infection, it allowed the NERC Entity to
determine for itself what is the best method for deterring, detecting or
preventing malware infection in the case of each device covered by CIP;
moreover, that method could be different for every BES Cyber System. While this
new requirement did not specifically mention the word “risk”, in the authors’
opinion it is a risk-based requirement.
Of course, for Windows and Linux systems on which software
can be loaded by the user, there is little dispute that use of antivirus
software (or in some cases application whitelisting software[iv])
is the best Mitigation for the Risk of malware infection. Thus, if a NERC
Entity tried to tell their CIP auditor that they deployed a different
Mitigation on a Windows device, they would most likely have some serious
explaining to do. They would need to demonstrate why the alternative mitigates
more risk than antivirus software would.
However, for devices that cannot run antivirus software, the
NERC Entity can use risk to determine the appropriate Mitigation for
the Risk of malware infection. For example, if device A is deployed on a
network that has no communication with the internet and if physical access is tightly
restricted, the Likelihood that malware will infect this device is much lower
than it would be for device B, which is deployed on a network with internet
access and less rigorous physical access restrictions.
If we assume for the moment that the best way[v]
to mitigate the Risk of malware infection, in cases where antivirus software
cannot be deployed on a device, is to dedicate an intrusion detection system
(IDS) to the device, it should not be hard to justify the decision to deploy
the IDS in front of device B, but not device A.
Of course, if the IDS were deployed in front of device A,
the Likelihood of malware infection would be somewhat lower than it would be if
it weren’t deployed. However, given the substantial cost of deploying that IDS,
it should be easy to argue that this cost would not be justified by the small
amount of risk that would be mitigated. In other words, the Entity that makes
that argument has decided to accept the remaining risk to device A, rather than
incur the cost of deploying an IDS. If the prescriptive anti-malware
requirement were still in effect, the NERC Entity would not have this choice,
and therefore would not be able to accept the risk. Risk acceptance is only
possible when complying with risk-based requirements.
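The cost/benefit logic behind that decision can be sketched in a few lines of code. This is purely illustrative: the cost figure and the decision rule are invented, not drawn from any NERC requirement.

```python
# Toy sketch of the IDS decision: deploy the Mitigation only where the
# risk mitigated justifies the cost. The number below is invented.
IDS_COST = 50_000  # hypothetical cost of dedicating an IDS to one device

def should_deploy_ids(likelihood: str) -> bool:
    """Deploy the IDS only for a high-Likelihood device; for a low-Likelihood
    device, accept the small residual risk rather than incur IDS_COST."""
    return likelihood == "high"

# Device A: isolated network, tight physical access -> low Likelihood.
# Device B: internet-reachable network, looser physical access -> high.
print(should_deploy_ids("low"))   # False: accept residual risk for device A
print(should_deploy_ids("high"))  # True: deploy the IDS in front of device B
```

The point of the sketch is that the "accept" branch only exists because the requirement is risk-based; a prescriptive requirement would force the `True` branch for every device.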
Two cases for accepting risk in CIP-013-1
While there are at least three other individual CIP
requirements that are also risk-based (CIP-011-1 R1, CIP-010-2 R4 and CIP-003-6
R2), CIP-013-1 is the first CIP standard that is entirely risk-based[vi].
The authors contend that, in complying with CIP-013-1, acceptance of risk by
the NERC Entity is permitted. There are two cases in which your organization
can accept risk, while complying with CIP-013-1.
First case: CIP-013-1 R1.1 requires the NERC Entity
to “identify and assess” supply chain cybersecurity Risks to the BES. In
Chapter xx “Identifying Risks”, we describe the process that we
recommend your organization follow to identify Risks. We state that if, in your
best judgment, there is a low Likelihood that a particular Risk will be found
in any Supplier or Vendor’s environment – and that it is unlikely this
situation will change – you should leave that Risk out of your supply chain
cybersecurity risk management Plan[vii].
The reason for this is that we consider a Risk that has a low Likelihood of
being realized to already be mitigated, since the purpose of mitigation is to
reduce that Likelihood from high to low.
In that Chapter, we used the example of the Risk that a
Supplier or Vendor’s network is attached to the internet, but there is no
firewall in place. Because it is highly unlikely that any organization is
attached to the internet without a firewall nowadays (or indeed has been at any
time in the last 10-15 years), there is no need to include this Risk in your Plan. This
is because doing so will require extra work on your part (for example, asking a
question based on this Risk in every Supplier/Vendor questionnaire), but will
not result in any additional risk mitigation. Because there is always some
residual Likelihood that a Risk will be realized even after it has been
mitigated (in this case, this means there are probably a few organizations
somewhere in the world that do not have firewalls, but the Likelihood of this
happening is so low that this Risk can be ignored), by deciding you will not
apply mitigation to this Risk you are in fact accepting the residual risk.
There are a huge number of supply chain cybersecurity Risks,
some quite fanciful (such as the Risk that a foreign invasion of the US will
result in a Supplier not being able to ship its Products to your organization,
meaning a critical BES Cyber System component will not be available to replace
an identical component that has failed), and some not (as in the case of almost
all of the other Risks discussed in this book). Given that even a Risk with low
Likelihood of being realized still has a Likelihood that is greater than zero,
you will always leave some residual Likelihood[viii]
whenever you decide not to include one of those Risks in your plan.
However, as far as your CIP-013-1 R1.1 supply chain
cybersecurity risk management Plan is concerned, it is better not even to
mention that, by leaving some Risks out of your plan, you are in effect
accepting some residual Risk. There is no need at all to say anything about why
you left Risks out of your Plan in the first place. If you tried to itemize
every Risk you left out of your Plan and explain why you left it out, you would
probably spend the rest of your career doing that.
Second case: The second case of accepting a Risk
comes up because there will always be some Risks that, in your judgment, do
have a high Likelihood of being realized in at least one Supplier’s or Vendor’s
environment. For example, you may believe it is likely that at least one software
Supplier of yours has not separated their development network from their IT
network, meaning an attacker who penetrated the IT network could easily pivot
into the development network.[ix]
And even if you are sure that all of your current software Suppliers have
implemented this Mitigation, it is possible that a future Supplier will not
have implemented it. This is the kind of Risk that you should list in your
supply chain cybersecurity risk management Plan.
For every Risk in your Plan, you should regularly assess
each of your Suppliers and Vendors to determine the Likelihood that this Risk
is present in their environment – in other words, you need to include a
question in your questionnaire about whether they have separated the two
networks. If the Supplier has not separated their IT and development networks,
their Likelihood Score is high for this Risk. If they have separated them, it
is low. This process was discussed in Chapter xx “Assessing Supplier/Vendor
Risks”.
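The questionnaire-to-score step described above might be tracked as follows. This is only a sketch; the question identifier and the yes/no scoring rule are our own illustration, not part of any standard.

```python
# Hypothetical sketch: derive a Likelihood Score for each Risk from a
# Supplier's questionnaire answers. "yes" means the Mitigation is in
# place, so the Likelihood that the Risk is present is low.
def likelihood_score(answer: str) -> str:
    """Map a yes/no questionnaire answer to a Likelihood Score."""
    return "low" if answer.strip().lower() == "yes" else "high"

# Example questionnaire answers for one software Supplier
# (the question name is hypothetical).
answers = {
    "dev_network_separated_from_it": "no",
}

scores = {risk: likelihood_score(ans) for risk, ans in answers.items()}
print(scores)  # {'dev_network_separated_from_it': 'high'}
```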
But it is during the Procurement Risk Assessment (PRA),
which is discussed in Chapter xx “Procurement Risk Assessments”, that accepting
a Risk may be necessary. There are two situations in which you can do this:
Situation 1: You begin the PRA by going through all
of the Risks that apply to this Procurement (i.e. all of the Supplier/Vendor
Risks that apply, and any Entity Risks that apply) and identifying those that
have a low Likelihood Score for the Supplier and/or Vendor involved with this
Procurement; you remove these Risks from consideration in the PRA. The most
common reason why a Supplier/Vendor Risk would have a low Likelihood Score is
that the response they gave to the corresponding question in their most recent
questionnaire indicated there was a low Likelihood that the Risk was present in
their environment.
For example, using the example just discussed of network
separation, if a software supplier indicates it has separated its IT and software
development networks, their Likelihood Score for this Risk will be low, and you
do not need to consider this Risk during your PRA. However, as we discussed in
the first example of accepting risk, even a Risk with a low Likelihood Score
still has a greater than zero Likelihood of being present in the Supplier’s or
Vendor’s environment. Therefore, in deciding not to consider low Likelihood
Risks in the PRA, you are essentially deciding to accept the small amount of
residual risk present in each of these Risks, despite their low Likelihood
(although once again, there is no reason to mention this in your CIP-013-1 Plan).
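The filtering step in Situation 1 can be sketched as below. The Risk names and scores are illustrative only; the point is that dropping the low-Likelihood Risks from the PRA is itself an (undocumented) acceptance of their residual risk.

```python
# Hypothetical sketch of Situation 1: drop low-Likelihood Risks from the
# PRA, implicitly accepting their small residual risk.
supplier_scores = {
    "no_it_dev_network_separation": "high",
    "no_secure_patch_delivery": "low",
    "no_employee_background_checks": "low",
}

# Only high-Likelihood Risks need Mitigations identified during the PRA;
# the low-Likelihood ones are accepted without documentation.
risks_for_pra = [r for r, s in supplier_scores.items() if s == "high"]
accepted = [r for r, s in supplier_scores.items() if s == "low"]

print(risks_for_pra)  # ['no_it_dev_network_separation']
```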
Situation 2: It is also possible that you will
discover, as you start your PRA, that the Supplier or Vendor was assigned a
high Likelihood Score for one or more Risks in their most recent assessment. In
these cases, as discussed in Chapter xx “Procurement Risk Assessments”, you
need to identify one or more Mitigations that can be applied to each of these
Risks during the Procurement of the Product or Service, the Installation of the
Product, and/or the Use of the Service.
To continue with our previous example, a software Supplier
may have indicated in their most recent assessment that they have not separated
their IT and development networks. Moreover, your organization’s efforts to get
them to agree to mitigate this Risk – as described in Chapter xx
“Contract Language” and Chapter xx “Other means of obtaining Supplier or
Vendor commitment” – may have been unsuccessful, at least so far. This is the reason
why the Supplier still has a high Likelihood Score for this Risk.
This means the Likelihood that the software Product you are
going to purchase in this Procurement was developed using a network that was
not separated from the Supplier’s IT network is high. Of course, that means the
Product is more likely to contain malware or a backdoor than a Product that was
developed on a network that was separated from the Supplier’s IT network.
Given that it is not possible to get the Supplier to
separate their networks now – and also given that it would not make the Product
you are procuring any safer if they did this, since obviously the Product was
developed in the past, perhaps years ago – the question now becomes: What Mitigation(s)
can we apply during the Procurement, Installation and/or Use of this Product to
lower the Likelihood that the Supplier’s lack of network separation will result
in one of our BES Cyber Systems being compromised, leading to a negative Impact
on the Bulk Electric System?
There are a number of Mitigations that you might apply (and
you might decide to apply all of them). One Mitigation to apply during
Procurement could be to require the Supplier to conduct a thorough security
scan and test of the Product before sending it to you. Another Mitigation,
applied during Installation of the software, could be for your organization to
do this scanning and testing after downloading the Product but before installing
it on your production network (of course, you would do this on a test network).[x]
A Mitigation you can apply during Use of the Product is to conduct regular
security scans once it is installed, since it is very possible that a
vulnerability included in the Product when you received it was not identified
until long afterwards.
However, suppose you decide that none of these Mitigations
will completely mitigate the Risk posed by the lack of separation between the
Supplier’s IT and development networks. You might decide this because you
believe the Supplier could have been penetrated by attackers like the Russian
attackers that penetrated the SolarWinds development network, and that those
attackers might have installed new malware like the Sunburst malware (the
malware installed by the Russians in a patch for SolarWinds’ Orion Product)[xi].
If, like Sunburst, the new malware was based on a “zero-day” vulnerability[xii],
scanning the Product, by you or the Supplier, will not detect it.
In fact, if you seriously believe there is a high Likelihood
that the Product you are going to procure could contain malware based on a
zero-day vulnerability that has been planted by an attacker who intends to utilize
the malware to penetrate your network after the Product has been installed
(i.e. an attacker has installed a backdoor in the Product), the only good
Mitigation is to stop the Procurement altogether and perhaps look for a
different Supplier.
However, suppose that, when you suggest stopping the
Procurement, it is made clear to you that this is an unacceptable Mitigation:
the Procurement will go on, whether or not you say it is a high-risk one. It is
likely that you will be asked to produce some proof that there really is a
backdoor in the Product; of course, you will not have proof, and whatever
reasons you have for your suspicions may not hold water with your superiors.
In that case, you have a choice: a) apply whatever
Mitigations you can (it will certainly not hurt to scan and test the Product)
and accept whatever risk remains; or b) look for another job. If you choose the
first option, you need to document your acceptance of the residual risk, both
for general risk management purposes and specifically for CIP-013-1 compliance
purposes.
In your document, you need to provide the following
information:
1. A description of the Risk you are accepting;
2. The original degree of risk, before you applied any Mitigations (this will almost always be “high”);
3. The Mitigations you applied or will apply during Procurement, Installation and/or Use of the Product, and how they will reduce the original degree of risk; and
4. The degree of residual risk that you are accepting (i.e. your subjective estimate of whether you are accepting a low, moderate, or high amount of risk).
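A minimal record capturing these four items might look like the following. The field names are our own, not mandated by CIP-013-1, and the example values continue the network-separation scenario.

```python
from dataclasses import dataclass, field

@dataclass
class RiskAcceptanceRecord:
    """Documentation of an accepted residual Risk (fields are illustrative)."""
    risk_description: str          # 1. the Risk being accepted
    original_degree: str           # 2. degree before Mitigations (usually "high")
    mitigations: list[str] = field(default_factory=list)  # 3. Mitigations applied
    residual_degree: str = "low"   # 4. subjective residual risk being accepted

record = RiskAcceptanceRecord(
    risk_description="Supplier has not separated IT and development networks",
    original_degree="high",
    mitigations=[
        "Supplier security scan/test of the Product before shipment",
        "In-house scan and test on an isolated test network before install",
        "Regular vulnerability scans during Use",
    ],
    residual_degree="moderate",
)
print(record.residual_degree)  # moderate
```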
Note that there might be financial considerations here as
well. In other words, suppose that, during a PRA, you decide that for one
high-Likelihood Risk there is only one possible Mitigation, which is very
expensive. When you bring this up to your superiors, you are told this is a
non-starter, meaning you need to just accept this Risk and move on. As in the
previous example, since you can never prove to anyone that a Risk really does
have a high Likelihood, there is no way to win this argument. Again you are faced
with the choice of accepting the Risk or looking for another job.
If you choose to accept the Risk, you need to prepare the
same documentation we just discussed. And in the documentation, you should be
honest that the reason your organization cannot apply the one Mitigation that
will address this Risk is financial. Remember, this is a valid reason when we
are talking about compliance with a risk-based requirement.
No organization has infinite resources available to it,
meaning no organization can fully mitigate every Risk that it faces. There will
always be cases where a particular Risk simply cannot be fully mitigated
without inadequately mitigating one or more other Risks. What is important is
that you weigh the different Risks against each other as much as possible, so
that in general you are allocating your scarce resources in a way that mitigates
the greatest possible amount of risk.
Of course, in cybersecurity – as opposed to finance, for
example – it is never possible to quantify these calculations to any degree of
accuracy. A lot of the calculations will simply be based on gut feel. But even
that is better than not doing any calculations at all. Not even trying to
prioritize your Mitigations by amount of Risk mitigated is likely to lead to misallocation
of your resources, so that large amounts are thrown at mitigating Risks that
are not too important, while important Risks receive inadequate mitigation
resources.
This is the second situation in which your organization can
accept risk while conducting a Procurement Risk Assessment in compliance with
CIP-013-1. Note that, unlike in the first situation – which just involved
accepting Risks that were already low Likelihood – you need to provide
documentation for what you did.
Is it time to review your
CIP-013 R1 plan? Remember, you can change it at any time, as long as you
document why you did that. If you would like me to give you suggestions on how
the plan could be improved, please email me.
Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
[ii] In CIP versions 1-4, the types of devices being protected were Critical Cyber Assets. Starting in CIP version 5, they became BES Cyber Assets, which the NERC Entity groups into BES Cyber Systems.
[iii] When the CIP version 1 standards came into effect, the Technical Feasibility Exception (TFE) process was not in place, but some of the requirements referred to the idea of technical feasibility. For example, CIP-007-1 R4 read “The Responsible Entity shall use anti-virus software and other malicious software (“malware”) prevention tools, where technically feasible, to detect, prevent, deter, and mitigate the introduction, exposure, and propagation of malware on all Cyber Assets within the Electronic Security Perimeter(s)” (our emphasis).
Initially, the presence of the words “where technically feasible” (which appeared in other CIP version 1 requirements besides this one) meant that compliance with this requirement, for a Critical Cyber Asset that was not capable of running antivirus software, was fairly easy: The NERC Entity would simply need to declare in the audit that running antivirus software on the device was not technically feasible.
However, FERC was not happy with this situation. They pointed out that, just because a device was not capable of running antivirus software, this did not mean that the Risk of malware infection miraculously disappeared. The Risk was still present and needed to be mitigated in some way. Therefore, they ordered NERC to develop the TFE process, and to require it in any case where a NERC Entity could not strictly comply with a CIP requirement due to technical infeasibility (this applied to all CIP requirements that included the “where technically feasible” phrase, as well as a few others).
In the TFE process, the NERC Entity, upon realizing that it is not possible to comply with the strict language of a CIP requirement (e.g. it is not possible to run antivirus software on a particular device, as required by the anti-malware requirement), must apply to their Regional Entity for a TFE. In applying for the TFE, they need to provide two items. The first is evidence that it is indeed not possible to comply with the strict wording of the requirement in question, for the system in question. In the case of the anti-malware requirement, this meant the Entity had to provide evidence that it was not possible to run antivirus software on the device (usually, this evidence was a letter to that effect, written by the manufacturer).
The other item the Entity needs to provide, in applying for a TFE, is a mitigation plan showing alternative steps the Entity will take to mitigate the Risk addressed by the requirement. In the case of the anti-malware requirement, these included alternative steps to “detect, prevent, deter, and mitigate” malware on the device.
The mitigation plan needs to be regularly updated, and the TFE has to be renewed, until a) the Risk has been mitigated; b) the device is retired; or c) the manufacturer determines that it is now possible to run antivirus software on the device after all.
TFEs quickly became probably the biggest sore spot for NERC Entities, when it came to the CIP standards. It is no exaggeration to say that some larger electric utilities easily invested many thousands of person-hours in developing and maintaining TFEs. Moreover, the Regional Entities had to hire or contract many staff members in order to handle the huge volume of paperwork required to process TFEs.
Because of these problems, the SDT that drafted the CIP version 5 standards made it a goal to eliminate any reference (explicit or implicit) to technical feasibility in any of the requirements. A few of the current CIP requirements still include words like “where technically feasible”. This means that, if the NERC Entity is unable to comply with the strict wording of the requirement in the case of a particular device, they still need to apply for a TFE.
However, the number of these requirements has been drastically reduced. This has in practice led to a significant reduction in the number of TFEs required, a cause of great satisfaction on the part of both NERC Entities and the NERC Regions.
[iv] Application whitelisting (AWL) is a technology in which only certain pre-defined software versions are allowed to execute on a system; all other software is blocked. Since malware always requires loading new software, it is blocked by AWL; even if the malware carries the same name as a piece of software already loaded, AWL compares the hash value of the software to a predetermined value before executing it, so it will not be fooled by a simple name change. However, managing AWL is cumbersome, since even patching existing software requires making a change to the “whitelist”. AWL is best suited for systems that perform a fixed function that rarely changes, like industrial control systems.
[v] Of course, there are other Mitigations that would be less expensive than this one, although they would also probably be less effective.
[vi] Some NERC CIP compliance professionals would argue that CIP-014-2 is also a risk-based standard. The authors agree it has some risk-based characteristics, but there are other characteristics (especially in how it has been audited, since it has been in effect for 4-5 years) that do not seem to be risk-based at all. However, we admit that neither of us has worked closely with the standard, so we are not qualified to judge it to be one way or the other.
[vii] Note that the supply chain risk management methodology discussed in this book classifies Risks into Supplier/Vendor Risks and Entity Risks. We are here only discussing the former, but our arguments apply equally to Entity Risks. On the other hand, it is fairly unlikely that there are Entity Risks that cannot be mitigated, because mitigation is almost entirely in the Entity’s control, unlike mitigation of Supplier/Vendor Risks. For this reason, it is unlikely that any organization would need to accept an Entity Risk.
[viii] The phrase “residual Likelihood” may cause some consternation, since normally one hears about residual risk. Since Risk = Likelihood X Impact, why have we chosen not to discuss residual Risk? We have made that choice for the same reason that we give Likelihood Scores, not Risk Scores, when we are assessing Risks that apply to Vendors, Suppliers or the Entity itself: When dealing with supply chain security of critical infrastructure, we believe that Impact should always be considered to be high.
It would be very hard – and in many cases impossible – to specify what exactly the Impact is of say a backdoor being implanted in a device during development, since the exploit of that backdoor could have a very different Impact depending on time of day, season of the year, where the device was installed at the time, etc. Therefore, the prudent course is always to assume the Impact of a supply chain compromise of critical infrastructure hardware or software is high. That being the case, only Likelihood of compromise varies, and that is all that needs to be tracked. The Risk is high when Likelihood is high, and low when Likelihood is low.
However, for purposes of CIP-013-1 compliance, it would be an excellent idea not to confuse the auditors by talking about acceptance of Likelihood. Simply call it acceptance of risk and be done with it.
[ix] While, as of the writing of this Chapter, it is not known how the Russian attackers were able to compromise the SolarWinds patch development process, it is possible that lack of network separation could have been the reason. In any case, maintaining strict separation (with separate authentication) between the IT and development networks (just as NERC Entities are required to have strict separation between their IT and their BES networks by CIP-005-6 R1 and R2) is one important Mitigation for the Risk of compromise of the Supplier’s development network, and should be required of software developers.
[x] Of course, this step is required for software being installed on High impact BES Cyber Systems, EACMS and Protected Cyber Assets, per CIP-010-3 R3.3.
[xi] For a further discussion of the SolarWinds attacks of 2020 and their implications for the supply chain cybersecurity risk management framework discussed in this book, see Chapter xx “The SolarWinds attacks”.
[xii] A
zero-day vulnerability is a vulnerability that was clandestinely discovered by
the attackers or another “dark” organization, which has not been revealed to
the Supplier of the Product(s) in which the vulnerability is present. Thus,
being up-to-date on patches for all software will not save your organization in
this case, since none of your Suppliers will have developed a patch for it yet.
There is no good way for any end user organization (including a very
sophisticated one) to identify a zero-day vulnerability. That is what makes
these vulnerabilities so dangerous.