The Risk Management for Third-Party Cloud Services Standards Drafting Team is off to a running start this year. I say “start”, because during the 5-6 months that the SDT met last year, they didn’t even start to draft any standards. Instead, they were doing what NERC drafting teams sometimes do: re-think the Standards Authorization Request (SAR) that forms their “charter”. They spent the entire fall discussing what they are going to discuss when they start drafting the new (or revised) cloud standards.
There is nothing wrong with doing this, particularly when the
SDT has an especially weighty burden – and it’s hard to think of a NERC CIP SDT
that’s had a weightier burden than this one has, except perhaps the team (called
“CSO706” for Cyber Security Order 706) that drafted CIP versions 2, 3, 4 and 5. In
fact, one member of that team, Jay Cribb, is a member of the cloud team. The
CSO706 team first met in 2008. Their last “product”, CIP version 5 (essentially,
the version that’s still in effect today), came into effect in 2016.
In my opinion, one essential attribute for any requirements the cloud SDT
creates is that they be risk-based. That’s my term, but NERC refers to them as “objective-based”.
While some CIP requirements today are truly risk-based (even though they may
not mention the word “risk”), others are not.
In fact, a small number of CIP requirements like CIP-007 R2
patch management and CIP-010 R1 configuration management are highly
prescriptive, and require compliance on the physical or virtual device level.
Cloud service providers don’t track systems based on the device on which they
reside, since doing so would require breaking the cloud model. This means they
will never be able to provide the evidence required for a NERC entity customer
to prove compliance with these prescriptive requirements.
This is why I think all CIP requirements going forward, but
especially requirements having to do with use of the cloud, need to be risk-based,
and can’t refer (even implicitly) to devices at all. In fact, since CIP v5 came
into effect in 2016, I believe that all subsequent CIP requirements and some entire
standards, including CIP-012, CIP-013, CIP-014, CIP-003-2, CIP-010-4 and
others, have been risk-based (some more than others, truth be told).
The problem with risk-based NERC CIP requirements today is that
there has been very little guidance to NERC entities or Regional auditors on
how to comply with or audit risk-based CIP requirements. This was most vividly
demonstrated in the Notice of Proposed Rulemaking (NOPR) that FERC issued in
September regarding CIP-013, which is an entirely risk-based standard. In my post
on the NOPR, I quoted the following section found near the end:
…we are concerned that the existing SCRM Reliability
Standards lack a detailed and consistent approach for entities to develop
adequate SCRM (supply chain risk management) plans related to the (1)
identification of, (2) assessment of, and (3) response to supply chain
risk. Specifically, we are concerned that the SCRM Reliability
Standards lack clear requirements for when responsible entities should perform
risk assessments to identify risks and how those risk assessments should be
conducted to properly assess risk. Further, we are concerned that
the Reliability Standards lack any requirement for an entity to respond to
supply chain risks once identified and assessed, regardless of severity.
In other words, FERC issued the NOPR because they do not
think NERC did a good job of drafting either CIP-013-1 or CIP-013-2. They are
considering ordering NERC to revise CIP-013 so it truly requires NERC entities
to develop and implement a supply chain cyber risk management plan.
I agree with FERC’s opinion, but I want to point out that
just asking NERC to re-draft CIP-013 will not necessarily fix the problem. This
is because today NERC entities don’t know how to comply with a risk-based
standard within the NERC auditing framework. It is also because most CIP
auditors have limited experience in auditing risk-based requirements.
Rather than repeat this sorry story with the new cloud
standards, it’s important that NERC and the Regions figure out how risk-based
requirements can be audited.
What I call risk-based requirements are what NERC calls
“objective-based” requirements. I used to think the two terms were synonymous,
but I now realize they’re complementary. A requirement to achieve an objective inherently
requires the entity to identify risks it faces; the entity must formulate a plan
to assess those risks and mitigate them. Of course, “mitigate” doesn’t mean
“eliminate”; it just means “make better”. Since no entity has unlimited
resources available, a plan to mitigate risks will always leave some risk on the
table; of course, that is called residual risk.
This will be easier to understand if we make up an example.
Suppose a contractor has agreed to build a new building. The customer requires
them to develop a plan to identify and mitigate the risks that could prevent
them from finishing the building on time: inclement weather, materials
shortages, etc.
The contractor lists all the risks, assesses the likelihood
that each risk will be realized, and determines the impact (in this case, days
of delay) if the risk is realized. For each risk, the contractor multiplies
likelihood times impact to get the expected delay attributable to that risk,
then plans mitigations starting with the risks that have the largest expected delay.
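To make the arithmetic concrete, here is a minimal Python sketch of the contractor's calculation. The risk names, likelihoods and delay figures are invented purely for illustration (the "labor dispute" entry in particular is a made-up addition), and a real assessment would be far less tidy.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: float   # probability (0 to 1) that the risk is realized
    impact_days: float  # days of delay if the risk is realized

    @property
    def expected_delay(self) -> float:
        # Likelihood times impact: the probability-weighted delay.
        return self.likelihood * self.impact_days


# Hypothetical figures, chosen only to show the arithmetic.
risks = [
    Risk("inclement weather", 0.5, 10),
    Risk("materials shortage", 0.25, 24),
    Risk("labor dispute", 0.125, 16),
]

# Rank the risks so the largest expected delays get attention first.
ranked = sorted(risks, key=lambda r: r.expected_delay, reverse=True)
for r in ranked:
    print(f"{r.name}: {r.expected_delay:.1f} expected days of delay")

total_expected_delay = sum(r.expected_delay for r in risks)
print(f"total: {total_expected_delay:.1f} days")
```

With these made-up numbers, the materials shortage tops the list even though bad weather is twice as likely, because its impact is more than twice as large.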
In this example, the objective is finishing the building on
time, while the risks are the different possible causes of delay. So, “risk-based”
and “objective-based” always go hand in hand. If you’re required to achieve any
objective, you always must mitigate risks, and if you need to mitigate risks,
the only way you can identify them is to know the objective you’re trying to
achieve. If there’s no objective, there’s no risk and vice versa.
In the case of CIP-013, the objective is to secure BES Cyber
Systems (and also EACMS and PACS) by making sure the suppliers of those systems
follow secure policies and practices. Of course, the risks have to do with
suppliers not following secure policies and practices – for example, in developing
software or in vetting their employees.
However, what does CIP-013 require today? Only that NERC entities
develop a plan to “identify and assess” supply chain cybersecurity risks to BCS,
EACMS and PACS. There is no indication of what those risks are. R1.2 lists six
specific controls that NERC entities must implement, but those were never
intended to address the entirety of supply chain security risks. Rather, they were
six items that FERC included in various scattered places in their 2016
order to develop a supply chain standard; the drafting team just decided to
gather them in one place. However, far too many entities limited their CIP-013
programs to just those six controls and ignored the requirement to “identify
and assess” risks altogether. This was one of the main reasons why FERC issued
their NOPR last year.
Here's how I would rewrite CIP-013 (and I’ve been saying
this for years): I wouldn’t require the entity to take specific actions (although
it wouldn’t be the end of the world if R1.2.1 – R1.2.6 were allowed to remain
in the standard). However, I would require that the plan address
specific areas of risk. These can include secure software development practices,
vetting employees for security risks, policies to ensure secure remote access to
devices inside an ESP (i.e., not just what CIP-005 R2 requires), etc.
For each of those areas, the entity would need to identify
and assess supply chain risks. If they say one of those areas doesn’t apply to
them, they would need to explain why. For example, an entity’s reason for not looking
at risks from remote access might be that the entity only allows its own
employees to access devices in their ESPs remotely.
For all the other areas of risk, the entity will need to identify
a set of risks that they will address in their plan. For example, in the
software security area of risk, individual risks include an insecure development
environment, the supplier not reporting vulnerabilities when the software is
being used by customers, etc.
How will this process be audited? It will come down to the judgment
of the auditors that the entity did a good job of identifying and assessing risks
in each area of risk. However, a lot of NERC entities are deathly afraid
of having to rely on the judgment of their auditors. This isn’t because the
auditors don’t exercise good judgment in general (e.g., they have lots of car
accidents), but because NERC won’t ever take a stand on what a requirement
means and provide true guidance.
If NERC did that, auditor judgment would become a non-issue,
since both the entity and the auditor would rely on NERC’s guidance. A guidance
document would list the major areas of supply chain cybersecurity risk, as well
as the major risks in each of those areas. For each major risk, the entity
would need to a) present a credible plan for mitigating that risk, or b)
explain why the risk doesn’t apply in their case.
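To show how simple the bookkeeping side of such guidance could be, here is a rough Python sketch. The areas and risks are taken from the examples in this post, but the data layout, the `find_gaps` function and the sample plan are all assumptions made for illustration; nothing like this appears in any NERC document.

```python
# Hypothetical guidance: major areas of supply chain risk and the major
# risks in each area. The entries come from the examples in this post.
GUIDANCE = {
    "software security": [
        "insecure development environment",
        "supplier does not report vulnerabilities",
    ],
    "employee vetting": ["supplier employees not vetted for security risks"],
}


def find_gaps(guidance, plan):
    """Return (area, risk) pairs with neither a mitigation nor an explanation."""
    gaps = []
    for area, risks in guidance.items():
        for risk in risks:
            entry = plan.get((area, risk), {})
            # Option a) a mitigation plan, or option b) a reason it doesn't apply.
            if not entry.get("mitigation") and not entry.get("not_applicable_because"):
                gaps.append((area, risk))
    return gaps


# A made-up entity plan that addresses two of the three risks.
plan = {
    ("software security", "insecure development environment"): {
        "mitigation": "require supplier attestation of build environment controls"
    },
    ("employee vetting", "supplier employees not vetted for security risks"): {
        "not_applicable_because": "supplier staff never access our systems"
    },
}

gaps = find_gaps(GUIDANCE, plan)
print(gaps)
```

A check like this only shows whether every listed risk was addressed at all. Whether a given mitigation is credible would still be a matter for the auditor's judgment.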
However, NERC doesn’t issue its own guidance on
compliance, because to do so would amount to “auditing itself”. That is, if NERC
tells the entity how to comply with a requirement, they would…what? Take all the
fun out of compliance by making it a dull paint-by-numbers exercise? If the entity
thereby achieves exactly what NERC wants them to achieve by following their guidance,
why not encourage that behavior?
I’ve also heard other excuses for NERC’s policy, including
that it’s included in GAGAS (the Generally Accepted Government Auditing Standards),
although nobody has shown me where it says that. What this discussion usually
comes down to is someone saying that the NERC Rules of Procedure (RoP) include an
admonition against NERC developing its own guidance. Nobody has shown me that either,
although I’ll admit I haven’t asked a lot of people about this.
But let’s stipulate that the RoP does prevent NERC
from providing compliance guidance to NERC entities. Why would it be so
terrible for NERC to provide it? After all, since NERC’s guidance will presumably reflect their view
of what’s best for the Bulk Electric System, wouldn’t it be better for the BES
if all NERC entities followed that guidance than if they didn’t? What is gained
by withholding that information?
I think the problem here is that NERC has based their auditing
on financial auditing, where it’s very important that auditors not offer
guidance that dishonest financial institutions could distort to justify
improper practices. However, cybersecurity is inherently a risk management exercise,
in which one practice might mitigate one risk but not another; therefore, an
auditor needs to exercise judgment regarding whether a particular control is effective
in a particular situation. Finance isn’t that sort of exercise.
The moral of this story is that auditing risk-based
requirements won’t work without the auditors being able to exercise judgment.
Of course, the auditing rules (presumably in the RoP) will need to require that
auditors distinguish between an entity that made an honest mistake in managing
a risk and an entity that decided to ignore a risk entirely because they didn’t
feel like bothering with it.
And this, boys and girls, is why I think the “cloud SDT” needs to be prepared for a very long ride. I think they have to deal with this problem of auditing risk-based requirements, which may require changes to the Rules of Procedure. If they don’t do that, they’ll most likely end up repeating the CIP-013 experience: creating a standard that, even if it’s initially approved by FERC, turns out not to be very effective and ends up requiring a complete rewrite.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com.