Monday, February 10, 2025

How can NERC audit risk-based requirements?

The Risk Management for Third-Party Cloud Services Standards Drafting Team is off to a running start this year.  I say “start”, because during the 5-6 months that the SDT met last year, they didn’t even start to draft any standards. Instead, they were doing what NERC drafting teams sometimes do: re-think the Standards Authorization Request (SAR) that forms their “charter”. They spent the entire fall discussing what they are going to discuss when they start drafting the new (or revised) cloud standards.

There is nothing wrong with doing this, particularly when the SDT has a weighty burden – and it’s hard to think of a NERC CIP SDT that’s had a weightier burden than this one, except perhaps the team (called “CSO706” for Cyber Security Order 706) that drafted CIP versions 2, 3, 4 and 5. In fact, one member of that team, Jay Cribb, is a member of the cloud team. The CSO706 team first met in 2008. Their last “product”, CIP version 5 (essentially, the version that’s still in effect today), came into effect in 2016.

In my opinion, one essential attribute for any requirements they create is that they be risk-based. That’s my term, but NERC refers to them as “performance-based”. While some CIP requirements today are truly risk-based (even though they may not mention the word “risk”), others are not.

In fact, a small number of CIP requirements like CIP-007 R2 patch management and CIP-010 R1 configuration management are highly prescriptive, and require compliance on the physical or virtual device level. Cloud service providers don’t track systems based on the device on which they reside, since doing so would require breaking the cloud model. This means they will never be able to provide the evidence required for a NERC entity customer to prove compliance with these prescriptive requirements.

This is why I think all CIP requirements going forward, but especially requirements having to do with use of the cloud, need to be risk-based, and can’t refer (even implicitly) to devices at all. In fact, since CIP v5 came into effect in 2016, I believe that all subsequent CIP requirements and some entire standards, including CIP-012, CIP-013, CIP-014, CIP-003-2, CIP-010-4 and others, have been risk-based (some more than others, truth be told).

The problem with risk-based NERC CIP requirements today is that there has been very little guidance to NERC entities or Regional auditors on how to comply with or audit them. This was most vividly demonstrated in the Notice of Proposed Rulemaking (NOPR) that FERC issued in September 2024 regarding CIP-013, which is an entirely risk-based standard. In my post on the NOPR, I quoted the following section found near the end:

…we are concerned that the existing SCRM Reliability Standards lack a detailed and consistent approach for entities to develop adequate SCRM (supply chain risk management) plans related to the (1) identification of, (2) assessment of, and (3) response to supply chain risk.  Specifically, we are concerned that the SCRM Reliability Standards lack clear requirements for when responsible entities should perform risk assessments to identify risks and how those risk assessments should be conducted to properly assess risk.  Further, we are concerned that the Reliability Standards lack any requirement for an entity to respond to supply chain risks once identified and assessed, regardless of severity. 

In other words, FERC issued the NOPR because they do not think NERC did a good job of drafting either CIP-013-1 or CIP-013-2. They are considering ordering NERC to revise CIP-013 so it truly requires NERC entities to develop and implement a supply chain cyber risk management plan.

I agree with FERC’s opinion, but I want to point out that just asking NERC to re-draft CIP-013 will not necessarily fix the problem. This is because today NERC entities don’t know how to comply with a risk-based standard within the NERC auditing framework. It is also because most CIP auditors have limited experience in auditing risk-based requirements.

Rather than repeat this sorry story with the new cloud standards, it’s important that NERC and the Regions figure out how risk-based requirements can be audited.

What I call risk-based requirements are what NERC calls “objective-based” requirements. I used to think the two terms were synonymous, but I now realize they’re complementary. A requirement to achieve an objective inherently requires the entity to identify risks it faces; the entity must formulate a plan to assess those risks and mitigate them. Of course, “mitigate” doesn’t mean “eliminate”; it just means “make better”. Since no entity has unlimited resources available, a plan to mitigate risks will always leave some risk on the table; of course, that is called residual risk.

This will be easier to understand if we make up an example. Suppose a contractor has agreed to build a new building. The customer requires them to develop a plan to identify and mitigate the risks that could prevent them from finishing the building on time: inclement weather, materials shortages, etc.

The contractor lists all the risks, estimates the likelihood that each risk will be realized, and determines the impact (in this case, days of delay) if it is realized. For each risk, the contractor multiplies likelihood by impact to get the expected delay attributable to that risk. Ranking the risks by expected delay shows which ones most deserve mitigation.
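To make the arithmetic concrete, here is a minimal Python sketch of the contractor's calculation. The risk names beyond weather and materials, and all of the likelihoods and day counts, are made-up numbers for illustration only. Adding the expected delays into one total also assumes the risks are independent and their delays simply add up, which a real schedule analysis might not accept.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: float   # assumed probability (0 to 1) that the risk is realized
    impact_days: float  # assumed days of delay if the risk is realized

    @property
    def expected_delay(self) -> float:
        # Expected delay = likelihood x impact
        return self.likelihood * self.impact_days


# Hypothetical register of schedule risks; every number here is invented.
risks = [
    Risk("materials shortage", likelihood=0.25, impact_days=16),
    Risk("labor dispute", likelihood=0.1, impact_days=30),
    Risk("inclement weather", likelihood=0.5, impact_days=10),
]

# Rank risks from largest to smallest expected delay.
ranked = sorted(risks, key=lambda r: r.expected_delay, reverse=True)

# Simplifying assumption: expected delays are additive across risks.
total_expected_delay = sum(r.expected_delay for r in risks)

for r in ranked:
    print(f"{r.name}: {r.expected_delay:.1f} expected days of delay")
print(f"Total: {total_expected_delay:.1f} expected days of delay")
```

Note that a low-likelihood, high-impact risk and a high-likelihood, low-impact risk can end up with similar scores. That is one reason a ranking like this informs judgment rather than replacing it.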

In this example, the objective is finishing the building on time, while the risks are the different possible causes of delay. So, “risk-based” and “objective-based” always go hand in hand. If you’re required to achieve an objective, you must always mitigate risks, and if you need to mitigate risks, the only way you can identify them is to know the objective you’re trying to achieve. If there’s no objective, there’s no risk, and vice versa.

In the case of CIP-013, the objective is to secure BES Cyber Systems (and also EACMS and PACS) by making sure the suppliers of those systems follow secure policies and practices. Of course, the risks have to do with suppliers not following secure policies and practices – for example, in software development or adequately vetting their employees.

However, what does CIP-013 require today? Only that NERC entities develop a plan to “identify and assess” supply chain cybersecurity risks to BCS, EACMS and PACS. There is no indication of what those risks are. R1.2 lists six specific controls that NERC entities must practice, but those were never intended to be the entirety of supply chain security risks. Rather, they were six items that FERC included at various random places in their 2016 order to develop a supply chain standard; the drafting team just decided to gather them in one place. However, far too many entities limited their CIP-013 programs to just those six controls and ignored the requirement to “identify and assess” risks altogether. This was one of the main reasons why FERC issued their NOPR last year.

Here's how I would rewrite CIP-013 (and I’ve been saying this for years): I wouldn’t require the entity to take specific actions (although it wouldn’t be the end of the world if R1.2.1 – R1.2.6 were allowed to remain in the standard). However, I would require that the plan address specific areas of risk. These can include secure software development practices, vetting employees for security risks, policies to ensure secure remote access to devices inside an ESP (i.e., not just what CIP-005 R2 requires), etc.

For each of those areas, the entity would need to identify and assess supply chain risks. If they say one of those areas doesn’t apply to them, they would need to explain why. For example, an entity’s reason for not looking at risks from remote access might be that it only allows its own employees to access devices in its ESPs remotely.

For all the other areas of risk, the entity will need to identify a set of risks that they will address in their plan. For example, in the software security area of risk, individual risks include an insecure development environment, the supplier not reporting vulnerabilities when the software is being used by customers, etc.

How will this process be audited? It will come down to the judgment of the auditors as to whether the entity did a good job of identifying and assessing risks in each area of risk. However, a lot of NERC entities are deathly afraid of having to rely on the judgment of their auditors. This isn’t because the auditors exercise poor judgment in general (e.g., it’s not that they have lots of car accidents), but because NERC won’t ever take a stand on what a requirement means and provide true guidance.

If NERC did that, auditor judgment would become a non-issue, since both the entity and the auditor would rely on NERC’s guidance. A guidance document would list the major areas of supply chain cybersecurity risk, as well as the major risks in each of those areas. For each major risk, the entity would need to a) present a credible plan for mitigating that risk, or b) explain why the risk doesn’t apply in their case.
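As a rough illustration of how such a guidance document could be used, the sketch below checks an entity's plan against a catalog of areas and risks and flags any risk that has neither a mitigation plan nor an explanation of why it doesn't apply. The catalog contents, the `Response` class and the `find_gaps` function are all hypothetical; nothing like this exists in NERC's guidance today. The code only checks that a response exists. Whether a mitigation plan is credible would still be a matter for the auditor's judgment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical guidance catalog: area of risk -> major risks in that area.
GUIDANCE: Dict[str, List[str]] = {
    "software security": [
        "insecure development environment",
        "supplier does not report vulnerabilities",
    ],
    "personnel": ["employees not vetted for security risks"],
    "remote access": ["insecure supplier remote access to ESP devices"],
}


@dataclass
class Response:
    mitigation_plan: Optional[str] = None        # option (a)
    not_applicable_reason: Optional[str] = None  # option (b)


def find_gaps(
    guidance: Dict[str, List[str]],
    plan: Dict[Tuple[str, str], Response],
) -> List[Tuple[str, str]]:
    """Return every (area, risk) with neither a mitigation nor an N/A reason."""
    gaps = []
    for area, risks in guidance.items():
        for risk in risks:
            resp = plan.get((area, risk))
            if resp is None or not (resp.mitigation_plan or resp.not_applicable_reason):
                gaps.append((area, risk))
    return gaps


# Hypothetical entity plan that leaves one software security risk unaddressed.
plan = {
    ("software security", "insecure development environment"): Response(
        mitigation_plan="Require supplier attestation of secure build practices"
    ),
    ("personnel", "employees not vetted for security risks"): Response(
        mitigation_plan="Require background checks in supplier contracts"
    ),
    ("remote access", "insecure supplier remote access to ESP devices"): Response(
        not_applicable_reason="Only our own employees access ESP devices remotely"
    ),
}

gaps = find_gaps(GUIDANCE, plan)
for area, risk in gaps:
    print(f"Unaddressed: [{area}] {risk}")
```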

However, NERC doesn’t issue its own guidance on compliance, because to do so would amount to “auditing itself”. That is, if NERC tells the entity how to comply with a requirement, they would…what? Take all the fun out of compliance by making it a dull paint-by-numbers exercise? If the entity thereby achieves exactly what NERC wants them to achieve by following their guidance, why not encourage that behavior?

I’ve also heard other excuses for NERC’s policy, including that it’s required by GAGAS (Generally Accepted Government Auditing Standards), although nobody has shown me where it says that. What this discussion usually comes down to is someone saying that the NERC Rules of Procedure (RoP) include an admonition against NERC developing its own guidance. Nobody has shown me that either, although I’ll admit I haven’t asked a lot of people about this.

But let’s stipulate that the RoP does prevent NERC from providing compliance guidance to NERC entities. Why would that be so terrible? After all, since NERC’s guidance will presumably reflect their view of what’s best for the Bulk Electric System, wouldn’t it be better for the BES if all NERC entities followed that guidance than if they didn’t? What is gained by withholding that information?

I think the problem here is that NERC has based their auditing on financial auditing, where it’s very important that auditors not offer guidance that dishonest financial institutions could distort to justify improper practices. However, cybersecurity is inherently a risk management exercise, in which one practice might mitigate one risk but not another; therefore, an auditor needs to exercise judgment regarding whether a particular control is effective in a particular situation. Finance isn’t that sort of exercise.

The moral of this story is that auditing risk-based requirements won’t work without the auditors being able to exercise judgment. Of course, the auditing rules (presumably in the RoP) will need to require that auditors distinguish between an entity that made an honest mistake in managing a risk and an entity that decided to ignore a risk entirely because they didn’t feel like bothering with it.

And this, boys and girls, is why I think the “cloud SDT” needs to be prepared for a very long ride. I think they have to deal with this problem of auditing risk-based requirements, which may require changes to the Rules of Procedure. If they don’t do that, they’ll most likely end up repeating the CIP-013 experience: creating a standard that, even if it’s initially approved by FERC, turns out not to be very effective and ends up requiring a complete rewrite.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

My book "Introduction to SBOM and VEX" is available in paperback and Kindle versions! For background on the book and the link to order it, see this post.

