Saturday, April 6, 2024

NERC CIP: Is there a shortcut to the cloud?


As I pointed out in this post in January (and have many times in previous years), NERC entities with medium and/or high impact BES Cyber Systems, Electronic Access Control or Monitoring Systems (EACMS), and Physical Access Control Systems (PACS) can’t currently make full use of the cloud, unless they want to risk violating a number of CIP requirements literally every day of the year.

As more and more software and service providers (including security service providers) announce their software or services will only be delivered in the cloud in 1-2 years, there is real concern (including among NERC and Regional Entity staff members) that there could soon be significant impacts on both grid reliability and grid security. One major ISO has said they will need to lower their security rating in two years, due to their security service providers ceasing to offer an on-premises option.

While a process will soon be underway to make all the changes to the CIP standards (and the NERC Rules of Procedure) that are needed to make use of the cloud fully “legal”, that process will probably take six years, and maybe longer. It needs to continue, but it clearly won’t finish before the reliability and security impacts begin to be felt.

Why do we have this problem? It isn’t because the language of the current CIP requirements prohibits use of the cloud. Those requirements say nothing at all about the cloud. This is because they were originally drafted starting in 2008, when use of the cloud was considered quite risky by most NERC entities (and certainly by FERC). Of course, even today it’s doubtful that many organizations of any type think of the cloud as risk-free. However, the huge number of successful attacks against on-premises systems shows on-prem isn’t risk-free, either.

Instead, the reason why we say the current CIP standards prevent use of the cloud for medium and high impact systems is that the cloud service provider would never be able to provide the evidence the NERC entity needs to prove compliance with CIP-005 R1, CIP-007 R2, CIP-010 R1 and other current requirements. As everyone who is involved with NERC compliance knows, “If you don’t have evidence that you did it, you didn’t do it.”

Why is this the case? I’m going to let you in on a dirty little secret about CIP: There are more “implicit requirements” than “explicit requirements”. Explicit requirements are the ones listed in the standards, while implicit requirements are unwritten, but implied by explicit requirements. In other words, while a NERC entity can’t be cited for violating an implicit requirement, often performing an implicit requirement is a prerequisite for complying with an explicit requirement. If you don’t “comply” with the implicit requirement, you’ll be in violation of the explicit one.

One of the most important implicit requirements in CIP has to do with the fact that BES Cyber Systems (BCS) are defined simply as collections of one or more BES Cyber Assets (BCA). BCAs are physical devices you can point to, while a BCS is just a function performed cooperatively by the devices that make it up. You can’t point to a BCS.

The problem arises because none of the CIP requirements today mentions BCAs, only BCS. For some of the requirements, like the ones for training, policies, etc., this is fine. However, think about CIP-007 R2 for patching. It applies to BCS, but do you apply a patch to a system or to a device? Of course, you patch the device. So, complying with CIP-007 R2 in fact requires that you first comply with the implicit requirement that says something like, “Everything required by each of the parts of CIP-007 R2 needs to be repeated for every device included in the BCS.”

Now, think of what the CSP would have to do to provide evidence of compliance with just CIP-007-6 R2.2, if you put a medium or high impact BCS in the cloud today. Every 35 days, they would need to:

1.      Inventory all the physical devices on which any portion of that BCS has resided during the last 35 days (of course, systems in the cloud are always spread across many servers and data centers, and they are moved around all the time. That’s how the cloud works). Each of those physical devices needs to be included in the inventory, even if just a small part of the BCS resided on it for just a few seconds.

2.      Inventory every piece of software that is installed on any of those devices (no matter which cloud customer the software belongs to) and inquire with the developer of the software whether they have released any patches in the past 35 days.

3.      For each of those patches, evaluate it for applicability to the system it’s part of (which will be impossible, since those systems are unlikely to be ones that your organization has anything to do with. They may be owned by entities in completely different industries, foreign nationals, etc.).

And remember, the CSP will need to perform all these actions and document that they did so, despite the fact that their normal patching procedures (or perhaps those of their customers, since most CSPs follow a “shared responsibility” model) may have already patched all of the software products identified in step 2. For prescriptive CIP requirements like CIP-007 R2, CIP-005 R1 and CIP-010 R1, the only evidence of compliance is evidence that the exact steps mandated by the requirement (and by any implicit requirements, as described above) were followed.
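To make the three steps above concrete, here is a minimal Python sketch of the bookkeeping they imply for a single BCS in a single 35-day cycle. Everything in it is hypothetical: the `Placement` and `Device` records, the `patches_released` lookup, and the function names are my own illustrative assumptions, not anything a CSP actually exposes. In fact, the point of the sketch is that the `placements` log it starts from is exactly the record I doubt any CSP keeps.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

WINDOW_DAYS = 35  # evaluation cycle in CIP-007-6 R2.2


@dataclass(frozen=True)
class Placement:
    """Hypothetical log entry: part of the BCS resided on this device on this date."""
    device_id: str
    seen_on: date


@dataclass
class Device:
    """Hypothetical device record, listing ALL software on it (any customer's)."""
    device_id: str
    installed_software: list[str] = field(default_factory=list)


def devices_in_window(placements, as_of, window_days=WINDOW_DAYS):
    """Step 1: every device that hosted any part of the BCS in the window."""
    start = as_of - timedelta(days=window_days)
    return sorted({p.device_id for p in placements if start <= p.seen_on <= as_of})


def evidence_items(placements, devices, patches_released, as_of):
    """Steps 2 and 3: one evidence item per (device, software, patch).

    patches_released is an assumed lookup: software name -> patch IDs
    released in the window. Each returned tuple stands for one patch whose
    applicability would have to be evaluated and documented.
    """
    items = []
    for device_id in devices_in_window(placements, as_of):
        for software in devices[device_id].installed_software:
            for patch in patches_released.get(software, []):
                items.append((device_id, software, patch))
    return items
```

Note that the triple loop multiplies: devices times software products times patches, repeated every cycle, for a BCS that may touch a different set of devices each time.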

Do I need to go on? I didn’t think so. Remember, these three steps (which might need to be performed thousands or even tens of thousands of times) are just for one part of one requirement. Think of what the CSP would need to do to provide evidence to a NERC entity to prove compliance with every part of every requirement for, say, 200 BCS for a three-year audit period! The CSP would have to provide many millions of pieces of evidence to the entity.
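For a rough sense of how “many millions” comes about, here is a back-of-the-envelope calculation in Python. Only the 200 BCS, the three-year audit period and the 35-day cycle come from the discussion above. The devices-per-BCS and software-per-device figures are purely illustrative assumptions of mine, not measurements, and the function names are hypothetical.

```python
import math


def evaluation_cycles(audit_years=3, cycle_days=35):
    """Number of 35-day patch evaluation cycles in the audit period (rounded up)."""
    return math.ceil(audit_years * 365 / cycle_days)


def evidence_count(bcs_count, devices_per_bcs, software_per_device,
                   audit_years=3, cycle_days=35):
    """Evidence items for ONE requirement part, at one item per
    (BCS, device, software product, cycle). The device and software
    counts are assumptions supplied by the caller."""
    cycles = evaluation_cycles(audit_years, cycle_days)
    return bcs_count * devices_per_bcs * software_per_device * cycles


# Assumed for illustration only: 100 devices touched per BCS per cycle,
# 20 software products per device.
total = evidence_count(bcs_count=200, devices_per_bcs=100, software_per_device=20)
print(f"{total:,} evidence items for one requirement part")
```

Change the two assumed figures however you like; as long as a BCS touches more than a handful of devices per cycle, the total for a single requirement part lands in the millions, before any of the other requirements are counted.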

Thus, if you signed a contract to put just one BCS in the cloud and then got on a call with the CSP to explain what they need to do to gather the evidence you need, you would probably lose them as soon as you described the first step: I strongly doubt any CSP maintains a log of every device on which a piece of a single BCS might have resided over one week, let alone three years. The CSP would apologize for not properly understanding what you needed when you first signed the contract with them, but they would have to tell you that neither they nor any other CSP could ever provide you with compliance evidence for even a single BCS for one week. There’s simply no way they could do that, without completely breaking their business model.

However, recently it occurred to me that perhaps CIP-013-2 might offer a way to include cloud services within the scope of compliance, even for NERC entities with medium or high impact systems. Consider this:

1.      A NERC entity decides to implement a single medium impact BCS in the cloud. They do this by signing a contract with the CSP.

2.      CIP-013-2 R1.1 says the scope of R1 is “the procurement of BES Cyber Systems and their associated EACMS and PACS to identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services.” (my emphasis) In signing the contract, the entity isn’t procuring any hardware or software products from the CSP, but they’re certainly procuring services.

3.      Therefore, the relationship between the entity and the CSP falls into the scope of CIP-013. The entity should treat the CSP the same as they would treat any other service provider in scope for CIP-013.

Because the cloud service is one of the products and services that needs to be addressed in the NERC entity’s supply chain cybersecurity risk management plan, the entity will need to include it as one of the procured items in the plan. Just as they must do for all other procured products and services in scope, the entity needs to describe in the plan how they will “identify and assess cyber security risk(s) to the Bulk Electric System” arising from the cloud service.

What are these risks? At a minimum, they need to include the six risks described in R1.2.1 through R1.2.6.[i] But of course, there are lots of other risks that apply to cloud service providers. Rather than leave it up to each NERC entity to decide what those risks are, NERC will need to provide a list of types of cloud risks that must be addressed in the plan.

Since the CSPs will never permit every NERC entity to audit them, NERC could do an “audit” themselves. An important component of the audit might be reviewing the CSP’s FedRAMP authorization documentation to determine whether it meets a certain set of criteria established in advance. (It would be up to the entity to decide whether to accept NERC’s audit in whole, in part, or not at all.) Of course, there might be other steps in the CIP-013 cloud compliance process as well.

By following CIP-013, the NERC entity no longer needs evidence that the CSP has complied with requirements like CIP-007 R2, any more than they need evidence that other vendors addressed in CIP-013 (e.g. the vendor of relays used in substations) have complied with CIP-007 R2. However, the utility will need to provide evidence of the vendor’s compliance with CIP-004-7 Requirement Parts R3.4, R6.1, R6.2, and R6.3 (and perhaps one or two other Requirement Parts in CIP-004-7). Some CSPs may balk at providing this documentation, but given that the alternative (being required to provide reams of documentation that is literally impossible to produce) is much worse, they will hopefully agree this isn’t such a terrible fate.

Where’s the catch? For one thing, I’ll admit I’ve taken some liberties with the term “services”. There’s not much doubt that the CIP-013 drafting team never intended “services” to include cloud services – but they never defined the term, so there’s no way to know what they intended (moreover, the Rules of Procedure don’t provide any mechanism for considering the drafting team’s intentions as part of a CIP audit). I’ll also admit that the NERC “audit” of the CSPs, which I described above, would require changes to the Rules of Procedure, or perhaps some sort of temporary waiver. There will need to be some sort of intervention by someone at NERC (most likely the Board of Trustees) to smooth the path for this change.

But it’s important to keep the big picture in mind: Like it or not, if no change is made within the next two or three years, the choice will be between accepting a lower level of security and (perhaps) reliability for the power grid, and allowing NERC entities with high and/or medium impact environments to utilize cloud-based software and services while being in technical violation of a host of CIP requirements. Moreover, the latter option will be rushed into place due to a grid emergency, as opposed to having plenty of time now to carefully plan and implement the CIP-013 option.

You pays your money and you takes your choice.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

My book "Introduction to SBOM and VEX" is now available in paperback and Kindle versions! For background on the book and the link to order it, see this post.


[i] These six items are included in the standard because FERC mentioned each of them at various places (and in varying contexts) in Order 829 in 2016. They’re not there because the Standards Drafting Team (or FERC itself) considered them to be the most serious supply chain cybersecurity risks. The Responsible Entity is still required to examine risks and determine for themselves which ones are important enough to mitigate and which ones are not.
