Lew Folkerth of the RF Region and Chris Holmquest of SERC wrote a great article recently titled “The emerging risk of not using cloud services” (my emphasis); I linked to it in this post. It eloquently made the point that, far from the cloud being too risky for NERC entities to use, the risk of not using the cloud is currently greater – and growing all the time. This is because more and more software and service providers (especially security service providers) are announcing that in the coming years they will offer only a cloud-based version of their product, or at least will no longer implement new features in their on-premises version. Both reliability and security will suffer as a result.
One sentence in the article stood out for me: “New
Reliability Standards will be required, and those standards will need to be
risk-based.” The article goes on to say, “There are too many variables in cloud
environments to be able to write prescriptive standards for these cases.”
Of course, I completely agree with this statement, but how
will this work in practice? Fortunately, these won’t be the first risk-based
standards. The honor of being the first entirely risk-based standard goes to
CIP-013, which came into effect in 2020. If any of you were reading this blog
regularly at that time, you may remember I was a huge fan of the fact that
CIP-013 is a risk-based standard; I was sure it would be very successful and
make everyone in NERC (both NERC entities and NERC auditors) love the idea of
risk-based standards. But it didn’t quite work out like I’d hoped. Here’s my
story:
I had been writing about CIP-013 from the beginning,
when FERC issued Order 829
in July 2016. I participated in, and wrote about, the drafting team’s efforts
to develop the standard. And I celebrated in 2018 when FERC approved
CIP-013. Plus, I was the first person (that I know of) who advocated for pushing back the compliance date for CIP-013 when it became clear in March 2020 that Covid-19 was going to require the electric power industry – as well as almost every other industry – to drastically alter how it conducted its business.
Not coincidentally, I worked with some NERC entities to understand
what CIP-013 compliance means. In the process of doing that, I developed what seemed
to be a clear interpretation of what CIP-013 required (I wrote about this
interpretation in a number of posts in 2017-2020). Here is my summary of the
requirements of CIP-013 (it doesn’t matter whether you’re looking at version 1
or 2 of the standard; the only important difference between the two is greater
applicability in v2):
1. CIP-013-2 R1 requires the NERC entity to “…develop one or more documented supply chain cyber security risk management plan(s) for high and medium impact BES Cyber Systems…” R1 goes on to list, in R1.1 and 1.2, items that must be included in the plan.
2. R1.1 requires that the plan “…identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services resulting from: (i) procuring and installing vendor equipment and software; and (ii) transitions from one vendor(s) to another vendor(s).”
3. R1.2 lists six risks (or more specifically, mitigations for risks) that must be included in the plan. These six risks had been cited by FERC at different places within Order 829 as items that needed to be included in the plan. The Standards Drafting Team that developed CIP-013 gathered them all into one Requirement Part: R1.2.
4. R2 requires the NERC entity to implement the plan they’ve developed in R1.
5. R3 requires the NERC entity to review and update the plan every 15 months.
That’s it. The entire Standard
(minus the Measures) fits on a single page. When it was developed, I marveled
at the sheer simplicity of CIP-013.
Of course, the heart of CIP-013 is
R1, since that describes the plan that’s required. I interpreted R1 (and still
do) to mean that the NERC entity must do the following:
1. Identify supply chain cybersecurity risks to the Bulk Electric
System resulting from the BES Cyber Systems the entity may procure. Where
should the NERC entity look to find these risks? Of course, there are lots of lists; the NATF Criteria are one that is especially relevant to the BES. But the entity doesn’t have to confine itself to a preconceived list. One way to identify risks is simply to read the news.
Here’s an example of that: Right after CIP-013 came into effect in 2020, the SolarWinds attack was discovered. By that point, the attackers had been present in SolarWinds’ development network for 15 months. During that time, they carefully prepared and tested their malware before infecting seven releases of the Orion platform. They even started with a proof of concept of their malware design – a benign piece of code – to see whether it could infect the platform; that test succeeded. I fully expect the attackers to publish a case study of this engineering triumph someday.
Surely, an insecure development network at a software supplier is a big risk. One mitigation for that risk would be requiring your software suppliers to fill out the Attestation Form that CISA recently released for compliance with Executive Order 14028.
2. Rate each risk as high or low, based on its likelihood and
impact. In this case, estimating impact is easy: BES Cyber Systems are classified
as such precisely because the impact of their loss, compromise, etc. is high.
This means the impact of any supply chain attack on BCS will always be high. Therefore,
the only real variable is likelihood. To rate each risk, the entity must simply
ask, “Is the likelihood high or low that this risk will be realized?” If
likelihood is high, risk is high; if it’s low, risk is low as well.
Fortunately, if you just divide
likelihood into high and low levels, estimating it is easy. For example, someone
may point out that, if even a small meteorite crashed into your relay supplier’s
factory, the factory might be incinerated; that would of course mean your
organization would need to find another supplier of relays. That’s a huge
impact, but what’s the likelihood? Probably less than the likelihood of being
struck by lightning on a sunny day. This is a low risk.
Once you’ve rated your risks as high or low, you then need to focus on the high ones; obviously, those are the only ones that need to be mitigated. But are you obligated to mitigate every high risk? No. An important principle of risk management is that no organization has unlimited resources available for risk mitigation. Your organization needs to decide which of the high risks it can afford to mitigate, and focus on just those. This also means you should assess your vendors based only on the risks you’re trying to mitigate: in your questionnaires, you shouldn’t ask a vendor about a risk if you don’t care what their answer is. You’re just wasting your time and the vendor’s. (I’ve sketched this rating-and-selection logic in a short example after this list.)
3. Once you have developed a list of risks you wish to
mitigate, you need to add to that list the six risks in R1.2 (if you haven’t already
identified them as important risks independently). You need to do this, not
because these are the most significant supply chain cybersecurity risks to the
BES (although they are all important risks), and certainly not because they’re the
only supply chain risks to the BES. The SDT included those risks in R1.2
because FERC had mandated them at various disconnected places in Order 829. In
other words, the SDT was saying, “We want you to identify risks you think are
important and mitigate them. But, since FERC wants these six risks to be in
your plan, you need to make sure you include them as well.”
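To make that interpretation concrete, here is a minimal, purely illustrative sketch in Python of the rating-and-selection logic described above. Nothing like this is required or suggested by CIP-013 or the SDT; the risk names, likelihood judgments, and the from_r1_2 flag are all my own assumptions, and a real plan would document the reasoning behind each judgment rather than just the result.

```python
from dataclasses import dataclass

@dataclass
class SupplyChainRisk:
    name: str                # hypothetical risk description
    likelihood_high: bool    # the entity's judgment: is realization likely?
    from_r1_2: bool = False  # is this one of the six risks FERC mandated (R1.2.1 - R1.2.6)?

# Hypothetical risks an entity might identify; names are illustrative only.
risks = [
    SupplyChainRisk("Supplier's software development network is insecure", True),
    SupplyChainRisk("Meteorite destroys the relay supplier's factory", False),
    SupplyChainRisk("Vendor remote access session is hijacked", True, from_r1_2=True),
]

def rate(risk: SupplyChainRisk) -> str:
    # The impact of losing or compromising a BES Cyber System is always high,
    # so the rating reduces to the likelihood judgment.
    return "high" if risk.likelihood_high else "low"

# Focus on the high risks the organization can afford to mitigate, and always
# include the six R1.2 risks, regardless of the entity's own rating.
plan = [r for r in risks if rate(r) == "high" or r.from_r1_2]

for r in plan:
    tag = " (required by R1.2)" if r.from_r1_2 else ""
    print(f"{r.name}: rated {rate(r)}{tag}")
```

Running this would keep the two high-likelihood risks (one of them also required by R1.2) and drop the meteorite scenario, which mirrors the reasoning above: impact is taken as fixed, so the only real question is likelihood, and the plan covers the high risks plus the mandated six.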
Given my interpretation of
CIP-013, how did I think it would be audited? It seemed quite logical to me:
a) R1.1 would be audited based on how good a job the entity did of “identifying and assessing” risks. If they had made an honest effort to determine at least some of the most important supply chain cyber risks to the BES, that would be fine.
b) R1.2 would be audited based on whether the entity included the six risks in R1.2.1 – R1.2.6 in its plan.
c) R2 would be audited based on how well the entity implemented its plan – i.e., whether it took steps to mitigate all the risks it had said it would mitigate in the plan.
d) R3 would be audited based on whether the entity had reviewed its plan every 15 months, and whether they had honestly taken steps to fix any problems or deficiencies they found in the plan.
During the runup to CIP-013
implementation in 2017-2020, I wrote a number of posts on what CIP-013 means,
in which I elaborated on the above logic. Frankly, I thought that logic was so
compelling that it would be widely adopted by NERC entities. After all, why
would the CIP-013 drafting team tell NERC entities to develop a plan to “identify
and assess” risks to the BES if they didn’t mean it?
But I was wrong. From what I’ve
heard, there are few NERC entities that have interpreted CIP-013 to be about
anything more than R1.2.1 – R1.2.6. And now, I wonder why I ever thought
otherwise. After all, if NERC entities have learned anything from their 15 or
so years of experience with NERC CIP compliance, it’s that they need to keep
their “compliance footprint” as small as possible. That is, they need to keep their heads down and never stray beyond the strictest possible interpretation of the standards. Doing anything more than what’s strictly required doesn’t win you any Brownie points; in fact, it might leave you with a completely avoidable violation – an “own goal”, if you will.
However, I’m not blaming NERC
entities for this situation. I’m also not blaming NERC, and certainly not the
auditors. I’m blaming these two facts:
First, the standard, which I had
admired for its pristine simplicity, was in retrospect too simple. Instead of
simply telling NERC entities to “identify and assess” risks, R1 should have
given them suggestions on how to do that within R1 itself. For example, R1.1
might have included a set of ten or so “areas of risk” that must be identified in
the plan, e.g. “vendor remote access”, “software development process”, “secure shipment”,
etc. The entity would be required to scrutinize each of these areas for risks that
they should add to their plan. In some cases, they would be justified in
ignoring one of those areas entirely; for example, if they don’t allow vendor
remote access at all, they obviously don’t need to worry about securing their
vendor remote access system.
Doing this would also have given
the auditors something to hang their hat on when they audited the entity for
CIP-013 compliance, other than simply determining whether the entity had done a
good job of developing their plan. Instead, they could have verified that the
entity examined each of the ten areas and made a conscious effort to determine
whether there were important risks for them in each area. Since the standard didn’t take that approach, NERC entities focused entirely on the six items that were clearly required by CIP-013 R1: the six risks in R1.2.
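Here’s a minimal sketch, again purely illustrative, of what that kind of coverage check might look like if R1.1 had enumerated specific areas of risk. The area names, the example entries, and the idea of recording a documented “N/A” are my own assumptions, not anything found in CIP-013 or in NERC guidance; the point is simply that both the entity and the auditor would have something concrete to verify.

```python
# Hypothetical "areas of risk" that a more detailed R1.1 might have enumerated.
RISK_AREAS = [
    "vendor remote access",
    "software development process",
    "secure shipment",
    "patch and vulnerability notification",
    "vendor personnel screening",
]

# For each area, the entity records either the risks it identified or a
# documented reason the area doesn't apply. Entries below are illustrative.
assessment = {
    "vendor remote access": "N/A - vendor remote access is not permitted",
    "software development process": ["supplier's development network is insecure"],
    "secure shipment": ["hardware tampered with in transit"],
    "patch and vulnerability notification": ["vendor does not disclose vulnerabilities promptly"],
    "vendor personnel screening": [],   # examined, but nothing recorded yet
}

# An auditor (or the entity itself) can check that every area was addressed:
# each must have either identified risks or a documented N/A justification.
for area in RISK_AREAS:
    entry = assessment.get(area)
    addressed = isinstance(entry, str) or bool(entry)
    status = "covered" if addressed else "NOT addressed - needs review"
    print(f"{area}: {status}")
```

In this sketch, the auditor isn’t judging whether the entity found the “right” risks in each area; they’re only verifying that each area was consciously examined and that any exclusion was justified – exactly the kind of hook the requirement never provided.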
Thus, the first lesson to be
learned from the CIP-013 experience is that, in the world of prescriptive requirements
like CIP-010 R1 (configuration management) and CIP-007 R2 (patch management), handing
a blank slate to both the entity and the auditor and saying “You can figure
this out for yourself” – which is unfortunately what the SDT did[i] – is asking for trouble.
Second and more importantly, NERC
auditors in general (i.e. for all the standards, not just the CIP standards)
aren’t trained to judge how well an entity has assessed and mitigated risks;
they’re trained to determine whether the entity did or didn’t do X. While I’m sure some of
them, especially CIP auditors, understand risk very well (if for no other
reason than that almost every other mandatory cybersecurity standard is based
on risk), for many of them it’s a foreign concept. NERC needs to develop methods
for auditing risk-based requirements, not just prescriptive ones, and then
train the auditors on those methods.
Of course, fixing these two
problems won’t be easy. But if NERC CIP is going to make a successful
transition to the cloud, these two problems will need to be addressed.
Are you a vendor of current or
future cloud-based services or software that would like to figure out an
appropriate strategy for the next few years, as well as beyond that? Or are you
a NERC entity that is struggling to understand what your current options are
regarding cloud-based software and services? Please drop me an email so we can
set up a time to discuss this!
Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
[i] Having participated in many of that SDT’s discussions, I know why they made
this mistake: FERC had given NERC a strict deadline to develop and approve the
new supply chain security standard. The SDT couldn’t afford to add any provisions
to CIP-013 that might stir up controversy and result in extra ballots being
necessary (although there were controversies anyway). In other words, FERC’s deadline
backfired spectacularly. This points to a big problem with NERC’s standards
development process, at least when it comes to cybersecurity: you can have a comprehensive
standard that takes a long time to approve, or you can have a minimal standard
that gets approved relatively quickly. But you can’t have both.