Tom Alrich's Blog: The SDT Breaks New Ground

In my last post (which you need to read before this one), I pointed out how impressed I am by the CIP Modifications Standard Drafting Team’s new approach to incorporating virtualization into the CIP standards – especially since I regarded their previous approach as being impossible to get approved, even if it could ever be fully drafted. They outlined this new approach in a draft white paper a couple of weeks ago (which isn’t posted but which I can send to you if you email me at the address at the bottom of this post), and in a webinar on June 29. The slides from that webinar are now available here, while the recording is available here.

At the end of that post, I described the two great ideas that are driving this new approach: 1) Do away with the definitions of Cyber Asset and BES Cyber Asset and make BES Cyber System the fundamental concept for applicability of the CIP standards; and 2) Rewrite the prescriptive technical requirements (found in CIP-005, CIP-007 and CIP-010) in a non-prescriptive fashion, so that there is no need to draft – and ballot once, then re-draft, ballot twice, re-re-draft, ballot a third time, etc. – the huge number of detailed changes to these requirements that would be needed to accommodate virtualization.

I concluded the post by saying that, for the first idea, I saw a few potential problems but no show-stoppers, while for the second I did see some potential show-stoppers, “although none of them wouldn’t be surmountable by the SDT.” I now wish to modify those statements, since a) I see potential show-stoppers for both ideas; and b) I’m skeptical about whether the SDT can solve them, since the only real solution will be rewriting NERC’s Compliance Monitoring and Enforcement Plan (CMEP) so that there are different rules for CIP than for the other NERC standards (however, there is a half-solution that doesn’t require CMEP changes. I will discuss that as well, although not in this post).

That’s the bad news. The good news is that i) the show-stoppers mostly come down to one problem: How will compliance with these requirements be audited? and ii) the changes to CMEP are needed anyway. Furthermore, the need for these changes will only continue to grow as new and modified CIP requirements and standards are added. Maybe virtualization will be the catalyst for making these changes (which, of course, will require a big effort by NERC, FERC, the Regions, and the NERC entities – no hiding that fact).

Let’s focus on Great Idea Number 2 (I hope to discuss Great Idea Number 1 in a future post). In the webinar and white paper, the SDT pointed to CIP-007 R3 as an example of a non-prescriptive requirement that could be a model for rewriting prescriptive technical requirements like CIP-005 R1 and CIP-007 R2. Part 3.1 of CIP-007 R3 reads, in its entirety, “Deploy method(s) to deter, detect, or prevent malicious code.” Doesn’t this sound simple? You couldn’t get more non-prescriptive than this!

Yes it is simple, but the question is: How will it be audited without requiring auditor judgment – and a lot of it at that? For example, you need to “deploy methods” to do one of three things: “deter, detect or prevent” malicious code. But how good do these methods have to be? As Lew Folkerth pointed out in a presentation that I wrote about in a post last year, it should always be assumed that the methods need to be “effective”. Saying a certain chant every morning to protect against malware isn’t an effective method. If you tell the auditor that’s your method, you will probably get a PNC. And I won’t have a huge amount of sympathy for you.

But beyond that, how can this requirement be audited? Let’s suppose an entity deploys solely detective methods, meaning they aren’t likely to ever find out about the presence of a virus on their network until one or more devices have already been infected. Do you think that, in the case where your network consists mostly of Windows or Linux machines for which antivirus software is a perfectly workable option, an auditor isn’t going to ask why you’re satisfied with just detecting malware? And if you tell him to go take a hike, since you’re clearly “deterring, detecting or preventing” malware as the requirement says, do you think that will be the end of the story? Don’t you think he’ll say something like “Well, the risks posed by malware are such that merely detecting it, when there’s a tried and true (and inexpensive) method for preventing it as well, isn’t enough”?

I’m not saying you wouldn’t win this fight – in fact, even if the auditor gave you a PNC for this, I doubt you would ever actually be found in violation, simply because the auditor is going beyond what the language strictly says in this case. If the requirement had been more prescriptive, you could have avoided this problem. For example, the requirement could say that prevention should always be the preferred option, and that deterrence or detection should only be relied on by themselves if there is no good prevention option. But of course, this would then leave an ambiguity about what criteria should be used to establish that there is no “good” prevention option, so that would need to be put into the requirement as well – making it even more prescriptive. And so on.

In other words, what is the best way to spare NERC entities auditing problems like this, which arise when a requirement is written non-prescriptively, like CIP-007-5 R3? The best way is to make the requirement extremely prescriptive! In fact, this requirement’s upstairs neighbor, CIP-007-5[i] R2 (patch management), is prescriptive precisely for this reason. Here’s why…

In CIP v3, the patch management requirement was CIP-007-3 R3, which read “The Responsible Entity…shall establish, document and implement a security patch management program for tracking, evaluating, testing, and installing applicable cyber security software patches for all Cyber Assets within the Electronic Security Perimeter(s).” The two sub-requirements under it (which is what requirement parts were called in those benighted days) read:

R3.1. The Responsible Entity shall document the assessment of security patches and security upgrades for applicability within thirty calendar days of availability of the patches or upgrades.

R3.2. The Responsible Entity shall document the implementation of security patches. In any case where the patch is not installed, the Responsible Entity shall document compensating measure(s) applied to mitigate risk exposure.

While I’m sure this language sounded pretty reasonable when CIP v3 was drafted, in practice there were lots of disputes between NERC entities and auditors – and many violations handed out, I believe – regarding this requirement. The problem wasn’t that it was prescriptive (in v3, pretty much all of the requirements were prescriptive); the problem was that it wasn’t prescriptive enough. If you compare CIP-007-3 R3 to CIP-007-5 R2, you’ll see what I mean by this.

Here’s an example: In CIP-007-3 R3, while the entity was required to assess patches for applicability within 30 days of their availability, they weren’t required to look for them in the first place. This, of course, led to arguments between entities and auditors, who often felt it was a reasonable expectation that the entity should look for patches, not just wait for an email to show up. Of course, since the requirement clearly didn’t mandate looking for new patches, I doubt that any potential violation findings for not doing so were ultimately upheld.

But this experience was very much on the CIP v5 drafting team’s mind as they worked on the CIP v5 patch management requirement (which is of course identical to the v6 one). And how did they fix this problem? They made the requirement more prescriptive by requiring the entity to look for available patches for every piece of software installed on a device within their ESP. Thus, they established once and for all what was required and what wasn’t – and they seem to have greatly reduced or even eliminated interpretation problems for this requirement[ii].

There were other problems with CIP-007-3 R3, which were resolved in a similar fashion:

Since there was no requirement that every applicable patch had to be either installed or the vulnerability mitigated, CIP-007-5 R2.3 added a requirement for this, with a 35-day time limit for doing so.
Since there was no requirement that mitigation activities needed to remain in place until they weren’t needed anymore, this was made explicit in R2.4.

So the CIP patch management requirement, which was already prescriptive in CIP v3, was made much more prescriptive in v5. Why did the drafting team do this? It certainly wasn’t because they didn’t know of any alternative to prescriptive requirements, since several of the other v5 requirements were deliberately made non-prescriptive (of course, CIP-007-5 R3 was the best example of this, but CIP-011-1 R1 and CIP-003-5 R3 are also examples, among others).

No, making the patch management requirement much more prescriptive was a deliberate defensive measure. The drafting team thought that the only way to put auditing problems like those listed above to rest was to establish once and for all what was required and what wasn’t. They obviously decided that the auditing problems that came with having a somewhat prescriptive requirement like CIP-007-3 R3 were such a huge time drain for people involved with CIP compliance that it would be better for them to have to comply with a much more prescriptive requirement that at least wasn’t ambiguous.[iii]
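To illustrate how little judgment a prescriptive requirement leaves to the auditor, here is a minimal Python sketch of auditing a fixed evaluation cycle. It assumes a simplified rule that patches are evaluated at least once every 35 calendar days, and it assumes the evidence is nothing more than a list of evaluation dates. The function names and the record format are hypothetical illustrations; they do not reflect any actual NERC or Regional audit tool or procedure.

```python
from datetime import date, timedelta

# Assumed prescriptive limit (simplified for illustration): patches must be
# evaluated for applicability at least once every 35 calendar days.
MAX_INTERVAL = timedelta(days=35)


def find_missed_intervals(evaluation_dates, audit_start, audit_end):
    """Return (start, end) pairs where more than 35 days passed without an evaluation.

    evaluation_dates: hypothetical evidence format, a list of dates on which
    the entity evaluated patches for applicability.
    audit_start, audit_end: boundaries of the audit period. As a
    simplification, both boundaries are treated as checkpoints, so an empty
    evidence list over a long period is flagged as a missed interval.
    """
    in_period = sorted(d for d in evaluation_dates if audit_start <= d <= audit_end)
    checkpoints = [audit_start] + in_period + [audit_end]
    missed = []
    # Compare each checkpoint with the next one; any gap over the limit is a miss.
    for earlier, later in zip(checkpoints, checkpoints[1:]):
        if later - earlier > MAX_INTERVAL:
            missed.append((earlier, later))
    return missed


def is_compliant(evaluation_dates, audit_start, audit_end):
    """Binary verdict: either every interval was met or it wasn't."""
    return not find_missed_intervals(evaluation_dates, audit_start, audit_end)


if __name__ == "__main__":
    records = [date(2018, 1, 2), date(2018, 2, 5), date(2018, 3, 20)]
    # One missed interval: Feb 5 to Mar 20 (43 days).
    print(find_missed_intervals(records, date(2018, 1, 1), date(2018, 4, 1)))
    print(is_compliant(records, date(2018, 1, 1), date(2018, 4, 1)))  # False
```

Given the same records, any two auditors running a check like this would reach the same answer. No comparable function could be written for “Deploy method(s) to deter, detect, or prevent malicious code”; judging compliance with that requires the auditor’s judgment.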

And how well has this turned out? I haven’t taken a scientific survey, but I have yet to talk with a NERC entity – with High or Medium impact BES Cyber Systems – that doesn’t put CIP-007-6 R2 at or very near the top of the list of current CIP requirements that cause them headaches – and that require huge amounts of resources.[iv] One CIP manager at a medium-sized utility told me that, of all the documentation they generate for all the NERC requirements (not just the CIP requirements) in their control centers every year, at least half of that documentation is due to this one requirement. This is quite telling, especially when you consider there are probably around 150 total NERC requirements.

So I hope I’ve convinced you that the only way to come close to eliminating auditing problems caused by differing interpretations of a requirement between the auditor and the auditee, given the current prescriptive NERC compliance enforcement regime, is to make the requirements as prescriptive as possible. If you’d like to see all of the CIP standards go “back to the future” and be made as prescriptive as possible, raise your hand…I didn’t think I’d see any hands. So now it seems like we’re between a rock (auditing problems when requirements aren’t prescriptive) and a hard place (super-prescriptive requirements being the best way to reduce auditing problems!).

The CIP Modifications SDT would like to make CIP-007 R2 a non-prescriptive requirement, because – as they rightly point out – this and other prescriptive requirements (the example they used in the webinar was CIP-005 R1, although CIP-010 R1 will also be high on most people’s list of prescriptive CIP requirements) make it almost impossible to incorporate virtualization into CIP. To go back to the two forks in the road that they talked about in the webinar (see my previous post for these), the left fork – which is essentially the path they were pursuing until earlier this year and is the one I criticized as unworkable – requires drawing up and balloting some very complicated and certainly controversial definitions like Electronic Security Zone and Centralized Management System. It also requires making a slew of modifications to CIP requirements to accommodate these things, which will be very hard – if not impossible – to accomplish with prescriptive technical requirements still in place (of course, drafting all these changes will be hard, but that will be the easy part. Each change would have to be balloted multiple times and endlessly debated).

Of course, it was crystal clear in the white paper and the webinar that the SDT has already rejected the left fork. But what about the right fork? That’s the one based on the two Great Ideas I listed above. The key to that fork is making BES Cyber System the fundamental building block of CIP compliance, which eliminates a lot of very difficult questions that would be involved with defining something like “virtual Cyber Assets” or “virtual BES Cyber Assets” (as I used to think had to be done before anything else could be accomplished on virtualization). A system can be composed of physical or virtual components, or a mixture of them. It will be up to the entity to decide what their systems are and identify those that meet the new BCS definition (which will incorporate the 15-minute impact criterion from the BES Cyber Asset definition).[v] The SDT believes that with this change, there will probably be no need to change any of the “non-technical” requirements (which includes all CIP-003 through CIP-011 requirements other than those in CIP-005, CIP-007 and CIP-010). But as for the technical requirements, they will all need to be made non-prescriptive, if they aren’t already so.[vi]

What if we tried to make CIP-007-6 R2 non-prescriptive? What would it look like? If we use CIP-007-6 R3 as a model, and keep in mind that the threat patch management mitigates is software vulnerabilities, it might read something like “Deploy methods to protect against software vulnerabilities[vii].”

Let’s do a thought experiment here: Think of an auditor from your NERC Region. What would they say if they came to audit you two years after the above non-prescriptive version of CIP-007 R2 came into effect, and you told them you had decided that patch management was the best way to mitigate the threat of software vulnerabilities? I think they would be glad to hear that, wouldn’t you? Indeed, it’s hard to think of any program to mitigate this threat that wouldn’t include patch management in some way.

But what if you then told them that your patch management program includes checking for new patches for very important software every month but for other software only once a year? And that, once a patch has been deemed applicable, you’re deploying or mitigating it within one month for very important software, but within one year for other software? My guess is the auditor will say “Well, I like the fact that you’re checking for and applying new patches for important systems every month, but you’re only taking those steps for other systems once a year. I want you to cut the interval for other software down to about three months.”

And what will be your reply to him? Will you simply point out that what he’s requesting goes way beyond the language of the requirement? In fact, you might point out to him that there’s nothing in the language to prevent your patching even important software only once a year (or every ten years, for that matter). What is going to give here? You’re right that the requirement language doesn’t require that you do what he is saying; and he’s right when he says it would be hard to find any cybersecurity expert who would assert that patching software once a year was OK. You and the auditor will be at an impasse.

Assuming this were all to come to pass, who’s at fault here?

You, since you’re clearly not following good cybersecurity practices?
The auditor, since he’s trying to interpret the patch management requirement to say more than it does?
The CIP Modifications drafting team, for getting rid of the prescriptivity of the patch management requirement, causing this sort of impasse to be possible?

Let’s state the problem here. It isn’t that ambiguity or loose wording is causing auditors and auditees to take different “interpretations” of what the words of a requirement mean, which is how most people would state the problem. If that were the problem, then it could only be solved by doing one of the following:

Doing a Request for Interpretation, which only works in a case where the wording of the requirement or definition isn’t ambiguous, but still needs to be “teased out” to deal with a particular situation. But this requires going through a process with multiple drafts and ballots, followed by FERC approval (which isn’t always forthcoming, as the then-Interpretations Drafting Team found out in 2013, when FERC remanded two Interpretations they’d worked on for a couple of years. That’s why there is no longer a NERC CIP IDT).
Providing some sort of mandatory “guidance” that will be more or less binding on auditors and NERC entities. This was first tried with the CANs and CARs under CIP v3, which were almost all retracted. And when CIP v5 was approved by FERC and entities started realizing there were a lot of ambiguities, NERC tried to do this using a whole slew of vehicles – FAQs, RSAWs, the CIP v5 transition study, Lessons Learned, Memoranda. Almost all of these were retracted as well, simply because NERC isn’t allowed to provide mandatory interpretations.
Revising the ambiguous standards or definitions, which of course requires a new drafting team, ballots, etc. What NERC finally did in 2015 – when all the guidance attempts were clearly not working – was refer a few of the thorniest issues with CIP v5 to a new SDT (although they left out many other issues). However, the SDT has so far not made any progress on these issues, since they had to work on more pressing FERC mandates such as CIP-003-7. But their new virtualization proposal will very neatly “address” four of the five items that NERC added to their mandate. Those four items all deal with the definitions of Cyber Asset and BES Cyber Asset, and since the SDT is proposing to just eliminate those two definitions altogether, they are killing four birds with one stone. Pretty neat! However, there’s no reason to suspect that all CIP ambiguities can be dealt with by simply eliminating the requirements or definitions in question. Would it were so!

Which is all to say that if ambiguous wording is really the problem, it’s an insoluble one; but it isn’t the problem. I used to believe that the problems with CIP v5 were due to ambiguity, but now I’ve decided they’re much more fundamental. The problem is that cyber security isn’t electrical engineering. NERC was founded by engineers trying to solve problems caused by lack of standardization among the different power market participants (NERC was founded in the wake of the Northeast Blackout of 1965, the proximate cause of which was an improper relay setting).

The solution to these problems was standards requiring very specific actions that could be very accurately audited: You either set your relays according to the standard or you didn’t.[viii] In other words, the only kind of requirements that make sense in the 693 world (that is, the non-CIP standards approved by FERC in Order 693) are measurable ones.[ix] NERC's auditing program was developed to govern audits of measurable requirements, so it countenances nothing but rigid “either you did it or you didn’t do it” judgments.

However, cybersecurity is a statistical process. If one utility doesn’t patch one system in its control center for two months, it isn’t likely that this alone will lead to a cascading BES outage. If ten utilities – all neighbors – don’t patch any of the servers in their control centers for ten years, this could very well lead to a cascading outage, but even that isn’t certain. So where do you draw the line when you’re drafting a patch management requirement?

The answer is you can’t draw a line. What’s needed is for the entity to have in place a good program for patch management, period. The only way to judge whether a program is good or not is for a) the requirement to be written non-prescriptively; and b) the auditor to be able to exercise good judgment, and to be able to provide advice to the entity so that they can improve their cybersecurity program. But NERC doesn’t allow auditors to do this now.

What the CIP Modifications SDT wants to do, in the second Great Idea that is part of their proposal to deal with virtualization in CIP, is to implement a) but not b). And since b) requires developing a CIP-specific version of NERC's auditing methodology, I can see why they wouldn’t want to tackle this. But this means that simply moving ahead with their Great Ideas won’t work, and the result will be to greatly increase the kinds of conflicts between auditors and auditees that we’ve seen too much of since CIP v5 became enforceable.

However, as I hinted above, there is a partial solution that wouldn’t require rewriting NERC's auditing program, but would probably allow the SDT to move forward with their virtualization proposal. This partial solution has already been implemented at least three times in CIP in the past two years, in what I call “plan-based” requirements.

Since this is already a very long post and since I’m tired, I’m going to break here. I’ll be back in one week (probably not before) with the third post in this series, which will (I think) bring this discussion to its exciting conclusion. But I also want to point out that I could never adequately lay out these issues – of what the fundamental problems are with CIP and how to address them – in my blog, no matter how many posts I devoted to the subject. However, I’m now working on a book, with a co-author, where I am doing just that. It will be a big slog, but I believe I’m about halfway there, if not a little more. That will be my “final” answer to these very important questions.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or challenges like what is discussed in this post – especially on compliance with CIP-013. And if you’re a security vendor to the power industry, TALLC can help you by developing marketing materials, delivering webinars, etc. To discuss any of this, you can email me at the same address.

[i] Of course, all of the CIP-007 requirements are the same under v6, the version currently in effect. But since I’m talking about the CIP v5 drafting process here, I want to refer to the v5 requirements.

[ii] While I’ve heard many complaints about the big burden of CIP-007-6 R2 compliance, I can’t remember a single complaint about an auditor interpreting that requirement differently from how the entity does. As I point out below in this post, this “victory” for NERC entities has come at a huge price, which is that the cost (in money and staff time) required to comply with this requirement is much greater than that of any other current (or past) CIP requirement.

[iii] I want to thank a longtime observer of the CIP drafting teams for pointing this out to me.

[iv] See this post for Lew Folkerth’s observation that – in his opinion (and I don’t know whether this has changed in the last year and a half or not) – any NERC entity that isn’t self-reporting violations of CIP-007 R2 doesn’t understand the requirement; in other words, the requirement is impossible to fully comply with, at least for entities with a lot of cyber assets in scope for CIP.

[v] I have a feeling that some auditors might object to leaving it completely up to the entity to determine what is and isn’t part of a BCS, with no underlying BCA definition. I hope to discuss this further in the fourth post in this series.

[vi] The SDT used the term “objectives-based” a few times in the webinar and the white paper. But a true objectives-based standard has to be measurable, since otherwise there is no way to determine whether or not the entity has met the objective. But there are no measurable objectives in the field of cybersecurity. And please don’t tell me that a 35-day deadline for assessing patches for applicability, or a 24-hour deadline for removing access to BCS, is a cybersecurity objective! The objectives of cybersecurity are mitigating the various cyber threats, like malicious insiders or someone unauthorized taking remote control of an important system. There is no direct way to measure how well an entity has mitigated a threat, since the lack of realization of a threat may simply be due to luck, which could change at any moment.

[vii] The CIP Modifications SDT hinted in their white paper that they’re considering eliminating patch management as a requirement altogether, and replacing it with a requirement for vulnerability management. However, it seems to me that all the problems we’ve just been discussing with the patch management requirement would simply reappear (and perhaps in spades) in a vulnerability management requirement. Vulnerability management is certainly worth further consideration, but simply implementing it doesn’t mitigate auditing problems.

[viii] I realize I’m probably vastly oversimplifying the PRC standards here – I know they aren’t always clear-cut, either. But in the case of the 693 standards, questions about what they mean are real ambiguities, and can be resolved by one of the three methods just discussed in this post. That isn’t the case with CIP questions like what we’re discussing now.

[ix] Prescriptive standards are definitely measurable, but objectives-based ones are as well, as long as the objective itself is measurable. However, as I’ve already said, cybersecurity objectives aren’t measurable.


Monday, July 9, 2018

The SDT Breaks New Ground – Part 2
