Tom Alrich's Blog: April 2016

Thursday, April 28, 2016

Still a Few Bugs in the System

I attended the one-day NERC Technical Conference on CIP Revisions last week in Atlanta. It was an excellent event, and was quite revealing – both for what was said and what wasn’t said. The slides are available here, although don’t be surprised at how short they are; there was lots of discussion that went well beyond the slides. Fortunately, the recording is available here.

The conference was called to discuss the Standards Authorization Request (SAR) for the Standards Drafting Team (SDT) that will draft the next CIP version[i]; essentially, the SAR is the “agenda” for what the SDT will do. I recently wrote a post on what is in the SAR, so I won’t repeat that here. I want to focus now on what’s not in the SAR.

Frankly, my biggest concern going into the meeting was that there would be no discussion allowed on anything not currently in the SAR (since the agenda didn’t have any place for “other business”). However, when I asked Tobias Whitney if the SDT would be limited to just dealing with the SAR items, he said no.

This was the best news I’ve heard in quite some time. I also recently wrote a post on what’s not in the SAR; it lists 29 items and is far from complete (in fact, below I’m going to discuss an item I didn’t include there). I’m sure I could sit in a room with three or four NERC CIP aficionados – essentially, people who don’t have a life – and identify well over 100 items that really should be on the SDT’s agenda.

Of course, there is simply no way the SDT could properly address every possible problem with CIP v5 and v6 in anything less than five or ten years, so I’m certainly not proposing that the agenda be hugely expanded now. But there is a cost associated with not addressing these issues: When CIP v7 rolls all nice and shiny off the assembly line in a few years, these items will be as unresolved as they are today.

This point was brought home to me by a Q&A discussion that started during Scott Mix’s presentation at the conference. It concerned an issue that I know is huge for a number of entities, especially those in Florida: shared substations. The issue is, who is responsible for compliance when

A substation (or a generating station) is owned by more than one NERC entity, or
One entity owns BES Cyber Systems that are located at another entity’s substation?

The problem is that the CIP v5 and v6 standards (with CIP-002 being the most important for this question) don’t provide any clear guidance on this issue; moreover, it’s not in the SAR for v7. And my guess is that, despite the huge importance of this issue, it will not be addressed by the new SDT, because it will take a lot of time they simply don’t have. So, unless there is an RFI on this issue (and I’m not sure it is “RFI-able”), this problem will continue to fester until CIP is completely rewritten in some future version after v7.

What “guidance” is in CIP v5 and v6 now? The closest thing I can see is the first sentence of Section 4.2, which reads “For the purpose of the requirements contained herein, the following Facilities, systems, and equipment owned by each Responsible Entity in 4.1 above are those to which these requirements are applicable.” Does this shed light on the two questions listed above?

Regarding the first question about shared assets, it really doesn’t help. If a substation is jointly owned by two entities, it (usually) isn’t divided physically into one part owned by A and the other by B. So both parties “own” all the BCS in the substation, unless otherwise provided for.

Regarding the second question about BCS owned by one entity but located at another entity’s substation, I pointed out during this discussion at the conference that none of the three terms in “Facilities, systems and equipment” refers to a substation (Facilities is a NERC defined term that refers to the lines, transformers, etc. that may be located at a substation, but not to the substation itself; neither systems nor equipment refers to a substation, either). However, a BCS would definitely be a “system”, so this might be taken to mean that ownership of the BCS is all that matters for compliance.

Nobody jumped up to declare this the final solution to the problem; nor had I expected anyone to do so. If you have to use reasoning like this, you’re relying on a pretty weak reed. This is because the problem really relates to what I have called the Original Sin of CIP v5 (in this post from 2014 - see the section titled “Have an Apple, Adam?”): the fact that CIP-002 R1 and Attachment 1 were written from two opposite points of view, and the contradictions were never resolved. For an explanation of why I say this is at the root of the ownership problem, see this end note.[ii]

At the meeting, Steve Noess of NERC tried to be helpful by pointing out that NERC (and the regions) will look at the wording of any joint operating agreement between the owners of the substation to determine who has compliance responsibility. This might help, assuming there is such an agreement and it does actually assign compliance responsibility for the BES Cyber Systems to a particular party. However, this statement has no legal force, and if an entity were fined on this basis and appealed to the courts, the fine would likely not be upheld.

So this problem really won’t be resolved unless the SDT takes it up; but as I’ve said, I doubt the SDT has time to deal with it, without delaying delivery of v7 by some months. This is just one of many problems that NERC entities are going to have to live with, until there is a complete rewrite of NERC CIP.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] And I sincerely hope NERC will simply admit that this will be CIP version 7, rather than continue to call this “CIP Revisions” – which was of course also the name of the SDT that drafted CIP version 6. Scott Mix’s presentation actually had the words “CIP Version 6” in the title. This is literally the first time I’ve ever seen an admission from NERC that there actually is a version 6. I also hope the v7 SDT doesn’t repeat the v6 SDT’s other mistake, which was to leave three of the v5 standards untouched in v6, so that they kept their v5 version numbers. The fact that entities will have to comply with seven v6 standards and three v5 ones has caused – and continues to cause – a great deal of confusion. This time, let’s call it v7 from the start and make sure every standard is revised, if nothing else just in its version number.

[ii] Technically, no assets – substations, generating stations or control centers – are “in scope” for CIP v5; there are only “assets that contain” High, Medium and Low BCS respectively, and it’s the BCS that are in scope. This means that the fact that a substation may be owned by two entities doesn’t mean anything; what should matter is who owns the BCS at the substation. Of course, if the BCS are jointly owned as well, then the problem remains.

On the other hand, assets clearly are in scope when it comes to Low impacts; there, what is in scope isn’t the BCS but the “asset containing Low impact BCS”. So this means that, if CIP-002-5.1 is going to be consistent (!), “Facilities, systems and equipment” in Section 4.2 must somehow include assets (it would definitely include them if Facilities hadn’t been capitalized. That’s another issue that should be on the SDT’s agenda, and – unlike most of the others – is almost trivial to fix). So assets really are in scope, and therefore who owns a substation does matter. But of course this contradicts what I said in the previous paragraph….

As I’ve said a number of times, the wording of CIP-002 R1 and Attachment 1 is a hopeless mess that can only be fixed by a comprehensive rewrite, not tweaking a few words here and there. But this is another issue that I simply don’t think the SDT will feel it has time for, and I agree with them. It's something the NERC community will simply have to live with until CIP is completely rewritten; I hope that will be in a non-prescriptive, risk-based format.

Monday, April 25, 2016

Battling the Hydra

In January, I wrote a post discussing a press release just put out by a group called the Foundation for Resilient Societies. The essence of this release was that communications between control centers and substations were being run unencrypted over the public Internet, and therefore posed a huge vulnerability for the power grid. Meanwhile, as the group asserted, FERC isn’t ordering NERC to put controls on substation communications, specifically encryption. Therefore, FERC’s inaction means the US grid remains substantially at risk.

I said in the post that my main problem with this was that I don’t know of a single utility that is using the public Internet to communicate with substations, with or without encryption; so the entire argument in the press release is based on a false assumption. I’m sure others made this argument in other venues as well.

Of course, I never thought that post would stop the Foundation from pursuing their campaign, and sure enough on Feb. 22 they filed an administrative Request for Rehearing with FERC, asking FERC to revise Order 822 to require controls on substation communications. But I did still hope that people in the industry would realize these were not serious arguments.

This is why I was surprised to read, in the April Transmission and Distribution World, a short article entitled “Deficient Cybersecurity Standards Leave U.S. Electric Grid at Risk”. This article states in the first paragraph that FERC has recently approved a “NERC cybersecurity standard” that exempts “significant points of vulnerability, including communications between control rooms[i] and grid substations.” In the second and third paragraphs, the article mentions the Foundation’s FERC filing, and states that “Industry standards require encryption of credit card information transmitted over the Internet, but the same is not true for communications between grid control centers and substations. When hackers attacked the Ukrainian power grid, they attacked control centers, service call centers and substations.”

Note: T&D World had reported on the Foundation's filing with FERC in their March issue. Not surprisingly, the statements in the two articles are very similar.

So it seems this myth is like the Hydra, the multi-headed monster of Greek mythology: whenever one head was cut off, two more grew back. In this post, I’m going to take a broader approach than I did the first time, in the hopes of either killing the Hydra (by perhaps poisoning the monster itself) or at least cutting off more heads than can grow back.

As I said, the assertion that any substation communications run unencrypted over the public Internet is almost assuredly completely false. But let’s look at how utilities typically do communicate with their substations, to figure out where a grain or two of truth (or at least plausibility) might be found in this argument.[ii]

First, I’m sure the majority of substation communications are still serial, not routable. I won’t say serial communications are hack-proof, but I will say I have never heard of a successful serial hack (other than one proof of concept by a researcher). So, as we look for vulnerable communications, we need to stick to the minority that is routable.

If not the Internet, what channel carries the routable communications with substations? I believe Frame Relay and SONET are the prevalent technologies here. Neither one of these, of course, touches the public Internet in any way, and I have never heard of a successful attack on communications using either of these technologies.

But let’s say one of these could be hacked. Would this be a threat to the Bulk Electric System? If the substation in question were a distribution one, the answer is probably no. There might be a localized outage (as happened in the Ukraine attack, where there were several such outages because several distribution substations were attacked), but there wouldn’t be a cascading BES outage (as I discussed in this post).

So what if the substation were a transmission (BES) one? For good measure, an important substation that would be Medium impact under CIP v5? First, could a hack of one substation lead to a hack of lots of others? The answer to that is almost certainly no. Contrary to what some people seem to think, substations aren’t connected to some vast flat network, in which an attack on one can lead to easy penetration of many others. Communications between control centers and substations are very much hub-and-spoke, not meshed. Were the control center to be compromised, that would be another story, and for that reason the most stringent controls in NERC CIP are applied there (and FERC has just ordered controls on communications between control centers).

Then could an attack on a single BES substation cause a cascading outage through direct electrical effects? Not by itself, I’ve heard repeatedly; there are too many other controls in place to prevent this from happening. This means that an attack on the communications between a control center and a single substation can’t cause a cascading BES outage, either through cyber or physical means. Of course, were a hacker to attack multiple BES substations simultaneously, that in itself could conceivably cause a cascading outage. But that brings us back to the question of how that could possibly be done, given that there isn’t any obvious way to hack into a single substation, let alone a number of them simultaneously.

So it has to be said that the possibility of a successful cyberattack on the communications between a control center and a substation (transmission or distribution) is quite low – especially an attack that could cause a cascading outage (the Ukraine attacks did cause a substantial loss of load, but that was all restored within four to six hours[iii]).

However, note that I’m not saying the probability of success is zero; sooner or later I’m sure even serial or Frame Relay communications could be compromised. So, if FERC were to order controls on substation communications, would that be worthwhile? After all, there would be a small increase in security.

In the case of substations, that small increase in security might well be offset, or even outweighed, by a marked decrease in reliability. This is because communications between a control center and a substation are extremely sensitive to latency. If a circuit breaker needs to be opened or closed, this needs to be done with no delay at all – and barring that, within as few cycles as possible. And encryption always imposes some small amount of latency.
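To give a rough sense of scale for “a few cycles”, here is a minimal sketch that converts an added delay into power-system cycles. It assumes a 60 Hz system (the North American grid frequency); the delay values in the loop are purely hypothetical and are not measurements of any encryption product or substation link.

```python
# Rough conversion of added communications latency into power-system cycles.
# Assumption: 60 Hz nominal frequency (the North American grid).
# The delays in the usage loop are hypothetical, not measured values.

NOMINAL_HZ = 60.0
MS_PER_CYCLE = 1000.0 / NOMINAL_HZ  # roughly 16.67 ms per cycle at 60 Hz


def latency_in_cycles(added_latency_ms: float, hz: float = NOMINAL_HZ) -> float:
    """Express an added delay (in milliseconds) as a number of cycles at frequency hz."""
    return added_latency_ms * hz / 1000.0


if __name__ == "__main__":
    for ms in (1.0, 5.0, 20.0, 50.0):  # hypothetical added delays
        print(f"{ms:5.1f} ms -> {latency_in_cycles(ms):.2f} cycles")
```

On these assumptions, one cycle is about 16.7 ms, so an extra 50 ms of delay would amount to three full cycles.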

Note I’m not saying that encryption would never be possible for substation communications, but it is certainly true it shouldn’t be ordered without making sure it won’t literally cause more harm than good (and note this argument doesn’t apply to control center to control center communications, since that is usually just exchange of information. Any decision on what needs to be done as a result of the information will probably be made by a human, for whom a few cycles won’t make much difference either way).

But let’s now pretend the latency problem doesn’t exist; would it then be a good idea to impose cybersecurity controls on substation communications? After all, they will certainly provide some small increase in security.

I have two answers to this question: one in the context of the current prescriptive NERC CIP standards, the other under the assumption that sooner or later they will be replaced by risk-based standards. In the case of the current NERC CIP, these controls should not be prescribed. Whatever small benefit they might provide would be far outweighed by a huge increase in compliance costs for NERC entities.

So how about under a risk-based approach? That is, suppose we had a set of CIP standards that consisted of 1) a requirement to get a comprehensive threat and vulnerability assessment for the entire enterprise (not just the OT systems) and 2) a requirement to develop and implement a cybersecurity improvement plan, based on the results of this assessment?[iv] The standards would come with some sort of guide to areas that need to be examined in the assessment; one of those might be the question whether encrypting routable substation communications would produce a net benefit, in the case of that entity.[v] If it did, the entity would probably need to implement that encryption, unless there were other controls whose net cybersecurity benefits outweighed this.

To summarize, I continue to see no real merit in the Foundation for Resilient Societies’ argument that FERC should order encryption of substation communications. However, I strongly suspect I haven’t even given this Hydra a glancing blow, let alone killed it. I’m no Hercules.


[i] Both this article and the original Foundation press release mistakenly use the phrase “control rooms”. What communicates with substations is a control center, not a control room, which typically controls a particular plant or substation.

[ii] As you’ll see, I make about four or five fairly implausible assumptions below, in order to make the Foundation’s argument at least have some validity. I kind of wish I didn’t have to make their argument for them!

[iii] I found this out in the FBI/ICS-CERT briefing on the Ukraine attack in Chicago this morning. Note that the Ukraine attack certainly wasn’t on substation communications. The communications themselves were already compromised because the attackers had complete freedom to move around the IT network, and they took control of the HMIs with remote access to the substation relays.

[iv] Of course, this is a big oversimplification.

[v] One of the big benefits of the risk-based approach is that you no longer have to decide which controls are worth imposing when every control (requirement) applies to every NERC entity subject to CIP, as is the case today. The controls required in the risk-based approach are those that produce the greatest cybersecurity benefit for that entity, given its role as an actor on the grid; in other words, the cyber controls that will have the greatest positive impact on reliability.

Friday, April 22, 2016

On Results-Based Requirements

Warning: This post does not comply with this blog’s usual standards for unrelenting negativity. Anyone seeking a truly negative experience is referred to the current Presidential campaigns.

It seems that half my posts start out with thanks either to the EnergySec newsletter or to Lew Folkerth, Principal Reliability Consultant with RF. This post breaks new ground in that I owe thanks both to Lew for an inspiring new article, and to the EnergySec newsletter for pointing it out to me in the first place.

Lew’s article is in his regular “Lighthouse” column in RF’s March/April newsletter. The article discusses CIP-007-6 R3, Malicious Code Prevention. Lew starts by pointing out that this is a very minimalist requirement: “Deploy method(s) to deter, detect or prevent malicious code.” Lew admits that, when he first read this requirement, he thought it was poorly written. Indeed, when you look at the equivalent requirement in CIP version 3, CIP-007-3 R4, it seems to be missing something. The latter requirement mandates that the entity implement “anti-virus and malware prevention tools” on every device in scope, and that they have a process for updating signatures.

Of course, the big issue with the v3 requirement was that many devices – routers and switches, for one – don’t provide any way to install or run antivirus software. In this case, the entity was required to “document compensating measure(s) applied to mitigate risk exposure.” They were also required to file a TFE, making this probably the number one cause of frustration with the TFE process (Why should you have to document that you can’t install antivirus on a Cisco switch?).[i]

After mentioning his initial reservations about CIP-007-6 R3, Lew points out that the v5 standards were intended to be “results-based and non-prescriptive”. I have heard this in many meetings as well. I have always looked on it as an attempt by the speaker to show he or she has a sense of humor, since I regard v5 (as well as the previous CIP versions) as prescriptive to a fault, and anything but results-based.

Look at CIP-007-6 R2 Patch Management, which requires (among other things) that all new patches be evaluated within exactly 35 days – meaning an entity is subject to fines for every extra day it takes for this, for every month that this happens and for every in-scope device affected. What is the result that this requirement is aiming at? It’s “evaluation of new patches within 35 days”. I don’t think I’ve ever read any cyber security book or article that says that 35 days is an absolute limit for this activity, due to the laws of physics or of information science.
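To show how narrow that “result” is, here is a minimal, hypothetical sketch of the only thing the timing part of the requirement measures: how many days an evaluation landed after the 35-day window. It assumes the window is counted as 35 calendar days from the previous evaluation; how a region actually counts it may differ, and the dates are made up. This is an illustration, not a compliance tool.

```python
from datetime import date, timedelta

# Hypothetical illustration only, not a compliance tool.
# Assumption: each patch evaluation must follow the previous one within
# 35 calendar days; we count how many days a given evaluation ran past that.
EVAL_WINDOW_DAYS = 35


def days_past_window(last_eval: date, next_eval: date,
                     window_days: int = EVAL_WINDOW_DAYS) -> int:
    """Return how many days next_eval fell after the window closed; 0 if on time."""
    deadline = last_eval + timedelta(days=window_days)
    return max(0, (next_eval - deadline).days)


if __name__ == "__main__":
    last = date(2016, 3, 1)  # hypothetical previous evaluation date
    for nxt in (date(2016, 4, 5), date(2016, 4, 8)):
        print(nxt, "days past window:", days_past_window(last, nxt))
```

Nothing in this arithmetic says anything about whether the patches themselves reduced risk, which is exactly the point about the requirement not being results-based.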

However, Lew clearly believes that CIP-007 R3 is one CIP v5 requirement (at least) that actually is results based and non-prescriptive; and I certainly agree with him on that. He goes on to lay out a really good methodology for complying with the requirement, and for demonstrating this to the auditors. I’ll let you read the article to see that.

The biggest takeaway for me from Lew’s article was that it would certainly be possible to make all of CIP truly non-prescriptive and results-based (although it would take more than simply rewriting each of the current CIP standards so that it resembled CIP-007 R3. There would have to be broader changes as well). Unfortunately, I think it will be a while before this happens. I attended NERC’s Technical Conference on the CIP v7 SAR in Atlanta this week, and I can promise there is currently no movement to make such a change in v7. And if you’ll look at my two previous posts, I’m not recommending such a radical change now, either (although I’ve tried to make it clear that I think v7 should be the last prescriptive version of CIP. Future versions should move to a risk-based approach).

But interestingly enough, I think that CIP v5 and v6 - and v7 when it comes into effect in three or four years – will end up as fairly non-prescriptive and results-based anyway. This is a corollary to my frequent assertion that v5 (and v6) won’t be enforceable in the strict sense. It’s becoming clear to me that CIP v5 audits will be very different from v3 ones, for this very reason.

Why do I think audits will be different? Because the time that an auditor can spend on a particular audit is limited, and they are now being explicitly directed not only to look for technical violations but to look for “opportunities for improvement” in cyber security practices. Of course, these “opportunities” will not result in potential violations (since they’re focused on practices not specifically required by CIP), but in “areas of concern” in the audit report. If you’re a CIP auditor (who is also a cyber security professional) and you’re auditing an entity that clearly has been trying to do the right thing for both CIP and cyber security, are you going to devote your limited time to finding cases where someone took more than 15 months to have their annual training (especially considering that, if this finding results in a violation and fine, it would very likely not be upheld if challenged in the courts)? Or are you going to – for example – look for ways the entity can improve their level of cyber security in general, going beyond what is strictly required by CIP?

I contend that auditors will do the latter. And frankly, I think that’s great. For one thing, the auditors will be happy as clams that they can be essentially cyber security consultants, not misguided folks out to ding well-meaning entities on purely technical violations. For another, audits will no longer be a dreaded exercise in “gotcha”, but an opportunity for entity staff to learn about how they can better secure their OT environment.[ii]

I know you don’t turn to this blog expecting to see a lot of optimism about CIP v5; and I admit I didn’t expect this post to end optimistically when I started it a couple of hours ago. But there you have it: I’m predicting that – for entities who are honestly trying to do the right thing for both cyber security and CIP compliance – CIP audits will turn into a positive learning experience. All courtesy of the fact that, in my personal opinion, wording problems make CIP v5 unenforceable.


[i] The words “where technically feasible” weren’t in the requirement, so the fact that a TFE was required – even though compensating measures were also required – seems to indicate that the drafting team for v3 didn’t believe at the time that it would work to simply have entities figure out compensating measures on their own, and auditors judge those measures after they had been implemented. In a TFE, the entity has to submit proposed compensating measures to their region before they implement them, and get them approved. Essentially, you can look at the v5 requirement as requiring the same things as the v3 one, but now trusting the entities to come up with good “compensating measures”, and the auditors to adequately judge those after the fact.

[ii] I do want to point out that I’m not advocating that entities stop trying to comply with the technical requirements of NERC CIP! They still need to do their best, but my guess is auditors won’t spend most of their time looking for technical violations and writing them up as PVs when they find them. And I think the tone of the audits will change. But an entity that isn’t really trying very hard on CIP will be treated quite differently.

Monday, April 18, 2016

What Isn’t in the SAR

In anticipation of the NERC Technical Conference on April 19 on the Standards Authorization Request for the next CIP version (which I insist on calling CIP v7, although I’m not sure NERC will), I have already written two posts: here and here. In the first post, I discussed what NERC included in the SAR, which of course will provide the “agenda” for the Standards Drafting Team as they embark on their multi-year effort. I promised a post on items that are not in the SAR, but perhaps should be. This is that post.

As my post from yesterday points out, I group these items into two categories: problems with the current standards that can be addressed without re-thinking the fundamental concepts of CIP v5[i] (especially CIP-002-5.1) and problems that require this re-thinking in order to be solved. I suggested in that post that it is unlikely the SDT will want to address any of the latter problems, since having to engage in a fundamental discussion will probably add 6-12 months to the whole standards development process. Nevertheless, I will list both types of problems below, starting with the first type.

Note that I won’t list here any of the items that are already in the SAR. Also note that I’m sure there are lots more of these problems than what is listed below – especially for the “non-fundamental” changes. I got a lot of these from regional meetings and webinars, and I certainly didn’t attend more than a small fraction of those.

Problems that Don’t Require Fundamental Changes

1. Section 4.2.2: This section states that “All BES Facilities” are in scope for CIP v5. The only problem with that is that Facilities is a NERC-defined term, and it pretty clearly doesn’t apply to Control Centers, substations, and at least multi-unit generating stations. So are we correct to assume that none of these are in scope? If so, only a few things like UVLS and UFLS will be in scope. I’m sure most entities would be quite happy if that happened, but I’m taking a wild guess that this was not the intention of the SDT. Fortunately, this problem can be fixed very easily: use a lower-case f.

2. Cyber Assets Located at Assets Owned by Other Utilities: The CIP-002-5.1 RSAW requires the entity to identify and classify BES Cyber Systems that it owns, that are located at another entity’s assets. Unless the other entity has taken responsibility for compliance for these BCS, their owner needs to comply for them, as if they were located at one of its own assets. There is only one problem with this: Section 4.2 says that facilities “owned” by an entity to which CIP v5 applies are in scope. In other words, it doesn’t seem NERC has the authority, by the current wording, to bring these BCS at non-owned facilities (little “f” of course!) into scope. Either they should remove them from the RSAW, or change the wording of 4.2.

3. List of BES assets: The entity needs to develop a list of its BES assets that fall into one of the six types listed in CIP-002 R1, but this is never stated in R1. Since this is the list of locations at which BCS can be located, it is important for the entity to have this list to show it considered all required locations. See this post.

4. BES Cyber System (BCS) Identification Methodology: The entity needs to develop and document a methodology for identifying BES Cyber Systems. However, nowhere in R1 is the entity required to identify BCS (they are required to classify them in R1.1-R1.3, although the word “Identify” is misused there to mean “Classify”). See this post.

5. Phone Systems: One region has been insisting that VOIP systems in Control Centers should be identified as BCS; as far as I know, the other regions don’t agree with this – and I certainly don’t. A NERC FAQ stated that “support systems” wouldn’t be BCAs, but in the absence of a definition of support system this doesn’t help clarify the issue. This issue could either be resolved in a fundamental way (see this post and this one), or a non-fundamental way – by perhaps simply stating in the BCA definition that phone systems aren’t covered.[ii]

6. HVAC Systems: Similarly, some have wondered whether HVAC systems should be in scope for CIP v5. I don’t think these should be in scope, either, although I don’t think this requires some fundamental change.

7. Grouping BCS by the Requirement: Nothing in CIP-002 (or elsewhere) says an entity must group its BCAs into BCS in only one way, so it is permitted to group them differently for each requirement if desired. There should be a statement in R1 (or the BCS definition) that this is permissible, rather than having to rely on the fact that it isn’t explicitly prohibited. See this post.

8. “Assets Containing a Low Impact BCS”: This wording in R1.3 is of course in direct contradiction with the parenthetical expression immediately following, which says “a discrete list of low impact BES Cyber Systems is not required”. In practice, entities and auditors alike seem to understand that what this really means is “low impact asset”. However, IMHO it is contradictions like this that make CIP v5 unenforceable in the strict sense. This is one of those items that can be addressed either through fundamental changes (where the whole relationship between assets, Facilities and BCS will be straightened out) or otherwise. In the non-fundamental approach, it would probably be OK to simply state that, if there are any Cyber Assets performing control functions located at an asset that does not otherwise contain High or Medium BCS, it will be up to the entity to demonstrate they are not BCAs. Getting down to the fundamentals would be a much better way to solve this as well as other problems, though.

9. “Assets Containing Low BCS”: I’m now going to seem to contradict what I just said. There can be assets that aren’t Low impact that do contain Low BCS. For example, a generating station that meets criterion 2.1 can have Low and/or Medium BCS. If the above problem is resolved in the non-fundamental way, then there will need to be a separate list of assets that aren’t Low Impact but nevertheless contain Low BCS. Again, it would be much better to get to the fundamentals and fix this for good.

10. Attachment 1, Criterion 2.6: This criterion reads “Generation….and Transmission Facilities..” Since “Facilities” is only applied to “Transmission”, this means that an entire plant is Medium impact, even if only one of its units is designated as critical to IROLs, since a unit is a Facility while an entire plant is not (on the other hand, in a substation only particular lines, transformers or other Facilities are covered by this criterion, not the whole substation). However, in the Guidance the SDT did refer to Generation Facilities, implying they meant to cover just a unit, not an entire plant. The best solution is to change the criterion to say “Generation Facilities” not “Generation”.

11. CIP-003-6 R1 – Cyber Security Policy: R1 mandates an annual “review” of cyber security policies, but doesn’t require, if the review identifies changes that need to be made, that these changes be implemented. This should be corrected.[iii]

12. CIP-004-6 R2.1 – Training: There has been some confusion about whether or not R2.1 requires each of the nine training topics to be included in the training for each role. While I think the wording is fairly clear, it obviously needs to be made even clearer.

13. External Routable Connectivity: The current SAR has Low Impact ERC (LERC) on the agenda, but not ERC itself. The confusion about LERC is a mirror image of the confusion about ERC, so the answer to the one should lead to the answer to the other as well. I have stated a couple of times that I no longer believe there can be a technical definition of LERC or ERC – at least, one that could be understood by mere mortals without a couple PhDs in data communications and computer science, along with an EE. The only way I see this working is if the SDT just comes up with a set of use cases, e.g. “In this situation, there is ERC. In that situation, there is no ERC.”

14. PCAs, PACs, EAPs and EACMS: Protected Cyber Asset, Physical Access Control System, Electronic Access Point and Electronic Access Control and Monitoring System are all defined, but the entity is never told to identify them. This should be corrected.

15. EAPs: The definition of EAP refers to an interface on a device like a router or firewall, not to the device itself. However, Cyber Assets that contain an EAP interface need to be designated as EACMS (I learned this from an article by Lew Folkerth from the April-May 2015 RF newsletter). This needs to be stated in the requirements, or at least in the Guidance.

16. Scripts: There has been a lot of debate regarding the status of custom scripts for CIP-005-5 R2. A FAQ indicated that these need to be treated as Interactive Remote Access, meaning encryption and two-factor authentication are required, and there needs to be an Intermediate System (which might be running the script itself). This should be stated in the requirement.

17. “Component-level” Requirements: As discussed in this post, the majority of requirements in CIP-003 through CIP-011 apply on the level of components of a BCS, not the BCS itself. However, this is never stated in the Applicability section. I believe this could be remedied fairly easily by stating “Components of BCS” rather than “BCS” in the Applicability section for all the requirements where this is the case.

18. “Nonprogrammable communications components located inside both a PSP and an ESP”: This phrase shows up only in the Applicability section of CIP-007-6 R1.2 (for Highs and Mediums), and nowhere else in CIP v5. The Guidance makes clear this applies to dumb hubs, patch panels, etc. (therefore, unnecessary physical ports need to be protected in these devices, as they need to be protected for BCS). Of course, these devices are not Cyber Assets at all, which makes R1.2 the only v5 requirement that applies to non-Cyber Assets (which of course goes against the whole purpose of CIP-002 through -011). Since this is really a physical security requirement, it probably belongs in CIP-006.

19. CIP-007 R2: “Software” is never mentioned in R2, but of course that is the whole point of the patching requirement. Every piece of software on every component of each BCS needs to be inventoried and the entire patch management process – identification of a source for patches, evaluation for applicability, etc. – needs to be applied to it. This should be made clear in the guidance, if not the requirement (this information came from Lew Folkerth’s presentation at RFC’s April 2015 CIP v5 workshop).

20. Firmware: Firmware patches need to be tracked and treated just like pure “software” patches in CIP-007 R2. While this is allegedly implied by the current wording, a lot of entities didn’t realize that. It needs to be made explicit in the requirement.

21. “Security patches”: The Guidance for CIP-007-6 R2 points out that some patches are exclusively “functionality related” and therefore are not “security patches”. Security patches are the only ones the entity needs to consider for application. However, what about functionality upgrades (typically for firmware) that improve security by, for example, allowing complex passwords? The Guidance gives contradictory information on these. See this post.

22. CIP-007-6 R4.2 – Alerting: The Guidance and Technical Basis implies that technical means are needed for real time alerting. However, the requirement allows for procedural means as well. This needs to be reconciled (from WECC’s Advanced CIP Training in Sept. 2015).

23. CIP-010-2 R1.1.3 – “custom software”: “Custom software” installed on a BES Cyber System has to be tracked under the configuration management program. But what is custom software? Does it include every three-line script that has ever been put in place? And what about scripts that are part of a third-party software package? This needs to be fixed, perhaps with a definition of “software” or “script”.

Problems that Require Fundamental Changes

24. Section 4.2: This section states that “Facilities, systems and equipment” owned by entities subject to v5 (under section 4.1) are in scope. I have never been able to figure out what this means. Does it mean an entity has to evaluate every PC owned by the Accounting department, and every monkey wrench owned by Operations, to determine whether they’re in scope? I think something like “Facilities, assets and control systems” would be much better. But this needs to be done in the context of helping CIP-002 decide what it wants to be when it grows up – so it’s a fundamental issue.

25. Breaking up CIP-002 R1: In CIP versions 1-4, describing how to identify and classify Critical Cyber Assets required three or four requirements. In CIP v5, the required steps (none of which are explicitly identified in the first place) are all compressed into R1, and this is one of the big reasons for the confusion regarding R1. I think there should be separate requirements, perhaps reading a) Identify assets that fit into one of the six types in R1 (these are the locations at which BCS can be found, not the assets that get “run through” the criteria in Attachment 1. See this post); b) At assets/Facilities meeting Medium or High criteria[iv], identify BES Cyber Systems and classify them according to the rating of the asset/Facility[v]; and c) List BES assets that don’t “meet” one of the High or Medium criteria, and that have control systems, as Low impact.[vi]

26. “Affect the Reliable Operation of the BES”: At the heart of the definition of BES Cyber Asset are the words “affect the reliable operation of the BES”; yet there is no explicit guidance provided on how this will be determined, although the discussion of the BROS in the CIP-002 Guidance and Technical Basis does provide some help. If “affect the reliable operation of the BES” means “aids in fulfilling one or more BROS”, this needs to be made explicit in the definition.

27. “Facilities”: Criteria 2.3 to 2.8 all apply, at least in part, to Facilities, a NERC-defined term. In the case of 2.3, this means a unit in a generating plant. In 2.4 to 2.8, it means a line, bus, transformer, etc. typically found at a substation. This means that – with the exception of what seems to be an error in criterion 2.6, as noted above (item 10) – none of these criteria apply to assets, either generating plants or substations. Of course, the majority of NERC entities and – probably – auditors understand these criteria as applying to assets. In general, there is nothing wrong with doing this, except that an entity might end up considering more BCS as being Medium impact than are actually required to be. But the difference needs to be made clear to the entities, possibly in the Guidance. This needs to point out that an entity can classify its BCS at a substation in two ways: either designate all BCS at a substation that "meets" one of criteria 2.4 to 2.8 as Medium impact, or separate out the Medium from the Low impact Facilities at the substation and classify their associated BCS as Medium or Low respectively.[vii]

28. Transmission vs. Distribution Facilities: It is well understood that in CIP-002 Cyber Assets associated with purely Distribution Facilities are not in scope for v5, even though the substation itself may be a Medium impact one. But if entities are going to officially be given the option of treating an entire substation as Medium impact, they need to be told how to separate out the Transmission from the Distribution systems. This presumably would require determining which Facilities fall under the BES definition and which don’t, although even that isn’t necessarily easy, as discussed in this post. In any case, this needs to be explicitly stated, at least in the Guidance.

29. “Associated with”: In order to classify a BES Cyber System as Medium or Low impact in a substation, the entity needs to know which substation or Facility it is "associated with", since Section 2 of Attachment 1 says that Medium BCS are those that are “associated with” any asset/Facility that meets one or more of the Medium criteria. But "associated with" is not defined (note that I’m somewhat torn between saying this is a change of the first vs. the second type. It could in theory be possible to simply define “associated with” without requiring other fundamental changes – but a fundamental change might eliminate the need for “associated with” altogether. This item should really be listed in both categories).

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] In this post (as in most others), when I refer to CIP v5 I’m implicitly referring to v6 as well.

[ii] There are of course other means of resolving this dispute as well, such as mortal combat with rapiers.

[iii] This point was made at an EnergySec webinar last year.

[iv] Of course, as currently written the criteria don’t officially apply to assets or Facilities, but to BES Cyber Systems, even though everyone interprets them as applying to assets and Facilities. Again, this is one of the problems that keeps CIP v5 from ever being enforceable, even though it may not be causing day-to-day problems.

[v] Naturally, there is at least one exception to this. In criterion 2.1 generating plants, some or even all of the BCS may be Low impact, even though the plant is technically “Medium”.

[vi] Unless the entity can demonstrate that none of these control systems would be BCS.

[vii] Understanding that “Facilities” isn’t a synonym for “assets” is important in other cases, as well. For example, the argument NERC used in their Lesson Learned on far-end relays relies on the reader understanding this distinction. See this post.

Sunday, April 17, 2016

Should CIP v5 and v6 be Rewritten?

I recently wrote a post discussing what is in NERC’s recent Standards Authorization Request (SAR) for the next CIP version (which I certainly hope will be called v7; no more talk of “revisions” to v5 or v6). I said I would soon write a post on what isn’t in the SAR, but perhaps should be. That is, I’ll list changes that could be made to CIP v5 and v6, even though these aren’t called out in the SAR. I hope to have that post out right after this one, ideally in advance of NERC’s CIP Technical Conference in Atlanta on April 19, which I am looking forward to attending.

However, I recently realized that, before I do that post, I need to address the question of whether the team drafting the new version should go back to fundamental principles and “rewrite” CIP v5. This might seem like an odd question, but a rewrite is something I was advocating until five or six months ago, and I have heard that at least one large NERC entity is currently pushing this very course of action.

Why would CIP v5 need to be rewritten? That’s an easy question for me to answer. It’s because there are two types of problems with CIP v5[i]:

1. Problems that can be addressed without rethinking any of the fundamental concepts in v5. For example, the term “custom software” in CIP-010-2 R1 isn’t defined, and has caused a lot of confusion for NERC entities. This problem can be fixed by coming up with a definition.
2. Problems that can’t be addressed other than by opening up the fundamental concepts in v5. For example, the fact that CIP-002-5.1 R1 and Attachment 1 were written simultaneously from two different points of view[ii] – and that these were never reconciled – leads to confusion in a number of areas. One example of this was the big controversy over the far-end relay issue, which was mostly due to the widespread (and mistaken) belief that CIP-002 Attachment 1 classifies assets (i.e. “big iron” – control centers, substations, etc.) as High, Medium or Low impact.[iii]

From everything I have seen so far, including the SAR, the Standards Drafting Team is only being tasked with addressing problems of the first type, not the second type. And I can certainly understand this; going back to debate fundamental concepts like the asset identification and classification process could easily add six to twelve months to the SDT’s work. Since I’m currently estimating that – even without this fundamental debate – it will be at a minimum three years, and more likely four or even five years, before CIP v7 comes into effect, this is no small consideration.

But what is lost by not addressing the fundamental problems? For one, these problems are creating confusion, just like the non-fundamental ones are; getting them resolved will make it much easier for entities to comply with CIP v7 (which will otherwise include the same contradictory wording found in CIP-002-5.1) and for auditors to audit them. This was evident to me at the RF CIP workshop last week in Columbus, Ohio, where there were discussions about some fundamental questions that should have been settled three years ago. They shouldn’t still be causes of confusion now - less than three months before the compliance date.[iv]

But there is a bigger issue here: I have said previously that CIP v5 (and v6) will never be enforceable in the strict sense, unless it is rewritten to address these fundamental issues. And what do I mean by enforceability in the “strict sense”? I mean that, should a violation of CIP v5 be challenged in the civil courts, I simply don’t see how the violation (and its associated fine) could be upheld. At that point, CIP v5 and v6 (and v7, if the SDT doesn’t fundamentally rewrite CIP-002) would turn into nice guidelines to follow, not enforceable standards. What would happen at that point is anybody’s guess.[v]

Up until five or six months ago, I was advocating that CIP-002 be rewritten from scratch. However, some of you may have noticed that I have changed my tune now: I now think that the fundamental problem with NERC CIP is that it is a set of prescriptive standards, and prescriptive standards don’t work for cyber security – risk-based standards work. For that reason, the fact that rewriting CIP v5 might make it enforceable no longer excites me, since it will remain a set of prescriptive standards.

However, I recently heard that one or more large NERC entities are advocating for a complete rewrite of CIP v5, presumably to address both the clarity and enforceability problems. I certainly don’t want to discourage them. CIP v5 and v6 will clearly be around for a while, and if there is a will on the part of NERC entities – and the SDT – to try to make these standards both clear and enforceable, I will certainly support that effort.

I also realize that perhaps I have been exaggerating the amount of work that will be required to rewrite CIP-002. The biggest problem with that standard is the fact that CIP-002-5.1 R1 and Attachment 1 are written from two different points of view, and haven’t figured out what they want to be when they grow up. However, as I stated in this post, the NERC entities and regions have come to a remarkably consistent consensus on how to “comply” with this wording; they are just about universally following one of these two points of view, which happens to be pretty much the approach used in CIP versions 1-4.

In this approach, the entity starts with the “big iron” – control centers, substations, etc. – then classifies these as High, Medium or Low impact. Once they have done that, they identify BES Cyber Systems at the High and Medium assets; the BCS take the classification of the asset. They come out with the three things that are required by R1: lists of High and Medium impact BCS and a list of Low impact assets (aka “assets containing Low impact BCS”, in the rather strange circumlocution adopted to try to bridge the unbridgeable gap between the two points of view in R1 and Attachment 1).
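For readers who find a procedure easier to follow when it is spelled out step by step, here is a minimal Python sketch of the “big iron first” approach just described. It is an illustration under stated assumptions, not a compliance tool: each asset’s High/Medium/Low rating is assumed to have already been determined (applying the Attachment 1 criteria themselves is not modeled), the asset and BCS names are hypothetical, and the criterion 2.1 exception, where some BCS at a Medium generating plant may still be Low impact, is deliberately left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Impact(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class Asset:
    # A piece of "big iron": control center, substation, generating plant, etc.
    name: str
    rating: Impact  # assumption: already determined from the Attachment 1 criteria
    bcs: list[str] = field(default_factory=list)  # hypothetical BCS identified at this asset
    has_control_systems: bool = True


def classify(assets: list[Asset]) -> dict[str, list[str]]:
    """Produce the three R1 outputs under the 'big iron first' approach."""
    high_bcs: list[str] = []
    medium_bcs: list[str] = []
    low_assets: list[str] = []
    for asset in assets:
        if asset.rating is Impact.HIGH:
            # BCS simply inherit the rating of the asset they are located at
            high_bcs.extend(asset.bcs)
        elif asset.rating is Impact.MEDIUM:
            medium_bcs.extend(asset.bcs)
        elif asset.has_control_systems:
            # Low impact: only the asset is listed; no discrete list of Low BCS
            low_assets.append(asset.name)
    return {"high_bcs": high_bcs, "medium_bcs": medium_bcs, "low_assets": low_assets}


# Usage with hypothetical assets
assets = [
    Asset("Control Center A", Impact.HIGH, ["EMS"]),
    Asset("Substation 1", Impact.MEDIUM, ["Relay BCS", "RTU BCS"]),
    Asset("Substation 2", Impact.LOW),
]
print(classify(assets))
# {'high_bcs': ['EMS'], 'medium_bcs': ['Relay BCS', 'RTU BCS'], 'low_assets': ['Substation 2']}
```

Note that the Low impact branch outputs asset names rather than BCS, which mirrors the “a discrete list of low impact BES Cyber Systems is not required” language in R1.3.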

So the problem isn’t that entities and auditors don’t understand how to comply with CIP-002-5.1 R1; the problem is that the way they are complying with it doesn’t fit with the words of R1 and Attachment 1 (more specifically, it doesn’t fit with some of those words. It does fit with others). In one sense, the solution is simple: simply rewrite CIP-002 so that the words fit what everyone is actually doing anyway. This would be one giant step toward making CIP v5 and v6 enforceable in the strict sense. And I don’t think this would take much time.

But I need to throw a caution in here: It is very possible that CIP v5 and v6 will be unenforceable in the strict sense, no matter how much time the SDT spends resolving the fundamental problems in CIP-002-5.1. My reasoning for saying that can be found in these posts: here and here. If the SDT does decide to address these fundamental problems – as I believe they should – they shouldn’t do so with the idea that this will make CIP v5 enforceable in the strict sense; I believe that ship has already sailed.

Note April 18, 2016: It just occurred to me that rewriting CIP would make all the sense in the world if it could be rewritten as a risk-based standard. I have just been assuming that the consensus needed to do that is still years away. However, it is definitely the most logical thing to do: Simply leave v5 and v6 in place as they are, warts and all, and start work on a completely new v7. But I'd say that's the stuff of dreams at this point.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] Note that, when I mention “rewriting” v5, I’m implicitly saying the same thing about v6. Since the fundamental problems are mainly found in CIP-002-5.1 (which is part of v5, of course), that is the only standard that would have to be substantially rewritten. However, there would be further changes required in all of the standards, both the v5 and v6 ones.

[ii] One point of view is that Cyber Assets become BES Cyber Assets if they have an inherent impact on the BES. The other point of view is that they become BCAs only if they impact a critical asset or Facility, which then has an impact on the grid. The latter is more or less how asset identification worked in CIP v1-v4. The former was an idea that came up when the team that drafted v2-v5 was starting work in 2009, embodied in this Concept Paper. Different parts of CIP-002-5.1 R1 and Attachment 1 ended up embodying both these points of view, and they were never reconciled. I discussed this in at least two previous posts: this one (the section titled “Have an Apple, Adam?”) and this one. I’ll admit I’ve never explained myself fully on this issue; that may need to be part of a book, not just a blog post.

[iii] As I said in this post, the problem with CIP-002 R1 and Attachment 1 isn’t that entities and auditors don’t agree on how to classify BES Cyber Systems as H/M/L, but that the asset classification model they all agree on doesn’t correspond to most of the wording in CIP-002. The best way to fix this problem is to rewrite R1 and Attachment 1 so that their wording follows the model that people are actually using (which IMO is quite good, and has the added benefit of being very similar to the model in CIP versions 1-4).

Note 4/21: Kevin Perry, Director of Critical Infrastructure Protection for SPP, takes exception to my reference to "auditors" above. He points out that SPP's message since CIP v5 was approved has always been that the Attachment 1 criteria are for classifying BCS, not assets. I didn't mean to say this wasn't their official position, nor that it wasn't the position of other regions, but that the procedure they advocate that entities follow - first identifying "assets likely to contain High or Medium BCS", then running these through the criteria - amounts to pretty much the same "big iron / little iron" approach as v1-3, and is understood by most entities to be basically the same approach. What I'm advocating is that the wording of R1 and Attachment 1 be changed so that it does actually reflect the v1-3 approach, since I think that one was pretty good and since just about every entity (if not every one) is actually following it anyway.

[iv] This was also evident from the fact that, when attendees were asked to raise their hands if they were ready for the v5 compliance date, only a small percentage did so. This was two weeks after April 1, which of course was supposed to be the compliance date, until less than two months ago. It seems very likely that a large number of NERC entities wouldn’t have been ready on April 1; it remains to be seen how many will truly be “ready” on July 1.

[v] Here are a couple of my guesses: 1) All the work that NERC and entities have done on v5 and v6 gets thrown out, and the industry goes back to v3, the last enforceable set of standards; or 2) Congress is so alarmed by the fact that there are no longer any enforceable cyber standards for the industry that they take responsibility for cyber regulation away from FERC and NERC and give it to some other agency, like DHS or even the military. I would say the second of these is more likely.

Wednesday, April 13, 2016

Can a Distribution Disturbance Alone Cause a Cascading BES Outage?

I freely admit I’m out of my league on this one. In my post yesterday, I stated – based on a conversation with a couple of longtime reliability compliance professionals – that it was close to impossible for an outage on the purely distribution side of the grid to cause a cascading outage on the transmission side.

I continue to believe this is the case, but I did receive an email this morning from an Interested Party who has contributed to many of my posts over the years. He isn’t exactly saying that my statement was wrong, but he is pointing out conditions that might lead to a more widespread, prolonged BES outage than I’d thought possible, assuming there was an initial substantial loss of load on the distribution side. Here is what he says:

“I do not fully buy into the idea that an attack against the distribution system could not impact the BES. Understand that the Ukraine attack was concurrently directed against four distribution companies. There is no reason to believe a similar attack in the US would not target multiple distribution companies at the same time. The Transmission system impact of the attack will depend upon the current operating conditions and the amount of load shed. Even if the resultant impact is only the tripping of some generation, bear in mind that it takes a while to get generation back up after it trips. Fossil steam plants can generally get back up within 18-24 hours of available station services power. Renewable and GT/CT is pretty much instantaneous after allowing for grid synchronizing. Trip a nuke and it is days before the NRC allows the unit to be restarted. Whether fast recovery generation can restore load while the slow recovery units are brought back online will depend on Transmission congestion and total load conditions. So, yes, there are distribution outages all the time, but they are not typically widespread except during a severe weather event that damages the Transmission and distribution infrastructure. Rather, you typically lose a substation and inconvenience a couple thousand people until power can be restored.”

In other words (and these are mine), depending on the type of generation that would go offline during a widespread distribution system outage, there could be a substantial and prolonged impact on the transmission grid. This isn’t the same as a cascading outage, of course, but it does constitute a potential substantial effect on the BES.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.