Tom Alrich's Blog: January 2015

Monday, January 26, 2015

Two (more) Changes in My CIP-002-5.1 R1 Methodology

A recent post described – at a very high level – my “methodology”[i] for complying with CIP-002-5.1 R1 (which I usually refer to simply as “R1”). When I wrote that post, I didn’t think there would need to be many changes to the methodology. However, I have already made one change in the methodology and now have two more to make – one substantial, one less so.

Besides describing these changes in this post, I will make them in the original post as well (as I did for the first change). In fact, since I know there will be more changes in the future, I will do this from now on: put out a post describing the change, then edit the original post so it reflects it. This makes the original post a “living” document that will hopefully always describe my most recent thinking on R1 methodology.

I. BCS Identification

If you haven’t read the original post (but if so, why are you reading this one?), I’ll point out that my “methodology” is heavily laden with a series of decisions the entity must make in order to comply with R1. Perhaps the most important of those decisions is exactly how BES Cyber Systems will be identified in the first place (i.e. before they’re classified High, Medium or Low impact).

In that post as well as a previous one, I described two primary methods for identifying BCS: “top-down” and “bottom-up”. My post stated that the best practice is to combine the two methods, since I believed that, in all cases, some BCS could be missed if only one of the two methods were used. However, since that post I have heard from two different sources - one a CIP auditor - that the top-down approach doesn’t really buy much in substations, although it does in control centers and generating stations.

The reasoning for this makes a lot of sense: in control centers and generating stations, there are certain well-understood functions that are performed by the asset as a whole; these functions each have systems associated with them. For example, BA control centers almost always have systems including production SCADA/EMS, Outage Management System, ICCP, Historical Data Retention, Operations Engineering Support System, etc. Generating stations have a digital control system, soot blow down system, control air management system, etc.[ii]

The entity only needs to confirm that the loss, misoperation, etc. of any of these systems has a BES impact within 15 minutes; if it does, the system is a BCS. So for these two asset types, starting with the top-down approach is best. Of course, the entity still needs to perform the bottom-up analysis, in which it considers each of the Cyber Assets at or associated with the asset[iii] - that haven’t already been identified as components of BES Cyber Systems through the top-down analysis - to determine whether or not they meet the definition of BES Cyber Asset, including having a 15-minute impact on the BES. Every BCA so identified should then be included in a BES Cyber System.[iv]

Substations are different. Substations don’t inherently perform particular functions – they can be all over the map, and can include some mix of Transmission (in scope for CIP) and Distribution (not in scope) functions[v]. There is no inherent set of functions that most or all substations perform. You really have to look at each individual Cyber Asset and consider whether it has a 15-minute impact on the BES, then perform the rest of the bottom-up analysis. But the top-down analysis isn’t likely to identify BCS that aren’t identified in the bottom-up approach, and therefore doing both analyses doesn’t buy you anything.
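For readers who think in code, the choice of approach by asset type can be sketched roughly as follows. This is only an illustration of the reasoning above, not a compliance tool: the class names, the `impacts_bes_within_15_min` flag (which stands in for the entity's own documented judgment) and the asset-type labels are all assumptions of mine, not terms from the standard.

```python
from dataclasses import dataclass, field

# Asset types where the post argues a top-down pass is worthwhile (labels are hypothetical).
TOP_DOWN_ASSET_TYPES = {"control_center", "generating_station"}


@dataclass
class CyberAsset:
    name: str
    impacts_bes_within_15_min: bool  # the entity's own judgment, supplied as input


@dataclass
class System:
    name: str
    impacts_bes_within_15_min: bool
    components: list = field(default_factory=list)  # CyberAsset objects


def identify_bcs(asset_type, known_systems, all_cyber_assets):
    """Return (systems identified top-down, BCAs found only by the bottom-up pass)."""
    bcs = []
    covered = set()

    # Top-down pass: only for asset types with well-understood functions.
    if asset_type in TOP_DOWN_ASSET_TYPES:
        for system in known_systems:
            if system.impacts_bes_within_15_min:
                bcs.append(system.name)
                covered.update(c.name for c in system.components)

    # Bottom-up pass: always performed, over Cyber Assets not already covered.
    leftover_bcas = [
        ca.name
        for ca in all_cyber_assets
        if ca.name not in covered and ca.impacts_bes_within_15_min
    ]
    return bcs, leftover_bcas


ems_server = CyberAsset("ems-server", True)
historian = CyberAsset("historian", False)
relay = CyberAsset("relay-1", True)
ems = System("SCADA/EMS", True, [ems_server])

print(identify_bcs("control_center", [ems], [ems_server, historian, relay]))
# (['SCADA/EMS'], ['relay-1'])
print(identify_bcs("substation", [], [relay, historian]))
# ([], ['relay-1'])
```

Any BCA left over from the bottom-up pass still has to be placed in a BES Cyber System; the sketch stops short of that grouping step because how to group is itself one of the decisions the entity has to make.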

However, you may say, “What about the BES Reliability Operating Services (BROS), which are an integral part of the top-down approach? Do we just forget about them for substations?” No. Just because the entity doesn’t use the top-down approach for substations doesn’t mean the BROS don’t come into play in the BCS identification process. Since the heart of the BES Cyber Asset definition is that the loss of the Cyber Asset would “adversely impact” the BES within 15 minutes, a good way to identify BCAs is to consider whether a Cyber Asset has a 15-minute impact on one or more BROS. If so, the Cyber Asset is most likely a BCA.[vi]
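The BROS check described above can be written down as a small screening function. Treat it as a sketch under my own assumptions: the function name and return strings are made up for illustration, and the BROS label in the example is just a placeholder for whatever list the entity takes from the CIP-002-5.1 guidance. The second branch reflects the caveat that a Cyber Asset with no BROS impact can still be a BCA.

```python
def screen_for_bca(bros_impacted_within_15_min, other_15_min_bes_impact=False):
    """Screen one Cyber Asset.

    bros_impacted_within_15_min: list of BROS the entity judges would be
        affected within 15 minutes by loss or misoperation of the asset.
    other_15_min_bes_impact: True if the entity sees a 15-minute BES impact
        that does not run through any BROS.
    """
    if bros_impacted_within_15_min:
        return "most likely a BCA"
    if other_15_min_bes_impact:
        # The BROS test is a good indicator, not the exclusive test.
        return "possibly a BCA (impact outside the BROS)"
    return "not a BCA on this screen"


print(screen_for_bca(["Monitoring and Control"]))
# most likely a BCA
print(screen_for_bca([], other_15_min_bes_impact=True))
# possibly a BCA (impact outside the BROS)
print(screen_for_bca([]))
# not a BCA on this screen
```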

So I will revise my R1 methodology post to reflect what I’ve just said. However, I’ve just identified another post that needs to be modified. This is a post I wrote on the meaning of “affect the BES” in the BCA definition. In that post, I stated that there was no point in considering the BROS when doing the bottom-up analysis. I said this because I was assuming that all entities would start their BCS identification with the top-down analysis, so they would have already identified all Cyber Assets that fulfilled a BROS - they are components of BCS. Since I’m now saying the top-down approach doesn’t help for substations, this means the BROS should be considered (again, not exclusively) as substation owners/operators identify their BCAs through the bottom-up analysis. I will modify this other post as well.

II. “Transmission Facilities”

In my R1 methodology post (item 3 under Task 2), I indicated that one of the definitions each entity needs to develop is one for “Transmission Facilities”. This term is used in several of the Medium impact criteria, yet even though both “Transmission” and “Facility” are NERC-defined terms, I had heard that trying to combine these two definitions didn’t yield anything very helpful. And I heard this was causing problems for Transmission entities as they tried to sort out Transmission from Distribution cyber assets in their substations. In addition, I heard the new BES definition (which essentially defines Transmission) wasn’t too helpful in sorting things out. I had discussed this issue in a previous post.

However, I have since heard from a couple of knowledgeable people that it really isn’t all that hard to separate Transmission from Distribution cyber assets (and Facilities) in substations, using the new BES definition. Since I haven’t heard any further comments to the contrary, I am officially declaring this a non-issue (leaving only 4,368 known issues with R1 and Attachment 1, by my latest count), and will remove it from the R1 methodology post.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] I use quotes here because, as explained in the original post, it is impossible to write down – in a document with fewer words than the Bible – a single methodology for complying with R1; there are far too many branches and options required. But this doesn’t mean NERC entities with High or Medium impact assets under CIP v5 don’t have to follow any particular methodology when they comply with R1. They have to follow some methodology, and it has to be documented. My post should be seen as more or less a “template” for developing the methodology, although a large part of the contents – various definitions and interpretations – needs to be determined and inserted by the entity; there is no way they can be dictated in advance, given the ambiguities and contradictions in the wording of R1 and Attachment 1.

[ii] All of these examples of systems were suggested by the auditor.

[iii] If the asset is a High impact Control Center, the applicable wording is “used by and located at”. If it is a Medium impact Control Center or a Medium impact generating station, the wording is “associated with”.

[iv] The exception to this rule is for large plants (usually coal) that are in scope with v5 because of criterion 2.1. In these, it is usually impossible to apply the true “bottom-up” approach, because of the huge number of devices (sometimes in the tens of thousands) that may meet the definition of Cyber Asset. Since my post on R1 methodology in theory just applied to substations (although I think it also works for generating plants that don’t meet 2.1), I still haven’t addressed the “2.1 plant” methodology. I hope to in a future post.

[v] It occurred to me that this is why CIP Versions 1-3 fit so badly in substations. A Critical Cyber Asset was defined as a Cyber Asset “essential to the operation” of a Critical Asset. Since, strictly speaking, a substation considered as a whole doesn’t perform any particular operations, there really aren’t any Cyber Assets that meet that definition. Version 5 tried to address this issue by writing all the criteria that apply to substations (2.4 – 2.8) with the word “Facilities” in the subject – meaning the lines, transformers, busses, etc. that are located at the Transmission substation. These are what become Medium impact, not the substation itself. Of course, many Transmission entities and even Regional Entities seem to be interpreting the word Facilities to mean the substation itself, even though that is almost certainly not what was intended (although as I said in my methodology post, there’s nothing wrong with doing this – as long as you accept that you’ll probably identify more Medium BCS than if you used the pure “Facilities” approach). I have discussed this issue in several posts, including this one.

[vi] Of course, the converse isn’t true: If the Cyber Asset doesn’t have a 15-minute impact on a BROS, it doesn’t mean it isn’t a BCA, since its impact could be in another area than reliability. For example, the fire suppression system in a substation doesn’t fulfill any particular BROS, but were it to fail to operate when needed (in the event of a fire), its failure to operate would presumably have a 15-minute impact (e.g. one or more lines might be tripped because their associated relays burned up).

Friday, January 23, 2015

Are Networking Devices BES Cyber Assets?

There is a discussion going on in NERC circles about whether networking devices should be declared BES Cyber Assets or not. At first glance, it seems almost an open-and-shut case that they should be. After all, the BCA definition includes Cyber Assets whose loss, etc. would impact the BES within 15 minutes. It would seem that a switch that ties together the whole network in a substation or generating station would certainly fit that bill, right?

At least one CIP auditor doesn’t think so. He makes his argument by drawing a distinction between networking devices on the ESP and those that are inside the ESP. For the latter, the argument is very simple (and there was a similar argument in CIP v3): Since the ESP needs to include all routably connected BES Cyber Assets/Systems, if you consider the device (e.g. a switch) on the ESP to be a BCS, then you need to redraw the ESP to include it. Then the switch on the redrawn ESP becomes a BCS, and you have to redraw the ESP again, etc. Ergo, a switch on an ESP perimeter can never be a BCS. In fact, it may very well be an Electronic Access Point.
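The regress in the auditor's argument is easy to see in a toy model. The sketch below is mine, not the auditor's: it assumes a simple chain of devices ordered from the innermost system outward (the device names are invented), and it treats "the ESP" as just a count of how many devices in that chain are inside it.

```python
def redraw_esp(devices_outward, initial_inside, perimeter_is_bcs):
    """Return the device sitting on the ESP boundary after each (re)drawing.

    devices_outward: device names ordered from innermost to outermost.
    initial_inside: how many of those devices start out inside the ESP.
    perimeter_is_bcs: if True, the device on the boundary is declared a BCS,
        so the ESP has to be redrawn to pull it inside.
    """
    inside = initial_inside
    history = []
    while inside < len(devices_outward):
        perimeter_device = devices_outward[inside]
        history.append(perimeter_device)
        if not perimeter_is_bcs:
            break  # the boundary device stays on the boundary; the ESP is stable
        inside += 1  # redraw the ESP to include it, exposing the next device out
    return history


chain = ["ems-server", "switch-a", "router-b", "firewall-c"]

print(redraw_esp(chain, 1, perimeter_is_bcs=False))
# ['switch-a']
print(redraw_esp(chain, 1, perimeter_is_bcs=True))
# ['switch-a', 'router-b', 'firewall-c']
```

With the second rule the boundary never settles; the loop only stops because the toy chain runs out of devices.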

So how about a switch that’s inside an ESP? There isn’t a compelling logical argument against making this switch a BCA/BCS, but the auditor asserts there’s no compelling logical argument to make it one, either. It’s better to use the simpler approach and not declare it a BCA. Of course, any switch within an ESP (and not otherwise part of a BCS) will have to be a Protected Cyber Asset, and will thus be subject to almost all the same controls as a BCS.

Here’s another question: How about a switch that’s within a BES Cyber System? For instance, if an entity declares their whole EMS is a BCS, should a network switch (again, one that’s not on the ESP boundary) be declared a BES Cyber Asset? As I pointed out in this recent post, if a Cyber Asset is part of a BCS, you don’t need to take the additional step to declare it a BCA (or a PCA). Since all of the v5 requirements apply at the BCS level, a switch will be protected by the standards in any case.

The auditor also wants me to point out that “What we are talking about are traditional networking devices like routers, switches, and firewalls, along with multiplexors, microwave, and the like - basically the LAN/WAN equipment that serves as the communications backbone. Not included are end devices like port servers, terminal servers, Digi devices, and so forth that simply convert the data stream between TCP/IP and serial. Those are not networking devices in the traditional sense and, as end devices that only appear in the LAN, should be identified as BCA if they have a sub-fifteen minute impact on BES reliability as described in the BCA definition.”
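Putting the pieces of this post together, the auditor's position amounts to a short decision table. Here is one way to write it down; the function, its parameter names and the category strings are my own shorthand for illustration, and an entity's documented methodology could reasonably slice this differently.

```python
def classify_device(kind, location, part_of_bcs=False, impact_15_min=False):
    """Classify a device along the lines of the auditor's argument.

    kind: "network" (router, switch, firewall, multiplexor, microwave) or
          "end_device" (port server, terminal server, TCP/IP-to-serial converter).
    location: "esp_boundary" or "inside_esp".
    part_of_bcs: True if the device is already a component of a declared BCS.
    impact_15_min: the entity's judgment on sub-fifteen-minute BES impact.
    """
    if kind not in {"network", "end_device"}:
        raise ValueError(f"unknown kind: {kind}")
    if location not in {"esp_boundary", "inside_esp"}:
        raise ValueError(f"unknown location: {location}")

    if kind == "end_device":
        # End devices are judged against the BCA definition like anything else.
        return "BCA" if impact_15_min else "not a BCA by this test"
    if location == "esp_boundary":
        return "not a BCS (possibly an Electronic Access Point)"
    if part_of_bcs:
        return "covered as part of its BCS"
    return "Protected Cyber Asset"


print(classify_device("network", "esp_boundary"))
# not a BCS (possibly an Electronic Access Point)
print(classify_device("network", "inside_esp"))
# Protected Cyber Asset
print(classify_device("network", "inside_esp", part_of_bcs=True))
# covered as part of its BCS
print(classify_device("end_device", "inside_esp", impact_15_min=True))
# BCA
```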

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

Thursday, January 22, 2015

A Consultant Criticizes NERC CIP

Last year, I wrote two posts (here and here) about what I see as a great sport engaged in by many in the press (and the consultants who egg them on): attacking the electric utility industry for real and imagined failings in their efforts to secure their infrastructure against cyber and physical attacks. I have now found another prime example of this sport, this time engaged in by a longtime practitioner, consultant Joe Weiss. I am referring to his recent blog post [i], which makes the case that the NERC CIP standards aren’t making the grid more secure or more reliable. More importantly, Mr. Weiss blames the industry for both developing and circumventing these standards.

I wish to say at the outset that I certainly don’t think all attacks on utilities for not having proper security in place are unjustified. And I certainly don’t think that attacks on the NERC CIP standards are unjustified; indeed, I think I’m listed in Guinness as the all-time leader in number of complaints about CIP version 5. But as I said in the two posts last year, the attacks need to be based on facts, and they need to make sense logically. Most of the points Mr. Weiss makes in his post don’t meet one or both of these criteria. Because these points are ones that have often been raised by others, and because they all have quite interesting implications, I will spend some time addressing all of them.

I also want to point out that Mr. Weiss bases his post in part on a doctoral thesis (publicly available and linked in the post) by Marlene Ladendorff, titled “The Effect of North American Electric Reliability Corporation Critical Infrastructure Protection Standards on Bulk Electric System Reliability”. Some of the “facts” cited by Mr. Weiss come from the thesis; others come from other sources (not all identified). I have not had the time to go through the thesis, so I will stipulate that Mr. Weiss has accurately represented Ms. Ladendorff’s findings.

...and eating it, too

My biggest problem with Mr. Weiss’ post is that he repeatedly tries to have his cake and eat it, too. That is, he bashes the utilities (or the CIP standards) for doing something, then turns around and bashes them for doing just the opposite. He is like the two ladies at a Catskills resort, in an old joke. The first says, “The food here is terrible.” The second says, “Yeah, and the portions are so small!”

1) The second paragraph of his post provides a perfect example of this. He says, “the exclusions in the NERC CIPs provide a road map to attackers as they identify what is in-scope, and just as important, what is out-of-scope and consequently not addressed.” Let’s break this down. First, he’s saying the CIP v5 bright-line criteria (for High or Medium impact assets) give attackers a “road map”. That is, they let them know what the most important assets are so they can presumably attack them. However, in the second part of the sentence Mr. Weiss complains about just the opposite. There, he says the criteria implicitly give attackers a list of assets that don’t meet these criteria, and are therefore not going to receive protection under CIP v5.

Do you see the problem here? He’s saying that attackers will use the BLC to find the best targets to attack (Highs and Mediums) – and will presumably attack them. But they’ll also use the BLC to find the targets that are easiest to attack (Lows - since the requirements that apply to them are much lighter) – and will also attack them. So the “road map” that NERC is giving to the attackers simply says, "Attack all BES assets!"[ii] Some road map.

2) Here’s a more important example. Mr. Weiss alludes at least three times to the fact that some entities literally removed routable connectivity (especially to substations) in order to reduce their compliance burden under CIP v1 – v3 (since Critical Assets that didn’t have external routable connectivity wouldn’t therefore have Critical Cyber Assets)[iii]. I don’t dispute this assertion at all; it is certainly true (although the number of entities that simply put off plans to implement routable connectivity was certainly much higher than the number that literally ripped it out). And it is also quite unfortunate, since there was probably some negative impact on reliability and security because of this practice.

However, later in the post he makes a completely different argument. He says that the requirements of NERC CIP (presumably v5) meant that “utilities with hundreds to thousands of substations will most likely connect their protective systems to external networks (usually over the Internet) to support a compliance requirement that can actually compromise security.” OK, so in the first case, CIP was bad because it gave utility companies an incentive to remove routable connectivity. Now it’s bad because it gives them an incentive to implement that connectivity! Can’t win for losin’, as they say.

3) A third example of having-your-cake-and-eating-it-too: Mr. Weiss complains “Depending on the cost of the fine compared to the cost to install NERC CIP compliance, some utilities have made the decision to pay the fine rather than make the security improvement.” I don’t doubt that there are some utilities who are doing just that, although I also doubt it’s very many and I’m sure in the long run it’s a very bad idea to do that.

Yet he later states, “Since the NERC CIP guidance requires anti-malware and anti-virus protection, some utilities are mandating protective relays to have malware protection even though adding this function will reduce the effectiveness and function of the relay.” So it seems these same utilities who are doing everything they can to avoid compliance are now going way overboard and actually jeopardizing their own operations by taking the requirements far too seriously[iv]! Now, that is devious. No wonder he’s outraged.

Other Items

Most of Mr. Weiss’ other arguments fall apart when you look at them closely:

1) Early in the post, he says “Electric distribution is excluded (majority of Smart Grid falls under this exclusion).” This is a common criticism of NERC CIP, from people who don’t know any better. But that doesn’t include Joe Weiss, so I’m surprised he’d say this. The CIP standards (and all the other NERC standards) only apply to the BES because that’s what FERC has authority over (of course, FERC’s authority is what makes the NERC standards more than just nice guidelines). Electric distribution is the domain of the state PUCs[v].

So what is Mr. Weiss advocating to fix this problem? Do we need to have a single central regulator for all electric generation, transmission and distribution? Lots of luck getting that through Congress. And should NERC and FERC just drop the idea of cyber security regulation altogether until this happens? At least then there would be consistency on both the BES and the Distribution sides: there would be no regulation at all.

2) Mr. Weiss cites an example from the thesis stating that “an exercise was cancelled by (a utility’s) compliance group, citing potential non-compliance issues with one of the CIP standards as the reason. The logic behind the compliance groups’ (sic) action was that if a potential weakness was found, it may (sic) need to be reported and the entity risked receiving a fine from NERC.” I know exactly what Mr. Weiss and Ms. Ladendorff are talking about, and I agree there are probably at least a few legal departments at utilities who take this attitude: we don’t want to find out what we’re doing wrong, because then we’d have to report it.

On the other hand, this is a very short-sighted strategy, not only from a cyber security but from a legal / compliance point of view. If an entity is out of compliance with a NERC requirement (not just CIP, of course), they need to self-report it immediately. If they don’t, and the NERC Regional Entity discovers this lack of compliance (either through an audit or perhaps as part of an Investigation), things will go much worse for the entity than if they had reported it in the first place. By deliberately not allowing non-compliance to be discovered, this legal team is setting their employer up for a much bigger fall further down the road.

I haven’t personally heard of any case where something like this has happened, although I certainly don’t dispute that it may have. This is certainly a strike against the NERC CIP standards, but it is also a strike against any mandatory regulations of any sort. If an entity has to report when it finds itself to be in violation of any regulation, there will always be a few misguided lawyers who think it’s in the entity’s interest not to know about a violation in the first place. This is an argument against any sort of regulation (or laws, for that matter); it is not an indictment of NERC CIP in particular. If I think I’ve misrepresented something on my taxes, should I investigate to find out if that is really the case (at the risk of then having to revise my filing), or should I not bother to look further and hope the IRS doesn’t either? I don’t have a ready answer for that question, but please don’t tell the IRS that.

3) Mr. Weiss summarizes some other examples from the thesis by saying “‘some of the transmission owners….are gaming the system in order to prevent the application of the CIP standards.’ To accomplish this, some companies modified their networks to avoid compliance issues with CIP-003 through CIP-009.[vi]”

This sounds particularly devious, doesn’t it? TOs are modifying their networks to avoid CIP compliance issues! Hmmm…I thought that was what compliance was all about. For example, the standards say (by implication) that your control network(s) shouldn’t be directly connected to your corporate network – so you modify the network by breaking that connection. Is that a bad thing?[vii]

4) Mr. Weiss states (again referring to the thesis), “Participant 2 in her study found that a company had the most sophisticated network protection he had seen. However, NERC staff reviewed their architecture and wanted them to tear it out. It took the company 6 months to convince NERC that this was the best protection they could do for the control systems the company was operating.”

Here, it seems the NERC staff was getting a little carried away in their zeal to enforce strict compliance with the letter of the requirements, and was trying to get an entity to remove a network protection scheme that was the best that could be implemented under the circumstances. This of course is unfortunate, but clearly neither the utility nor NERC can be accused of lack of zeal for doing the right thing in this case. What fault there is seems to be in the CIP standards, and there the fault is that they are too prescriptive. I completely agree they are too prescriptive, but nothing in this quotation squares with the general tenor of Mr. Weiss’ post – namely, that NERC, the utilities, and the CIP standards themselves aren’t doing anything to increase security.

5) Mr. Weiss complains early on that “the ‘brightline’ criteria exclude smaller facilities.” The BLC apply to all BES facilities, as High, Medium or Low impact. I believe what he is trying to say is that the Low impact requirements aren’t rigorous enough for his tastes; if so, he certainly wouldn’t be the first to feel that way. But he needs to say it explicitly, and also say what would be an adequate set of requirements for Low facilities, consonant with the idea that we can’t devote the entire GNP to complying with NERC CIP.

6) There is one paragraph of the post that I simply don’t understand: “Another example of the inconsistency of the NERC CIP guidance is that when it comes to grid reliability (sic) is the use of ‘black start’ facilities. Black Start facilities are those necessary to restart the grid after a complete grid outage. This function is considered critical by grid planning and operations organizations as well as organizations within NERC. During the review of the NERC CIP Revision 5 process, ISO New England raised a concern that adopting a new requirement for specific controls for Low Impact assets could have unintended consequences, such as the withdrawal of black start resources. This would make the grid less reliable.”

What is Mr. Weiss trying to say here? I at first thought he was saying it was bad that blackstart facilities had been removed as Medium (and made Low) impact in the BLC. But it now seems to me that he may not know that they were removed (even though that happened three years ago, during the drafting process), and he seems to be arguing that forcing blackstart assets to meet Medium requirements means that more will be withdrawn, thus negatively impacting “reliability” (although not having blackstarts doesn’t actually impact reliability, since blackstarts don’t prevent outages. It does impact resiliency, since blackstarts are needed to rapidly recover from a widespread outage).

And if Mr. Weiss does know that blackstarts were removed from the Medium criteria (as I said, the wording is ambiguous) and made Lows, then I don't understand his reporting of what the New England ISO supposedly said: that placing too onerous requirements on Lows means that blackstarts will be withdrawn. The way CIP v5 works now, every BES asset (with at least one BES Cyber System) is in scope as either High, Medium or Low. If the Low requirements prove too onerous for blackstarts, then they will have to be removed for all Low assets - meaning we'll go back to just the Low requirement in the original CIP v5 (which FERC was so unhappy with): there must be four policies in place at each Low asset. Is this what Mr. Weiss is advocating?

7) Mr. Weiss states, “Some of the security hardware can affect control system performance. A NERC report identified that a device locking tool used to meet NERC CIP requirements caused a disturbance that resulted in the loss of SCADA services. This is obviously making the grid less reliable and secure.” What is this saying? It seems to be saying that some device manufacturer developed a device locking tool that actually had negative effects. OK, whose fault is this? The utility’s? NERC’s? The CIP standards’? It seems to me he should file his complaint with the company that made the device.

Alternatively, whatever requirement the locking tool was addressing could just be removed from the standards, along with every other requirement that might possibly lead to implementation of measures that could cause a "disturbance". This would probably result in 10-20% of the CIP v5 standards being removed. Is this what Mr. Weiss wants?

8) Mr. Weiss’ concluding argument states, “Perhaps the most important point is there have already been four major cyber-related electric outages in the US (more than 90,000 customers). If the NERC CIPs were fully implemented, they would not have prevented any of these outages.” First off, I would very much like to hear about these four outages. I certainly never have heard of them before, and Mr. Weiss doesn’t point to any further information.

Second, once Mr. Weiss has given us information on these outages, I would like to know how he draws his conclusion that NERC CIP wouldn’t have prevented these outages. Of course, when he says the outages are “cyber-related”, he’s not necessarily saying these were the results of actual cyber attacks or malware. For that matter, the 2003 Northeast blackout had a couple of “cyber-related” causes that NERC CIP wouldn’t have prevented either. This certainly doesn’t mean that CIP is ineffective.

Summing Up

You might get the idea that the only thing I like about Joe Weiss’ post is the font it appears in. Believe it or not, I regard the post as a flawed one that could actually have had some validity. He makes some perfectly legitimate points about entities removing connectivity to avoid having to comply, about Legal departments not wanting to see any evidence of non-compliance, about Distribution not being included, etc. But in his zeal to strike out against NERC, most utilities, and above all the CIP standards, he has simply thrown any and all arguments that come to mind into a single pot, with the hope that they’ll magically form a coherent stew. They don’t.

Note 1/23: This post originally had a sentence mentioning Senator Joe McCarthy. I realized this morning that, while it was not my intention to compare Mr. Weiss to McCarthy and the wording didn't state that, some readers might have drawn that inference. I sincerely apologize to Mr. Weiss for having included that sentence in the first place.

Note 1/25: I just modified the section marked "6)" above. It tries to make sense of Mr. Weiss' paragraph regarding blackstarts. When I wrote it, the only possible interpretation I could see was that Mr. Weiss didn't know blackstarts were no longer included in the criteria for Medium impact. However, I just realized this may not be the case, and Mr. Weiss was actually arguing for lesser requirements on Low impact assets. That doesn't make sense either (especially with what I have heard to be his opinion on the Low requirements), but I want to show I considered that possibility as well.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] I want to thank Bob Radvanovsky for posting a notice of this on LinkedIn.

[ii] Of course, the criteria don’t list individual assets, nor do NERC or the regions publish such lists; the attackers will presumably have to go elsewhere to find out where to direct their attacks.

[iii] Quoting Mr. Weiss, who quotes the thesis, “Some entities were trying so hard to keep equipment out of scope that they spent money to ‘rip out fiber and CAT-5 [networking cable] and replaced it with serial [cable] to get away from routable protocols’ that would have brought networks into the compliance scope. Entities calculated that it would be cheaper to replace fiber and CAT-5 network cable with serial cable in order to remove equipment from the CIPs scope. Doing so eliminated the requirement to comply with CIP standards for those networks and equipment.”

[iv] CIP v5 makes it very clear that there is no requirement to load anti-malware software on a device that isn’t capable of loading or using it. In fact, in v5 the entity doesn’t have to take a Technical Feasibility Exception for this, as they did in v3.

[v] Actually, the PUCs only have authority over the IOUs in their states, not the co-ops and municipals. So you could say that nobody regulates those entities, other than presumably their members or citizens.

[vi] The sentence in single quotes is presumably from the thesis. The second sentence is presumably Mr. Weiss’s.

[vii] There theoretically could be network modifications that might be taken to serve no purpose other than avoiding having to comply. But Mr. Weiss doesn’t say that is the case, and my brief review of the other examples in the thesis that he cites didn’t turn up any such modifications other than two cases which he addresses separately (and which I also discuss in this post). However, my point remains: entities are supposed to modify their networks to comply with the CIP standards. There is nothing at all sinister about modifications per se.

Sunday, January 18, 2015

Another Reason Why the Compliance Date Needs to be Pushed Back

If you haven’t read this blog lately, you may not know that I am now calling for the compliance dates for CIP Versions 5/6/7 to be pushed back – hopefully by a year, but at least by six months. I don’t rate the chances of this happening as very high currently, but I do think momentum for this proposal will grow sharply as this year goes on – and entities begin to realize how far they actually are from being able to affordably meet the April 1, 2016 date.

Note that the word “affordably” is important. There is a limited supply of CIP-experienced consulting resources available, and I know that many if not most of them are already effectively committed for the remainder of the runup to compliance. An entity that is late to the game can always find competent network or IT security consultants and pay for them to get intensive training in CIP; I know at least one entity that is doing that now (although I think they believe their millions are actually going toward implementing CIP v5 compliance, which can’t happen until the consultants they’ve hired actually understand it). But this approach dramatically increases the cost of compliance; a large percentage of this increased cost could be avoided if NERC entities were given more time to comply. As someone who witnessed – and, truth be told, benefited from – the spending frenzy in the final days of the Y2K runup, I would very much like not to see so much wasted money this time around, even if a few of those wasted dollars might end up in my pocket.

The compliance date needs to be pushed back because many – and from what I hear, most – entities are far from where they need to be at this point, if they are to comply by 4/1/16. I listed three reasons for this in my first post on this subject, but last week the head of CIP compliance for a large generation entity pointed out another reason to me.

Several of the criteria in Attachment 1 require the entity to classify a Facility as Medium impact if they have received a notification from an authority like a Transmission Planner that the Facility is important for some particular reason. In the case of Criterion 2.3, the reason is that a generating unit is necessary to avoid an Adverse Reliability Impact on the grid. In Criterion 2.6, the reason is that a generating plant or Transmission Facility is “critical to the derivation of…IROLs…and their associated contingencies.” Criteria 2.7 – 2.9 depend on similar notifications.

The CIP compliance manager’s complaint to me was straightforward: the entity just received, within the last month, unexpected notice that three of their plants were critical to derivation of IROLs, so they are now Medium impact. And what’s the problem with this? Well, it’s now less than 15 months ‘til the compliance date, that’s what. There is a lot of work required to bring a generating station into compliance; they are going to have to really sweat it out to make the 4/1/16 date. Moreover, they will almost inevitably spend a lot more money to comply than they would have if they had been given notice in a more timely fashion, say early last year, when there would have been 24 months to comply.

I haven’t heard complaints from Transmission entities about receiving late notices under 2.6, so this might possibly be a problem limited to Generation. In any case, that doesn’t alleviate the injustice that’s been done to these entities. And of course, generating entities don’t normally have cost recovery like utilities do on the T&D side; every extra dollar they spend due to having received the notice so late comes right from their pockets.

The clock keeps ticking – 14 ½ months remain for compliance as of last Thursday. I’m not going to let this go.


Wednesday, January 14, 2015

Roll Your Own, Part VIII: Will there be PVs Issued for CIP-002-5.1 R1?

In a recent post, I expressed the opinion that NERC should declare CIP-002-5.1 R1 an “open” requirement, meaning that entities who make a good faith effort to comply shouldn’t be issued Potential Violations if they get something wrong. I said this because there are so many ambiguities and contradictions in the requirement – and because NERC has not come across with the guidance that would be needed for R1 to be truly “auditable” (I did try to make clear, however, that this only applies to this one requirement – the other requirements in CIP v5/6/7 are clear enough that this is not needed for them).

I didn’t stop there, though. I went on to say that R1 would effectively become an open requirement whether or not NERC takes me up on my suggestion to make it so. This is because I really can’t see auditors wanting to waste their time writing up violations that would never hold up if challenged in a court of law (which, of course, NERC entities can do).

A respected CIP auditor with one of the NERC regions took issue with this. His argument runs like this:

He points to paragraph 320 of FERC Order 706 (which approved CIP Version 1), which says, “We will not allow a ‘safe harbor’ for good faith compliance as requested by AMP Ohio. We do not believe that blanket waivers from an enforcement action are appropriate in this context and have previously denied other requests for safe harbors from enforcement. Rather, we believe that demonstrable good faith compliance is a legitimate mitigating factor in an enforcement action.” In other words, even if NERC wanted to make R1 an open requirement, FERC would never allow it.
He states that, while he agrees he wouldn’t write PVs in cases where he doesn’t think his region would prevail if there were an appeal, he wouldn’t hesitate to write one in a case where he thought the entity had violated the clear meaning of the requirement. To support this argument he pointed to paragraph 72 of Order 706, which says that “compliance will in all cases be measured by determining whether a party met or failed to meet the Requirement.”

Regarding the auditor’s first point, I never believed there was a significant probability that NERC would take up my suggestion. I likened the probability to that of the Cubs winning the World Series this year – enough said about that. This brings us to point 2. How do we differ on that one?

People who have been reading this blog for a while know that I started a series of posts in September called “Roll Your Own”, of which this is the eighth installment. These posts discuss the need for NERC entities to come up with their own definitions and interpretations in CIP v5, in the many cases where NERC hasn’t provided adequate guidance. Does the auditor’s argument undercut my advocacy of rolling your own?

Not at all. The fact is that NERC entities can’t keep waiting for NERC to come out with guidance on the CIP v5 standards (if they still are waiting – I hope not, but I suspect many are). They have to have something to fill in the gaps. The only option they have is to consider all the guidance on a particular issue that is out there (say, on the definition of “programmable” – which includes a draft NERC Lessons Learned[i] document released last week[ii]) at the time they need it, come up with their best definition or interpretation, then make sure to document it, along with how they came to these conclusions.

Of course, the entities need to do their best to adhere to the wording of the requirement in question. But if the requirement or definition isn’t clear enough for compliance in the normal sense (i.e. following the requirement exactly), and if NERC hasn’t produced guidance on this issue or what they have produced is inadequate, the entities have no choice but to roll their own definition or interpretation; in fact, the very auditor who wrote in to me on this issue is the same one who previously agreed there is no other option for NERC entities.[iii]

Does the auditor’s argument negate my prediction that there will be no PVs issued for good faith CIP-002-5.1 R1 violations? Well, I’ll admit this may have been an exaggeration (not that I ever exaggerate, of course – except in the preceding seven words). There could well be a few PVs issued for mistakes made in good faith, by an auditor who truly believes certain wording in R1 is crystal clear, even though it is actually ambiguous. But I continue to believe that entities who make a good faith effort to comply with R1 (including carefully considering any guidance from NERC or the regions), and who roll their own definitions where these are MIA from NERC, have nothing to fear when the auditors come calling to assess their compliance. After all, what else can they do? Simply tell the auditors they’re not complying with this requirement because they don’t understand it?[iv]

The difference between what the auditor and I are saying is really one of degree. To understand what I mean, I refer you to the great philosopher Donald Rumsfeld, who said “There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.”

I will paraphrase this in reference to CIP-002-5.1 R1. There are unambiguous ambiguities, meaning things that are definitely ambiguous and on which there is little disagreement as to their ambiguity. And there are ambiguous ambiguities, meaning things that are ambiguous but on which there is disagreement as to their ambiguity (i.e. some people think they’re crystal clear, while others think they're not clear at all, or even worse do not make sense in the English language).

I think my disagreement with the auditor is over the relative proportions of these two types of ambiguities in R1 (and by R1, I mean “R1 and Attachment 1”). He clearly thinks that most of the ambiguities in R1 are of the first type; I happen to think most of them are of the second type – so they aren’t being officially acknowledged by NERC and thus aren’t going to be addressed in Lessons Learned, FAQs, etc.

I’m sure the auditor and I both agree (and he has reviewed this post beforehand, so I’m not speculating here) that auditors won’t issue PVs for violations of wording whose ambiguity is of the first type. In other words, if it is pretty clear to the auditor that the wording is ambiguous (or that a definition is missing), he/she won’t issue a PV for a violation. This makes sense; auditors aren’t evil people; they’re professionals who try to be as fair and consistent as possible. Plus, auditors don’t want to make a bunch of unnecessary work for themselves. Violations cost the auditor a huge amount of time, as well as (especially) the other staff of the region; and that is even before any appeal of the finding. I’m sure all auditors write PVs very reluctantly, knowing that it will probably mean at least some lost Friday evenings for them in the coming months.

However, since this auditor believes there are few ambiguities of the second type in R1, he thinks that any PVs that are issued will be fairly indisputable – in other words, there aren’t many parts of R1 where there are ambiguous ambiguities. Given this, auditors won’t hesitate to issue PVs when they think an entity is wrong, since they won’t worry that there are a lot of hidden ambiguities in R1 that may come out and invalidate the PVs they just issued.

I, on the other hand, think there are many ambiguities of the second type in R1. This leads me to believe that auditors won’t issue many PVs, since they will always be second-guessing themselves on whether the wording is clear enough for them to do this.[v] This is why I say that R1 will end up becoming effectively an open requirement, whether or not NERC declares it such.

Of course, we won’t know whether there will be PVs on R1 until v5/6/7 comes into effect and audits start taking place. But here’s how you’ll be able to tell whether the auditor or I was right: If you hear of a lot of PVs being issued for CIP-002-5.1 R1, he is right. If you don’t, I’m right.[vi]

The problem with this little contest is it will take a number of years for you to determine which of us was right. But I have a better idea. Why doesn’t NERC make this contest irrelevant and do the three things I’m requesting it do?

Postpone the compliance dates for CIP v5/6/7, hopefully by a year;
Declare CIP-002-5.1 R1 an open requirement; and
Start writing a SAR for a new version of CIP-002 that could actually be interpreted without ambiguity.

A guy can dream, can’t he?


[i] The auditor did point out to me that, while they don’t have the standing of true Interpretations, the Lessons Learned documents will have a higher legal standing than just some PowerPoint that a NERC staff member may have put together, since they are produced as part of a process - including comments from the membership - specified in the NERC Rules of Procedure. For more on that, see this post.

[ii] The auditor also informs me that NERC will come out with their “top 15” Lessons Learned by April 1, 2015. This will certainly help some entities, but it’s about a year too late for others; plus I’m sure there are more like a couple hundred LLs actually needed. I identified 20 issues just in CIP-002-5.1 R1 in this recent post, and I have about 5-10 more I could now add, just about that one requirement. Lew Folkerth of RFC discussed a serious issue in CIP-010 in this post. The list goes on and on, and will keep growing as entities struggle to comply. For example, I wouldn’t be surprised if there ended up being over a hundred issues just having to do with the bright-line criteria in Attachment 1 of CIP-002-5.1.

[iii] The auditor said, as I have before, that the entity is obligated to carefully consider any guidance NERC has provided. For example, I just provided a link to the draft Lessons Learned document on “programmable”, posted for comment last week. Entities can still roll their own definition if they think this document isn’t particularly helpful, but they have to document why they feel this way and be prepared to defend their position with the auditors.

[iv] The auditor points out, “Just bear in mind that the auditor is making his/her evaluation based on the best information available, coupled with the auditor’s technical training and work experience. That will be true regardless, but in the absence of a formal definition or guidance, the auditor will fall back on training and experience. For example, ‘Programmable’ is a well-defined term in the IT world. The issue is its applicability in the generating plant, and that is where the Lessons Learned guidance will come in.”

[v] At this point, the auditor adds, “Actually, there will be little second guessing. The auditor has to be qualified to audit a requirement, through training and experience, in order for the audit objectives to be met. The auditor will rely upon his/her training and experience, along with the best information available, in forming an expectation. The auditor goes into an audit with an expectation of what is necessary to demonstrate compliance. The challenge will be to be open to an entity’s approach as opposed to only allowing an approach fixed in the auditor’s mind (it is what you do, not how you do it; or as I have heard often, the color of the widget only matters if the requirement prescribes the color). And the Regional auditors are sufficiently experienced that this should not be a widespread problem. That is also why we have audit teams and a consensus process, not individuals, making the finding determination. The entity will have to be able to persuade the audit team that the entity approach comports with the intent of the requirement and with the specific prescriptions of the requirement as may be present. And don’t forget that consensus is not the same as unanimity.” This is good clarification, but it rests on the assumption that the wording of the requirement (or definition) in question is fairly clear. As I’ve said, we differ greatly on our assessments of how much of CIP-002-5.1 R1 is “clear”.

[vi] The auditor points out a third option: Maybe there will be few PVs because the auditors think the entities are doing it right! I guess there’s always that possibility….

Sunday, January 11, 2015

What Scares Me about the Sony Hack

There has of course been much written about the cyber attack on Sony, but I have seen nothing about its implications for control systems, especially in critical infrastructure like electric power. This may seem an odd complaint, since it’s not likely Sony had any control systems, and in any case they wouldn’t be part of what most of us call critical infrastructure.

Yet I do think this attack should be profoundly disturbing for all owners or operators of critical infrastructure. To understand why this is the case, think of why it has been hard to get many people to believe there could really be a large-scale attack on CI. In my opinion, a great many decision makers in critical industries simply fail to see a plausible scenario for such an attack. We can all understand why Target was breached, and why banks and other financial institutions are under constant attack: there is much money to be made from stealing credit card and other financial account information, as well as personal information like Social Security numbers.

However, the motive for an attack on critical infrastructure would have to be primarily a desire to cause destruction and chaos. Now, there are certainly groups like ISIS and al Qaeda that would presumably love to do that; but our power grid is already fairly well protected, and it isn’t likely any of these groups have the capability to launch the kind of large-scale, long-term effort required to make such an attack successful.

The entities that do have that capability are nation-states like Russia and China. But they clearly don’t have the motivation. Russia and China know that US retaliation for a large-scale infrastructure cyber attack would be devastating (and not necessarily limited to cyber weapons). And China, being the largest holder of US government debt, can hardly be expected to initiate an attack that might seriously impair the value of their investment.

Yet the attacker in the Sony case was most likely a nation-state, North Korea; it seems they have a formidable cyber attack force in operation – on the order of thousands of cyber warriors. And what was their motivation for attacking Sony? Simply to cause as much damage as possible, in a fit of pique over an upcoming movie. What’s to keep them from attacking the US power grid the next time they’re unhappy with us? Or for that matter, what’s to keep Iran from launching an attack if the nuclear talks fail and we impose more sanctions – especially if they come to feel they have nothing more to lose?

So for the first time we’ve seen a successful cyber attack, by a nation-state with deep cyber warfare capabilities, for the sole purpose of creating havoc. And since we know – from Stuxnet and the successful 2008 cyber attack on an oil pipeline in Turkey – that critical infrastructure can be destroyed through purely cyber means, we now have all the prerequisites in place for a devastating cyber attack on North American critical infrastructure.

And folks, that’s why getting NERC CIP right is so important.

P.S. In case you need to be reminded (as I did) of the potentially devastating consequences of an attack on the North American power grid, I refer you to this excellent commentary piece in the current issue of Power magazine.


Friday, January 9, 2015

Correction to My Post on CIP-002-5.1 R1 Compliance Methodology

Kevin Perry, Chief CIP Auditor of SPP, took strong exception to one of the paragraphs in my recent post on the compliance methodology for CIP-002-5.1 R1. The paragraph – at the end of the section headed “Task 3” – outlined a final step I thought should be taken after the entity completes the “top-down” identification of BES Cyber Systems. It reads:

"The final step of the top-down approach is to identify the component Cyber Assets that make up each potential BCS. These will be either BES Cyber Assets or Protected Cyber Assets. Since almost the same requirements apply to both BCAs and PCAs, it might be easier just to declare them all BCAs."

Kevin pointed out that you shouldn't label a component of a BCS as a BCA unless it actually meets that definition. A BCS must contain at least one BCA but can also include non-BCAs. Once included as a component of a BES Cyber System, it no longer matters whether the Cyber Asset is a BCA in its own right. The requirements all apply to the BCS, not the individual components.

Kevin also pointed out that you shouldn't label a BCS component as a PCA; this is because at the time you are identifying BCA and BCS, CIP-005-5 R1 and the concept of PCA are not in play. From the perspective of CIP-002-5, there are BCA and there are non-BCA, both of which can constitute a BCS. The PCA only results from inclusion within a defined ESP, which does not exist until the BCS have already been identified. I agree with both of these points, of course, and thank Kevin for making them to me. He has just saved you an extra step which wouldn't serve any compliance purpose.

Note: Kevin pointed out a more fundamental issue with Tasks 3 and 4 in my post. I will have a new post out addressing that in a few days.


Wednesday, January 7, 2015

Roll Your Own, Part VII: The Return of Lew Folkerth

A week after I published the first post in the “Roll Your Own” series in September, I attended an RFC meeting on CIP v5 where Lew Folkerth (of RFC) gave a presentation that seemed to agree strikingly with what I had just posted. It wasn’t that he’d copied me; in fact, he had clearly been thinking about this issue long before I had, and frankly had a more sophisticated take on it. I summarized his discussion in this post.

Now, Lew has documented his position in a very good article in RFC’s newsletter (you can find the newsletter here; his article is on pages 8 and 9). I highly recommend you read the article, but here are my main takeaways:

When he sent me the article, he pointed out that he’s not a big fan of my phrase “roll your own”. He much prefers something like “do the best you can with what you’re given”; I certainly agree that is also a good description of what I’m talking about.
My approach has been fairly narrowly focused. Briefly, I’ve been saying in the “Roll Your Own” posts that, because some requirements in v5 are ambiguous, and because some required definitions are simply MIA in CIP v5, the entity needs to come up with its own interpretations of the former, and essentially create the latter as best they can.
Lew is taking a broader view of the problem – and requiring more work of the entity. He’s saying that, in any case where an entity has honest doubts about what a requirement means (and where there hasn’t been good guidance provided by NERC or the Regional Entity), the entity needs essentially to rewrite the requirement so it applies directly and clearly to the entity’s own situation.
Specifically, he says the entity should “Determine what the Requirement intends to accomplish in the context of the entity, and how the entity will address this intent.” Having rewritten the requirement so that it includes this, the entity needs to comply with the rewritten requirement in the same way they would if they were just following the given language of the requirement; this should be repeated for each requirement where there is ambiguity. Essentially, he’s saying you need to come up with “Entity X’s CIP Version 5”[i], and simply comply with that[ii].
You can hopefully see that this goes well beyond what I’ve been saying, since I’ve been focusing on clarifying the wording of each requirement, not on determining the intent of the requirement and how it applies to the entity’s specific environment. I do agree that Lew’s approach is superior, although it will require more work of the entity, especially in determining the “intent” of the requirement.[iii]
Lew uses R1 from CIP-010-1[iv] as his example, pointing out that the entity really needs to define the term “software” so that the requirement reflects “what the Requirement (meaning the original requirement) intends to accomplish in the context of the entity”. He stresses that, if you simply take “software” to include every small script, etc., you will be creating a huge burden for yourself, for little or no gain in BES reliability. This part of the article is worth the price of admission (actually far more, since it’s free) all by itself.
Lew also puts this in the context of RAI (especially his Step 3). That is certainly something that entities need to think about more and more as they determine how they will comply with CIP v5.


[i] As in other posts, I’m using “CIP Version 5” in the way it’s commonly used: to refer to the mixture of v5, v6 and v7 standards that entities will actually have to follow, not the ten v5 standards approved by FERC in Order 791. See this post for a list of the actual standards you will have to comply with.

[ii] Of course, I’m not saying (and Lew isn’t either) that the entity needs to rewrite every requirement in v5. Most of the requirements are fairly clear, and don’t need to be interpreted by the entity. However, the ones that aren’t clear (especially my favorite, CIP-002-5.1 R1) can cause lots of problems.

[iii] I don’t think Lew means “intent” in the sense of “the intent of the Standards Drafting Team”. I have previously written that trying to discern that intent, or use it to interpret the requirements of v5, is a fool’s errand. I believe he’s simply saying that, by closely reading all of the v5 requirements, the Guidelines and Technical Basis and the Lessons Learned, as well as using background documents like ES-C2M2 or the SANS Critical Security Controls, you can get a good idea of what the requirement would mean in your environment.

[iv] He writes “CIP-010-1 (and -2)” in his article, since I think he wrote it before v7 made its appearance on the scene. In fact, the version that entities have to comply with will be the v7 version, or CIP-010-3. Things move fast here in the NERC CIP world.