Tom Alrich's Blog

Sunday, February 1, 2015

NERC CIP in Fantasyland

I attended WECC’s CIPUG – CIP User Group – in Anaheim, Calif. last week. This was the third CIPUG I’ve attended in this location, at a hotel a couple blocks from the gates of Disneyland. It was as usual an intimate gathering – just me and 350 of my closest friends. And as usual, it was a very well-organized and well-programmed event.

The first time I heard about a CIPUG being next to Disneyland, I thought, “How appropriate. There is such a huge amount of unreality in NERC CIP; we’ll all feel right at home there.” But after attending the meeting last week, I saw this juxtaposition from an almost opposite perspective.

I have often put myself in the place of the people working in Disneyland, and especially Fantasyland – playing Mickey and Minnie Mouse, the Seven Dwarfs, etc. I am sure these people have no illusions that they work in a make-believe world. When they come off work, they don’t have to adjust to our “real” world – they feel they never left it. The only people who actually believe in the make-believe world of Fantasyland are of course the very young kids who visit there.

Let’s contrast this with the people attending the CIPUG: staff members of NERC, WECC, and the NERC Responsible Entities, as well as consultants like me. We are all indulging in fantasies about NERC CIP Version 5 and its path to implementation; those fantasies were on display in the CIPUG presentations as well as the conversations at breakfast, lunch and the breaks. The difference between us and the people who play the Fantasyland characters is that they know they’re in a make-believe world. Those of us attending the CIPUG, on the other hand, didn’t have a clue that this is the case. We were in the position of the young kids visiting Fantasyland, not the workers putting in their time there and thinking about anything except Mickey Mouse.

In this post, I will list three fantasies that are quite prevalent in the world of NERC CIP and that were on display at the CIPUG (but, of course, that are certainly not limited to the people who attended the CIPUG). I do wish to point out that I am not singling out any particular individuals as being more prone to these fantasies than anyone else, although I will illustrate the fantasies through their manifestation in the presentations and discussions at the CIPUG. These are institutional fantasies that have evolved to enable the whole NERC CIP “world” to live with an increasingly impossible situation, and to justify the fact that thousands of people in that world are plodding dutifully ahead, with no clear idea where they’re actually going or whether in fact they are really getting anywhere at all.[i]

Note: The CIPUG was just one of a total of three days of meetings. The first two days were a combination of the WECC Compliance User Group (or CUG - i.e. the group that manages compliance with the other NERC and WECC standards, collectively known as the “693” or “O & P” standards) and the Western Interconnect Compliance Forum (WICF) – the association of NERC compliance professionals at WECC entities (whose meetings and forums are off-limits to NERC and WECC staff members). The presentations from the entire three days can be found at this location; you can find the ones relevant to CIP by looking for titles with “CIP” or “Cyber Security”.[ii]

Fantasy Number One: The Foundation of CIP Version 5 is Strong

It’s no surprise when I tell you that NERC and WECC (and the other seven Regional Entities) are moving forward (perhaps not at full speed) on implementing CIP Version 5. Given that, it should also be no surprise that the NERC and WECC staff members, in their presentations, didn’t bring up any fundamental issues with v5. They certainly did mention various issues that needed to be resolved, but none of the “hold the presses” variety. Indeed, how could they stay in their current jobs unless they truly believed this? NERC is committed to implementing CIP v5 as written; if you don’t think that can really be done, you should seek employment elsewhere.

If you’ve read any of my posts since April 2013, you know that I think the foundation of CIP v5 is rotten. The foundation consists primarily of CIP-002-5.1. I recently documented twenty serious problems with that standard, and I’m sure I could add another 15 – 20 problems to that today without much effort (some of these came out directly or inadvertently in the CIPUG presentations).

But the “foundations” of CIP v5 aren’t just in CIP-002; they’re part of other standards as well, especially CIP-005-5. And this is why I found the presentation by Morgan King of WECC on CIP-005 Lessons Learned to be so interesting. Morgan did a very good job of discussing four Lessons Learned (none finalized as of yet, and one or two not even released in draft form) that relate to CIP-005. His presentation provides some good information, although I really wish there were recordings available, since the Q&A was the really interesting part.

The Q&A for Morgan’s presentation clearly revealed something I’d already suspected: as you[iii] start to probe more deeply into any particular question about CIP v5, you’re almost certain to uncover a number of additional questions. And so it went with Morgan. He addressed four particular Lessons Learned, but I’d say there were 3 – 5 new questions raised about each one (and if there weren’t five questions raised, it’s only because discussion had to be shut off to move to the next presentation). There could easily have been at least a whole half day of Q&A just on Morgan’s presentation; this is true for a few of the other presentations as well, especially those of Dr. Joe Baugh on CIP v5 Pilot Study Lessons Learned and Lisa Wood on Low Impact Assets[iv] in v5. I highly recommend WECC expand the CIPUG to a day and a half, just like the CUG.

I unfortunately didn’t take notes on the different issues that were raised – they came fast and furious, and as I said Morgan (as well as all of the presenters) was under a lot of pressure to finish up so the next presentation could start. I do remember there were a lot of questions on the External Routable Connectivity discussion at the beginning of the presentation and the Virtualization discussion at the end. Morgan handled these new questions in the only way he could be expected to: by saying the NERC Transition Advisory Group (of which he’s an active member) hadn’t addressed them yet.[v]

To illustrate what I said about new questions being raised as soon as you try to address one question in CIP v5, I’ll point you to page 18 in Morgan’s presentation. There, he lists three criteria for the presence of External Routable Connectivity, including “Would the misuse or disruption of those routable protocols or BES Cyber Assets have an adverse impact on the BES within 15 minutes?” This isn’t a question that was asked at the meeting, but I’ll ask it now: What does adverse BES impact – within 15 minutes or 15 years – have to do with the question whether a cyber asset has ERC? Adverse BES impact is certainly important for determining whether a Cyber Asset meets the BES Cyber Asset definition (as I discussed in this post), but it has nothing to do with ERC.

Note (Feb. 2): Morgan emailed me this morning to point out that his slide 18 (the one discussed above) had been unclear; the third bullet point (quoted above) really had to do with the question whether the protocol converter was a BCS (see slide 12), in which case the question about adverse BES impact does make sense. He said he mentioned this during the presentation, which I don't doubt - there was so much he had to say and he (like the other presenters) was rushing through it so quickly that I couldn't really absorb even half of the things he said. Besides extending the CIPUG to a day and a half (it's really about 6 hours now), WECC should also make the webinar recording publicly available (they webcast the CUG/CIPUG for WECC members, who have to pay to "attend" that, just as they do to attend the live event).

And this brings up another topic. In his presentation, Tobias Whitney of NERC said that one way NERC plans to get more information out to entities regarding CIP v5 is the new "CIP University". No, CU won't consist of nice Gothic buildings with ivy on them. It will make CIP meetings hosted by the different regions available to all NERC attendees. This is nice, but it isn't going to make a huge difference, since I don't believe any of the other regions make their meetings available by webcast like WECC does; requiring people to attend in person, with limited travel budgets, isn't going to greatly increase the learning opportunities.

More importantly, what WECC is doing for CIP v5 education is far beyond what any of the other regions have done. WECC has had three two-day v5 workshops, and will be having two workshops on Low impact assets (one is this week, although it's sold out). This is in addition to the three-times-a-year CIPUGs. I don't know of any other region that has had more than 2-3 days' worth of CIP v5 workshops so far (a couple have had zero that I know of); this isn't surprising, since WECC is far more than twice the size of any other region. I have always recommended that people from other regions attend WECC meetings. This is allowed (although I don't know if the webinars are made available to non-WECC entities), and is encouraged because there really isn't a lot of WECC-specific content in the meetings (there was virtually none in the CIPUG); everyone can benefit from them. Maybe WECC can offer to expand their meetings and make them all available to NERC entities online.

Another issue I had with Morgan’s presentation was at the end, when he said – probably as part of his response to a question – that a network switch would be a BES Cyber Asset. I recently wrote a post pointing out that another NERC auditor (different region) strongly believes switches should not be BCAs. I won’t say who is right on this matter (although I lean toward the other auditor’s position). However, this shows there are some fundamental questions that are being seriously debated now (or should be debated, if they’re not) within NERC – exactly 14 months before the High/Medium compliance date. Anybody else see a problem with this?

Speaking of getting on to the next topic, it’s time for me to get on to the next fantasy that was revealed at the CIPUG. Suffice it to say that the primary “lesson learned” I took away from Morgan’s presentation (as well as a couple others) was that there can be no end to the questions raised about CIP v5, at least within a finite time period such as, say, the 14 months between now and April 1, 2016.

Fantasy Number Two: The Interpretation Issues with CIP v5 are Manageable

The previous paragraph is a great lead-in to this fantasy. I state again that I’m not pointing a finger at any particular individuals as subject to the fantasies discussed in this post, but I will use the presentation by Tobias Whitney of NERC – and his response to a question I asked him in the meeting – as an illustration of this fantasy.

Tobias’ presentation was titled “Version 5 Pilot, RAI Initiative and Transition Guidance”. It was good, and especially important because he – as the person in charge of all of this – was the one delivering it. A highlight was his list of 15 (or so) Lessons Learned that he promised would be addressed by April 1 of this year. He also mentioned that entities should submit any new questions to their regions, who will then submit them to NERC.

My question to Tobias was in essence the following, but it was much shorter: “Tobias, it’s wonderful that you’re addressing 15 questions by April 1. By the way, you didn’t mention that you’re using FAQs to address some other questions, but you have addressed maybe 30 additional questions that way, and will undoubtedly do more FAQs as well. However, as we saw in the presentations earlier today – especially Morgan’s – the questions keep metastasizing, so that as you probe deeper into almost any one of them you find a number of further questions, and so on perhaps ad infinitum.[vi] I’m sure that NERC entities could today come up collectively with over 500 serious questions on CIP v5, with more appearing all the time.

“I’m not asking you to tell me when every v5 question will be addressed. You obviously can’t tell me that until you have a list of all v5 questions. But since these questions are clearly growing daily, what can NERC do to at least develop and maintain a comprehensive online list of v5 questions that have been asked to date? These would be questions that don’t have an easy answer by referring to the wording of the standards, or one of the guidance documents like the Lessons Learned (although those have to be answered as well). I think this list would provide a big benefit just by itself, even though it wouldn’t actually answer any questions. Even though NERC entities wouldn’t have answers to most of these questions, they would at least have a rough idea of the size of the elephant as they take bites out of it.”

Tobias’ answer to me was quite interesting, and not at all what I expected. He didn’t dispute the idea that there were many questions on v5 that NERC hasn’t even thought of yet, or that NERC ultimately will struggle to address every question that comes up[vii]. What he said was that a comprehensive list of questions might be a bad idea because it could cause entities to get discouraged and slow down or even stop their current efforts to come into v5 compliance! When I expressed my surprise at this answer, he backed away from it, but later seemed to come back to it when he mentioned the danger of “paralysis by analysis”.

Think of what this means. He seems to be saying NERC entities need to at some point simply charge ahead and do their best to come into compliance with v5, even though they may have all sorts of questions (both officially acknowledged and unacknowledged questions, as well as ones that are unknown at the moment) that could call into question whether portions of their effort are actually in error and need to be re-done or simply abandoned.[viii]

This might be good advice for adventurers heading off to explore new territory. For them, it’s obviously impossible to know in advance all the obstacles that may lie ahead (otherwise, it wouldn’t be new territory). But it doesn’t exactly strike me as wonderful advice for organizations that are moving to comply with a new set of standards, where there can potentially be huge fines for non-compliance. What if Entity A takes his advice, puts aside any questions they have about whether they’re properly identifying BES Cyber Systems, and proceeds to develop an entire v5 compliance program based on a set of “BCS” that, as it turns out, weren’t properly identified in the first place? Is Tobias saying they won’t get assessed a PV when an auditor – maybe four years from now – realizes they have completely missed the boat?

Maybe he is saying this. I’ve already said regarding CIP-002-5.1 R1 that, not only should it be declared an “open” requirement with no PVs assessed for good-faith efforts to comply, but that it will be an open requirement even if not actually “declared” so. This is because no NERC auditor is going to assess a violation for a requirement that is ambiguously worded, and which the entity has tried its best to understand and comply with. Maybe this idea really applies to all of the CIP v5 standards, not just CIP-002.[ix] In other words, maybe the entire set of v5 standards should be declared open; and even if not declared so, they will be anyway because the auditors won’t assess PVs.

If Tobias really means to declare all of the v5 standards to be open ones, he of course needs to first get NERC and FERC on board with that idea; at the moment I think that would be quite a challenge. So maybe his idea is to either postpone the v5 compliance dates (as I advocated in this post), or to declare just the first year of compliance to be an “open” one. I didn’t advocate the latter, although I’m not against it as long as it’s stated explicitly by NERC – not just left up to the discretion of the auditors. There’s far too much auditor discretion already, and nobody is unhappier about that situation than the auditors themselves. They want clear guidance.[x]

In any case, it should be clear that I’m not satisfied with Tobias’ answer to me. I certainly agree that entities shouldn’t stop their CIP v5 compliance efforts at this point. But I don’t see any way they can go ahead with an untroubled mind – as Tobias wants them to do – without the compliance date being moved back or the whole of CIP v5 being declared “open” for a year or so.[xi]

Fantasy Number Three: “We’ve Got it Under Control”

The third fantasy I identified at the CIPUG (and have identified before) is one indulged in by some compliance personnel at Registered Entities. They believe they have CIP v5 pretty well figured out, and just need to fill in the blanks in order to complete their compliance implementation.

Let me say that I don’t think these people are lying when they say this. They honestly believe that – while there are admittedly some less-important questions that need to be resolved – the fundamental concepts in CIP v5 are clear. And why shouldn’t they believe that? Every NERC presentation, webinar, bulletin, etc. says or implies the same thing. I have yet to see a presentation by a NERC staff member who says, “Yes, for this important v5 issue we haven’t a clue about what we’re going to do to address it.” Yet I could name a number of very fundamental issues[xii] for which that is exactly the case (or at least NERC hasn't announced they're addressing them).

In a recent post I quoted the physicist Richard Feynman, who famously said “If you think you understand quantum mechanics, you don't understand quantum mechanics.” I’ll paraphrase that: “If you think you understand how NERC CIP v5 works, you don’t understand how NERC CIP v5 works.” As someone who has spent close to two years trying to understand how CIP v5 works and isn’t much further toward that goal than when I started, I absolutely believe this to be true. No matter how smart you are, how many people you’ve talked to, or how many conferences you’ve attended, if you think you have CIP v5 down pat conceptually and you can just unthinkingly forge ahead to compliance, you’re living in Fantasyland. The utmost humility is called for where CIP Version 5 is concerned.

As an example of this, one consultant and I were discussing at CIPUG the entities who say they understand CIP v5, then say that they really just have to identify all of their Critical Cyber Assets from CIP v3 as BES Cyber Systems, and they’ll be done with their BCS identification (one entity said that to me at the CIPUG). If you’re one of these, I won’t say you’re wrong when you think that your lists of BCS (or BCAs) and CCAs will be the same; you may well be right about this. However, you are wrong if you think the auditor will be satisfied if you tell him/her you just took your CCA list and made it your BCS list. CIP-002-5.1 R1 requires you to develop and document a methodology for identifying BES Cyber Systems, and to apply that methodology to identify BCS. You need to develop the methodology, and then run every Cyber Asset you have through it to develop your BCS list. If it turns out the result identifies the exact same cyber assets that were on your CCA list, great. But you can never assume that at the start.
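To make the point concrete, here is a minimal Python sketch of the order of operations: apply the methodology to every Cyber Asset first, and only afterwards compare the result with the old CCA list. The function names, the sample inventory and the stand-in rule are all hypothetical; a real methodology would encode the entity's own documented definitions and interpretations, which this sketch does not attempt.

```python
def apply_methodology(cyber_assets, is_bcs_component):
    """Run every Cyber Asset through the documented methodology."""
    return {asset for asset in cyber_assets if is_bcs_component(asset)}


def compare_with_cca_list(cca_list, bcs_components):
    """Report how the methodology's result differs from the v3 CCA list."""
    cca, bcs = set(cca_list), set(bcs_components)
    return {
        "identical": cca == bcs,
        "only_on_cca_list": sorted(cca - bcs),
        "only_in_bcs_result": sorted(bcs - cca),
    }


# Hypothetical inventory and a stand-in rule (assumptions for illustration).
inventory = ["EMS server", "RTU 7", "Relay 12", "Office printer"]
v3_cca_list = ["EMS server", "RTU 7"]
impactful = {"EMS server", "RTU 7", "Relay 12"}

bcs_result = apply_methodology(inventory, lambda asset: asset in impactful)
report = compare_with_cca_list(v3_cca_list, bcs_result)
```

The comparison is an output of the process, not an input to it. If the two lists happen to match, the report says so; if they don't, the entity finds out before the auditor does.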

So this is the last of the three fantasies I saw running rampant at the WECC CIPUG last week. Like the measles, these things seem to spread at Disneyland.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

Feb. 9: I've written a sequel of sorts to this post, which you can find here.

[i] Of course, institutional fantasies are nothing unique to the NERC world. In fact, there is probably no institution that doesn’t indulge in its own fantasies, since not to do so would make it impossible for the employees to do their work. Examples (and much more serious ones) include:

· I’m sure employees of tobacco companies – in the years before the companies finally admitted the documented serious health effects were in fact real – were trained in how to respond when people brought up these effects to them: by denying that there were such effects (and probably believing that what they said was true). What else could they possibly do?

· I’m also sure that most US Congressmen/Congresswomen and Senators don’t believe that the vast majority of the work they’ve been doing in recent years has been completely futile, if not outright counterproductive. How can you possibly stand before the voters and ask to be re-elected, with any other belief in your mind?

· There are people to this day who assert that the Vietnam War produced some positive good for the people of Vietnam. Again, if you were closely involved with that effort, how could you say otherwise?

[ii] A few of the CUG presentations will be of interest to CIP compliance professionals, even if they don’t deal with any other NERC standards. This includes one presentation on the BES definition and about five on RAI – two with “RAI” in the title, as well as presentations labeled “Internal Controls” and “Risk-Based Framework”.

[iii] And by “you” I mean a cyber security professional. I am not that, but there were of course many of them in the room at the CIPUG.

[iv] I notice now that Lisa’s presentation is shown as being about “Low Impact BES Cyber Systems” on the WECC page, but the actual title on the slides says “Low Impact Assets”. Of course, the politically correct way to say this is “Assets Containing a Low Impact BCS” (and Lisa did catch herself and use that phrase a few times), which is a nonsensical attempt to bridge the gap between the two completely different points of view from which CIP-002-5.1 was written (without any clear reconciliation). I’ve written about this sorry mess a number of times, including in this post under the heading “Have an Apple, Adam?”

[v] I much prefer Morgan’s approach to answering hard questions to that of a NERC manager who often addresses industry meetings, and who seems to feel compelled to answer every question that gets raised. This has led him multiple times to say things that later have to be retracted or reworded by NERC. Unless you’re going to say that the opinions you’re expressing are entirely your own (as I do in this blog), you shouldn’t be making statements without confirming them with the organization under whose auspices you’re making those statements.

[vi] I have wondered why the situation is so much different with CIP v5 than it was with v1-v3, where there were certainly some questions but they seemed to be much fewer, and much more contained. I believe the problem is that v5 was much more ambitious, and requires the entity to make judgments about a number of areas that weren’t relevant in CIP v1-3. The bright-line criteria are one example of this, but the biggest example is probably the concepts of BES Cyber Asset and BES Cyber System. Just look at my recent post on “methodology” for BCS identification and classification to see how fiendishly complicated – and ill-defined – the concepts of BCA and BCS really are. I hope to have a post on this topic in the future.

[vii] In fact, I will state unequivocally that there is no way NERC will ever be able to address all of the questions with v5, no matter what time frame you look at.

[viii] In fact, there is a real danger that some entities’ CIP v5 compliance efforts may be entirely for naught if it turns out they guessed the answer to a particular question wrong. Let’s say your entity has only one Medium impact substation, and that the Attachment 1 criterion it falls under is ambiguous. You go ahead and spend a million or so (which isn’t a lot for a large entity’s compliance program, but is huge for a small entity’s) implementing compliance with all the requirements, for that substation. Four years later, you get audited and the auditor casually mentions, “Oh, that criterion was recently clarified by (someone at NERC), and your substation would now be considered Low impact.” Wouldn’t that make you feel wonderful?

[ix] Of course, by CIP v5 I really mean the combination of v5, v6 and v7 standards that entities will actually have to comply with, which I have otherwise called CIP v6.3940.

[x] Even if CIP v5 is declared “open” for a year or so (I’m sure the idea of making it permanently open wouldn’t fly with FERC or Congress), NERC also needs to write a Standards Authorization Request for a complete rewrite of CIP-002-5.1. While the other CIP v5 standards can probably be salvaged with enough interpretation effort, CIP-002 is beyond salvation. It needs to be condemned to the eternal fires and be reborn in a completely new standard. That will take a few years, but at the end CIP v5 will be on a solid foundation. Without that, there will always be questions about whether an entity has properly identified and classified its BES Cyber Systems in the first place, even if there are no remaining questions about the other v5 standards. I called for rewriting CIP-002 in this post.

[xi] At one point in his presentation, Tobias asked how many entities had completed their CIP v5 compliance implementation. He seemed genuinely surprised when nobody – of the 350 people in the room – raised their hand. I would have been astounded if anyone had.

[xii] Here are five examples of fundamental issues that NERC doesn’t even plan to resolve, as far as I know: a) the use of Facilities vs. assets in the bright-line criteria; b) the meaning of “impact the BES”; c) the status of the term “Group of Facilities” (discussed in the CIP-002 Guidance) as it relates to the Criteria; d) whether connectivity has anything to do with whether a Cyber Asset is a BCA; and e) whether entities are advised to group BCAs into different BCSs depending on the requirement (this is implied to be desirable in the Lessons Learned document on “Grouping BES Cyber Assets”, but strikes me – and others – as a recipe for utter chaos). These are all quite fundamental questions that NERC hasn’t even said they’ll produce guidance on (I intend to do posts on all of them, not just the first two, which I have already addressed). I could definitely list a few more, except I’ve already been working on this post all day (fortunately, there’s a huge blizzard going on in Chicago as I write this, and I don’t feel bad that I’m not doing something outside the house).

Monday, January 26, 2015

Two (more) Changes in My CIP-002-5.1 R1 Methodology

A recent post described – at a very high level – my “methodology”[i] for complying with CIP-002-5.1 R1 (which I usually refer to simply as “R1”). When I wrote that post, I didn’t think there would need to be many changes to the methodology. However, I have already made one change in the methodology and now have two more to make – one substantial, one less so.

Besides describing these changes in this post, I will make them in the original post as well (as I did for the first change). In fact, since I know there will be more changes in the future, I will do this from now on: put out a post describing the change, then edit the original post so it reflects it. This makes the original post a “living” document that will hopefully always describe my most recent thinking on R1 methodology.

I. BCS Identification

If you haven’t read the original post (but if so, why are you reading this one?), I’ll point out that my “methodology” is heavily laden with a series of decisions the entity must make in order to comply with R1. Perhaps the most important of those decisions is exactly how BES Cyber Systems will be identified in the first place (i.e. before they’re classified High, Medium or Low impact).

In that post as well as a previous one, I described two primary methods for identifying BCS: “top-down” and “bottom-up”. My post stated that the best practice is to combine the two methods, since I believed that, in all cases, some BCS could be missed if only one of the two methods were used. However, since that post I have heard from two different sources - one a CIP auditor - that the top-down approach doesn’t really buy much in substations, although it does in control centers and generating stations.

The reasoning for this makes a lot of sense: in control centers and generating stations, there are certain well-understood functions that are performed by the asset as a whole; these functions each have systems associated with them. For example, BA control centers almost always have systems including production SCADA/EMS, Outage Management System, ICCP, Historical Data Retention, Operations Engineering Support System, etc. Generating stations have a digital control system, soot blow down system, control air management system, etc.[ii]

The entity only needs to confirm that the loss, misoperation, etc. of any of these systems has a BES impact within 15 minutes; if it does, the system is a BCS. So for these two asset types, starting with the top-down approach is best. Of course, the entity still needs to perform the bottom-up analysis, in which it considers each of the Cyber Assets at or associated with the asset[iii] - that haven’t already been identified as components of BES Cyber Systems through the top-down analysis - to determine whether or not they meet the definition of BES Cyber Asset, including having a 15-minute impact on the BES. Every BCA so identified should then be included in a BES Cyber System.[iv]
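The two passes just described can be sketched roughly in Python. This is only an illustration under stated assumptions: the class names, the has_15_minute_impact flag (which stands in for a judgment the entity has to make itself) and the catch-all "Ungrouped BCAs" system are all hypothetical, and the impact values in the sample data are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CyberAsset:
    name: str
    # Hypothetical flag: the entity's own judgment on whether loss or
    # misoperation of this device would affect the BES within 15 minutes.
    has_15_minute_impact: bool


@dataclass
class System:
    name: str
    has_15_minute_impact: bool
    components: list = field(default_factory=list)


def identify_bcs(systems, all_assets):
    """Top-down pass first, then bottom-up over whatever is left."""
    # Top-down: well-understood systems with a 15-minute impact are BCS.
    bcs = {s.name: list(s.components) for s in systems if s.has_15_minute_impact}
    covered = {asset for comps in bcs.values() for asset in comps}

    # Bottom-up: examine each Cyber Asset not already inside a BCS.
    leftover_bcas = [
        a for a in all_assets if a not in covered and a.has_15_minute_impact
    ]
    # Every BCA found this way still has to be placed in some BCS.
    # Lumping them into one catch-all system is an assumption of this sketch.
    if leftover_bcas:
        bcs["Ungrouped BCAs"] = leftover_bcas
    return bcs


# Invented sample data for a control center.
ems = CyberAsset("EMS server", True)
historian = CyberAsset("Historian", False)
gateway = CyberAsset("Stand-alone gateway", True)
printer = CyberAsset("Office printer", False)

scada = System("Production SCADA/EMS", True, [ems])
hist = System("Historical Data Retention", False, [historian])

result = identify_bcs([scada, hist], [ems, historian, gateway, printer])
```

In this example the top-down pass finds the SCADA/EMS system, and the bottom-up pass picks up the stand-alone gateway that no system-level review would have surfaced.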

Substations are different. Substations don’t inherently perform particular functions – they can be all over the map, and can include some mix of Transmission (in scope for CIP) and Distribution (not in scope) functions[v]. There is no inherent set of functions that most or all substations perform. You really have to look at each individual Cyber Asset and consider whether it has a 15-minute impact on the BES, then perform the rest of the bottom-up analysis. But the top-down analysis isn’t likely to identify BCS that aren’t identified in the bottom-up approach, and therefore doing both analyses doesn’t buy you anything.

However, you may say, “What about the BES Reliability Operating Services (BROS), which are an integral part of the top-down approach? Do we just forget about them for substations?” No. Just because the entity doesn’t use the top-down approach for substations doesn’t mean the BROS don’t come into play in the BCS identification process. Since the heart of the BES Cyber Asset definition is that the loss of the Cyber Asset would “adversely impact” the BES within 15 minutes, a good way to identify BCAs is to consider whether a Cyber Asset has a 15-minute impact on one or more BROS. If so, the Cyber Asset is most likely a BCA.[vi]
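For substations, the BROS screen described above might look something like the following Python sketch. The function name and return strings are hypothetical, the BROS set is an illustrative subset rather than the full list, and a BROS hit is treated only as a strong indicator, since the BROS are not the exclusive test.

```python
# Illustrative subset only; the entity would substitute the full BROS list
# from the CIP-002-5.1 guidance.
BROS = {"Controlling Voltage", "Monitoring and Control", "Situational Awareness"}


def screen_substation_asset(supports_transmission, bros_hit_within_15_min):
    """Bottom-up screen for a single substation Cyber Asset."""
    unknown = set(bros_hit_within_15_min) - BROS
    if unknown:
        raise ValueError(f"not in the BROS list used here: {sorted(unknown)}")
    if not supports_transmission:
        # Distribution functions are not in scope for CIP.
        return "out of scope: Distribution function"
    if bros_hit_within_15_min:
        return "likely BCA"
    # No BROS hit does not settle the question by itself.
    return "no BROS impact: still check other 15-minute BES impacts"


relay_verdict = screen_substation_asset(True, {"Controlling Voltage"})
feeder_verdict = screen_substation_asset(False, set())
```

Note that the last branch deliberately does not return "not a BCA"; it only says the BROS screen came up empty and the rest of the bottom-up analysis still applies.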

So I will revise my R1 methodology post to reflect what I’ve just said. However, I’ve just identified another post that needs to be modified. This is a post I wrote on the meaning of “affect the BES” in the BCA definition. In that post, I stated that there was no point, when doing the bottom-up analysis, to consider the BROS. I said this because I was assuming that all entities would start their BCS identification with the top-down analysis, so they would have already identified all Cyber Assets that fulfilled a BROS - they are components of BCS. Since I’m now saying the top-down approach doesn’t help for substations, this means the BROS should be considered (again, not exclusively), as substation owners/operators identify their BCAs through the bottom-up analysis. I will modify this other post as well.

II. “Transmission Facilities”

In my R1 methodology post (item 3 under Task 2), I indicated that one of the definitions each entity needs to develop is one for “Transmission Facilities”. This term is used in several of the Medium impact criteria, yet even though both “Transmission” and “Facility” are NERC-defined terms, I had heard that trying to combine these two definitions didn’t yield anything very helpful. And I heard this was causing problems for Transmission entities as they tried to sort out Transmission from Distribution cyber assets in their substations. In addition, I heard the new BES definition (which essentially defines Transmission) wasn’t too helpful in sorting things out. I had discussed this issue in a previous post.

However, I have since heard from a couple of knowledgeable people that it really isn’t all that hard to separate Transmission from Distribution cyber assets (and Facilities) in substations, using the new BES definition. Since I haven’t heard any further comments to the contrary, I am officially declaring this a non-issue (leaving only 4,368 known issues with R1 and Attachment 1, by my latest count), and will remove it from the R1 methodology post.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] I use quotes here because, as explained in the original post, it is impossible to write down – in a document with fewer words than the Bible – a single methodology for complying with R1; there are far too many branches and options required. But this doesn’t mean NERC entities, with High or Medium impact assets under CIP v5, don’t have to follow any particular methodology when they comply with R1. They have to follow some methodology, and it has to be documented. My post should be seen as more or less a “template” for developing the methodology, although a large part of the contents – various definitions and interpretations – need to be determined and inserted by the entity; there is no way they can be dictated in advance, given the ambiguities and contradictions in the wording of R1 and Attachment 1.

[ii] All of these examples of systems were suggested by the auditor.

[iii] If the asset is a High impact Control Center, the applicable wording is “used by and located at”. If it is a Medium impact Control Center or a Medium impact generating station, the wording is “associated with”.

[iv] The exception to this rule is for large plants (usually coal) that are in scope with v5 because of criterion 2.1. In these, it is usually impossible to apply the true “bottom-up” approach, because of the huge number of devices (sometimes in the tens of thousands) that may meet the definition of Cyber Asset. Since my post on R1 methodology in theory just applied to substations (although I think it also works for generating plants that don’t meet 2.1), I still haven’t addressed the “2.1 plant” methodology. I hope to in a future post.

[v] It occurred to me that this is why CIP Versions 1-3 fit so badly in substations. A Critical Cyber Asset was defined as a Cyber Asset “essential to the operation” of a Critical Asset. Since, strictly speaking, a substation considered as a whole doesn’t perform any particular operations, there really aren’t any Cyber Assets that meet that definition. Version 5 tried to address this issue by writing all the criteria that apply to substations (2.4 – 2.8) with the word “Facilities” in the subject – meaning the lines, transformers, busses, etc. that are located at the Transmission substation. These are what become Medium impact, not the substation itself. Of course, many Transmission entities and even Regional Entities seem to be interpreting the word Facilities to mean the substation itself, even though that is almost certainly not what was intended (although as I said in my methodology post, there’s nothing wrong with doing this – as long as you accept that you’ll probably identify more Medium BCS than if you used the pure “Facilities” approach). I have discussed this issue in several posts, including this one.

[vi] Of course, the converse isn’t true: If the Cyber Asset doesn’t have a 15-minute impact on a BROS, it doesn’t mean it isn’t a BCA, since its impact could be in an area other than reliability. For example, the fire suppression system in a substation doesn’t fulfill any particular BROS, but were it to fail to operate when needed (in the event of a fire), that failure would presumably have a 15-minute impact (e.g. one or more lines might be tripped because their associated relays burned up).

Friday, January 23, 2015

Are Networking Devices BES Cyber Assets?

There is a discussion going on in NERC circles about whether networking devices should be declared BES Cyber Assets or not. At first glance, it seems almost an open-and-shut case that they should be. After all, the BCA definition includes Cyber Assets whose loss, etc. would impact the BES within 15 minutes. It would seem that a switch that ties together the whole network in a substation or generating station would certainly fit that bill, right?

At least one CIP auditor doesn’t think so. He makes his argument by drawing a distinction between networking devices on the ESP and those that are inside the ESP. For the former, the argument is very simple (and there was a similar argument in CIP v3): Since the ESP needs to include all routably connected BES Cyber Assets/Systems, if you consider the device (e.g. a switch) on the ESP to be a BCS, then you need to redraw the ESP to include it. Then the switch on the redrawn ESP becomes a BCS, and you have to redraw the ESP again, etc. Ergo, a switch on an ESP perimeter can never be a BCS. In fact, it may very well be an Electronic Access Point.

So how about a switch that’s inside an ESP? There isn’t a compelling logical argument against making this switch a BCA/BCS, but the auditor asserts there’s no compelling logical argument to make it one, either. It’s better to use the simpler approach and not declare it a BCA. Of course, any switch within an ESP (and not otherwise part of a BCS) will have to be a Protected Cyber Asset, and will thus be subject to almost all the same controls as a BCS.

Here’s another question: How about a switch that’s within a BES Cyber System? For instance, if an entity declares their whole EMS is a BCS, should a network switch (again, one that’s not on the ESP boundary) be declared a BES Cyber Asset? As I pointed out in this recent post, if a Cyber Asset is part of a BCS, you don’t need to take the additional step to declare it a BCA (or a PCA). Since all of the v5 requirements apply at the BCS level, a switch will be protected by the standards in any case.

The auditor also does want me to point out that “What we are talking about are traditional networking devices like routers, switches, and firewalls, along with multiplexors, microwave, and the like - basically the LAN/WAN equipment that serves as the communications backbone. Not included are end devices like port servers, terminal servers, Digi devices, and so forth that simply convert the data stream between TCP/IP and serial. Those are not networking devices in the traditional sense and, as end devices that only appear in the LAN, should be identified as BCA if they have a sub-fifteen minute impact on BES reliability as described in the BCA definition.”
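
For readers who like to see the logic laid out, the auditor’s position as I’ve described it above can be sketched as a small decision function. This is only my paraphrase in Python; the function name, its parameters and the result strings are all hypothetical, and the sketch assumes the device sits either on an ESP boundary or inside an ESP (it doesn’t address anything else).

```python
def classify_network_device(on_esp_boundary: bool,
                            part_of_bcs: bool,
                            is_end_device: bool = False,
                            has_15_min_impact: bool = False) -> str:
    """Illustrative paraphrase of the auditor's reasoning; not an official rule set."""
    if is_end_device:
        # Port servers, terminal servers and similar converters are not
        # networking devices in the traditional sense: apply the normal BCA test.
        if has_15_min_impact:
            return "BCA"
        return "evaluate like any other Cyber Asset"
    if on_esp_boundary:
        # Declaring a boundary device a BCS would force the ESP to be
        # redrawn around it, over and over.
        return "not a BCS; possibly an EAP"
    if part_of_bcs:
        # Requirements apply at the BCS level, so no separate
        # BCA or PCA declaration is needed.
        return "covered as part of its BCS"
    # Inside the ESP but not part of any BCS.
    return "PCA"


if __name__ == "__main__":
    print(classify_network_device(on_esp_boundary=True, part_of_bcs=False))
    print(classify_network_device(on_esp_boundary=False, part_of_bcs=True))
    print(classify_network_device(on_esp_boundary=False, part_of_bcs=False))
```

Note the order of the checks: the end-device question comes first because, per the auditor, those devices fall outside the whole networking-device discussion.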


Thursday, January 22, 2015

A Consultant Criticizes NERC CIP

Last year, I wrote two posts (here and here) about what I see as a great sport engaged in by many in the press (and the consultants who egg them on): attacking the electric utility industry for real and imagined failings in their efforts to secure their infrastructure against cyber and physical attacks. I have now found another prime example of this sport, this time engaged in by a longtime practitioner, consultant Joe Weiss. I am referring to his recent blog post [i], which makes the case that the NERC CIP standards aren’t making the grid more secure or more reliable. More importantly, Mr. Weiss blames the industry for both developing and circumventing these standards.

I wish to say at the outset that I certainly don’t think all attacks on utilities for not having proper security in place are unjustified. And I certainly don’t think that attacks on the NERC CIP standards are unjustified; indeed, I think I’m listed in Guinness as the all-time leader in number of complaints about CIP version 5. But as I said in the two posts last year, the attacks need to be based on facts, and they need to make sense logically. Most of the points Mr. Weiss makes in his post don’t meet one or both of these criteria. Because these points are ones that have often been raised by others, and because they all have quite interesting implications, I will spend some time addressing all of them.

I also want to point out that Mr. Weiss bases his post in part on a doctoral thesis (publicly available and linked in the post) by Marlene Ladendorff, titled “The Effect of North American Electric Reliability Corporation Critical Infrastructure Protection Standards on Bulk Electric System Reliability”. Some of the “facts” cited by Mr. Weiss come from the thesis; others come from other sources (not all identified). I have not had the time to go through the thesis, so I will stipulate that Mr. Weiss has accurately represented Ms. Ladendorff’s findings.

...and eating it, too

My biggest problem with Mr. Weiss’ post is that he repeatedly tries to have his cake and eat it, too. That is, he bashes the utilities (or the CIP standards) for doing something, then turns around and bashes them for doing just the opposite. He is like the two ladies at a Catskills resort, in an old joke. The first says, “The food here is terrible.” The second says, “Yeah, and the portions are so small!”

1) The second paragraph of his post provides a perfect example of this. He says, “the exclusions in the NERC CIPs provide a road map to attackers as they identify what is in-scope, and just as important, what is out-of-scope and consequently not addressed.” Let’s break this down. First, he’s saying the CIP v5 bright-line criteria (for High or Medium impact assets) give attackers a “road map”. That is, they let them know what the most important assets are so they can presumably attack them. However, in the second part of the sentence Mr. Weiss complains about just the opposite. There, he says the criteria implicitly give attackers a list of assets that don’t meet these criteria, and are therefore not going to receive protection under CIP v5.

Do you see the problem here? He’s saying that attackers will use the BLC to find the best targets to attack (Highs and Mediums) – and will presumably attack them. But they’ll also use the BLC to find the targets that are easiest to attack (Lows - since the requirements that apply to them are much lighter) – and will also attack them. So the “road map” that NERC is giving to the attackers simply says, "Attack all BES assets!"[ii] Some road map.

2) Here’s a more important example. Mr. Weiss alludes at least three times to the fact that some entities literally removed routable connectivity (especially to substations) in order to reduce their compliance burden under CIP v1 – v3 (since Critical Assets that didn’t have external routable connectivity wouldn’t therefore have Critical Cyber Assets)[iii]. I don’t dispute this assertion at all; it is certainly true (although the number of entities that simply put off plans to implement routable connectivity was certainly much higher than the number that literally ripped it out). And it is also quite unfortunate, since there was probably some negative impact on reliability and security because of this practice.

However, later in the post he makes a completely different argument. He says that the requirements of NERC CIP (presumably v5) meant that “utilities with hundreds to thousands of substations will most likely connect their protective systems to external networks (usually over the Internet) to support a compliance requirement that can actually compromise security.” OK, so in the first case, CIP was bad because it gave utility companies an incentive to remove routable connectivity. Now it’s bad because it gives them an incentive to implement that connectivity! Can’t win for losin’, as they say.

3) A third example of having-your-cake-and-eating-it-too: Mr. Weiss complains “Depending on the cost of the fine compared to the cost to install NERC CIP compliance, some utilities have made the decision to pay the fine rather than make the security improvement.” I don’t doubt that there are some utilities who are doing just that, although I also doubt it’s very many and I’m sure in the long run it’s a very bad idea to do that.

Yet he later states, “Since the NERC CIP guidance requires anti-malware and anti-virus protection, some utilities are mandating protective relays to have malware protection even though adding this function will reduce the effectiveness and function of the relay.” So it seems these same utilities who are doing everything they can to avoid compliance are now going way overboard and actually jeopardizing their own operations by taking the requirements far too seriously[iv]! Now, that is devious. No wonder he’s outraged.

Other Items

Most of Mr. Weiss’ other arguments fall apart when you look at them closely:

1) Early in the post, he says “Electric distribution is excluded (majority of Smart Grid falls under this exclusion).” This is a common criticism of NERC CIP, from people who don’t know any better. But that doesn’t include Joe Weiss, so I’m surprised he’d say this. The CIP standards (and all the other NERC standards) only apply to the BES because that’s what FERC has authority over (of course, FERC’s authority is what makes the NERC standards more than just nice guidelines). Electric distribution is the domain of the state PUCs[v].

So what is Mr. Weiss advocating to fix this problem? Do we need to have a single central regulator for all electric generation, transmission and distribution? Lots of luck getting that through Congress. And should NERC and FERC just drop the idea of cyber security regulation altogether until this happens? At least then there would be consistency on both the BES and the Distribution sides: there would be no regulation at all.

2) Mr. Weiss cites an example from the thesis stating that “an exercise was cancelled by (a utility’s) compliance group, citing potential non-compliance issues with one of the CIP standards as the reason. The logic behind the compliance groups’ (sic) action was that if a potential weakness was found, it may (sic) need to be reported and the entity risked receiving a fine from NERC.” I know exactly what Mr. Weiss and Ms. Ladendorff are talking about, and I agree there are probably at least a few legal departments at utilities who take this attitude: we don’t want to find out what we’re doing wrong, because then we’d have to report it.

On the other hand, this is a very short-sighted strategy, not only from a cyber security but from a legal / compliance point of view. If an entity is out of compliance with a NERC requirement (not just CIP, of course), they need to self-report it immediately. If they don’t, and the NERC Regional Entity discovers this lack of compliance (either through an audit or perhaps as part of an Investigation), things will go much worse for the entity than if they had reported it in the first place. By deliberately not allowing non-compliance to be discovered, this legal team is setting their employer up for a much bigger fall further down the road.

I haven’t personally heard of any case where something like this has happened, although I certainly don’t dispute that it may have. This is certainly a strike against the NERC CIP standards, but it is also a strike against any mandatory regulations of any sort. If an entity has to report when it finds itself to be in violation of any regulation, there will always be a few misguided lawyers who think it’s in the entity’s interest not to know about a violation in the first place. This is an argument against any sort of regulation, or laws for that matter. (If I think I’ve misrepresented something on my taxes, should I investigate to find out if that is really the case - at the risk of then having to revise my filing - or should I not bother to look further and hope the IRS doesn’t either? I don’t have a ready answer for that question, but please don’t tell the IRS that.) It is not an indictment of NERC CIP in particular.

3) Mr. Weiss summarizes some other examples from the thesis by saying “’some of the transmission owners….are gaming the system in order to prevent the application of the CIP standards.’ To accomplish this, some companies modified their networks to avoid compliance issues with CIP-003 through CIP-009.[vi]”

This sounds particularly devious, doesn’t it? TOs are modifying their networks to avoid CIP compliance issues! Hmmm…I thought that was what compliance was all about. For example, the standards say (by implication) that your control network(s) shouldn’t be directly connected to your corporate network – so you modify the network by breaking that connection. Is that a bad thing?[vii]

4) Mr. Weiss states (again referring to the thesis), “Participant 2 in her study found that a company had the most sophisticated network protection he had seen. However, NERC staff reviewed their architecture and wanted them to tear it out. It took the company 6 months to convince NERC that this was the best protection they could do for the control systems the company was operating.”

Here, it seems the NERC staff was getting a little carried away in their zeal to enforce strict compliance with the letter of the requirements, and was trying to get an entity to remove a network protection scheme that was the best that could be implemented under the circumstances. This of course is unfortunate, but clearly neither the utility nor NERC can be accused of lack of zeal for doing the right thing in this case. What fault there is seems to be in the CIP standards, and there the fault is that they are too prescriptive. I completely agree they are too prescriptive, but nothing in this quotation squares with the general tenor of Mr. Weiss’ post – namely, that NERC, the utilities, and the CIP standards themselves aren’t doing anything to increase security.

5) Mr. Weiss complains early on that “the ‘brightline’ criteria exclude smaller facilities.” The BLC apply to all BES facilities, as High, Medium or Low impact. I believe what he is trying to say is that the Low impact requirements aren’t rigorous enough for his tastes; if so, he certainly wouldn’t be the first to feel that way. But he needs to say it explicitly, and also say what would be an adequate set of requirements for Low facilities, consonant with the idea that we can’t devote the entire GNP to complying with NERC CIP.

6) There is one paragraph of the post that I simply don’t understand: “Another example of the inconsistency of the NERC CIP guidance is that when it comes to grid reliability (sic) is the use of ‘black start’ facilities. Black Start facilities are those necessary to restart the grid after a complete grid outage. This function is considered critical by grid planning and operations organizations as well as organizations within NERC. During the review of the NERC CIP Revision 5 process, ISO New England raised a concern that adopting a new requirement for specific controls for Low Impact assets could have unintended consequences, such as the withdrawal of black start resources. This would make the grid less reliable.”

What is Mr. Weiss trying to say here? I at first thought he was saying it was bad that blackstart facilities had been removed as Medium (and made Low) impact in the BLC. But it now seems to me that he may not know that they were removed (even though that happened three years ago, during the drafting process), and he seems to be arguing that forcing blackstart assets to meet Medium requirements means that more will be withdrawn, thus negatively impacting “reliability” (although not having blackstarts doesn’t actually impact reliability, since blackstarts don’t prevent outages; it does impact resiliency, since blackstarts are needed to rapidly recover from a widespread outage).

And if Mr. Weiss does know that blackstarts were removed from the Medium criteria (as I said, the wording is ambiguous) and made Lows, then I don't understand his reporting of what the New England ISO supposedly said: that placing too onerous requirements on Lows means that blackstarts will be withdrawn. The way CIP v5 works now, every BES asset (with at least one BES Cyber System) is in scope as either High, Medium or Low. If the Low requirements prove too onerous for blackstarts, then they will have to be removed for all Low assets - meaning we'll go back to just the Low requirement in the original CIP v5 (which FERC was so unhappy with): there must be four policies in place at each Low asset. Is this what Mr. Weiss is advocating?

7) Mr. Weiss states, “Some of the security hardware can affect control system performance. A NERC report identified that a device locking tool used to meet NERC CIP requirements caused a disturbance that resulted in the loss of SCADA services. This is obviously making the grid less reliable and secure.” What is this saying? It seems to be that some device manufacturer developed a device locking tool that actually had negative effects. OK, whose fault is this? The utility’s? NERC’s? The CIP standards’? It seems to me he should file his complaint with the company that made the device.

Alternatively, whatever requirement the locking tool was addressing could just be removed from the standards, along with every other requirement that might possibly lead to implementation of measures that could cause a "disturbance". This would probably result in 10-20% of the CIP v5 standards being removed. Is this what Mr. Weiss wants?

8) Mr. Weiss’ concluding argument states, “Perhaps the most important point is there have already been four major cyber-related electric outages in the US (more than 90,000 customers). If the NERC CIPs were fully implemented, they would not have prevented any of these outages.” First off, I would very much like to hear about these four outages. I certainly never have heard of them before, and Mr. Weiss doesn’t point to any further information.

Second, once Mr. Weiss has given us information on these outages, I would like to know how he draws his conclusion that NERC CIP wouldn’t have prevented them. Of course, when he says the outages are “cyber-related”, he’s not necessarily saying these were the results of actual cyber attacks or malware. For that matter, the 2003 Northeast blackout had a couple of “cyber-related” causes that NERC CIP wouldn’t have prevented either. This certainly doesn’t mean that CIP is ineffective.

Summing Up

You might get the idea that the only thing I like about Joe Weiss’ post is the font it appears in. Believe it or not, I regard the post as a flawed one that could actually have had some validity. He makes some perfectly legitimate points about entities removing connectivity to avoid having to comply, about Legal departments not wanting to see any evidence of non-compliance, about Distribution not being included, etc. But in his zeal to strike out against NERC, most utilities, and above all the CIP standards, he has simply thrown any and all arguments that come to mind into a single pot, with the hope that they’ll magically form a coherent stew. They don’t.

Note 1/23: This post originally had a sentence mentioning Senator Joe McCarthy. I realized this morning that, while it was not my intention to compare Mr. Weiss to McCarthy and the wording didn't state that, some readers might have drawn that inference. I sincerely apologize to Mr. Weiss for having included that sentence in the first place.

Note 1/25: I just modified the section marked "6)" above. It tries to make sense of Mr. Weiss' paragraph regarding blackstarts. When I wrote it, the only possible interpretation I could see was that Mr. Weiss didn't know blackstarts were no longer included in the criteria for Medium impact. However, I just realized this may not be the case, and Mr. Weiss was actually arguing for lesser requirements on Low impact assets. That doesn't make sense either (especially with what I have heard to be his opinion on the Low requirements), but I want to show I considered that possibility as well.


[i] I want to thank Bob Radvanovsky for posting a notice of this on LinkedIn.

[ii] Of course, the criteria don’t list individual assets, nor do NERC or the regions publish such lists; the attackers will presumably have to go elsewhere to find out where to direct their attacks.

[iii] Quoting Mr. Weiss, who quotes the thesis, “Some entities were trying so hard to keep equipment out of scope that they spent money to ‘rip out fiber and CAT-5 [networking cable] and replaced it with serial [cable] to get away from routable protocols’ that would have brought networks into the compliance scope. Entities calculated that it would be cheaper to replace fiber and CAT-5 network cable with serial cable in order to remove equipment from the CIPs scope. Doing so eliminated the requirement to comply with CIP standards for those networks and equipment.”

[iv] CIP v5 makes it very clear that there is no requirement to load anti-malware software on a device that isn’t capable of loading or using it. In fact, in v5 the entity doesn’t have to take a Technical Feasibility Exception for this, as they did in v3.

[v] Actually, the PUCs only have authority over the IOUs in their states, not the co-ops and municipals. So you could say that nobody regulates those entities, other than presumably their members or citizens.

[vi] The sentence in single quotes is presumably from the thesis. The second sentence is presumably Mr. Weiss’s.

[vii] There theoretically could be network modifications that might be taken to serve no purpose other than avoiding having to comply. But Mr. Weiss doesn’t say that is the case, and my brief review of the other examples in the thesis that he cites didn’t turn up any such modifications other than two cases which he addresses separately (and which I also discuss in this post). However, my point remains: entities are supposed to modify their networks to comply with the CIP standards. There is nothing at all sinister about modifications per se.