Wednesday, December 31, 2014

Here’s Your CIP-002-5.1 R1 Compliance Methodology!*


This is the third in a series of posts on the serious problems with CIP-002-5.1 R1, and what entities and NERC need to do to deal with them.  The first post is here.  The exciting conclusion - in which I chide NERC for their mishandling of these problems and say what I think needs to be done to address them - is here.

June 11, 2015: I'm afraid I've come to the conclusion that there can be no definitive guidance developed for complying with CIP-002-5.1 R1; there are just too many contradictions and ambiguities in the standard.  I would like to see the standard rewritten, but since that is a multi-year process, it obviously won't help entities preparing for compliance next April. I will continue to discuss different aspects of the R1 (and Attachment 1) compliance process, such as in this post.  But my hope to keep updating this post as a kind of comprehensive guide for R1 compliance is officially ended.  There is probably nothing in this post that is simply wrong, but keep in mind that my ideas on how you should comply with R1 have moved beyond what's here.

*I’m worried the FTC lawyers will be contacting me any minute to hit me up for pulling a bait-and-switch.  The fact is, I have no intention of telling you what your CIP-002-5.1 R1 and Attachment 1 (“R1” for short) compliance methodology should be.  That’s because I’ve become convinced that it is impossible to write down a single procedure for R1 compliance that takes up something less than the length of War and Peace; or if you prefer, I don’t think you could document the process with a diagram that takes up less than the size of my living room.  And this leads to a couple important conclusions.  For the whole sorry story, read on.

A brief history:  I’ve made three main efforts to come up with a single methodology for R1 compliance.  The first was when I rewrote R1 as a comment to FERC in June 2013.  I cringe when I read it now, since a lot of it is simply wrong – although I still think it’s better than the requirement as currently written.  The second was at the very beginning of 2014, when I thought I finally knew exactly how to comply with R1 and I laid it all out in three posts (the first is here).  Yet within three weeks, I knew I’d missed a couple really important nuances, most notably why “Facilities” was the subject of criteria 2.3-2.8.

My most recent attempt was last April, when I wrote Part 1 of what was to be a couple of posts that would lay out once and for all exactly how to comply with R1 for substations (and substations, of course, account for 95% of the v5 compliance effort).  I don’t think there was anything in that post that was strictly wrong; however, I gave up on it when it became clear that criterion 2.5 – and probably others as well – had to have a different compliance methodology from the other criteria, which added a whole new layer of complexity I hadn’t anticipated.

I’ll be honest:  Since that time, I’ve just kept on discovering more layers of complexity as I realize new problems with R1 (for a list of the 21 primary problems I see now, see my last post).  One of the biggest new sources of complexity is the various areas where an entity needs to “roll your own” definitions and interpretations.  You could think of each of these areas – e.g. the definition of “programmable” – as being its own “subroutine”[i].  In other words, you need to roll your own definition of programmable and insert it at the appropriate place in the overall R1 methodology.  The same for your interpretation of “affect the reliable operation of the BES”, your definition of “associated with”, etc.

So it’s not like I’m saying there could never be a complete methodology for R1 compliance, although it might take something approaching the remaining lifetime of the universe to put it down on paper.  But the real problem is something that I first learned in my statistics classes in college: Uncertainty is multiplicative.  If you have only a 50% certainty that a particular sub-methodology (such as the definition of “programmable” you’ve just developed) is correct, and you have 9 other sub-methodologies each with only a 50% certainty, the percent of certainty you have about the whole string of processes together is 0.09765625% (i.e. less than a tenth of a percent).  This means you simply have no clue whether the whole thing makes sense or not.  Of course, I think there are a lot more than ten major areas of uncertainty in complying with R1; even this very low level of certainty is probably too high an estimate for any methodology you might develop. 
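For anyone who wants to check the arithmetic, here is a minimal sketch.  It assumes (as the paragraph above implicitly does) that the ten sub-methodologies are independent of each other, and the function name is just something I made up for illustration.

```python
from math import prod

def joint_certainty(certainties):
    """Chance that every sub-methodology is correct, assuming independence."""
    return prod(certainties)

# Ten sub-methodologies, each with only a 50% chance of being correct.
overall = joint_certainty([0.5] * 10)
print(f"{overall * 100:.8f}%")  # prints 0.09765625%
```

If the sub-methodologies are not independent, the number will differ, but the basic point stands: every additional uncertain step drags down your confidence in the whole chain.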

So I’m not exaggerating at all when I say there is simply no way to write a single methodology for R1 compliance that has some reasonable probability of being correct.  The likelihood of being able to do that is similar to the proverbial likelihood of a bunch of monkeys, pounding on keyboards, being able to recreate the works of Shakespeare.

Now that I’ve established the fact that I lied to you in the title, what is the point of this post, anyway?  It’s quite simple: The fact that no comprehensive R1 compliance methodology can be written down doesn’t mean you’re off the hook for complying.  Regardless of the difficulty, every entity does need to come up with a documented process for R1 compliance, and to follow that process in identifying their Medium and High impact BES Cyber Systems, as well as their Low impact BES assets. 

So in this post I’m going to try to lay out, on a high level, the main tasks that need to be performed to comply with R1, as well as what I think are the primary considerations that need to be addressed under each task.  There is no way I can put together even a reasonably detailed methodology that will work for your organization.  Hopefully, you can use this post to guide your effort to put that methodology together – perhaps using the help of a knowledgeable consulting organization such as the one I work for.  Where I’ve discussed a topic in a previous post, I’ll include a link.

Note that the discussion below is geared toward substations, since as I said above, I believe 90-95% of the CIP v5 compliance effort will be for Medium impact substations.  I believe the discussion also applies to Medium impact generating plants that meet criteria other than 2.1.  The methodology for plants that do meet criterion 2.1 is quite different from what I outline below, mainly because of the huge numbers of devices that may meet the definition of Cyber Asset in large coal plants, as well as because of the “exemption” for BES Cyber Systems that don’t affect 1500MW.  I hope to address those plants in a future post.

Task 1: Preliminary Identification and Classification of Substations and Facilities
The first step of the process is to decide which substations may be Medium impact – or more correctly, which substations are likely to contain Medium impact BES Cyber Systems.  Criteria 2.4 to 2.8 are the ones that can potentially apply to substations.[ii] 

The next step is to make a decision that every owner of Medium impact substations needs to make: whether to classify BCS based on the asset (substation) or on the Facility (line, transformer, etc.) they’re associated with.  There are a couple of key questions that your entity needs to answer in order to decide which route to take.  If you decide to classify based on substations, you will probably end up identifying more Medium impact BCS than you would if you classified based on Facilities.  On the other hand, classifying based on Facilities may involve more work, and potentially more cost if networks need to be separated.  I will generally refer to “substations/Facilities” below; you need to read this as one of the two terms, depending on your answer to this question.

If you do choose the Facilities route, you need to identify and list the Facilities at each substation.  To simplify things going forward, you can remove from the list Facilities that don’t have any cyber assets associated with them, since they obviously won’t have any BES Cyber Systems (the same applies to substations that don’t have any cyber assets associated with them, if you’re going the “substations” route).  Finally, you need to identify the Medium impact Facilities using Criteria 2.4 – 2.8 (and you should document a methodology for doing this, both to guide the people who do this work, as well as to show the auditors how you did it). 

A final step in this task is to develop a methodology for dealing with substations that are jointly owned or operated with one or more other NERC entities.  NERC has promised this will be one of the upcoming Lessons Learned, but you probably can’t wait for that to come out (of course, when it does, it will just be a draft for comment.  And even when it’s finalized, it won’t be binding on auditors or entities.  This applies to all the Lessons Learned[iii]).

Task 2: Inventory of Cyber Assets
This task (and the remaining tasks except for the last two) only needs to be performed for substations/Facilities that meet a Medium impact criterion.  You need to identify all of the electronic devices “associated with” the substation/Facility that meet the definition of Cyber Asset: “programmable electronic device”.  Of course, there are three important details embedded in this seemingly simple task:

  1. “Programmable” – You need to come up with a definition of “programmable”.  This word is the heart of the NERC definition of Cyber Asset, but isn’t itself defined.  A Lessons Learned document was recently released in draft form by NERC.  You need to consult this, but keep in mind that it isn't mandatory that you follow it - however, you do need to at least document why you didn't.
  2. Jointly Owned Substations – Your entity needs to decide how it will allocate responsibility for BES Cyber Systems with any joint ownership partners.
  3. “..affect the reliable operation of the BES” – This undefined phrase is an important part of the definition of BES Cyber Asset.  How your entity defines this phrase will have an impact on the number of BCAs (and BCS) you identify.  See this post.

Once you have addressed these three items, you need to inventory every device that could possibly be a Cyber Asset associated with a Medium impact substation.  The inventory needs to include identification of the Facility/substation with which it is associated, as well as the asset (whether that is a substation or another of the six asset types in R1, like a control center) where it is actually located (and if you’re going the “Facility” route described above, you’ll first need to inventory the Transmission Facilities located at each Medium substation).

Task 3: “Top-Down” Identification of BES Cyber Systems / BES Cyber Assets
I have written extensively about the two main approaches to identifying BCS/BCA: “top-down” and “bottom-up”.  Until recently, I thought that entities should combine the two approaches for all types of assets, since not using both can lead to under- or over-identification of BES Cyber Systems.  However, I've recently been persuaded that, for substations, only the bottom-up approach is needed; combining the two remains a good idea for generating plants (except those that meet Criterion 2.1; as I said above, these need a completely different approach) and perhaps for Control Centers.

Since this post is partly about generating plants, I'll start by describing the top-down approach.  For generating plants (other than 2.1 plants, of course) I think it is better to start with the top-down approach, then use the bottom-up as a check on it (you’ll see this will reduce the work required, since starting with bottom-up will result in your doing some unnecessary classification work).  It would be nice if NERC, or for that matter the regions, provided some guidance on this issue.  But that is pretty unlikely at this point, so you’ll need to decide for yourself whether to use just one or both approaches.
 
The first step for applying the top-down approach is to develop a methodology for it.  While the Guidelines and Technical Basis in CIP-002-5.1 does provide a good overall description of how the BES Reliability Operating Services (the heart of the top-down approach, of course) can be used to identify BES Cyber Systems, this is far from being a complete methodology for this task.  You need to develop that methodology first, both to guide whoever will be doing this and to show the auditors how you came up with your list of BCS. 

The methodology you develop should show how you will apply the BROS to develop a list of systems that are potentially BCS.  Very briefly, “potential BCS” are systems that support one or more BROS and are associated with a Medium impact substation/Facility.  For each of these potential BCS, you then need to ask, “Does this affect the reliable operation of the BES within 15 minutes?”  If the system doesn’t do that, it isn’t a BCS; if it does, it is one.  This is because a BCS is made up of BCAs, and the 15-minute criterion is part of the definition of BCA; what doesn’t meet this definition isn’t a BCA, and a system with no BCAs isn’t a BCS.
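To make the logic of that paragraph concrete, here is a rough sketch of the top-down filter.  Everything in it is an assumption on my part: the class, the field names and the sample systems are made up for illustration, and the yes/no answer to the 15-minute question has to come from your own documented interpretation, not from code.

```python
from dataclasses import dataclass

@dataclass
class CandidateSystem:
    name: str
    bros_supported: tuple            # BROS the system supports (empty if none)
    associated_rating: str           # rating of the associated substation/Facility
    affects_bes_within_15_min: bool  # answered using the entity's own interpretation

def top_down_bcs(systems):
    """Return the names of systems identified as BCS by the top-down approach."""
    # "Potential BCS": supports at least one BROS and is associated with
    # a Medium impact substation/Facility.
    potential = [s for s in systems
                 if s.bros_supported and s.associated_rating == "Medium"]
    # A potential BCS is a BCS only if it passes the 15-minute test.
    return [s.name for s in potential if s.affects_bes_within_15_min]

# Hypothetical sample data, purely for illustration.
candidates = [
    CandidateSystem("line protection relays", ("Dynamic Response",), "Medium", True),
    CandidateSystem("substation HMI", ("Monitoring and Control",), "Medium", False),
    CandidateSystem("weather station", (), "Medium", False),
    CandidateSystem("feeder RTU", ("Monitoring and Control",), "Low", True),
]
print(top_down_bcs(candidates))
```

The real work, of course, is in deciding what goes into each of those fields; the filter itself is the easy part.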

At this point, you should make a list of the component Cyber Assets within each BES Cyber System, but you don't need to actually classify these into BES Cyber Assets or Protected Cyber Assets (as I originally thought - see this correction to this post).

Task 4: “Bottom-Up” Identification of BCS
Whether you're dealing with a substation or a generating plant, you do need to do the bottom-up analysis to identify BCS.  In the bottom-up approach, you start with the definition of BES Cyber Asset, apply it to each Cyber Asset, and then determine which Cyber Assets are BCAs (and remember to use the "interpretation" you developed of "affect the reliable operation of the BES" in the BCA definition, as described in Task 2 above).  Finally, you aggregate BCAs into BCS, so that every BCA is included in at least one BCS (and a BCS can consist of a single BCA).

There is a big difference in how you apply this process to generating plants vs. substations, though. For plants, you only need to apply the bottom-up analysis to the Cyber Assets that haven't already been included in a BES Cyber System as a result of the top-down analysis.  Since these are already in scope for v5, you don't need to spend your time deciding again whether they're in scope.  For substations, since you aren't doing the top-down analysis, you need to apply the bottom-up analysis to every Cyber Asset you've identified.

For both substations and plants, you now need to include all BES Cyber Assets in at least one BES Cyber System.  But there is a difference in how you do this for the two asset types.  For substations, you simply create as many BCS as are required to include every BCA.  But for generating plants, since you have already identified a number of BCS in the top-down analysis, you can always look first to those BCS when you're trying to find "homes" for the BCAs you've just identified in bottom-up.  You can then create new BCS to "house" any BCAs that are left over (and of course, you always have the option of making every BCA its own BCS).
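Here is a minimal sketch of the bottom-up step as I've described it.  The function and parameter names are hypothetical; `is_bca` stands in for your own documented application of the BCA definition, and `choose_home` stands in for whatever grouping logic you decide on.  Neither comes from the standard.

```python
def bottom_up_bcs(cyber_assets, is_bca, existing_bcs=None, choose_home=None):
    """Group BCAs into BCS so that every BCA ends up in at least one BCS.

    existing_bcs: BCS already identified top-down (plants); None for substations.
    choose_home:  optional function (asset, bcs) -> name of an existing BCS, or None.
    """
    bcs = {name: set(members) for name, members in (existing_bcs or {}).items()}
    already_grouped = set().union(*bcs.values())
    for asset in cyber_assets:
        if asset in already_grouped:
            continue  # already in scope from the top-down analysis
        if not is_bca(asset):
            continue  # not a BES Cyber Asset under the entity's interpretation
        home = choose_home(asset, bcs) if choose_home else None
        if home in bcs:
            bcs[home].add(asset)          # found a "home" in an existing BCS
        else:
            bcs[f"BCS-{asset}"] = {asset}  # fall back: the BCA is its own BCS
    return bcs

# Hypothetical plant example: one BCS from top-down, three devices to review.
plant = bottom_up_bcs(
    cyber_assets=["dcs-controller", "relay-7", "printer"],
    is_bca=lambda a: a != "printer",
    existing_bcs={"unit-1 control": {"dcs-controller"}},
)
print(plant)
```

For a substation you would simply leave out `existing_bcs`, so that every Cyber Asset you've inventoried gets run through the BCA test.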

Of course, when I talk about combining BCAs into BCS, I'm not saying how that should be done. This is because there are no "directions" for this in CIP-002-5.1, the NERC Glossary definition of BCS, or the CIP-002 Guidelines and Technical Basis.  NERC did publish a draft Lessons Learned document that at least gives you some ideas on what you can do - but in the end, this is something that's up to the entity to determine, keeping in mind that some choices will lead to a much easier CIP v5 compliance job than others will.

Speaking of this Lessons Learned document, I think it provides several good ideas and one spectacularly bad one: the idea that you could group BCAs into BCS differently in order to comply with different requirements - i.e. you would comply with one requirement using one set of BCS and another requirement with a potentially different set.  There is nothing illegal about doing this, but in my opinion it would create all sorts of problems in other parts of the v5 compliance effort.  I'll hopefully have a post out on this soon.

For a substation, the list of BES Cyber Systems you develop in this step is your final list.  However, for generating plants - since you have already developed a BCS list as part of the top-down analysis - you need to combine the bottom-up with the top-down list.  This, then, is the list of BCS that is the final outcome of R1.

Task 5: Draw Preliminary ESP
You may wonder what drawing the ESP has to do with complying with CIP-002-5.1 R1.  After all, that requirement is solely concerned with identifying and classifying BES Cyber Systems, not PCAs and certainly not ESPs.  The reason I have this task here is that, as we get into classifying BCS as Medium or High vs. Low impact, it is important to know which of these are networked with which others.  This is important not from a strictly compliance point of view but more from an operational point of view: if you don’t know where your ESP is drawn, you don’t know whether a Cyber Asset – that isn’t a BES Cyber Asset – is a PCA or not.  As a consequence, you may make decisions about classifying BCAs/BCSs that will result in your over-classifying Cyber Assets as BCAs.

There is a big difference between drawing the ESP in CIP v5 vs. in v3.  In v3, you needed to include all Critical Cyber Assets in an ESP.  In v5, you just need to include those BCS that are connected on an internal routable network (this is of course a different issue than whether an asset has external routable connectivity).  Other than that, you should draw your ESP(s) just as you did in v3, trying to include only those Cyber Assets that need to be in it, and making networking changes to reduce their number as much as possible.

Task 6: Classifying BES Cyber Systems
We now come to probably the most important task, and the only one that is explicitly called out in R1.  Each of your BES Cyber Systems needs to be classified according to the Facility/substation with which it is associated.  If the Facility/substation is Medium impact, the BCS will be Medium; if it is Low, the BCS will be Low.

There is an exception to this rule in the case of relays, located in an otherwise Low impact substation, that are associated with a line that meets criterion 2.5 as Medium impact (i.e. “far-end” relays in a transfer-trip scheme).  NERC’s recent Lessons Learned document on this topic, and their previous pronouncements, seem to make clear that such relays will not be Medium impact but Low; this is because of the specific wording of criterion 2.5 (thus, this provision only applies to that criterion). 

I need to point out that, IMHO, if you are taking the interpretation that criteria 2.4 – 2.8 apply to entire substations, not to particular Facilities in those substations (as discussed at the beginning of this post), then it isn’t clear that this “exemption” will apply to your far-end relays.  The Lessons Learned document and the wording of criterion 2.5 make it quite clear that this exemption applies to the Facility – i.e. a line between 200 and 499 kV – not to the substation itself.  So if you interpret 2.5 as classifying an entire substation[v] as Medium impact, then strictly speaking you should classify your far-end relays as Medium BCS.  However, my guess is that no auditor would issue you a PV for calling the far-end relays Low impact, at least if they didn’t want to come out to find their tires had been slashed. 

You finish this task by listing your Medium impact BES Cyber Systems, and the substation/Facility with which they are associated.
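The classification rule, plus the far-end relay exception, can be sketched roughly as follows.  The function, the dictionaries and the sample names are all my own invention for illustration; and whether you treat far-end relays as Low is a decision your entity has to make and document, per the discussion above.

```python
def classify_bcs(associations, ratings, far_end_relays=frozenset()):
    """Classify each BCS by the rating of its associated substation/Facility.

    associations:   dict of BCS name -> associated substation/Facility
    ratings:        dict of substation/Facility -> "Medium" or "Low"
    far_end_relays: BCS the entity has decided to treat as Low under the
                    criterion 2.5 far-end relay exception (an entity choice)
    """
    classified = {}
    for bcs, associated_with in associations.items():
        rating = ratings[associated_with]
        if bcs in far_end_relays:
            rating = "Low"
        classified[bcs] = rating
    return classified

# Hypothetical example data.
ratings = {"345kV line A": "Medium", "138kV line B": "Low"}
associations = {
    "near-end relays": "345kV line A",
    "far-end relays": "345kV line A",
    "line B relays": "138kV line B",
}
result = classify_bcs(associations, ratings, far_end_relays={"far-end relays"})

# The Task 6 deliverable: Medium BCS and what each is associated with.
medium_list = {b: associations[b] for b, r in result.items() if r == "Medium"}
print(medium_list)
```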

Task 7: List of Low Impact Assets
The final list you need is one of Low impact assets (although for not-very-good reasons they are called “assets containing Low impact BES Cyber Systems” in R1).  This will of course be all of the Transmission substations that aren’t Medium impact.  However, if some of the Medium substations do contain Low BCS (meaning you identified BCS based on the Facility they’re associated with, not the substation – this can then result in a mixture of Medium and Low BCS at the same substation), you need to list these as Lows as well.

Task 8: Distribution Provider Assets
There are many NERC entities that are registered as both TO or TOP (who thus could have Medium impact substations) and Distribution Provider.   Section 4.2.1 of CIP-002-5.1 lists four types of assets, owned by some Distribution Providers, which are in scope for CIP Version 5.  In other words, if an entity is registered as a DP, it needs to treat any of these assets that it owns as in scope for CIP v5, even though they are Distribution assets and wouldn’t otherwise be subject to CIP.  Each of these assets needs to be added to the Low impact list, since none would meet one of the Medium criteria.

And Now, the Moral of Our Story
The purpose of this post has been twofold.  First, it has hopefully at least given you an idea of everything I currently see that needs to be included in a CIP v5 methodology, so you can use it as a completeness check for your own methodology.  But more importantly, I hope it has shown you (with "you" meaning NERC entities, FERC, the NERC regions and NERC itself) how serious the problems are with this requirement.  If you can't develop a defined methodology for complying with a requirement, you simply have a requirement that can't be complied with in any meaningful sense of the word; and that's what R1 is.  More on this in the next post, which brings this four-part series to its exciting conclusion.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] The fact that the best way I can express this idea is to use a term from the days when programming involved spending a lot of time typing punch cards perhaps tells you the last time I did any serious programming (in FORTRAN).  I’m told you no longer have to use punch cards to write programs, which I’m glad to hear.

[ii] Criteria 2.2, 2.9 and 2.10 could also identify substations that contain Medium BES Cyber Systems, although you need to make some modifications to the remaining Tasks to properly deal with these criteria.  Of course, I'm completely glossing over the fact that technically none of the criteria apply to substations.  They apply to Facilities that may be found at substations.

[iii] In fact, the term Lesson Learned is quite inappropriate here, since the documents NERC is coming out with address topics that nobody has had to address so far, meaning no lessons can yet have been learned.  But “Lessons Learned” is an established category of documents with NERC, which provides a convenient fig leaf covering the fact that these are really quasi-Interpretations (capital I), and thus pushing the bounds of legality in the NERC world.

[v] In my opinion, to say that criteria 2.4 – 2.8 apply to substations, not Facilities, is to ignore the clear meaning of the words in those criteria.  The reason why you might nevertheless want to do this is that it will make your job of classifying BES Cyber Systems easier, although it will most likely increase the number of BCS that you end up classifying as Medium impact.

Thursday, December 18, 2014

Interpretation and Definition Issues in CIP-002-5.1 R1 and Attachment 1


This is the second of a series of four posts on the serious problems with CIP-002-5.1 R1, and what NERC entities and NERC need to do to address them.  The first post is here.  The next post is here.

In at least 30 posts on problems with CIP-002-5.1 R1 and Attachment 1 (hereinafter “R1”), I have identified many problems with the wording in that document.  However, I’ve never gathered these together in one post.  I am now doing that, both for the sake of clarity and to support a couple of new posts I’ll be doing very soon.

This list is important because NERC entities with Medium and High impact assets need to get started very quickly – if they haven’t already – on developing their final lists of cyber assets in scope for CIP v5, and they can’t do that without having some resolution to the issues listed below[i].  Unfortunately, none of these issues have been finally resolved – that can only be done by rewriting R1, or by capital I Interpretations, which take 2-3 years.

Please note:

  • This is far from being a complete list of problems with R1.  For one thing, there are a whole host of issues with the bright-line criteria, since those criteria don’t seem to fit any asset very well.  I’m sure you could easily more than double this list by including all of those issues.  And I’m sure there are more problems in the other parts of R1 as well.
  • All of the issues in this list relate to Medium impact Transmission substations, since it is for these assets that the bulk of the CIP v5 effort will be expended.  There are other issues that relate to control centers, generating stations, etc. that I haven’t included in this list; I hope to add those at a later date.  On the other hand, all entities subject to CIP v5 compliance should find this list useful.  Most of the items on this list apply to all types of BES assets, not just substations.
  • I have addressed some of these issues in previous posts; I include links in those cases.  I hope to do future posts on some of the other issues.
  • FWIW, NERC has included a couple of these issues on their list of planned Lessons Learned and FAQs.  If we’re lucky we’ll see a draft for comment of these documents within a few months.  But my guess is there aren’t too many entities that will feel comfortable waiting a few more months to identify their cyber assets in scope for v5, while the compliance date remains 4/1/16.  If you’re still waiting to do this, you’ve already waited too long.  You need to get moving, even though that means taking these issues into your own hands to resolve.
  • My suggestion is that, for each of the issues below, entities should decide how they will interpret each one (in conversation with their NERC Regional Entity, if possible) and document it – then go about identifying their cyber assets in scope.  Of course, any guidance NERC has provided will be helpful, as will any advice from the Regional Entity.  But remember, the only mandatory “guidance” is the wording of the standards themselves, along with any capital “I” Interpretations that NERC and FERC may approve.  The standards aren’t going to change in the next few years, and there will be no Interpretations available for at least 2-3 years.  I don’t advise anyone to wait for either of these things to happen, before they start to become compliant with CIP v5.

Here’s the list:

1.       The beginning of Section 4.2 of CIP-002-5.1 says “…the following Facilities, systems, and equipment owned by each Responsible Entity in 4.1 above are those to which these requirements are applicable…”  Yet R1 talks about “BES Cyber Systems” as being in scope and also discusses six types of “assets”.  Attachment 1 discusses things like “control center”, “generation”, “reactive resources”, “Transmission Facilities”, SPS, RAS, and “system or group of Elements”.  What is the relation, if any, between all of these things and “Facilities, systems and equipment” in 4.2?  And if there is no real relation, why is this wording in 4.2?
I can see that “Facilities” might be taken to roughly correspond to the “big iron” referenced in the bright-line criteria (i.e. roughly “assets” and true “Facilities”).  I can also see that “systems” might refer to the BES Cyber Systems that will be in scope for v5.  But “equipment”?  That sounds more like monkey wrenches and forklift trucks.  Is that really in scope for v5?  Does the entity need to come up with a list of all the “equipment” they own and decide what impact it has on the BES?  I’m sure they don’t, but that would be a valid interpretation of this section.
2.       Subsection 4.2.2 seems to narrow “Facilities, systems and equipment” down by saying that, for all entities listed in 4.1 except DP’s, what is in scope for them is “All BES Facilities”.  If the SDT hadn’t capitalized “Facilities”, this would be quite easy to understand, since lower case “facility” can generally be thought to be any of the “big iron” to which CIP v5 might apply, including control centers, substations, generating stations, etc.  However, the fact that Facility is capitalized means it’s a NERC defined term.  If you look it up in the NERC Glossary (and also look up Element, which is a key part of the Facility definition), you’ll see that a Facility has to have terminals and presumably be operated at high voltage.  Do you know any control centers that have terminals and are operated at high voltage?  I don’t either.  This means that all control centers are out of scope for CIP v5!  I’m sure all the big BA’s will be pleased to hear this.[ii]
3.       There’s another “get out of jail free” card embedded in the quote from Section 4.2 in item 1 above.  Note that the “Facilities, systems and equipment” need to be “owned by” the Responsible Entity, in order for them to be in scope for CIP v5.  So to eliminate your CIP compliance burden, how about selling your equipment – or even the asset itself – and leasing it back?  Businesses do this all the time as a financial strategy.  There you go: You don’t own anything, so you don’t have anything in scope for v5!
4.       Of course, the whole point of R1 is to identify and classify BES Cyber Systems.  What is the first step in that process?  If you restrict yourself to the wording of the requirement itself, the only thing you have to go by is R1.1 – R1.3, since they constitute the entire actionable part of the requirement.  1.1 and 1.2 tell you to respectively “Identify each of the high impact BES Cyber Systems….” and “Identify each of the medium impact BES Cyber Systems…”  But how do you do that?  There is nothing in the requirement to guide you, other than the definitions themselves.  And you have to work backwards.  Since you’re told BCS are your target, you need to read the BCS definition first; of course, that references BCAs, so you then need to read that definition; that references Cyber Assets, so now you need to read that definition. 
Why couldn’t these three crucial steps have been each explicitly stated in R1?  Better yet, why couldn’t they have been broken up into three or four separate requirements, as was the case in CIP v1 – v4?  The whole process would have been much easier to understand if this had been done; plus the whole process would have been a lot less susceptible to confusion, as shown below.
5.       The first step for identifying BES Cyber Assets/Systems is to identify Cyber Assets, which are defined as “programmable electronic devices.”  But what does “programmable” mean?  This is on NERC’s list to address, of course, but many entities have decided they can’t wait for NERC to do something to start their BCS identification process.  These entities have developed and documented their own definition (a number of entities – especially owners of large generating stations – did this last summer.  They had to get going on their v5 compliance process then, if they were going to have a good chance to meet the 4/1/16 compliance date).
6.       The definition of BES Cyber Asset includes the phrase “affect the reliable operation of the BES”.  What does this mean, and how do we measure it?  It’s safe to say that no cyber asset has been installed in a substation, control center or generating station purely because it looks nice.  They can all be said to affect the reliable operation of the BES in some way, albeit small.  So what distinguishes BCAs from other cyber assets?  Again, entities that can’t wait for this issue to be clarified by NERC (and NERC hasn’t even listed this as a topic they’ll address in the Lessons Learned) need to develop and document their own interpretation of this phrase.
6.5    In order to classify a BES Cyber System as Medium or Low impact, you need to know which substation or Facility (see below) it is "associated with", since Section 2 of Attachment 1 says that Medium BCS are those that are associated with any asset/Facility that meets one or more of the Medium criteria.  But "associated with" is not defined, nor does NERC currently plan to define it. Each entity needs to develop its own definition (although EnergySec has developed a white paper that discusses this).  It could be an operational definition, stating for example how to determine which Facility or substation a relay is associated with.
7.       Once you’ve identified your BES Cyber Assets, how do you get to BES Cyber Systems?  The definition of BCS is “One or more BES Cyber Assets logically grouped by a responsible entity to perform one or more reliability tasks for a functional entity.” While the entity has complete discretion on how to do that grouping, some groupings may be more efficient than others, depending on the environment.  Fortunately, NERC does have a good Lessons Learned document on this question; but again, it would have been nice if this step had been explicitly called out, rather than implicitly included – along with about four other steps – in the single word “Identify” in R1.1 and 1.2.  R1’s use of very compressed meanings works very well if you consider it haiku poetry, but not well at all if you consider it a requirement that in theory can carry million-dollar-a-day penalties for violation.
8.       In any case, through applying three NERC definitions (and adding a couple of our own), we have now come up with a list of BES Cyber Systems, staying strictly within R1 itself.  But in the Guidance and Technical Basis section of CIP-002-5.1, there is a lengthy discussion of the BES Reliability Operating Services (aka BROS), including a description of how they can be used to identify BES Cyber Systems; this is what I have called the “top-down approach” to identifying BCS (what we just did above is the “bottom-up” approach). 
However, the BROS are nowhere referenced in R1 (or the BCA/BCS definitions) itself[iii].  What place should they play in identifying BES Cyber Systems, vis-à-vis the “bottom-up” approach described above?  Should an entity use both approaches and then combine the results, in order to make sure they do not over- or under-identify BCS?  This question isn’t raised, let alone answered, in CIP-002-5.1.  Yet it is very important.  If the entity uses just one approach rather than the other, there is a big risk of either under- or over-identifying BES Cyber Systems.  And another consideration: the only approach that’s actually required by R1 is the bottom-up one (although as I’ve just said, that “requirement” is purely implicit in the definitions of three phrases, not overtly stated).
9.       What is the role of the six asset types (control centers, etc) listed in R1?  Are they meant to be the types of assets that are “run through” the bright-line criteria to determine which are High, Medium or Low, or are they the locations at which BCS can be found?  If the former interpretation is chosen, a number of wording conflicts result[iv].  If the latter interpretation is chosen, it needs to be made clear in your R1 compliance methodology that, even though BES Cyber Systems associated with a Medium impact substation can be located outside of the substation itself, they have to be located at one of the six asset types; otherwise, they need to be treated as remote users (I discussed this question in this very long post, under the section Questions of Scope, about 5 or 6 paragraphs down.  Also in this post, in the section entitled “The Auditor’s Methodology”).
10.   Criteria 2.4 – 2.8 apply to Facilities, not assets.  It is clear that substations are not Facilities.  Rather, Facilities are lines, transformers, busses, etc.  Yet the regions seem to differ in their interpretation of “Facilities” in these criteria.  SPP makes clear that Facilities are the lines, etc; NERC also indicates that.  However, some regional auditors (including Joe Baugh of WECC, in his presentation on CIP-002-5.1 in September) have indicated that “Facilities” means “substations” in these criteria.  Which should it be?  If “Facilities” means lines, etc, there will potentially be a lower compliance burden for Transmission entities, since BES Cyber Systems at a “Medium” substation, that are not themselves associated with a Medium Facility (line, etc) will be Low impact.  For instance, at a 500kV substation that falls under criterion 2.4, relays associated with a 230kV line will be Low impact, not Medium.  Only the relays associated with the 500kV line(s) will be Medium impact.
11.   Criteria 2.4 – 2.8 refer to “Transmission Facilities” as being in scope.  This is to distinguish the lines, transformers, etc. that are associated with Transmission from those that are associated with Distribution; this is important, since in many substations both Transmission and Distribution Facilities are present.  However, there are many questions that arise when it comes to actually separating the equipment out.  There needs to be a definition of Transmission Facilities that entities can use to distinguish the two types of Facilities. 
12.   Medium impact BES Cyber Systems are “defined” in Attachment 1 as those that are “associated with” Medium impact Facilities (in criteria 2.4-2.8), meaning they don’t have to be located at the same substation as the line, breaker or transformer they’re associated with.  However, according to NERC’s recent Lessons Learned document, relays that are associated with a Medium impact line - under Criterion 2.5 - through a “transfer-trip” scheme, but which are themselves located at a Low impact substation (i.e. so-called “far-end relays”), are Low impact. Will this exception apply in other cases where systems (like relays) associated with a Medium impact substation or Facility are located at a Low impact substation[v]?
13.   There are a host of issues that come up regarding equipment located in shared substations.  NERC has promised a Lessons Learned document on this question.  Entities that can’t wait for that will have to “roll their own”.
14.   If a relay (or other device) in a substation is connected serially to an intermediate device like a terminal server or RTU, and that intermediate device has External Routable Connectivity, in what circumstances can the relay itself be considered to have ERC?  In what circumstances should it not be considered to have ERC? 
15.   Criteria 2.6 and 2.9 both refer to IROLs, which are not used in WECC.  How should WECC entities interpret these two criteria without referring to IROLs?
16.   What does “routable” mean?  There are about three places in CIP v5 where a definition is required, but there is no NERC definition.  While this may seem like a fairly well-understood term, it isn’t so clear-cut when you look at Modbus/TCP, DNP/IP, etc.  NERC’s very well-written 2010 guideline for identifying Critical Cyber Assets (pp. 26-29) contains a good discussion of this issue.  Should entities assume that, if their definition of “routable” coincides with what is discussed in this document, they are defining it properly?
17.   What constitutes a “substation” (there is no NERC definition)?  This is important for criterion 2.5.  For example, suppose a substation meets the 3000-point threshold in criterion 2.5, but has two separate control rooms.  If each of these control rooms is considered part of a separate substation (say there is a fence between them, or the entity decides to put one there to lower their compliance costs), then each of the separate substations probably won’t have 3000 points.  Instead of one Medium substation, there would be two Low substations, and all the BES Cyber Systems at both “substations” would be Low impact.
18.   Low impact assets are “defined” in R1 as “assets containing a Low impact BCS”.  But no inventory of cyber assets is required for Lows, and thus BCS will never be identified at Low impact assets.  How is this contradiction to be reconciled?   
19.   Conversely, the implication of this “definition” of Low assets seems to be that assets that don’t contain a Low impact BCS aren’t even Lows.  This is good, but how do you prove it to your auditor, without inventorying all the Cyber Assets at the potential Low asset to show that none of them meet the definition of BCA/BCS?
20.   The beginning of Section 3 (“Low Impact Rating”) of Attachment 1 reads: “BES Cyber Systems not included in Sections 1 or 2 above that are associated with any of the following assets…”  This states pretty clearly that the entity is to a) take their pre-existing list of BCS, b) subtract out those BCS identified as High or Medium impact in Sections 1 or 2, and c) identify the remainder as Low impact BCS.  There is only one problem with this: The entity is never required to make a list of all their BCS before they start to classify them[vi].  In fact, v5 says explicitly in two places that an inventory of Low impact BCS isn’t required.  How can this contradiction be reconciled?
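
To make the difference in item 10 concrete, here is a minimal Python sketch of the two readings of “Facilities”, using the 500kV substation example from that item.  Everything in it is my own illustration and not anything found in the standard: the names Facility, BCS, classify and meets_medium_criterion are hypothetical, the decision about whether a line meets a Medium criterion is assumed to have been made already, the “associated with” mapping is assumed to be the entity’s own, and High impact is left out entirely.  Note that the sketch starts from a complete list of BES Cyber Systems, which is exactly the list that item 20 points out R1 never requires you to build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    """A line, transformer, bus, etc. (hypothetical model, not from the standard)."""
    name: str
    substation: str
    meets_medium_criterion: bool  # assumed to be decided already by the entity


@dataclass(frozen=True)
class BCS:
    """A BES Cyber System and the Facility the entity says it is associated with."""
    name: str
    facility: Facility


def classify(systems, facilities_mean_substations):
    """Return {BCS name: 'Medium' or 'Low'} under one reading of 'Facilities'."""
    # Substations that contain at least one Facility meeting a Medium criterion.
    medium_substations = {
        s.facility.substation for s in systems if s.facility.meets_medium_criterion
    }
    ratings = {}
    for s in systems:
        if facilities_mean_substations:
            # Reading A: the whole substation is the "Facility".
            is_medium = s.facility.substation in medium_substations
        else:
            # Reading B: only the individual line, transformer, etc. counts.
            is_medium = s.facility.meets_medium_criterion
        # Section 3 logic: whatever is not rated higher is Low.
        ratings[s.name] = "Medium" if is_medium else "Low"
    return ratings


# The example from item 10: one substation, a 500kV line and a 230kV line.
line_500 = Facility("500kV line", "Sub A", meets_medium_criterion=True)
line_230 = Facility("230kV line", "Sub A", meets_medium_criterion=False)
systems = [BCS("relay-500", line_500), BCS("relay-230", line_230)]

by_facility = classify(systems, facilities_mean_substations=False)
by_substation = classify(systems, facilities_mean_substations=True)
print("Facilities = lines, etc.:", by_facility)
print("Facilities = substations:", by_substation)
```

Under the “lines, etc.” reading only the relay on the 500kV line comes out Medium; under the “substations” reading both relays do.  That one flag is the whole difference in compliance burden discussed in item 10.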


The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] The list also contains a couple items – like 14 and 16 – that aren’t really part of R1 per se, but are definitely part of the asset identification process for substations.  Therefore, Transmission entities need to address these at the same time they’re applying R1 to identify BCS.

[ii] Of course, I don’t recommend that Southern California Edison tell WECC that their control centers don’t have to comply with v5; there is too much other evidence in R1 and Attachment 1 that control centers do have to comply.  It was obviously a mistake by the SDT that “Facilities” was capitalized.  I know one region has suggested to NERC that there be an errata filing for all of the v5 standards – since this wording appears in all of them, as well as the v6 and v7 standards – asking to un-capitalize that word in Section 4.2.2.  But I doubt that will happen.

[iii] The BROS were part of the definition of BES Cyber Asset in the first draft of v5, so they were then the “official” means for identifying BCA/BCS; this was changed in the second draft.  See footnote vi below.

[iv] For instance, let’s go to the Medium impact criteria in Section 2 of Attachment 1, and see if they “map” to the six asset types.  Each criterion has a subject, generally at the beginning of the criterion.  Some of those subjects do vaguely resemble items on the asset list, but how about criteria 2.2 (where the subject is “reactive resources”), 2.9 (“Each…Remedial Action Scheme (RAS), or automated switching System that operates BES Elements..”), and 2.10 (“automatic load shedding systems”)?  These aren’t on the list of six at all.  Why did the SDT carefully provide us this list of six asset types and tell us to consider them in Attachment 1, then ignore some of them and add some new ones when we actually get to Attachment 1?  The answer is what I’ve just said: the six asset types are the locations where you should look for BES Cyber Systems, not what gets run through the criteria; you classify the BCS themselves using the criteria in Attachment 1.  High BCS will always be located at a High asset, because they have to be “used by and located at” a control center that meets one of the four High criteria.  But Medium BCS don’t have to be located at a Medium asset/Facility, since they just have to be “associated with” the asset/Facility, not “at” it.  It is important to know that a Medium BCS has to be located at one of the six asset types, rather than somewhere else like the home of a manager.

[v] My opinion is they won’t, since NERC’s reasoning for their “ruling” on Transfer-Trip relays was very specifically tied to Criterion 2.5 (and that reasoning had first appeared in this blog post about two months earlier, having been contributed by an Interested Party).

[vi] Actually, in the first draft of CIP v5, which was roundly defeated in the first ballot in December 2011, the wording of R1 clearly required the entity to first inventory all Cyber Assets in its system, whether High, Medium or Low (of course, at this point in the requirement, nothing had yet been classified High, Medium or Low impact).  Then they had to determine which were BCAs (and hence what the BCS were, although the first draft of v5 almost used the terms BCA and BCS interchangeably).  To identify BCAs/BCS, entities had to apply the BES Reliability Operating Services analysis to each cyber asset (since the BCA definition at the time was based on the cyber asset’s fulfilling a BROS).  Of course, this would have been an incredible burden on entities, since it would have literally required spending hours inventorying and classifying every cyber asset they owned.  I wrote a post on this while the first draft was being balloted, and flatter myself that I contributed in a small way to its defeat.  R1 (and the BCA definition) was substantially rewritten at the first SDT meeting after this ballot.

Monday, December 15, 2014

Here’s the Smoking Gun


Bloomberg published a very good story over the weekend about a 2008 oil pipeline blast in Turkey that was definitely due to a cyber attack[i].  It caused well over $1Bn in losses, as well as a large spill.  My $.02 on this:

  • This and Stuxnet are the only two well-documented successful cyber attacks on ICS that caused major physical damage (and I imagine the dollar loss for Stuxnet was much less than for this one, although it did also set Iran’s uranium enrichment program back a year or so.  For a great summary of Stuxnet a couple years after the fact, see this article by Ralph Langner).
  • However, I think this attack should be much more chilling for North American infrastructure owners (including power), since this was done by the “bad guys”.  As we all know, Stuxnet was perpetrated by the “good guys”, and was specifically targeted at the Iranian nuclear program.  Of course, the worm did end up propagating here and elsewhere, and it was expensive for some companies to clean it off their systems – but it never actually attacked other targets (and I don’t know of any successful “copycat” attacks).
  • We have all read that foreign entities, probably including nation-states, are doing reconnaissance of critical infrastructure in the US (including pipelines and of course the power grid).  The attackers in Turkey had also done their reconnaissance of the BTC pipeline and knew that the Windows system controlling the security cameras had vulnerabilities.  They exploited these vulnerabilities to attack other systems, as well as to disable many of the security cameras themselves during their attack.  What’s there about this scenario that couldn’t happen in North America?
  • The fact that this was a cyber/physical attack just confirms what we’ve heard many times this year – that combining the two types of attacks allows the greatest amount of damage to occur.  Metcalf was a purely physical attack, and – while it was certainly quite serious – it never came anywhere close to causing the amount of disruption that a cyber/physical attack could have.  The Metcalf attackers are frequently pointed to as being very knowledgeable and “professional”, but they don’t hold a candle to the attackers of the Turkey pipeline, and the destruction they caused is pocket change compared to what was caused in Turkey.
  • I thought the conclusion of the story was quite interesting: The bombs the Russians dropped on another section of this pipeline during the war with Georgia (which started three days after the cyber attack) all missed the target.  But the cyber attackers didn’t miss!
  • The moral of my story: Nobody can say now that there hasn’t been a successful large-scale cyber attack – by genuinely evil people – against critical infrastructure.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] Sean McBride of Critical Intelligence, the subject of a recent post of mine, pointed out in his excellent blog that he had reported the incident to his customers in 2009.  He then says “If you don’t want to hear about ICS security events five years later, subscribe to the Critical Intelligence Core ICS Intelligence Service.”  Touché!

Saturday, December 13, 2014

The CIP v5 Compliance Date Needs to be Moved Back


When I started this blog at the beginning of 2013, one of my first posts was about the need to move the CIP v4 compliance date back.  It seems I’m now doing the same thing for v5, less than two years later. 

I’ve been spending a lot of time lately, on the phone and in person, with NERC entities of various sizes (well, super large to medium size); I always ask how they are coming on their CIP v5 program.  To a man/woman, they all first tell me the program is moving along well.  Once we’ve gotten past that formality, a little digging usually reveals it really isn’t going along quite so smoothly, although the entity also may be in denial and not realize how far they are from being compliant on April 1, 2016.

One problem is funding.  I know at least a couple large entities that aren’t going to officially get a dime in v5 funding until 2015.  They’re then going to have to scramble – and fight over a small pool of consulting resources – to get everything done next year (and you definitely need to have your entire v5 program in place by the end of 2015.  You then should have an assessment to see what you may have missed; that leaves you at least a couple months to fill any remaining gaps by April 1, 2016).

This might seem pretty short-sighted of those entities.  Why didn’t they budget any money for v5 in their 2014 spending plans?  However, you have to consider the history.  Until April 2013, when FERC upset the applecart and issued their NOPR, the only thing set in stone was that CIP v4 would come into effect April 1, 2014.  And even with the NOPR, I know a lot of NERC compliance people had a big problem convincing their management and legal teams that v5 was really coming.  There was good reason for their reluctance: V4 was approved, while v5 was still just a wishful statement by FERC.  It was only when FERC approved v5 on November 22, 2013 that it became crystal clear it would come into effect.  Of course, most entities had already done their 2014 budgeting by that date.

So that’s one problem.  A bigger problem has been the huge amount of uncertainty about what the wording of the CIP v5 standards means - especially my favorite standard, CIP-002-5.1.  Of course, a lot of this uncertainty has been caused by irresponsible bloggers trying to split hairs over fine points of the wording.  But leaving such riff-raff aside, NERC and the regions have been honest that more guidance is needed.  Yet it has been slow in coming, and many entities have decided they can’t wait any longer for all the holes to be filled in, and they’ve “rolled their own” solutions to do the job.

But other entities haven’t quite been able to give up the cherished belief that NERC ought to tell them what the standards mean, not they NERC.  This is a relic of the quaint old tradition (going back say 4,000 years to the Code of Hammurabi), in which it’s the authorities that interpret the laws, while the people obey them.  I’m afraid NERC entities need to discard such outmoded concepts.  This is the brave new world of NERC CIP Version 5, where any official interpretation help will be late in coming and inconclusive, and perhaps won’t come at all.  You have to roll your own solutions, or simply not do anything at all.

It is the latter option – not doing anything at all – that has been embraced wholeheartedly by a number of NERC entities; since there is no other clear path forward, doesn’t it make sense that’s the best thing to do, pending better clarification?

I’ve compared this attitude to that of the heroes of the play Waiting for Godot, one of the greatest plays ever written.  In it, two men spend the entire play standing on a virtually empty stage, waiting for someone named Godot to come; they have already been waiting for some time.  Godot sends a messenger every day to say he can't come but he will for sure the next day, yet the fact remains: The protagonists know all along Godot isn’t coming, and they’ve always known that (in fact, they’re not at all sure why they’re waiting for him in the first place).  But even at the end of the play, when it’s clearer than ever Godot will never come[i], they continue to wait.  They say they're through with waiting, but they just stand there as the curtain falls.

I don’t want to press this analogy too far (Samuel Beckett, the playwright, was of course writing about the human condition in general, not NERC CIP v5).  But it does seem to me that entities that are waiting for all ambiguities to be cleared up in v5 are in the same position as these two hapless gentlemen.  Deep in their hearts, they realize help won’t be coming – or at least not enough of it.  But they keep waiting.

I’ve now discussed two types of entities that are in danger of not meeting the 4/1/16 deadline.  One is those that haven’t had the funding available.  The other is those that may have funding, but are paralyzed by the fact that there are so many holes still to be filled in the interpretation of v5.

But there’s another entity that is probably even worse off than both of these.  This is an entity that thinks they are on the road to full compliance.  They’re mounting a big effort to understand the CIP v5 standards, and they’re producing various presentations and documents on what v5 means, both in general and for them.  But they’re not actually doing what needs to be done for compliance: deciding what policies, processes and technologies need to be in place for v5, then implementing them. 

I say this type of entity is worse off because they don’t know it.  They think that every PowerPoint and position paper is moving them further toward compliance, when at best they’re pretty much standing still.  The fact is, you’re on a fool’s errand if you think you can approach v5 compliance from first principles.  I’ve written probably 25-30 posts just on CIP-002-5.1, and the only conclusion I’ve been able to reach about first principles in that standard is that there are none (more accurately, it was built on two or three very different first principles, and the contradictions between them were never reconciled).

Richard Feynman, one of the greatest physicists of the 20th century, famously said (about quantum mechanics, literally the foundation of modern physics and the reason I’m able to type this post on a computer that doesn’t take up multiple rooms and cost millions of dollars), “If you think you understand quantum mechanics, you don't understand quantum mechanics.”  Unfortunately, the same applies to CIP v5: If you think that developing a deep understanding of what v5 means will help you comply with it, you don’t understand v5 in the first place.  Put down your PowerPoints; pick up your pen and start writing your v5 policies and procedures.  Those are what CIP v5 means.

I’m sure there are other reasons why entities aren’t ready for v5.  But I don’t need to know all the many reasons, nor do I need a survey or focus groups to tell me this: The majority of NERC entities won’t be ready for CIP v5 compliance by 4/1/2016, or if they do actually make the date it will be because they’ve spent far more ratepayer dollars (or shareholder dollars) than they should have[ii].

So I’m saying the main v5[iii] compliance date should be pushed back – at least six months, hopefully a year.  Of course, this would mean that all the other compliance dates would have to be pushed back as well.  What’s the mechanism for this to happen?  Beats me, but it certainly seems something could be worked out among NERC, FERC and the regions.  Something has to be worked out anyway, given the interpretation problems (and you'll hear more from me on these interpretation problems very soon.  You might want to put me on your spam list while there's still time) and the fact that the ship has sailed on any effort to deal with them in a “legal” way.  These are extraordinary times, requiring extraordinary measures.

What are the chances the date actually will be pushed back?  I’d say they’re slightly better than those of the Cubs winning the World Series next year.  But you never know.  It’s been “Wait ‘til next year” for 106 years here in Chicago; one of these centuries, next year will come.

This post is the first of four posts that describe why the v5 compliance date needs to be moved back, and what else needs to be done to address the serious problems in CIP-002-5.1, the foundation of all the CIP v5 (or more explicitly CIP v6.3940) standards.  The next post in this series is here.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] VLADIMIR:
You have a message from Mr. Godot.
BOY:
Yes Sir.
VLADIMIR:
He won't come this evening.
BOY:
No Sir.
VLADIMIR:
But he'll come tomorrow.
BOY:
Yes Sir.
VLADIMIR:
Without fail.
BOY:
Yes Sir.
(silence)
.…….
VLADIMIR:
What does he do, Mr. Godot?
…..
BOY:
He does nothing, Sir.
(silence)
……
BOY:
What am I to tell Mr. Godot, Sir?
VLADIMIR:
Tell him ….tell him you saw me and that…..that you saw me…..

[ii] Of course, given that I’m a CIP consultant, and given that the greater part of the money these entities waste will be on consultants, you might say this is the best thing that could happen for me.  But the fact is, there aren’t that many consultants who can really help right away in the v5 effort, although there are a lot who will say they can - meaning they’re happy to learn the ropes on your dime.  These people will come out in droves – correction, are coming out in droves – and the majority of the consultant spending in 2015 will be on them.  As I said, after an entity has spent a huge amount on these people, it may well be compliant on 4/1/16.  But it wouldn’t have to be this way, if entities were given more time to comply.  They could actually take their time and become compliant in an efficient, cost-effective manner.

[iii] And when I say “v5”, I mean the combination of v5, 6 and 7 standards that entities will actually have to comply with.  I’ve named this Version 6.3940, but I know everyone will continue to refer to the whole thing as v5; I’ll continue to do so as well, at least at times.