Tuesday, September 29, 2015

The News from WECC, Part IX?

I have been attending WECC meetings on NERC CIP since 2008 – as many as possible – and have always found quite important pieces of information in them. I’ve written a number of posts on what I learned or what went on in these meetings, some called “The News from WECC” and some with other titles. I’m guessing this might be the ninth such post, but I’m not sure. In any case, the sheer size of these meetings (always at least 250 people, usually 350-400), coupled with the size and talent of the WECC CIP audit staff (9-10 auditors), has always led to some good discussions and good information.

So it was with WECC’s “Advanced CIP Training” that I attended in Salt Lake City on September 9 and 10, 2015. This training had originally been billed as “CIP 101” (the same title used the previous year), but that was changed in the last couple of months. It seems we all graduated to Advanced even before we attended the course! Not having an overall theme for what I’m going to write, I’ll simply list the different things I learned in roughly the order they were mentioned in the meetings.

I recommend that any entity, no matter what region they’re in, take a look at the slides from this meeting. You can find them here. I particularly call your attention to the slides (in almost all of the presentations) showing likely Data Requests and Sample Interview Questions; these are good for audit preparation, even if you’re not in WECC.

What Breaks External Routable Connectivity?
As discussed in this post, FERC made clear in their July NOPR on CIP v6 that they didn’t like NERC’s idea of what can “break” Low Impact External Routable Connectivity (LERC) at Low impact assets. Specifically, they said they didn’t like Reference Model 6 in the Guidance and Technical Basis section of CIP-003-6, which stated that there can be a “protocol break” in conjunction with a change in the communications stream from routable to serial. As I discussed further in this post, this argument can well be construed as applying to ERC (i.e. for High and Medium assets) as well.

In the meeting, I asked Morgan King, the WECC auditor who had elaborated the idea of a protocol break in ERC at WECC’s January CIPUG meeting, whether he still thought that entities could claim this “exemption” – for either Low or Medium/High BES Cyber Systems. As I expected, he said no – that FERC’s opinion clearly carried weight for LERC and ERC, even though it had only been expressed regarding the former.

But even though FERC didn’t like the general idea of “protocol break”, they didn’t say there was nothing that could break ERC in the case where the end devices were serially connected, as is often the case with relays. Morgan agreed with me that, if a device that “translates” routable to serial communications requires re-authentication of the user, this still does break both ERC and LERC. And Morgan did allow for other means of breaking ERC/LERC as well (this post lists another situation that I would think fits that bill).[i]

But there’s a Catch to This, in the case of LERC
When I wrote about the NOPR, I pointed out “…if the entity is going to prove that there is no LERC at a particular Low asset – when there is clearly some routable connection going into the asset – then they are potentially going to have to identify any BCS and show that there is no LERC to any of them.” I said this because I saw no way an entity could prove that Cyber Assets at a Low asset don’t have LERC without being able to demonstrate this for every Cyber Asset. And this means there needs to be a full inventory of cyber assets at that asset.

Wanting to see if WECC saw this the same way, I first asked whether WECC will require an inventory of all Low BCS. The inimitable Dr. Joe Baugh, the WECC auditor who specializes in asset identification issues, said no. I then asked Joe about an entity that wants to claim, for a Low asset with some sort of external routable connection, that there is in fact no LERC at that asset: would it potentially need to show this for every BCS? He said yes, and acknowledged that in such a case the entity will need an inventory of its Low BCS at that asset. My guess is the other regions would probably say the same thing; but as in most cases, you need to check with your region if you question this.
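The logic of that exchange can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a WECC or NERC tool; the class, the field names and the decision to treat a missing inventory as "LERC present" are all my own assumptions. The point is that a "no LERC" claim is a statement about every BCS at the asset, so it can't be evaluated without a complete list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LowBCS:
    """One Low impact BES Cyber System at an asset (hypothetical record)."""
    name: str
    # True if the BCS can be reached over a routable path from outside the asset.
    externally_routable: bool
    # True if something on that path forces the user to re-authenticate,
    # the kind of break discussed in the previous section.
    reauth_on_path: bool = False


def bcs_has_lerc(bcs: LowBCS) -> bool:
    """A BCS has LERC when an external routable path exists and is not broken."""
    return bcs.externally_routable and not bcs.reauth_on_path


def asset_has_lerc(inventory: Optional[List[LowBCS]]) -> bool:
    """Evaluate a 'no LERC' claim for one Low asset.

    Assumption: with no inventory the claim cannot be demonstrated,
    so this sketch conservatively reports LERC as present.
    """
    if inventory is None:
        return True
    return any(bcs_has_lerc(b) for b in inventory)
```

For example, `asset_has_lerc([LowBCS("relay-1", True, True), LowBCS("rtu-1", False)])` returns False, while `asset_has_lerc(None)` returns True because there is nothing to show the auditor.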

This was the first time I saw this acronym used in the NERC community, but I predict it will be a big hit. Of course, it means Low Impact BES Cyber System. And Josh Reber helpfully suggested some even better acronyms on slide 8 of his presentation. My favorite is MIBESCSWERCATAEACMSAPACS.

You should do an annual inventory of Cyber Assets for CIP-002 to make sure you haven’t missed any newly-installed devices. The one exception I would make to this rule is when an entity has a rock-solid change control process, and they will always know when a new device has been installed.

In Josh’s presentation, he mentioned that many entities had wondered what is the typical number of roles that other entities have delineated for training purposes in CIP-004-6 R2. He was quite frank: they typically see between 8 and 15 roles. Now you know.

CIP-005 (Andrews)
Joe Andrews gave a great presentation on CIP-005. Some of the points he made:
  • He distinguished between “Discrete ESPs”, in which the entire ESP is contained within one PSP, and “Extended ESPs”, in which an ESP crosses between rooms or even buildings, and there is some cabling that goes outside of the PSP. The new CIP-006-6 R1.10 is intended to address Extended ESPs.
  • He pointed out that to separate networks of different trust levels, you need a Layer 3 device – not a Layer 2 device with VLANs.
  • He said network gear needs to be identified as BES Cyber Assets, since it meets the BCA definition. To be honest, I think there might be some disagreement on this point, so you should check with your region if you don’t agree with this.
  • He said there can’t be mixed-trust VM environments. So you can’t have some devices on a VM server that reside in an ESP, and some that don’t. I believe I heard the opposite on a recent TRE webinar, but as with all questions on CIP v5, check with your own Regional Entity.

CIP-005 (King)
In case you have any doubt that some of the thorniest issues in CIP v5 are in CIP-005 (although I’d say it takes second place to CIP-002 in that regard), it will be allayed when you note that there were actually two presentations on CIP-005 at this event, totaling about 140 slides. The second presentation was by Morgan King. Two (of many) interesting points in his presentation:
  • He pointed out that the NERC Lesson Learned on Interactive Remote Access provides a good guide to when a remotely-executing script constitutes IRA and when it doesn’t. See slide 85.
  • He noted that an Intermediate System (IS) should be in the DMZ, but an overriding consideration is that it must be in a PSP; this is because an IS must be declared an EACMS (per the definition of IS). So if your DMZ is outside of the PSP, you need to include the IS within the ESP/PSP.

Eric Weston and John Graminski did an extensive presentation on CIP-007-6, which I recommend you read. Some of the interesting points they made:

  • There is such a thing as an “in-line antivirus proxy device”, which in theory can protect all of the cyber assets within an ESP. They didn’t say this would eliminate the need for anti-malware software, but they did point out that it is a good precaution in case there are devices that don’t have updated A/V signatures.
  • For CIP-007 R1.2 compliance (physical ports), it’s not OK to just put signs warning against use of USB devices on the PSP. They need to be on the device itself (or its cabinet if locked).
  • For R1.1 compliance, a host-based firewall can be used as evidence that a service is disabled. However, this doesn’t help for purposes of compliance with CIP-010 R1.1; that is, it doesn’t constitute evidence that a logical port is not “network accessible”.
  • (This point was actually made by one of the participants) You can’t justify a port just by saying “The vendor requires it.” You need to point to some reason why it has to be enabled.
  • In R4.1, logging can be performed at either the cyber asset or the system level (which brings up a good point: some of the requirements in CIP-007 can only be performed at the cyber asset level, e.g., R1 and R2. For these requirements, the words “BES Cyber Systems” in the Applicability column should be replaced with “Components of BES Cyber Systems”).
  • For R4.2, the Guidance implies that real-time alerting can be accomplished only by technical means. But the actual requirement allows for procedural means as well.
There's a follow-on to this post here.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] I’m told that NERC’s comments on the NOPR introduce a new term: “security break”. It seems like it’s essentially an authentication break, which I certainly hope FERC agrees is something that would break LERC (and probably ERC as well).

Friday, September 25, 2015

Do We Need to Pay to Receive Patches for Network Devices?

Matt Light of Deloitte Advisory and I ran a very successful workshop at EnergySec’s Summit on September 14. The title of the workshop was “Implicit Requirements in CIP Version 5”. One of the points I made was regarding software that has run past its end of life, for which the vendor is no longer offering patches. I believe such software doesn’t fall under the patch management requirement, CIP-007 R2, since there are no publicly available patches to apply.

I based this opinion on an email an auditor had sent me in response to my question whether machines running end-of-life software could be “compliant” under CIP v5, given that no more patches are publicly available. I discussed the question and the auditor’s response in this post. In brief, the auditor said “Patching is not an issue.  If (a software product) is past end-of-life, no patches are available, therefore there are none to assess for applicability.”

I got a couple of good responses to this post, which I discussed in a second post. That post dealt with two questions, one of which was whether the fact that a vendor is offering post-end-of-life software support (which presumably includes non-public patches) to certain organizations means that entities must subscribe to this service if they still have that software installed on machines that are subject to CIP v5. The auditor replied “In my view (and I cannot guarantee all auditors will take the same stance), there is nothing in the CIP standards requiring an entity to subscribe to post-end-of-life for-fee support.  There are no more…patches publicly available from a vendor that routinely makes its patches publicly available for supported systems.  I would not demand that an entity pay (significant) dollars for continued, limited security support.  But if they did, then they are expected to monitor for and install any applicable patches that might be released under the support agreement.”

I discussed both questions and the auditor’s answers with the workshop attendees. At this point, one of the attendees raised the question of annual maintenance fees for network devices, including routers and switches. They stated that these fees could amount to a lot of money for their organization, and implied that they weren’t currently paying them. The problem is that some major networking vendors release patches only for devices that are covered by a current maintenance contract. Since the auditor had said he wasn’t going to require entities to pay for extended software support, would he say the same thing about network vendor maintenance fees?

I ran this by the auditor, who said “The difference is that some major network vendors use a paid maintenance and support model, while some major software vendors do not - i.e. (the vendor) doesn’t normally require maintenance in order to receive…patches.  (The software in question) is past end of life and is an unsupported product.  The (network devices in question) are not at end of life and are still supported.  (The networking vendor) is still publishing updates to the general user community. (In contrast, in our supposition, the software vendor isn’t providing support or patches for their end-of-life product) Obtaining post-end of life support is not something the auditors have been expecting. Maintaining and protecting supported systems is expected. Pursuing this logic, is the entity suggesting that they are off the hook if they choose to not pay for annual support and maintenance of their SCADA/EMS?  Their generation plant DCS?  Their PACS?  That position will not be helpful at audit and especially unhelpful if they fall victim to a preventable cyberattack.”

So the answer is pretty clear that the entity needs to pay for networking device support, as long as the device has not yet reached end of life. But as always – and I said this so many times in the workshop that it became a source of amusement – you should check with your Regional Entity to confirm whether or not this is their opinion.
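For those who like to see the logic laid out, here is a minimal sketch of the auditor's position as I understand it. It reflects one auditor's view as quoted above, not a NERC rule; the function name and the wording of the returned strings are my own.

```python
def patch_expectation(end_of_life: bool, has_paid_support: bool) -> str:
    """Summarize what the quoted auditor would expect for one product.

    end_of_life:      the vendor no longer publishes patches to its general user base
    has_paid_support: the entity pays for maintenance or post-end-of-life support
    """
    if end_of_life:
        if has_paid_support:
            # The entity chose to buy extended support, so patches released
            # under that agreement must be monitored for and installed.
            return "assess patches released under the support agreement"
        # Nothing is publicly available, so there is nothing to assess.
        return "no patches to assess"
    # Supported product: maintaining it is expected, even if the vendor
    # only releases patches to customers with a maintenance contract.
    return "obtain and assess vendor patches"
```

Note that `has_paid_support` changes the answer only for end-of-life products. For a supported router or switch the expectation is the same whether or not the entity is currently paying, which is exactly the point the auditor was making.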


Thursday, September 17, 2015

Rewriting CIP-002, Part II

This is the second in a series of four or five posts on the need to rewrite CIP-002-5.1. You can find the first post here.

II. The Primary Problem with CIP-002-5.1
CIP-002-5.1 R1 and Attachment 1 are confusing and contradictory. However, this hasn’t stopped NERC entities, Regional auditors and even NERC staff members from coming to a pretty good consensus on what it means. And this is a good thing – otherwise, the effort to implement CIP v5 would be at a standstill.
However, the fundamental problem with CIP-002-5.1 R1 is that this consensus is completely at variance with the words in the standard. It is literally true that an entity can’t comply with the standard – in the way that virtually all parties agree it should be done – without violating the wording. And if an entity were to try to follow the literal wording of the standard, they could never come into compliance – the wording is vague and contradictory, and omits many required steps. There is simply no way the compliance process could be flowcharted, even if the chart were the size of Yankee Stadium. This makes this standard completely unenforceable, and is the primary reason that I say it has to be rewritten before CIP v5 (and v6, and probably later versions) can be enforceable.
Here is the way most NERC entities I have talked to, as well as most auditors and others who have given presentations at Regional Entity meetings, understand the compliance process for CIP-002-5.1 R1.[i]
  1. Using the list of six asset types in R1, identify all assets owned by the entity (or operated by them) that correspond to one of those types.
  2. For each of these assets, identify those that are High or Medium impact.
  3. At each of the High or Medium impact assets, identify BES Cyber Assets using the definition, then aggregate these into BES Cyber Systems (the process - or really processes - that gets you from BCA to BCS is of course not described in the standard, but there has been some guidance published on this, including by me. My guess is auditors aren't going to worry about that too much, as long as the entity can show that every BCA is included in one or more BCS).
  4. Classify High and Medium impact BCS. Except in the case of large generating stations that fall under Criterion 2.1, all BCS located at a High asset will be High; all BCS located at a Medium asset will be Medium.
  5. List assets not classified as High or Medium as Low impact (specifically, “containing a Low BCS”). 
This is a nice, simple methodology. It corresponds very closely to the CIP v1-v3 methodology, if you substitute Critical Asset for High or Medium impact asset and Critical Cyber Asset for BES Cyber Asset and BES Cyber System. In fact, I know some entities are even using the definition of Critical Cyber Asset (a Cyber Asset “essential to the operation of” a Critical Asset) as a guide to identifying BCAs. Given that there is little guidance on how to interpret the words at the core of the BCA definition - “affect the reliable operation of the BES” – this isn’t such a terrible way to do it.
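To show just how simple the popular methodology is, here is a rough Python sketch of steps 3 to 5. It assumes steps 1 and 2 are already done, so each asset arrives with its impact rating and its list of BCS. The class and function names are hypothetical, and BCS at Criterion 2.1 plants are simply set aside for individual rating rather than classified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Asset:
    name: str
    impact: str                                   # "High", "Medium" or "Low" (steps 1-2)
    bcs: List[str] = field(default_factory=list)  # BCS identified at the asset (step 3)
    criterion_2_1: bool = False                   # large generating station (step 4 exception)


def popular_method(assets: List[Asset]) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Return (BCS ratings, BCS needing individual rating, Low asset names)."""
    bcs_rating: Dict[str, str] = {}
    needs_review: List[str] = []
    low_assets: List[str] = []
    for asset in assets:
        if asset.impact not in ("High", "Medium"):
            low_assets.append(asset.name)          # step 5: everything else is Low
        elif asset.criterion_2_1:
            needs_review.extend(asset.bcs)         # step 4 exception: rate one by one
        else:
            for bcs in asset.bcs:                  # step 4: BCS inherit the asset rating
                bcs_rating[bcs] = asset.impact
    return bcs_rating, needs_review, low_assets
```

The whole thing is one loop with three branches. Keep that in mind when you get to the much longer list later in this post.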
However, there is one big problem with the above methodology: it doesn’t correspond at all with the greater part of the wording of CIP-002-5.1 R1 and Attachment 1.  What does the wording actually say? Aye, there’s the rub – it’s literally impossible to give a clear account of what R1 says, other than to say that it in no way corresponds with the popular methodology described above. There are three main problems with it.
First, the wording is far too compressed. CIP-002-3 had three requirements leading up to identification of CCAs; CIP-002-5.1 has one, yet it actually encompasses (explicitly or implicitly) many more steps than the three v3 requirements did (in fact, R1 implicitly contains all 15 of the steps listed below, plus many others as well). When you’re writing regulatory standards, brevity isn’t a virtue – clarity is. Save the brevity for when you’re writing haiku poetry.
Second, some key steps are left entirely implicit; it is up to the entity to figure them out, usually by having to go to a definition. For example, one of the most important steps in the v5 asset identification process – the identification of BES Cyber Systems at Medium and High assets - is nowhere to be found in R1 (the word “Identify” is used in R1.1-1.3, but those requirement parts are actually telling the entity to classify BCS, rather than to identify them in the first place). The entity has to piece together their own idea of how to identify BCS by looking at the definitions of Cyber Asset, BCA and BCS; this leads to what I call the “bottom-up” approach to BCS identification.
But there’s another approach to BCS identification, outlined in the Guidance and Technical Basis. This one is based on the BES Reliability Operating Services (BROS), and is what I call “top-down”. There is no acknowledgement in R1 that there even are two approaches; yet since the BROS aren’t in the requirement at all but the three definitions are, by implication this means “bottom-up” is really the “required” approach. If so, why are the BROS talked about at all in the Guidance? No word on that, although I’ll give my theory on this in the next post in this series.
But I believe both approaches have their uses. Bottom-up is better for substations and Criteria 2.3 or 2.6 generating stations; top-down is better for control centers and Criterion 2.1 generating stations. You can read more about these two approaches in this post.
The third problem with R1 and Attachment 1 is that, even if you have an idea what the steps are that need to be taken to comply with R1, the order in which they need to be taken is not apparent from reading the requirement. You just need to piece them together logically.
Fortunately, I have pieced these “implied” steps together as best I can. I’m now ready to show you the primary steps that I believe are required for compliance with R1.[ii] When I get done, see if you can even remember half of them, let alone repeat them in logical order (as you read the steps below, you’ll realize that many of them actually contain a number of sub-steps).
  1. Develop a list of BES assets that meet one of the six asset types listed in R1.
  2. Decide whether, for substations, your entity will classify BCS based on the “rating” of the substation or that of the Facility (the line, transformer, bus, etc.) with which the BCS is associated.[iii] An entity that takes the second option will be able to classify some BCS at “Medium” impact substations as Low impact, not Medium. As far as I can see, most entities are taking the former option, not the latter. In almost all cases, this is because the entity doesn’t understand that there are two ways to do this (since no Regional Entity I know of has been promoting the idea that entities have these two options). In the few cases I know of where the entity understands the options and has deliberately chosen the first one, it is because they think it will complicate the asset identification process too much to implement the second option.[iv] I disagree with this assertion in general, but I do agree that there are organizational reasons why the second option might not work for many NERC entities. In any case, the entities should be told they have both options, and this isn’t being done at all now. It’s too bad, since it can potentially save an entity a lot of time and money required to implement v5 compliance.
  3. If the entity has decided to take the second option above, then it still needs to identify “Medium” substations (although a better description would be “substations containing one or more Medium impact Facilities”). At these substations, it then identifies the Medium Facilities, leaving other BES Facilities at the substation to be Low impact.[v]
  4. If the entity has decided to base their R1 process on assets, not Facilities (i.e. the first option in step 2), they must use the asset list from step 1 to identify High and Medium impact assets by running through the bright-line criteria (since criteria 2.4-2.8 refer to Facilities, this means not paying attention to that word and substituting the word “Substation”. Similar tricks have to be played with some of the other criteria, including 2.3, 2.9 and 2.10. Are you writing all of this down?).
  5. If the entity has decided to take advantage of the word “Facilities” in Criteria 2.4 to 2.8 (i.e. they’re using option 2 in the second step above), they need to identify the Facilities at each Transmission substation that meet one or more of these criteria. For example, a 500kV line will always become a Medium impact Facility under Criterion 2.4, and the substation it’s located at will be called a “Medium” substation; but a 345kV line located at the same substation will be Low impact.[vi]
  6. Once all assets and/or Facilities that are High or Medium impact have been identified, then BES Cyber Systems must be identified. This identification step, which is nowhere stated in R1 or Attachment 1[vii], is probably the most important in the R1 process. Since no BCS identification process is stated in the requirement, the entity is left to piece together whatever procedure it can, based on the definitions of Cyber Asset, BES Cyber Asset and BES Cyber System (which I call the “bottom-up” procedure). However, a different process is described in the Guidance and Technical Basis section, where the concept of BES Reliability Operating Service is introduced and used as the basis for identifying BCS (I call this the “top-down” procedure). For a description of these two procedures and when I believe each one is applicable, see this post.
  7. There is an important difference between High and Medium impact BCS that must be “overlaid” on the above procedures for identifying BCS. This is due to the fact that, in Attachment 1, High BCS are defined as those “used by and located at” High assets (which are all Control Centers, of course), while Medium BCS are defined as those “associated with” Medium assets or Facilities. This means that High BCS will always be located at the Control Center that meets the High criterion, whereas Medium BCS don’t necessarily have to be located at a Medium asset.  
  8. Since BCS associated with Medium assets or Facilities don’t have to be located at the same asset, this in theory means they could be located anywhere. The one restriction is that the BCS must always be located at one of the six asset types in R1. So if an AGC system (that meets the definition of BCS/BCA – 15 minute impact, etc.) associated with a Medium generating station is located in another plant or in a Transmission substation, it will itself be a Medium BCS. If it’s located in somebody’s basement, it’s not a Medium BCS.
  9. The interesting question is how the entity will identify associated BCS that aren’t located at the Medium asset. And the answer to that is they will simply have to know they’re there. However, this is not how R1 reads. Taken literally, R1 implies that the entity has to scour every asset it has that corresponds to one of the six types, to identify BCS. Of course, High BCS will always be at a High Control Center, so the entity only needs to look at those Control Centers to find High BCS. But Medium BCS don’t have to be at a Medium asset, so every High, Medium and Low asset needs to be gone over with a fine-toothed comb to identify any BCS that are associated with any one of the Medium assets or Facilities (not just the one they’re located at). Of course, this requires conducting an inventory of every cyber asset at all Low assets, determining which of these are Cyber Assets, then determining which Cyber Assets are BES Cyber Assets (and finally grouping these into BCS). Of course, none of the regions are interpreting R1 this way, so you don’t have to worry about having to actually do this[viii]. But it’s just another example of the wording of R1 not corresponding to how people are actually going to comply with it – and the result is that the only way to sensibly comply with R1 is to disregard most of its wording.
  10. The next step in the R1 compliance process – as it is written or implied by the actual wording – is to classify the BCS that have just been identified, that are “used by and located at” High impact Control Centers and “associated with” Medium impact assets. This is pretty easy, of course, as long as you watch these words carefully. Every BCS located at a High Control Center will itself be High, unless it is associated with a Medium or Low impact asset or Facility and is not also used by the Control Center itself – in which case it will be Medium or Low. And every BCS associated with a Medium Facility and/or asset will be Medium, except for BCS in a Criterion 2.1 generating plant, which will be Low if they impact less than 1500MW (aren’t you having fun so far?).
  11. The above step needs to be modified in the case that the entity is using the second option in step 2 and classifying BCS in substations according to the Facility they’re associated with. As described in step 5, some Facilities (lines, etc) will be Medium, some Low. The BCS (which will more often than not be relays) associated with Medium lines will be Mediums, and those associated with Low lines will be Lows.
  12. The last major step in the CIP-002-5.1 R1 compliance process is to identify Low impact assets. Since every asset that corresponds to one of the six asset types in R1 will have to be High, Medium or Low impact, all you have to do is subtract the Highs and Mediums from this list in order to identify the Lows – right? Again, this is how just about all entities will do it, but once again this requires violating the wording of the requirement (and Attachment 1). R1.3 says the entity has to identify “each asset that contains a low impact BES Cyber System according to Attachment 1, Section 3...” So now you go to Attachment 1, Section 3, and what do you find? It says you’re supposed to identify “BES Cyber Systems not included in Sections 1 or 2 above that are associated with any of the following assets...”  What’s going on here? R1.3 says you’re supposed to be identifying assets that “contain Low BCS”, yet Section 3 says you’re supposed to be identifying Low BCS themselves – even though R1.3 says explicitly that no list of Low impact BCS is required! Fortunately, all entities assume that what R1.3 is really saying is what I just showed above: you just have to take High and Medium assets out of your total asset list, to come up with Low assets. But this is another glaring instance of the fact that the only way effectively to comply with R1 is to violate its literal wording.
  13. In the “Thank God for small favors” department, there is one advantage to the fact that Low assets are referred to as “assets containing Low BCS”: If you can demonstrate that an asset on your initial Low list actually doesn’t contain any BCS, then it falls off the charts and you don’t have to apply even the Low impact requirements to it. For an asset that contains no cyber assets at all, this is very easy – it obviously can’t contain a Low BCS. For other assets that do contain cyber assets, if you want to take the time to show that none of these cyber assets meet the definition of BCA (i.e., no 15-minute impact on the grid if misused), then you should be able to remove these assets as Lows as well. I call these assets “No impact”.
  14. In putting together your Low asset list, you also need to keep in mind that any Medium or High impact assets that contain Low BCS need to be on the Low list as well. One example of this is a Criterion 2.1 plant that has some BCS that don’t affect 1500MW; these will be Low BCS, so the plant is both Medium and Low impact. Another example is a Medium substation that contains a BCS that is part of an SPS that doesn’t rise to the level of being included in Criterion 2.9, so the SPS is Low impact. This substation will also be Medium and Low. A third example: For an entity that is classifying BCS in substations based on the Facility they’re associated with, substations that contain both Medium and Low Facilities and associated BCS (for example, a Criterion 2.4 substation that contains both a 500+kV line and a 345kV line) will themselves be Medium and Low impact.
  15. Finally, if your entity has a Distribution Provider registration and owns one or more of the asset types listed in Section 4.2.1, these must be listed as Lows.
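
Steps 11, 13 and 14 are the ones I find people have the most trouble picturing, so here is a small Python sketch of just those three. It assumes the entity is classifying BCS by the Facility they are associated with (the second option in step 2) and that each BCS already carries the rating of its Facility. The data layout and function name are hypothetical.

```python
from typing import Dict, List, Tuple

# (BCS name, impact of the Facility the BCS is associated with)
BcsRecord = Tuple[str, str]


def sort_assets(
    bcs_by_asset: Dict[str, List[BcsRecord]]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (Medium assets, Low assets, 'No impact' assets)."""
    medium: List[str] = []
    low: List[str] = []
    no_impact: List[str] = []
    for asset, records in bcs_by_asset.items():
        levels = {impact for _, impact in records}
        if not levels:
            # Step 13: no BCS at all, so not even the Low requirements apply.
            no_impact.append(asset)
            continue
        if "Medium" in levels:
            medium.append(asset)
        if "Low" in levels:
            # Step 14: an asset can be on the Medium and Low lists at once.
            low.append(asset)
    return medium, low, no_impact
```

With a Criterion 2.4 substation that has one relay on a 500kV line and one on a 345kV line, the substation shows up on both the Medium and Low lists, while a switchyard with no BCS lands on neither.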

As complicated as the above list is, it in no way encompasses all of the steps required to comply with CIP-002-5.1 R1. For example, every entity needs to develop a “definition” for “Programmable”, as well as for the words “affect the BES” in the BCA definition. Every entity with substations that contain Transmission and Distribution Facilities needs to develop a methodology for distinguishing the two, as well as their associated cyber assets. Etc, etc. And then there are a huge number of questions on application of specific bright-line criteria; in fact, I don’t think you could ever write them all down, no matter how long you spent at it.
As I said in footnote ii below, I gave up a while ago on trying to write down a complete list of steps for complying with CIP-002-5.1 R1. It simply can’t be done, given the current wording of the standard. This is why literally nobody is actually following the words of R1 and Attachment 1 in their compliance process. It is simply impossible to do so.
Now compare the above 15+ steps to the five steps shown earlier in the post – that is, the list of steps that entities are actually following as they comply with CIP v5. Is there any wonder that entities are following this methodology, even though it doesn’t at all follow the actual wording of the requirement? There is simply no way an entity could comply with the actual wording of CIP-002-5.1, no matter how many years they spent trying to understand it.
In the next post, I will discuss how I believe this mess came to pass.


[i] Even NERC follows this model. NERC hasn’t itself put out any guidance on the R1 compliance process. However, their February filing with FERC on the results of the BES Cyber Asset Survey, which was based on an implicit idea of R1 compliance, basically follows this model.

[ii] I originally thought it would be possible to put together a list of all the steps – actual and implied – required for R1 compliance. My last two major attempts at this were both in 2014: here and here. I now realize there can be no definitive methodology, period. You could spend the rest of your life trying to document the compliance process for R1, and you’d die before you finished.

[iii] For those who may be new readers of these posts, I have been pointing out for quite a while that the substation criteria, 2.4 to 2.8, don’t actually apply to substations at all, but to the Facilities at those substations.  Facilities is a NERC-defined term and means a line, a bus, a transformer, etc. This means that some of the BCS in a substation subject to one of these criteria might be Low impact, not Medium. I’ve discussed this in a number of posts, but this post was dedicated to the issue.

[iv] Another reason is that many entities aren’t sure about what is networked to what in their substations. So even though they might be able to classify some BCS as Low, they’re concerned that it wouldn’t make a difference since they couldn’t show the Low BCS weren’t networked to the Medium BCS (which would make the Low BCS into Medium PCAs, thus subjecting them to most of the requirements of Medium BCS). I can’t argue with this idea, since they know their substations; I don’t.

[v] Of course, any purely Distribution facilities – that is, lines and other facilities that don’t meet the BES definition in the first place – will not be in scope for v5, regardless of which option the entity chooses. However, in substations that mix Transmission and Distribution facilities (e.g., a substation containing 69kV as well as 115kV lines), the people doing the inventory need clear guidance on how to separate the two types; writing this down isn’t as easy as it might seem. See this post for further discussion of this point.

[vi] Criterion 2.5 is confusing because – even though its subject is the word “Facilities” – it does actually appear to be providing a criterion for classifying the substation itself; this is the famous 3,000-point table. However, what 2.5 actually does is a) provide a criterion for Facilities to be Medium impact (those Facilities “operating between 200 kV and 499 kV at a single station or substation”), but then b) classify those Facilities as Medium only if the substation itself has three connections and meets the 3,000-point threshold. Of course, if step b weren’t there, then every line, transformer, etc. between 200 and 499kV would be Medium impact – no matter where it was located; the SDT clearly didn’t want this to happen.

[vii] R1.1 – R1.3 use the term “Identify” in sending the user to Attachment 1, but this really needs to be understood as “Classify”, since the purpose at this point is to determine which BCS are Medium or High impact. It is assumed the entity has already identified BCS at High and Medium assets, even though that step isn’t called for anywhere in R1 or Attachment 1. This is a good example where an excessive concern with brevity has made R1 impossible to comply with as written.

[viii] The “far-end relay” question is of course related to this. Many people became quite upset when they came to believe that such a relay, on a 200-499kV line that terminated in a Criterion 2.5 substation, would itself be Medium impact, even if it were at a Low substation. However, NERC’s Lesson Learned on this issue last September – which echoed what an Interested Party had pointed out to me the previous June, as shown in this post – pointed out that the particular wording of Criterion 2.5 (which wording is also in 2.6) specifically prevents this from happening. Unfortunately, this hasn’t prevented a number of people – including one or more NERC staff members very involved in the CIP v5 effort – from mistakenly saying that there is now a new principle that “Location does matter” – and this means that Medium BCS have to be located at the asset they’re associated with, just as Highs do. That is definitely not the case (although I wouldn’t object if NERC wanted to put out a further LL saying this was actually a new principle and therefore legitimate – I’ve heard it has been promulgated in one or more of the SGAS).
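
The two-step logic of Criterion 2.5 described in footnote [vi] can be sketched in a few lines of Python. This is only an illustration, and several things in it are assumptions: the per-line weights (700 for 200-299 kV, 1300 for 300-499 kV) are my recollection of the table in Attachment 1 and should be checked against the standard itself; the `Line` class and the function names are hypothetical; and the sketch looks only at lines, although the criterion covers other Facilities (transformers, buses, etc.) as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    kv: float             # operating voltage of the line
    remote_station: str   # the station at the other end of the line


def line_weight(kv: float) -> int:
    # ASSUMPTION: weights recalled from the Attachment 1 table; verify them.
    if 200 <= kv < 300:
        return 700
    if 300 <= kv < 500:
        return 1300
    return 0


def substation_meets_2_5(lines: List[Line]) -> bool:
    # Step b: a test applied to the substation as a whole.
    remote = {l.remote_station for l in lines if l.kv >= 200}
    total = sum(line_weight(l.kv) for l in lines)
    return len(remote) >= 3 and total > 3000


def medium_lines(lines: List[Line]) -> List[Line]:
    # Step a: only 200-499 kV Facilities are candidates, and they are
    # Medium only when the substation passes step b.
    if not substation_meets_2_5(lines):
        return []
    return [l for l in lines if 200 <= l.kv < 500]
```

Note that the function returns a list of lines, not a verdict on the substation: a 115kV line at the same substation never appears in the result, which is the point of saying the criterion applies to Facilities rather than to the substation.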

Friday, September 11, 2015

Rewriting CIP-002, Part I

Shortly after FERC issued their NOPR in April 2013 saying they would approve CIP v5, I set out to do a series of posts that would create a road map for what an entity should do to comply with v5. Since the beginning is a good place to start, my first post was on CIP-002. But a funny thing happened on the way to the road map: I realized there was a fundamental “You can’t get there from here” flaw in the logic of CIP-002-5 R1. If an entity were going to try to follow the logic of the requirement in order to identify and classify their High and Medium impact BES Cyber Systems, as well as their “assets containing Low impact BCS”, they would simply run into a wall and never get where they wanted to go. Since then, I – and others – have identified many more problems with CIP-002 R1 and Attachment 1, as well as certain definitions, like “Cyber Asset”, that are necessary to comply with this requirement but that are missing or ambiguous.

However, I was surprised to see that this has not stopped NERC, the regions and the NERC entities from coming up with a generally-accepted “interpretation” of how to comply with CIP-002 R1 by first classifying “assets” as High, Medium or Low using Attachment 1, and then identifying BES Cyber Systems using a criterion something like the one used in CIP v1-3 to identify Critical Cyber Assets: those Cyber Assets that are “essential to the operation of” a Critical Asset (of course, that criterion is really used to identify BES Cyber Assets; then these are aggregated to BCS).

I’m quite happy with this development, since I didn’t want to see CIP v5 implementation stop dead in its tracks because of the problems I was writing about. But there remains one big problem: The methodology that NERC entities are using to comply with CIP-002-5.1 R1, and that the regions (and even NERC) are for the most part advocating as proper, is almost completely incompatible with the requirement (and Attachment 1) as written. In other words, assuming nobody makes the entities radically shift how they comply with R1, not a single entity will be in compliance with the literal wording of CIP-002 R1! This obviously isn’t something that is good for CIP, or for NERC for that matter. It means CIP-002-5.1 will never be enforceable, and perhaps this contagion will spread to the other CIP standards as well – since they all depend on the entity correctly complying with CIP-002-5.1 R1.

Of course, there are two ways to change this situation. One would be to beat into everybody’s head – the entities, the regions and NERC itself – what the real R1 compliance methodology is (or to state it better, the methodology that comes closest to following the words of R1 and Attachment 1). I don’t think this is a good idea, because I believe the methodology that people are adopting is far superior to the one that is found in the words of CIP-002. Plus enforcing this would be impossible in any case.

The second way to make the practice and the words compatible is to change the words – i.e., rewrite CIP-002. I have been saying for a while that this is the only possible way to make v5 enforceable, even though it will take several years (or more) for a new version to be approved and enforceable. Despite that fact, simply leaving CIP-002 in place as it is should not be acceptable. In my opinion, it’s likely to lead sooner or later to all of v5 (and v6, and perhaps future versions as well) being unenforceable. If the foundation is rotten, sooner or later the house will fall. CIP-002 is the foundation of CIP v5 and v6, and it can’t be allowed to exist forever in its current state.

This is why I say CIP-002 should be rewritten. In the next three or four posts, I will lay out the problems I see with the current CIP-002. I will conclude with a discussion of what might be done during the three or more years it will take to draft, ballot and have FERC approve the new version – given that CIP v5 and v6 will be effective in some way or other starting next April, whatever the status of CIP-002-5.1 on that date.

I divide the problems in CIP-002 into four types, each of which will have its own section in these posts (but not necessarily its own post); a fifth section discusses what to do while CIP-002 is being rewritten. The sections are:

  1. “Spot” Problems
  2. Problems Caused by Excessive Brevity
  3. The Fundamental Problem with CIP-002
  4. Problems Dependent on Resolution of the Fundamental Problem
  5. What to do while CIP-002 is being Rewritten
I. “Spot” Problems
There are a number of problems that are confined to particular sentences or phrases, as well as a missing definition or two; I’m calling these “spot problems”.  Unlike the remaining problems discussed in this document, these problems can be dealt with through relatively small wording changes.
Before I begin, I want to point out that there is a large class of spot problems that I won’t be discussing here: These are problems with the Attachment 1 criteria.  I’m referring here to technical issues that have come up with application of the criteria in practice[i] – and that can be foreseen to continue to come up as the criteria are further applied to real-world situations.  I see the potential technical issues as almost unlimited in number.
To be honest, I don’t think these technical issues can be addressed through rewriting the criteria; I think the criteria are pretty good already – or at least as good as they could ever be.  The problem is with the whole idea of bright-line criteria being applied to an industry as incredibly diverse as this one in the first place; I don’t think God Himself could write a set of short criteria that wouldn’t lead to an almost infinite number of questions (and I don’t believe God was on the SDT in any case). I first wrote about this problem on a different blog in 2012, but I recreated that post here in 2013.
I think the best solution to the issue of the bright-line criteria is a kind of “Supreme Court of BLC” – a panel of experts from the industry that will consider disputes regarding the criteria and provide guidance.  I also know that the worst solution is to handle all of these disputes through the enforcement process.  This will be very expensive for NERC entities, the regions, and for NERC and FERC.  And to be clear, there will never be an end to BLC issues.  It’s like whack-a-mole: as soon as one issue is dealt with, a couple more pop up.  In my opinion, my court idea would have to be dealt with outside of the standard – based on some consensus among NERC, FERC, the regions and the entities, but not relying on a rewrite of NERC’s Rules of Procedure, which would take forever.
So here are the spot problems that I see in CIP-002-5.1, other than technical ones involving the BLC:
a) Section 4.2: “Facilities” – The title of Section 4.2, which is about the scope of CIP v5, is incomplete. Some of the “big iron” in scope for CIP v5 is indeed Facilities – i.e., lines, buses, transformers, etc. – per the NERC definition. But some of it is “assets”. This is an undefined term that is “defined” for purposes of CIP-002 by the list of six asset types in R1. I think “Facilities” in the title of 4.2 should be replaced with “Facilities and assets”; the reasons for this will be made clearer in c) below. In addition, Section 4.2 refers several times to “Facilities, systems and equipment” (FSE); the same phrase appears in the BES Cyber Asset definition. For the life of me, I can’t figure out why this phrase is here, or what it means[ii]. Is the entity supposed to start out with a list of each Facility, system, or piece of equipment it owns, to see what’s in scope for v5? Do “systems” include desktop PCs used in Accounting? Does “equipment” include monkey wrenches? And since, per the Glossary definition, an asset like a control center or substation can’t be a Facility (and assets obviously aren’t systems or equipment either), are control centers and substations therefore not subject to CIP v5 at all? I think it would be much better to simply replace “Facilities, systems and equipment” with “assets or Facilities”[iii] everywhere it is found.
b) Section 4.2: “owned” – This section also says that the requirements only apply to FSE that are owned by a Responsible Entity. What if an entity does a sale/leaseback deal to raise cash and transfers ownership of the systems in a control center to a finance company? Do the requirements no longer apply to these systems? Obviously, that shouldn’t be the case. But I don’t see how it’s prohibited by the current wording.[iv]
c) Section 4.2.2: “All BES Facilities” – This section – which is of course found in each CIP v5 and v6 standard – states what is in scope for all Responsible Entities except DPs. According to the Glossary definition, no substation or control center is a Facility. And since in the Generation environment a Facility will usually be a single unit, most generating plants won’t be Facilities either. So a literal application of this wording could possibly eliminate 95% or more of the assets in North America from CIP v5 compliance. The solution to this problem is simple: use a lower case “f” so that the general meaning of “facilities” will apply.
d) Section “discrete Electronic Security Perimeters” – The problem with this wording was discussed quite eloquently in the Memorandum on “Network and Externally Accessible Devices” published in April (and rescinded in July). It seems the SDT forgot that there can now be BES Cyber Systems that are only serially connected and therefore don’t have an ESP; but they will still have external communications. NERC recently put out a Lesson Learned that does a good job of addressing this issue; it can form the basis for a rewrite of this section.
e) Definition of “Programmable electronic device” – There has, of course, been a lot of controversy on this, as well as failed attempts to address the issue. Clearly, there will be no agreement until a definition is drafted and balloted. Perhaps there shouldn’t be a definition of PED, but the Cyber Asset definition should be rewritten to eliminate the undefined term. Either way, this needs to be addressed.[v]
f) Attachment 1, Criterion 2.6 – This criterion reads “Generation….and Transmission Facilities..” Taken literally, the fact that this doesn’t say “Generation Facilities…” means that an entire plant is Medium impact, even if only one of its units is designated as critical to IROLs; although with a substation, only particular lines, transformers or other Facilities are in scope. However, in the Guidance the SDT did refer to Generation Facilities; plus NERC took it to mean this in their Memorandum in April. So I think it’s safe to say that this should really read “Generation Facilities and Transmission Facilities”.

For Part II of this series, go here.


[i] There is one issue of wording with the Attachment 1 criteria that I will address in this document.  But all the other issues I know of are technical ones relating to industry structure, engineering considerations, etc. As you will see, I’m saying they really need to be dealt with in a court-like setting.

[ii] Since I originally wrote this sentence – while writing Part III of these posts – I discovered where this phrase came from – a Concept Paper the SDT wrote in 2009 that provided much of the intellectual foundation of CIP v5. On the other hand, I still don’t understand why it is used in CIP v5, since the standards clearly don’t apply to “Facilities, systems and equipment”.

[iii] The other place where “Facilities, systems and equipment” is found is in the BES Cyber Asset definition.  As will be seen in the remaining Parts of this series of posts, I am recommending rewriting that definition in a way that will eliminate FSE, although that isn’t the purpose of the rewriting. But even if the definition weren’t rewritten for other reasons, I would still recommend that FSE be removed, since I think it’s just as useless and misleading as it is in Section 4.2.

[iv] I’m told there are some smaller Control Centers where the operator of the Control Center literally does not own the equipment. So this isn’t a completely academic question.

[v] On September 9, 2015, NERC released a Lesson Learned that discusses this issue (and others related to CIP-002). In that, they frankly admitted that entities need to figure out for themselves what “Programmable” means, and there are all sorts of possible ways of doing this. Of course, this discussion of “Programmable” doesn’t constitute  a definition at all, although I believe this document is as good as could be done; it is now up to each entity to develop their own definition or methodology for identifying Cyber Assets. While this is clearly the only thing that can be done in the short term, since “Programmable” is quite literally the foundation for the entire scoping process of CIP v5 and v6 (and probably for several versions after v6 as well), this situation can’t be allowed to stand in the long run. There needs to be some kind of definition of “Programmable” – or perhaps the definition of Cyber Asset needs to be revised to clarify exactly how an entity identifies “Programmable electronic devices”.

Tuesday, September 8, 2015

Where I (now) Stand on ERC

“Consistency is the hobgoblin of little minds.”
- Emerson

In my recent (third) webinar with EnergySec, I didn’t hide the fact that I have recently made a 180-degree shift in my opinion on the question of the meaning of External Routable Connectivity (ERC). On the same day as the webinar, I elaborated on this in a post, but since that post was more focused on what FERC had said about LERC (don’t you just love these acronyms?), not ERC, I didn’t give a complete exposition of what I currently believe. This post will provide that.

My previous position was stated in the post just cited and in my second webinar with EnergySec. In that webinar, we discussed NERC’s April Memorandum on “Network and Externally Accessible Devices” (which has since been withdrawn, along with the other Memoranda). That document, in the section entitled “Natively serial-based BCAs”, focused on the situation in which there is a serially-connected device, such as a relay in a substation, that communicates with a device – like an RTU or protocol converter – that in some way “translates” a routable communication stream (say, from an EMS) to serial format for transfer to the relay.

NERC’s position on the scenario described was clear: “Nothing in the plain language of the CIP version 5 standards or the record of development indicates that the SDT intended natively serial-based BCAs that have been modified to be externally accessible via a routable network to be treated any differently from natively routable-based devices.” (I would include a hyperlink to the Memorandum here, but the Memoranda have all been removed from NERC’s site. If you need a copy, email me at talrich@deloitte.com)

However, in the webinar and this post, I brought up Morgan King’s presentation from WECC’s January CIPUG meeting in which he stated that some devices perform a “protocol break” – that is, they terminate the routable communications coming from the EMS and initiate a different serial conversation with the relay. In such a case, Morgan stated (and I agreed) that ERC is truly “broken”, so the relay does not have ERC. To illustrate his point, Morgan had pointed to Reference Model 6 in the Guidance and Technical Basis of CIP-003-6, which had diagrammed exactly this case – although the reference was technically to LERC (Low impact ERC), rather than ERC.

Wishing to be as nice to NERC as possible, I stated in the webinar that I believed both NERC and Morgan were right, since they were contemplating different types of devices. However, I suspected that NERC had meant their statement to apply more broadly to any device that takes in a routable communications stream on one end and emits serial on the other, so they probably weren’t contemplating any exceptions to their rule. But I firmly believed that Morgan had gotten it right and there was something called a “protocol break” that would break ERC.

About a month after that webinar, I changed my opinion on ERC. It didn’t happen in a blinding flash of light on the road to Damascus. Rather it happened when I started trying to understand the implications of FERC’s NOPR, and specifically the section entitled “Definition – Low Impact External Routable Connectivity” (paragraphs 68-70). I came to believe that, while FERC’s statement had addressed only LERC, it was impossible not to consider it a statement about ERC as well.

You can read about what I thought in this post, but to briefly summarize: FERC made it very clear they didn’t understand what a “protocol break” was; therefore, they didn’t think it could be invoked as a way to remove ERC. I concluded by saying I believed NERC was working on a Lesson Learned on ERC, and it would be a mistake if NERC repeated Morgan’s argument (and mine) that there is something called a protocol break, and that it “breaks” ERC. While FERC couldn’t force NERC to rescind this opinion (since they’ve already approved the definition of ERC in Order 791), it just wouldn’t be a good idea to fly in FERC’s face on this issue.

However, in the post I should have asked the question whether there are any other ways that ERC can be “broken” by a device (like an RTU) that communicates routably to the outside world, but serially to one or more other devices. There is one way that came up in FERC’s discussion in the NOPR. The wording in Reference Model 6 in CIP-003-6 identifies authentication as another condition that would break LERC (and by implication ERC as well). If the device that translates routable to serial also requires the user on the routable end (e.g., at the control center) to re-authenticate before it will pass their communications on to the serially-connected device, then ERC is broken as well.  FERC didn’t comment on this at all, but they also didn’t rule it out. So I think it’s safe to say they are comfortable with ERC being broken when re-authentication is required.

I can think of another example, which I brought up in the first of four posts last year that discussed the ERC issue. This is of an RTU that is configured just to poll the serial devices and pass the data on to the EMS; there are no inbound communications that are passed on to the serial devices in any form. This seems to be a good example of another way in which an intermediate device can break ERC, when a transition between serial and routable communications is involved. There may be other examples as well.
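
To make the reasoning in this post concrete, here is a minimal Python sketch of how I would now decide whether an intermediate device “breaks” ERC for the serial devices behind it. It reflects only the opinion argued above, not anything NERC or FERC has published, and the class, field and function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntermediateDevice:
    """A device (e.g. an RTU) sitting between the EMS and a relay."""
    converts_routable_to_serial: bool
    requires_reauthentication: bool = False   # user must re-authenticate at the device
    passes_inbound_traffic: bool = True       # False for a poll-only RTU


def breaks_erc(dev: IntermediateDevice) -> bool:
    # This sketch only covers a routable-to-serial transition; any other
    # arrangement is outside the argument and is treated as "not broken".
    if not dev.converts_routable_to_serial:
        return False
    # Re-authentication at the device: FERC did not rule this out.
    if dev.requires_reauthentication:
        return True
    # Poll-only RTU: nothing inbound ever reaches the serial devices.
    if not dev.passes_inbound_traffic:
        return True
    # A "protocol break" alone is deliberately NOT treated as breaking ERC.
    return False
```

For example, `breaks_erc(IntermediateDevice(True))` describes a plain protocol converter and yields `False`, while setting `requires_reauthentication=True` or `passes_inbound_traffic=False` yields `True`.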


Tuesday, September 1, 2015

Reminder – CIP v5 Workshop at EnergySec Summit

As I announced in this post, Matt Light and I will be doing a workshop titled "Exploring the 'Implicit Requirements' in NERC CIP version 5 – What’s not stated as a requirement is just as important as what is" on the first day of EnergySec’s upcoming Security and Compliance Summit in Arlington, VA on Sept. 14.

While we have had a good signup so far, there is still room for more. I hope you’ll consider joining us. I’m certainly discovering a lot of “requirements” that are “hidden” in CIP v5 and I’m looking forward to discussing them, as well as hearing what other people have discovered.

Here’s the full story:

More than was the case with the previous CIP versions, NERC CIP version 5 includes a number of “implicit requirements” – i.e., steps that an entity should take in order to comply with the written requirements; these implicit requirements aren’t themselves explicitly stated in the standards. They occur in many of the CIP version 5 standards, although there is a large concentration in CIP-002-5.1. Complying with them is as important as complying with the “explicit” requirements.

Tom Alrich and Matt Light of Deloitte Advisory will lead the discussion of this issue.  Matt and Tom will present the implicit requirements they have identified so far; workshop attendees are welcome to bring up others they have identified. The workshop is intended to be completely interactive, and the goal is to identify and discuss all of the implicit requirements in CIP v5, as well as how entities should “comply” with them. This will help NERC entities have a full picture of what they actually have to do to comply with CIP version 5.

At the end of the week before the Summit, all workshop registrants will be emailed the preliminary list of implicit requirements, including discussion of each. This list will be revised after the workshop, and may be revised in the future as well; all workshop registrants will receive these updates.

Tom Alrich has been helping NERC entities comply with NERC CIP since 2008, first with Encari LLC and then with Honeywell Process Solutions. Tom is now part of Deloitte Advisory, where he is a Manager in the Cyber Risk Services practice, specializing in Power and Utilities. Tom started attending and writing about the NERC CSO 706 (CIP) Standards Drafting Team meetings in 2010, as CIP versions 4 and 5 were drafted. Since early 2013, he has written a popular blog on developments in implementation and interpretation of CIP version 5. Tom has a Bachelor’s degree in Economics from the University of Chicago and lives in Evanston, Illinois.

Matt Light is a Manager within Deloitte Advisory’s Cyber Risk Services practice. He has over 8 years of experience working with electric power utilities on critical infrastructure protection and cyber risk management, first with the US Department of Energy (DoE) and more recently with NERC. His projects have included development of frameworks for building a cybersecurity program and measuring the maturity of the program relative to industry best practices. He also has considerable experience with collaborative efforts between the US government and industry, focusing on cyber threat information sharing and analysis capabilities.

Matt has a Master of Public Policy degree from Georgetown University and a Bachelor’s degree in Materials Engineering from Rensselaer Polytechnic Institute.

There is a $300 fee for the workshop, which goes entirely to EnergySec - a good cause! I hope you can join us for this. To register for the Summit and the workshop or to get more information, go here.
