Tom Alrich's Blog

Sunday, April 17, 2016

Should CIP v5 and v6 be Rewritten?

I recently wrote a post discussing what is in NERC’s recent Standards Authorization Request (SAR) for the next CIP version (which I certainly hope will be called v7; no more talk of “revisions” to v5 or v6). I said I would soon write a post on what isn’t in the SAR, but perhaps should be. That is, I’ll list changes that could be made to CIP v5 and v6, even though these aren’t called out in the SAR. I hope to have that post out right after this one – and hopefully in advance of NERC’s CIP Technical Conference in Atlanta on April 19, which I am looking forward to attending.

However, I recently realized that, before I do that post, I need to address the question of whether the team drafting the new version should go back to fundamental principles and “rewrite” CIP v5. This might seem like an odd question, but it was something I was advocating until five or six months ago, and I have heard that at least one large NERC entity is currently pushing this very course of action.

Why would CIP v5 need to be rewritten? That’s an easy question for me to answer. It’s because there are two types of problems with CIP v5[i]:

Problems that can be addressed without rethinking any of the fundamental concepts in v5. For example, the term “custom software” in CIP-010-2 R1 isn’t defined, and has caused a lot of confusion for NERC entities. This problem can be fixed by coming up with a definition.
Problems that can’t be addressed other than by opening up the fundamental concepts in v5. For example, the fact that CIP-002-5.1 R1 and Attachment 1 were written simultaneously from two different points of view[ii] – and that these were never reconciled – leads to confusion in a number of areas. One example of this was the big controversy over the far-end relay issue, which was mostly due to the widespread (and mistaken) belief that CIP-002 Attachment 1 classifies assets (i.e. “big iron” – control centers, substations, etc.) as High, Medium or Low impact.[iii]

From everything I have seen so far, including the SAR, the Standards Drafting Team is only being tasked with addressing problems of the first type, not the second type. And I can certainly understand this; going back to debate fundamental concepts like the asset identification and classification process could easily add six to twelve months to the SDT’s work. Since I’m currently estimating that – even without this fundamental debate – it will be at a minimum three years, and more likely four or even five years, before CIP v7 comes into effect, this is no small consideration.

But what is lost by not addressing the fundamental problems? For one, these problems are creating confusion, just like the non-fundamental ones are; getting them resolved will make it much easier for entities to comply with CIP v7 (which will otherwise include the same contradictory wording found in CIP-002-5.1) and for auditors to audit them. This was evident to me at the RF CIP workshop last week in Columbus, Ohio, where there were discussions about some fundamental questions that should have been settled three years ago. They shouldn’t still be causes of confusion now – less than three months before the compliance date.[iv]

But there is a bigger issue here: I have said previously that CIP v5 (and v6) will never be enforceable in the strict sense, unless it is rewritten to address these fundamental issues. And what do I mean by enforceability in the “strict sense”? I mean that, should a violation of CIP v5 be challenged in the civil courts, I simply don’t see how the violation (and its associated fine) could be upheld. At that point, CIP v5 and v6 (and v7, if the SDT doesn’t fundamentally rewrite CIP-002) would turn into nice guidelines to follow, not enforceable standards. What would happen at that point is anybody’s guess.[v]

Up until five or six months ago, I was advocating that CIP-002 be rewritten from scratch. However, some of you may have noticed that I have changed my tune: I now think that the fundamental problem with NERC CIP is that it is a set of prescriptive standards, and prescriptive standards don’t work for cyber security – risk-based standards work. For that reason, the fact that rewriting CIP v5 might make it enforceable no longer excites me, since it will remain a set of prescriptive standards.

However, I recently heard that one or more large NERC entities are advocating for a complete rewrite of CIP v5, presumably to address both the clarity and enforceability problems. I certainly don’t want to discourage them. CIP v5 and v6 will clearly be around for a while, and if there is a will on the part of NERC entities – and the SDT – to try to make these standards both clear and enforceable, I will certainly support that effort.

I also realize that perhaps I have been exaggerating the amount of work that will be required to rewrite CIP-002. The biggest problem with that standard is the fact that CIP-002-5.1 R1 and Attachment 1 are written from two different points of view, and haven’t figured out what they want to be when they grow up. However, as I stated in this post, the NERC entities and regions have come to a remarkably consistent consensus on how to “comply” with this wording; they are just about universally following one of these two points of view, which happens to be pretty much the approach used in CIP versions 1-4.

In this approach, the entity starts with the “big iron” – control centers, substations, etc. – then classifies these as High, Medium or Low impact. Once they have done that, they identify BES Cyber Systems at the High and Medium assets; the BCS take the classification of the asset. They come out with the three things that are required by R1: lists of High and Medium impact BCS and a list of Low impact assets (aka “assets containing Low impact BCS”, in the rather strange circumlocution adopted to try to bridge the unbridgeable gap between the two points of view in R1 and Attachment 1).
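The flow just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Asset` class, its field names and the `build_r1_lists` function are hypothetical, and the impact level is taken as an input the entity has already decided rather than derived from the actual Attachment 1 criteria.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A "big iron" asset such as a control center or substation (hypothetical model)."""
    name: str
    impact: str  # "High", "Medium" or "Low", as already decided by the entity
    bes_cyber_systems: list = field(default_factory=list)


def build_r1_lists(assets):
    """Return the three R1 outputs: High BCS, Medium BCS and Low impact assets."""
    high_bcs, medium_bcs, low_assets = [], [], []
    for asset in assets:
        if asset.impact == "High":
            # BCS take the classification of the asset they are located at.
            high_bcs.extend(asset.bes_cyber_systems)
        elif asset.impact == "Medium":
            medium_bcs.extend(asset.bes_cyber_systems)
        elif asset.impact == "Low":
            # For Low impact, the output is a list of assets, not of BCS.
            low_assets.append(asset.name)
        else:
            raise ValueError(f"unknown impact level: {asset.impact!r}")
    return high_bcs, medium_bcs, low_assets
```

The point of the sketch is the order of operations: the asset is classified first, and the BES Cyber Systems simply inherit that classification.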

So the problem isn’t that entities and auditors don’t understand how to comply with CIP-002-5.1 R1; the problem is that the way they are complying with it doesn’t fit with the words of R1 and Attachment 1 (more specifically, it doesn’t fit with some of those words; it does fit with others). In one sense, the solution is simple: rewrite CIP-002 so that the words fit what everyone is actually doing anyway. This would be one giant step toward making CIP v5 and v6 enforceable in the strict sense. And I don’t think this would take much time.

But I need to throw a caution in here: It is very possible that CIP v5 and v6 will be unenforceable in the strict sense, no matter how much time the SDT spends resolving the fundamental problems in CIP-002-5.1. My reasoning for saying that can be found in these posts: here and here. If the SDT does decide to address these fundamental problems – as I believe they should – they shouldn’t do so with the idea that this will make CIP v5 enforceable in the strict sense; I believe that ship has already sailed.

Note April 18, 2016: It just occurred to me that rewriting CIP would make all the sense in the world if it could be rewritten as a risk-based standard. I have just been assuming that the consensus needed to do that is still years away. However, it is definitely the most logical thing to do: Simply leave v5 and v6 in place as they are, warts and all, and start work on a completely new v7. But I'd say that's the stuff of dreams at this point.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] Note that, when I mention “rewriting” v5, I’m implicitly saying the same thing about v6. Since the fundamental problems are mainly found in CIP-002-5.1 (which is part of v5, of course), that is the only standard that would have to be substantially rewritten. However, there would be further changes required in all of the standards, both the v5 and v6 ones.

[ii] One point of view is that Cyber Assets become BES Cyber Assets if they have an inherent impact on the BES. The other point of view is that they become BCAs only if they impact a critical asset or Facility, which then has an impact on the grid. The latter is more or less how asset identification worked in CIP v1-v4. The former was an idea that came up when the team that drafted v2-v5 was starting work in 2009, embodied in this Concept Paper. Different parts of CIP-002-5.1 R1 and Attachment 1 ended up embodying both these points of view, and they were never reconciled. I discussed this in at least two previous posts: this one (the section titled “Have an Apple, Adam?”) and this one. I’ll admit I’ve never explained myself fully on this issue; that may need to be part of a book, not just a blog post.

[iii] As I said in this post, the problem with CIP-002 R1 and Attachment 1 isn’t that entities and auditors don’t agree on how to classify BES Cyber Systems as H/M/L, but that the asset classification model they all agree on doesn’t correspond to most of the wording in CIP-002. The best way to fix this problem is to rewrite R1 and Attachment 1 so that their wording follows the model that people are actually using (which IMO is quite good, and has the added benefit of being very similar to the model in CIP versions 1-4).

Note 4/21: Kevin Perry, Director of Critical Infrastructure Protection for SPP, takes exception to my reference to "auditors" above. He points out that SPP's message since CIP v5 was approved has always been that the Attachment 1 criteria are for classifying BCS, not assets. I didn't mean to say this wasn't their official position, nor that it wasn't the position of other regions, but that the procedure they advocate that entities follow - first identifying "assets likely to contain High or Medium BCS", then running these through the criteria - amounts to pretty much the same "big iron / little iron" approach as v1-3, and is understood by most entities to be basically the same approach. What I'm advocating is that the wording of R1 and Attachment 1 be changed so that it does actually reflect the v1-3 approach, since I think that one was pretty good and since just about every entity (if not every one) is actually following it anyway.

[iv] This was also evident from the fact that, when attendees were asked to raise their hands if they were ready for the v5 compliance date, only a small percentage did so. This was two weeks after April 1, which of course was supposed to be the compliance date, until less than two months ago. It seems very likely that a large number of NERC entities wouldn’t have been ready on April 1; it remains to be seen how many will truly be “ready” on July 1.

[v] Here are a couple of my guesses: 1) All the work that NERC and entities have done on v5 and v6 gets thrown out, and the industry goes back to v3, the last enforceable set of standards; or 2) Congress is so alarmed by the fact that there are no longer any enforceable cyber standards for the industry that they take responsibility for cyber regulation away from FERC and NERC and give it to some other agency, like DHS or even the military. I would say the second of these is more likely.

Wednesday, April 13, 2016

Can a Distribution Disturbance Alone Cause a Cascading BES Outage?

I freely admit I’m out of my league on this one. In my post yesterday, I stated – based on a conversation with a couple of longtime reliability compliance professionals – that it was close to impossible for an outage on the purely distribution side of the grid to cause a cascading outage on the transmission side.

I continue to believe this is the case, but I did receive an email this morning from an Interested Party who has contributed to many of my posts over the years. He isn’t exactly saying that my statement was wrong, but he is pointing out conditions that might lead to a more widespread, prolonged BES outage than I’d thought possible, assuming there was an initial substantial loss of load on the distribution side. Here is what he says:

“I do not fully buy into the idea that an attack against the distribution system could not impact the BES. Understand that the Ukraine attack was concurrently directed against four distribution companies. There is no reason to believe a similar attack in the US would not target multiple distribution companies at the same time. The Transmission system impact of the attack will depend upon the current operating conditions and the amount of load shed. Even if the resultant impact is only the tripping of some generation, bear in mind that it takes a while to get generation back up after it trips. Fossil steam plants can generally get back up within 18-24 hours of available station services power. Renewable and GT/CT is pretty much instantaneous after allowing for grid synchronizing. Trip a nuke and it is days before the NRC allows the unit to be restarted. Whether fast recovery generation can restore load while the slow recovery units are brought back online will depend on Transmission congestion and total load conditions. So, yes, there are distribution outages all the time, but they are not typically widespread except during a severe weather event that damages the Transmission and distribution infrastructure. Rather, you typically lose a substation and inconvenience a couple thousand people until power can be restored.”

In other words (and these are mine), depending on the type of generation that would go offline during a widespread distribution system outage, there could be a substantial and prolonged impact on the transmission grid. This isn’t the same as a cascading outage, of course, but it does constitute a potential substantial effect on the BES.

Tuesday, April 12, 2016

Two Lessons from the Ukraine Attack

If you haven’t read the excellent SANS/E-ISAC analysis of the Ukraine attack, I recommend you do so. They do a great job of drawing out the appropriate cyber security lessons learned. I can’t add to what they say regarding cyber practices, but I do see two lessons that can be learned regarding cyber regulation for the grid, beyond what I pointed out in this recent post.

Distribution

As I think most people know now, the substations that were attacked in the Ukraine were distribution substations, not transmission. They would therefore not have been subject to CIP. NERC and FERC don’t have any authority over distribution assets; the state Public Utility Commissions do. In fact, even though a few of the states have taken initial steps (including this recent order by the New Jersey BPU) toward cyber regulations for utilities that operate in their state, there is certainly no current national cyber regulation of distribution assets. But the question arises: Should there be such regulation?[i]

I must admit my initial reaction on learning that the Ukraine substations were distribution ones was to assert – rather breathlessly – that distribution substations pose a point of attack for the entire grid (i.e. both transmission and distribution), the implication being that they need to be regulated in something like the same way that CIP regulates transmission.

However, a NERC professional for whom I have a lot of respect organized a call with one of her colleagues to discuss my implication that distribution poses a “soft underbelly” to the transmission grid (also known as the BES). These two people made two important points:

There is no significant cyber integration among distribution and transmission substations, either within a particular utility or certainly on a regional or nationwide basis. I already knew this, since I realized that most substations still have no data communications other than serial, and that is only with their immediate control center. Distribution substations are even more likely to be purely serially connected than transmission ones. Some parties have pushed the idea that there is a huge flat routable network – or even the public Internet itself – that connects a large portion of US grid assets; this is the stuff of fantasy.
Even more importantly, there is no purely electrical means by which a disturbance on the distribution grid would automatically propagate to the transmission grid; this is what I didn’t understand previously. My friends pointed out that outages happen all the time, for lots of reasons. Utilities live and die according to how quickly they can restore power after outages. But a distribution outage is not the same as a cascading transmission outage, since it doesn’t automatically propagate to other areas.[ii]

So my main takeaway from when my friends staged this “intervention” (my word, not theirs) is that the Ukraine attack, even though it did cause widespread temporary outages (restored within a few hours), is qualitatively as well as quantitatively different from the attack everyone fears in North America: an attack on the Bulk Electric System that causes a cascading outage, leading to a blackout of a large area for an extended period of time. And the former won’t ever lead to the latter.

On the other hand, neither I nor my friends believe that the distribution grid should go without any cyber regulation. After all, even the few hours that some 800,000 people in the Ukraine were blacked out had to be tremendously expensive, for the people and for the economy. But it is important to understand that the reason for doing this is different from the reason for regulating BES security.

The Enterprise

As you are well aware, NERC CIP – like other NERC standards – is completely asset-focused. While its purpose is to protect the BES as a whole, it does this entirely through protecting the most important assets that comprise the BES, especially generating stations, transmission substations, and control centers. This is demonstrated by the fact that CIP v5 only applies to cyber assets located at one of six asset types listed in CIP-002-5.1 R1, and that there are no protections that apply to the IT network, which is usually as big as or bigger than the OT network.

For the non-CIP NERC standards, this asset focus isn’t a problem, because it doesn’t leave out very much (if anything at all) that’s important. After all, those standards are all about what happens on the grid; other areas of the company such as Accounting have no impact at all on things like grid stability and resiliency.

This mindset has clearly been applied to NERC CIP as well. That is, the only things that currently matter in CIP are OT cyber assets. I think almost everyone involved with CIP will tell you proudly that the IT network is simply out of scope, and CIP can’t be expected to apply to that. I would have told you the same thing if you’d asked me this question last year.

But should the IT network be out of scope? Look at the Ukraine attack: It all started with well-crafted phishing emails that were opened by people who only had access to the IT network. Their systems were infected with malware, and the attackers used them as a stepping stone to the systems they were really aiming at: the workstations of engineers with OT network access. The attackers weren’t concerned about preserving the IT/OT boundary, so they attacked IT systems first because they knew they had a much better chance of succeeding than if they spent months or years trying to directly access the substation relays, which were the ultimate target.

This is why I believe that IT networks of NERC entities should be in scope for NERC CIP – but not for the prescriptive CIP we all know and (some of us) love. You may have begun to notice that in just about every post nowadays I’m beating the drum of moving CIP to a risk-based format, something like CIP-014: the entity gets an assessment of their risks and vulnerabilities, they develop a plan to address those vulnerabilities on a risk-prioritized basis, and they execute the plan. Were this to be the framework for CIP, I would absolutely argue that the assessment should include all cyber threats and vulnerabilities faced by the entity, not just those that are found only in OT assets. And the prioritization of the elements of the cyber security plan should be based on risks to the entire enterprise, not just those faced strictly by the OT network. As Ukraine showed, the enterprise needs to be protected as a whole. If IT is compromised, OT will inevitably follow.
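To make the enterprise-wide prioritization concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Finding` class, the `risk` score (taken as an output of the entity’s own assessment, with no scoring method implied) and the `prioritize` function are hypothetical and are not drawn from CIP-014 or any other standard.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One item from a risk assessment (hypothetical model)."""
    description: str
    network: str  # "IT" or "OT"
    risk: float   # assumed score from the entity's assessment; higher is worse


def prioritize(findings, ot_only=False):
    """Order findings from highest to lowest risk.

    With ot_only=True, IT findings are dropped before ranking, which mimics
    an OT-only scope; the default ranks the whole enterprise together.
    """
    in_scope = [f for f in findings if not ot_only or f.network == "OT"]
    return sorted(in_scope, key=lambda f: f.risk, reverse=True)
```

With an OT-only scope, a high-risk IT finding such as phishing exposure never appears in the plan at all; ranked across the enterprise, it can come out on top.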

[i] It is important to keep this question separate from the question of whether distribution assets are vulnerable to cyber attack. I am sure utilities are currently providing appropriate cyber protection to most of their distribution substations. But the idea of mandatory national standards is that they would ensure a certain minimum level of protection is achieved for all distribution assets, just as CIP is there to ensure a (higher) minimum level of protection for BES assets. Just to give you an idea of the numbers involved, I know one utility has about 60 Medium impact substations, but they have 1,100 substations that are Low impact transmission substations or purely distribution ones. I don’t know the exact breakdown, but I’m sure a big majority of those are purely distribution.

[ii] I believe that the only mechanism by which a large loss of load could impact the transmission grid would be through the fact that a lot of generation would trip out as a result. But I don’t believe this in itself would then lead to a cascading outage. There could conceivably be a way in which a cyber attack, combined with this large load loss, would lead to a widespread transmission outage, but for the moment I’m only discussing purely electrical events. I know of no real dispute that there needs to be cyber protection for the distribution grid, to protect both against an attack causing loss of load and against the remote possibility of a combined cyber/physical attack – which could conceivably lead to a cascading outage on the BES.

Sunday, April 10, 2016

What’s in the CIP v7 SAR

This is the first of two posts on the draft NERC CIP Standards Authorization Request (SAR) recently posted for comment by NERC. This post discusses what is in the SAR; a subsequent post will discuss what isn’t there. The second post will be longer, since there’s more to say about what isn’t in the SAR than about what is in it.

The SAR starts by pointing out that there are two primary reasons why a new CIP version needs to be developed (or if you prefer, why CIP v5 and v6 need to be “revised”). The easier reason to discuss is the fact that FERC ordered some changes to NERC CIP in their Order 822. The harder reason is that the CIP v5 Transition Advisory Group (V5TAG) – the group that developed the Lessons Learned on v5 – decided there were certain v5 issues that couldn’t be addressed simply by providing guidance to NERC entities; they require revisions to the actual standards. The V5TAG provided a document discussing issues they would like the Standards Drafting Team (SDT) to address.[i]

What FERC Ordered

To address the easier part first, the SAR lists three items that were ordered by FERC:

Protection of transient electronic devices used at low-impact bulk electric system cyber systems,
Protections for communication network components between control centers, and
Refinement of the definition for Low Impact External Routable Connectivity (LERC).

I was disappointed when I saw these listed in the SAR, since – as I reported in this recent post – this list is incomplete. You can read my discussion in my post on Order 822, but I believe the most significant of FERC’s directives was where FERC stated (in paragraph 56) “NERC’s response to the directives in this Final Rule should identify the scope of sensitive bulk electric system data that must be protected and specify how the confidentiality, integrity, and availability of each type of bulk electric system data should be protected while it is being transmitted or at rest.” (Emphasis mine)

In other words, FERC wants protections for both data at rest in control centers and data in motion between control centers; neither of those protections is included in the second point above.[ii] I assume the SDT will in fact correct this problem. I’m sure FERC wasn’t joking when they said they want protections for data being transmitted or at rest (in control centers).[iii]

I discussed all of these items in my post on Order 822, so I won’t repeat that discussion here.

What the V5TAG Requested

Besides what FERC ordered, the SAR tasks the SDT with a number of changes requested by the CIP v5 Transition Advisory Group (aka V5TAG). Here are what I consider to be the main changes requested by the V5TAG:

Cyber Asset definition. Of course, the problem with this definition is that the word “Programmable” is undefined. As I’ve previously stated, I think it would be a waste of time to try to develop a dictionary-style definition of “programmable”. I think a set of use cases – e.g. “a device with any of the following characteristics is programmable…” or even better, “a device that doesn’t have the following characteristics isn’t programmable” – is the right approach here. The V5TAG’s document seems to favor this idea.

BES Cyber Asset definition. The SAR lists three desired modifications to the BCA definition. The first is “Focusing the definition so that it does not subsume all other cyber asset types.” This actually sounds a little scary (“The definition that ate all cyber asset types!”), but what it seems to be aimed at is the idea that, if all devices on the OT network (including EACMS and PCAs) are considered to have adverse impact if misused, then everything on the OT network will be a BCA. Of course, this leads to an infinite regress, where you now need other EACMS to protect the current EACMS/BCAs, and you need further EACMS to protect those, etc. I think one way to avoid this might be to distinguish between adverse impact caused by the misuse of the device itself and adverse impact caused by the fact that the device is networked with other devices that definitely are BCAs. The device in the former case would be a BCA, while in the latter case it wouldn’t. I believe this would keep both PCAs and EACMS from automatically being BCAs.

The lower bound. The second item related to the BCA definition is “Considering if there is a lower bound to the term ‘adverse’ in ‘adverse impact’.” The TAG continues: “For example, is the focus of a typical generating unit the servers and operator human machine interfaces (HMI) and controller cabinets and Programmable Logic Controllers (PLCs) or is it the thousands of individual sensors and transmitters throughout the plant?” They’re clearly saying they would like there to be a clear line that would keep all those sensors and transmitters from becoming BCAs. All I can say on this one is, Good Luck!

This might seem like a question that has a straightforward answer, but suppose we come up with a technical boundary line where “impact” kicks in. For example, impact could be defined as “causing a variation in frequency of .001%” or something like that. How would an entity ever be able to establish how much impact the loss of a particular Cyber Asset would cause? And how would an auditor disprove an entity’s assertion that a Cyber Asset’s impact was below the threshold? Even if there were a straightforward way to do this, it seems to me it would lead to a nightmare situation, in which a test would have to be done at a certain point on the grid before and after any device was installed or removed from service. But how could the tester be sure that an electrical change wasn’t caused by some completely different event that happened in another part of the grid at exactly the same time? This request strikes me as impossible to fulfill. I honestly don’t see any alternative to simply having this be a matter of judgment, on both the entity’s and auditor’s parts, as is the case now with v5.[iv]

Double impact. The third item related to the BCA definition is “Clarifying the double impact criteria (cyber asset affects a facility and that facility affects the reliable operation of the BES) such that ‘N-1 contingency’ is not a valid methodology that can eliminate an entire site and all of its Cyber Assets from scope.” I actually think this contains two issues.

The first issue is the fact that the BCA definition talks about the impact that the loss or misuse of a Cyber Asset would have on the BES, when in fact – in my opinion – it is almost impossible to conceive of a Cyber Asset impacting the BES by itself; rather, that impact occurs because it first impacts an asset or Facility, and that asset or Facility impacts the BES.[v] For example, the loss of the DCS in a unit of a generating station will certainly have an impact on the BES, but only if it actually controls that unit; if the unit happens to be offline anyway, the loss of the DCS wouldn’t impact the BES. This relates to what I have referred to both as the Fundamental Problem and the Original Sin of CIP v5: the fact that a lot of CIP-002 is written under the assumption that Cyber Assets can directly impact the BES, while other parts are written under the assumption that they only have an impact through the actual assets or Facilities.[vi] I definitely support making it clear (in the BCA definition) that a Cyber Asset only impacts the BES if it impacts an asset or Facility, and the latter impacts the BES.

The second issue is whether “N-1 contingency” is a valid “excuse” for removing an asset or Facility from scope in CIP v5. I think the answer to that is definitely no. In fact, I’m surprised it’s still considered a legitimate question, since I thought it had been laid to rest – with a stake driven through its heart – in the CIP v1-v3 days. An asset’s loss may not impact the grid if there’s another asset that will “fill in” for it if it suddenly goes down. But in the event of a widespread cyber attack that brings down a number of assets (e.g. generating plants) or Facilities (e.g. lines) at once, this reasoning goes out the window. So I think there needs to be some phrase added to the BCA definition to the effect of “N-1 contingency is not a valid reason for determining there is no BES impact”.
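The two issues above can be put together in a short sketch. The class and field names here are hypothetical, and the two boolean judgments (`impacts_facility`, `impacts_bes`) are assumed to come from the entity’s own analysis; the sketch only shows how they combine, and that N-1 coverage plays no part in the result.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    """An asset or Facility, e.g. a generating unit or a line (hypothetical model)."""
    name: str
    impacts_bes: bool            # would losing this Facility affect the BES?
    covered_by_n1: bool = False  # would another asset "fill in" if it went down?


@dataclass
class CyberAsset:
    name: str
    facility: Facility
    impacts_facility: bool  # would loss or misuse affect the Facility?


def has_bes_impact(cyber_asset):
    """Double impact test: Cyber Asset -> Facility, and Facility -> BES.

    covered_by_n1 is deliberately not consulted, since N-1 contingency is
    not treated as a valid reason for finding no BES impact.
    """
    return cyber_asset.impacts_facility and cyber_asset.facility.impacts_bes
```

Taking the DCS example: a DCS controlling an online unit passes both tests, even if that unit is covered by N-1, while the same DCS on a unit that is offline anyway fails the second one.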

Section 4.2.3.2. We now get to a new section entitled “Network and Externally Accessible Devices”. The first item in this section refers to Section 4.2.3.2, which is found in all of the CIP v5 and v6 standards. The issue here is that this section exempts from CIP v5 “Cyber Assets associated with communication networks and data communication links between discrete Electronic Security Perimeters.” This provision is actually meant to exempt communication links between assets like control centers and substations; there is a very similar provision in CIP v3. However, in drafting CIP v5 the SDT forgot that – unlike in v3 – there can now be BES Cyber Systems in an asset where there is no routable network and therefore no ESP. A strict reading of this exemption would lead to the communications links involving one or more such assets not being exempt from CIP v5, which was clearly never the original intent of the SDT.[vii] The V5TAG has already developed a good Lesson Learned on this problem, but it now needs to be incorporated into the standards themselves.

TO / TOP Issue. For anybody affected by this, I know it’s a very big deal, and the TAG spends a lot of time discussing it. But since I don’t have any special knowledge of this issue, I won’t comment.

Virtualization. I don’t think anyone in the NERC community disputes the idea that CIP needs to take specific account of virtualization. There wasn’t any account taken in CIP versions 1-3 either, but the effect of that was that most entities didn’t even try to use any form of virtualization within the ESP. However, virtualization has become so prevalent now that many entities are using it in their CIP v5 ESPs, notwithstanding the fact that it’s no more “permitted” in v5 than it was in v3.

The only way I differ from what the TAG says on this topic is in my idea of what will be required to make CIP “virtualization-friendly”. They just suggest “revisions to CIP-005 and the definitions of Cyber Asset and Electronic Access Point”. Of course, all of these are necessary, but I believe that a lot of the other requirements will need to be updated as well. To give one example, TRE pointed out in a webinar last year that the hypervisor and its VMs all need to be within the same PSP. This should be stated in CIP-006.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] NERC has also stated that existing Requests for Interpretation (RFIs) will be incorporated into the SDT’s “agenda”, but this isn’t mentioned in the SAR.

[ii] I brought this up at the NERC CIPC meeting in March and was told by Tobias Whitney and Scott Mix that it was a mistake to leave this item off the list. However, it seems the SAR was being finalized that very day, and it wasn’t corrected. I did notice that the agenda for the NERC Technical Conference in Atlanta in two weeks seems to include data in motion between control centers.

[iii] I’m told that protecting data in motion between control centers with encryption won’t be too hard. I’m also told that trying to encrypt data at rest in a control center will be problematic, since this might hinder response to a crisis. However, there are other ways to provide special protection to data at rest; the SDT will have to determine what is most appropriate.

I also just noticed that FERC says (in the quote above) that NERC should specify how the CIA of each type of BES data should be protected in motion or at rest. This is actually a bigger mandate than I’ve been interpreting it to be. It will be interesting to see how the SDT addresses this.

[iv] Of course, when you have to rely on the auditor’s judgment in something as fundamental as the BCA definition, you’re conceding the fact that CIP v5 and v6 can never really be enforceable in the strict sense, as I’ve been saying for a while. I got over this problem by becoming convinced that it’s time to move to a different approach for NERC CIP altogether; if you meditate on that idea three times a day, you may get over the problem, too. You’ll sleep much better at night!

[v] I discussed this in a post in April 2015, where I pointed out that a Cyber Asset’s “impact on the BES” needs to be broken into two parts. The first is its impact on an asset or Facility; the second is that asset or Facility’s impact on the BES. In order to know whether the Cyber Asset is a BCA, you need to ask whether there is an impact in both cases. If so, then it is a BCA; if one or both of the answers is no, then it isn’t a BCA.

[vi] I remember raising this issue a couple times with the v5 SDT in 2011. In fact, I asked if anyone could give me an example of a Cyber Asset that had a direct impact on the BES, independently of any asset or Facility. The only example that was brought up was that of a leak detector, which sits directly on a line and, if tampered with, might not identify a leak that could itself have an effect on the BES. However, as discussed in this recent post, the leak detector wouldn’t even be in scope as a BES Cyber System in CIP v5, since it’s not located at one of the six asset types in CIP-002 R1. And even if it were so located, I think it’s a case of the tail wagging the dog if we pretend that the leak detector is actually the paradigm case for any Cyber Asset that impacts the BES; it is much more the exception than the rule.

[vii] Of course, as mentioned above, FERC now wants protection of cyber assets on communications networks between control centers – although they’re not asking for it for communications with substations.

Thursday, April 7, 2016

NERC CIP and the Ukraine Attack

Ted Guttierez of SANS wrote a good blog post on March 24 entitled “Ukrainian Grid Attack: How NERC CIP-like Measures Might Have Helped”. The post takes a very sensible approach. It doesn’t ask if CIP could have “prevented” the attack (which of course is a nonsensical question), but it does ask whether having measures in place like those found in CIP v5 would have lessened the risk of the attack. Since I agree with everything Ted says, I won’t repeat his arguments here.

Ted’s conclusion is that, if the Ukrainian utilities that were subject to the attacks had been taking measures similar to those required by NERC CIP, the likelihood of the attacks succeeding would have been much less.[i] The important conclusion he draws from this is that NERC CIP, for all its problems, is actually increasing the cyber security of the North American Bulk Electric System.[ii]

I completely agree with this conclusion, which tracks what I’ve said previously. However, I no longer think the most important question is whether CIP improves security; there is no doubt that it does. The more important questions relate to the cost:

Are the benefits conferred by NERC CIP commensurate with the costs of compliance, when looked at on a North America-wide basis?
Are the large sums that are being spent on NERC CIP compliance resulting in diminished cyber security spending in other areas that might benefit grid security even more?
Is there another approach to cyber regulation of the grid that would yield greater cyber security while costing no more than NERC CIP does currently?

The answers to the first two questions, in my opinion, are respectively No and Yes. For an explanation of why I say that, see this post (specifically, the section that is titled “Second Consideration..”). In that post, I “answered” the third question by pointing to my presentation at Digital Bond’s S4 conference in January, which unfortunately is still only available by emailing me (talrich@deloitte.com), since the videos of the presentations haven’t been made available yet.

But I won’t keep you in suspense about what I’m advocating as the answer to the third question. I think a risk-based approach is the only one that makes sense for cyber regulation. CIP is a set of prescriptive standards because all NERC standards are prescriptive; but I think our experience with CIP so far shows that a prescriptive approach is the wrong one for cyber security standards. Two examples of risk-based security standards that can serve as partial models for what I’m advocating are CIP-014 and the new cyber security regulations for New Jersey utilities published by the Board of Public Utilities.

[i] He does note at the outset that the Ukrainian attack was against the electric distribution system. Since CIP applies to the Bulk Electric System (i.e. the transmission grid), this means that technically CIP wouldn’t have prevented the attack. So he essentially rephrases his original question to assume that it applies to an “in-scope (for CIP) control center controlling in-scope substations”.

[ii] He also concludes that it would be good for owners of North American distribution assets to implement measures similar to CIP. While I think there should be some sort of standard cyber security regulations for the distribution sector across the US (and the New Jersey regulations linked at the end of this post wouldn’t be a bad model for the other states), I don’t think that just focusing on assets – whether distribution or BES (as in CIP) – is the right approach. You have to look at the whole enterprise. In Ukraine, the initial attacks came from phishing emails sent to people on the administrative side. They clicked on the attachments, and the attackers used their desktops to attack workstations of people who had access to the OT networks. True, the OT assets – especially substation networks – would have been better protected if measures like those in CIP v5 had been in place. But anti-phishing measures on the IT network would also have made an attack much less likely to succeed, and IT networks don’t count as “assets” for CIP.

Wednesday, April 6, 2016

What Products are Compliant with NERC CIP?

I have heard there are a few vendors that say their products are “NERC CIP Certified”. Of course, I find this assertion to be amusing, since there are obviously no product certifications issued by NERC or any party authorized by them. I have heard more often that vendors refer to their products as “NERC CIP Compliant”. This is equally mistaken, but requires a little more discussion.

To be succinct, there is no way a product can be compliant or not with NERC CIP. If you think about the requirements in CIP v5 that have anything to do with cyber assets directly, they are mainly the requirements in CIP-007, but also those in CIP-005, CIP-009, CIP-010 and (perhaps) CIP-011. None of these requirements mandate that products should have particular features. Rather, they all say that the entity should do something involving their BES Cyber Systems – restrict ports and services, protect against malware, implement logging and alerting, control access, etc. None of these are product features.

But can it be said that some vendors’ products will make it easier to comply with a particular requirement than other vendors’ products, because of the features they offer? Strictly speaking, no. The CIP requirements all take care not to be seen as mandating that the entity needs to buy products with particular features, or replace products that don’t have those features. For example, if a particular cyber asset can’t do logging, CIP-007 R4 doesn’t require the entity to replace it with one that does; the words “per device capability” in R4.1 (as well as in other requirements) make that clear.

However, there is one way in which some requirements do nudge entities toward purchasing products with certain desired features, and that is Technical Feasibility Exceptions. While TFEs are specifically designed to allow entities to continue using products that don’t have particular security features, the fact that a product would require one or more TFEs (which require a lot of work) should be enough to make an entity think twice about purchasing it – and maybe encourage them to move it up on their list of devices to be replaced, if they already own it.

For example, CIP-007 R5.1 says that BCS (and PCAs, etc) should have “a method to enforce authentication of interactive user access, where technically feasible.” This means you can still keep using that 20-year-old RTU that doesn’t allow for authentication, but you’ll pay the price for this by having to go through the trouble of submitting and maintaining TFEs. So if a vendor wants to say that their product will help you comply with CIP-007 R5.1 because it supports authentication, I guess I won’t object to that (although I’m not sure what products don’t support authentication nowadays).

But this is a long way from saying a product is “CIP compliant”.

Friday, April 1, 2016

April 1, 2016: Everything’s Peachy!

Note: I decided a while ago I would prepare a post to appear on April 1, discussing whether entities felt they had successfully come into compliance with CIP v5 by that date. This included interviewing compliance people at a few NERC entities, which I did in late January and early February. Of course, the compliance date was subsequently moved back to July 1, so in fact this post could have appeared on that date. But I’ve decided to still run it today, since – as you can read below – I was very surprised to discover that NERC entities are almost universally in complete compliance with CIP v5. I would almost say now that there was no need to move the date back as I advocated, but I’m sure there are still a few entities that will appreciate having the extra three months to tidy up a few minor issues.

As the sun rises today, April 1, 2016, I am pleased to announce I have determined that CIP version 5 has been a huge success. As far as I can see, NERC entities feel they have come into full compliance with the CIP v5 standards, and they are confident that they clearly understand their obligations for maintaining compliance going forward. My congratulations go out to NERC for the outstanding job they have done in rolling out CIP v5.

To judge how well US utilities had come into compliance, I decided to interview three electric utilities – an IOU, a cooperative and a municipal. Let’s start with the IOU. It is admittedly a small IOU, but this choice was deliberate. I had anticipated that the small utilities might have the hardest time coming into compliance, since they don’t necessarily have the resources to devote to resolving what I perceived (wrongly, as it turns out) to be the many ambiguities and contradictions in the wording of the requirements of CIP v5.

For the IOU representative, I chose an old friend: Fred Anderson, CEO and Director of Compliance for Anderson’s Electric Utility and Screen Door Repair in West Overshoe, Nebraska. When I interviewed him in late January, Fred was quite upbeat. “Believe it or not, I think we have this thing beat, Tom. Not only do I believe we’ll be 100% compliant on April 1, I also think we know almost to a certainty what we will need to do to maintain that state of compliance indefinitely into the future. Not only do we have all the processes and procedures in place to maintain compliance, but all our SMEs have been trained and re-trained on what to do. They all know exactly what their requirements mean, and they have everything they need to do for compliance built into their schedules for the next two years.”

I asked if perhaps AEU had encountered any challenges related to understanding what the v5 requirements meant or how to comply with them. He grinned, “No, we really didn’t. I know you’ve been writing day and night for close to three years about all of the problems in the CIP v5 wording, but we really haven’t come across any significant issues in that regard. And if we did, we found that NERC had almost always already come up with a Lesson Learned that specifically addressed this issue. It was simply uncanny, the way they seem to have thought through all the possible issues with the wording of v5 years ago, and made sure that all of the guidance we needed was in place long before we even realized we might need it.”

For the coop representative, I interviewed another old friend: Janet Millman, Manager Reliability Standards Compliance for Southern North Dakota Energy Cooperative in God’s Little Acre, ND. We talked in early February, and she was just as upbeat as Fred. “You really blew this one, Tom. I don’t know where you got this idea that there was so much uncertainty about the meaning of the CIP v5 requirements. I will admit that you do have to spend some real quality time with the requirements – it’s not like the meaning jumps out at you the first time you read them. But once you do that, I don’t see how you could say there’s any uncertainty at all. Other than for a few minor wording problems, the meaning of all the CIP v5 requirements is right there in the wording.”[i]

Finally, for the municipal, I chose Greater Eggwhite Power & Lighting in Eggwhite, NC. Once again, I have an old friend there – Jim Halfinch, Director of Reliability Compliance and Janitorial Services. Jim wasn’t quite as upbeat as were Janet and Fred. “I’ll be honest with you, Tom. We did have a few problems at the outset. One of the biggest was the word ‘Programmable’ in the definition of Cyber Asset. We thought the first draft Lesson Learned that NERC came out with on this subject in early 2015 was quite good, and we started identifying our Cyber Assets using that definition.

“Then NERC turned around in April of last year and came up with a completely different definition in one of the Memoranda, and they withdrew the draft Lesson Learned. Not only did they change definitions, but they said this one was ‘mandatory’. That really threw us for a loop, since we’d already invested a lot of effort in identifying Cyber Assets based on the old definition; it now looked like all that effort was wasted.

“However, in early July NERC withdrew all the Memoranda and said they’d go back to using Lessons Learned. But they didn’t do a Lesson Learned on this topic, and then they said last December they were going to refer this matter to the Standards Drafting Team for inclusion in CIP v7. Since v7 won’t be in force for three or four years at least, this means we had to come up with our own definition. We looked for the original draft Lesson Learned but found it had been removed from NERC’s web site, along with other Lessons Learned (as well as the Memoranda) that were abandoned for one reason or another.

“Fortunately, we discussed this question with some of our peers, and referred back to your post on this topic from 2014[ii], and we’re quite happy with what we came out with. We have used the same approach in a few other areas where we’ve had issues. So I really can’t complain; we are now compliant and I’d say we know exactly what we need to do going forward.”

I was pleased to hear Jim say this, since I’ve found quite a few utilities with Medium or High impact assets for whom the lack of a clear definition of Cyber Asset (and other missing definitions, ambiguities and contradictions) has been a huge stumbling block and has delayed their compliance efforts substantially. So I asked him, “What made it so much easier for you than for the other utilities with Medium and High impact assets?” He replied, “Oh, we don’t have any of those. We’re all Low impact.”

I was quite surprised when he said this. I was about to point out to him that he didn’t need to do all of this work if he just had Low impact assets, since there is no need to identify BES Cyber Systems (and therefore no need to identify Cyber Assets or BES Cyber Assets) for Low assets. However, I decided just to thank him for his time and end the conversation. I realize that, if I’d brought this up, he would have argued with me, pointing out that a Low asset is “defined” in CIP-002-5.1 R1 as an “asset containing Low impact BES Cyber Systems”. How (he would have asked) could you possibly identify Low assets without first identifying Low BCS? I would then have had to explain to him that this is just one of those “Between Us Girls” areas where there’s an implicit understanding between NERC and the entities – in this case, it is the understanding that “asset containing Low impact BCS” actually means “Low impact asset”, although that phrase is strictly verboten in the prevailing orthodox interpretation of CIP v5.

I would further have had to explain that the wording of Attachment 1 of CIP-002-5.1 shouldn’t really be taken literally, since it’s actually a relic of the first draft of CIP v5, in which entities were implicitly required to identify all BCS – regardless of impact level – before they even started classifying them High, Medium or Low impact. But that would have just ruined Jim’s day and probably made him hate me forever. I’ve already done that with enough friends in the industry.

So there you have it. Other than for a few entities – like GEP&L – who have perhaps done more work than they had to, I’d say that CIP v5 has been a resounding success. I certainly hope the rollouts of v6, v7, the new CIP Supply Chain standard(s), and other new CIP versions will be equally successful.

[i] I must admit that I found this implicit criticism of me a little unfair. I have spent an enormous amount of quality time with the CIP v5 standards. Unlike Janet, I have found many problems with the wording that I haven’t been able to resolve. But I’ll admit that, just because a requirement is worded badly enough that there can be no consistent interpretation of it, this doesn’t mean people won’t come up with their own agreed-upon interpretation. And I’m sure that if a CIP v5 fine ever gets appealed to the civil courts, any judge will be quite capable of overlooking the fact that the wording is contradictory and there are missing definitions, and still uphold the fine. After all, in the legal world, what really matters in the end is whether everybody had good intentions at the outset, not any particular words they may have written in a regulation or a contract. That’s why there are seldom any serious disagreements about what laws, regulations, or contracts mean. Of course, this is a good thing, since otherwise there might be lots of litigation tying up the court system and wasting valuable resources.

[ii] I told him he wasn’t alone in looking at that post. It has had 892 hits as of yesterday.