Tom Alrich's Blog: July 2016

Tuesday, July 26, 2016

Stop Making Sense!

Many of you may know this was the title of a Talking Heads concert film. I thought of it in relation to a conversation I had recently with a CIP compliance professional.

The conversation was about the fact, which I have discussed previously, that NERC CIP is almost indisputably hindering deployment of important technologies on the OT networks of electric utilities. In particular, the subject was the cloud. I pointed out to this person that the literal wording of CIP-004 pretty much precludes using cloud services within an ESP (e.g. SCADA in the cloud).

This person was quite surprised by that statement, and was sure I was wrong. Being an IT person, he had deployed other applications in the cloud with no problem. He pointed out that security shouldn’t be an issue if the cloud vendor could provide an SSAE 16 report attesting to their security controls. He said it just didn’t make sense that the only area where such a report wouldn’t carry any weight would be NERC CIP.

Stroking my chin in a wise fashion (which he didn’t see since he was on the other end of a phone line), I said, “Unfortunately, CIP compliance is based entirely on compliance with the literal wording of the requirements, not on what makes sense.” And this is true! Given the prescriptive nature of the CIP standards (indeed of all NERC standards, although I think a prescriptive format probably makes sense for the other standards like COM and TOP), there is simply no way that an SSAE 16 will overcome the fact that no cloud provider will be willing to comply with the access control requirements of CIP-004. Were CIP-004 to be modified so that an SSAE 16 could be taken as an alternative compliance methodology for those requirements, then that would be one way of dealing with the problem; of course, if someone were to write a SAR for this today, it would still be 3-4 years before that change came into effect.

However, as readers of this blog are hopefully beginning to realize, I see a whole host of problems flowing from the fact that the NERC CIP standards are prescriptive[i]. I am not at all in favor of making any further modifications to the current CIP standards, other than the ongoing effort to draft CIP v7 (which I am trying to assist with, time permitting). I think the next version needs to be a non-prescriptive one, since that is the only type of standard that is sustainable in the long run (in fact, even in the not-so-long run). Were the CIP standards to become non-prescriptive tomorrow, a lot of benefits would immediately be realized. But if we keep the current format, I strongly believe the whole current edifice of CIP will collapse of its own weight in 3-5 years. The tangible and intangible costs of the current prescriptive format are already too high, and will only continue to grow by leaps and bounds, especially as new areas like supply chain security are covered.

Were a set of non-prescriptive standards to be drafted, there would then be a requirement that read something like "For any providers of outsourced services that have access to BES Cyber System Information, take steps to ensure that appropriate security is applied". It would be up to the entity to demonstrate to the auditor's satisfaction that the cloud provider was secure, using SSAE 16 or some other method (and there would be guidelines associated with the requirement, providing suggestions of what might be acceptable evidence).

Of course, I haven’t so far said exactly what form these non-prescriptive CIP standards should take, because I am still trying to figure that part out.[ii] But I really do need to get moving on that, since there is now an urgent need for it. As I will describe in a new post shortly, the new supply chain security standard will almost certainly only be workable if it is non-prescriptive. And as discussed in my last post, NERC effectively only has about six months to draft that standard.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] With two exceptions, as discussed in this post: CIP-007-6 R3 and CIP-010-2 R4. In addition, as I discussed in this post, the new requirement part for electronic access control for Low impact assets in CIP-003-7 R2 has been initially drafted by the v7 SDT as a non-prescriptive requirement.

[ii] I am also planning on writing a book on this topic with two co-authors. However, when I have a good idea of what form the standards should take I will post it in this blog. I won't wait for the book - which definitely is at least a year away from publication.

Friday, July 22, 2016

FERC Orders a Supply Chain Security Standard

Yesterday, FERC released Order 829, a Final Rule ordering that NERC develop a CIP standard for supply chain security, and deliver it to FERC within a year of the effective date of the Final Rule (roughly two months from now). To be honest, I was surprised they did this. I hadn’t attended the Technical Conference on supply chain in January, but I have been reading the transcript lately. Since the comments from NERC entities were so overwhelmingly against a mandatory standard, and since the Commissioners who spoke didn’t seem that committed to such a standard either, I thought FERC might just order that NERC draw up voluntary guidelines or something like that.

Instead, FERC’s document was quite clear: Given the serious nature of the supply chain threats, voluntary guidelines aren’t enough. And the threats are so imminent that NERC only has a year to develop the standard and submit it to FERC. Folks, this is lightning speed, considering that:

NERC has to constitute a Standards Drafting Team to address FERC’s request. Say that takes two months.
The team has to hold at least a few monthly face-to-face meetings, plus probably multiple conference calls in each of the other weeks.
The first draft will need to be put out for comment and ballot no more than probably four months after the team first meets. In other words, the SDT has to decide on the form and content of the standard by this time. That is six months from now, assuming the team is constituted and meets within two months.
The first draft will almost certainly be voted down; the SDT will have to meet two or three times and put out a second draft.
If this is voted down again, they’ll have to repeat the process.
At this point, there will probably be no time for further revisions; NERC will have to approve the standard and send it to FERC.
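For anyone who wants to check my arithmetic, the steps above can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the rough estimates in this post (two months until the effective date, two months to form the team, four months to a first draft); the constant and function names are my own, and none of these figures are official NERC or FERC dates.

```python
# Rough timeline arithmetic, in months counted from the date of this post.
# Every figure is an estimate taken from the list above, not an official date.
MONTHS_UNTIL_EFFECTIVE_DATE = 2   # Final Rule takes effect in roughly two months
MONTHS_ALLOWED_BY_FERC = 12       # the one-year clock starts at the effective date
MONTHS_TO_FORM_SDT = 2            # "Say that takes two months"
MONTHS_TO_FIRST_DRAFT = 4         # counted from the team's first meeting


def timeline():
    """Return the key milestones as months from now."""
    ferc_deadline = MONTHS_UNTIL_EFFECTIVE_DATE + MONTHS_ALLOWED_BY_FERC
    first_draft_posted = MONTHS_TO_FORM_SDT + MONTHS_TO_FIRST_DRAFT
    return {
        "first_draft_posted": first_draft_posted,
        "ferc_deadline": ferc_deadline,
        # Everything after the first draft (ballots, redrafts, NERC approval)
        # has to fit into whatever is left.
        "months_left_after_first_draft": ferc_deadline - first_draft_posted,
    }


if __name__ == "__main__":
    for label, months in timeline().items():
        print(f"{label}: {months}")
```

Under these assumptions, the first draft is posted at month six and the filing is due at month fourteen, which leaves about eight months for every ballot, every redraft and NERC's own approval.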

When I started this post yesterday, I was going to say something like “Yeah, the timeline is tough. But NERC will be able to do it. They developed CIP-014 in 90 days!” But on further thought, I think this deadline is a big mistake, and I hope NERC will petition FERC to reconsider it. We are talking about a huge (or “yuge”) undertaking here – a standard that is not only brand new, but may well have no precedent anywhere in the world. And yet FERC is essentially telling NERC to gather input and develop the standard entirely from scratch in six months!

Of course, the problem with this isn’t that NERC won’t deliver the standard to FERC by their deadline of one year. Rather, the problem is that they may deliver something that isn’t very good. And then FERC faces a tough choice:

They can hold their nose and approve the standard outright.
They can approve the standard but order changes to make it better (which has happened with every NERC CIP version except v3 and v4). In this case, the standards development process starts all over again, and it will probably be another year or more before FERC has the revised standard on their desk.
They can remand the standard. This will also send NERC back to the standards drawing board. It will also mean that there will be no supply chain standard in effect while the new version is being drafted.

Commissioner LaFleur issued a strong dissent to FERC’s decision (her comments were included in the PDF of the Order itself, as well as a separate PDF now on the main page of FERC’s web site). Her main points were:

In July 2015, FERC issued a “NOPR” for a supply chain standard within their larger NOPR which said they were going to approve CIP v6. Commissioner LaFleur points out that this was very perfunctory, and it would have been more appropriate to devote a separate NOPR just to this issue.[ii] Her main concern is that the NOPR didn’t make any suggestions about what the ultimate standard would look like, so the comments that were received (and also the presentations in the Technical Conference, in my opinion) were much more on the question of whether or not there should be a standard at all, not on what it should contain.
Given that there has been no discussion about the form of the standard, she found it surprising that FERC was proceeding to issue a Final Rule that requires the new standard to address four specific objectives: software integrity and authenticity, vendor remote access, information system planning, and vendor risk management and procurement controls. The Commissioner points out that there was no mention of these objectives in the NOPR.[iii] Therefore, “no party has yet had an opportunity to comment on those objectives or consider how they could be translated into an effective and enforceable standard.”
Commissioner LaFleur continues “NERC, industry, and other stakeholders will have no meaningful opportunity before initiating their work to provide feedback on the contents of the rule, to seek clarification from the Commission, or to propose revisions to the rule.” In other words, she is saying there is a big outreach step that is necessary before standards can be drafted; this would normally occur as part of the NOPR comment period, but FERC precluded that by publishing a NOPR that lacked any suggestion of what the new standard might look like. And FERC’s one-year deadline doesn’t allow NERC to do this outreach on their own.
The Final Rule frequently mentions flexibility; the writers were obviously proud that they had not prescribed any particular content for the new standard, other than that it has to at least address the four objectives FERC listed. However, Commissioner LaFleur isn’t impressed with this. She says “I believe that the Commission is essentially giving the standards development team a homework assignment without adequately explaining what it expects them to hand in.”
She continues “...given the inadequate process to date, I fear that the flexibility is in fact a lack of guidance and will therefore be a double-edged sword. The Commission is issuing a general directive in the Final Rule, in the hope that the standards team will do what the Commission clearly could not do: translate general supply chain concerns into a clear, auditable, and enforceable standard…” She goes on to say “While the Commission need not be prescriptive in its standards directives, the Commission’s order assumes that the standards development team will be able to take the ‘objectives’ of the Final Rule and translate them into a standard that the Commission will ultimately find acceptable.”
This all might not be terrible if, in the event FERC found NERC’s initial take on the standard unacceptable, it had the option of simply revising it. However, as the Commissioner points out, the only option open to FERC under Section 215 of the Federal Power Act (the statute that governs FERC’s relationship with the ERO; Section 215 was added by the Energy Policy Act of 2005) is to order NERC to re-start the standards development process and address whatever concerns FERC expresses. As I mentioned above, this will most likely add six months to a year to the process of developing the new standard.

In her conclusion, Commissioner LaFleur says “Ultimately, an effective, auditable, and enforceable standard on supply chain management will require thoughtful consideration of the complex challenges of addressing cybersecurity threats posed through the supply chain within the structure of the FERC/NERC reliability process.” She wanted the Commission to delay the Final Rule and perhaps issue a Supplemental NOPR; however, it is clearly too late for this. I think it is up to NERC to push back and now request 18 months to two years[iv] to develop the standard, with the first 3-6 months being devoted to gathering feedback from the NERC membership on how this standard might be structured.

I was going to discuss the content of FERC’s Order, but I will leave that for a subsequent post. It’s been a long day.

[i] I didn't have the link when I put this post up originally. It is now there.

[ii] In my post on the NOPR, I noted “I believe a separate NOPR would have been better, given that this issue is so divorced from the others discussed in this document.”

[iii] In fact, she states that FERC should have issued a Notice of Inquiry before the NOPR, so they could gather comments on what a supply chain standard might look like.

[iv] Commissioner LaFleur noted that NERC, in their comments on the NOPR, requested two years for the standards development process, if FERC decided a mandatory standard was necessary.

Tuesday, July 19, 2016

Son of LERC

My last post was about the CIP version 7 Standards Drafting Team’s discussions regarding their FERC-mandated task of developing a new definition of the term LERC (Low impact External Routable Connectivity) in CIP v6; they are currently finalizing the first draft for posting and a NERC ballot. As I stated in the post, the team decided that their revised definition necessitated revising the requirement to which it applies (which is CIP-003-6 R2, and specifically Section 3 of Attachment 1 – since R2 itself just refers to the attachment).

In the post, I described the discussions that led to this requirement being revised to be what I call “non-prescriptive”. However, in that post I didn’t discuss what is actually in the new definition and requirement. I will do that now. This is important because NERC entities with Low impact assets will have to comply with the revised requirement and definition, not the current one. In other words, CIP-003-7 and the new LERC definition will almost certainly come into effect[i] before September 1, 2018, when entities have to implement physical and electronic access controls at their Low assets.

Because, as of the time I’m writing this post, the SDT has not actually submitted the revised definition and standard for posting, I won’t quote any wording – since it could still change in some small way. But I will paraphrase that wording. The purpose of this post is to let NERC entities understand in general how the LERC definition and requirement have changed, since they may already be in the process of planning for their CIP rollout to their Low impact assets.

There is another purpose of this post as well: While the SDT was not specifically asked to revise the ERC definition (i.e. External Routable Connectivity, that applies to Medium and High impact BES Cyber Systems), I am confident they will take on that task as well – since the definition of ERC is in as much need of revision as that of LERC. In fact, I believe the thinking behind the new LERC definition (and revised requirement) can probably be directly applied to ERC as well – so that the new ERC definition may flow fairly naturally, once the SDT considers it.[ii]

The SDT recently spent two and a half days in Chicago discussing LERC; this was preceded by three or four weeks of phone meetings amounting to at least two to four hours per week. I attended the entire Chicago meeting and the majority of the phone meetings. Early on, some team member (and I don’t know who) pointed out that the current definition (i.e. the one “in CIP v6”) actually includes one or more “requirements”.

To really explain what this person was referring to, I need to delve down into the deep contradiction regarding Low impact assets in CIP-002-5 R1 (while this is related to the more fundamental – or “primary” – contradiction in CIP-002 that I have referred to in other posts including this one, I won’t go into that one now. This post will be long enough as it is). The contradiction is this: Strictly speaking, in CIP v5 there is no such thing as a High, Medium or Low impact asset; there are only High, Medium or Low impact BES Cyber Systems. However, since the CIP v5 SDT wanted to make sure that an inventory of Low BCS would not be required, they took pains to make sure that any Low impact requirements would only apply to the Low assets, not the Low BCS. If they hadn’t done that, auditors would have rightly demanded to see a complete inventory of Low BCS (which would in turn have required an inventory of all Cyber Assets at Low assets).

But the SDT now faced a logical dilemma: Since there are strictly speaking no Low assets but only Low BCS, they couldn’t say that the Low requirements applied to Low assets. They came up with the way-too-cute solution of calling Low assets “assets containing Low impact BES Cyber Systems”; so the Low requirements (and there are just two of them: CIP-003-6 R1.2 and R2) are said to be required for entities with one or more assets containing Low BCS. However, the content of one of the requirements (actually a part of one of the requirements) actually applies to the Low BCS themselves! Got it? I swear, I’m not making this up. Rube Goldberg himself couldn’t have come up with something so complicated.

So let’s go to the main requirement for Lows, CIP-003-6 R2, which is detailed in Attachment 1. Attachment 1 contains four Sections, which are effectively Requirement Parts.[iii] Three of them actually only make sense at the level of the asset itself. Section 1, Cyber Security Awareness, applies to every person who works at that asset. Section 2, Physical Security Controls, requires physical access control for the asset.[iv] And Section 4, Cyber Security Incident Response, is actually an organization-wide requirement.

However, Section 3, Electronic Access Controls, clearly only applies to cyber assets, not the asset itself. You don’t electronically access a generating plant or a substation; you do electronically access the cyber assets within it. However, this requirement couldn’t be made applicable to Low BCS, since that would have required an inventory of the cyber assets. The CIP v6 SDT solved this problem by defining LERC as an attribute of the asset; CIP-003-6 R2, Attachment 1 Section 3 only applies to Low assets that have LERC. By doing that, they made sure that every BCS within the asset would be covered by Section 3, without requiring that the entity inventory all the cyber assets.

This might sound complicated so far, but now it gets even more complicated. This is because some cyber assets that are housed in a Low asset may be routably connected externally, but the BCS in that asset may not be. A simple example would be a Low impact substation that contains some relays that are Low BCS. Their sole connection to the outside world may be a purely serial link to the control center. But there could be a routable connection coming in from the corporate network to one or more computers that the technicians use to check email and to download work orders. How can the substation be said to have LERC if none of its BCS actually have it?

The v6 SDT “solved” this problem (and two similar ones) by including in the LERC definition three conditions that would “break” LERC, although they aren’t explicitly called out as such. The first sentence of the current definition reads “Direct user-initiated interactive access or a direct device-to-device connection to a low impact BES Cyber System(s) from a Cyber Asset outside the asset containing those low impact BES Cyber System(s) via a bi-directional routable protocol connection.”

The first of the implicit conditions that can result in there being no LERC (or LERC being “broken”, as I and many others say) is “denoted” by something that isn’t there. Note that the LERC definition refers to BES Cyber Systems, and says there must be a connection to them. So if the asset contains non-BCS that have external routable connectivity (as in the above example), the asset itself will still not have LERC because none of its BCS do. Assuming the BCS aren’t networked with the non-BCS - i.e. that they are air-gapped from them - then an outside system will not be able to reach the BCS via a routable protocol, and the asset will not have LERC.

The second condition is denoted by the word “bi-directional”. If the routable protocol connection isn’t bi-directional, then there will be no LERC. This uni-directionality is conferred by a device called a “data diode” or a “uni-directional gateway”. If all BCS in the asset are “behind” one of these devices, the asset itself doesn’t have LERC.

The third condition that can break LERC in the current definition is denoted by the word “Direct”. If the external routable connection doesn’t “directly” access any BCS at the Low asset, there is again no LERC. What does “Direct” mean? It is not defined, but it is illustrated by the “reference models” found in the discussion of Requirement 2 in the CIP-003-6 Guidelines and Technical Basis. Two of those models, numbers 5 and 6, depict a device that is inserted into the communications stream in some way (i.e. between the connection from the external device and the BCS itself). These devices in some way break LERC, even though there is still some sort of connection between the external device and the BCS.

The reason that the current (v7) SDT is working on the LERC definition in the first place is that FERC stated in Order 822 that they didn’t understand what “Direct” means in the definition. In other words, they don’t think Reference Models 5 and 6 illustrate a general principle that forms the basis for the word “Direct” (more correctly, they don’t understand what that principle is; they want the SDT to tell them).

Since LERC is meant to be a gating factor for the Electronic Access Control requirement, that requirement – Attachment 1 Section 3.1[v] - only applies when there is LERC. The v6 requirement currently reads “For LERC, if any, implement a LEAP to permit only necessary inbound and outbound bi-directional routable protocol access.” LEAP is an acronym for Low impact External Access Point – i.e. a device, such as a firewall, that is inserted in the communications stream and permits only necessary inbound and outbound access. Essentially, when the asset has LERC and that isn’t “broken” by one of the three conditions just stated, the entity must implement a LEAP to protect the BES Cyber Systems located at the asset.
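For readers who find logic easier to follow in code, here is a minimal sketch of how I read the current (v6) definition and requirement. To be clear, this is my own paraphrase: the class, field and function names are all hypothetical and appear nowhere in the standard, and a real determination involves far more judgment than three booleans.

```python
from dataclasses import dataclass


@dataclass
class ExternalConnection:
    """One routable connection coming into a Low impact asset (hypothetical model)."""
    reaches_bcs: bool    # False if the Low BCS are air-gapped from this connection
    bidirectional: bool  # False if it sits behind a data diode / uni-directional gateway
    direct: bool         # False if an intermediate device "breaks" it (Reference Models 5 and 6)


def has_lerc_v6(connections):
    """The asset has LERC only if some connection passes all three implicit tests."""
    return any(c.reaches_bcs and c.bidirectional and c.direct for c in connections)


def leap_required_v6(connections):
    """Under v6 Section 3.1, a LEAP is required exactly when there is LERC."""
    return has_lerc_v6(connections)


# The substation example from earlier: the relays have only a serial link
# (not routable, so not listed), and the one routable connection goes to the
# technicians' computers, which are not networked with the relays.
substation = [ExternalConnection(reaches_bcs=False, bidirectional=True, direct=True)]
print(leap_required_v6(substation))  # prints False
```

Notice that all three of the tests that can "break" LERC live inside the definition, while the LEAP lives in the requirement. That split is what the discussion below turns on.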

To return to our narrative, the unknown (to me) SDT member pointed out that the real purpose of Section 3.1 is to protect against the risk introduced when the Low impact asset has LERC. One way to mitigate this risk is to implement a LEAP. But the three implicit conditions in the LERC definition that break LERC also constitute ways that the risk can be mitigated. So why have three possible mitigations included in the definition, while another is listed in the requirement? Why not define LERC narrowly without any mitigations, and list the four mitigations in the requirement? This was, IMHO, a very perspicacious argument, and it formed the basis for the SDT’s entire approach to meeting FERC’s mandate for a new LERC definition. But this meant that, instead of just changing the definition, the SDT had to change the requirement itself, as well as the discussion in the Guidelines and Technical Basis.

With the mitigations removed, the LERC definition now states simply that, if any external routable communications cross the (physical) boundary of the Low asset, there is LERC, period.[vi] Having established this definition, the SDT then set out to revise the requirement itself (again, FERC had only mandated that the definition be changed, but since the new definition required removing the implicit requirements from the old definition and moving them to the actual requirement, the requirement itself had to be changed).

The SDT’s first “draft” of the new requirement read something to the effect of “If there is LERC, take one of the following actions to mitigate the risk posed by it.” This was followed by a list of steps the entity could take, including:

“Air gap” the BCS from the external routable communications.
Implement a “data diode” to make the communications unidirectional.
Require re-authentication by some intermediate device, before allowing connection to the Low BCS.
Terminate the routable protocol session and establish a new one to the Low BCS (e.g. in a device like a proxy server).
Implement a device that restricts communications from all devices or users except those authorized to access the Low BCS.

Note that the first two items correspond to two of the three conditions that break LERC, in the current v6 definition. And the last item more or less describes the LEAP, which is currently the only mitigation listed in the requirement. But what happened to the third condition that breaks LERC: the lack of “direct” routable connectivity to the BCS in the asset? This condition has now been “defined” as one of two conditions, namely items 3 and 4 above. In other words, the SDT answered FERC’s question about what “Direct” means by saying it amounts to the routable connection not being interrupted by either a) re-authentication or b) termination and re-establishment of a new session.

This might seem unremarkable, unless you consider one of the big bugaboos of the External Routable Connectivity discussion last year: the concept of the “application layer (or ‘Layer Seven’) protocol break”. This concept first appeared in Reference Model 6 in the Guidelines and Technical Basis discussion of CIP-003-6 R2. There was a lot of debate about what that meant (which I discussed in close to ten posts. This one addressed it most directly). And FERC, in their NOPR of July 2015, expressed a lot of skepticism about the term. My final post on this issue (just linked) concluded that there could be no comprehensive dictionary-style definition of what this term means; it can only be “defined” by providing use cases. And that is what the SDT has done. There are now two use cases in the place of the word “Direct” in the LERC definition.

To summarize the discussion so far, the SDT at first decided to define LERC in a way that removed any mitigations from the definition itself and placed all mitigations in the requirement. The new requirement would simply say that, when there is LERC (by the new definition, of course), one of the five mitigations listed above needs to be implemented.
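The summary above can be sketched in the same style. Again, this is only my paraphrase of the SDT's first draft as described in this post, not the posted wording, and every name in it is hypothetical.

```python
from enum import Enum


class Mitigation(Enum):
    """The five mitigations in the SDT's first draft, as paraphrased in this post."""
    AIR_GAP = 1
    DATA_DIODE = 2
    REAUTHENTICATION = 3
    SESSION_TERMINATION = 4
    ACCESS_RESTRICTION = 5  # more or less what v6 calls a LEAP


def has_lerc_draft(routable_comms_cross_boundary):
    """New definition: any routable communication crossing the asset boundary is LERC."""
    return bool(routable_comms_cross_boundary)


def meets_first_draft_requirement(routable_comms_cross_boundary, mitigations):
    """First draft of the requirement: if there is LERC, implement at least one mitigation."""
    if not has_lerc_draft(routable_comms_cross_boundary):
        return True  # no LERC, so nothing to mitigate
    return len(set(mitigations)) >= 1


# The same substation now does have LERC, because the corporate connection
# crosses the boundary. The air gap no longer "breaks" LERC; it mitigates it.
print(has_lerc_draft(True))  # prints True
print(meets_first_draft_requirement(True, [Mitigation.AIR_GAP]))  # prints True
print(meets_first_draft_requirement(True, []))  # prints False
```

The definition has shrunk to a single test, and everything that used to "break" LERC has moved into the list of mitigations.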

At first glance, I thought this was the solution to the problem. However, someone quickly pointed out that not all of the mitigations are of the same status. For example, the air gap and uni-directional gateway mitigations both can be said to be fairly comprehensive. Not only will they prevent BCS access by non-authorized sources, they will prevent it by all sources. On the other hand, there might be cases where mitigations 3-5 might not be enough by themselves; they might need to be combined (especially 3 and 4) in order to provide adequate protection. But what are the exact criteria that will determine whether one of these mitigations is adequate, and which other mitigation it should be combined with? And who is to say that there might not be other perfectly adequate mitigations, that simply hadn’t been brought up so far?

At this point, it became apparent that to keep the requirement in the prescriptive form – i.e. “If you have LERC, you need to implement one of the following mitigations...” – would take a lot more discussion and would probably never produce a definitive set of mitigations. So a suggestion was made that the requirement be made very simple, with discussion of mitigations moved to the Guidelines and Technical Basis. Of course, this meant that it was now going to be up to the judgment of the auditor whether or not the entity had effectively mitigated the additional risk posed by the presence of LERC at the asset.

The requirement now reads something to the effect of, “If you have LERC, you need to take measures to mitigate the risk.” Meanwhile, the Guidance has been rewritten (with new reference models) to accommodate its new role, since all of the “meat” of the LERC definition and the requirement is now in the Guidance (I will probably have a post on the new Guidance when it is available). As I discussed in my previous post, this requirement has now become a non-prescriptive one (or it will be, when approved by NERC and FERC), joining the other two non-prescriptive requirements: CIP-007-6 R3 and CIP-010-2 R4. Hopefully there will be more in the future!

What about ERC?

Near the beginning of this post, I mentioned that the new LERC definition could well serve as a model for a new External Routable Connectivity (ERC) definition (which is also something the SDT intends to work on, although it wasn’t strictly required of them in their SAR). Indeed, I think that the SDT may have already done all the heavy lifting required for a new definition. The current (CIP v5) ERC definition reads “The ability to access a BES Cyber System from a Cyber Asset that is outside of its associated Electronic Security Perimeter via a bi-directional routable protocol connection.” Just like the current LERC definition, this definition implicitly includes two possible mitigations: a data diode (which would nullify the “bi-directional” provision) and some limitation on the “ability to access”. This latter is the same kind of open-ended provision as “Direct” in the LERC definition and, just like “Direct”, it has been the source of a lot of confusion (especially when there is an intermediate device like a protocol converter that is interpreted as in some way “breaking” the routable protocol).

Just as with LERC, ERC can be defined in a very minimal way by removing the mitigations. In the same way that the new LERC definition simply says that LERC is present when a routable connection crosses the Low impact asset boundary, the SDT can just rewrite the ERC definition to say that ERC is present when there is a routable connection into the ESP, period.[vii] And the mitigations can be put in the Guidelines and Technical Basis, just as they will be for LERC. Specifically, the Guidance can say that a data diode mitigates the risk posed by ERC. And it can also provide use cases for how the “ability to access” can be removed – by steps like authentication and also terminating one routable protocol session and starting another.[viii] This should clear up the still-rampant confusion regarding ERC.[ix]

If the SDT wants to take my advice and use their LERC definition (and Guidance) as a model for ERC, they should have a much easier time addressing the latter. Essentially, the bulk of the discussion simply has to be about what guidance will be provided on how to mitigate the risk of ERC. The definition itself should be a piece of cake.

[i] Note that the LERC definition (and revised requirement) will be balloted and approved by NERC, and approved by FERC, long before the remaining items in the drafting team’s agenda – which effectively constitute CIP version 7 – are approved. This is because FERC set a deadline for NERC to submit the revised definition to them of March 2017. While the drafting team will most likely have developed the first draft of v7 by that time, it will without a doubt be a long way from being completely approved by NERC, let alone submitted to FERC. Effectively, this means that CIP-003-7 will come into effect at least one or two years before the rest of “CIP version 7”.

It also means that NERC entities will have to comply with standards from three different CIP versions – 5, 6 and 7 – at the same time. This in itself isn’t bad or good, but the run-up to CIP v6 showed that many CIP compliance professionals don’t understand that version numbers only apply to individual standards, not to the CIP family of standards as a whole. The danger is that some entities may believe that the new CIP-003-7, and the new LERC definition, won’t come into effect until the rest of “CIP v7” does; they will then stick with the old definition and requirement as they prepare for the Sept. 1, 2018 compliance date for the physical and electronic access controls required by CIP-003-6 R2. Hopefully, a vigorous education process on NERC’s part will prevent this from happening. It is time for NERC to do some education about version numbers, rather than continue to pretend they are still “revising CIP version 5”.

[ii] Since FERC set the March 2017 deadline for NERC to submit the revised LERC definition, and since multiple drafts and ballots will undoubtedly be required before that can happen, the SDT has made the first LERC draft their big priority recently. There is no such deadline for ERC (or indeed for anything else on the SDT’s agenda), so that discussion will follow later.

[iii] You might ask, “If they are effectively requirement parts, why weren’t they just treated as requirement parts in the first place, rather than being put in an attachment?” I really can’t give a good explanation of that. For further explanation, I refer you to the well-known NERC CIP expert Lewis Carroll, who provided an excellent explanation of the logic of CIP version 5 in his two great works, Alice in Wonderland and Through the Looking Glass.

[iv] Yes, yes, I know. You’re going to point out to me that the entity has the option of only applying physical access controls to the Low BCS, not to the Low asset itself. For example, if all of the BCS at the asset are in a single room, the entity only needs to control access to that room, not to the whole asset. But this option is purely an artifact of the fact that, strictly speaking, there are no Low assets, any more than there are High or Medium assets. So the physical security requirements (for Highs, Mediums and Lows) have to apply to the BCS. I will leave it to the reader to decide whether it’s a great idea to leave all of the doors of a Low impact generating plant completely unguarded and unlocked, while still protecting the control room. I think it would be a much better idea to lock all of the doors, but strictly speaking that isn’t required by CIP-003-6 R2.

[v] You’ll notice I’ve just pulled a fast one on you. I’ve been saying so far that Section 3 of Attachment 1 constitutes the electronic access control requirement, and now I’m saying it’s actually just 3.1. This is because 3.2 applies to dial-up connectivity. While that is also electronic access (unless someone is calling in with an old crank telephone, where you have to ask the operator to connect you with so-and-so), it isn’t network access. So I should really have referred all along to network-based electronic access control (vs. “telephony-based” electronic access control). But even someone obsessed with correct word usage like me has their limits.

[vi] There was some concern on the SDT that, since the simplified definition introduces the concept of the asset boundary, there will now be a lot of concern about how to define that. Is it the plant’s fence line? Is it the actual walls? Etc. The team did start to work on verbiage for the Guidance and Technical Basis that would try to define what “asset boundary” means. But I pointed out that, given that the term “asset” itself is undefined by NERC, trying to define its boundary is an exercise in futility. Others pointed out that it doesn’t particularly matter where the boundary is drawn, since the LERC definition now doesn’t include any provisions that break it. In other words, if an entity had (inexplicably) installed a data diode between the fence line and the wall of a Low impact generating plant, this would have made a difference under the current definition, since that definition includes the “bi-directional” condition. However, under the new definition that makes no difference at all; there is still LERC on each side of the data diode, which now has become a mitigating factor listed in the requirement, not part of the definition itself.

[vii] The fact that, in ERC, the connection is into the ESP, rather than being “across the asset boundary” as in LERC, is actually a huge advantage. As was discussed in the SDT meeting where the LERC definition was worked out, there is something odd about talking about a virtual concept like a routable protocol connection “crossing” a physical asset boundary. That goes away in the ERC definition, since both the routable connection and the ESP are virtual concepts; they “live” in the same virtual space.

[viii] It is possible that the mitigations that will be recommended in the Guidance for ERC will be stronger than those that will be recommended for LERC. For example, VLANs were one method of separating networks that was discussed at the SDT meeting as a mitigation for LERC. Someone objected that VLANs were not necessarily a secure method of separating networks. I spoke up and agreed with that statement; but I also said I didn’t think the potentially large cost of replacing VLANs in Low assets with separate switches would be justified by the benefits, since we are of course talking about Lows here. In the case of Medium and High impact assets, it might be cost-effective to state that VLANs are not a secure means of separating networks. I’m sure there will be other cases like this – where the mitigations suggested for ERC will be stronger than those suggested for LERC.

[ix] I have stated multiple times, including in this recent post, that the best way to address the ERC (and LERC) definition problem is with a series of use cases: In this case there is ERC/LERC; in this case there isn’t; etc. Effectively, by opting for a minimalist definition of LERC and putting use cases – although they’re called reference models – in the guidance, this is what the SDT has done for LERC. I am now suggesting they do the same thing for ERC, although possibly with different (stronger) use cases.

Monday, July 4, 2016

A Positive Development

This blog’s ultimate genesis was when I attended my first NERC Standards Drafting Team meeting at Exelon’s headquarters in Chicago (where I live) in the summer of 2010. I attended a meeting of the CSO 706 SDT, the team that drafted CIP versions 2, 3, 4, and 5. This meeting was quite interesting, since the SDT had just pivoted from developing what would become CIP v5 to developing what would be v4[i] (of course, they pivoted back to v5 after v4 was developed). I wrote a white paper about it, which got the attention of a large number of NERC entities. This ultimately led to my current blog.

So it was kind of a return to the beginning when I attended the second meeting of the current SDT, also at Exelon in Chicago, last week. Of course, this is a different SDT, the one that is developing CIP version 7 (more specifically, the “Modifications to CIP” SDT). I must admit I really enjoy being able to participate in these meetings (not being an SDT member I can’t vote, but I am able to participate – and I certainly wasn’t shy about doing so!). In fact, I recommend that every NERC CIP compliance professional try to attend at least one of these, even if you don’t plan to participate. It is incredibly interesting to see how the sausage is made, not just to have to eat it.

I must admit that I had somewhat conflicting opinions on how important the work of this drafting team would be. Anyone who has read almost any of my blog posts since May of 2013 knows that I have a lot of problems with the wording of many of the requirements in CIP v5 (not too much with v6, but that is of course based on v5). So on the one hand, I should be very eager to work with the new SDT to address some of those problems.

On the other hand, since the beginning of this year I have put out a few posts - such as this one and this one - that take a different tack. They don’t criticize the v5 wording, but do criticize the whole idea of what I call “prescriptive standards” – which is what all the CIP versions have been. It may be clear from these two posts that I think the prescriptive approach doesn’t work for cyber security standards, even though it might be fine for the other NERC standards. And since the requirements of CIP v5 and v6 are prescriptive ones (with two exceptions, which I’ll discuss below), it seemed certain that v7 would be prescriptive as well.

Given that, why should I be concerned with v7, when I think the whole prescriptive approach to CIP should be thrown out? My answer was always – until this week – that it was worthwhile to clean up some of the problems in CIP v5 and v6, which is what the SDT is tasked with doing. But I wanted v7 to be the last set of prescriptive CIP standards. As I said at the time, maybe in a couple years there will be enough consensus that NERC can start working on a completely new CIP version, which would be the first “non-prescriptive CIP”.

In other words, I wasn’t expecting any big developments at the SDT meeting last week. The meeting was almost entirely focused on a single item in the SDT’s “agenda”: developing a new definition of LERC, as directed by FERC. The team was focusing on this item because it is the only agenda item that needs to be accomplished by a designated date, since FERC set a deadline for this one item. It needs to be drafted, balloted (almost certainly multiple times), approved by NERC and sent to FERC for approval by March of 2017. Given what has to occur before that date, the team decided they needed to approve the first draft by the week of July 4th. And to do that, they had to dedicate almost the entire meeting in Chicago to LERC.

I was very impressed with the meeting in general (it was approximately 20 hours over two and a half days). It was well organized and accomplished a tremendous amount – which included not only rewriting the definition of LERC (Low impact External Routable Connectivity), but also rewriting the two requirement parts associated with it, as well as the associated VSLs and Guidance and Technical Basis.[ii] However, the most interesting part for me was how the discussion went in a different direction from what I expected, or probably from what anybody in the room expected. What was drafted was a non-prescriptive requirement! I will recount here how this development came about. In doing so, I’m not trying to be chronologically accurate, but I am trying to recreate the logic that led the discussion to this surprising conclusion.

In Order 822 in January, FERC ordered NERC to revise the definition of LERC, the main part of which reads “Direct user-initiated interactive access or a direct device-to-device connection to a low impact BES Cyber System(s) from a Cyber Asset outside the asset containing those low impact BES Cyber System(s) via a bi-directional routable protocol connection.”

FERC’s main concern with the definition was the word “Direct”. This concern was not due to FERC’s being worried that there was some contradiction between the way the word was used in the definition and the dictionary definition of the word. Rather, it was due to the fact that, in two of the “Reference Models” (numbers 5 and 6) that the SDT had inserted in the Guidance and Technical Basis section to explain LERC, the SDT had concluded that there was no LERC - even though there was a routable connection coming into the asset. Something had happened to cause that connection to no longer constitute LERC – and the only way that would be allowed by the definition (assuming it was a bi-directional routable connection) would be that some intermediate device or other measure was making this no longer a “direct” connection. FERC said they wanted to know what that “something” was – in other words, what can “break” LERC within the Low impact asset.

The SDT’s discussion of LERC had actually started several weeks earlier in weekly phone conversations, several of which I listened to. One thing that SDT members noticed, as they studied the existing LERC definition, was that it actually included at least two implicit “requirements” (or perhaps more accurately, “measures”) in it. First, the word “direct” effectively meant that the previous SDT (that had drafted the definition) was saying that one way to “break” LERC was to make some change (such as inserting a device) in the communications stream that made it no longer direct. Second, the word “bi-directional” meant that, if an entity implemented a unidirectional device like a data diode, LERC would also be broken.

Effectively, the CIP v6 SDT (which drafted the current LERC definition) was saying that the risk presented by the presence of LERC could be mitigated in at least three ways. One was to implement a LEAP (Low impact External Access Point), as stated in CIP-003-6 R2 Attachment 1. Another was to break LERC by implementing a device – like a data diode – that made the connection unidirectional, as implied by the definition. A third way was to break the “direct” connection with some sort of device, as also implied by the definition. In Order 822, FERC essentially said they wanted clarity on what a device could do that would break LERC.

While the SDT wasn’t at this stage disagreeing with the substance of either of the two implicit “requirements” in the current LERC definition, they decided they wanted to have a definition that was free of any requirements. After a lot of discussion, they came up with a new definition (which isn’t finalized yet, so I can’t provide any wording) that essentially says there is LERC if a routable connection from anywhere outside of the Low impact asset crosses the boundary of that asset. It doesn’t make any mention of “bi-directional” or “direct”. The intention was to specifically address these two items in the requirement.
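For readers who think better in code, the difference between the two definitions can be sketched as a pair of predicates. This is only an illustration: the `Connection` fields and the function names are hypothetical, and the second predicate paraphrases the draft definition as I described it above, not its actual wording (which isn’t final).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical model of a connection reaching a Low impact BES Cyber System."""
    routable: bool                # uses a routable protocol
    crosses_asset_boundary: bool  # originates outside the Low impact asset
    bidirectional: bool           # False if, e.g., a data diode is in the path
    direct: bool                  # False if an intermediate device "breaks" it


def lerc_current_definition(c: Connection) -> bool:
    # Current definition: "direct" and "bi-directional" are part of the
    # definition itself, so removing either one makes LERC disappear.
    return (c.routable and c.crosses_asset_boundary
            and c.bidirectional and c.direct)


def lerc_draft_definition(c: Connection) -> bool:
    # Draft definition (paraphrased): any routable connection crossing the
    # asset boundary is LERC; mitigations are left to the requirement.
    return c.routable and c.crosses_asset_boundary


# Example: a routable connection from outside the asset, behind a data diode.
behind_diode = Connection(routable=True, crosses_asset_boundary=True,
                          bidirectional=False, direct=True)
print(lerc_current_definition(behind_diode))  # False
print(lerc_draft_definition(behind_diode))    # True
```

Under the current definition the data diode makes LERC vanish; under the draft definition LERC is still present, and the diode becomes something the entity does about it.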

Having decided this, the question then came down to what should be required of an entity that has LERC at a Low impact asset, beyond the three options discussed above. In other words, given the new simplified LERC definition, what should be in the requirement for Low assets that have LERC, in order effectively to mitigate the risk posed by having LERC?

The three options above pointed the way to at least two things that need to be in the new requirement. First, the option to implement a LEAP should still be offered as one way to address the risks posed by LERC.[iii] Also, since “bi-directional” will no longer be in the LERC definition, the entity should be offered the option of implementing a device (like a data diode, although there are equally suitable devices that aren’t called by that term) that eliminates this bi-directionality.

But how can the entity address the third option: doing something to break the “direct” connection? Of course, whatever wording the SDT uses to describe this, it can’t use the word “direct” again, or even similar wording, such as “end-to-end”. However the replacement for “direct” is described[iv], it has to be explained. This led to suggestions for other actions that could be taken by the entity that has LERC, including:

“Air-gapping” any Low impact BES Cyber Systems from contact with LERC;
Requiring re-authentication before the external user can communicate with a Low impact BES Cyber System; and
Requiring an intermediate system[v] in the communications stream that will terminate the session with the external user or device, and open a new session.

At this point, the SDT tried to come up with a requirement that would incorporate all of these actions. The general form would be to say something like, “If there is LERC, the entity must take one of the following actions…” However, as they tried to do this, they found it wouldn’t be easy at all. For one thing, not all of the actions they had listed would result in LERC being completely mitigated in all cases. For example, requiring re-authentication would not adequately address the case of machine-to-machine communications. They also realized that one or more of these actions might actually be a special case of another, such as a data diode being a special case of a LEAP. And, of course, there could certainly be other actions that would mitigate LERC, as well.
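The trouble with an enumerated list can be shown with a small sketch. Everything in it is hypothetical: the names are mine, and only the re-authentication row reflects a point actually made in the discussion (that it doesn’t address machine-to-machine traffic). The other rows are placeholder assumptions, included just so the check has something to compare against.

```python
# Kinds of traffic a "take one of the following actions" list would need to cover.
TRAFFIC_KINDS = {"interactive_user", "machine_to_machine"}

# Which kinds of traffic each listed action is assumed to address.
# Only the "re_authentication" row comes from the discussion described above;
# the other two rows are illustrative assumptions, not SDT conclusions.
COVERAGE = {
    "re_authentication": {"interactive_user"},
    "air_gap": {"interactive_user", "machine_to_machine"},
    "session_termination": {"interactive_user", "machine_to_machine"},
}


def uncovered(action: str) -> list[str]:
    """Return the traffic kinds that the given action leaves unaddressed."""
    return sorted(TRAFFIC_KINDS - COVERAGE[action])


def fully_mitigates(action: str) -> bool:
    """True only if the action addresses every kind of traffic."""
    return not uncovered(action)


for action in COVERAGE:
    print(action, fully_mitigates(action), uncovered(action))
```

A prescriptive “pick one” requirement only works if every entry passes a check like `fully_mitigates` in every case, and that is precisely what the team could not establish in the time available.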

At this point[vi], the SDT had two main options. The first was to admit that they needed to spend a lot of time (perhaps multiple days) discussing the different mitigations of LERC that would be permitted in the new requirement. But the second, suggested by another observer at the meeting, was to rewrite the requirement so that it simply stated that an entity with LERC needed to take steps to mitigate the risk posed by LERC, and discuss different options for doing this in the Guidelines and Technical Basis section.

My initial reaction to this suggestion was “Great, that completely changes the situation. Now none of these actions can be enforced.” Of course, I said this (to myself) because the statements in the Guidance and Technical Basis section that is included with each of the CIP v5 and v6 standards are only guidance, not enforceable. In other words, this new suggestion would mean there would no longer be any way for an auditor to state definitively whether the steps the entity took to mitigate the risk posed by LERC were “permissible” or not. One SDT member emphasized this concern by asking how compliance with this requirement would be measured, implying that it couldn’t be.

But then I thought for a minute, and realized this is exactly what I want to happen with all of the CIP requirements! By moving any discussion of how to mitigate the risk of LERC to the Guidance, the SDT is making the third section of CIP-003 R2/Attachment 1 a non-prescriptive requirement: The entity will have to take measures to mitigate the risk posed by LERC, but it is up to them to determine the best way to do this. And it is up to the auditor to determine whether or not the entity has successfully mitigated the risk.

But this isn’t a new development, since there are already two non-prescriptive requirements in CIP versions 5 and 6. I wrote about one of these requirements, CIP-007-6 R3, in this recent post. Another example is CIP-010-2 R4, which doesn’t prescribe certain actions for Transient Cyber Assets but provides guidelines for dealing with them.[vii] For both of these requirements, it will be up to the auditor to use his or her judgment to determine whether or not the entity has adequately addressed the risks involved.

I’m sure many will point to the fact that auditor judgment will be required as a reason to reject the non-prescriptive approach to CIP requirements. After all, they will ask, aren’t we trying to get away from the bad old days of CIP v3, when there was a lot of concern about different regions and even auditors auditing the standards in different ways? My answer to this objection is that, due to the many ambiguities and contradictions in CIP v5, auditor judgment is going to be inevitable for most of the requirements anyway. And even for those requirements where the wording is clear enough that there in theory should be no judgment needed, I believe that in actuality the auditors are unlikely to take advantage of this fact to nail the entity for even the smallest deviation from that wording. Au contraire, they are going to be much more inclined to use the limited time available for an audit to discuss how the entity could improve their overall cyber security posture. See the post just referenced for more discussion of this idea.

I am going to stop here. There is certainly a lot more to say about the idea of making NERC CIP a set of non-prescriptive requirements. I intend to revisit this topic a lot in future posts, as well as in a book that I and two other people are starting to write on this subject. My point in this post has been that the revised requirement for mitigating the risks of LERC shows that “non-prescriptive CIP” doesn’t have to be implemented as a big bang. It can be implemented one requirement at a time – and indeed, this is happening already! I predict this will happen naturally, since this and future SDTs will be almost inevitably driven to use a non-prescriptive approach to revise existing requirements and implement new ones. In fact, I hope to have a post out soon making the point that a non-prescriptive approach is the only one that could possibly work for possible future requirements for supply chain security.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] I wish I could point you to a blog post that discussed this, but for the first couple years of writing about CIP, I simply wrote some white papers that were distributed through a Honeywell (actually Matrikon) web site. They are no longer available there, so if you are interested in seeing these, please email me at talrich@deloitte.com.

[ii] None of this was finalized, but the remaining work can be done by smaller groups working together online, with final approval by the full SDT. It will be finished this week, though.

[iii] Although I am not sure whether the term LEAP will be retained in CIP v7.

[iv] As I discussed in this post before the SDT meeting, I came to the conclusion last year that there is no single word, phrase, or sentence that encompasses every method that could be said to “break” LERC. I suggested that what would work best in the new definition would be providing some specific use cases (for example, requiring re-authentication before access to the end device is granted) and simply stating that LERC would be broken in these cases. This is more or less what happened, although the “use cases” were only provided in the Guidance and Technical Basis, as will be described below.

[v] Not to be confused with an Intermediate System for Medium and High impact CIP environments, of course. The way I’m using the term here, it doesn’t mean anything more than a device that is found within the communications stream starting from outside the Low impact asset and terminating at a BES Cyber System inside the asset.

[vi] And remember, I’m not speaking of a specific point in time here. I’m talking about the “point” in the chain of logic that led to the realization that a non-prescriptive requirement was needed.

[vii] The formats of these two requirements differ from each other. CIP-007-6 R3 includes three non-prescriptive requirement parts, each of which addresses a different part of the risk posed by malicious code. To understand your options for complying with any one of these, you need to go to the Guidance for suggestions. This is similar to the approach that I believe will be the basis for the revised requirement for mitigating the risks posed by LERC.

CIP-010-2 R4 requires the entity to develop a “plan” for “Transient Cyber Assets and Removable Media” (presumably for mitigating the risk thereof; however, the requirement is silent about that, which strikes me as a big oversight). The plan has to include each of the applicable sections of Attachment 1. Each of these sections discusses a particular area that needs to be addressed, such as “Transient Cyber Asset Authorization”, and provides some high level suggestions as to how this can be done, without going into any prescriptive details like removing access authorization immediately when an employee is terminated. The Guidance provides more detailed suggestions on how to address each of these areas, but they are of course only suggestions.