Tuesday, July 19, 2016

Son of LERC

My last post was about the CIP version 7 Standards Drafting Team’s discussions regarding their FERC-mandated task of developing a new definition of the term LERC (Low impact External Routable Connectivity) in CIP v6; they are currently finalizing the first draft for posting and a NERC ballot. As I stated in the post, the team decided that their revised definition necessitated revising the requirement to which it applies (which is CIP-003-6 R2, and specifically Section 3 of Attachment 1 – since R2 itself just refers to the attachment).

In the post, I described the discussions that led to this requirement being revised to be what I call “non-prescriptive”. However, in that post I didn’t discuss what is actually in the new definition and requirement. I will do that now. This is important because NERC entities with Low impact assets will have to comply with the revised requirement and definition, not the current one. In other words, CIP-003-7 and the new LERC definition will almost certainly come into effect[i] before September 1, 2018, when entities have to implement physical and electronic access controls at their Low assets.

Because, as of the time I’m writing this post, the SDT has not actually submitted the revised definition and standard for posting, I won’t quote any wording – since it could still change in some small way. But I will paraphrase that wording. The purpose of this post is to let NERC entities understand in general how the LERC definition and requirement have changed, since they may already be in the process of planning for their CIP rollout to their Low impact assets.

There is another purpose of this post as well: While the SDT was not specifically asked to revise the ERC definition (i.e. External Routable Connectivity, that applies to Medium and High impact BES Cyber Systems), I am confident they will take on that task as well – since the definition of ERC is in as much need of revision as that of LERC. In fact, I believe the thinking behind the new LERC definition (and revised requirement) can probably be directly applied to ERC as well – so that the new ERC definition may flow fairly naturally, once the SDT considers it.[ii]

The SDT recently spent two and a half days in Chicago discussing LERC; this was preceded by three or four weeks of phone meetings amounting to at least two to four hours per week. I attended the entire Chicago meeting and the majority of the phone meetings. Early on, some team member (and I don’t know who) pointed out that the current definition (i.e. the one “in CIP v6”) actually includes one or more “requirements”.

To really explain what this person was referring to, I need to delve down into the deep contradiction regarding Low impact assets in CIP-002-5 R1 (while this is related to the more fundamental – or “primary” – contradiction in CIP-002 that I have referred to in other posts including this one, I won’t go into that one now. This post will be long enough as it is). The contradiction is this: Strictly speaking, in CIP v5 there is no such thing as a High, Medium or Low impact asset; there are only High, Medium or Low impact BES Cyber Systems. However, since the CIP v5 SDT wanted to make sure that an inventory of Low BCS would not be required, they took pains to make sure that any Low impact requirements would only apply to the Low assets, not the Low BCS. If they hadn’t done that, auditors would have rightly demanded to see a complete inventory of Low BCS (which would in turn have required an inventory of all Cyber Assets at Low assets).

But the SDT now faced a logical dilemma: Since there are strictly speaking no Low assets but only Low BCS, they couldn’t say that the Low requirements applied to Low assets. They came up with the way-too-cute solution of calling Low assets “assets containing Low impact BES Cyber Systems”; so the Low requirements (and there are just two of them: CIP-003-6 R1.2 and R2) are said to be required for entities with one or more assets containing Low BCS. However, the content of one of the requirements (actually a part of one of the requirements) actually applies to the Low BCS themselves! Got it? I swear, I’m not making this up. Rube Goldberg himself couldn’t have come up with something so complicated.

So let’s go to the main requirement for Lows, CIP-003-6 R2, which is detailed in Attachment 1. Attachment 1 contains four Sections, which are effectively Requirement Parts.[iii] Three of them actually only make sense at the level of the asset itself. Section 1, Cyber Security Awareness, applies to every person who works at that asset. Section 2, Physical Security Controls, requires physical access control for the asset.[iv] And Section 4, Cyber Security Incident Response, is actually an organization-wide requirement.

However, Section 3, Electronic Access Controls, clearly only applies to cyber assets, not the asset itself. You don’t electronically access a generating plant or a substation; you do electronically access the cyber assets within it. However, this requirement couldn’t be made applicable to Low BCS, since that would have required an inventory of the cyber assets. The CIP v6 SDT solved this problem by defining LERC as an attribute of the asset; CIP-003-6 R2, Attachment 1 Section 3 only applies to Low assets that have LERC. By doing that, they made sure that every BCS within the asset would be covered by Section 3, without requiring that the entity inventory all the cyber assets.

This might sound complicated so far, but now it gets even more complicated. This is because some cyber assets that are housed in a Low asset may be routably connected externally, but the BCS in that asset may not be. A simple example would be a Low impact substation that contains some relays that are Low BCS. Their sole connection to the outside world may be a purely serial link to the control center. But there could be a routable connection coming in from the corporate network to one or more computers that the technicians use to check email and to download work orders. How can the substation be said to have LERC if none of its BCS actually have it?

The v6 SDT “solved” this problem (and two similar ones) by including in the LERC definition three conditions that would “break” LERC, although they aren’t explicitly called out as such. The first sentence of the current definition reads “Direct user-initiated interactive access or a direct device-to-device connection to a low impact BES Cyber System(s) from a Cyber Asset outside the asset containing those low impact BES Cyber System(s) via a bi-directional routable protocol connection.”

The first of the implicit conditions that can result in there being no LERC (or LERC being “broken”, as I and many others say) is “denoted” by something that isn’t there. Note that the LERC definition refers to BES Cyber Systems, and says there must be a connection to them. So if the asset contains non-BCS that have external routable connectivity (as in the above example), the asset itself will still not have LERC because none of its BCS do. Assuming the BCS aren’t networked with the non-BCS – i.e. that they are air-gapped from them – then an outside system will not be able to reach the BCS via a routable protocol, and the asset will not have LERC.

The second condition is denoted by the word “bi-directional”. If the routable protocol connection isn’t bi-directional, then there will be no LERC. This uni-directionality is conferred by a device called a “data diode” or a “uni-directional gateway”. If all BCS in the asset are “behind” one of these devices, the asset itself doesn’t have LERC.

The third condition that can break LERC in the current definition is denoted by the word “Direct”. If the external routable connection doesn’t “directly” access any BCS at the Low asset, there is again no LERC. What does “Direct” mean? It is not defined, but it is illustrated by the “reference models” found in the discussion of Requirement 2 in the CIP-003-6 Guidelines and Technical Basis. Two of those models, numbers 5 and 6, depict a device that is inserted into the communications stream in some way (i.e. between the connection from the external device and the BCS itself). These devices in some way break LERC, even though there is still some sort of connection between the external device and the BCS.

The reason that the current (v7) SDT is working on the LERC definition in the first place is that FERC stated in Order 822 that they didn’t understand what “Direct” means in the definition. In other words, they don’t think Reference Models 5 and 6 illustrate a general principle that forms the basis for the word “Direct” (more correctly, they don’t understand what that principle is; they want the SDT to tell them).

Since LERC is meant to be a gating factor for the Electronic Access Control requirement, that requirement – Attachment 1 Section 3.1[v] – only applies when there is LERC. The v6 requirement currently reads “For LERC, if any, implement a LEAP to permit only necessary inbound and outbound bi-directional routable protocol access.” LEAP is an acronym for Low impact External Access Point – i.e. a device, such as a firewall, that is inserted in the communications stream and permits only necessary inbound and outbound access. Essentially, when the asset has LERC and that isn’t “broken” by one of the three conditions just stated, the entity must implement a LEAP to protect the BES Cyber Systems located at the asset.
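For readers who think more easily in code, here is a rough sketch of the logic of the current (v6) definition as I have described it. This is my own illustration, not anything found in the standard: the Connection class, its three fields and both function names are made up for the example, and each field simply stands in for one of the three implicit conditions that can break LERC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Connection:
    """One external routable connection into a Low impact asset (hypothetical model)."""
    reaches_bcs: bool    # False if the Low BCS are air-gapped from this connection
    bidirectional: bool  # False if a data diode / uni-directional gateway is in the path
    direct: bool         # False if an intermediate device interrupts the path
                         # (roughly what Reference Models 5 and 6 depict)


def has_lerc_v6(connections: List[Connection]) -> bool:
    """Under this reading of the v6 definition, the asset has LERC only if
    some connection satisfies all three conditions at once."""
    return any(c.reaches_bcs and c.bidirectional and c.direct for c in connections)


def leap_required_v6(connections: List[Connection]) -> bool:
    """Section 3.1 as paraphrased above: a LEAP is required only when there is LERC."""
    return has_lerc_v6(connections)


# The substation example: the only routable link goes to technicians' computers,
# while the relays (the Low BCS) sit on a serial link.
corporate_link_only = [Connection(reaches_bcs=False, bidirectional=True, direct=True)]
print(leap_required_v6(corporate_link_only))
```

In this sketch, setting any one of the three fields to False on every connection "breaks" LERC, which is exactly why the definition can be said to contain hidden mitigations.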

To return to our narrative, the unknown (to me) SDT member pointed out that the real purpose of Section 3.1 is to protect against the risk introduced when the Low impact asset has LERC. One way to mitigate this risk is to implement a LEAP. But the three implicit conditions in the LERC definition that break LERC also constitute ways that the risk can be mitigated. So why have three possible mitigations included in the definition, while another is listed in the requirement? Why not define LERC narrowly without any mitigations, and list the four mitigations in the requirement? This was, IMHO, a very perspicacious argument, and it formed the basis for the SDT’s entire approach to meeting FERC’s mandate for a new LERC definition. But this meant that, instead of just changing the definition, the SDT had to change the requirement itself, as well as the discussion in the Guidelines and Technical Basis.

Once the mitigations were removed from the LERC definition, it now states simply that, if any external routable communications cross the (physical) boundary of the Low asset, there is LERC, period.[vi] Having established this definition, the SDT then set out to revise the requirement itself (again, FERC had only mandated that the definition be changed. But since the new definition required removing the implicit requirements from the old definition and moving them to the actual requirement, this meant the requirement itself had to be changed).

The SDT’s first “draft” of the new requirement read something to the effect of “If there is LERC, take one of the following actions to mitigate the risk posed by it.” This was followed by a list of steps the entity could take, including:

  1. “Air gap” the BCS from the external routable communications.
  2. Implement a “data diode” to make the communications unidirectional.
  3. Require re-authentication by some intermediate device, before allowing connection to the Low BCS.
  4. Terminate the routable protocol session and establish a new one to the Low BCS (e.g. in a device like a proxy server).
  5. Implement a device that restricts communications from all devices or users except those authorized to access the Low BCS.

Note that the first two items correspond to two of the three conditions that break LERC, in the current v6 definition. And the last item more or less describes the LEAP, which is currently the only mitigation listed in the requirement. But what happened to the third condition that breaks LERC: the lack of “direct” routable connectivity to the BCS in the asset? This condition has now been “defined” as one of two conditions, namely items 3 and 4 above. In other words, the SDT answered FERC’s question about what “Direct” means by saying it amounts to the routable connection not being interrupted by either a) re-authentication or b) termination and re-establishment of a new session.

This might seem unremarkable, unless you consider one of the big bugaboos of the External Routable Connectivity discussion last year: the concept of the “application layer (or ‘Layer Seven’) protocol break”. This concept first appeared in Reference Model 6 in the Guidelines and Technical Basis discussion of CIP-003-6 R2. There was a lot of debate about what that meant (which I discussed in close to ten posts. This one addressed it most directly). And FERC, in their NOPR of July 2015, expressed a lot of skepticism about the term. My final post on this issue (just linked) concluded that there could be no comprehensive dictionary-style definition of what this term means; it can only be “defined” by providing use cases. And that is what the SDT has done. There are now two use cases in the place of the word “Direct” in the LERC definition.

To summarize the discussion so far, the SDT at first decided to define LERC in a way that removed any mitigations from the definition itself and placed all mitigations in the requirement. The new requirement would simply say that, when there is LERC (by the new definition, of course), one of the five mitigations listed above needs to be implemented.
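Here is a minimal sketch of that first-draft approach, based only on my paraphrase above (the actual wording had not been posted as I write this). The Mitigation enum, the function names and the idea of counting routable links are all my own assumptions for illustration.

```python
from enum import Enum
from typing import Set


class Mitigation(Enum):
    """The five candidate mitigations from the first draft, as paraphrased in this post."""
    AIR_GAP = 1              # BCS separated from the external routable communications
    DATA_DIODE = 2           # communications made uni-directional
    REAUTHENTICATION = 3     # intermediate device requires re-authentication
    SESSION_TERMINATION = 4  # session terminated and a new one established (e.g. a proxy)
    ACCESS_RESTRICTION = 5   # device that admits only authorized devices or users


def has_lerc_new(routable_links_crossing_boundary: int) -> bool:
    """Simplified definition: any routable communication crossing the asset
    boundary means there is LERC, with no conditions that break it."""
    return routable_links_crossing_boundary > 0


def meets_first_draft_requirement(routable_links_crossing_boundary: int,
                                  mitigations: Set[Mitigation]) -> bool:
    """First-draft requirement: if there is LERC, at least one listed mitigation
    must be in place. Whether that one is adequate is not modeled here."""
    if not has_lerc_new(routable_links_crossing_boundary):
        return True
    return len(mitigations) >= 1


# The substation with only a corporate email link now has LERC under the
# simplified definition, and the air gap becomes the mitigation.
print(has_lerc_new(1))
print(meets_first_draft_requirement(1, {Mitigation.AIR_GAP}))
```

Note that the sketch treats any single mitigation as sufficient, which is all the first draft said.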

At first glance, I thought this was the solution to the problem. However, someone quickly pointed out that not all of the mitigations are of the same status. For example, the air gap and uni-directional gateway mitigations both can be said to be fairly comprehensive. Not only will they prevent BCS access by non-authorized sources, they will prevent it by all sources. On the other hand, there may be cases where mitigations 3-5 are not enough by themselves; they might need to be combined (especially 3 and 4) in order to provide adequate protection. But what are the exact criteria that will determine whether one of these mitigations is adequate, and which other mitigation it should be combined with? And who is to say that there might not be other perfectly adequate mitigations that simply hadn’t been brought up so far?

At this point, it became apparent that to keep the requirement in the prescriptive form – i.e. “If you have LERC, you need to implement one of the following mitigations...” – would take a lot more discussion and would probably never produce a definitive set of mitigations. So a suggestion was made that the requirement be made very simple, with discussion of mitigations moved to the Guidelines and Technical Basis. Of course, this meant that it was now going to be up to the judgment of the auditor whether or not the entity had effectively mitigated the additional risk posed by the presence of LERC at the asset.

The requirement now reads something to the effect of, “If you have LERC, you need to take measures to mitigate the risk.” Meanwhile, the Guidance has been rewritten (with new reference models) to accommodate its new role, since all of the “meat” of the LERC definition and the requirement is now in the Guidance (I will probably have a post on the new Guidance when it is available). As I discussed in my previous post, this requirement has now become a non-prescriptive one (or it will be, when approved by NERC and FERC), joining the other two non-prescriptive requirements: CIP-007-6 R3 and CIP-010-2 R4. Hopefully there will be more in the future!

What about ERC?
Near the beginning of this post, I mentioned that the new LERC definition could well serve as a model for a new External Routable Connectivity (ERC) definition (which is also something the SDT intends to work on, although it wasn’t strictly required of them in their SAR). Indeed, I think that the SDT may have already done all the heavy lifting required for a new definition. The current (CIP v5) ERC definition reads “The ability to access a BES Cyber System from a Cyber Asset that is outside of its associated Electronic Security Perimeter via a bi-directional routable protocol connection.” Just like the current LERC definition, this definition implicitly includes two possible mitigations: a data diode (which would nullify the “bi-directional” provision) and some limitation on the “ability to access”. This latter is the same kind of open-ended provision as “Direct” in the LERC definition and, just like “Direct”, it has been the source of a lot of confusion (especially when there is an intermediate device like a protocol converter that is interpreted as in some way “breaking” the routable protocol).

Just as with LERC, ERC can be defined in a very minimal way by removing the mitigations. In the same way that the new LERC definition simply says that LERC is present when a routable connection crosses the Low impact asset boundary, the SDT can just rewrite the ERC definition to say that ERC is present when there is a routable connection into the ESP, period.[vii] And the mitigations can be put in the Guidelines and Technical Basis, just as they will be for LERC. Specifically, the Guidance can say that a data diode mitigates the risk posed by ERC. And it can also provide use cases for how the “ability to access” can be removed – by steps like authentication and also terminating one routable protocol session and starting another.[viii] This should clear up the still-rampant confusion regarding ERC.[ix]

If the SDT wants to take my advice and use their LERC definition (and Guidance) as a model for ERC, they should have a much easier time addressing the latter. Essentially, the bulk of the discussion simply has to be about what guidance will be provided on how to mitigate the risk of ERC. The definition itself should be a piece of cake.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] Note that the LERC definition (and revised requirement) will be balloted and approved by NERC, and approved by FERC, long before the remaining items in the drafting team’s agenda – which effectively constitute CIP version 7 – are approved. This is because FERC set a deadline of March 2017 for NERC to submit the revised definition to them. While the drafting team will most likely have developed the first draft of v7 by that time, it will without a doubt be a long way from being completely approved by NERC, let alone submitted to FERC. Effectively, this means that CIP-003-7 will come into effect at least one or two years before the rest of “CIP version 7”.

It also means that NERC entities will have to comply with standards from three different CIP versions – 5, 6 and 7 – at the same time. This in itself isn’t bad or good, but the run-up to CIP v6 showed that many CIP compliance professionals don’t understand that version numbers only apply to individual standards, not to the CIP family of standards as a whole. The danger is that some entities may believe that the new CIP-003-7, and the new LERC definition, won’t come into effect until the rest of “CIP v7” does; they will then stick with the old definition and requirement as they prepare for the Sept. 1, 2018 compliance date for the physical and electronic access controls required by CIP-003-6 R2. Hopefully, a vigorous education process on NERC’s part will prevent this from happening. It is time for NERC to do some education about version numbers, rather than continue to pretend they are still “revising CIP version 5”.

[ii] Since FERC set the March 2017 deadline for NERC to submit the revised LERC definition, and since multiple drafts and ballots will undoubtedly be required before that can happen, the SDT has made the first LERC draft their big priority recently. There is no such deadline for ERC (or indeed for anything else on the SDT’s agenda), so that discussion will follow later.

[iii] You might ask, “If they are effectively requirement parts, why weren’t they just treated as requirement parts in the first place, rather than being put in an attachment?” I really can’t give a good explanation of that. For further explanation, I refer you to the well-known NERC CIP expert Lewis Carroll, who provided an excellent explanation of the logic of CIP version 5 in his two great works, Alice in Wonderland and Through the Looking Glass.

[iv] Yes, yes, I know. You’re going to point out to me that the entity has the option of only applying physical access controls to the Low BCS, not to the Low asset itself. For example, if all of the BCS at the asset are in a single room, the entity only needs to control access to that room, not to the whole asset. But this option is purely an artifact of the fact that, strictly speaking, there are no Low assets, any more than there are High or Medium assets. So the physical security requirements (for Highs, Mediums and Lows) have to apply to the BCS. I will leave it to the reader to decide whether it’s a great idea to leave all of the doors of a Low impact generating plant completely unguarded and unlocked, while still protecting the control room. I think it would be a much better idea to lock all of the doors, but strictly speaking that isn’t required by CIP-003-6 R2.

[v] You’ll notice I’ve just pulled a fast one on you. I’ve been saying so far that Section 3 of Attachment 1 constitutes the electronic access control requirement, and now I’m saying it’s actually just 3.1. This is because 3.2 applies to dial-up connectivity. While that is also electronic access (unless someone is calling in with an old crank telephone, where you have to ask the operator to connect you with so-and-so), it isn’t network access. So I should really have referred all along to network-based electronic access control (vs. “telephony-based” electronic access control). But even someone obsessed with correct word usage like me has their limits.

[vi] There was some concern on the SDT that, since the simplified definition introduces the concept of the asset boundary, there will now be a lot of concern about how to define that. Is it the plant’s fence line? Is it the actual walls? Etc. The team did start to work on verbiage for the Guidelines and Technical Basis that would try to define what “asset boundary” means. But I pointed out that, given that the term “asset” itself is undefined by NERC, trying to define its boundary is an exercise in futility. Others pointed out that it doesn’t particularly matter where the boundary is drawn, since the LERC definition now doesn’t include any provisions that break it. In other words, if an entity had (inexplicably) installed a data diode between the fence line and the wall of a Low impact generating plant, this would have made a difference under the current definition, since that definition includes the “bi-directional” condition. However, under the new definition that makes no difference at all; there is still LERC on each side of the data diode, which now has become a mitigating factor listed in the requirement, not part of the definition itself.

[vii] The fact that, in ERC, the connection is into the ESP, rather than being “across the asset boundary” as in LERC, is actually a huge advantage. As was discussed in the SDT meeting where the LERC definition was worked out, there is something odd about talking about a virtual concept like a routable protocol connection “crossing” a physical asset boundary. That goes away in the ERC definition, since both the routable connection and the ESP are virtual concepts; they “live” in the same virtual space.

[viii] It is possible that the mitigations that will be recommended in the Guidance for ERC will be stronger than those that will be recommended for LERC. For example, VLANs were one method of separating networks that was discussed at the SDT meeting as a mitigation for LERC. Someone objected that VLANs were not necessarily a secure method of separating networks. I spoke up and agreed with that statement; but I also said I didn’t think the potentially large cost of replacing VLANs in Low assets with separate switches would be justified by the benefits, since we are of course talking about Lows here. In the case of Medium and High impact assets, it might be cost-effective to state that VLANs are not a secure means of separating networks. I’m sure there will be other cases like this – where the mitigations suggested for ERC will be stronger than those suggested for LERC.

[ix] I have stated multiple times, including in this recent post, that the best way to address the ERC (and LERC) definition problem is with a series of use cases: In this case there is ERC/LERC; in this case there isn’t; etc. Effectively, by opting for a minimalist definition of LERC and putting use cases – although they’re called reference models – in the guidance, this is what the SDT has done for LERC. I am now suggesting they do the same thing for ERC, although possibly with different (stronger) use cases.
