Tom Alrich's Blog: September 2016

Monday, September 26, 2016

Fixing the LERC Definition

In June, after watching the CIP v7 Standards Drafting Team complete work on what I thought was an excellent approach to rewriting the definition of Low impact External Routable Connectivity (LERC), I wrote an exuberant post describing the SDT’s approach, and why I thought it was such an important development. So you can probably understand why I was chagrined when I read recently that this approach had been soundly rejected by the NERC ballot body, and the SDT will probably have to start over again on this important task.

Last week, I attended the NERC CIPC meeting in Albuquerque, where LERC was an important topic of conversation, both officially at the meeting and in private conversations I had with a couple of very knowledgeable CIP compliance professionals. These conversations have given me a clearer understanding of the main issues that caused the rejection.[i] And I think I know the best way for the SDT to address that rejection in their new draft of the LERC definition. Here’s some background which will hopefully support my proposal:

The current definition of LERC – developed with the CIP v6 standards - is “Direct user-initiated interactive access or a direct device-to-device connection to a low impact BES Cyber System(s) from a Cyber Asset outside the asset containing those low impact BES Cyber System(s) via a bi-directional routable protocol connection.”
There was a lot of controversy in 2015 regarding the meaning of External Routable Connectivity (ERC) for Medium and High impact BES Cyber Systems. While this controversy was specifically about ERC, it applied just as much to LERC (although NERC entities were much more focused on ERC in 2015, since the compliance date for Highs and Mediums was less than a year away). In fact, most of the controversy revolved around the meaning of the words “layer 7 application layer break” in Reference Model 6 (page 36) of the Guidelines and Technical Basis for CIP-003-6 (even though this technically applied to LERC, not ERC). This phrase was used by some entities to justify denying that some Medium impact BES Cyber Systems had ERC; they simply stated that an intermediate device like a protocol converter or an RTU would “break” ERC because it imposed a layer 7 break. Many observers – especially at NERC and FERC – thought this idea was being inappropriately used in cases where the intermediate device really didn’t break ERC at all. For example, since a protocol converter merely converts a routable data stream to a non-routable (serial) stream, it seemed to these observers that ERC wasn’t being “broken” at all, at least in this case.
FERC didn’t have any standing to intervene in the ERC dispute; they had approved the ERC definition along with the rest of CIP v5 and hadn’t ordered any changes. However, I’m guessing that they feared the same controversy would arise regarding Low impact assets as NERC entities started implementing CIP-003-6 Attachment 1 Section 3.1 (where the LERC definition applies); so they decided to clarify the issue by mandating a change in the LERC definition.[ii]
FERC approved CIP v6 – including the LERC definition – in January 2016, in Order 822. But they also directed (paragraphs 65-75) that NERC revise the LERC definition. Specifically, they seized on the word “Direct” and ordered that NERC clarify the meaning of that word to reflect the commentary in the Guidelines and Technical Basis. It seems to me that FERC thought that the controversy over the meaning of “layer 7 application layer break” could be resolved if NERC made it clear under exactly what circumstances there was direct communication and under what circumstances there wasn’t. Once that was clear, there would be no dispute over whether a particular intermediate device “breaks” LERC or not – all that would need to be done was to determine (using the clarified meaning of “direct”) whether there was direct communication or not. If the communication continued to be direct, the device clearly didn’t break LERC; if it was no longer direct, then the device did break LERC. In my post on Order 822 (second-to-last paragraph), I expressed skepticism that this approach would work, since I didn’t see any good way to define a “layer 7 break” without invoking some concept like “direct”, leading to a circular argument. As it happened, my point became moot when the SDT decided to change not only the LERC definition, but the requirement itself.
As I just said, the SDT didn’t simply redefine LERC (which now stands for Low impact External Routable Communications). They also modified the “requirement part” where LERC was applied (CIP-003-6 Attachment 1 Section 3.1) to make it non-prescriptive. You can see the whole account of what they did in the post referenced at the top of this post. The principal change was that they defined LERC as external routable connectivity that crossed the “asset boundary” of the asset; as long as there is any routable traffic across that boundary, there is LERC. In other words, a device that imposes a layer 7 protocol break won’t actually break LERC itself.
At first, this might seem like an outrageous change. What was the SDT doing, thinking they could take away everything that breaks LERC? Why, using the new definition, even data diodes won’t break it! I know some people thought this way, but this was simply because they weren’t looking at everything the SDT had done. True, the SDT took away everything that “broke” LERC, but they also changed the requirement that LERC applies to (CIP-003-6 Attachment 1, Section 3.1). It currently reads “For LERC, if any, implement a LEAP to permit only necessary inbound and outbound bi-directional routable protocol access.” They changed it to read “Implement electronic access control(s) for LERC, if any, to permit only necessary electronic access to low impact BES Cyber System(s).”
Of course, this is now a non-prescriptive requirement. It no longer says you have to implement a LEAP (Low impact Electronic Access Point) whenever there is LERC.[iii] It simply says you have to take steps to mitigate the risk posed by LERC. And what steps can you take? The Guidance and Technical Basis now lists a number of those steps (in a new set of “Reference Model” diagrams), but the SDT makes clear other steps are possible as well (and it is possible that some ways of implementing the steps in the diagrams may not fully mitigate the risk posed by LERC; the auditor will have to determine whether this happens, in each case). The possible steps that mitigate the risk posed by LERC now include the different measures that used to “break” LERC, including network separation and data diodes. These were both able to break LERC because the LERC definition implicitly allowed for that possibility. If there is network separation and the BCS are on a network that doesn’t communicate with the outside world, there obviously can be no “direct” communications from the outside, so the LERC definition no longer applies. By the same token, a data diode will eliminate “bi-directional” routable communication, so the LERC definition no longer applies in that case, either. It would have been helpful if the v6 SDT had made these two provisions explicit, rather than implicit. On the other hand, there are a lot of cases - at least 40 or 50 - of other v5 and v6 requirements being implicit in the wording of the standards. It would be nice to see all of those made explicit, but I doubt that will happen.
Besides these two measures now being included in the set of suggested LERC mitigations included in the Guidance and Technical Basis, those mitigations also include a number of measures that used to be subsumed in the idea of the “Layer 7 application layer break”. These include requiring re-authentication of the user, terminating the communications and starting a new one, and requiring network- or host-based inbound and outbound access permission. So the SDT solved the ambiguity of “Layer 7 application layer break” by eliminating the term, and listing specific steps in the Guidance that might be sufficient to mitigate the risk posed by LERC.
The bottom line is that, in spite of the changes the SDT made, all of the steps that used to legitimately “break” LERC – and therefore remove the requirement to implement a LEAP – have been re-christened as mitigations to the risk posed by LERC. If you take one of these steps, you will still not need to implement a firewall, although firewalls and similar devices (which used to be called LEAPs) are also listed as possible mitigations. The only steps that would no longer be valid ones would be those that depended on a device – like the protocol converter discussed above - that was claimed to “break” the routable protocol, but which didn’t perform one of the other steps listed above, such as re-authentication or terminating one session and starting a new one (which is what a terminal server often does). And frankly, I believe FERC was aiming at precisely this result when they ordered NERC to rewrite the LERC definition. I really think the SDT hit the nail on the head in their first draft on this issue, which makes its overwhelming rejection disappointing to me.
All of this has been to say that I don’t believe the objection that the SDT had “taken away” the steps that “break” LERC in the first draft was valid. Those steps (network separation and data diodes, devices that require re-authentication or start a new session, and network or host-based access permissions) are still valid measures to take, but they are defined as mitigating the risk of LERC rather than breaking it altogether. The end result, however, is exactly the same – an entity that deploys one of these measures will most likely have sufficiently mitigated the risk posed by inbound routable communications that they will not be in violation of the requirement. The idea that the SDT “took away” data diodes and network separation as ways to comply with the requirement regarding LERC is mistaken.
But this now leads us to the more serious objection to what the SDT did: The SDT “defined” LERC as always being present when a routable connection crosses the asset boundary.[iv] A lot of entities were concerned that they are now going to have to declare LERC at a Low impact asset, even if there are not any OT cyber assets connected to it. And you know what? This is exactly the case.[v]
But what are the consequences of this? Let’s look at the case where there is a routable external connection to a network of IT cyber assets (email server, work order server, some personal workstations, etc) at a Low impact asset; there are also some BCS that are on their own network without an external routable connection at all. Yes, there is LERC, according to the new definition. But the entity merely needs to point out that the risk posed by LERC is mitigated by the network separation; I find it very hard to believe that this statement will not be accepted, unless the auditor has reason to believe there is in fact a connection between the two networks. In other words, the new LERC definition (and the revised requirement for mitigating the risk posed by LERC) will at most impose a small paperwork burden on the entity; it will not require any additional compliance steps.[vi]
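To make the example above concrete, here is a minimal Python sketch of how I read the first-draft logic. It is only an illustration, not anything the SDT published: the Network class, the two function names and the sample network names are all my own assumptions. The point it shows is that declaring LERC (any routable link crossing the asset boundary) is a separate question from whether a Low BCS can actually be reached through that link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    """Hypothetical model of one network inside a Low impact asset."""
    name: str
    external_routable_link: bool  # routable traffic crosses the asset boundary
    hosts_low_bcs: bool           # at least one Low impact BCS sits on this network


def lerc_declared(networks):
    """First-draft reading: LERC exists if ANY routable link crosses the boundary."""
    return any(n.external_routable_link for n in networks)


def low_bcs_reachable(networks, internal_links):
    """True if a network hosting Low BCS is external-facing or routes to one that is.

    internal_links is a list of (name, name) pairs for networks that can route
    to each other inside the asset; an empty list means full network separation.
    """
    adjacency = {n.name: set() for n in networks}
    for a, b in internal_links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    reachable = {n.name for n in networks if n.external_routable_link}
    frontier = list(reachable)
    while frontier:
        for neighbor in adjacency[frontier.pop()]:
            if neighbor not in reachable:
                reachable.add(neighbor)
                frontier.append(neighbor)
    return any(n.hosts_low_bcs for n in networks if n.name in reachable)


# The case from the paragraph above: an IT network with an external routable
# connection, and a BCS network with no external connection at all.
asset = [Network("IT LAN", True, False), Network("BCS LAN", False, True)]
print(lerc_declared(asset))          # True: LERC has to be declared
print(low_bcs_reachable(asset, []))  # False: network separation mitigates the risk
```

If somebody later connects the two networks, the second check flips to True while the first is unchanged, which is the situation where I would expect an auditor to stop accepting network separation as the mitigation.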
However, some might say, “Why should we even impose this small burden? Let’s go back to the old way, where LERC could be broken by various means like network separation, and you didn’t have to do anything more if that were the case?” And the answer is that FERC (and NERC, to be honest; see their Memorandum on this issue from April 2015, although you can’t see it online since it has been withdrawn) is very concerned about the word “direct” being misinterpreted to allow pure protocol converters (as well as similar devices) to be considered as breaking the “direct” connection. We simply can’t leave the definition as it is in CIP v6.
On the other hand, I don’t think politically the SDT can simply take their first draft on this issue and resubmit it as the second draft, even with more of an education campaign and some beefed up Guidance to show people that they were wrong in voting down the first draft. While I would like to think everybody’s minds might be changed by this, I realize the SDT has to make some changes in the next draft.

So here are my suggestions for the next SDT draft on the LERC issue:

Go back to the idea of LERC being an external routable connection to a BCS at a Low impact asset (meaning the “asset boundary” wording goes away) that can be “broken” in various ways. However, all of the items that break LERC need to be explicit, not implicit in the definition. In other words, there should be a statement saying that LERC is not present if:

No Low BCS is routably connected to this external routable communications (i.e. network separation); or
The routable communications is not bi-directional (i.e. there is a data diode in place).

There should also be a provision in the definition dealing with the issue with smartphones, discussed in footnote vi below.
Of course, the word “direct” needs to stay out of the definition, since the legitimate applications of that word (re-authentication and new session) are now included as possible mitigations for LERC, and since the “illegitimate” applications of the word – for devices like protocol converters – are no longer allowed as mitigations by themselves.
Leave the revised requirement (which is found in paragraph 3.1 of Attachment 1 of CIP-003-7) as it stands from the first draft.
Remove from the set of Reference Models in the Guidance and Technical Basis the two models that deal with network separation and data diodes (since these are once again included in the definition as things that can break LERC, although that inclusion will now be explicit, not implicit).
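The definition I am suggesting can be sketched as a simple decision function. Again, this is only my own illustration under stated assumptions: the ExternalConnection class and its field names are hypothetical, and the smartphone provision is left out because I haven’t proposed specific wording for it. Note that re-authentication and new-session devices do not appear here; under my suggestion they remain mitigations under the revised requirement, not exclusions in the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalConnection:
    """Hypothetical description of one connection from outside a Low impact asset."""
    routable: bool
    reaches_low_bcs: bool  # some Low BCS is routably connected to this communication
    bidirectional: bool    # False when, for example, a data diode is in place


def lerc_present(conn):
    """Sketch of the suggested definition, with both exclusions made explicit."""
    if not conn.routable:
        return False
    if not conn.reaches_low_bcs:  # explicit exclusion 1: network separation
        return False
    if not conn.bidirectional:    # explicit exclusion 2: data diode
        return False
    return True


print(lerc_present(ExternalConnection(True, True, True)))   # True
print(lerc_present(ExternalConnection(True, False, True)))  # False (network separation)
print(lerc_present(ExternalConnection(True, True, False)))  # False (data diode)
```

Only when this returns True would the entity need to turn to the revised requirement and show which electronic access controls permit only necessary access.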

I think the approach I’ve just described meets the primary objections to the first draft fairly well, with only a small loss in elegance in the LERC definition. When I started writing this post, I was thinking there would be a more serious consequence of doing this – requiring an inventory of all Low BCS – but now I realize that isn’t the case. All in all, it isn’t a bad solution, and I hope the SDT will seriously consider it at their meeting in Winnipeg, Manitoba this week (which I cannot attend).

One other note to the SDT: This whole episode shows the danger of being too fascinated with the idea of elegance in wording (and I admit I was fascinated with that idea too, as my original post shows), especially when it involves a fairly radical shift from the wording people are already used to.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] I will be the first to admit that I haven’t actually read the voluminous comments that were submitted to NERC on the LERC issue. But the official discussion by Dave Revill of the SDT meshed very well with the private comments I heard, which leads me to believe I understand the two main objections.

[ii] I am going even further to guess that FERC realized that whatever new definition NERC came up with for LERC would end up applying to ERC as well, since it would be hard to have two different definitions for what is essentially the same thing. We’ll see if that happens or not; I know the SDT is discussing changes to the ERC definition, but they have to deal with LERC first because FERC set a deadline of early 2018 to get them the revised definition.

[iii] In fact, the SDT eliminated the definition of LEAP, since a firewall or similar device is now only one of many ways to mitigate the risks posed by the presence of LERC.

[iv] There was another objection that I haven’t discussed here, which is also a valid one: there were no mandatory criteria determining how to draw the asset boundary. This caused more than one entity to worry that an auditor would issue them a PV because his or her idea of the asset boundary didn’t correspond with the entity’s idea. While the SDT deliberately avoided saying the boundary has to be the fence line, the line around the building, etc, there was one criterion they discussed in the meeting I attended in June that didn’t make it into the guidelines: the asset boundary has to encompass all BES Cyber Systems at the asset. If the SDT had put this in the Guidance, I think it would have given an entity a good idea whether or not their definition of the boundary was valid and defensible.

[v] There is a related objection, which is that in a Low asset that is shared by multiple entities, if one entity has a routable external connection to BCS that it owns, the other entity will have to declare LERC, even though its own BCS don’t participate in that connection. This is certainly true. However, it will not impose any great regulatory burden on the other entity, who will simply point to the network separation as sufficient mitigation of the risk imposed by LERC at the asset. I will point out that there are a bunch of serious issues with CIP v5 and v6, caused by the fact that the standards currently take no official cognizance of the fact that there can be assets shared by multiple NERC entities. For several months, I have been promising a few people (and it seems that Florida is Ground Zero for this problem, since I’ve heard more about this issue from FRCC than any other region) that I will address this in a post. I still intend to do that, although I simply haven’t been able to take the time to research the different issues that make up the problem.

[vi] At the CIPC meeting, I heard another objection which sounds more serious: Since almost every smartphone is able to communicate routably, and that routable connectivity won’t usually end when its owner crosses the asset boundary of a Low impact asset, it is possible the entity would have to declare LERC many times every day, literally when almost anybody, employee or not, walks into the asset. Of course, the entity would simply have to declare in every case that the smartphone wasn’t communicating with any Low impact BCS in order to show that the LERC risk was mitigated; but it would be a royal pain to have to do this so often. I do think this is a serious objection, but it could easily be dealt with by putting some wording in the LERC definition that eliminated this possibility. I do agree that entities shouldn’t have to deal with this, since all of these instances would add up to a huge paperwork burden, in spite of the fact that there would be no further compliance obligation.

Friday, September 9, 2016

Reminder: Virtualization Webinar with Cisco and UTC

One of the most important developments in IT in the past ten years has been the rapid growth of virtualization – compute, network and storage. Use of virtualization has led to huge cost savings, as well as large efficiency gains, in IT environments – especially data centers. Even more importantly, virtualization greatly expands the repertoire of services IT can call upon to enable new business initiatives.

However, electric utilities subject to NERC CIP requirements are still struggling to take advantage of virtualization in their OT environment, even though they realize they would receive huge benefits – especially in control centers. This is because the CIP standards are totally silent on this topic – and this silence continues in CIP versions 5 and 6. Many utilities are too worried about inadvertently falling afoul of some CIP requirement to try virtualization in OT.

At the same time as utilities implement compliance with CIP versions 5 and 6, NERC and the Regions have made it clear they want utilities to feel comfortable introducing virtualization. However, they have not provided any definitive guidance on how to do this in a CIP-compliant manner. NERC has ordered the new "CIP v7" Standards Drafting Team to develop revised requirements or guidance, so that CIP will finally address this topic. But it will be close to three years before the new version comes into effect.

Where does this leave the utilities? This webinar will try to answer that question.

Tom Alrich and Joe Andrews of Deloitte will discuss how virtualization can work under CIP versions 5 and 6, and how v7 may finally settle this issue.
John Reno of Cisco will discuss the many advantages that electric utilities and IPPs can realize through implementing virtualization on their OT networks. This includes server, switch, and storage virtualization.
Steve Sumichrast of Northern Indiana Public Service Company will discuss some of the lessons learned from NIPSCO's successful implementation of virtualization in their control centers in 2011.

Date: Thu, Sep 15, 2016
Time: 2:00 PM EDT
Duration: 1 hour

Host: Bob Lockhart

Presenters:
Tom Alrich, Deloitte & Touche LLP
Tom Alrich is a Manager in Cyber Risk Services with Deloitte Advisory, part of Deloitte & Touche LLP. He has worked in cyber security for 16 years, and with NERC CIP since CIP version 1 was approved in 2008. He has worked with over 30 NERC entities to understand and implement CIP versions 1 through 6. He writes a popular blog on developments in CIP.

Joe Andrews, Deloitte & Touche LLP
Joe Andrews is a Manager in Cyber Risk Services with Deloitte Advisory, part of Deloitte & Touche LLP. He spent five years as a CIP auditor with the Western Electricity Coordinating Council (WECC). Previously, he worked in cyber security for the US Department of Defense, based in the US, Europe and Japan. He holds many certifications, including CISSP, CISA and PSP.

John Reno, Cisco
John Reno manages product and solutions marketing for Cisco IoT. Previously, John directed the product marketing group at Silver Spring Networks, drawing on over fifteen years of experience in software applications, infrastructure management and system design. For the past ten years John has launched and led go-to-market initiatives for network and data security companies such as Securify (acquired by Intel/McAfee) and EMC/RSA.

Steve Sumichrast, NIPSCO
Steve Sumichrast is the Lead System Engineer for NIPSCO's Operations Technology department, and has worked in the department since 2010. He is responsible for implementation and adherence to NERC CIP standards for all server, workstation, storage and virtualization infrastructure used by real-time systems. He holds numerous industry certifications, including certification from Cisco, NetApp and VMware.

To register, go here

Wednesday, September 7, 2016

The Virtualization Conundrum, Part II

I wrote a post in May in which I pointed out that many NERC entities – and perhaps even the majority of them – are holding off implementing virtualization (server, switch and storage) within their ESPs. They are doing this in spite of encouragement from both NERC and the regions. Their reasoning is that, since the CIP standards through version 6 have been entirely silent on virtualization, there is a good chance they will fall afoul of one or multiple CIP requirements if they do this. Of course, this is a shame, since there are huge benefits to be realized through virtualization – as almost any IT manager can attest!

In the post, I pointed out that help is coming on two fronts. First, NERC is quite concerned about the fact (which they don’t dispute) that CIP is inhibiting technological innovation in several important areas, including virtualization. So it is likely that your region will bend over backwards to make sure you don’t create any egregious faux pas (like mixing ESP and non-ESP VMs in a single server). Several other speakers and I will be making this point in a webinar on September 15.

Second, there is a permanent “fix” to the problem coming, in that the Standards Drafting Team currently at work on CIP version 7 (although I’m not sure I’m allowed to use that phrase!) has virtualization on its plate. In other words, v7 will finally a) recognize that virtualization is an important technology that can help NERC entities become much more efficient in their OT operations (especially in their control centers), and b) provide definitions and modified requirements that let entities know what the rules are for virtualized environments – so that the current huge level of uncertainty will finally be removed. This is the good news; the bad news is that it will be 2-3 years before the revised CIP standards are in effect.

The SDT has held a number of discussions of this virtualization question, both in weekly calls and in their monthly face-to-face meetings. I confess that I hadn’t had time to attend most of these, so I didn’t follow the discussion much until I attended the SDT’s three-day meeting in southern California in August. At that meeting, I was able to listen to (and participate in) the virtualization discussion, and found it very interesting.

I will point out that a good technical understanding of virtualization technologies is definitely above my pay grade, so frankly some of the discussion was over my head. But I understood enough to realize that my fairly simplistic idea of what needs to be done to incorporate virtualization into CIP is wrong.

I had previously thought that the biggest problem with server virtualization in CIP was that the definition of Cyber Asset includes the word “device”. In my mind, a device is something that will hurt if you drop it on your foot. Even if you could figure out how to drop a VM on your foot, it wouldn’t hurt; ergo, a VM can’t be a Cyber Asset. Strictly speaking, this means that VMs are completely outside of CIP, even if implemented within an ESP. You should be able to virtualize all you want, while applying no protections at all to the VMs – and never face any consequences due to CIP non-compliance. Of course we all know that, even if an entity actually tried to do this, they would face overwhelming pressure from their NERC Regional auditors to either treat VMs as Cyber Assets (and BES Cyber Systems, EAPs, PACS or PCAs if applicable), or simply remove them from the ESP altogether.

This is why I previously thought that the fundamental step the SDT needed to take toward incorporating virtualization into CIP was to revise the Cyber Asset definition to include virtual cyber assets. Once they did that, I reasoned, a lot of the other requirements wouldn’t need tweaking at all, and the few that did would be fairly easy to modify. However, the discussion at the August SDT meeting quickly convinced me that incorporating virtualization into CIP will be a much bigger job than I had thought.

The clue for me came during a discussion of separating ESP VLANs from non-ESP VLANs in a virtualized switch environment. It was pointed out that simply setting up the VLANs so that they are separated isn’t good enough; that separation needs to be maintained over time. If these were physical networks implemented on separate switches, maintaining the separation would be much easier – since that separation could only be changed if there were some sort of hardware change in the switches (e.g. a new cable is plugged in, connecting an ESP switch with a non-ESP switch).

But since VLANs are virtual networks, not physical ones, maintaining their separation requires maintaining the switch software configurations so that an errant command doesn’t cause the two networks to become one. Therefore, there probably needs to be an ongoing software configuration management requirement for VLANs that does not apply to physical networks. Of course, the SDT can write such a requirement, but it will certainly not be an easy task.[i]
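To give a rough idea of what ongoing configuration management for VLAN separation might check, here is a small Python sketch. It is my own illustration and not anything discussed by the SDT: the function name, the port names and the VLAN numbers are hypothetical, and it assumes the approved baseline and the current port-to-VLAN assignments have already been extracted from the switch configurations by some other means (no parser is shown).

```python
def find_vlan_drift(baseline, current, esp_vlans):
    """Compare current port-to-VLAN assignments against an approved baseline.

    baseline and current map a port name to the set of VLAN IDs it carries.
    Returns one finding per port whose assignment changed, flagging changes
    that add or remove an ESP VLAN, since those could merge ESP and non-ESP
    networks.
    """
    findings = []
    for port in sorted(set(baseline) | set(current)):
        approved = set(baseline.get(port, ()))
        actual = set(current.get(port, ()))
        if approved == actual:
            continue
        changed = approved ^ actual  # VLANs added to or removed from the port
        findings.append({
            "port": port,
            "approved": sorted(approved),
            "actual": sorted(actual),
            "touches_esp": bool(changed & set(esp_vlans)),
        })
    return findings


# Hypothetical data: VLAN 10 is the ESP VLAN, VLAN 20 is a corporate VLAN.
approved_baseline = {"Gi1/0/1": {10}, "Gi1/0/2": {20}}
current_config = {"Gi1/0/1": {10}, "Gi1/0/2": {10}}  # an errant command moved port 2

for finding in find_vlan_drift(approved_baseline, current_config, esp_vlans={10}):
    print(finding)
```

With physical networks on separate switches, there is nothing comparable to check short of somebody plugging in a cable; with VLANs, a comparison like this would have to be run (and evidenced) on an ongoing basis, which is why I think writing the requirement will be harder than it looks.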

But there was yet another turn of the screw[ii] in this discussion. If software configuration management is required to maintain the integrity of an ESP, then the ESP definition itself may have to be changed, to call attention to the fact that an ESP can change or even be eliminated, due to a single errant switch configuration change. Again, this definition change is certainly doable, but it will not be a piece of cake as I’d originally anticipated.

The moral of this story is that dealing with virtualization alone – let alone the other items on the SDT’s plate – may turn out to require a good deal more time than the SDT is currently anticipating. I believe they are hoping to have the first draft of the new standards finished by the end of this year; that may be a stretch, just because of the issues that need to be addressed for virtualization.

But there is another moral to this story. It’s one that I have been harping on a lot lately, and you can anticipate I will continue to harp on it in the future. The big problem here isn’t just that the CIP standards don’t address virtualization now; there are a number of other technologies (like the cloud) that they don’t address either. Rather, the problem is that the prescriptive nature of the NERC CIP standards makes extending them to address new areas very hard.

I believe that the right approach would be to make the NERC CIP standards what I and others call “threat-based” (or perhaps “outcomes-based”; even though at first glance these two terms might seem very different, I think in this case they may turn out to be effectively synonyms). Entities would be required to address certain cyber security practices like patch management and network separation[iii]; and they would have to do this across certain domains like OT systems.

Including a new technology like virtualization in threat-based standards would be easy. There would be an Applicability section, which described what the requirements applied to. Adding virtualization to the CIP scope would only require creating a definition and stating that virtualized systems and networks were now also in scope.[iv]

For a deeper discussion of virtualization and how it interfaces with NERC CIP, you should consider attending the upcoming Deloitte/Cisco/UTC webinar on this topic.

[i] I would think there will be need for a similar requirement applying to virtualized servers. Since VMs can easily be migrated from one network to another, there probably needs to be a requirement to ensure that a VM isn’t migrated from an ESP into a non-ESP network, or vice versa. As with VLANs, this requirement can be written, but it won’t be particularly easy.

[ii] Of course, this is a deliberate reference to Henry James’ great ghost story (actually a novella), The Turn of the Screw.

[iii] And while the entity would be required to address each practice, there wouldn’t be prescriptive requirements saying, for example, that entities have to assess new patches for applicability within 35 days, and woe betide you if you take 36 days for one particular patch! There would be guidelines for how to structure – again as an example - an effective patch management program; the auditor would need to decide whether or not the entity’s program was effective, based on the evidence.

[iv] Hopefully, it wouldn’t even take an SDT to make this sort of change. I envision an industry body that would manage the CIP program and be responsible for decisions like this. It would be constituted by the CIP standards, but it wouldn’t need to amend those standards for changes like this one.