Tom Alrich's Blog: May 2017

Tuesday, May 30, 2017

The News from RF, Part V: Felek Tackles Jungle Muxes

I’ve already mentioned that the highlight of RF’s CIP workshop in April was the joint presentation by Felek Abbas of NERC and Lew Folkerth of RF, which addressed various current topics in CIP. Felek gave a very good presentation on CIP-003-7, which includes the revised requirement for electronic access control at Low impact assets that have external routable connectivity (CIP-003-7 was approved by the NERC ballot body and Board of Trustees, and is now awaiting FERC approval).

Felek spent a lot of time on the Reference Models that provide examples of different ways to comply with this new requirement; these are found in the Guidance and Technical Basis for CIP-003-7. When he discussed Reference Model 5 (found on page 39), he pointed out - and I agree with him - that this model addresses FERC’s concern that led them to order NERC to clarify the meaning of the word “Direct” in the LERC definition in CIP v6 (as you probably know, the current CIP Modifications drafting team decided to eliminate the defined terms LERC and LEAP altogether, instead incorporating them into the requirement itself. For a description of what they did, read this post, although if that isn’t punishment enough, you could read this one). I’d like to clarify why Reference Model 5 does actually address what FERC asked for, although this isn’t the point of this post. You can skip the next two paragraphs if this historical item isn’t of interest to you.

Reference Model 5 shows a routable communication stream entering a Low asset, but then being converted to a non-routable protocol for connection to a Low impact BES Cyber System. In 2014 and 2015, as NERC entities realized the clock was ticking on coming into compliance with CIP v5, there was a lot of discussion about whether or not just converting a routable to a serial connection (e.g. with a protocol converter) was enough to “break” External Routable Connectivity for Medium BCS. The regions and NERC made it fairly clear that merely converting the protocol wouldn’t break ERC, but it was also clear that the ERC definition itself didn’t provide an answer to this question.

Of course, at the time there was no requirement regarding electronic access control at Low assets (and also no LERC), since the CIP v6 drafting team was still working on this. CIP v6 was approved in January 2016, and by then FERC was worried enough about this issue that they explicitly included it in their order approving v6; that is, they required NERC to clarify the meaning of “Direct” in the LERC definition. Reference Model 5 shows (using the green dotted line) that the protocol conversion by itself doesn’t change the fact that there is external routable connectivity to a Low BCS. However, the fact that the “non-BES Cyber Asset” in Reference Model 5 also performs electronic access control is deemed sufficient mitigation of the risk caused by the presence of external routable connectivity to one or more Low BCS – which is the objective of the revised requirement. So Reference Model 5 shows one way to comply with the revised requirement in CIP v7 (indeed, the other Reference Models all show alternative ways to comply).

At this point, somebody in the audience raised the question of “jungle muxes”. These are devices – often found in substations – that accept a routable external connection, but then convert that into a number of serial connections to a set of serial devices (often relays). The question was whether, since the mux does communicate routably, it has to have an Electronic Security Perimeter around it.

Felek answered this question by going back to a Lesson Learned that dealt with communications between two Medium and/or High assets, at least one of which doesn’t have an ESP (i.e. it doesn’t have any internal routable network). The Lesson Learned was occasioned by the fact that Section 4.2.3.2 of all the CIP v5 and v6 standards (it’s also found in v7, at least in CIP-003-7) says that cyber assets involved in communications between “discrete” ESPs are exempt from CIP; at the same time, under v5 and v6 (unlike in v3) there can be assets subject to CIP that don’t contain any routably connected devices. The Lesson Learned addressed the question whether the cyber assets that are associated with the communications network between two Medium and/or High assets, one or both of which doesn’t have an ESP, are also exempt from CIP, and if so where the “inter-asset” communications stream begins in an asset without an ESP.

That Lesson Learned introduced the idea of the Demarcation Point (which is an old idea from the days when dinosaurs roamed the Earth and data communications was mostly over phone lines), as the device where the external communications carrier “takes over” from the building’s internal communications network. The LL said that an entity complying for an asset without an ESP should designate the device that is best described as the demarc point. All cyber assets that are “inside” of this device are potentially in scope for CIP; all cyber assets that are “outside” of this device are not.
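The demarc rule lends itself to a small sketch. It assumes, hypothetically, that an entity records each device's next hop toward the external carrier, and labels every device relative to the designated demarc. The device names and the `classify_by_demarc` helper are illustrative inventions, not anything from the Lesson Learned. The demarc device gets its own label rather than being forced into either bucket, since how to treat that device is exactly what the jungle mux question turns on.

```python
from typing import Dict, Optional


def classify_by_demarc(next_hop: Dict[str, Optional[str]], demarc: str) -> Dict[str, str]:
    """Label each device 'inside', 'demarc' or 'outside' relative to the demarc.

    next_hop maps each device to the next device on its route toward the
    external carrier (None marks the far end). A device is 'inside' if its
    route out of the asset passes through the designated demarc.
    """
    if demarc not in next_hop:
        raise ValueError(f"unknown demarc device: {demarc}")
    labels: Dict[str, str] = {}
    for device, first_hop in next_hop.items():
        if device == demarc:
            labels[device] = "demarc"
            continue
        hop, seen = first_hop, {device}
        # Walk outward until we reach the demarc or run off the end of the path.
        while hop is not None and hop != demarc:
            if hop in seen:  # guard against a misrecorded routing loop
                raise ValueError(f"routing loop at {hop}")
            seen.add(hop)
            hop = next_hop.get(hop)
        labels[device] = "inside" if hop == demarc else "outside"
    return labels


# Hypothetical substation: two serial relays behind a jungle mux.
topology = {
    "relay-1": "jungle-mux",
    "relay-2": "jungle-mux",
    "jungle-mux": "carrier-router",
    "carrier-router": None,
}
print(classify_by_demarc(topology, "jungle-mux"))
# {'relay-1': 'inside', 'relay-2': 'inside', 'jungle-mux': 'demarc', 'carrier-router': 'outside'}
```

Only the inside devices would then be candidates for CIP scoping. The carrier's equipment falls outside, which is the point of designating a demarc in the first place.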

Felek said that, even though the jungle mux does have a routable connection, the fact that this connection is used purely for external communications – and that there are no other devices at the Low asset that communicate routably – means the mux itself can be considered the demarcation point. Thus, it doesn’t need an ESP.[i]

I thought Felek’s answer was very good, but I was even more impressed by the fact that he was willing to state it in a meeting with 200 or so representatives from NERC entities in attendance. There are some who might consider what he said to be an “interpretation” of the wording of a standard, rather than simply implementation guidance. NERC and regional employees are permitted to provide the latter but not the former; in practice, this has meant that a lot of questions about CIP v5 and v6 have only been answered in private conversations (primarily in the private Small Group Advisory Sessions that were conducted in 2015 and 2016) or not at all. Felek wasn’t afraid to publicly state his opinion on a disputed issue like this; I wish it would happen more often.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte.

[i] Of course, this isn’t to say that the serial devices that connect to the mux don’t have external routable connectivity. Whether they do or don’t depends on what happens at the jungle mux. The definition of ERC is “The ability to access a BES Cyber System from a Cyber Asset that is outside of its associated Electronic Security Perimeter via a bi-directional routable protocol connection.” If all the jungle mux does is convert the routable communications stream to serial and send it to the proper device, then there is ERC. If the mux performs some sort of authentication, then this probably removes that ability, and there is no ERC.

However, I’ll admit this is still an ambiguous area, and that points to an interesting irony: FERC’s qualms about the ERC definition (discussed above in this post) led to a change, but the change was in the LERC definition, not ERC. When FERC ordered NERC to clarify the meaning of “Direct” in the LERC definition – in Order 822, which approved CIP v6 in January 2016 – they weren’t reacting to problems with LERC, since it wasn’t in effect yet. They were really responding to the discussions about ERC in 2014 and 2015 that I described above. (I linked above to just one of the posts I wrote on this issue; others include this one, this one and this one, in the order I wrote them.) I believe FERC was especially concerned about the idea that mere protocol conversion “breaks” ERC.

If you want to plow through these three posts, you will see that, by the third one, I had concluded (in conjunction with Morgan King, a WECC auditor who has weighed in a lot on technical issues like this) that mere protocol conversion didn’t break ERC, but requiring re-authentication would. I thought when I wrote the third post that this was the last word on the subject. However, in this post, which was written a few days after the third one, I reported that another auditor said this wasn’t enough to break ERC – that there had to also be “reformulation of the user’s commands and the acting as a proxy for the user”.

At this point, I threw up my hands and decided ERC was a black hole – I could write 100 posts and never get to the bottom of the problem, which was that the definition needed to be rewritten. I believe this is one of the tasks on the CIP Modifications SDT’s agenda, and I hope they do this. I think they should build on the very good work they did with the LERC issue last year, and base their new definition of ERC on use cases similar to the ones they developed for CIP-003-7 (although some of those cases won’t apply because the new Section 3.1 “defines” what used to be LERC in terms of the asset’s boundary, in order to avoid requiring a list of BCS at Lows). I think any attempt to write a dictionary-style definition will fail. They just need to say “In these cases, there is ERC. In these cases, there isn’t ERC.”
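In the spirit of that use-case approach, the competing readings of ERC in this footnote can be laid out side by side. This is a hypothetical sketch only. The `BoundaryDevice` fields and the view names are my own labels for the positions described above, not NERC terms, and the code is in no sense official guidance.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class BoundaryDevice:
    """Hypothetical model of what the routable-to-serial device (e.g. a jungle mux) does."""
    requires_reauthentication: bool = False
    reformulates_and_proxies: bool = False  # rebuilds user commands, acts as a proxy


def reauth_view(d: BoundaryDevice) -> bool:
    """View reached with Morgan King: re-authentication breaks ERC."""
    return not d.requires_reauthentication


def proxy_view(d: BoundaryDevice) -> bool:
    """Other auditor's view: re-authentication plus reformulation/proxying is needed."""
    return not (d.requires_reauthentication and d.reformulates_and_proxies)


# Both views agree that protocol conversion alone leaves ERC in place,
# so a "conversion only" device is simply BoundaryDevice() with defaults.
VIEWS: Dict[str, Callable[[BoundaryDevice], bool]] = {
    "re-authentication": reauth_view,
    "re-authentication + proxy": proxy_view,
}


def erc_verdicts(d: BoundaryDevice) -> Dict[str, bool]:
    """Return each view's verdict; True means that view finds ERC."""
    return {name: view(d) for name, view in VIEWS.items()}


for case in (
    BoundaryDevice(),
    BoundaryDevice(requires_reauthentication=True),
    BoundaryDevice(requires_reauthentication=True, reformulates_and_proxies=True),
):
    print(case, erc_verdicts(case))
```

The middle case, where a device re-authenticates but doesn't proxy, is where the two views disagree. A use-case-based definition would settle exactly that kind of case explicitly.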

Monday, May 29, 2017

The News from RF, Part IV: The Causes of CIP Non-Compliance

I’m still working through the list of posts I wanted to write about interesting things I learned or observed at the RF Spring CIP Compliance Workshop in April. I hope to have them finished by the fall workshop in October, at which point I’ll no doubt have another set of posts to write.

The workshop started out with a very interesting presentation called “2016 CIP Violation and Themes Update” (to get the slides, go here and find the “Spring CIP v5 Workshop” under Seminars/Workshops 2017. This presentation is the first one, starting on slide 2). Rather than introduce it, I’ll refer you to an article about it by Peter Behr in the daily Energywire newsletter published by Energy and Environment News (that is a subscription service, but I highly recommend it as having the best original reporting – as opposed to restating press releases – of any of the energy news services).

In addition to what is said in the article (which includes a quotation from me toward the end), here are some random points I noted as I listened to the presentation:

The presentation discusses five primary causes (they use the word “themes”) of CIP violations. These are compliance silos, disassociation, inadequate tools, outsourcing and lack of awareness.
Regarding silos, these can be both “vertical” silos – HR, IT, etc. – and horizontal silos, such as executives/managers/field people, etc.
Horizontal silos can lead to “analysis paralysis”, in which self-reports and other documents take an excessively long time to work their way through the different layers of the organization.
Another reason that silos develop is acquisitions. RF recommends learning all you can about the acquired company and their culture before you simply impose your compliance program on them; a different program may be warranted.
Here are three symptoms of lack of awareness: First, middle management only provides good news to executives, not bad news. Second, experts aren’t in the right roles. Third, root cause analysis of violations is inadequate; one sign of this problem is a large number of self-reported violations that are attributed to user error and list training as the mitigation.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte.

Friday, May 26, 2017

What are “Security Patches”?

CIP-007 R2 is my poster child for an overly prescriptive requirement; I know many NERC entities are spending huge amounts of time complying with just this requirement. So I always have my antennae up for policies that might make the burden even greater than it needs to be.

In a post at the end of March, I reported on a presentation by Eric Weston, one of the WECC CIP auditors, at their spring CIP User Group meeting. In that presentation, Eric stated that the phrase “security patches” in CIP-007-6 R2 means more than just vendor patches that are explicitly designated as such. He said it is the entity’s responsibility to read the descriptions of all patches issued by the vendor – that apply to the system in question – to determine whether they patch a security vulnerability. If a patch does address a vulnerability, then it is a security patch and needs to be treated as such.[i]

At first, I was upset by this, since it seemed to me that the entity could be found in violation if they missed a security patch that wasn’t explicitly identified as such. But Eric explained that, as an auditor, he wanted to make sure the entity’s patch program entailed finding more than just the patches explicitly labeled as security patches; however, the entity wouldn’t necessarily be found in violation if they missed a patch that addressed a security vulnerability but hadn’t been labeled a security patch. That seemed fair to me.

Another auditor had a slightly different take on this, although I don’t think he differs in principle from Eric. He said:

“We all know that vendors do not always identify software updates as security related. A Microsoft Service Pack will most definitely include security patches. A Cisco IOS update may or may not include a fix to a vulnerability. A SEL update most likely will not, but it has happened. You need to read the release notes and other available information, and not just rely on the title or abstract description of the update.

“Some folks rely on the National Vulnerability Database to identify vulnerabilities and tie them to available updates. That works only to the extent that the vendor posts information to the NVD. You need to fill the gaps with other sources.

“If you do not look at updates released by your favorite vendor, how do you know if any are security patches or are version updates that include security fixes? If you do not look at release notes, the NVD, or other available documentation on a software patch or update, you have no way of knowing which updates are applicable security patches. So either do the research or install everything just in case. And, yes, if you miss something and my team can determine that, there will be a finding. If your process is faulty and you are at risk of missing something in the future, there will be an Area of Concern.”
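The auditor’s point that a vendor’s label isn’t enough can be sketched as a check that consults several sources. This is a hypothetical illustration. The `Update` record and its fields stand in for whatever an entity’s patch-tracking process actually captures, and nothing here queries the real NVD or any vendor feed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Update:
    """Hypothetical record of one vendor update for a system in scope."""
    name: str
    vendor_says_security: bool = False    # title/abstract labels it a security fix
    release_notes_fix_vuln: bool = False  # release notes describe a vulnerability fix
    linked_cves: List[str] = field(default_factory=list)  # CVEs tied to it, e.g. via the NVD
    other_sources_fix_vuln: bool = False  # advisories or other research that fills the gaps


def is_security_patch(u: Update) -> bool:
    """Treat the update as a security patch if ANY source ties it to a vulnerability fix.

    Checking only vendor_says_security would miss, for example, a version
    update whose release notes mention a security fix.
    """
    return (
        u.vendor_says_security
        or u.release_notes_fix_vuln
        or bool(u.linked_cves)
        or u.other_sources_fix_vuln
    )


updates = [
    Update("service pack", release_notes_fix_vuln=True),
    Update("OS version update", linked_cves=["CVE-XXXX-YYYY"]),  # placeholder ID
    Update("firmware feature release"),
]
for u in updates:
    print(u.name, is_security_patch(u))
```

The design choice is simply an OR across sources. Any single source that identifies a vulnerability fix is enough, which mirrors the advice to read release notes and fill gaps rather than trust the title.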

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte.

[i] However, this does not apply to functionality enhancements that provide a new feature that can be used to improve security on a device – for example, extending the maximum password length. I wrote a post on this distinction in 2015. Functionality enhancements are of course important to apply from a good security practice point of view, but this isn’t a requirement of NERC CIP.

An auditor elaborated on this question when he said: “First of all, a patch fixes a bug. Ergo, a security patch fixes a security bug. That is an exploitable defect in the code. A feature update is not a security patch, even if the new feature improves overall security. A feature update is not correcting a vulnerable code defect. That said, I will discuss with the entity the difference between focusing on compliance as opposed to focusing on protecting their systems when they argue that [they] have no obligation to do anything with the new feature because the CIP Standards do not require it.”

Tuesday, May 16, 2017

What Systems Should be in Scope for CIP?

In my post yesterday, my second on Wannacry, I was addressing the emergency patches made available on Friday for Windows XP, Vista and Server 2003 – all out-of-support operating systems. I had already published a “Public Service Announcement” on the need to apply that patch on all systems at High and Medium impact assets in scope for CIP. But an auditor emailed me to point out that, for security (not compliance) reasons, the patch should be applied to all devices on the OT network that run one of the three old operating systems; this includes devices found in Low impact assets, Distribution substations and generating plants that are below the threshold for inclusion in the BES. The auditor’s reasoning was good: Even though these devices aren’t directly networked with Cyber Assets in scope for CIP (if they were, they’d at least be PCAs), they could still pose a substantial risk to the BES if they became infected with Wannacry.

Of course, many NERC entities will argue that they already have great defenses protecting their networks subject to CIP – those in High and Medium impact Control Centers, and in Medium impact substations and generating stations – from their networks that aren’t subject to CIP. And I’m sure this is almost always the case, to a large degree due to CIP, which does require thorough separation of ESPs from other networks. But this didn’t deter the auditor from still advocating (coming as close to “requiring” as he could) that the discontinued OS’s on non-ESP networks also should be patched.

And the reason for this is simple: There is no such thing as a 100% effective security measure. For a threat as serious as the one posed by Wannacry, however small the chance that it could spread from, say, a distribution substation to a High impact Control Center, almost any security measure would be justified to prevent that from happening.

But if this is the case, why aren’t these other systems subject to CIP? If there’s even a small chance that they could be the vector for an attack like Wannacry that could lead to a serious event on the Bulk Electric System, shouldn’t there be at least some protections (e.g. patching, in the event of a serious threat like Wannacry) that would apply to them?

Or, to use another attack as an example, the Ukraine attacks in December 2015 didn’t originate on the OT network; they started with phishing emails opened by people who had no connection at all to actual operations. Yet by opening these emails, these people inadvertently made it possible for the attackers to have free rein over the IT network and search diligently for a way to get into the OT network – which they inevitably found.

As I’ve said before, I do think IT assets need to be included in CIP in some way. I also believe that non-CIP OT assets (such as the ones discussed above with reference to patching) should also be included. More generally, I think that every cyber asset either owned or controlled by the NERC entity should be included in scope for CIP. But there are a few caveats to that:

I certainly don’t want these new assets to be treated as BES Cyber Systems or Protected Cyber Assets. This would impose a huge burden on NERC entities, for a much-less-than-proportional benefit.
The only way the new assets should be included is if CIP – and the enforcement regime that goes with it – is totally rewritten, along the lines of the six principles I discussed in this post.
My fifth principle is “Mitigations need to apply to all assets and cyber assets in the entity’s control, although the degree of mitigation required will depend on the risk that misuse or loss of the particular asset or cyber asset poses to the process being protected.” In practice, I think there need to be at least two categories[i] of cyber assets in scope: direct and indirect impact. Direct impact cyber assets are those whose loss or misuse would immediately impact the BES; these are essentially BCS, but I would of course change the definition to fix some of the current problems. Indirect impact cyber assets are those that can never themselves directly impact the BES but can facilitate an attacker, as happened in the Ukraine (and as would have happened had any utilities been compromised by WannaCry – since their OT networks aren’t connected to the Internet, the initial infection would have been on the IT network). Essentially, all systems on the IT network, as well as systems at Low impact BES assets and at Distribution assets, would fall into this category.[ii]

As I said in my Wannacry post from Saturday, I’m now leaning more toward the idea of having a separate agency, probably within DHS, regulate cyber security of critical infrastructure. This would include the power industry, oil and gas pipelines, water systems, chemical plants, etc. I’m not suggesting this to punish NERC, but because I believe there will be a lot of advantages to having one regulator overseeing all of these industries, as opposed to separate regulators for each one. For one thing, there would be a lot of synergies, since the similarities among critical infrastructure in these industries are much greater than the differences between them (for example, if you look at my six principles, you’ll see they don’t refer to power at all). For another, I think the power industry, which has had by far the most experience with cyber regulation, would be in a good position to share its lessons learned with the others.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte.

[i] Note these categories don’t have anything to do with the High, Medium and Low impact categories in the current CIP v5/6/7 (and soon 8!). As I pointed out what seems like 50 times a few years ago, when I was digging into the swamp known as CIP-002-5.1 R1 and Attachment 1, those are really not categories of BES Cyber Systems (even though they are identified as such in the requirement); they’re categories of BES assets (substations, etc.). I think I first pointed this out when FERC issued their NOPR saying they’d approve CIP v5 in April 2013 (see footnote vii as well as my response to the first comment at the end of the post).

[ii] I’m not ruling out the possibility that there might need to be other categories, or sub-categories of these two.

Monday, May 15, 2017

Follow-Up on Wannacry

I received a few interesting comments on my Saturday post on the Wannacry worm, which I would like to share with you.

First, an auditor wrote me regarding the Public Service Announcement from Lew Folkerth of RF, which I included in that post. That announcement pointed out that there is now a patch for Windows XP, Vista and Server 2003. If you have Medium or High impact BES Cyber Systems and have BCS or PCAs running one of those OS’s, you are now on notice that a security patch is available for them (for the first time since support was discontinued, I believe). You’re required to install that patch (or create or update a mitigation plan) per the schedule in CIP-007 R2. But you should really install it ASAP, rather than waiting as long as the schedule allows. This isn’t required by CIP, but it should be required by common sense, not to mention the news reports.
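To see why “ASAP” is worth stressing, here is a small date calculation. It is a sketch under stated assumptions. The relevant clocks are taken to be the two 35-calendar-day cycles in CIP-007-6 R2: Part 2.2 (evaluate patches at least once every 35 calendar days) and Part 2.3 (within 35 calendar days of completing the evaluation, apply the patch or create or revise a dated mitigation plan). The sketch also assumes that “35 calendar days” simply means adding 35 days to a date. The evaluation dates are hypothetical, so check the counting convention your own program uses.

```python
from datetime import date, timedelta

# Assumption: both CIP-007-6 R2 cycles are 35 calendar days, counted by simple addition.
CYCLE = timedelta(days=35)


def next_evaluation_due(last_evaluation: date) -> date:
    """Latest date for the next patch evaluation (Part 2.2)."""
    return last_evaluation + CYCLE


def action_due(evaluation_completed: date) -> date:
    """Latest date to apply the patch or create/revise a dated mitigation plan (Part 2.3)."""
    return evaluation_completed + CYCLE


release = date(2017, 5, 12)      # Microsoft's emergency patch release
last_eval = date(2017, 5, 1)     # hypothetical: last evaluation just before the release
eval_done = next_evaluation_due(last_eval)
deadline = action_due(eval_done)
print(eval_done, deadline, (deadline - release).days)
```

Under these assumptions, the entity could meet both deadlines while applying the May 12 patch as late as July 10, 2017, which is 59 days after release. That is the gap between the compliance clock and what the threat called for.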

However, the auditor also wants me to point out that NERC entities that have any systems with one of the three discontinued OS’s running on their OT networks – say, systems in Distribution substations or perhaps Low impact generating stations – should also quickly patch them. For one thing, you shouldn’t want Distribution outages any more than you want Transmission ones (even though the latter are the only kind that might involve CIP violations). But for another, even if your only concern were Transmission assets and you in theory have these wonderfully isolated from the Distribution and corporate networks, if for some reason you’re wrong and there is a connection you didn’t know about between your Distribution and Transmission networks, a Wannacry infection on the former could lead to real disaster (both for your utility and your NERC peers) on the latter.

This observation does point out to me an implication for the Big Picture in NERC CIP. And since I’m a Big Picture sort of guy, I’d like to elaborate on that. However, I’ll spare you this elaboration until my next post.

Second, a security manager for the High impact Control Centers of a large utility pointed out an interesting caveat regarding the section of the post entitled “Not a Public Service Announcement, but still Interesting”. This referred to the fact that there is a “kill switch” embedded in the code for the worm, which requires it to look for a certain domain name on the web; if it finds something at that domain, it de-activates itself. An unknown cyber security researcher found this domain name wasn’t registered, registered it, and linked it to some existing system. That act killed the worm and probably prevented a lot of further damage, especially in the US.

The security manager pointed out that at his Control Centers he has disabled DNS recursion and forwarding (I would imagine he’s not alone, when it comes to Control Centers with High or Medium BCS). Of course, this is normally a very good thing, since if a machine at the Control Center becomes infected with almost any other malware and starts trying to phone home to a command and control server, it won’t be able to get through.

However, this does mean that, assuming he doesn't take any other precautions, if any of his machines within the Control Center do get infected with Wannacry, this security measure would in theory end up enabling the worm to run. The worm would try to locate the domain name in question, but of course it would receive a message saying it can’t be found. But that means the kill switch would be ineffective, and the worm would proceed on its merry way to try to infect all of the machines in the Control Center. Of course, that won’t happen since the machines are fully patched against Wannacry, and he has beefed up his antivirus DAT files to be sure to catch Wannacry if somehow it does get into the ESP. But it does show how you have to think these things through.
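The interaction between the kill switch and disabled DNS recursion can be shown with a toy simulation. This is a simplified model, not Wannacry’s actual code. It reduces the kill switch to “stop if the hard-coded domain answers”. The domain, the IP address (from a range reserved for documentation) and both resolver functions are hypothetical placeholders, so nothing touches a real network.

```python
from typing import Callable, Optional

# Placeholder; the real kill-switch domain is not reproduced here.
KILL_SWITCH_DOMAIN = "killswitch.example"

Resolver = Callable[[str], Optional[str]]  # returns an IP address, or None on failure


def worm_proceeds(resolve: Resolver) -> bool:
    """Simplified kill switch: stop if the domain answers, keep going otherwise."""
    return resolve(KILL_SWITCH_DOMAIN) is None


def internet_resolver(name: str) -> Optional[str]:
    # After the researcher registered the domain, a host with normal
    # recursive DNS gets an answer for it.
    return "192.0.2.10" if name == KILL_SWITCH_DOMAIN else None


def no_recursion_resolver(name: str) -> Optional[str]:
    # Control Center DNS with recursion and forwarding disabled:
    # external names never resolve, so every lookup fails.
    return None


print("ordinary network:", worm_proceeds(internet_resolver))        # False: worm stops
print("no DNS recursion:", worm_proceeds(no_recursion_resolver))    # True: worm keeps going
```

The same lookup failure that blocks command-and-control traffic is what keeps the worm running here. That is why the patching and antivirus measures described above carry the real protective weight in this setup.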

Finally, this is a comment I received from myself, regarding the final section of the post titled “Also not a Public Service Announcement, but also still Interesting”. This regarded the nation-state whose security services are suspected of being behind the Shadow Brokers, the group that stole hacking tools from the NSA and dumped them online. That same nation-state was (by far) the biggest victim of Wannacry (at least as of Saturday). I want to point out that the Good Book, which isn’t normally my number one source of information on cyber security issues, has this one nailed: “…whatsoever a man soweth, that shall he also reap.”

However, that quotation also needs to be applied to another large country that was severely impacted but with a one-day delay – the biggest impact in that country was today, Monday. The country has a lot of pirated Windows software that of course isn’t receiving regular patches. As a result of that lack of patching, systems across that country booted up today and found their files had been encrypted.

But before I get on a high horse and start being smug about other countries bringing their troubles on themselves, I do want to point out that the Original Sin in all of this is the fact that a serious software vulnerability was discovered by a government agency in the US, but not reported to the vendor. If it had been reported, it could have been patched before the bad guys also discovered it. Instead, the agency used the vulnerability as the basis for a potent cyber weapon. That sounds like a great idea at first glance, but it assumes knowledge of the vulnerability will never leave your control. Unfortunately, that assumption failed here: the tool was stolen and dumped online.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte.

Saturday, May 13, 2017

Makes you WannaCry (and a Public Service Announcement)

Yesterday’s events were a real eye-opener to me. And I think they should be an eye-opener for anybody involved in critical infrastructure security. Here are some initial thoughts:

This was an infrastructure event, not just a bunch of individual computers that fell prey to ransomware. Sure, reports say up to a billion dollars may need to be paid in ransom, but that isn’t what’s significant, IMHO. What is significant is the fact that at least one critical infrastructure, that of the National Health Service in the UK, was severely impacted[i]. If nobody lost their life because of this, it will be a miracle. But there were certainly a lot of people whose health will suffer in various ways due to their lack of access to care yesterday.
As far as I know, until yesterday all ransomware had infected only individual machines (some of them servers, of course, which affects many users). And in all cases, what was affected was data. It was of course painful to pay the ransom, but that restored the data (in most cases), and there were few if any further direct effects. Even a successful ransomware attack on a US electric utility last year didn’t have any impact on operations.
Compare this to yesterday’s events in the UK, in which surgeries and regular doctors’ appointments had to be cancelled, people were turned away from the ER, patient records and test results couldn’t be accessed, etc. Even though it wasn’t intended as such, this turned out to be an attack on the UK health care infrastructure. This is all due to the fact that WannaCry (and there have been some variants appearing as I write this) is a worm[ii] and a very fast-spreading one at that[iii].
Now suppose that other critical infrastructure in the UK, such as the power grid, water systems, traffic systems, etc. had also been successfully attacked by WannaCry. If a lot of people had been sickened by impure water, or had traffic accidents when the stoplights in London suddenly went out, where would they have gone for treatment? And with the lights out and the Underground shut down, how would they have gotten there anyway?

So I’d say there are at least two major lessons from this, for the critical infrastructure community. First, an infrastructure attack doesn’t have to be deliberately caused – it can be a side effect of an attack with another purpose. Specifically, a worm-based ransomware attack can have a huge CI impact, even though it was never intended to do this.

Second, the need for coordination among critical infrastructures – both locally and nationally – is greater than ever. In fact, I’m beginning to think that it’s now becoming an unaffordable anachronism to have separate cyber regulatory structures for the Bulk Electric System, electric power distribution, natural gas pipelines, natural gas distribution, water treatment, health services[iv], etc. Maybe there should be a single organization – perhaps under DHS – that regulates cyber security of all critical infrastructures.

Public Service Announcement

Lew Folkerth of RF emailed me this afternoon to ask me to point out that there is now a security patch for Windows XP, Vista and Server 2003 (Microsoft released the patch yesterday). As Lew points out (and this applies to all NERC regions, not just RF), “This means there IS a patch source for those systems, and entities need to identify the source, assess the patch for applicability, and install the patch (or create/update a mitigation plan[v]).” Of course, this only applies to High or Medium impact systems running this software.

Not a Public Service Announcement, but still Interesting

You’ll notice the Binary Defense link I just provided thanks “MalwareTechBlog” for initiating the kill switch that shut the worm off. It points out that this move undoubtedly saved lives. I think the idea is that by shutting the malware off early (US time) on Friday morning, this move greatly inhibited its spreading here, since most workers weren’t in their offices yet and able to open the phishing emails that spread the worm.

But it turns out that the unnamed person behind MalwareTechBlog didn’t actually know he was killing it – you can read the story here. Of course, he still deserves lots of accolades (if he were willing to come forward) and perhaps a Presidential Medal of Freedom. But it just proves an adage I’ve repeated since I was a boy (20 years ago): “Rational planning is good, but in the end there’s no substitute for dumb luck.”

Also not a Public Service Announcement, but also still Interesting

The exploit that made WannaCry so effective was one that had been stolen from the NSA and dumped online by the Shadow Brokers group; this group has been linked to a certain country’s intelligence services. And guess which country – as of today, anyway – is listed as the number one victim of WannaCry? Hmmm…

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte.

[i] Other infrastructure events included factories that had to be shut down, and multiple government bodies in Russia that had to curtail operations.

[ii] More specifically, it is delivered on a “worm delivery system” built on the EternalBlue exploit.

[iii] Although the all-time champ for speed of spreading has to be 2003’s SQL Slammer, which infected its 75,000 victims worldwide within ten minutes. In fact, I read somewhere that this figure was something like 85% of the potential victims (MS SQL systems that hadn’t received a recent patch) worldwide. Talk about efficiency!

[iv] When I speak about health services, I’m not talking about patient data privacy. Cyber regulations like HIPAA in the US are already addressing that. What they aren’t addressing now is the infrastructure required to keep the health system running smoothly. Of course, individual hospitals, doctors’ offices, ambulance services, etc. have a lot of incentive to protect the systems required for their individual operations. But I don’t believe there’s any organization – like NERC for electric power – that is specifically charged with regulating cyber security for the purpose of maintaining reliability of the health care system.

[v] And if you’re not sure what should be in a mitigation plan, see my previous post.

Friday, May 12, 2017

What is a Patch Mitigation Plan?

Recently, a NERC entity emailed me with a question about CIP-007 R2, patch management. Specifically, the question was whether the mitigation plan needs to do more than simply explain why the patch can’t be installed at the time, and state that it will be installed by a specific future date; it seems their auditor had informed them that wasn’t enough.

I knew the answer to this, but I reached out to an auditor for his opinion and I was glad I did – he had some very helpful suggestions. Here is his response in full:

“The requirement is to create (or update) a mitigation plan if the patch cannot be implemented within 35 days of it being determined to be applicable. The Registered Entity is expected to document when and how the vulnerability will be addressed, and the expectation as expressed in the Measures is to specifically document the actions to be taken by the Responsible Entity to mitigate the vulnerabilities addressed by the security patch and a time frame for the completion of these mitigations. Simply stating the patch will be installed sometime in the future is not an action that mitigates the vulnerability in the interim.

“The Registered Entity needs to understand what the vulnerability is and how it can be exploited in order to document what mitigating controls are in place to reduce the risk of exploit until the patch can be installed. Often, but not always, the proper implementation of the CIP Requirements will mitigate the risk. For example, if the vulnerability can be exploited across the network, tight firewall rules will likely be a mitigation as long as there is no requirement for broad access to the Cyber Asset that counteracts the control. The Registered Entity might also update its anti-malware signature files more frequently and/or increase monitoring of the impacted Cyber Asset.

“But, if the exploit requires physical access to the Cyber Asset, asserting the device is behind a firewall is meaningless. Rather, the mitigations would include physical access restrictions, possibly current or enhanced restrictions on the use of removable media; in other words mitigation steps that counter the exploit mechanism. And, while not stated as an explicit requirement, the Registered Entity really needs to monitor the vulnerability until the patch is installed in case the exploit risk changes, possibly requiring additional protections. That would be a good cyber security (best) practice.”
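To make the auditor’s reasoning concrete, here is a minimal Python sketch of how an entity might check its own patch mitigation plans before an audit. Everything in it is a hypothetical illustration, not anything published by NERC or the auditor: the `MitigationPlan` record, the vector names and the control labels are made up, the mapping from exploit vector to countermeasures is simply lifted from the network and physical-access examples in the quote, and the 35-day window is counted in calendar days from the date the patch was determined to be applicable, as the auditor describes it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Hypothetical mapping from exploit vector to the kinds of interim controls the
# auditor mentions; a real plan would start from the vendor's advisory.
COUNTERMEASURES = {
    "network": {"firewall_rules", "antimalware_signatures", "enhanced_monitoring"},
    "physical": {"physical_access_restriction", "removable_media_restriction"},
}

# Calendar days after the patch is determined to be applicable (per the quote above).
PATCH_WINDOW_DAYS = 35


@dataclass
class MitigationPlan:
    """Hypothetical record of a dated mitigation plan for one security patch."""
    patch_id: str
    exploit_vector: str  # "network" or "physical" in this sketch
    interim_actions: set[str] = field(default_factory=set)
    completion_date: Optional[date] = None


def plan_due_date(applicable_on: date) -> date:
    """Date by which the patch must be installed or a mitigation plan created."""
    return applicable_on + timedelta(days=PATCH_WINDOW_DAYS)


def plan_problems(plan: MitigationPlan) -> list[str]:
    """Return the gaps an auditor might flag, following the reasoning quoted above."""
    problems = []
    if plan.completion_date is None:
        problems.append("no time frame for completing the mitigation")
    if not plan.interim_actions:
        problems.append("no interim actions; 'patch later' alone is not a mitigation")
    elif not plan.interim_actions & COUNTERMEASURES.get(plan.exploit_vector, set()):
        problems.append("interim actions do not counter the exploit mechanism")
    return problems


if __name__ == "__main__":
    # Hypothetical plan: a physical-access exploit "mitigated" by a firewall rule
    plan = MitigationPlan("example-patch", "physical", {"firewall_rules"}, date(2017, 7, 1))
    print(plan_due_date(date(2017, 5, 15)))  # 2017-06-19
    print(plan_problems(plan))  # ['interim actions do not counter the exploit mechanism']
```

The vector check is the heart of the auditor’s point: a plan can have both an action and a date and still fall short, because a firewall rule does nothing against an exploit that requires someone to be standing at the device.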

Thursday, May 11, 2017

Webinar: Third-Party Cyber Risk Management

A very big concern today – in almost all industries – is third-party cyber risk. Of course, this often manifests itself in the form of vendor risk, which is why NERC is now finishing development of CIP-013 and related changes in two other CIP standards. Vendor cyber security can pose a risk both to the Bulk Electric System (which is of course why we will have CIP-013) and to your organization itself (a great example of that is the Target breach, which started because one of their suppliers had unneeded access to the actual production network).

On Tuesday, May 23 from 12:30 – 1:30 EDT, Deloitte and the law firm Morgan Lewis will present a webinar on Third Party Risk Management. This webinar will address:

• The third-party risk landscape

• How third parties exacerbate an organization’s cyber risk

• The growing regulatory and legal importance of managing third-party cyber risk

• The complexity and impacts of responding to a third-party cyber risk incident

• Solutions for managing third-party cyber risk

To register, please go here.

I have said before that Deloitte’s Cyber Risk Services group is one of the largest, if not the largest, cyber security consulting organizations in the world, with over 3,000 US-based cyber consultants. However, we are part of a much larger organization, Deloitte Advisory, which advises organizations on dealing with many kinds of risk, including Financial, Regulatory, Legal, and Third-Party.

This webinar is a joint effort of the Third-Party and Cyber Risk groups. I hope you will find it gives you a perspective on the larger problem that CIP-013 is trying to address. Feel free to forward this post to anyone in your legal, risk management, supply chain or other departments who you think would be interested in attending.

Wednesday, May 10, 2017

The News from RF, Part III: What Happens on 9/1/18?

My last post was the first of at least three or four dealing with interesting things I learned from the presentation by Felek Abbas of NERC and Lew Folkerth of RF at RF’s CIP compliance workshop last month in Baltimore (the download includes all of the presentations at the workshop. Felek and Lew’s presentation starts on slide 19).

Felek’s part of the presentation started with a discussion of compliance dates for the Low impact requirements in version 6. Since those should be well known to you already, I won’t discuss them now. What I found interesting was slide 29, which says in part:

• CIP-003-7 was filed with FERC on March 3, 2017

• However, CIP-003-7 is very unlikely to come into effect before September 1, 2018. You will need to comply with the CIP-003-6 version of these requirements beginning September 1, 2018, until the effective date of CIP-003-7.

This was interesting because, in the whole LERC discussion last year, it had never even occurred to me that CIP-003-7 wouldn’t be in place by 9/1/18; I never thought there would be a serious possibility that entities would have to comply with version 6 of this standard, and then version 7 (as I’ll discuss below, having to do both doesn’t change what you have to do to comply, but it certainly does change the language you need to use to document compliance). However, this was probably because I hadn’t bothered to read the implementation plan that got passed with CIP-003-7(i) this year.

When I read the plan after hearing Felek’s discussion, I realized that the words “very unlikely” on the slide should have been replaced with “mathematically impossible”. This is because the plan says “…Reliability Standard CIP-003-7(i) shall become effective on the first day of the first calendar quarter that is eighteen (18) calendar months after the effective date of the applicable governmental authority’s order approving the standard…”

So let’s do the math. CIP-003-7(i) was filed with FERC on March 3. If FERC had approved it that day (which I doubt has ever happened for a NERC standard, or almost anything else), the effective date would have been October 1, not September 1, 2018. Of course, I doubt this would have been a big deal, since the Regional Entities wouldn’t have issued any PNCs (the successor to PVs) for turning in v7 documentation during September 2018. And even if FERC had taken just three to six months, I think the Regions would still follow the same approach.
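The date arithmetic in the implementation plan can be sketched in a few lines of Python. This is an illustration under two assumptions of my own: the FERC approval date is treated as the effective date of the order (as in the paragraph above, although in practice an order usually takes effect somewhat later), and a date that lands exactly on the first day of a quarter counts as that quarter. The December 2017 approval date in the example is purely hypothetical.

```python
import calendar
from datetime import date


def add_calendar_months(d: date, months: int) -> date:
    """Move d forward by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_quarter_start(d: date) -> date:
    """First day of the first calendar quarter beginning on or after d (assumed reading)."""
    for month in (1, 4, 7, 10):
        start = date(d.year, month, 1)
        if start >= d:
            return start
    return date(d.year + 1, 1, 1)


def effective_date(order_effective: date, months: int = 18) -> date:
    """Sketch of the plan's rule: first quarter start 18 months after the order."""
    return next_quarter_start(add_calendar_months(order_effective, months))


if __name__ == "__main__":
    print(effective_date(date(2017, 3, 3)))    # 2018-10-01
    print(effective_date(date(2017, 12, 15)))  # 2019-07-01 (hypothetical approval date)
```

Whatever date goes in, the result always falls on January 1, April 1, July 1 or October 1.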

However, unless you’ve been living in a cave in the Himalayas for the past year, you have probably heard that there is a new administration in Washington and they have been very slow to make high-level appointments to almost any Federal agency. In FERC’s case, this situation was made worse by the fact that a key resignation in January left the Commission with only two Commissioners (out of a normal five), meaning they don’t have a quorum to conduct business.

And since one of the remaining Commissioners has announced her intention to resign, two more Commissioners need to be appointed and confirmed by the Senate, then get comfortable in their new jobs, before there is any chance at all that CIP-003-7(i) will be approved. So it’s almost certain now that it will be summer 2019 at the earliest before the new standard comes into effect, and that there will be a period of at least nine months during which entities will have to comply with CIP-003-6. (When I talk about CIP-003 v6 or v7 here, I’m specifically talking about Section 3.1 of Attachment 1 of CIP-003. The single sentence in this section is the only substantial change between the two versions.)

However, as I implied above, this shouldn’t require entities to implement procedures or technologies to comply with CIP-003-6, then rip them out when CIP-003-7 finally comes into play. As I discussed in this post last November, almost everything – with one exception that I’ll discuss below – that you could do to comply with Section 3.1 of Attachment 1 of CIP-003-6 will still work under CIP-003-7.

The one exception to this statement is if you were counting on a routable-to-serial protocol conversion within, say, a substation as the reason you don’t need any further protections on the routable connection to the substation. I believe this was the one case FERC had in mind when, in approving CIP v6 in Order 822, they ordered NERC to clarify the meaning of the word “direct” in the LERC definition. So don’t plan on being found compliant if you do this.

Given that practically everything else you can do to comply with CIP-003-6 will work for CIP-003-7 as well, this means that the only difference will be your documentation. But you definitely can’t use the same words to describe the same setup in both versions. For one thing, you need to lose the word LERC. It sleeps with the fishes. If you’re looking for clues on how to do the v7 documentation, you might look at the post from last November, referenced two paragraphs above.

Friday, May 5, 2017

The News from RF, Part II: As Usual, Lew Hits the Nail on the Head

I have already said that Reliability First’s CIP workshop two weeks ago was the best regional CIP meeting I have attended. Probably the highlight of the meeting for me was the joint presentation by Lew Folkerth of RF and Felek Abbas of NERC (you can find their slides here, in the single file that includes all of the day’s presentations. Their slides begin at slide 19). I’ll have at least a few more posts on points that were made by Lew or Felek.

Lew addressed a number of interesting topics, including the RSAW for the new CIP-003-7 standard; of course, the standard itself is awaiting FERC approval. One of his points was a real lightbulb moment for me, which I’d like to share here. On slide 68 in the second section, Lew listed what the new RSAW says regarding auditing Attachment 1, Section 5 of CIP-003-7 (this is the new requirement that addresses Transient Cyber Assets used at Low impact assets): “For Transient Cyber Assets managed by the Responsible Entity in an ongoing manner, verify that the Transient Cyber Assets have an effective means of mitigating the risk of the introduction of malicious code onto the Transient Cyber Asset.”

Lew emphasized the word “effective”, then pointed out that he thought this was really the key to auditing non-prescriptive, results-based requirements (although I prefer the term “objectives-based[i]”), such as this one. That is, since this type of requirement only specifies an objective that needs to be met, not the method to achieve it, there has to be some criterion that the auditor uses to determine what is an acceptable method and what is not.

For example, in CIP-007 R3 (another objectives-based requirement), the entity is required to achieve the objective of mitigating the threat posed by malware to BCS. Suppose an SME at an entity told the auditor that, based on the advice of his brother-in-law, his method of mitigating the malware threat to one or more BCS is to say a certain chant every morning at 7 AM. I think the auditor would be justified in finding the entity in violation, not just issuing an Area of Concern. An Area of Concern might be the right response if the entity had chosen IDS signatures over anti-virus or application whitelisting methodologies; in that case, the auditor might ask the entity to either justify this decision or implement a different solution. IDS signatures are a plausible methodology for effectively mitigating the malware threat, whereas chants are not (and please don’t send me emails arguing that chants are likely to be as effective as IDS signatures! I pride myself on having a fairly open mind, but I do have my limits).

Lew wrote an article about auditing non-prescriptive CIP requirements for the January/February RF newsletter, and I wrote about that article in my own post. I just checked to see how the use of “effective” as a criterion fits into what he said in that article. He lists four components of a good evidence package for the requirement he wrote about in that article, CIP-010-2 R4 (of course, another non-prescriptive requirement). The third component is that the plan must show “how methods documented in the plan achieve the objectives” (the “plan” Lew refers to is the one required by R4. You could say that the plan is the same thing as the objective of this requirement).

Of course, the word “effective” isn’t in here, but I would argue that “methods that achieve the objective” is the same thing as saying “effective methods for achieving the objective”. So I call this a match (not that I would hold it against Lew if his thinking had evolved since he wrote the article. My thinking is always evolving - to put it kindly - and my unofficial motto is “Often wrong, but never in doubt!”).

To sum up this post, I think that the word “effective” (or an equivalent word or phrase) should be understood (and if possible, explicitly stated) in every non-prescriptive, objectives-based requirement. This will effectively (I couldn’t help that one. Sorry) indicate that the entity must not just use one or more methods to achieve the objective, but that the chosen method must be effective. Of course, none of the existing non-prescriptive CIP requirements (such as CIP-010 R4 and CIP-007 R3) currently use this word, but I imagine the RSAWs effectively (OK, I did it again!) remediate that omission. In any case, you should always understand that this word is at least implicitly in place.

As a postscript, I want to point out that one questioner at the RF CIP workshop implied to Lew that the use of the word “effective” would increase use of “auditor discretion”, and thus was a bad thing. I can’t remember Lew’s answer, but I know my answer – if I were in Lew’s place - would be: “The fact that this requirement is non-prescriptive means auditor discretion will definitely be required, whether or not the word ‘effective’ (or its equivalent) is present in the requirement – and the decision to make the requirement non-prescriptive was made by the Standards Drafting Team, not me. However, as I discussed in this post, auditor discretion is already required to audit most of the current CIP v5 and v6 requirements – both prescriptive and non-prescriptive - due to the presence of many ambiguities and missing definitions. The auditor is expected (perhaps with assistance from the Regional Entity) to use whatever training they have in legal logic to audit in spite of these flaws.

“The difference with non-prescriptive requirements is that the auditor is required to use discretion regarding matters of cyber security, including making judgments about whether the entity has used an effective methodology for addressing a particular requirement. Since the auditors are chosen for their posts in part because of cyber security expertise, not legal training, I think it is much preferable to have them exercising judgment in cyber issues, rather than legal/logical ones. But in any case, non-prescriptive requirements are clearly here to stay. No CIP drafting team has drafted prescriptive requirements since v5; I predict that no more will be drafted, regardless of what happens with the current prescriptive requirements[ii] and the current compliance regime.”

Note: When I showed a draft of this post to Lew, he commented that he wasn’t sure what his answer was, but that he should have referred to GAGAS (Generally Accepted Government Auditing Standards), which requires the use of professional judgment when performing audits. So his answer would be something like: “Use of professional judgment isn’t the exception in auditing, but the rule. Even the ‘Bible’ of our profession requires that we exercise professional judgment, since no requirement ever perfectly addresses every possible case you may throw at it.” Amen.

[i] After reading a draft of this post, Lew commented that he prefers this term as well.

[ii] I used to view the current CIP v5 and v6 requirements as being almost entirely prescriptive, with a few notable exceptions like CIP-007 R3 and CIP-010 R4. I now think that the majority of the current requirements and requirement parts are non-prescriptive, perhaps the great majority. I hope to sit down in the not-too-distant future and determine whether each requirement and/or requirement part is prescriptive or not. However, in my opinion there are a few very prescriptive requirements – including CIP-007 R2 and CIP-010 R1 – that require NERC entities to devote inordinate amounts of resources to them, way out of proportion to whatever benefits they provide.

Monday, May 1, 2017

A Clarification to my Last Post

In my last post, I concluded with this paragraph: “And this, folks, is the canary in the coal mine, which has just keeled over dead: It’s clear (to me, anyway) that no new compliance area (like supply chain or virtualization) can be incorporated into CIP unless this is done with non-prescriptive requirements. Indeed, no new prescriptive CIP requirements have been proposed since 2012, when CIP v5 was approved by the NERC membership. Yet it is also now clear to me that no new non-prescriptive standards (or requirements) will ever be freely accepted by the NERC membership until the CIP compliance regime itself has changed. I will be watching to see what happens.”

To summarize the argument in this paragraph:

1. New “compliance areas” won’t be incorporated into CIP unless the requirements implementing them are non-prescriptive.[i] The two examples of new compliance areas that I used are supply chain security and virtualization.

2. But (as discussed in the last post) no new non-prescriptive requirements will be “freely accepted” by the NERC membership until the overall NERC CIP compliance regime has been changed. Of course, I used the phrase “freely accepted”, because I was arguing in that post that, even though CIP-013 was very likely to be approved by NERC, it was going to be approved either without a passing vote by the NERC ballot body (i.e. through use of the Section 321 process) or as a result of a substantial effort by an important industry trade group to “persuade” their members to vote yes on the next ballot, despite what are likely to be substantial misgivings.

3. Therefore (by implication, although I didn’t explicitly state it), only a compliance regime change will allow CIP to be expanded to address these and other new compliance areas.

While I still agree with my conclusion in point 3, I do have two misgivings about point 2:

First, it is still possible for new non-prescriptive requirements to be freely accepted by the NERC ballot body. The changes in CIP-003 (the “LERC” requirement and the new requirement for Transient Cyber Assets used at Low assets) that were recently approved by both the NERC ballot body and the NERC Board were freely approved, although I think the fact that a FERC deadline was looming for the LERC requirement helped get that one passed. I was frankly surprised, given the negative comments by many entities on the second ballot, that this requirement nevertheless received the required super-majority.

However, I now don’t want to go so far as to say that a new area like supply chain can never be implemented through exclusively non-prescriptive requirements – i.e. that the non-prescriptive requirements required to incorporate a new compliance area (such as supply chain security) into CIP will never be freely accepted by the NERC membership. I believe CIP-013 would eventually have been balloted, modified and re-balloted enough times that it would have passed without any external intervention; it is just that FERC’s very aggressive one-year deadline for developing the standard precluded this from happening.

But virtualization is an example of an area that can only be addressed through modifications of a number of existing prescriptive requirements, as well as new requirements and definitions. I see no end to the balloting that will be needed to make all of the changes required to implement virtualization, so much so that I am close to certain the current CIP Modifications SDT will never complete the mandate in their SAR to incorporate virtualization into CIP.

Second, it may seem that I am implying in the second point above that only if a new standard or requirement is “freely accepted” will it be “legitimate”. I agree that, given the standards approval process described in the NERC Rules of Procedure, the only “normal” process for approving a new standard is one in which it receives the required super-majority of votes; this implies that standards not approved by a super-majority in a ballot are somehow less legitimate than others. However, while I am a great fan of democracy in general, I’m not saying that this is the only way to develop mandatory cyber security requirements. In my opinion, it would be quite legitimate if, for example, a group composed of stakeholders (mainly NERC entities) were to both draft and approve new CIP standards; these would then be submitted to FERC for final approval. Of course, this would require a change to the NERC Rules of Procedure.

What I was actually trying to say by using the phrase “freely accepted” was that, if there isn’t broad consensus among NERC members that a particular standard is a legitimate one worthy of their attention, the entire compliance process – which requires a huge amount of cooperation by the entities themselves – will be thrown into chaos. It will be very hard to convince the NERC membership that they need to comply with a standard if a majority of members believe complying with this standard will simply be a waste of time and money with no benefit to the reliability or stability of the BES. As long as NERC is charged with developing new or revised standards for cyber security, any standard that is approved in such a case will almost inevitably end up being changed or withdrawn before it ever reaches FERC for approval. There is going to have to be substantial input by the entities into the final product, whether or not it is expressed in an actual ballot.

However, implementing a new area of compliance like virtualization (which is, of course, on the docket for the current CIP Modifications drafting team) will require a lot of changes to existing prescriptive requirements, as well as some new requirements, which presumably could be non-prescriptive. To push beyond the points I made in this post, I believe just getting the needed non-prescriptive requirements approved will take literally years (I’m guessing there might be at least five or six new non-prescriptive requirements required to incorporate virtualization into CIP, each of which will take at least six months of the SDT’s full attention[ii]).

But modifying the many existing prescriptive requirements that will need to be changed to accommodate virtualization will take far longer, both because there will be more of them and because there are likely to be epic battles over the wording of the changes; in prescriptive requirements, the wording is everything. This is the main reason why I believe the drafting team will never complete their virtualization mandate.

The same reasoning would apply to any future effort to allow entities to put BCS in the cloud (as opposed to putting BCS Information in the cloud, which is already permitted by the existing CIP requirements), although such an effort is not even being contemplated at this point. Besides probably some new requirements, this effort would require substantial changes to almost all of the existing CIP requirements, prescriptive and non-prescriptive. I think trying to make these changes in the existing set of CIP requirements would require something like a generation of effort, so SDT members would need to be recruited fresh out of college with the somewhat slim hope that they will finally accomplish their goal before they retire. Not exactly a formula for success, IMHO.

To summarize, in this post I tried to clarify the conclusion of my last post (not sure I succeeded, though). Instead of implying (as I did in that post) that there can never be any expansion of CIP to address new areas of compliance, I want to say that there can be expansion that doesn’t require changes to existing CIP requirements (as is the case for CIP-013. Even though the new draft will include changes in several existing CIP requirements as well as a revised CIP-013, the wording of the existing requirements themselves wasn’t modified, just added to). But for areas like virtualization and BCS in the cloud that do require changes to existing requirements, I believe that expanding CIP to cover these areas will never be possible until all of the CIP standards are rewritten from scratch, and incorporated into a new compliance regime similar to the one I outlined in my last post.

[i] In fact, as I will point out in my next post, no new prescriptive CIP requirements at all have been drafted since CIP v5. This is not by chance: Both NERC and FERC now realize that non-prescriptive requirements are the only way to go when it comes to cyber security.

[ii] I’m basing this estimate on the fact that the LERC requirement – which was non-prescriptive – took more than six full months of the SDT’s time, based on my informal observations. And this requirement was a modification of an existing prescriptive requirement. The SDT made it non-prescriptive and thus allowed entities more options for complying with it, without removing any options they previously had, yet it still was greeted with a lot of concerns and even outright hostility. So my guess is that getting a brand new non-prescriptive requirement approved would take significantly more of the SDT’s time than six months – which is why I think my estimate of six months for each new requirement is quite conservative.