Tom Alrich's Blog: July 2018

Monday, July 30, 2018

A smoking gun

A long-time colleague wrote in to me last Friday regarding Thursday’s post. He pointed out to me that, not only were the statements from DHS staff members in the briefings on the Russian hacking of the grid misleading, but at least two slides they showed had text that directly conflicted with the statement from a DHS spokesperson, which I had quoted in Thursday’s post: “While hundreds of energy and non-energy companies were targeted, the incident where they gained access to the industrial control system was a very small generation asset that would not have had any impact on the larger grid if taken offline.”

Yet here are the statements from slides 18 and 19 of the presentation at the Wednesday briefing:

(slide 18) “Used initial compromised vendor to access several U.S. energy utilities and IT service providers”
(slide 19) “Leveraged early victim to gain entry to two previously accessed utilities and one new victim”

The combination of these two statements leads to the conclusion that a minimum of three “energy utilities” were “accessed”, as opposed to the one small generating plant (which most likely wasn’t owned by an electric utility at all) in the DHS spokesperson’s statement.[i] If DHS wants to come out and say the spokesperson’s statement was wrong and three utilities were actually accessed, so be it. But I certainly haven’t heard of that happening (Kirstjen N, if that is in fact the case, please email me at the address below).

If even three electric utilities had their control centers (and presumably their EMS systems) compromised, that would be a bad thing, since a simultaneous attack on all three could possibly lead to three widespread outages, although probably not a cascading outage (like in 2003); there would then be justification for raising the alarm flags. But here we’re talking about the control room of a single very small generating plant that by DHS’ own admission doesn’t have any real impact on the “larger grid”. This fact, combined with the fact that hundreds of “utilities” were attacked by the Russians, leads me to believe that the industry’s defenses are in pretty good shape, not the exact opposite. This is a wake-up call, but not to cyber weakness in general at utilities. Rather, it’s a call for all utilities and IPPs to beef up defenses against supply chain attacks (as I pointed out in the first post in this series).

Yet the idea that the exact opposite is indeed the case seems to be spreading very rapidly. I had two new articles called to my attention today, including this one contributed by John Hargrove of Sam Houston Electric Coop, and this one contributed by another friend. I’m sure there will be others. Both of these articles include a quote (in fact the same one, even though it was delivered by email) by Robert Lee of Dragos. Taking DHS at their word that “utilities” had had their “control rooms” penetrated[ii], Robert points out that the activities in question – purely reconnaissance – wouldn’t be enough to be able to cause an outage.

However, Robert didn’t need to go this far. It turns out no utility control centers were penetrated, period. And even if the generating plant whose control room was penetrated was a very large one, and even if several similar generating plants were also penetrated, this would be far from a danger to the grid itself, as discussed in this post.

In other words, even though the DHS people who put together the briefings (and didn’t provide any immediate corrections when the alarming news stories started flying) were only trying to call attention to a problem, by exaggerating what had happened they have damaged their credibility for future advisories. I hope it isn’t fatally damaged, because they (specifically, ICS-CERT) do a lot of really excellent work!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or challenges like what is discussed in this post – especially on compliance with CIP-013. And if you’re a security vendor to the power industry, TALLC can help you by developing marketing materials, delivering webinars, etc. To discuss any of this, you can email me at the same address.

[i] I suppose you could interpret “accessed” to mean the attackers got into the IT network of the utility, but not the OT network; but of course this doesn’t mean they’re any closer to achieving their goal of being able to manipulate control systems (which are on the OT network, or should be) to cause an outage. In any case, if this is what DHS meant by “access”, they certainly have never stated that.

[ii] As I pointed out in my first post about this problem, to speak of an electric utility’s “control room” is essentially a non-sequitur, such as speaking of the Pope’s yarmulke. A control room controls an individual generating plant or substation, and is usually located at that asset. Utilities have control centers, which control many assets that generate and transmit power, as well as the assets like distribution substations that deliver that power to customers. But the single small generating plant that was actually penetrated is almost certainly not owned by a utility (most plants are owned by independent power producers, especially the small ones), and in any case its control room doesn’t control anything more than the plant itself.

Saturday, July 28, 2018

The Russians are coming! The Russians are coming!

The above is the title of a really hilarious film I remember from my childhood, in which – at the height of the Cold War – a Russian sub runs aground near a small island off New England. Crew members head into town to find a boat to pull them off, and in the process some of the townspeople become convinced they are the spearhead of an invasion, and almost ignite World War III. It seems we have a modern-day version of this film playing out with DHS, since their exaggeration of the success of Russian hackers in penetrating the US power grid is unfortunately becoming a fast-spreading meme that may be unstoppable.[i]

Here is an excerpt from an article that appeared on the New York Times website on Friday:

This week, the Department of Homeland Security reported that over the last year, Russia’s military intelligence agency had infiltrated the control rooms of power plants across the United States. In theory, that could enable it to take control of parts of the grid by remote control.[ii]

Yesterday evening, after seeing this article, I sent the following letter to the news editors of the Times (which by the way I think is a great paper, very dedicated to finding the truth. But being dedicated to the truth doesn’t mean you can’t be misled by people in government who have more information than you do, and have exaggerated an already-serious situation, for whatever reason):

Please stop promoting the story that the Russians have substantially penetrated the US power grid. While that was the tenor of DHS' initial briefing, it turns out DHS was wildly exaggerating. While hundreds of power plants (not utilities per se) were targeted by the Russians, they succeeded in penetrating the control systems of exactly one very small generating plant, which by DHS' own admission would have no significant impact on the power grid:

"While hundreds of energy and non-energy companies were targeted, the incident where they gained access to the industrial control system was a very small generation asset that would not have had any impact on the larger grid if taken offline." (this is a quote from DHS spokesperson Lesley Fulop, which appeared in an article on Power magazine's website on July 24)

Of course, it is true that the Russians are targeting the power grid constantly, and as your article points out, this has stepped up lately as election hacking seems to have fallen out of fashion in Russia. However, so far they have made no significant headway. Electric utilities in the US have invested very heavily in cyber security and continue to do so. While the utilities need to step up their efforts even further - and they are doing so - there is no need for Americans to lose sleep worrying whether a major cyber attack will bring down the US power grid. It isn't going to happen.

I sincerely doubt we’ve heard the end of this story.

[i] The biggest difference between the film and the current situation is that the Russian hackers are actually malign – or at least they’re being paid to be such. The Russian sailors in the film had nothing but good will toward the Americans, and the film had a very happy ending.

One reason the film was so funny is that one of its stars was Jonathan Winters, perhaps the funniest man that ever lived. He could have read the phone book and had you in stitches.

[ii] While the article does go on to point out that the hackers made no attempt to actually take control of the plants (which is also what DHS said), it repeats the canard that a large number of “control rooms” were penetrated – leaving open the possibility that malware has been implanted, so that just a single future signal would bring down scores of generating plants. This is simply not true. One very small plant was penetrated, and I’m sure it’s probably been made one of the most secure power plants in the world after this incident was discovered.

Thursday, July 26, 2018

WTF???

I’ve been meaning to tell everybody about a wonderful group called the Western Transmission Forum…OK, that’s not really what this title refers to. It really describes my feelings when I found out today that the number of assets that were actually penetrated by the Russian attackers, that DHS has been thoroughly publicizing this week, wasn’t “hundreds” (as at least some people who attended the Monday DHS webinar thought was said, including the Wall Street Journal, whose article on Tuesday kicked off a frenzy); and that it also wasn’t just multiple assets (as was clearly implied in the webinar I attended yesterday. I estimated in my post yesterday that under 25 generation assets were impacted, and they were all either Low impact BES assets or distribution assets, meaning they were rated at less than 75 MW).

No, I learned today, from an article on Power Magazine’s web site, and confirmed with a source who knew the contents of Congressional briefings by DHS, that the true number of assets compromised was….envelope, please….one. And by the way, it was a very insignificant generating plant whose loss would have no impact on the grid.

Here is a quote in the Power article from Lesley Fulop of DHS: “While hundreds of energy and non-energy companies were targeted, the incident where they gained access to the industrial control system was a very small generation asset that would not have had any impact on the larger grid if taken offline.”

I can’t speak for what was said in the Monday webinar, since I didn’t attend that (evidently there were some technical problems during the webinar, so some people may not have heard it all and may have extrapolated “facts” that weren’t actually presented). But here are some of the points that I know were made in yesterday’s webinar (although of course these aren’t exact quotes since I don’t have a transcript):

Hundreds of assets were “targeted or affected”. Probably having seen the WSJ article, which came out the day before, the presenters were trying to dispel the idea that hundreds of assets were affected when they said this. However, a much better way to describe the situation would have been to say “Hundreds of assets were targeted, but fortunately only one was affected”. Even in our current “post-truth” political environment, this is a little bit too much of an exaggeration to be inadvertent.
Generation, transmission and distribution assets (note plural) were “targeted or affected”. If the DHS people had really wanted to be accurate, they would have said “Generation, transmission and distribution assets were targeted, but only one small generation asset[i] was affected.”
“All victims” had externally-facing, single-factor-authenticated VPN systems (of course, one of the points of the webinar was that multi-factor authentication would have prevented these attacks – although “this attack” would have been more accurate – from occurring). The plural of victims certainly indicates that more than one asset was compromised.
In some cases, victims’ (note plural) primary remote-access systems had two-factor authentication, but they also had single-factor-authenticated systems – and this was how the attackers got in. Again, it’s hard to reconcile this sentence with the fact that there was only a single victim.

What does this mean for my post yesterday? In the post, I pointed to two primary lessons to be learned. The first was “If anybody had any doubt that supply chain security is the number one cyber security issue for the electric power industry today – as well as for probably most other industries as well – there is now a smoking gun.”

I still stand by this lesson 100%, although it’s clear that the smoking gun described by DHS was actually a pellet gun that had given one victim a superficial skin wound. Starting with the Target breach, and going forward to NotPetya and other breaches, it’s now clear that cyber attackers who are aiming at sophisticated targets (as opposed to “spray and pray” attackers like ransomware or cryptominers) realize that the way to achieve their goals isn’t to mount a full assault on the front gates of the castle, but to break the single lock on the small back door where the tradesmen come in – in other words, the supply chain. The fact that the Russians only succeeded with one target so far doesn’t mean they and others won’t keep trying, and refining their methods.

My second lesson learned, set out in the last paragraph in the post, was that NERC, FERC and the trade associations should look at whether the CIP requirements applying to Low impact assets should be made stronger. I still stand by this, because I know that these parties are always considering that question. They may at some point decide to take further steps (FERC raised that possibility in their NOPR for CIP-003-7 last fall, although they dropped the idea when they actually approved CIP-003-7 in April) – but I certainly don’t believe now that there is any sort of emergency requiring action (and as you’ll see if you read the last paragraph of yesterday’s post, I didn’t believe it was an emergency then, either).

In the last sentence of the post, I pointed out that “..the PUCs need to start thinking seriously about how to get owners/operators of purely distribution assets more concerned about supply chain security.” I still stand by that conclusion, since a) the one asset compromised was obviously a distribution asset (a generating plant < 75 MW), and b) while a few PUCs have developed cyber regulations for their utilities (the best of which is New Jersey’s, although I’ll admit it’s a year or two since I’ve looked into this, so some other state may have stepped up), I don’t think any PUC has implemented supply chain cyber security regulations for their utilities.

For DHS (specifically ICS-CERT and NCCIC, who did the investigation and conducted the briefing), I’d just like to say that you people have clearly done a great job of tracking how the Russian attackers worked (and presumably are working now); I highly recommend that anyone who didn’t attend one of the briefings this week attend one of the two briefings next week, and/or download the alert that was put out in March.

On the other hand, DHS, I can’t understand why you would want to pretend that a lot of assets had been penetrated, when it was only one small one. By doing so, you raised this threat from one that all power industry asset owners should be aware of and should be taking steps to prevent, to something approaching an imminent threat to our national security. And it just isn’t that.

8/1 - It turns out that the "hundreds of utilities" that might have been compromised is now down to less than the single small plant that I believed when I wrote this post. It's now just a couple of wind turbines (which are of course part of a wind farm that might be hundreds or thousands of turbines), as was revealed by DHS in their meeting with CEOs (and Mike Pence and Rick Perry) in New York City yesterday. It is simply amazing that the DHS people who presented at least the first two briefings didn't do anything to dampen down the erroneous news articles about what had happened, and indeed encouraged it by the misdirection in what they said.

Here is an excellent article on that meeting by Blake Sobczak of E&E News, who attended it.

[i] Presumably, this single generation asset was the compromised asset, a screen shot of whose HMI was shown in the webinar yesterday – the presenters said it had been uploaded by the attackers. Of course, in the webinar the presenter didn’t mention that this was the only asset that was compromised; from what he said, it sounded like taking screen shots was the modus operandi of the attackers, which DHS had seen in multiple instances. Obviously, unless an asset was actually penetrated, not simply targeted, there would be no screen shot available.

Wednesday, July 25, 2018

What lessons can we learn from the Russia hacks?

Yesterday, the Wall Street Journal published an article titled “Russia hacks its way into U.S. utilities”.[i] Based on a Department of Homeland Security briefing, it says that Russian hackers “claimed hundreds of victims last year” in a campaign that “put them inside the control rooms of U.S. electric utilities..” Of course, DHS has said before that Russian hackers are targeting US electric utilities, but the scale of the attacks, and the fact that many “control rooms” were seemingly penetrated, hadn’t previously been disclosed. The article was based on a DHS briefing on Monday, which was repeated today (I attended it, and found it very good). It will be repeated next Monday and Wednesday; information should be available on the ICS-CERT web site.

Clearly, there are lessons to be learned from this, both for power industry cyber security in general and for the NERC CIP standards in particular, although some of these lessons will be contingent on getting further information from DHS. But there is one lesson that can be stated unequivocally: The attacks uncovered by DHS came through suppliers – often “smaller companies without big budgets for cybersecurity”.

So there you have it: If anybody had any doubt that supply chain security is the number one cyber security issue for the electric power industry today – as well as for probably most other industries as well – there is now a smoking gun. I wish I could say that CIP-013 is coming just when it’s needed most, and that it will go a long way to solving this problem, but I’m afraid that would be a big overstatement. CIP-013 will certainly contribute to solving the problem, but will accomplish nowhere near what it could if it were written differently, or even if NERC took a different approach to enforcing it than the approach that it appears they’re taking (I realize I am speaking darkly here. I hope to have a post on this question out in the very near future).

There is a good reason why supply chain security is so important: The bad guys seem to have figured out that their prime targets – bigger organizations, and utilities in particular – are doing a pretty good job of securing themselves. So instead of battering their heads against the high barriers that have been set up to keep them out, they’re targeting the soft underbelly of these organizations – the supplier organizations that they trust. Now the “electronic security perimeter” needs to be extended – in some way – to any organization that interacts with your organization electronically.

But beyond this, it’s impossible to draw any further conclusions; this is because there are a few questions that DHS needs to answer first. In the rest of this post, I’ll state what those questions are. But, since I know that answers from DHS might not be forthcoming immediately, I’ll describe several “scenarios” based on assumptions of what those answers might be. And for each of those scenarios, I’ll point out the conclusions that I think can be drawn if that turns out to be the correct meaning of what DHS said. Here are my questions for DHS:

First, I’d like to know if the assets that were compromised[ii] were distribution or Bulk Electric System assets. If the former, it means that it would be very hard to cause a widespread outage (let alone a cascading outage), unless the “hundreds” of assets were in one concentrated region. And it also means the assets are under the jurisdiction of the state Public Utility Commissions; any sort of regulations or even guidance should probably come through the PUCs.[iii] If the assets were BES assets (or even if they were a mixture of the two types), it would be a much more serious problem, since – depending on the assets involved and their locations – a widespread outage could be the result of a coordinated attack, even a cascading one. In the rest of this discussion, I’ll assume we’re talking about BES assets.

Second, I’d like to know the true nature of the attack. In the second paragraph of the WSJ article, it says that state-sponsored groups “broke into supposedly secure, ‘air-gapped’ networks…with relative ease” by first penetrating supplier networks. This presumably means that the attackers took advantage of the fact that these suppliers had access into important cyber systems at the assets being attacked. There are two ways that this access could be facilitated at the asset end.

The first way is interactive remote access, meaning a human being (in this case a hacker with stolen credentials, so they’re impersonating a real user) logs into an OT system located at the targeted asset. If the asset is subject to the NERC CIP standards as Medium or High impact, then it should have an Intermediate System set up to intercept the communications, authenticate the user using two-factor authentication, then proxy the user’s communications with the OT system. This means that, if Medium or High impact BES assets are being compromised, either the asset owners aren’t complying with CIP-005 R2 or the attackers have somehow bypassed the two-factor authentication (which of course would be big news in itself)[iv]. On the other hand, if the assets being compromised are Low impact (and if they’re BES, they have to be High, Medium or Low impact), it will be incumbent on NERC, FERC and the trade associations to look at either more regulation for Lows or perhaps some sort of strong guidelines.

The other way that remote access can be facilitated is machine-to-machine, in which a vendor system has direct access to an OT device at the asset. This isn’t currently covered by the CIP standards at all, but there will be controls required for machine-to-machine access (CIP-005 R2.4 and R2.5); they will be implemented when CIP-013 is implemented. During today’s webinar the main speaker made clear that all of the remote access conducted by the attackers was interactive.
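The interactive remote access rule described above can be written down as a simple check. What follows is only a minimal sketch, assuming my simplified reading of CIP-005 R2 rather than the text of the standard: the `RemoteAccessPath` class, the `paths_needing_attention` function and the path names are all made up for illustration, and the impact labels are plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteAccessPath:
    name: str
    interactive: bool              # a human logs in, as opposed to machine-to-machine
    via_intermediate_system: bool  # session is proxied by an Intermediate System
    multi_factor: bool             # two-factor (or stronger) authentication


def paths_needing_attention(impact: str, paths: list[RemoteAccessPath]) -> list[str]:
    """Return the names of interactive paths that skip the Intermediate System or 2FA.

    Only Medium and High impact assets are checked, mirroring this post's
    simplified reading of CIP-005 R2; Low impact assets return an empty list.
    """
    if impact not in {"Medium", "High"}:
        return []
    return [
        p.name
        for p in paths
        if p.interactive and not (p.via_intermediate_system and p.multi_factor)
    ]


# Hypothetical plant: a compliant primary VPN plus an "ancillary" single-factor one.
example_paths = [
    RemoteAccessPath("primary-vpn", True, True, True),
    RemoteAccessPath("ancillary-vpn", True, False, False),
    RemoteAccessPath("vendor-telemetry", False, False, False),
]

print(paths_needing_attention("Medium", example_paths))
print(paths_needing_attention("Low", example_paths))
```

For the Medium impact case only the single-factor interactive path is flagged; the machine-to-machine path is ignored because, as noted above, that kind of access isn't currently covered. For a Low impact asset nothing is flagged, which is exactly the gap discussed in this post.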

Third, you say “utilities” have been attacked, yet you speak of “control rooms” being penetrated. A control room is usually found in a single generating plant or substation, since it’s defined by NERC as just controlling a single Bulk Electric System asset; the majority of generating plants aren’t owned by utilities but by independent power producers. A control center controls multiple BES assets; these are the real “heart” of the BES, and if even tens of control centers were compromised (let alone hundreds!), that would be a very big problem indeed. Do you really mean “utilities”? And if so, do you really mean “control rooms”? In that case, many of the generating plants that were attacked were undoubtedly owned by IPPs, not by utilities. On the other hand, if you mean “control centers”, about how many of those were penetrated? This is important for me to know, since if hundreds of utility control centers are compromised by the Russians (or even ten large control centers), I’m going to slaughter my chickens and book the next flight to New Zealand.

On the other hand, if you didn’t mean just utilities were being targeted, and you also meant it when you said control rooms, then this would most likely be primarily an attack on generating plants. I pointed out at the end of this post that it would be close to impossible to actually cause a big outage just by attacking generation – you would have to attack a large number of small plants (or single units of larger plants) in one region simultaneously, and even then any outage would probably be quickly corrected by diverting power from other regions. But regardless of whether a big outage could be caused, it would be a matter of great concern if many BES generating plants had been penetrated.

If the generating plants being penetrated are Medium impact and haven’t been segmented so that they have no Medium BES Cyber Systems, then they are supposed to be complying with CIP-005 R2. Either they’re not actually complying or the attackers have figured out how to break or bypass two-factor authentication. And if the generating plants are Low impact (or segmented Medium-impact plants), once again it will be incumbent on NERC, FERC and the trade associations to look at either more regulation for Lows or some sort of strong guidelines.

What if a significant number of the assets being compromised are transmission substations? That’s a different story. If an attacker were to thoroughly “own” a single significant transmission substation (perhaps controlling multiple lines rated at 345 kV or higher), I believe they could cause a wide-scale outage.[v] And if they owned about 3 or 4 such substations – again in the same region – it could be a really serious outage.

Again, if the attackers broke into Medium impact transmission substations, then the owners/operators of those substations were in violation of CIP-005 R2 – since that requires two-factor authentication, and in the webinar today the presenters said that all systems penetrated used single-factor authentication (i.e. username and password). This means that, if the attackers broke into transmission substations at all, it must have been Lows. And since Lows don’t have to deploy an Intermediate System for CIP compliance, once again it will be incumbent on NERC, FERC and the trade associations to look at either more regulation for Lows or some sort of strong guidelines.
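The scenarios above boil down to a small decision table. Here is a rough sketch of it, assuming (as the webinar presenters said) that every penetrated system used single-factor authentication. The function name and the string labels are just shorthand for this post, not NERC definitions, and segmented Medium impact plants (which behave like Lows here) aren't modelled.

```python
def scenario_conclusion(asset_class: str, impact: str = "") -> str:
    """Map one of this post's scenarios to the conclusion drawn from it.

    asset_class: "distribution" or "BES" (shorthand labels assumed here).
    impact: "Low", "Medium" or "High"; only consulted for BES assets.
    Assumes every penetrated system used single-factor authentication.
    """
    if asset_class == "distribution":
        return "Regulation or guidance should probably come through the state PUCs."
    if asset_class != "BES":
        raise ValueError(f"unknown asset class: {asset_class!r}")
    if impact in ("Medium", "High"):
        return "Owner/operator was in violation of CIP-005 R2 (2FA is required)."
    if impact == "Low":
        return ("NERC, FERC and the trades should look at more regulation "
                "or strong guidelines for Lows.")
    raise ValueError(f"unknown impact rating: {impact!r}")


for case in [("distribution", ""), ("BES", "Medium"), ("BES", "Low")]:
    print(case, "->", scenario_conclusion(*case))
```

The point of laying it out this way is that two of the three branches lead to the same place: if the compromised assets weren't covered by CIP-005 R2, the question becomes who should set expectations for them.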

What do I think is the most likely of all the above scenarios? I would guess the number of assets actually penetrated is no higher than 25. Further, I would guess that they were mostly generating plants (the webinar presenters showed an HMI screen – heavily redacted, of course – that was uploaded by the attackers from one “victim”. It was clearly a generation asset), and they were all either distribution assets or Low impact BES assets.

So if I’m right, the other main lesson to be learned from this briefing (besides the lesson about the critical importance of supply chain security) is that FERC, NERC and the trades should decide whether increased regulation is appropriate for Lows (and the at least partial extension of CIP-013 to Lows would probably be the right vehicle for that, since FERC is already thinking about that anyway. Of course, the extension wouldn’t be included in the first version of CIP-013, but FERC would order it for the next version, meaning it’s 5-6 years away now), or whether some heavily-guided “voluntary” standard would be more appropriate. Plus, the PUCs need to start thinking more seriously about how to get owners/operators of purely distribution assets more concerned about supply chain security.

[i] The WSJ’s online edition is behind a paywall, so I can’t provide a link to the article. I have it in hard copy, so if you want to drop me an email at the above address, I’ll send you a scan of it.

[ii] Even though the WSJ article stated there were “hundreds of victims”, in the DHS briefing today they made it clear that a “victim” wasn’t necessarily compromised, just targeted. There definitely were some grid assets that were compromised, but it’s not clear how many there were.

[iii] Unfortunately, it’s a great simplification to say that the utilities in each state are subject to that state’s PUC, since in fact it’s only the investor-owned utilities that are. If DHS wanted to promulgate any mandatory standards for all distribution assets, they would probably have to first get authorization from Congress. On the other hand, I can certainly see the trade associations (especially APPA, NRECA, EEI and EPSA) getting together and writing “voluntary” standards for distribution assets, working with DHS.

[iv] In the webinar today, the DHS people said that the attackers always got in through single-factor-authenticated systems, meaning they didn’t bypass two-factor authentication. Yet they also mentioned that they (DHS) were urgently looking to see if the attackers had penetrated any two-factor-authenticated systems, and if so how. So it seems they’re saying that, even though the attackers didn’t succeed in bypassing (or hacking) 2FA systems, they might at least have tried to.

They pointed out that some of the assets that were penetrated had an intermediate system using 2FA, but those assets had also deployed some “ancillary” remote access systems that used 1FA. Of course, if any of these assets were Medium impact for CIP, they were in violation if they did that; under CIP-005 R2, all interactive remote access must go through an Intermediate System with 2FA.

[v] In 2008, a fault at a single transmission substation in Florida caused an outage that affected most of southern Florida, and led to $20 million in fines by NERC and FERC, plus $5 million in remediation. I’ve heard that the BES effects were felt within a second in Canada.

Monday, July 23, 2018

Debate with an auditor on CIP-014

In my last post, I lauded the NERC CIP Modifications drafting team for coming up with two great ideas for incorporating virtualization (or any new technology that affects fundamental definitions – the cloud is another example) into the CIP standards. That post was about the second of those ideas, the idea that the truly prescriptive CIP requirements need to be made non-prescriptive (although I don’t agree with their use of the term “objectives-based”, since that requires a measurable objective, and cyber security objectives aren’t measurable). Since almost all new CIP requirements since CIP v5 have been plan-based, I think that is the correct term to use now.

My concern with the SDT’s idea was that just making a prescriptive requirement (like CIP-007 R2 or CIP-005 R1) non-prescriptive isn’t the whole story on what needs to be done. It is important to keep in mind – as if anyone has forgotten! – that NERC’s auditing procedures are very prescriptive; you either did exactly what the requirement says or you didn’t. This works well for the 693 standards (in fact, it’s really the only way you could audit those). But it really misses the mark on the non-prescriptive CIP requirements, since if one of those isn’t written carefully, it becomes un-auditable.

The first example I used was CIP-014. I said “Three good examples of this are CIP-014 R1, R4 and R5. In a post last year, I discussed two entities (from the same Region) that both told me the same story: They had been dinged by an auditor for not taking specific steps to protect transformers located in their substations in scope for CIP-014. Their mistake was taking the words of these three requirements literally, since all three only talk about protecting the substation itself, not any equipment located in it.”

In the post I referred to, I had pointed out that auditors (from the same region) gave one of these entities a Potential Non-Compliance (PNC) finding (which can lead to a violation finding), and the other an Area of Concern (which is a non-mandatory recommendation to remediate a problem discovered by the auditor) because they had focused on protecting the whole substation, not particular pieces of equipment in it (in particular transformers). The problem is that all three of these requirements refer only to protecting the substation; nothing about equipment in it. Each of these entities had engaged an outside firm (different ones) to develop their threat and vulnerability assessment (mandated by CIP-014 R4), and the threats identified in that were all just to the total substation. So their physical security plans (mandated by R5) just focused on mitigating those threats.

Both of these entities were cited for not specifically including the transformers in their physical security plans. Yet R5 just states that the entity needs to develop a physical security plan “that covers their respective Transmission station(s), Transmission substation(s), and primary control center(s).” Notice there’s nothing about protecting transformers or other equipment here.

I got an email the next day from an auditor, who said that “CIP-014 requires a risk assessment and then a physical security plan for those assets that are identified in the risk assessment. The plan has to address physical security measures that ‘deter, detect, delay, assess, communicate, and respond’ to potential physical threats and vulnerabilities that were identified during the vulnerability assessment conducted upon the identified assets.”

He then went on to describe specific physical threats against transformers, and said these need to be protected against in the physical security plan. Since what he said sounds like good advice for anyone protecting a substation, I am reproducing it below. But to the sentence of his that I just quoted, I replied that, while both entities deserved to receive an Area of Concern notice (since CIP-014 came about because of the Metcalf attack, which disabled transformers), neither of them had violated the strict wording of any of the CIP-014 requirements, so a PNC should be out of the question.[i]

The auditor’s reply led with the assertion that “the substation is simply a container of stuff, and CIP-014 expects you to protect the stuff.” He went on to give some good physical security observations (which I also reproduce below), and then concluded “Just remember, administrative law is largely based on what a reasonable, qualified person would do. The auditor has to determine if what the entity did was enough to meet the stated objective of the requirement. If the auditor finds that the entity failed to achieve the objective, the auditor will find a PNC. We really do not need highly prescriptive requirements in order to audit.”

My reply simply said that either he or I might be right, but that my point in writing the last post (which I now realize I didn’t actually state in the post – my bad) was to provide advice to the SDT that might let them avoid a mistake like the one the CIP-014 SDT seems to have made[ii] by not explicitly stating in R4 that the threat and vulnerability assessment needs to look at threats to the Facilities (i.e. the equipment) in the substation, not just the total substation itself. If they had just included a sentence to that effect in CIP-014 R1, R4 and R5, we wouldn’t have to talk about these auditing problems with plan-based requirements like these, and probably in a couple of years with CIP-013-1 R1.1 (see the second end note for more discussion on this).

The auditor’s CIP-014 compliance advice

(from the auditor’s first email)

“CIP-014 requires a risk assessment and then a physical security plan for those assets that are identified in the risk assessment. The plan has to address physical security measures that “deter, detect, delay, assess, communicate, and respond” to potential physical threats and vulnerabilities that were identified during the vulnerability assessment conducted upon the identified assets.

“So, what does that mean? Yes, you need to mitigate against the threat of a malicious actor entering the physical confines of the substation. However, you also have to consider and address threats and vulnerabilities that can be exploited from outside the perimeter fence line. For example, I can take a .50 cal Barrett rifle and punch holes in a transformer from a considerable standoff distance, as long as I have line of sight target acquisition ability (although a good old AK-47 works quite well as was demonstrated in the Metcalf attack). That is a vulnerability. How do I address that? By blocking or preventing the line of sight in some manner.

“Transformers are, in a sense, big boxes. And, they are high dollar, extremely long lead time items to replace if destroyed (last I heard, it can take 18 months or longer to get a new 500 kV transformer, and they are not built in the USA). My substation perimeter fence will deter and delay someone from gaining physical access to the transformer. And I can have sensors and camera systems to detect a breach of the fence line. The fence will not deter a standoff shooter who can see the “box” in the weapon’s sights. So, I have to somehow prevent the line of sight target acquisition to mitigate that vulnerability. I do that with tall ballistic barriers (e.g., concrete walls) around the substation perimeter if the terrain is flat and there are no high points that can peer over the barrier. But if there are hills, trees, or other high points offering a shooter a look down – shoot down advantage, I have to move the barriers closer to the potential target (it is all about the angles). Of course, if I own the land, I can cut down the trees. I can put anti-climb devices on the nearby transmission towers. I have to do something to deter a shooter from afar. The vulnerability assessment, if properly performed, will have identified the target lanes where a shooter can acquire the transformer as a target. If I do not address that vulnerability, then my plan is inadequate.”
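The auditor’s remark that “it is all about the angles” can be made concrete with similar triangles. What follows is only a minimal sketch, not a design tool: it assumes level ground between the shooter and the target, a perfectly straight sight line (no ballistic drop), and function names and numbers that are made up purely for illustration.

```python
def min_barrier_height(shooter_height_m, shooter_distance_m,
                       target_height_m, barrier_distance_m):
    """Height a barrier needs to block the sight line to the top of the target.

    All distances are horizontal and measured from the target. Level ground
    and a straight sight line are simplifying assumptions.
    """
    if not 0 < barrier_distance_m < shooter_distance_m:
        raise ValueError("barrier must sit between the target and the shooter")
    # The sight line runs from the top of the target to the shooter's position.
    # By similar triangles, its height changes linearly with distance from the target.
    slope = (shooter_height_m - target_height_m) / shooter_distance_m
    return target_height_m + slope * barrier_distance_m


# Illustrative numbers only: a 6 m tall transformer and a shooter on a 30 m hill 400 m away.
at_fence = min_barrier_height(30.0, 400.0, 6.0, 50.0)  # barrier at a fence line 50 m out
close_in = min_barrier_height(30.0, 400.0, 6.0, 10.0)  # barrier 10 m from the transformer
print(f"barrier at 50 m: {at_fence:.1f} m, barrier at 10 m: {close_in:.1f} m")
```

With these invented numbers, moving the barrier from the fence line in toward the transformer lowers the height it needs to reach, which is the auditor’s point about moving barriers closer when a shooter has the high ground.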

(from the auditor’s second email)

“As a furtherance of my comments, I would point out that the entity is required to perform a vulnerability assessment. So what are some possible vulnerabilities? Immediately coming to mind are:

· Physical intrusion into the substation yard (with or without gaining entry into the control house)

    · Can include climbing the fence, cutting the fence

    · Can include vehicle-based breach

· Weapons discharge into the substation yard from outside the fence line

· Lofted bombs (explosive, incendiary) from outside the fence line

· VBIED (Vehicle-Borne Improvised Explosive Device) – inside the perimeter after penetration, or outside the fence line where the blast perimeter reaches to critical equipment

· Airborne (drone) delivered explosives

· Launched or airborne delivered metallic material designed to short out equipment

Some things you can protect from by mitigating measures (airborne threats not so much). Relying on local law enforcement response is a non-starter since response time far exceeds the exploit time. Response time is very much dependent on where the substation is in relation to LEO and could easily exceed 30 minutes in many places. Therefore relying on cameras and sensors to watch an attack unfold as your primary defense is not all that helpful. To the extent that you can deter an attack, the better off you are. That means penetration-resistant perimeter barriers, line-of-sight obscuration, outward-looking camera systems with analytics capability, lighting considerations (including normally dark, only lighting up when a perimeter breach is detected), two-way voice communication from the SOC, etc. Wall heights and barrier placement are driven by local conditions. Lighting and audible alarms/communications may be limited by local ordinances. But the bottom line is that a chain link fence with a padlock is effective for a few seconds at best. Hardly a delaying property and certainly not a deterrent.”

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

[i] I believe the entity that got an AoC was originally going to receive a PNC, but they successfully fought back against this auditor. The other entity either was cowed into not doing this (I believe the auditor was the same person), or perhaps the fact that they were one of the first entities audited for CIP-014 in this region worked against them.

[ii] Although I pointed out at the end of the post from last year that this mistake – and another I discussed briefly – can readily be excused by the fact that FERC gave NERC only 90 days to draft, ballot and approve the standard, and have it to FERC to sign. When you set a very aggressive deadline like that, it’s almost inevitable that mistakes will be made – and in this case the biggest mistake seems to have made CIP-014 R1, R4 and R5 mostly, if not completely, un-auditable.

Unfortunately, the CIP-013 drafting team seems to have made a very similar mistake. As I’ve pointed out previously, CIP-013 R1.1 mandates that the entity “identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services…” – but it doesn’t provide any guidelines on what types of risk need to be addressed.

So an auditor who thought that a particular risk like “the vendor will buy chips from the cheapest source, without fully vetting those sources for trustworthiness” should be addressed in the entity’s plan will be in the same position as the regional auditor who felt strongly that the two entities I wrote about last year should have included measures to protect their transformers, not the whole substation. The auditor might be absolutely right from a security point of view, but if the requirement doesn’t state particular classes of risk that need to be addressed (as is done in CIP-010 R4 Attachment 1, which I think is the best plan-based requirement so far), then there is nothing that can be audited, other than whether or not the entity produced any sort of credible plan.

Wednesday, July 18, 2018

The SDT Breaks new Ground – Part 3

In my last post, I continued to applaud the CIP Modifications standards drafting team for breaking out of what had seemed to me to be a dead-end program for introducing virtualization into the CIP standards. I pointed out that they accomplished this “breakout” by coming up with two Great Ideas (this is my term. It’s not in the NERC Glossary). But I also pointed out – very nicely, of course – that one of their two Great Ideas was not likely to succeed unless another change was also made.

This particular Great Idea was that the prescriptive requirements in the currently-enforced CIP standards (like CIP-007 R2, CIP-005 R1 and CIP-010 R1) should all be written non-prescriptively, in the manner of CIP-007-6 R3. The question I asked in the post was “How will these requirements be audited?” And I wasn’t alone in asking this question. In the SDT’s webinar on virtualization recently, SDT members said that – or some variation of it – was the most frequently submitted question.

And well it should be. The problem is that NERC’s auditing procedures – like those of almost any other organization that does audits – ultimately come down to determining whether the entity has or hasn’t done specific things that are required. This of course works fine for prescriptive requirements, where the whole point of the audit is determining whether or not the entity has done – or not done, as the case may be – the particular things specified in the requirement.

But this approach runs into trouble in two cases. The first is when there is ambiguity in a prescriptive requirement, or in a definition that underlies it. For example, CIP-010 R1.1.3 requires the entity to include in its baseline any “custom software” that is installed on the system. Of course, a program written for a particular purpose definitely counts here. But how about other pieces of software like scripts? There isn’t a definition of “custom software” that can settle this question, so an auditor can’t issue a PNC (or at least it won’t be upheld) in a case where an entity isn’t including scripts in their baselines.

The other case is when a requirement is non-prescriptive. Three good examples of this are CIP-014 R1, R4 and R5. In a post last year, I discussed two entities (from the same Region) that both told me the same story: They had been dinged by an auditor for not taking specific steps to protect transformers located in their substations in scope for CIP-014. Their mistake was taking the words of these three requirements literally, since all three only talk about protecting the substation itself, not any equipment located in it.

Unfortunately, if you take away all the prescriptive requirements that have ambiguous language in CIP, as well as all the non-prescriptive requirements, there aren’t many requirements left! What this means is there are very few current CIP requirements for which there isn’t any question on how they will be audited.

Of course, despite this fact, I haven’t heard too many horror stories about an entity and an auditor being completely at loggerheads about a particular issue, to the extent that the entity ends up getting a PNC or even just a very stern warning. Why is this the case?

It certainly isn’t the case because NERC provided lots of guidance for both the entities and the auditors. NERC has tried to provide CIP guidance that would be in some way “binding” for entities and auditors for many years, but in the end it has always had to be removed[i]. And for good reason: There simply is no way, under the NERC Rules of Procedure, for NERC itself to provide any document – other than the standards themselves – that would guide the content of an audit. The only way to fix a hole or an ambiguity in a requirement is to write a Standards Authorization Request to draft a new requirement or revise an existing one[ii]; and that’s part of the CIP Modifications team’s current agenda.

No, the reason auditing has worked fairly well, despite the many ambiguous and non-prescriptive requirements in the currently-enforced CIP standards, is that the Regions have stepped up and provided the “guidance” that NERC can’t provide. They will almost never do this in writing, but you can usually get questions answered verbally (and one Region takes this a step further, by actively offering “assist visits” to help entities put in place good cybersecurity practices. And you can be quite sure that those practices are compliant!).

But this is by no means an ideal situation. Anytime an auditor speaks to you about a requirement – or speaks in a Regional workshop – they will make it clear that what they say is simply their personal opinion, not that of the Region (and perhaps not that of the other auditors). And they mean this; I’ve heard multiple stories about entities who were told one thing by an auditor at their office, but another thing by an auditor (in one case it was the same one), when they came onsite to do an actual audit. In cases where the auditors themselves are clearly confused by a requirement, I think it’s highly unlikely that an entity will receive a PNC for violating it, but I’m sure this wears on CIP compliance professionals who have to live with this uncertainty.

Why am I bringing all of this up now? Because the SDT is now proposing to make prescriptive requirements like CIP-007 R2 and CIP-005 R1 non-prescriptive. Non-prescriptivity is a good thing, but it needs to be accompanied by a different audit regimen than NERC has now. With non-prescriptive requirements, there’s no getting around the fundamental problem: If a requirement just mandates that you do something like develop or implement a plan, but it doesn’t provide a specific list of topics that you need to include in your plan, there is nothing to audit – except whether or not you developed or implemented the plan at all.

Of course, if you clearly didn’t develop a serious plan, or you barely tried to implement it, you could still be held in violation. But for anything else, like missing a particular element in your plan that the auditor thinks should be there, you can’t receive a violation – although the auditor can still give you a hard time. Again, this situation is wearing on CIP compliance professionals. And to be honest, if the SDT simply changes a few prescriptive requirements like CIP-007 R2 to non-prescriptive ones, I think they’re going to have a hard time getting this passed. I don’t think too many CIP professionals are out looking for another anxiety-producing requirement. They may prefer just to keep the prescriptive ones, simply because by now they know what the auditors want to see, and have put all that in place. A prescriptive requirement may require a lot of work, but at least there isn’t a lot of uncertainty with that work – the old story of sticking with the devil you know, rather than the devil you don’t know.

So what’s the solution to this problem? I’m not at all advocating that the SDT give up the idea of making CIP standards as non-prescriptive as possible, but I also don’t want to see more CIP teams having to order Maalox™ by the case load.

The ideal solution would be if the CIP enforcement process could move from one based on auditing to one based on the Regions and the entities working together to secure the BES. I gave some ideas of what I mean in this post, and you can be sure it will figure prominently in the book I’m writing. But I will be quite honest and say I see just about zero chance of NERC making radical changes in the CIP enforcement procedures anytime soon.

So is there a Plan B? In other words, is there a way that the SDT could develop non-prescriptive versions of current prescriptive requirements, that doesn’t also require changes to the enforcement process? I do have a Plan B, and it’s based on several observations.

First, non-prescriptive requirements are definitely the way to go. However, just being non-prescriptive isn’t enough.

Second, since CIP v5 was implemented, all subsequent requirements that have been developed (plus one requirement that was part of v5 itself) have been what I call “plan-based”. In these, the entity isn’t told to achieve an objective (which as I said above, is always unachievable and unmeasurable, when it comes to cybersecurity), such as “protect your assets against the threats attendant on use of Transient Cyber Assets and Removable Media”. Instead, they are told to develop and implement a plan to manage risks attendant on TCAs and RM. Completely protecting your assets against threats from TCAs and RM isn’t achievable or measurable, but developing and implementing a plan is definitely achievable. Does requiring a plan suffice to make a requirement auditable?

No it doesn’t. Just requiring a plan doesn’t make a plan-based requirement auditable in any meaningful sense. If the requirement just mandates a particular type of plan - say a supply chain cyber security risk management plan - an entity could put together a really minimal plan, but as long as the plan addressed supply chain cyber security risk management in some way, it would have to pass.[iii]

So how can a plan-based requirement be made auditable? The requirement (not just a “guidance” document) needs to provide a list of elements that must be included in the plan – and it’s best if these elements are threats that the plan should mitigate. CIP-010 R4 is my poster child for a good plan-based requirement. R4 reads “Each Responsible Entity…shall implement…one or more documented plan(s) for Transient Cyber Assets and Removable Media that include the sections in Attachment 1.”

Attachment 1 lists three types of devices that must be in scope for the plan, as well as between two and five “topics” (my term) that must be addressed for each of the three types. For example, under the Removable Media device type, there are two topics: Removable Media Authorization and Malicious Code Mitigation. Each topic lists 2-5 mitigations that must be included in the plan. For example, under Removable Media Authorization, the entity is required to authorize use of Removable Media (thumb drives) by both user and location. Under Malicious Code Mitigation, the user is required to scan the RM for malicious code before using it with a Medium or High impact BES Cyber System, as well as “Mitigate the threat of detected malicious code”.

You might ask, “How does CIP-010 R4 differ from a prescriptive requirement? It includes a number of steps you need to take; that sounds pretty prescriptive to me.” I’m glad you asked that question. Here’s how it differs:

· The individual mitigations aren’t prescriptive. For example, the first mitigation under Malicious Code Mitigation is “3.2.1. Use method(s) to detect malicious code on Removable Media using a Cyber Asset other than a BES Cyber System or Protected Cyber Assets.” The methods aren’t specified, as they would be in a prescriptive requirement.

· The “preambles” to the mitigations under most topics make clear that the objective in each topic is to mitigate a particular threat. For example, Section 3.2 reads “Malicious Code Mitigation: To achieve the objective of mitigating the threat of introducing malicious code to high impact or medium impact BES Cyber Systems and their associated Protected Cyber Assets, each Responsible Entity shall…”

· Under most topics, it is made clear that the mitigations listed aren’t the only ones that can be used to address the threat in question. For example, the mitigations under Section 2.1 Software Vulnerabilities Mitigation, read “Review of installed security patch(es); Review of security patching process used by the party; Review of other vulnerability mitigation performed by the party; or Other method(s) to mitigate software vulnerabilities.”

To summarize, I definitely support the SDT’s idea to make certain prescriptive requirements non-prescriptive, in order to facilitate incorporating virtualization into the CIP requirements. However, this is how I would proceed with each requirement that is going to be rewritten (using CIP-007 R2 as an example):

a) I would state its objective, based on threat mitigation. For CIP-007 R2, I would state the objective as “On a risk-adjusted[iv] basis, mitigate the threat of software vulnerabilities”.

b) I would then probably refer the entity to an attachment (as CIP-010 R4 does), rather than try to include everything in the requirement itself. But the important thing is that the attachment has to be called out in the requirement; if that doesn’t happen, then the attachment is just another piece of guidance, with no special importance.

c) The attachment might list a set of “sub-threats” to be addressed. Each sub-threat would be in its own section, followed by one or more suggested mitigations (but always saying in the last bullet point “Or any equally effective mitigation strategy”). In this case, one of the sub-threats might be “The threat of vulnerabilities in commercial software.” Probably the only mitigation listed would be “A patch management program, including patch source identification, patch discovery, patch assessment, patch application or mitigation plan development, mitigation plan implementation and regular review, etc.” Another sub-threat would be “The threat of vulnerabilities found in software developed by the entity itself”; in this case, one mitigation would probably be identification and implementation of a secure SDLC.
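To make the auditability point concrete, here is a minimal sketch of how an attachment that lists sub-threats gives an auditor something definite to check. The sub-threat names echo the examples above, but the data layout, the `unaddressed_subthreats` function and the sample plan are all hypothetical; nothing here comes from a NERC document.

```python
# Hypothetical attachment for a rewritten CIP-007 R2: each sub-threat the plan
# must address, with suggested (not mandatory) mitigations.
ATTACHMENT = {
    "vulnerabilities in commercial software": ["patch management program"],
    "vulnerabilities in entity-developed software": ["secure SDLC"],
}


def unaddressed_subthreats(attachment, plan):
    """Return the sub-threats for which the plan names no mitigation at all.

    `plan` maps each sub-threat to the mitigations the entity chose. Any
    mitigation counts here, since the attachment allows "any equally effective
    mitigation strategy"; judging effectiveness is left to the auditor.
    """
    return [threat for threat in attachment if not plan.get(threat)]


# A sample plan that uses an alternative mitigation for one sub-threat and
# says nothing about the other.
plan = {
    "vulnerabilities in commercial software": ["network isolation plus monthly review"],
}

missing = unaddressed_subthreats(ATTACHMENT, plan)
print(missing)

# With no sub-threats listed in the requirement, there is nothing to check:
nothing_to_check = unaddressed_subthreats({}, plan)
print(nothing_to_check)
```

The second call is the situation I described earlier for a requirement that just mandates a plan: with no listed elements, any plan at all comes back clean.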

Unfortunately, I also have a poster child for a flawed plan-based requirement: CIP-013 R1. I used to think that this was the perfect plan-based requirement, because of its Zen-like simplicity:

R1. Each Responsible Entity shall develop one or more documented supply chain cyber security risk management plan(s) for high and medium impact BES Cyber Systems. The plan(s) shall include:

1.1. One or more process(es) used in planning for the procurement of BES Cyber Systems to identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services resulting from: (i) procuring and installing vendor equipment and software; and (ii) transitions from one vendor(s) to another vendor(s).[v]

Let me critique CIP-013 R1 in terms of the three criteria for a good plan-based CIP requirement, which I listed above:

a) R1 starts out with a statement of the objective of the requirement, development of a supply chain cyber security risk management plan. Notice this doesn’t use the word “threat”, but since a risk management plan should certainly talk about threat mitigation, this is OK.

b) R1 doesn’t refer to an attachment, but that’s probably because of c).

c) While R1.1 does list three threats[vi] that need to be addressed in the plan, these are quite general. Each of these three threats (and especially the first one) will of course have many sub-threats. If each of these threats listed 5-10 sub-threats that must be mitigated in the plan[vii], then CIP-013 R1 would receive “Tom’s Seal of Approval for an Auditable Plan-based Requirement”.

So since CIP-013 R1.1 doesn’t list specific sub-threats that need to be mitigated in the supply chain cyber security risk management plan, it doesn’t get my coveted Seal of Approval – which means it really isn’t auditable, as long as the entity produces a plan that could be credibly called a supply chain cyber security risk management plan. I won’t belabor this point now, since I’m going to have a new post on CIP-013 soon.

[i] I drafted a short history of NERC’s efforts to provide real guidance for ambiguities in CIP requirements in writing this post, but I’ve decided it doesn’t need to be in the current post, which is already very long. I’ll post it separately soon.

[ii] There is the Request for Interpretation process. In this process, an entity requests an interpretation; it is drafted and balloted (multiple times) by a NERC drafting team; then it is approved (or not, as happened with two CIP Interpretations in 2012) by FERC, and finally it’s implemented. But this takes almost as much time as the SAR route, and in any case an Interpretation can never modify the words of a requirement – just interpret the current wording. The auditing problems we’re talking about come to pass because the current wording of a requirement doesn’t give enough information to an auditor to audit it.

[iii] I’m being a little too absolute here. An auditor who was confronted with a minimal plan like this would still be justified in giving a PNC for this requirement, since they can always use “professional judgment”. If an entity has developed what is clearly an inadequate plan, the auditor can exercise that judgment to point out that it is grossly inadequate, and therefore doesn’t constitute a serious plan at all. However, if the entity has actually developed more than a minimal plan, but has still left out some topic that the auditor thinks should be in the plan, the auditor can’t give a PNC in this case, and if they do, it most likely won’t be upheld.

[iv] By “risk-adjusted”, I mean that the same controls don’t have to apply to all systems or devices in scope, regardless of the degree of BES risk each one poses. For example, a relay that operates the circuit breaker for a 345 kV line poses a higher risk to the BES than one which operates a 138 kV line. For CIP-007 R2, the entity might decide that a 30-day patching cycle is needed for the former, but a 60-day cycle is perfectly adequate for the latter; this is, of course, not allowed in CIP-007 R2 now. (When I talk about risk here, it has nothing to do with impact level. You could have BCS with three or five different levels of risk, all installed at a single Medium impact substation, and the BCS would all be Mediums.)
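As a rough illustration of what “risk-adjusted” could look like in practice, the sketch below picks a patching deadline by risk tier. The 30- and 60-day cycles come from the example in this note; the tier names, the voltage cut-off used to assign a tier and the function names are assumptions for illustration only.

```python
from datetime import date, timedelta

# Patch cycles from the example in this note; the tier names are hypothetical.
PATCH_CYCLE_DAYS = {"higher": 30, "lower": 60}


def risk_tier(line_voltage_kv):
    """Assign a tier from the voltage of the line a relay protects.

    The 345 kV cut-off is an assumption chosen to match the example; a real
    entity would presumably weigh more than voltage.
    """
    return "higher" if line_voltage_kv >= 345 else "lower"


def patch_due(released, line_voltage_kv):
    """Date by which a patch released on `released` should be applied."""
    return released + timedelta(days=PATCH_CYCLE_DAYS[risk_tier(line_voltage_kv)])


released = date(2018, 7, 18)
print(patch_due(released, 345))  # relay on a 345 kV line
print(patch_due(released, 138))  # relay on a 138 kV line
```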

[v] Of course, R1.2 goes on to list six specific things that need to be included in the plan. But the plan doesn’t consist of these six things; the six things are there because they were ordered by FERC in Order 829. The heart of R1 is R1.1.

[vi] They’re called risks here, not threats. Since most people would do the same thing, I’m not going to vehemently object to this, but I’ve developed my own definitions of what threats, vulnerabilities and risks are and how they relate to each other. The book I’m writing now is carefully following these definitions. But since the CIP-013 SDT (and FERC in Order 829) is using a different idea of risk than mine, that’s fine. But I will keep pointing out that I don’t agree with that idea, since it leads to confusion when you start talking about the amount of risk that a threat poses (using the more popular idea of risk, you would have to start talking about the amount of a risk that a risk poses, which of course is very confusing).

[vii] Of course, each of the three threats will have a lot of serious “sub-threats”, perhaps hundreds. How will the SDT come up with a manageable list of around ten? One way to do this is to aggregate them into a higher level, and just list those higher-level sub-threats. Another is to just take the ten most serious sub-threats and list them – on the idea that the ten of them might account for most of the risk posed by the overall threat.