Tom Alrich's Blog: January 2019

Wednesday, January 30, 2019

A new record!

I think a lot of my readers will already know this, but if you don’t – NERC just announced the largest-ever CIP fine, which adds another decimal place to the previous largest fine: $10 million even (in fact, I imagine this figure, being the smallest possible eight-digit amount, was deliberately chosen for its ability to strike terror into the hearts of utility compliance folks nationwide). It’s all outlined in a voluminous four-part Notice of Penalty totaling over 700 pages. I’ve only seen the first part, available here, and that alone is 250 pages! Naturally, I’ve only skimmed through it, and I’m not sure when I’ll read the whole part 1, let alone all four parts.

Of course, the name of the entity (or really entities; in fact, the organization is always referred to as “The Companies”) isn’t provided. Beyond that, NERC has redacted all information that might refer to a particular NERC Region (although it’s clear there were at least two or three Regions involved); NERC clearly believes it would constitute a big threat to the BES to provide any information that might lead to identification of the entity.

However, I’m much more interested in what the violations were, and what overall lessons can be learned by other utilities. There are 127 violations, covering all currently-enforced CIP standards including CIP-014. The details of those violations are up to you to read, but I call your attention to pages 10-13, which discuss a) Facts common to the violations (i.e. common causes); b) Risks common to the violations; and c) Mitigations common to the violations.

Since the PDF is high security, I can’t copy any text to paste it here, but I’ll summarize. First, the common causes they point to are:

Lack of management engagement and support for the CIP program;
Program deficiencies, including deficient documents, training, and implementation;
Lack of communication between management levels in the company; and
Lack of communication between business units on who is responsible for which tasks.

The entity committed to:

Increasing senior leadership and oversight;
Centralized CIP oversight department;
Conducting industry surveys and benchmarking regarding best compliance practices (I admit I have a hard time understanding this one. I have never yet seen any sort of comprehensive industry survey of compliance practices – mainly because for a utility to provide that information, it will almost always require providing BES Cyber System Information at the same time);
Continuing to develop an in-house CIP program and talent development program;
Investing in enterprise-wide tools (configuration management, etc.);
Adding security and compliance resources;
Instituting annual compliance drills (that’s an interesting idea; I hadn’t heard of that before); and
Creating three levels of security and compliance training.

These are the common mitigation actions the entity committed to:

Revising their corporate IT compliance program so that it meets the requirements of all stakeholders;
Requiring each business unit to revise its procedures and controls so that they follow the corporate IT program;
Requiring each business unit to document and track its controls for CIP compliance; and
Documenting how each non-compliance listed in the settlement agreement was mitigated, and how this will prevent recurrence of the violation (of course, that document will be about three times the length of the NOP. There’ll be a whole lotta writin’ going on!).

I must say that I have yet to hear of any utility that couldn’t also benefit from at least a few of these same practices. Go thou and do likewise.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or challenges like what is discussed in this post – especially on compliance with CIP-013; we also work with security product or service vendors that need help articulating their message to the power industry. To discuss this, you can email me at the same address.

Tuesday, January 29, 2019

We need an investigation!

This is a post I’ve been intending to write ever since I wrote this post a few weeks ago, about the Wall Street Journal’s most recent article on the Russian cyber attacks on the US power grid. I thought I would take my time (and I don’t have a lot of free time lately, due to my day job) to write it, since there were still questions in my mind about the position I wanted to take. I wanted to make sure I provided enough supporting evidence for my position.

However, there was a development today that provided all the supporting evidence I could possibly need. Specifically, this was a report in the New York Times about the testimony before the Senate Intelligence Committee (and don’t tell me that name is an oxymoron!) by Gina Haspel, the CIA director, Christopher Wray, FBI director, and Dan Coats, the director of national intelligence. They were discussing the 2019 “Worldwide Threat Assessment”, which was released today. Of course, the testimony covered a lot of different topics, but what struck me were these two paragraphs from the Times article:

The assessment also argues that while Russia’s ability to conduct cyberespionage and influence campaigns is similar to the one it ran in the 2016 American presidential election, the bigger concern is that “Moscow is now staging cyberattack assets to allow it to disrupt or damage U.S. civilian and military infrastructure during a crisis.”

It specifically noted the Russian planting of malware in the United States electricity grid. Russia already has the ability to bring the grid down “for at least a few hours,” the assessment concluded, but is “mapping our critical infrastructure with the long-term goal of being able to cause substantial damage.”

So why is this so important? You’ve heard it before, right? Specifically, you may have noted, in the above-linked post on the recent WSJ article, that I quoted this paragraph from that article:

In briefings to utilities last summer, Jonathan Homer, industrial-control systems cybersecurity chief for Homeland Security, said the Russians had penetrated the control-system area of utilities through poorly protected jump boxes. The attackers had “legitimate access, the same as a technician,” he said in one briefing, and were positioned to take actions that could have temporarily knocked out power.

The quote from Jonathan Homer first appeared in the July WSJ article by Rebecca Smith, one of the two reporters who wrote the recent article. Of course, the July article set off a firestorm of amplifications by many other news outlets, and a chain of events that I wrote about in ten posts last summer, starting with this one.

Here is as brief a summary of previous events as I can make, while still providing the important facts:

DHS (specifically the NCCIC, which incorporates what was the ICS-CERT. And if you think this is TMA – too many acronyms – I couldn’t agree with you more!) announced a series of four briefings to provide an update on the Russian cyber attacks against the US electric power industry, which they had first announced last March. Even though the March report said only generation was the target, and the Russians hadn’t penetrated any control systems at the plants[i], the first briefing on July 23 painted a very different picture, which was vividly described in the first WSJ article. It seemed very clear from what was said (as quoted in the article – I didn’t attend that first briefing), that the Russians had penetrated control centers (definitely plural) of US utilities, where they had most likely planted malware; and that malware might well be used at some point to cause a major grid disturbance.
I was skeptical that actual control centers of power transmission or distribution utilities had been penetrated, and I said in my post the day after the WSJ article appeared (linked two paragraphs above) that what the presenters must have meant was that control rooms of generating plants were penetrated. This can’t produce a major grid outage, but having a bunch of plants go down at one time would certainly be annoying; given the alarmist tone of the first briefing, I assumed there must have been a number of substantial plants penetrated (at the control system level, of course) – I guessed up to 25. But my biggest reason for skepticism about the WSJ article was that, if it were really true that a bunch of utility control centers were penetrated, there would have been alarm bells ringing at the highest level of government, and utilities would pretty much have been told to drop everything and look for malware on their control systems, as well as take further steps to beef up their already-strong defenses. Given that those bells never rang, I found it very hard to believe the statements quoted in the article. I assumed the statements in that first briefing were the product of a few DHS people getting overly excited, and thinking that exaggerating the seriousness of the situation would make utilities pay a lot more attention to cyber security (and it would be hard to see how they could pay much more attention than they already are!).
However, the day after that post – July 26 – it was reported that a DHS spokesperson announced that, not only were no utility control centers penetrated, but the only control systems penetrated were those in a small generating plant that couldn’t have any significant grid impact. This I found very surprising, to say the least. Yea, greatly was I wroth, and I rent my garments in frustration. But I continued to attribute the tone of the July 23 briefing to over-zealousness on the part of the NCCIC staff members who led it.
I continued in that belief even though a friend pointed out to me the next day that the slides from the July 23 briefing directly contradicted the later statement that only one small plant was penetrated. And I continued to continue in that belief when Rebecca Smith wrote a new article that seemed to still follow the narrative from the first briefing, and didn’t mention the DHS walkback at all. I expressed amazement that she wouldn’t have changed the tone of her articles, and attributed this to her being either naïve or having lived in an inaccessible cave for the past few days (I now greatly regret the tone of my remarks about Rebecca, and want to apologize to her. It seems I may have been the one living in a cave, not her. Continue reading, to see what I mean).
Not being satisfied with just putting out three different stories of what the Russians had achieved, DHS put out another story – which contradicted the other three – at a July 31 briefing for top utility executives in New York, which the Secretaries of DHS and DoE both participated in. This time, the story was that only two wind turbines had been penetrated. I later castigated DHS for being so confused in their stories, and in particular for not stepping forward to point out what seemed to be the errors in the WSJ story, and the flurry of news reports based on it. But I continued to believe there was no way the original DHS briefing could be true.
And I’m proud to report that I witnessed firsthand the promulgation of yet another DHS story, trying to walk back the original briefing story. This one came at the Software and Supply Chain Assurance Forum in McLean, VA in late September. There, a fairly low-level NCCIC employee – although the head of NCCIC had already addressed the same meeting, and may have been still in the room – stated that the confusion was that, in the first briefing, the speakers didn’t understand the difference between vendors and utilities. Therefore, when they were saying that utilities were penetrated, they really meant vendors. Since there’s no dispute that vendors were penetrated (and the latest WSJ article describes how in vivid detail), the speaker implied (although he didn’t state it) that this is why the original briefing was so different from the true story – which would presumably be one of the three DHS walkbacks already described. I found this statement amazing, especially because the speaker was able to keep a straight face when he said it. I couldn’t have done that.
What was even weirder was that, despite DHS' frenzied efforts to walk back the dire narrative in the first briefing, in the second briefing - two days after the first one - I heard what seemed to be pretty much the same story as in the first briefing (which I didn't attend). And the following week, when the third and fourth briefings were given (DHS had known up front they would be very well attended, so they scheduled four, all covering the same material), they didn't differ from the first one either. Yet this was all after a different DHS spokesperson had directly contradicted what was said in the first briefing.

So now we’re back at the recent WSJ article, from which I also quoted this paragraph:

Vikram Thakur, technical director of security response for Symantec Corp., a California-based cybersecurity firm, says his company knows firsthand that at least 60 utilities were targeted, including some outside the U.S., and about two dozen were breached. He says hackers penetrated far enough to reach the industrial-control systems at eight or more utilities. He declined to name them.

This completely turns things around, in my opinion. After all, “eight or more utilities” isn’t two wind farms or one small CT plant, period. So either Mr. Thakur isn’t telling the truth (and he worked with DHS in investigating the Russian attacks), or both he and the speakers at the original DHS briefing (especially Jonathan Homer) are the ones telling the truth. If so, this means that the four later attempts by DHS to walk back this story are themselves based on “alternative facts”.

However, as I mentioned above, I was still hesitant to write something about this until I was sure I had all the facts straight about who said what when - that is, until I read the NY Times article a couple of hours ago. Now it seems the national intelligence community is firmly on the side of Mr. Thakur and Jonathan Homer. Even then, I find it very hard to conclude that they’re right, simply because there hasn’t been any huge hue and cry over this penetration of our grid. I think that would truly constitute a national emergency (in contrast to the “national emergency” currently being discussed). You remember all the frenzy that (rightly) surrounded the announcement of the first Ukraine attack in 2015? This would be literally ten times as great, and it should be.

So I think there need to be two investigations. The subject of the first, and by far the more urgent one, is whether it’s really true that malware has been implanted in utility control centers by the Russians. Of course, if that’s the case, there needs to be a major effort to remove it, and to hold Russia accountable (in fact, the relatively weak response so far to the undisputed fact that they have been trying so hard to penetrate the US grid – whether or not they’ve succeeded – is something I also don’t understand. Or maybe I do understand it, which is even scarier). And there’s probably a lot more that needs to be done, including perhaps with the CIP standards.

The second investigation isn’t as urgent, but in my mind it’s even more serious: How did it happen that DHS was quickly falling all over itself to walk back what was said in the first briefing last July, if in fact that briefing was largely correct – and the Russians had penetrated utility control centers? That is something for the Department of Justice, since it’s definitely a criminal investigation - one involving national security. But it's only needed if in fact the first investigation finds that there was indeed penetration of utility control networks.

[i] Although I just noticed a quote where it seems someone from DHS did imply in March that utility control centers were penetrated and malware had probably been implanted. I must have missed that part, as I assume the rest of the industry did as well - since I don't remember any big hue and cry then, either.

Thursday, January 24, 2019

What is the purpose of CIP-013?

Lew Folkerth of RF published an article about CIP-013 in December in the RF newsletter, which I wrote about in this post and in this one. In that article, Lew said that the supply chain cyber security risk management plan required by CIP-013 R1.1 needs to demonstrate that it achieves the objective(s) of the standard. And what are they? In his article, Lew repeated the four objectives that FERC had outlined, both in their Order 829 of June 2016 that required NERC to develop a supply chain security standard and in Order 850 of last October, which approved CIP-013. These objectives are

1. Software integrity and authenticity;

2. Vendor remote access protections;

3. Information system planning; and

4. Vendor risk management and procurement controls.

However, being very bright (and to prove that’s true, my mother always said I was bright!) and an astute reader, I pointed out that there’s an even simpler statement of CIP-013’s purpose, in Section 3 near the beginning of the standard: “To mitigate cyber security risks to the reliable operation of the Bulk Electric System (BES) by implementing security controls for supply chain risk management of BES Cyber Systems.” I pointed out that all of FERC’s four items are included in this statement, so I thought this should really be the objective that entities must achieve in their plan(s).

But, after having done some pretty intensive reading of various documents having to do with CIP-013 and supply chain security, I came to realize that FERC’s statement is pretty good after all, and has the advantage of at least providing some substance to the meaning of the words “cyber security risks” in the Purpose statement. In other words, the Purpose statement is pretty broad, and doesn’t provide a lot of guidance to the entity in developing the plan, or to the auditor in auditing it. With FERC’s four things, the auditor has at least something to go on in the audit, while at the same time the entity has a (very) broad outline of what its plan needs to address. So I am now fine with Lew’s statement that FERC’s four objectives constitute the purpose of CIP-013.

Of course, these four things are far from being a roadmap to compliance with CIP-013! Lew’s article does give some clues to that roadmap as well, which I elaborated on in the two posts already linked. I’ll continue to elaborate on the roadmap in the next post in that series. But I do want to point out now that these four items don’t have equal standing, in my opinion. The last two constitute the two broad areas of risk that must be addressed in the supply chain risk management plan, while the first two are simply two of the individual risks that are included under the third objective. So FERC’s four objectives could be summarized by just listing the last two.

This all means that your CIP-013 R1.1 supply chain cyber security risk management plan must address risks of “information system[i] planning” and “vendor risk management and procurement controls”. And you need to show the auditors that your plan addresses both types of risk.

[i] It’s unfortunate that FERC used the term “information system”, when they should really have said “control system” (although I initially thought there might be some significance to the fact that they did, as I discussed in this post after FERC issued Order 829 in 2016). Of course, NERC CIP doesn’t deal at all with information systems, whose purpose is to store and process information. The power grid, and other critical infrastructures, is controlled by control systems. These are what CIP protects.

Thursday, January 17, 2019

Lew on CIP 13, part 2 of 3

This post is the second of a set of posts on an excellent article by Lew Folkerth of RF (the NERC Region formerly known as ReliabilityFirst) on CIP-013, the new NERC supply chain security standard; the first post is here. That post dealt primarily with how Lew characterizes the standard and started to discuss what he says about how to comply; this one continues the discussion about how to comply with the standard. The third post will continue the compliance discussion and also discuss how CIP-013 will be audited.

I also want to point out that what I am saying in this series of posts goes beyond what Lew said in his article, for two reasons:

Lew doesn’t have much space for his articles, as I do for my posts. So where he has to use ten words, I can write five paragraphs. And I have no problem with doing that, as any long-time reader will attest.
While I firmly believe everything I say in this series of posts is directly implied by what he said in his article, it’s natural that I would be able to discuss these topics in more detail, because I’ve had to figure out a lot of the details already – since I’m currently working with clients on preparing for CIP-013 compliance. Of course, what I write in these posts is by necessity very high level; there are details and there are DETAILS. These posts provide the former (see the italicized wording at the end of this post to find out how to learn about the latter).

What risks are in scope?

As I have pointed out in several posts in the past, and also pointed out in part 1 of the first post in this series (in Section A under the heading “Lew’s CIP-013 compliance methodology, unpacked for the first time!”), CIP-013 R1.1 requires the entity to assess all supply chain risks to BES Cyber Systems, but it doesn’t give you any sort of list (even a high-level one) of those risks. So R1.1 assumes that each entity will be quite well-read in the literature on supply chain security risks and will always be diligently searching for new risks; then they’ll put together a list of all of these risks and assess each one for inclusion (or not) in their plan.

Note: If you're one of the two people that read the previous post closely, you'll remember I pointed out that I have a problem with how CIP-013 uses the word "risk" in two senses. One is in the sense of what I call a threat: a situation that can potentially cause serious harm. The situation itself doesn't have a magnitude, so you can't talk about a big threat or a small threat. But that's where the other sense of "risk" comes in, since you can estimate the magnitude of the risk posed by a particular threat. Because I don't want to confuse people too much, I have mostly used risk in both senses in this post, although I have a few times talked of threats when I felt it was important to do that. To quote Ralph Waldo Emerson, "Consistency is the hobgoblin of little minds." I don't ever want to be accused of having a little mind!

This might be a good idea if every NERC entity with Medium or High impact BCS had security staff members who could devote a good part of every day to learning about supply chain security risks, so that they could always produce a list of the most important risks whenever required. While this might be true for some of the larger organizations, I know it’s not true for smaller ones. What are those people to do?

I’ve repeatedly expressed the hope that an industry organization like NATF or the NERC CIPC would put together this list of supply chain risks, although I’ve seen no sign of that happening yet. Another idea would be if the trade associations, including APPA, EEI, NRECA and EPSA, each put together a comprehensive list for their own members. While APPA and NRECA developed a good general discussion of supply chain security for the members of both organizations, it doesn’t contain such a list; I hope they will decide to do that in the future as well.

In the meantime, NERC entities subject to CIP-013 need to figure out on their own what their significant supply chain security risks are. Where can you go for ideas? Well, there are lots of documents and lots of ideas – and that’s the problem; there are far too many. There’s NIST 800-161 and parts of NIST 800-53, for starters. There’s the NERC/EPRI “Supply Chain Risk Assessment” document, which was issued in preliminary form in September and will be finalized in February; there’s the excellent (although too short!) document that Utilities Technology Council (UTC) put out in 2015 called “Cyber Supply Chain Risk Management for Utilities”; and there’s the APPA/NRECA paper I just mentioned. There are others as well. None of these, except for 800-161, can be considered a comprehensive list, though. And 800-161 is comprehensive to a fault; if any utility were to seriously try to address every risk found in that document, they would probably have to stop distributing electric power and assign the entire staff to implementing 800-161 compliance!

One drawback of all of these documents, from a CIP-013 compliance perspective, is that they don’t identify risks directly. Instead, they all describe various mitigations you can use to address those risks. This means that you need to reword these mitigations to articulate the risks behind them. To take the UTC document as an example, one of the mitigations listed is “Establish how you will want to monitor supplier adherence to requirements”. In other words, while it’s all well and good to require vendors (through contract language or other forms of commitment like a letter) to take certain steps, you need to have in place a program to regularly monitor that they’re taking those steps.

We need to ask “What is the risk for which this is a mitigation?” The answer would be something like “The risk that a vendor will not adhere to its commitments regarding cyber security”. This is one of the risks you may want to add to your list of risks that need to be considered in your CIP-013 supply chain cyber security risk management plan. You can get a lot more by going through the documents I just listed.
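To make the rewording step concrete, here is a minimal Python sketch of how you might record each mitigation alongside the risk you derive from it. The `RiskEntry` class, its field names, and the `risks_from` helper are purely hypothetical names chosen for illustration; the one entry shown is the UTC example discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEntry:
    """One row of a hypothetical risk list built from published guidance."""
    source: str      # document the mitigation was found in
    mitigation: str  # mitigation wording, as it appears in the source
    risk: str        # the risk you articulate behind that mitigation


register = [
    RiskEntry(
        source="UTC, Cyber Supply Chain Risk Management for Utilities",
        mitigation="Establish how you will want to monitor supplier "
                   "adherence to requirements",
        risk="A vendor will not adhere to its commitments regarding "
             "cyber security",
    ),
]


def risks_from(entries, source_keyword):
    """Return the derived risk statements whose source matches a keyword."""
    keyword = source_keyword.lower()
    return [e.risk for e in entries if keyword in e.source.lower()]


for risk in risks_from(register, "utc"):
    print(risk)
```

Keeping the original mitigation wording next to the derived risk also gives you a simple way to show where each item on your list came from.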

So – in the absence of a list being included in Requirement CIP-013 R1.1 itself, and in the absence of any comprehensive, industry-tailored list put out by an industry group - this is one way to list the risks you need to assess in your CIP-013 supply chain cyber security risk management plan. The main point of this effort is that you need to develop a list that comes as close to covering (at least at a high level) all of the main areas of supply chain cyber risk as possible.

But I know there’s a question hidden in every NERC CIP compliance person’s heart when I bring this point up: If I develop a comprehensive list of risks, am I going to be required by the auditor to address every one of them? In other words, if my list includes Risk X, but I decide this risk isn’t as important as the others, so I won’t invest scarce funds in mitigating it, am I going to receive an NPV (Notice of Potential Violation) for not mitigating it?

And here’s where Uncle Lew comes to the rescue. He points out “You are not expected to address all areas of supply chain cyber security. You have the freedom, and the responsibility, to address those areas that pose the greatest risk to your organization and to your high and medium impact BES Cyber Systems.” There are two ways you can do this.

The first way is that you don’t even list risks in the first place that you believe are very small in your environment. For example, the risk that a shipment of BCS hardware will be intercepted and then compromised during a hurricane emergency is very low for a utility in Wyoming, while it might be at least worth considering for a utility in South Carolina. The former utility would be justified in leaving it off its list altogether, and doesn’t need to document why it did that. Any risk that has almost zero probability doesn’t need to be considered at all – there are certainly a lot more that have much greater than zero probability!

The second way in which you can – quite legally – prune your list of risks to a manageable level is through the risk assessment process itself. R1.1 requires that you “assess” each risk. What does that mean? It means that you assign it a risk level. In my book, this means you first determine a) the likelihood that the risk will be realized, and b) its impact if it is realized. Then you combine those two measures into what I call a risk score.

Once you’ve assessed all your risks, you rank them by risk score. And guess what? You now need to mitigate the highest risks on the list. You can also mitigate some risks below these (perhaps mitigate them to a lesser degree), but in any case there will be some level on your risk list below which you won’t even bother to mitigate the risks at all. And you don't have to justify this by saying anything more than "We decided that the threats whose risk scores are lower than this level will not be mitigated, due to our not having enough funds to mitigate them."
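As a rough illustration of the assess-then-rank idea, here is a minimal Python sketch. It rests on several assumptions that are not in CIP-013 or in Lew's article: likelihood and impact are each rated on a 1-to-5 scale, the two are combined by simple multiplication, and the cutoff is a single threshold on the score. The ratings and the third risk in the list are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Multiplication is one common way to combine the two measures;
        # it is an assumption here, not something the standard prescribes.
        return self.likelihood * self.impact


def split_at_threshold(risks, threshold):
    """Rank risks by score (highest first) and split them at a cutoff."""
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    mitigate = [r for r in ranked if r.score >= threshold]
    not_mitigated = [r for r in ranked if r.score < threshold]
    return mitigate, not_mitigated


# Illustrative ratings only.
risks = [
    Risk("Vendor does not adhere to its cyber security commitments", 3, 4),
    Risk("Hardware shipment intercepted during a hurricane emergency", 1, 4),
    Risk("Compromised software patch delivered by a vendor", 2, 5),
]

mitigate, not_mitigated = split_at_threshold(risks, threshold=8)
for r in mitigate:
    print(f"mitigate: {r.name} (score {r.score})")
for r in not_mitigated:
    print(f"below the line: {r.name} (score {r.score})")
```

Whatever scale and combining rule you choose, the point is that the same rule is applied to every risk on the list, so the ranking can be explained to an auditor.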

Will you get into trouble for not mitigating the risks at the bottom? No. As Lew said, you need to “address those areas that pose the greatest risk to your organization and to your high and medium impact BES Cyber Systems.” The direct implication of these words is that you don’t need to address the risk areas that pose the least risk.

Why are you justified in not mitigating all of the risks listed in your supply chain cyber security risk management plan? Because no organization on this planet (or any other planet I know of) has an unlimited budget for cyber security. Everyone has limited funds, and the important thing is that you need to allocate them using a process that will mitigate the most risk possible. That process is the one I just described (at a very high level, of course).
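The budget side of this can be sketched in a few lines of Python. The sketch assumes you already have a list ranked from highest to lowest risk score and a rough cost estimate for mitigating each item; the function name, the threat names, the scores, and the dollar figures are all hypothetical. It funds mitigations in rank order and draws the line at the first item the remaining budget cannot cover.

```python
def draw_budget_line(ranked, budget):
    """Fund mitigations in rank order until the money runs out.

    ranked: list of (name, risk_score, mitigation_cost) tuples, already
            sorted from highest to lowest risk score.
    Returns (funded names, names below the line, unspent budget).
    """
    funded = []
    remaining = budget
    for i, (name, _score, cost) in enumerate(ranked):
        if cost > remaining:
            # Everything from here down falls below the line.
            below = [n for n, _, _ in ranked[i:]]
            return funded, below, remaining
        funded.append(name)
        remaining -= cost
    return funded, [], remaining


# Hypothetical figures for illustration only.
ranked = [
    ("Threat 1", 20, 40_000),
    ("Threat 2", 16, 30_000),
    ("Threat 3", 15, 20_000),
    ("Threat 4", 12, 25_000),
    ("Threat 5", 6, 5_000),
]

funded, below, unspent = draw_budget_line(ranked, budget=100_000)
print("funded:", funded)
print("below the line:", below)
print("unspent:", unspent)
```

A stricter or looser rule is possible (for example, skipping an expensive item and funding a cheaper one further down), but a single line through the ranked list is the simplest thing to document.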

You may notice that this is very different from the process to mitigate risk that is implicit in all of the other NERC standards, as well as the majority of requirements for the CIP standards. That process – a prescriptive one – tells you exactly what needs to be done to mitigate a particular risk, period. You either do that or you get your head cut off.

For example, in CIP-007 R2, you need to, every 35 days, contact the vendor (or other patch source) of every piece of software or firmware installed on every Cyber Asset within your Electronic Security Perimeter(s), to determine a) whether there is a new patch available for that software, and b) whether it is applicable to your systems. Then, 35 days later, you need to either install the patch or develop a mitigation plan for the vulnerability(ies) addressed by the patch. It doesn’t matter if a particular system isn’t routably connected to any others, or if the vendor of a particular software package has never issued a security patch in 20 years; you still need to check with the vendor every 35 days. You can’t have two schedules, say every 15 days for the most critical systems and those routably connected to them, and quarterly for all other systems. Needless to say, if CIP-007 R2 were a risk-based requirement like CIP-013 R1.1 (or CIP-010 R4 or CIP-003 R2, for that matter), you would have lots of options for mitigation, not just one.
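To show how rigid that cadence is, here is a minimal Python sketch of the two deadlines as described in the paragraph above. It encodes only this post's summary (a check every 35 days, then 35 days to install the patch or develop a mitigation plan), not the exact language of the standard, and the function names and dates are hypothetical.

```python
from datetime import date, timedelta

CHECK_INTERVAL = timedelta(days=35)  # how often each patch source is checked
ACTION_WINDOW = timedelta(days=35)   # time allowed to install or plan mitigation


def next_check_due(last_check: date) -> date:
    """Latest date by which the patch source must be checked again."""
    return last_check + CHECK_INTERVAL


def action_due(patch_found_on: date) -> date:
    """Latest date to install the patch or have a mitigation plan."""
    return patch_found_on + ACTION_WINDOW


def check_is_overdue(last_check: date, today: date) -> bool:
    """True if the 35-day check window has already passed."""
    return today > next_check_due(last_check)


last_check = date(2019, 1, 17)
print(next_check_due(last_check))                      # 2019-02-21
print(action_due(date(2019, 2, 21)))                   # 2019-03-28
print(check_is_overdue(last_check, date(2019, 3, 1)))  # True
```

Notice that nothing in these functions takes the criticality or connectivity of the system as an input; the same two constants apply to every Cyber Asset, which is exactly what makes the requirement prescriptive.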

As an aside, I do want to point out here that in CIP you never have complete freedom to choose how you will mitigate a particular risk, even when the requirement permits consideration of risk, for two reasons:

1. The mitigation always has to be effective, as Lew pointed out a couple years ago; and

2. If you’re using a mitigation different from the one normally used – e.g. you’re not using patch management to mitigate the threat of unpatched software vulnerabilities, or you’re not using antivirus or application whitelisting software to mitigate the threat of malware – you can rightfully be asked to justify why you took an alternative approach.

A final question you might ask about identifying risks for R1.1 is “Where do I draw the line? You said that I can draw a line through the ranked set of risks, so that all risks below that line don’t need to be mitigated at all. Where do I do that, and how do I justify this to the auditor?”

Of course, if your organization has allocated $X to supply chain security, and you have determined that this amount will cover mitigation of say the top ten supply chain threats on your list, you should point this out as justification for not mitigating threats 11 and below. But what if your utility is particularly short on cash this year - say there has been a natural disaster, for which it's very possible you won't get rate relief - and you only have funds available to mitigate say the top three threats on the list? And further suppose that threats 4-6 on the list pose fairly high risk, and you would definitely want to mitigate these if you could? Could the auditor give you a Notice of Potential Violation for this? And will your mitigation plan for this violation require that you go back and get more funds to address these risks, by threatening your management with multi-million dollar fines if they don't cough up the funds?

It’s interesting that you bring this up, since I have considered this question a good deal myself. I think the answer is that it all gets down to reasonableness. If you can demonstrate to the auditor that your organization really can’t afford to spend more on supply chain cyber security risk mitigation, they will hopefully agree this is a reasonable request.

I realize that stating that the auditors will "hopefully agree this is a reasonable request" might cause some of you to laugh cynically, and think that the CIP-013 approach may not be such a great thing after all. But think about what would happen if we were talking about CIP-007 R2 instead. Suppose you were suffering from the same financial constraints, and you decided that you'd have to cut back on the resources devoted to patch management, meaning you'd have to check for new patches once a quarter, not every 35 days. Do you think you'd get a pass on that, even if the auditor personally considered this to be a reasonable request?

I really doubt it. Reasonableness isn’t something an auditor is allowed to consider when auditing a prescriptive requirement (unless we’re talking about a reasonable interpretation of a particular term in the requirement, or something technical like that); on the other hand, it’s inherent in the idea of a risk-based requirement.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

My offer to NERC entities of a free webinar workshop on CIP-013, described in this post, is still open! Let me know (at the email address above) if you would like to discuss that, so we can arrange a time to talk.

Monday, January 14, 2019

The Russian attacks: A new WSJ article puts them in a whole new light

This week, I intended to write the second part of my last post on Lew Folkerth’s great article on CIP-013. However, I believe this topic has more urgency. I will write a second post on this topic, then get back to Lew's article (I hope) next week.

Last Friday morning, I opened my subscription copy of the Wall Street Journal to see a front-page article entitled “Russian Hack Exposes Weakness in U.S. Power Grid”.[i] Then I read the article, very carefully. What was my first reaction? It was “Well, there goes my weekend.” I realized that this article is very important, for two reasons. First, it points the way to an important cyber attack vector that the industry, and especially the NERC CIP standards, haven’t paid much attention to. And yet it turns out that this was the primary vector the Russians were using, not the one I thought they were, based on the first WSJ article and DHS briefings last July. That is the subject of this post and the one to follow.

The second reason why this article is important is that it makes me (and I’m sure it will others as well) far less certain that the DHS briefings in July constituted a gross exaggeration of the success that the Russians had. Those briefings implied that the Russians had penetrated a number of utility control centers, where they would have had the opportunity to plant malware that they might call into action at a later date. I expressed great skepticism about this conclusion, and two days later DHS put out a completely different story, in which they said that only one “insignificant” generating plant (presumably gas-fired, going by a diagram that was shown) had actually been penetrated (i.e. at the control system level). Yet this was followed up a week later by a different story: that in fact just two wind turbines had been impacted, not a whole plant.

In a post in early September (which was preceded by others, and followed by one more), after describing the timeline that produced these three mutually contradictory explanations from DHS, I stated that I continued to believe that statements made at the initial briefings were wildly exaggerated – if not actually factually wrong, since the wording seemed to be very carefully chosen. I also emphasized that I really wished DHS would come out with a straight story on what really happened. However, last Friday’s article makes me question that conclusion, so that I now think it’s possible that the initial briefings were correct, and the Russians did penetrate a number of utility control centers. My third (and probably fourth) posts will discuss how Friday’s WSJ article caused me to rethink my conclusion, and will go on to address some of the huge implications, if it’s actually true that utility control centers were penetrated. These implications aren’t so much cyber implications as political ones.

Before I get on with the discussion of the cyber implications of Friday’s story, I want to point out that this is a great reporting job, by Rebecca Smith and Rob Barry. Ms. Smith is a veteran WSJ writer on the electric power industry and cyber security, and is the author of the article last July that caused a firestorm in the US and elsewhere, with its implications that the Russians had used the supply chain to penetrate a number of U.S. utilities and plant malware in their control centers. The big difference between Friday’s article and the one in July is that the latter was primarily based on the first DHS briefing. Ms. Smith published it the day after the briefing, and there was certainly no time to follow up with other industry sources, try to verify some of the statements made by DHS, etc.

By contrast, Friday’s article is based on a lot of really dogged reporting (which has probably been going on since soon after the briefings), tracing in great detail, with lots of quotations from victims, how the Russian attacks actually proceeded through a number of small vendors to actual utilities (the article names five utilities that were attacked). In the article, Ms. Smith provides evidence that convinces me that my original scenario for how the attacks unfolded is incorrect.

The July briefings and WSJ article didn’t directly provide a scenario for the attacks, but I made a few assumptions in developing my own implicit scenario. I never wrote it down, but it was behind all of the articles I wrote on the Russian attacks last year. This scenario was:

The attackers were aiming for the Big Prize of cyberattacks on the US power grid: causing a cascading outage in the Bulk Electric System (this is obviously the way to cause the greatest total damage to the US economy). This means they would necessarily attack only transmission-level assets (i.e. BES assets), not distribution-only ones. You can’t cause a cascading outage by just attacking the latter.
Because of this, the best way to proceed is to try to obtain direct access to the control systems that control or power the transmission grid – i.e. control systems located at control centers, generating plants over 75 megawatts (including larger wind farms), and substations connected to the grid at greater than 100 kilovolts. In NERC CIP terms, these are High-, Medium- and Low-impact BES Cyber Systems, located at High-, Medium- and Low-impact assets (control centers, substations and generating plants).
Getting access to these systems is a formidable challenge. High- and Medium-impact assets (i.e. the more important control centers and substations, along with a small number of large or otherwise strategic generating plants) are almost all protected by two strong defenses (both required by NERC CIP).
The first of these defenses is well-managed firewalls, which make it very hard to make a direct frontal attack on the network in the asset. Largely due to NERC CIP compliance, these firewalls will have very few, if any, open and unprotected ports that a hacker could exploit.
The second defense at these assets is a well-protected system for Interactive Remote Access (IRA), including an Intermediate Server and two-factor authentication. This means that an attacker attempting remote access out of the blue will probably never get through the IRA system, unless they have found a way to break two-factor authentication – and I know of no verified cases to date in which an attacker has done that.
Low impact assets don’t necessarily have these two strong protections (some do), so they are easier to penetrate. On the other hand, they’re classified as Low impact because if compromised their loss will cause a much less severe impact on the grid than the loss of a Medium or High-impact asset. So the poor Russians won’t even come close to causing a cascading outage if they bring down a single Low-impact asset (they could perhaps do it if they attacked a lot of Low-impact assets simultaneously, but that is hard to do).
This means that no Transmission-level assets (BES assets) would be fruitful targets for Russian hackers. I assumed the attackers had tried to compromise these assets, not knowing how hard it would be to accomplish this goal. And I was for the same reason very skeptical of the initial DHS briefings and the WSJ article last July, which strongly implied (if they didn’t state it outright) that some Transmission-level assets (probably utility control centers) had been penetrated.
When DHS came out with their new story (and a week later, a second story) that said only a very small generating plant had been compromised (far below the 75 MW threshold for being a part of the Bulk Electric System), I took this as confirmation that I was right, and the Russians had essentially wasted a lot of time and money trying to break into something that was pretty much impenetrable.

However, the Friday WSJ article implicitly describes a very different scenario for the attacks:

The biggest difference between the new scenario and the one I was assuming is that the attackers weren’t obsessed with a cascading BES outage as their be-all and end-all. They were looking to cause whatever damage they could (or more specifically to position themselves to do so in the future if called upon), and they were fine with attacking the distribution system. In particular, they were looking at cutting off power distribution to military installations, which of course is a very understandable strategic purpose (and I assume the US is doing the same sort of reconnaissance and probing in the Russian grid).
This means that the attackers weren’t going to be stymied by the fact that they couldn’t penetrate any Medium- or High-impact assets. A single military base could in most cases easily be attacked by disrupting a single Low-impact generating plant or substation, or even a distribution-level plant or substation. Because of this, the Russians’ universe of possible targets was much larger than I was assuming last summer – so I was wrong last week in pointing out to the large spike in Russian readers of my post (among whom I assumed were at least some of the people involved in attacking the US grid) that their attacks so far had been a “dismal failure”. Instead, they might well believe them to be at least moderately successful, and Friday’s WSJ article provides some documentation for why they would be justified in this belief (of course, I’m not trying to lift the spirits of the Russian attackers by saying that!). In any case, my spike of Russian readers quickly dissipated after that story, and now Russia is number four in my readership list, after the US (once again firmly in first place), Canada and the Ukraine (where I seem to have a steady readership, unlike the fickle Russians).
Another big difference between my original scenario and the one from Friday’s article is that I was assuming that the Russians would want to attack US power entities through vendors of control systems, by compromising the remote-access channels they already had set up with their customers. But the vendors discussed in the Friday article are quite different. They are all fairly small firms, including two excavating companies, an office-renovation firm, individual engineers (attacked through a watering-hole attack on a publisher of magazines read by power engineers), and others. So I was entirely wrong in my idea of the vendor entities that served as the intermediaries for the Russian attacks.
There’s no way that an attack on any of these vendor targets could ever get the Russians into the utility assets they needed to compromise in order to cause a cascading BES outage. But what could it do? It could get them into the IT networks of utilities. After all, every vendor interacts probably every day with utility staff using workstations attached to the IT network.
And the Russians didn’t have to compromise a remote access system to get to these workstations. All they had to do was to follow the same path used in the Ukraine attacks, as well as just about every other successful cyberattack worldwide in recent years: use phishing emails (or watering-hole attacks) to load malware onto workstations on the IT network. And once they were on one or a few workstations, it was much easier to compromise almost any other workstation on the IT network, since most IT network assets are much better protected from external attacks than they are from internal ones. The WSJ article provides great detail on how some of these phishing attacks proceeded.

Of course, the goal of the attacks wasn’t to compromise the IT network, but somehow to reach the control systems (i.e. the “OT” network, meaning operational technology), where they could drop malware that will allow them to come back later to turn that into actual destruction. And here we need to ask “Did the attackers reach any control systems?” The article answers this question in the affirmative – and the systems weren’t in just two wind turbines or one small natural gas-fired power plant, as DHS stated this summer. Here are four paragraphs from the last part of the article:

Federal officials say the attackers looked for ways to bridge the divide between the utilities’ corporate networks, which are connected to the internet, and their critical-control networks, which are walled off from the web for security purposes.

The bridges sometimes come in the form of “jump boxes,” computers that give technicians a way to move between the two systems. If not well defended, these junctions could allow operatives to tunnel under the moat and pop up inside the castle walls.

……..

To make a long story short, it seems the Russian attackers had a much broader goal than just causing a cascading BES outage, which made it perfectly acceptable for them to attack Low impact Transmission-level assets, as well as distribution-level assets not part of the Bulk Electric System at all – since both of these types of assets are much less well-defended than BES assets. Because of this broader goal, they weren’t confined to attacking utilities by commandeering vendor access to their remote access systems; they were perfectly happy using the tried-and-true phishing route to get into the IT networks of utilities. And from there, they were able to penetrate the control system networks of at least eight utilities, where they might have been able to deposit malware.

My second post in this series will discuss the implications of this finding for cyber regulation of the electric power industry, including the NERC CIP standards.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

[i] The WSJ web site is behind a paywall, so you can’t read the article there. I requested that the site provide a free link to this article, since I think it is of very high importance to the North American power industry. In the meantime, I found this online reproduction of the article.

I think all of you should seriously consider subscribing to the Journal, either in print or online. It has the best coverage of cyber issues of any major American newspaper. It also has the best coverage of economic issues, which I’m also very interested in. I don’t agree with the majority of the editorials or op-eds, but even then they’re all very well-written and informed, so you can’t just dismiss them unread like you can in some other publications.

Wednesday, January 9, 2019

Lew Folkerth on CIP-013: Part 1

(Note on Feb. 16, 2019: I have substantially rewritten part of this post, which I no longer found to be a good description of the role of risk in prescriptive vs. risk-based requirements. In the process, I've made this post - already my longest, I believe - even longer! What can I say? As Lew's January article points out - which I'll discuss in a few weeks, but which you can read now by downloading RF's January newsletter and going to page 10 - risk management is the future of all the CIP standards, not just CIP-013. I believe, and Lew seems to as well, that it is inevitable that all of the CIP standards will be risk-based in the not-too-distant future, like maybe three years. Every CIP compliance professional is going to have to become familiar with risk management, not just the ones working on CIP-013 now)

Two weeks ago, when I downloaded the most recent RF newsletter and went to Lew Folkerth’s column (called The Lighthouse), my heart started to beat faster when I saw that his topic this time is supply chain security – and really CIP-013. I’ve been quite disappointed with literally everything else I’ve read or heard so far from NERC about CIP-013, since none of it has directly addressed the fundamental difference between that standard and literally all other NERC standards (including the CIP ones): namely that CIP-013 is a standard for risk management first and supply chain security second (more specifically, the risks being managed by the standard are supply chain security risks). I was very pleased to see that Lew not only gets the point about CIP-013, but has a deep understanding that allows him to communicate what he knows about the standard to the rest of the NERC community.

I was quite heartened when I read Lew’s first sentence about the standard: “CIP-013-1 is the first CIP Standard that requires you to manage risk.” Yes! And it got better after that, since he not only described the standard very well, but he laid out a good (although too concise) roadmap for complying with it, and made some very good points about how it will be audited (Lew was a longtime CIP auditor until he moved into an Entity Development role at RF, where he still focuses on CIP). On the other hand, I do disagree with one thing he says, which I’ll discuss below. I’m dividing this post into three parts. This first part discusses what Lew says about CIP-013 itself. The second will discuss what Lew says (and what I say) about how to comply with CIP-013. The third post will discuss how CIP-013 will be audited, including a subsequent email discussion I had with Lew on that topic.

Lew makes three main points about CIP-013:

I. Plan-based

His first point is “CIP-013-1 is a plan-based Standard…You are required to develop (R1), implement (R2), and maintain (R3) a plan to manage supply chain cyber security risk. You should already be familiar with the needs of plan-based Standards, as many of the existing CIP Standards are also plan-based.” I don’t agree that any existing CIP standards (other than CIP-013 itself) are plan-based[i], although several requirements are. Specifically, CIP-003 R2, CIP-010 R4 and CIP-011 R1 all require that the entity have a plan or program to achieve a particular objective. I would also argue that CIP-007 R3 is plan-based, even though it doesn’t actually call for a plan. This is because I don’t see that there’s any way to comply with the requirement without having some sort of plan, although perhaps a fairly sketchy one.[ii]

What are the objectives of these other plan-based requirements? They are all stated differently, but in my opinion they are all about risk management, even though CIP-013 R1 is the first requirement to state that outright. And if you think about it (or even if you don’t), when you’re dealing with cyber security, risk management is simply the only way to go. Let me explain by contrasting the CIP standards with the NERC Operations and Planning (O&P) standards, which deal with technical matters required to keep the grid running reliably.

In the O&P standards, the whole objective is to substantially mitigate particular risks - specifically, risks that can lead to a cascading outage of the Bulk Electric System – not manage them. These standards are prescriptive by necessity: For example, if a utility doesn’t properly trim trees under its transmission lines, there is a real risk of a widespread, cascading outage (which is what happened in 2003 with the Northeast blackout, although there were other causes as well). Given the serious impact of a cascading outage, there needs to be a prescriptive requirement (in this case, FAC-003) telling transmission utilities exactly what needs to be done to prevent this from happening, and they need to follow it. I totally agree that there’s no alternative to having a very prescriptive requirement for the utility to regularly trim all of their trees. It has to be all of the trees under its lines, not just every other one or something like that; a single overgrown tree can cause a line to short out (more specifically, cause an overcurrent that will lead to the circuit breaker opening the line).

The prescriptive requirements in CIP take this same approach: They are designed with the belief that, if you take certain steps like those required for CIP-007 R2, you will substantially mitigate a certain area of risk - which, in the case of that requirement, is the risk posed by unpatched software vulnerabilities. You need to follow those steps (which include rigid timelines for various tasks, as any CIP compliance professional knows only too well), and if you don’t, there will almost inevitably be severe consequences. But, if you take those steps religiously, the risk of unpatched software vulnerabilities causing a BES outage will be close to eliminated. And when we’re talking about the risk of a cascading BES outage like happened in 2003, it seems there is no choice but to eliminate risk as much as possible, not simply lower it.

But are the severe consequences really inevitable in this case? If one utility doesn’t patch five systems in their Control Center for two months, will there inevitably be some sort of BES outage (let alone a cascading one)? It’s far from certain. How about if all the utilities in one region of the country don’t patch any of their Control Center servers for one year? Will there inevitably be an outage then? Again, the answer is no, but obviously the risk of a BES outage – and even a cascading one – is much larger in this case than in the first one. The risk (which I calculate as equal to likelihood plus impact) is larger for two reasons: 1) The likelihood of compromise is higher due to the fact that the interval between patches is much longer; and 2) The potential impact on the BES is much more serious due to the fact that a number of utilities in one region of the country are all not patching. An attack that would compromise one control center would be very likely to compromise others as well, since they’re presumably subject to the same unpatched vulnerabilities.
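The comparison between the two scenarios can be sketched in a few lines. This follows the additive definition given above (likelihood plus impact); many risk frameworks multiply the two instead, which would be a one-line change here. The 1-to-5 scale and the specific scores are hypothetical, chosen only to illustrate the ordering, not measured values.

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Additive risk score, following the 'likelihood plus impact' definition above."""
    return likelihood + impact


# Hypothetical 1-5 scores for the two scenarios described in the text.
# Scenario 1: one utility leaves five control center systems unpatched for two months.
one_utility_two_months = risk_score(likelihood=2, impact=2)

# Scenario 2: every utility in a region leaves control center servers unpatched for a year.
# Both factors rise: a longer patch gap (likelihood) and more utilities exposed (impact).
region_one_year = risk_score(likelihood=4, impact=5)

print(one_utility_two_months, region_one_year)
```

Whatever scale is used, the point is that the second scenario scores higher on both factors, so it scores higher overall.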

I don’t think it will be a great surprise to anyone that the best way for utilities to lower the serious risk in the second scenario would be for them all to patch much more regularly, say every 35 days as required by CIP-007 R2. This will greatly lower both likelihood and impact, although there is definitely a cost to doing this! But since we have greatly lowered risk by getting all utilities in the area to patch every 35 days, why stop there? Could we lower risk even more by having them patch every ten days? Absolutely. Then why not go further? Why not have them patch every day, or even every hour?

Of course, at this point (if not before), you start thinking about the cost of lowering the patching interval. If a utility is going to patch all servers in their control centers every day, they are probably going to have to employ a fairly large team that does nothing but patch servers day in and day out. This is going to cost a lot of money, especially when you consider that they will quickly be so bored with the job that they’ll probably jump ship, requiring constantly finding and training replacement team members.

So where do you draw the line here? Of course, CIP-007 R2 draws it at 35 days, meaning every 35 days there needs to be a new cycle of determining patch availability (for every piece of software installed in the ESP), determining applicability, and then either applying the patch or implementing a mitigation plan. That might be perfect for say the utility’s EMS system, whose loss would have a very high impact on the BES – in fact, I know of some utilities that argue that 35 days is way too long for the EMS, and the interval should be 15 days. But there are other systems, say in Medium impact substations, whose loss doesn’t have the same level of impact. For them, could the patching cycle be lengthened to 45 or 60 days without much increase in overall risk? Quite possibly.

Here’s another consideration: How often does the software vendor release new patches? For some devices that are Medium impact BES Cyber Systems, the answer might be “close to never”. Is it really necessary to contact those vendors every 35 days, given that this takes somebody’s time that might be spent doing something else that does more to reduce BES risk? In this case, the interval for checking patch availability might be lengthened to 90 days without increasing the probability of compromise – and therefore risk – very much.

However, as we all know, a prescriptive requirement like CIP-007 R2 doesn’t allow for consideration of risk at all. Every NERC entity with Medium or High impact BES Cyber Systems needs to follow exactly the same patch management process for every system, whether it’s the EMS that controls power in a major metropolitan area, or a relay on a less impactful 135kV transmission line; and they need to check availability every 35 days for every software package installed in the ESP, regardless of whether the vendor releases patches monthly or they haven’t released one at all for 20 years. The extra funds required to comply with a requirement like this (and CIP-007 R2 is without doubt the most resource-intensive of all the CIP requirements, although I hear that CIP-010 R1 gives it a pretty good run for its money) have to come from somewhere, and since every entity I know has a fairly fixed budget for cyber security and CIP compliance, they will have to come from mitigation of other cyber risks – such as phishing, which isn’t addressed at all in CIP now (of course, I’m sure all utilities are devoting at least some resources to anti-phishing training, which is good. However, the most recent Wall Street Journal article on the Russian supply chain attacks makes it pretty clear that some utilities are being compromised by phishing attacks, although whether that compromise has reached their control networks is still unknown).

A requirement like CIP-013 R1 is different. It requires the entity to develop a plan to mitigate risks in one area – in this case, supply chain security. It is up to the entity to decide how they’ll allocate their funds among different supply chain risks. And the best way to do that is to put the most resources into mitigating the biggest risks and the least resources – or none at all – into mitigating the smallest risks. That’s why I have always believed that the first step in CIP-013 compliance - and Lew confirms this in his article - is to identify the important supply chain threats to BES Cyber Systems that your organization faces, then rank them by their degree of risk to the BES. The amount of resources you allocate toward mitigating each risk should be directly proportional to its degree (and the lower risks won’t receive any mitigation resources). This way, your limited funds can achieve the greatest results, because they will mitigate the greatest possible amount of overall risk.
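The rank-then-allocate idea above can be sketched as follows. This is one possible reading, not a method prescribed by CIP-013: the threat names, the risk scores, the cutoff line, and the budget figure are all hypothetical, and the function name `allocate` is invented for the example.

```python
def allocate(budget: float, risk_scores: dict, cutoff: int) -> dict:
    """Split a fixed budget across risks in proportion to their scores.

    Risks scoring below `cutoff` fall under the line and receive nothing.
    """
    funded_total = sum(s for s in risk_scores.values() if s >= cutoff)
    plan = {}
    # Walk the risks from highest score to lowest, i.e. in ranked order.
    for name, score in sorted(risk_scores.items(), key=lambda kv: kv[1], reverse=True):
        if score >= cutoff and funded_total > 0:
            plan[name] = budget * score / funded_total
        else:
            plan[name] = 0.0
    return plan


# Hypothetical supply chain threats with hypothetical risk scores.
threats = {
    "compromised software update": 9,
    "vendor remote access misuse": 8,
    "counterfeit hardware": 5,
    "unvetted subcontractor": 2,
}

plan = allocate(budget=100_000, risk_scores=threats, cutoff=5)
for name, amount in plan.items():
    print(f"{name}: ${amount:,.0f}")
```

The cutoff is the "line" drawn through the ranked list: everything at or above it shares the budget in proportion to its score, and everything below it goes unfunded.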

This is why CIP-013 doesn’t say the entity must take certain steps to mitigate risk X and other steps to mitigate risk Y, ignoring all of the other risks. Instead, CIP-013 does exactly what I think all cyber security standards should do: require the entity to follow the same process to mitigate cyber risks that they would follow if they weren’t subject to mandatory cyber security requirements. But entities are mandated to follow this process. It’s not a “framework” that they can follow or not, with no serious consequences if they don’t.

The point about mandated is key: We all know that utility management makes much more money available to mitigate cyber risks because NERC CIP is in place and violations carry hefty penalties (and the non-monetary consequences are at least as bad as the actual penalties), than they would if CIP weren’t in the picture. The problem is to rewrite the CIP standards so that they don’t distort the process of risk identification, prioritization and mitigation that an entity would follow in their absence – yet still keep the money spigot open because they’re mandatory. CIP-013 comes close to achieving this goal in the domain of supply chain security risk management. We need similar standards for all the other domains of BES cyber security.

Fortunately, the CIP standards are gradually moving toward eliminating prescriptive requirements and implementing plan-based (i.e. risk-based) ones. In fact, the two major requirements drafted (or revised) and approved since CIP v5 (CIP-003 R2 and CIP-010 R4) are both plan-based, and the three new standards drafted since v5 (CIP-014, CIP-013 and CIP-012) are also all plan-based; moreover, it’s almost impossible to imagine a new prescriptive CIP requirement being drafted. But prescriptive requirements like CIP-007 R2, CIP-010 R1 and CIP-007 R1 remain in place, where they continue to require much more than their “fair share” of mitigation resources.

II. Objective-based

Lew’s second point is “CIP-013-1 is an objective-based Standard.” I agree with this, too, but I think it’s redundant. If a requirement is plan-based, it’s ipso facto objective-based, since the purpose of any plan is to achieve its objective. I pointed this out to Lew in an email, and he replied that the redundancy was intended. He continues “I’m trying to lay a strong foundation for future discussion. A plan without an objective isn’t worth the media it’s recorded on. But some entities have had difficulty grasping this idea and need to have it reinforced.” Consider it reinforced!

Lew goes on to identify the objectives of CIP-013, and this is where I disagree with him (although FERC deserves partial blame for this. See below). To identify the objectives, he goes to the second paragraph of FERC Order 850, which approved CIP-013 in October. Here, FERC states that the four objectives of CIP-013 are

Software integrity and authenticity;
Vendor remote access protections;
Information system planning; and
Vendor risk management and procurement controls.

And where did FERC determine that these are the objectives of CIP-013? If you pore through the standard, I can promise you’ll never find these stated together anywhere, although you’ll find them individually in different places – along with other objectives that don’t seem to have made it into the Final Four, for some reason.

But FERC didn’t make these objectives up; they came from an authoritative source – FERC itself! Specifically, they came from FERC’s Order 829 of July 2016, which FERC issued when they ordered NERC to develop a supply chain security standard in the first place. So it seems FERC, when looking for the purpose of CIP-013, decided that the people who drafted the standard weren’t to be trusted to understand its real purpose, and the best source of information on this topic is…FERC (although, since only one of the five Commissioners who approved Order 829 is still on the Commission, it’s very hard to say that FERC 2018 is the same as FERC 2016).

This would all just be an amusing piece of trivia, if it weren’t for two things. First, FERC’s four objectives are very specific, and are far from being the only objectives found in CIP-013. For example, the first two objectives are found in R1.2, but there are four more items in R1.2 that FERC didn’t include in their list, for some reason. I see no reason why all six of the items in R1.2 shouldn’t be included in a list of objectives of CIP-013, although even that would hardly constitute a complete inventory of CIP-013’s objectives.

Since FERC didn’t do a good job of it, how can we summarize the objectives of CIP-013? It’s not hard at all. We just need to go to the statement of purpose in Section 3 at the beginning of the standard: “To mitigate cyber security risks to the reliable operation of the Bulk Electric System (BES) by implementing security controls for supply chain risk management of BES Cyber Systems.” In my opinion, this is a close-to-perfect summary of what CIP-013 is intended to do.

But Lew isn’t just quoting FERC for people’s edification; he’s saying that the objectives FERC lists should be the objectives that your supply chain cyber security risk management plan aims to achieve. Specifically, he says “Your actions in developing and implementing your plan should be directed toward achieving these four objectives. You should be prepared to demonstrate to an audit team that you meet each of these objectives. These objectives are not explicitly referenced in the Standard language. However, as outlined in the FERC Order, the achievement of these objectives is the reason the Standard was written.”

You’ll notice that Lew states that a NERC entity will need to demonstrate to the auditors that their plan achieves FERC’s four objectives. Now, even though Lew isn’t an auditor any more, I know that his words are taken very seriously by the auditors in all of the NERC Regions. This means most, if not all, auditors will pay attention to this sentence, and therefore you can expect many or even most auditors to ask you to show them that your plan meets these four objectives.

Since I obviously don’t think that FERC’s four objectives are a completely accurate summary of the purpose of CIP-013, am I now saying that Lew has provided misleading advice to NERC entities, so that they’ll end up addressing meaningless or even harmful objectives in their plans? No, there’s no harm in telling NERC entities that their auditors will want to determine if their CIP-013 plan meets each of FERC’s four objectives, since as I’ve said those objectives are all found somewhere in the standard anyway. The harm lies in treating those four as the whole story: the real objective of CIP-013 is what’s found in the Purpose statement in Section 3, and that statement encompasses FERC’s four objectives and a lot more. This needs to be brought to people’s attention, since neither FERC nor NERC has done so yet.

Why doesn’t Lew instead say that auditors should make sure the entity’s CIP-013 plan meets the stated objective (purpose) of the standard? This could still be followed by FERC’s four things – in order to provide more detail. I think that would work, as long as it’s made clear that FERC’s four things are in no way a summary of everything that needs to be addressed in the plan. The Purpose statement provides that summary. But is that enough detail to make the requirement auditable? That’s a question I’ll discuss below and in the third post in this series.

III. Lew’s (implicit) methodology for CIP-013 compliance

Lew’s third point is “CIP-013-1 is a risk-based Standard”. He explains what that means, and in the process specifies (very concisely) a complete compliance methodology, when he writes:

You are not expected to address all areas of supply chain cyber security. You have the freedom, and the responsibility, to address those areas that pose the greatest risk to your organization and to your high and medium impact BES Cyber Systems.

You will need to be able to show an audit team that you have identified possible supply chain risks to your high and medium impact BES Cyber Systems, assessed those risks, and put processes and controls in place to address those risks that pose the highest risk to the BES.

This passage actually describes the whole process of developing your supply chain cyber security risk management plan to comply with CIP-013 R1.1, although it is very densely packed in the passage. Since I’m a Certified Lew Unpacker (CLU), I will now unpack[iii] it for you (although my unpacked version is still very high-level):

Lew’s CIP-013 compliance methodology, unpacked for the first time!

A. The first step in developing your plan (in Lew’s implicit methodology) is that you need to consider “all areas” of supply chain cyber security. I interpret that to mean you should in principle consider every supply chain cyber threat as you develop your plan. Of course, it would be impossible to do this – there are probably an almost infinite number of threats, especially if you want to get down to a lot of detail on threat actors, means used, etc. Could you simplify that by just listing the most important high-level supply chain cyber threats likely to impact the electric power industry? Sure you could, but do you have that list?

And here’s the rub: It would be great if there were a list like that, and it probably wouldn’t be too hard for a group of industry experts to get together and compile it (I’m thinking it probably wouldn’t have many more than ten items). Even better: The CIP-013 SDT was a group of industry experts. Why didn’t they put together a list like that and include it in CIP-013 R1? As it is, there is no list of threats (or “risks”, the word the requirement uses. I have my reasons for preferring to use “threats” – which I’ll describe in a moment) in the requirement, and every NERC entity is on its own to decide what are the most important supply chain cyber security threats it faces. This inevitably means they’ll all start with different lists, some big and some small.

There’s a good reason why the SDT didn’t include a list in the requirement (and, even though I attended a few of the SDT meetings, I’ll admit this omission never even occurred to me): FERC only gave them one year to a) draft the standard, b) get it approved by the NERC ballot body (it took four ballots to do that, each with a comment period), c) have the NERC Board approve it, and d) submit it to FERC[iv] for their approval (and FERC’s approval took 13 months, longer than they gave NERC to develop and approve the standard in the first place). If the SDT had taken the time to have a debate over what should be on the list of risks, or even whether there should be a list at all in the standard, they would never have made their deadline.

This is a shame, though. To understand why, consider one plan-based requirement that does include a list of the risks that need to be considered: CIP-010 R4. Looking at this requirement can give you a good idea of the benefits of having the list of risks in the requirement.

CIP-010-2 R4 requires the entity to implement (and, implicitly, to develop in the first place) “one or more documented plan(s) for Transient Cyber Assets and Removable Media that include the sections in Attachment 1”. When you go to Attachment 1, you find that it starts with the words “Responsible Entities shall include each of the sections provided below in their plan(s) for Transient Cyber Assets and Removable Media as required under Requirement R4.” Each of the sections describes an area of risk to include in the plan. These are stated as mitigations that need to be considered, but you can work back from each mitigation to identify the risk it mitigates very easily.

For example, Section 1.3 reads “Use one or a combination of the following methods to achieve the objective of mitigating the risk of vulnerabilities posed by unpatched software on the Transient Cyber Asset: Security patching, including manual or managed updates; Live operating system and software executable only from read-only media; System hardening; or Other method(s) to mitigate software vulnerabilities.” (my emphasis)

The risk targeted by all of these mitigations can be loosely described as the risk of malware spreading in your ESP due to unpatched software on a TCA. What are you supposed to do about it? Note the words I’ve italicized. They don’t say you need to “consider” doing something, they say you need to do it. And since Attachment 1 is part of CIP-010 R4, this means you need to insert the word “must” before this whole passage (in fact, that word needs to be inserted at the beginning of each of the other sections in Attachment 1 as well). You must achieve the objective of mitigating this particular type of risk.

But if I’m saying CIP-010 R4 is a plan-based (as well as objective-based and risk-based) requirement, how is that compatible with the (implicit) use of the word “must”, at the beginning of this section as well as all the other sections? Does this turn R4 into a prescriptive requirement?

I’m glad you asked that question. Even though you have to address the risk in your plan, you have complete flexibility in how you mitigate that risk. The requirement still isn’t prescriptive, because it doesn’t prescribe any particular actions. The same approach applies to each of the other sections of Attachment 1: The risk underlying each one needs to be addressed in the plan, while the entity can mitigate the risk in whatever way it thinks is best (although it must be an effective mitigation. Lew has previously addressed what that means).

Obviously, somebody who is complying with CIP-010 R4 will know exactly what risks to include in their plan for Transient Cyber Assets and Removable Media, due to Attachment 1 (and remember, since Attachment 1 is called out by R4 itself, it is actually part of the requirement – not just guidance). They can add some risks if they want (my guess is very few will do that), but at least they have a comprehensive list to start with.

And more importantly, the auditors have something to hang their hat on when they come by to audit. They can ask the entity to show them that they’ve addressed each of the items listed in Attachment 1, then they can judge them by how well they’ve addressed each one (i.e. how effective the mitigations described in their plan are likely to be, and how effective they’ve actually turned out to be – since most audits will happen after implementation of the plan, when there’s a year or two of data to consider). This is the main reason why I now realize it’s much better for a plan-based requirement to have a list of risks to address in the requirement itself, although that’s not currently in the cards for CIP-013 (it would be nice if NERC added this to the new SAR that will have to be developed to address the two or three changes in CIP-013 that FERC mandated in Order 850, but for some reason they don’t take orders from me at NERC).

You can see the difference this makes – i.e. the difference it makes to have the list of risks that must be addressed in the plan in the requirement itself – by comparing the RSAW[v] sections for CIP-010 R4 and CIP-013 R1. The former reproduces all of the detail in Attachment 1 – making the RSAW a great guide both for auditors and for the entity itself, as it prepares its plan for Transient Cyber Assets and Removable Media. The R4 Compliance Assessment Approach section goes on for more than a page.

And how about the CIP-013 R1 RSAW? Here’s the entirety of what it says for the R1 Compliance Approach: “Verify the Responsible Entity has developed one or more documented supply chain cyber security risk management plans that collectively address the controls specified in Part 1.1 and Part 1.2.” In other words, make sure the entity has complied with the requirement, period. Not too helpful if you’re drawing up your plan, but what more can be said? The RSAW can only point to what is required by the wording of R1.1, and since there is no Attachment 1 to give the entity a comprehensive idea of what they need to address in their plan, all the RSAW can do is point the reader to the wording of the requirement, which only says “The plan shall include one or more process(es) used in planning for the procurement of BES Cyber Systems to identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services resulting from: (i) procuring and installing vendor equipment and software; and (ii) transitions from one vendor(s) to another vendor(s).”

Not a lot to go on here, although I guess the RSAW could have just turned this into a question: “Does the plan include one or more processes used in planning for the procurement of BES Cyber Systems to identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services resulting from: (i) procuring and installing vendor equipment and software; and (ii) transitions from one vendor(s) to another vendor(s)?” That would have at least pointed to the need to address these two items - although I break them into three:

Procuring vendor equipment and software;
Installing vendor equipment and software; and
Transitions between vendors.

I think it would be good if the RSAW specifically listed these items (whether it’s two or three doesn’t matter to me) as being required in the plan, since they’re definitely in the requirement.

Even though it’s too late to have a list of risks to address in CIP-013-1 R1 itself, it’s not too late for some group or groups – like the CIPC or NATF, or perhaps the trade associations, for each of their memberships – to develop a comprehensive high-level supply chain security risk list for NERC entities complying with CIP-013 (as well as any utilities or IPPs who don’t have to comply, but still want to).

While the auditors couldn’t give a Potential Non-Compliance finding to an entity that didn’t include all of the risks on the list in their plan, they would be able to point out an Area of Concern – and frankly, that’s probably better anyway. I don’t think there should be a lot of violations identified for CIP-013. Given that R1.1 lacks any specific criteria for what should be in the plan, I see no basis for an auditor to assess any violation of R1.1, unless the entity submits a plan that doesn’t make an attempt to seriously identify and mitigate supply chain security risks at all. More on auditing in Part 2 of this post, coming soon to a computer or smartphone near you!

B. The second step in developing your supply chain cyber security risk management plan, in compliance with CIP-013 R1.1, is to decide the degree of risk posed by each threat (and this is why I said earlier that I prefer to also use the word threats, even though CIP-013 just talks about risks. It’s very awkward to talk about assigning a degree of risk to a risk. A risk is inherently a numerical concept; a threat is just a statement like “A software vendor’s development environment will be compromised and malware will be embedded in the software”. You can legitimately ask “What is the risk posed by this threat, and how does it compare to - say - the risk posed by the threat that malware will be inserted into a software patch in transit between the vendor and our organization, and we will install the patch?” It’s much more difficult (although not impossible, I admit) to ask “What is the degree of risk posed by this risk, and which of these two risks is riskier?” It begins to sound like the old Abbott and Costello routine, “Who’s on First?”).

How do you quantify the degree of risk posed by a particular threat? You need to consider a) the potential impact on the BES[vi] (remember, all risks in CIP-013, like those in all NERC standards, are risks to the BES, not to your organization) if the threat is realized in your environment, as well as b) the likelihood that this will happen. You need to combine these two measures in some way, to come up with a risk score. Assuming that you’re assigning a high/medium/low value to both impact and likelihood (rather than trying to pretend you know enough to say the likelihood is 38% vs. 30%, or the potential impact on the BES is 500MW vs. 250MW, which you don’t), I recommend adding them. So if you assign values of 1, 2 and 3 to low, medium and high, and the likelihood is low but impact is high (or vice versa), this means the risk score for this threat is 4 out of a possible 6 (with 2 being the lowest possible score).
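
For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the additive scoring just described. The names LEVELS and risk_score are made up for illustration; nothing in CIP-013 or the RSAW prescribes them, or this scale.

```python
# Ordinal scale from the paragraph above: low=1, medium=2, high=3.
# The dictionary and function names are hypothetical, chosen for this sketch.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_score(likelihood: str, impact: str) -> int:
    """Add the two ordinal values; the result ranges from 2 to 6."""
    return LEVELS[likelihood] + LEVELS[impact]


# Low likelihood but high impact on the BES (or vice versa) gives 4 out of 6.
print(risk_score("low", "high"))   # 4
print(risk_score("high", "low"))   # 4
```

Adding rather than multiplying keeps the scale compact (2 through 6) and treats likelihood and impact symmetrically, which suits coarse high/medium/low judgments.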

C. The third step is to rank all of the threats by their risk scores. Once you have your ranked threat list, you instantly know which are the most serious supply chain cyber security threats you face: they’re the ones at the top of the list.
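
The ranking step can be sketched in a few lines of Python. Everything here is an illustrative assumption: the Threat class, the rank_threats function, and especially the likelihood and impact values, which are invented for the example and are not an assessment of any real threat.

```python
from typing import NamedTuple


class Threat(NamedTuple):
    """A threat statement with ordinal ratings (1=low, 2=medium, 3=high)."""
    name: str
    likelihood: int
    impact: int

    @property
    def score(self) -> int:
        # Additive risk score, from 2 (low/low) to 6 (high/high).
        return self.likelihood + self.impact


def rank_threats(threats):
    """Return threats ordered from highest to lowest risk score.

    sorted() is stable, so threats with equal scores keep their input order.
    """
    return sorted(threats, key=lambda t: t.score, reverse=True)


# Made-up ratings, purely to show the mechanics.
threats = [
    Threat("Patch tampered with in transit", likelihood=1, impact=3),
    Threat("Vendor development environment compromised", likelihood=2, impact=3),
    Threat("Vendor remote access credentials stolen", likelihood=3, impact=3),
]

for t in rank_threats(threats):
    print(t.score, t.name)
```

The threats at the top of the printed list are the ones to look at first.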

D. The fourth step is to develop a risk mitigation plan for each of the top threats. As mentioned earlier, there’s no question that you won’t be able to completely mitigate any cyber threat. The most you should aim for is to bring the level of risk for each threat on your “most serious” list down to a common lower level (say, you’ll aim to bring all threats with a risk score of 5 or 6 down to a risk score level of 3 or 4), at which point some other unmitigated threats will then pose higher levels of risk; if you still have resources available to you, you should consider mitigating those “second-tier” threats as well. But whatever your available budget, you should invest it in mitigating the highest risks – that way, you’re getting the most bang for each hard-earned buck.
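
One way to picture this tiering is the short Python sketch below. It assumes each threat already has a risk score between 2 and 6, and it uses the example numbers from the paragraph above (mitigate anything scoring 5 or 6, aiming for 4). The function name plan_mitigations, the default threshold and target, and the sample scores are all hypothetical.

```python
def plan_mitigations(scores, threshold=5, target=4):
    """Split threats into a first tier to mitigate now and a second tier.

    scores: dict mapping threat name -> current risk score (2 to 6).
    Returns (first_tier, second_tier). first_tier is a list of
    (name, current_score, target_score) tuples; second_tier is a list of
    names ordered from highest to lowest score.
    """
    first_tier = [
        (name, score, target)
        for name, score in scores.items()
        if score >= threshold
    ]
    first_tier.sort(key=lambda row: row[1], reverse=True)
    second_tier = sorted(
        (name for name, score in scores.items() if score < threshold),
        key=lambda name: scores[name],
        reverse=True,
    )
    return first_tier, second_tier


# Invented scores, for illustration only.
example_scores = {
    "Vendor development environment compromised": 6,
    "Patch tampered with in transit": 5,
    "Vendor remote access credentials stolen": 4,
    "Counterfeit hardware delivered": 3,
}

first, second = plan_mitigations(example_scores)
for name, current, goal in first:
    print(f"Mitigate now: {name} ({current} -> {goal})")
for name in second:
    print(f"Second tier, if budget remains: {name} (score {example_scores[name]})")
```

The sketch deliberately says nothing about cost; in practice the available budget decides how far down the second tier you get.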

While you’re anxiously awaiting Part 2 of this post, you might re-read (or read for the first time) this post describing my free CIP-013 workshop offer, which is still on the table. If you would like to discuss this, I'd love to hear from you!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

[i] You could certainly argue that CIP-014 is a plan-based standard, although I think it falls short in a few ways. So let’s leave it out now, and say we’re just talking about cyber security standards.

[ii] On the other hand, I don’t call CIP-008 and CIP-009 plan-based, even though they both explicitly call for plans. They are definitely objectives-based, with the objective being an incident response plan or backup/recovery plan(s) respectively. But in my view of a plan-based requirement, the objective is always managing a certain area of risk. In CIP-010-2 R4, it’s risk from use of Transient Cyber Assets and Removable Media. In CIP-003-7 R2, it’s risk of cyber or physical compromise of BCS at Low impact assets. In CIP-011 R1 it’s risk of compromise of BES Cyber System Information. And in CIP-007 R3 it’s risk of infection by malware. But CIP 8 and 9 aren’t directly risk-based, not that they’re bad standards of course. They both call for development and testing of a plan, but risk only enters tangentially into the plan (if at all), since say a CSIRP for an entity in a very risky environment should probably be more rigorous than one for an entity in a “normal” environment.

[iii] I readily admit that what I write in the rest of this post isn’t all just an expansion of what Lew says in these two short paragraphs! However, each of the compliance steps that I discuss below is implicit in those two paragraphs. If you don’t believe me, I can prove it to you using advanced mathematics.

[iv] I’ll admit this is just my speculation. It’s not so much that the SDT wanted to draw up the list and didn’t have time to do it, as that they never had the leisure to even consider more philosophical questions like this; they were racing against the clock during their whole existence.

[v] RSAW stands for Reliability Standard Audit Worksheet. It’s the guide that NERC auditors use for audits. And for that reason, it’s also the guide that NERC entities use as they set up their compliance program for a particular standard.

[vi] Lew made the following comment at this point in the post: “This is a concept that needs more attention. I think entities should give consideration to those threats that could result in damage to difficult-to-repair equipment, such as large transformers, generators, turbines, boiler feed pumps, etc. If you can take over the control system for such equipment and run it to destruction, that is a risk with higher consequence than merely opening a breaker and causing a short outage. And I think this is the type of compromise that a nation-state would be interested in.” A word to the wise!