Tom Alrich's Blog: June 2018

Wednesday, June 27, 2018

Are Generating Plants Vulnerable to a Cyber attack?

July 24, 2018: I just realized that I never finished this post, and it might leave the reader with the wrong impression of what I was saying (although the E&E News article referred to would hopefully correct that). Since I'm referring to this post in a new post I'm doing today, I have added a final paragraph to make clear my position. - Tom

On June 26, Energy and Environment News (E&E News) published an excellent article, as usual, titled “Coal plants’ vulnerabilities are largely unknown to feds”. Since E&E News is a subscription service and the price is fairly steep, you will probably need to see if the organization you work for can foot the bill for the service. But this is an excellent newsletter regarding energy and the environment[i], and I highly recommend you look into subscribing. Without any doubt, they have the best coverage of cyber security in the energy industry, written by Blake Sobczak and Peter Behr.

I’ll let you read the article, which speaks for itself, but I’d like to add a little to the quotations from me that appear at the end of the article. Blake didn’t misrepresent anything I said to him when we talked, but I got (mildly) chastised by an industry consultant for being too easy on the generation sector. Here is my overall position on cyber security for that sector.

1. I believe most coal, hydro and gas generating plants – especially those that are Medium impact under CIP – are probably fairly cyber secure as far as their own operations go. In other words, if one of these plants were to experience a cyber attack, it is very unlikely that it would be tripped.
2. This also applies to the Criterion 2.1 plants (>1500MW) that have been segmented so that there are no Medium impact BES Cyber Systems. There is a popular misconception that the ability to segment the plant so that no single system can affect 1500MW – which means there are no Medium BCS - constitutes a “loophole” in the CIP requirements. This is simply not the case. If say an 1800MW plant with three 600MW units is properly segmented (and the auditors are looking at this very closely whenever an entity claims that a 1500MW+ plant has no Medium BCS), then this plant is no more vulnerable to a complete shutdown from a cyberattack than would be three 600MW plants situated near each other. The only difference is that in the first case, the three “plants” share a common fence and in the second they don’t.[ii] Of course, if you think the 1500MW threshold is too high and it should really be around 500MW, that’s another story – but I think this is appropriate, and it’s actually a lot lower than the 2200MW that I remember was originally approved by the Standards Drafting Team[iii].
3. Even if a single plant, no matter how large, were to be brought down by a cyber attack, this would most likely not have a BES impact, since N-1 contingencies are already well planned-for. The danger to the BES would be from a coordinated attack on multiple plants.
4. Such a coordinated attack would be very hard to pull off (I used to think it was literally impossible, but now I’m not quite so sure about that, given some information I learned fairly recently about a situation in one part of the US. I am trying to interest various organizations in investigating this potential vulnerability. So far I haven’t had any success, but I’m not done yet. I will never publish details about this in my blog, but I’m not going to stop until some organization has committed to investigating this situation. However, even if this vulnerability were to be exploited, it is highly unlikely that an outage would occur, and certainly not a widespread or even cascading outage).
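
To make the segmentation point concrete, here is a minimal sketch of the check an entity might run on paper: for each cyber system, add up the capacity of the units it can affect and compare the total with the 1500MW threshold. The system names and the plant layout are hypothetical, and the sketch treats "reaches 1500MW" as "equal to or greater than 1500MW". It does not model any other wording in the criterion, so treat it as an illustration of the arithmetic rather than a compliance tool.

```python
from typing import Dict, List

# Threshold discussed above for Criterion 2.1 plants.
THRESHOLD_MW = 1500


def systems_reaching_threshold(
    unit_mw: Dict[str, float],
    system_units: Dict[str, List[str]],
    threshold_mw: float = THRESHOLD_MW,
) -> Dict[str, float]:
    """Return each system whose affected units total at least threshold_mw."""
    flagged = {}
    for system, affected in system_units.items():
        # set() guards against a unit being listed twice for one system.
        total = sum(unit_mw[unit] for unit in set(affected))
        if total >= threshold_mw:
            flagged[system] = total
    return flagged


# Hypothetical 1800MW plant with three 600MW units.
units = {"unit1": 600, "unit2": 600, "unit3": 600}

# Properly segmented: no single system can affect 1500MW or more.
segmented = {
    "dcs_unit1": ["unit1"],
    "dcs_unit2": ["unit2"],
    "dcs_unit3": ["unit3"],
    "coal_handling": ["unit1", "unit2"],  # shared, but only 1200MW
}

# Same plant with one system that can affect all three units.
shared = dict(segmented, plant_historian=["unit1", "unit2", "unit3"])

print(systems_reaching_threshold(units, segmented))  # {}
print(systems_reaching_threshold(units, shared))     # {'plant_historian': 1800}
```

In the segmented layout nothing is flagged, which is the "three 600MW plants sharing a fence" situation described above. One shared system reaching all three units is enough to change the answer.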

So my position is that, while it wouldn't be completely impossible to cause a widespread outage by attacking generation, it would be very difficult. As I said at the end of the article linked at the top, if you're aiming to bring down the North American power grid, you need to look elsewhere than generation.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or challenges like what is discussed in this post – especially on compliance with CIP-013. And if you’re a security vendor to the power industry, TALLC can help you by developing marketing materials, delivering webinars, etc. To discuss any of this, you can email me at the same address.

[i] There are actually multiple newsletters, all good.

[ii] Of course, the switching yard that connects a 1500MW+ plant will be Medium impact under criterion 2.8, regardless of whether the plant is segmented or not. And the Control Center that dispatches the plant will still have to count it as a Criterion 2.1 plant for criterion 1.4, or count the entire 1500MW in determining whether it is Medium impact under criteria 2.11 or 2.13.

[iii] This was for CIP v4. A 2200MW figure was approved at an SDT meeting in the summer of 2010. But before CIP v4 was finalized, the threshold was lowered to 1500MW. I must have missed that meeting, or maybe I was doing emails.

Monday, June 25, 2018

An Auditor gives advice on Event Logging

A CIP compliance analyst with a large electric utility wrote in recently with the following question:

'I am curious as to your experience with solutions for CIP7R4.2.2 “Detected failure of Part 4.1 event logging”. We have heard from program managers that other companies in the industry use either an ICS vendor solution or rig up a “heartbeat” or “polling” proprietary solution. In this particular situation, the platforms are Intel and Linux. I’m curious as to what is accepted as a solution to this interesting requirement.'

I passed this question on to an auditor who usually has something interesting to say on anything having to do with NERC CIP. He didn’t disappoint this time – in fact, he obviously devoted about an hour on a gorgeous weekend day (at least it was here in Chicago – I don’t know about the city where the auditor lives) to putting together the following answer:

“The answer is long and complicated. It all depends on the capabilities of both the monitored and monitoring systems. First of all, the entity needs to fully understand the expectation of the requirement. The requirement is not to determine that the Cyber Asset generating the logs is up and running, or even solely that it is generating logs locally. The expectation is to detect a failure of the logging process from start to finish. There are numerous potential points of failure. Something could happen on the Cyber Asset generating the logs that causes it to stop logging (perhaps the log file is full). If the device cannot natively send its logs to the log server/SIEM, it will need an agent to perform this function; something could happen to cause the agent to fail. Perhaps the IP address of the log server is incorrectly configured and the logs are being sent to the bit bucket. Perhaps there is a networking issue and the log server is not reachable from the Cyber Asset generating the logs. And then there are the issues that crop up on the log server/SIEM to contend with, especially when the log service and SIEM are different applications on the same or different servers.

“Here is what I have seen that does not work:

“- Some entities have simply monitored the Cyber Asset generating the logs using a simplistic method such as pinging the system. That approach fails because it can only detect when the system is either completely down or unable to be reached over the network. The problem with the ping approach is that it cannot detect when the device is up but the logging service has failed. As a side note, a Cyber Asset that is down is not generating logs. That is not a failure of event logging as envisioned by CIP-007-6 R4 Part 4.2.2. When the system is down, there is nothing to log. That does not mean that monitoring system availability is not important; it just does not accomplish what is expected in this instance.

“- A variation of the above is to monitor the logging agent on the Cyber Asset that cannot natively send its logs to a log server/SIEM. Quite often this is accomplished by checking that the service is “running.” This approach fails for several reasons. The service could be hung; while it is “running,” it is not doing anything. The destination IP address of the log server/SIEM could be incorrect. There could be a networking issue making the log server/SIEM unreachable. And, if the only monitoring is of the source and not also of the log server/SIEM itself, the log server/SIEM could be down. The problem with monitoring the logging service on the source Cyber Asset is that this cannot detect a failure in the path between the source log and the destination log server/SIEM.

“OK, so what can work? Here is what I have seen:

“- Some systems are normally “chatty,” meaning that they generate a lot of log traffic in the normal course of operation. If the SIEM is capable, an event trigger could be configured that would generate an alert if the source system has not been heard from in a reasonable period of time. For example, a Windows or Unix/Linux system normally generates many logs per minute. The entity could determine how long it typically takes for the source system to reboot, add a buffer, and set the event trigger to alert if nothing has been received from the source system within the timeout window. For example, let’s say the source Windows system normally generates an average of ten event log messages per minute when idle and takes five-to-ten minutes to reboot after applying patches. If the entity defined a trigger event that would alert if no log messages have been received from the Windows system in fifteen minutes, that would accomplish the Part 4.2.2 requirement while minimizing false alerts. If the system generates only one message an hour and takes five-to-ten minutes to reboot, a two- or three-hour timeout might be appropriate.

“- Some entities cause their Windows and Unix/Linux Cyber Assets to issue a specifically crafted “heartbeat” event log message on a defined periodicity rather than simply monitoring for any log traffic. In this case, the SIEM is configured to generate an alert if the heartbeat message is not received as expected. Again, allowing for normal outages, such as the reboot timing, the failure to receive the heartbeat message indicates a failure somewhere along the path that needs to be investigated. This is relatively easy to implement, using a cron job in Unix/Linux or an AT scheduled task in Windows. The periodically scheduled task uses the appropriate operating system features to generate an event log message that is then picked up and sent to the log server/SIEM. In Windows, this can be done from a .bat file that uses the command line interface to execute the “eventcreate” command. Again, the timeout is based on the periodicity of the periodic event message creation.

“- Some Cyber Assets are very quiet, especially network switches. These devices usually have no native capability to generate an event log message on demand. There are several options here. If the switch is a managed switch with external IP accessibility, the entity might be able to use a remote management system to periodically connect to and log into the switch. This could be as simple as relying on a third-party solution that is already being used to periodically back up the configuration (e.g., CiscoWorks or Industrial Defender). The switch is expected to log the access event per CIP-007-6 R4 Part 4.1.1 and 4.1.2 anyhow. The login attempt message can be used in lieu of a specially crafted heartbeat. If the switch is not externally reachable for management purposes, the entity might be able to trigger the log in event from another Cyber Asset within the ESP and accomplish the same thing.

“- As a last resort, the entity staff need to manually check on the device, perhaps as part of the daily system checks, to see if there are recent log messages in its buffer that were not sent out.

“If the entity is using multiple log servers and/or redundant SIEMs, the monitoring should include all of them. That way, the entity does not find itself unexpectedly in a single point of failure situation.

“I am sure there are other options, but these are the typical ones I have seen and none of them require extensive programming effort or expensive vendor support.”
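
The timeout reasoning in the auditor's answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the five-minute buffer, the "two missed intervals" rule and the server and source names are all hypothetical, and the `last_seen` data is assumed to come from whatever query the log server/SIEM provides. The important point is that the timestamps are taken at the receiving end, so a silent source means a failure somewhere along the whole path and not just on the source itself.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def silence_timeout(
    message_interval: timedelta,
    max_reboot: timedelta,
    buffer: timedelta = timedelta(minutes=5),  # assumed safety margin
    missed_intervals: int = 2,                 # assumed tolerance for quiet sources
) -> timedelta:
    """Pick a timeout that survives a normal reboot and a normally quiet period."""
    return max(max_reboot + buffer, missed_intervals * message_interval)


def silent_sources(
    last_seen: Dict[str, Dict[str, datetime]],
    timeouts: Dict[str, timedelta],
    now: datetime,
) -> List[Tuple[str, str]]:
    """Return (server, source) pairs where the server has not heard from the source.

    last_seen[server][source] is when that log server last RECEIVED a message
    (any log message, or a dedicated heartbeat) from that source.
    """
    alerts = []
    for server, seen_by_source in last_seen.items():
        for source, timeout in timeouts.items():
            seen = seen_by_source.get(source)
            if seen is None or now - seen > timeout:
                alerts.append((server, source))
    return sorted(alerts)


# Chatty host: ten messages a minute, up to ten minutes to reboot.
chatty = silence_timeout(timedelta(seconds=6), timedelta(minutes=10))
# Quiet device: one message an hour, same reboot time.
quiet = silence_timeout(timedelta(hours=1), timedelta(minutes=10))
print(chatty, quiet)  # 0:15:00 2:00:00

now = datetime(2018, 6, 25, 12, 0)
timeouts = {"win-hmi": chatty, "quiet-rtu": quiet}
last_seen = {
    "siem-a": {
        "win-hmi": now - timedelta(minutes=3),
        "quiet-rtu": now - timedelta(minutes=90),
    },
    # The second server has heard nothing recent from either source.
    "siem-b": {"win-hmi": now - timedelta(minutes=20)},
}
print(silent_sources(last_seen, timeouts, now))
# [('siem-b', 'quiet-rtu'), ('siem-b', 'win-hmi')]
```

With these assumed parameters the sketch reproduces the fifteen-minute and two-hour figures from the examples above. Because every log server is checked separately, it also shows the case where one of two redundant servers has quietly stopped receiving logs. The same check works for the heartbeat approach if `last_seen` counts only the heartbeat messages.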

Wednesday, June 20, 2018

On Screwing Up

At the end of my post yesterday, I chastised (tongue in cheek, of course) Lew Folkerth for making a mistake in his most recent article, which I analyzed in the post. I even suggested that his offense might warrant execution, although I recommended the execution be stayed. I'm very glad I did that, because my head would be on the block next to his! It turns out I made a few mistakes in the post myself.

First, my longtime friend Jim Batug, retired from PPL Generation, pointed out that I had referred to CIP-007-3 at one point, when I obviously meant CIP-003-7. I corrected that.

Then an auditor wrote in to point out two problems with the post. In both cases, I inserted what he wrote within the post.

1. A paragraph where I tried to be Tom Alrich, Boy Engineer and sound like I know more about substations than I actually do.
2. An implementation date mistake I made in the last part of the post (the part where I was roasting Lew for making a mistake!).

I just hope the auditor doesn't give me a PNC for this, since I can't afford a $1 million fine. But even that would be better than being executed. Barely.

Tuesday, June 19, 2018

A must-read for your Lew Folkerth Collection

I’m now getting in the habit of reading Lew Folkerth’s column in the RF newsletter as soon as the newsletter appears. So when I got notice of the May-June newsletter, I didn’t let it sit in my inbox for days, but immediately downloaded it and went to Lew’s column – called, as always, The Lighthouse. As usual, it was very rewarding to read the column, and I think you’ll agree with me when you read it (unless you don’t give a d___ about NERC CIP, but then why are you reading this blog in the first place?).

Lew covered multiple topics in this column, and did a good job on all of them except the last one. Since I’m naturally jealous of anyone who has such a good knowledge of everything having to do with NERC CIP, I will of course make a big point of calling attention to his error. But I’ll start at the beginning:

First, he discusses the compliance date for CIP-003-7. Of course, this agrees completely with what I said about the same topic, at much greater length. But in my opinion, I said it with a lot more flair than he did. So there.

Second, he provides a great analysis of the meaning of something FERC ordered in Order 843 (which approved CIP-003-7), which is a study of how the revised electronic access control requirement in CIP-003-7 is implemented (I had frankly skimmed through that part of the Order). He points out that FERC, in their NOPR on CIP-003-7 last year, had sounded like they were going to order beefed-up electronic access controls when they approved CIP-003-7 (as I had discussed in my post soon after the NOPR appeared).

But FERC was evidently persuaded by the comments received on the NOPR that they should hold off on doing this for now, and instead ordered NERC to conduct the study once audits begin on CIP-003-7 (and since they won’t begin until after 1/1/20, the report probably won’t be out until 2021). Lew points out three expectations FERC has for how entities will comply with the new requirement:

1. Responsible Entities are expected to be able to provide a technically sound explanation as to how the electronic access controls meet the security objective.
2. NERC and the Regional Entities will have the ability to assess the effectiveness of the electronic access control plan required by CIP-003-7 R2.
3. NERC and the Regional Entities will have the ability to assess an entity's adherence to its electronic access control plan.

You can bet the auditors will be looking for these three things as well. So make sure you know how you are going to address them as you’re implementing (or reviewing) your electronic access control program for your Low impact assets. I do want to point out that “effectiveness” is something Lew has emphasized is very important all along: When the requirement just tells you the objective to achieve, not how you are to achieve it, the auditors are going to want to make sure whatever control you do implement is effective. So you can’t say you decided that repeating an ancient chant once a day was the best way to control electronic access to your Low impact assets. Sorry to disappoint you on that.

Lew also pointed out that you can expect that, when you do get audited on this requirement starting in 2020, the auditors will ask more questions than they would normally need to, in order strictly to determine compliance with the requirement. Don’t get upset about this, since they’re doing it mostly because FERC wants this information. But definitely be prepared to answer those questions.

Lew’s third topic is the impact of Criterion 2.4 of Attachment 1 of CIP-002-5.1 on Low impact BCS. He did this in response to a question whether the presence of a single 500kV line at a transmission substation brings “the entire substation” to the Medium impact level. Lew starts his answer by pointing out that criterion 2.4 (and indeed, all of criteria 2.4 through 2.8) applies to “Facilities” with a capital F, meaning it’s a NERC Glossary term.

Lew points out that each line, transformer, bus, etc. in the substation is a Facility. So in the case of a substation with one 500kV line, that line is the only Medium impact Facility at the substation, meaning only the BES Cyber Systems that control that line (primarily relays, of course) will be Medium impact.

Note: An auditor wrote in to me after this appeared and made the following comments on the above paragraph. Of course, I stand corrected and appreciate his pointing this out to me: You stated “Lew points out that each line, transformer, bus, etc. in the substation is a Facility. So in the case of a substation with one 500kV line, that line is the only Medium impact Facility at the substation, meaning only the BES Cyber Systems that control that line (primarily relays, of course) will be Medium impact.” You are not technically correct in your statement. The Medium Impact Facilities include the breakers, switches, transformers, etc., that are operated at 500 kV (essentially connected in some fashion to the 500 kV line). The relays do not “control” the line, they control the equipment connected to the line. If you are monitoring the line (or more likely the bus), that brings in the relays connected to the CTs and PTs (Current and Potential Transformers).

The rest of the BCS will be Low impact, unless the substation also meets Criterion 2.5. Lew points out that “In order to meet IRC 2.5, a substation must connect at 200kV or higher to three other substations… If this aggregate weighted value exceeds 3000, then the BES Cyber Systems associated with Facilities at that substation receive a medium impact rating (my emphasis).”

I do want to point out a slight infelicity (I won’t call it an error. Heaven forbid!) in the italicized phrase in the last sentence above. This seems to say that, if the substation does meet criterion 2.5 as well as 2.4, then all BCS in the substation will be Medium impact. In fact, criterion 2.5 says that only Facilities operated at 200-499kV will be Medium impact. This means that, if there’s a 138kV line also at the substation, it will be Low impact and the relays associated with it will also be Lows.
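
The distinction between criteria 2.4 and 2.5 can be illustrated with a short Python sketch. The per-line weights below (700 for 200-299kV, 1300 for 300-499kV, 0 otherwise) are not quoted in this post; they are assumptions that should be checked against the table in Criterion 2.5 of CIP-002-5.1 Attachment 1. The facility names are hypothetical, and a "Low" result here only means that neither 2.4 nor 2.5 applies, since other criteria are not modeled.

```python
from typing import Dict, List


def line_weight(kv: float) -> int:
    """Weight per line. ASSUMED values; verify against the Criterion 2.5 table."""
    if 200 <= kv < 300:
        return 700
    if 300 <= kv < 500:
        return 1300
    return 0  # below 200kV, or 500kV and above


def aggregate_weighted_value(line_kvs: List[float]) -> int:
    """Sum the weights of the lines that connect to other substations."""
    return sum(line_weight(kv) for kv in line_kvs)


def classify(
    facility_kv: Dict[str, float],
    line_kvs: List[float],
    connected_substations: int,  # count only connections at 200kV or higher
) -> Dict[str, str]:
    """Rate each Facility individually instead of rating the whole substation."""
    meets_2_5 = (
        connected_substations >= 3
        and aggregate_weighted_value(line_kvs) > 3000
    )
    ratings = {}
    for name, kv in facility_kv.items():
        if kv >= 500:
            ratings[name] = "Medium (2.4)"
        elif kv >= 200 and meets_2_5:
            ratings[name] = "Medium (2.5)"
        else:
            ratings[name] = "Low"
    return ratings


# Hypothetical substation: one 500kV line, three 345kV lines, one 138kV line.
facilities = {
    "line_500": 500,
    "line_345_a": 345,
    "line_345_b": 345,
    "line_345_c": 345,
    "line_138": 138,
}
lines = list(facilities.values())

print(aggregate_weighted_value(lines))  # 3900
ratings = classify(facilities, lines, connected_substations=4)
print(ratings["line_500"], ratings["line_345_a"], ratings["line_138"])
# Medium (2.4) Medium (2.5) Low
```

Even though this hypothetical substation meets both criteria, the 138kV line stays Low, which is the point made above. Whether the relays on that line can actually be treated as Lows depends on how the substation network is arranged.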

During the period in 2014 and 2015 when NERC entities were trying to figure out how to identify and classify BES Cyber Systems, I pointed out a few times – including this post – that, for criteria 2.4 - 2.8, entities don’t have to classify all BCS at the asset in question at the Medium level. But I also talked to some entities about whether any of them were taking advantage of this. The universal answer was no (and I talked to a few very large entities, who would presumably have a lot of BCS that might be reclassified Low rather than Medium impact). The reasons included:

1. It would be too confusing to require the substation technicians to treat some BCS differently than others at a single substation;
2. Many, if not most, substations don’t have their networks segregated according to the voltage level of the lines or transformers controlled by the different systems on the network. It would be a lot more expensive and time-consuming to try to separate the networks than to leave them connected. Of course, what this means is that, even though some BCS at the substation might be Low impact, since they’re on the same network as Medium BCS they’ll end up being Medium Protected Cyber Assets anyway – and they’ll be subject to almost all the same requirements as Medium BCS; and
3. Go away, you’re asking too many questions.

My guess is reason number 2 is probably the most important of these three reasons. But I would be interested in hearing from anybody who did actually take advantage of the “Facilities” language to treat some of their BCS at a “Medium impact substation” as Lows.

Lew’s fourth topic is also quite interesting. Someone asked “Is a list of low impact BES Cyber Systems required?” Of course, a lot of people in the NERC CIP community ask that question. Even though the CIP standards say in two places that such a list isn’t required, some of the regions have made noises otherwise, and all of the regions have made it clear they wouldn’t mind seeing such a list.

Lew’s answer is quite straightforward: No, it isn’t required, as long as you’re willing to have your physical and electronic access controls at the Low asset apply to every Cyber Asset located in the asset. But if you have, say, a firewall that only protects some of the Cyber Assets but not others (and those other assets are connected routably to the outside world), you will need to be able to show that all BCS have been protected by the firewall.

The last question that Lew addresses is “Does the approval of CIP-003-7 alter the required date for the first test of my Cyber Security Incident response plan for low impact BES Cyber Systems?” And here Lew made his mistake. His reply included “Section 4.5 requires a test of the plan every 36 months. Section 4’s effective date was April 1, 2017. Therefore the first test of your incident response plan for low impact BES Cyber Systems must be completed by April 1, 2020.”

However, an auditor from another region read Lew’s column and noted that the initial performance date for the Low impact CSIRP in CIP-003-6 was April 1, 2017; indeed, I had pointed this out in a post two weeks before that date. So Lew changed his response to say “No, the first test of your incident response plan was due on April 1, 2017. This is not changed by CIP-003-7.” If you want to verify this for yourself, go to the NERC spreadsheet that I linked in this recent post (which pointed out that the initial performance date for the initial test of a High impact CSIRP is 7/1/18. In other words, the Lows had to have their CSIRP tested 15 months before the Highs did! How’s that for fair treatment?).

The same auditor pointed out to me that my statement ".. the initial performance date for the initial test of a High impact CSIRP is 7/1/18." is wrong, since the High CSIRP date was 7/1/2017. I was thinking about the initial performance date for the high impact recovery plan test, which I'd written about in the recent post linked above and has an initial performance date of 7/1/18. Again, I stand corrected and thank the auditor for pointing this out.

So it turns out that both Lew and I screwed up here. I just hope the auditor doesn't give me a PNC for these mistakes! I can't afford to pay a $1 million fine.

So Lew screwed up. I suggest he be given a stay of execution for this offense, on the grounds that he has no previous record, he’s a nice guy, he’s a good family man, etc. But don’t let it happen again, Lew! :)

Note from Tom 6/26: Lew asked me to point out that anyone who downloaded his article a week ago or more should re-download it, since he has corrected the problem noted.

Friday, June 15, 2018

Our very own Maginot Line!

I have been saying for the past 2 ½ years that I am writing a book about how to fix the problems in the NERC CIP standards (and the compliance regime that accompanies them). This year, I’m actually making progress toward that goal (with one or two co-authors), and I hope to have the book published this year (self-published, to be sure!).

As part of this effort, I have been thinking a lot about what should be in scope for CIP. And I just reread a post I wrote in July 2016, very soon after FERC issued Order 829, which mandated development of a supply chain security standard. In this post (which I wrote after the initial post describing what was in Order 829), I pointed out that in one part of the Order, FERC seemed to be considering having more in scope for the new standard than just BES Cyber Systems.

In fact, FERC laid out four objectives for the new standard(s) they were ordering. Three of the four objectives applied to BES Cyber Systems, but the third objective was called “Information System Planning and Procurement” (paragraphs 56-58, pages 37-38). I found this title very interesting, because I’m sure FERC understands that control systems aren’t “information systems”. Indeed, in the entire discussion of the third objective, FERC never once mentions BCS.

Yet the first sentence of this section reads “The new or modified Reliability Standard must address how a responsible entity will include security considerations as part of its information system planning and system development lifecycle processes.” And the first sentence of the next paragraph (paragraph 57) reads “This third objective addresses the risk that responsible entities could unintentionally plan to procure and install unsecure equipment or software within their information systems, or could unintentionally fail to anticipate security issues that may arise due to their network architecture or during technology and vendor transitions (my emphasis).”

Clearly, FERC isn’t being sloppy here; they are talking about controls on the procurement of information systems (which are also called IT systems, of course). This is underlined in the next sentence of paragraph 57, where they bring up BlackEnergy. As you know, this is the malware that allowed the attackers in the first Ukraine attack in 2015 to take control of the IT network at several utilities. The attackers had free run of the IT network for more than half a year, before they finally figured out a way to take control of key relays on the OT network – which was their ultimate objective, of course.

BlackEnergy did all of its direct damage on the IT network; it never itself penetrated the OT network (and probably couldn’t have, since all the OT connections were most likely serial). But FERC used it as their poster child for what they were trying to prevent, in articulating their third objective in Order 829. Paragraph 56 states that this third objective includes “identification and documentation of the risks of proposed information system planning and system development actions (my emphasis).”

So it seems clear to me that FERC was asking NERC to start looking at some controls on IT systems as well as OT systems, at least as far as procurement and installation are concerned. They understood that the Ukraine attacks wouldn’t have happened if the attackers had to penetrate the OT networks first, rather than starting with the soft underbelly of the IT network.

Of course, the CIP-013 drafting team didn’t take FERC up on their implicit suggestion to include systems deployed on IT networks in the scope of the new standard. And I certainly can’t blame the SDT for not doing that, because:

1. The decision to include IT systems within the scope of a CIP standard would have to be made at a higher level than an SDT; in fact, it would likely require some vote of the NERC ballot body.

2. More importantly, FERC only gave NERC one year to develop the new standard, put it through four ballots (with changes between each one), get it approved by the NERC Board of Trustees, and finally put it on FERC’s desk to approve. Debating a big change like including IT systems would have made it impossible to meet that deadline.[i]

But I don’t think the fact that the SDT didn’t take up FERC’s suggestion of including IT systems in the scope of CIP-013 in any way settles the question whether IT systems should ever be included, in any way, in the scope of any CIP standard. My contention is that the Ukraine attacks show that ignoring the IT network altogether can make it more likely that a cyber attack could impact the North American BES at some point. I am certainly not saying that IT systems need to be included in the scope for all of the current CIP standards, or even for any of them. It may be that the main risk from IT systems is when they are deployed, as FERC implied in the quotations above, meaning that they should be included in the scope of CIP-013 at some point - although even then probably not in the same way as BES Cyber Systems are now.

But when I’ve said something about including IT systems in scope for CIP to people knowledgeable about NERC and NERC CIP, they have always disagreed with me, for two reasons. The first reason they bring up is that NERC has “no jurisdiction” over cyber assets on the IT network. I simply don’t believe this. Anything the utility does that can have an impact on the Bulk Electric System is in scope for NERC standards in general (and the distinction between IT and OT networks first appeared in the NERC standards with CIP, even though I don’t think the term “operational technology” had been coined then).

For example, there are a number of NERC standards (like FAC-003, the standard requiring tree trimming) that require records that are certainly kept on the IT network. In fact, OT networks don’t normally hold records at all, except records of the operations or configurations of the network devices themselves. If NERC wanted to create a new type of Cyber Asset in scope for CIP, called something like “Protected IT Cyber Asset”, I doubt this would violate anything in NERC’s Rules of Procedure, let alone Section 215 of the Energy Policy Act of 2005 (which set the foundation for mandatory reliability standards for the industry).

However, it is the second reason they bring up that I find most interesting. When I point out (as FERC did) that a compromise of IT networks at Ukrainian utilities led to the attacks on their OT networks, I inevitably hear, “Oh, that would never happen in North America. Even if a utility doesn’t have to comply with CIP, they all have well-configured firewalls in place to protect their OT networks (which of course the Ukrainian utilities didn’t have). And any utility subject to CIP has very good protections in place, beyond a doubt.”

Let’s stipulate that the point about all utilities having good firewalls is correct (and I have no evidence to suggest otherwise, although the problem with firewalls is they can always be made “wide open” through one mistake by an administrator, let alone a skilled attacker). And let’s go beyond that to stipulate that all utilities have great remote access control with two-factor authentication (as is required for Medium and High impact assets by CIP-005 R2). What these people are saying is that, if these two protections are in place (as well as other protections required by CIP-005 R1), there is virtually no possibility that a compromise of the IT network (even a thorough one like the one in Ukraine, where it seems the attackers had free run of the entire network for six months or more, after initially getting a foothold through a phishing email) could lead to a successful attack on the OT network.

Of course, this is nonsense. It is equivalent to the French belief after World War I that an impenetrable line of forts along their border with Germany would prevent the Germans from ever invading France again. These forts were actually constructed and were called the Maginot Line. Of course, at the beginning of World War II, the Germans simply bypassed the line and invaded France through Belgium[ii].

So I really don’t believe there is any way someone can assert that the IT network can never have an impact on the BES, and therefore never needs to be in scope for the CIP standards. Anyone who controls the IT network and has enough resources and time (both of which were in abundant supply for the Ukrainian attackers) will be able to find a way into the OT network, no matter what controls are in place. Here’s one example: Suppose an engineer gets an email from his or her boss’s account (which of course has been taken over by the attackers, probably through a keystroke logger that recorded the boss’s password), saying that at 2 PM the next day, the engineer needs to open a particular set of circuit breakers as part of a test they are doing. Hopefully, the engineer will be suspicious of that request, but I think all of us can attest to times when we have come close to believing a phishing email, despite our thorough understanding of the dangers.[iii]

Again, I’m not saying that I want the scope of the existing CIP standards to suddenly be expanded to include IT systems. The proposal I am making in my book is to completely rewrite the standards (or more exactly, to replace them with new standards, or really just one new standard), so that – to make a long story very short - they are objectives-based and risk-based. IT systems will never pose the same level of risk to the BES as OT systems do, and therefore the entity will never need to apply the same level of controls to IT systems as they do to OT systems[iv]. But I also don’t want IT systems to be left out of CIP altogether. NERC and NERC entities have to give up the idea that their OT networks are safe from anything that could come through the IT network, behind their impenetrable Maginot line.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or challenges like what is discussed in this post. And if you’re a vendor to the power industry, TALLC can help you by developing marketing materials, delivering webinars, etc. To discuss this, you can email me at the same address.

[i] As it is, I think the enforceability of CIP-013 R1.1 – the key part of the key requirement in CIP-013 – was reduced close to zero, due to the fact that the drafting team had to get something – anything – developed and passed by the deadline. I say this because R1.1 provides no list of threats that the entity needs to at least consider in developing their supply chain cyber security risk management plan. I discussed the issue in this post – where I called this a “near-fatal flaw” in CIP-013. Again, I can’t blame the SDT for leaving a list of threats out, since their short deadline didn’t allow for the discussion that would have been needed to include a list, as well as perhaps a few additional ballots as NERC entities second-guessed whatever list the SDT would come up with.

[ii] The French had of course anticipated the Germans would do this, and had a sizable force (with the British) in Belgium to counter them. But they were outmaneuvered because of another mistaken assumption they’d made; see the Wikipedia article referenced above.

[iii] Last week, I got an email supposedly from FedEx, about a package that was going to be delivered to me that day. Since I did have a package scheduled for delivery, I almost clicked on the link in the email before I looked at the email address and realized it was a phish. And three or four years ago, I heard a gentleman who was in charge of a big cyber security group at DHS mention that, in a test phishing email, a large percentage of the employees that reported to him – and probably warned people of the dangers of phishing every day – clicked on the link.

[iv] Of course, I realize that most utility IT systems are very well protected anyway, even though CIP doesn’t apply to them. But an important part of the “new CIP” proposal I am making in my book is that the utility needs to be able to look at all of the cyber threats to the BES at once, compare the risk each threat poses to the BES, and direct their efforts (and funds) toward mitigating the most important risks. Saying the IT network is completely out of scope, without even considering whether there are any serious threats to the BES that could come from the IT network, obviously defeats this process. If the utility truly believes they have already completely mitigated any threats to the BES from the IT network, they would certainly be able to assert this, with appropriate documentation. In the CIP compliance regime I’m proposing, NERC entities would be in charge of the decision as to which threats to mitigate, and to what degree they should do so. But they would have to document the reasons for their decisions.

Monday, June 11, 2018

Are you thinking about CIP-013?

I have been unusually silent on CIP-013 lately; I’ve gone a whole month since posting about it. However, that doesn’t mean it’s not coming. I still believe (and others do as well) that FERC will approve the standard in Q3 (meaning at their September meeting). And as the post just referenced shows, I still believe the most likely compliance date for CIP-013 is April 1, 2020, while the next most likely is July 1, 2020. And as I said in this post in January, you really need to aim to have your supply chain cyber security risk management plan (which is the whole point of CIP-013, of course) finished by six months before the compliance date, to give you time to have it reviewed by your region.

So you really need to consider October 1, 2019 or January 1, 2020 as your “plan completion date”. Once your region has given you their comments on your plan, and you’ve adjusted the plan to address those comments, you should then put it into place. Hopefully, you'll have it implemented with at least a little time remaining before the compliance date. And if you’re one of the entities that likes to come into compliance at least 90 days before the compliance date (as did a number of entities in the run-up to CIP version 5), then you need to move each of these dates up by 90 days, to July 1 or October 1, 2019.

So now the date you will need to have your supply chain cyber security risk management plan developed is as early as next July 1! Does that seem very far away? Not if you know what you will need to do to develop your plan (hint: it’s a lot).
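For anyone who wants to redo this arithmetic when the compliance date changes, here is a minimal sketch in Python. The function names are made up for illustration, and it treats the "90 days" as three calendar months, which is the approximation that produces the July 1 and October 1 dates above.

```python
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    Assumes the day-of-month exists in the target month, which is always
    true for the first-of-month dates discussed in this post.
    """
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, d.day)


def plan_dates(compliance_date: date) -> tuple[date, date]:
    """Return (plan completion date, early-compliance plan date).

    Hypothetical helper: six months before the compliance date for the
    region's review, then three more months (about 90 days) for entities
    that like to comply early.
    """
    plan_completion = months_before(compliance_date, 6)
    early_plan = months_before(plan_completion, 3)
    return plan_completion, early_plan


if __name__ == "__main__":
    # The two candidate compliance dates named in this post.
    for compliance in (date(2020, 4, 1), date(2020, 7, 1)):
        completion, early = plan_dates(compliance)
        print(compliance, "->", completion, "or, complying early,", early)
```

If FERC's approval slips and the compliance date moves, only the dates passed to `plan_dates` need to change.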

Which brings me to the subject of this post. Tom Alrich LLC is offering a free 1-2 hour webinar workshop for your company on CIP-013 and what you will need to do to comply with it. The purpose of the workshop is to get the different groups that will be involved in complying with CIP-013 – supply chain, legal, cyber security and NERC compliance - thinking about the issues that are involved. And in case you haven’t been reading my posts on this subject, complying with CIP-013 will be very different from complying with any of the previous CIP standards. The topics to be addressed can include:

CIP-013 is one of the first risk-based NERC standards. While it’s not mandatory, it is highly advised to classify both BES Cyber Systems and vendors by the degree of risk they pose, with different plan strategies corresponding to different degrees of risk. How can you do this?
The standard doesn’t list the particular risks (although I would prefer the term ‘threats’) that you need to address in your supply chain cyber security risk management plan. How can you compile a credible yet manageable list of risks for your plan?
CIP-013 is the first plan-based CIP standard that doesn’t prescribe any particular actions - it simply requires that you develop and implement a plan[i]. How will you develop the plan and how will it be audited?
While attention has mostly focused on the requirement to mitigate vendor risk, the entity also needs to mitigate implementation risks and risks of transition between vendors, as well as risks posed by services vendors. What are possible strategies for these?
While much of the discussion of CIP-013 has focused on the question of getting vendors to agree to contract language, it is a fact that contract language isn’t the only way – or probably even the preferred way – to get vendor agreement to take actions required by CIP-013. What are good strategies for obtaining vendor commitment, so that the high-cost option of demanding contract language can be avoided, except in cases where it is really needed?
How do you document that vendors followed through on their promises? And what do you do if a vendor doesn’t keep its promise, or won’t make any promise to you in the first place?

If you would like to discuss this with me, please drop me an email at tom@tomalrich.com or call me at 312-515-8996. Thanks!

[i] CIP-013 R1.2 lists six general risk mitigation goals that must be addressed in your plan, but doesn’t require you to take specific steps to achieve any of these six goals. The new versions of CIP-005 and CIP-010 that were balloted with CIP-013 (and will be implemented when CIP-013 is) include three new requirement parts (CIP-005-6 R2.4 and R2.5, and CIP-010-3 R1.6) that in fact do require the entity to take specific actions that implement two of the items in CIP-013 R1.2 (specifically R1.2.6 and R1.2.5). But CIP-013 itself doesn’t require any specific actions.

Monday, June 4, 2018

A Hole in the new Low impact Electronic Access Control Requirement

I had a conversation today with someone I have known for quite some time who is a very close follower of developments in NERC CIP. He pointed out to me a serious flaw in the Low impact electronic access control requirement in CIP-003-7. If a lot of entities take advantage of this flaw to lower their compliance burden, this could end up undermining the security of many Low impact assets. However, I would guess that most NERC entities with Low assets will do the right thing from a security POV anyway, so this post is more in the interesting facts category than the “sound the alarm” category.

Section 3 of Attachment 1 of CIP-003-7 starts off with this wording:

Electronic Access Controls: For each asset containing low impact BES Cyber System(s) identified pursuant to CIP-002, the Responsible Entity shall implement electronic access controls to:

3.1 Permit only necessary inbound and outbound electronic access as determined by the Responsible Entity for any communications that are:

i. between a low impact BES Cyber System(s) and a Cyber Asset(s) outside the asset containing low impact BES Cyber System(s);

….

Note that this requirement is written strictly to protect communications between BES Cyber Systems located at a Low impact asset and any Cyber Asset outside the Low asset. By contrast, the electronic access control requirement for Medium and High impact BCS, CIP-005-5 R1, is applicable not just to BCS but also to the Protected Cyber Assets (PCA) that are associated with them.

Of course, it is good that this is the case for Mediums and Highs. Since PCAs are Cyber Assets that are routably connected to one or more BCS, they can easily be used as “jumping-off points” to attack the BCS themselves. It makes no sense to protect just some systems on a network; if they aren’t all protected, then none of them are really protected.

However, when it comes to Low assets, there is no concept of a PCA. At Medium and High impact assets, PCAs are identified by first drawing the ESP, then identifying any systems that aren’t part of a BCS as PCAs. But the entity owning a Low asset isn’t required to designate an ESP in the first place[i] – so there is no way, without rewriting requirements, for there to be PCAs. This means that, in the case where there is a routable network at the Low asset that contains at least one BCS, according to the requirement only the BCS on that network need to be protected, not any other devices.

I don’t know whether or not I had even thought about this problem previously, but I know that if I had, I wouldn’t have thought about it for long - simply because the answer would have seemed so obvious to me. That answer is that entities will normally use a firewall to protect the whole network that contains a Low BCS, so the “PCAs” will be protected anyway, even if that isn’t required. And if the entity wouldn’t use a firewall, they would use another device like a data diode, or a procedure like putting all BCS on a separate, air-gapped network that doesn’t have an external routable connection. The idea is that these two use cases – and most of the others found in the concept diagrams starting on page 36 in the Guidance and Technical Basis for CIP-003-7 – protect the whole network, just like a firewall would. If any of these use cases is in place, every device on the network is protected, whether or not it is a component of a BES Cyber System.

However, my friend pointed out to me that there were at least two cases in which an entity could comply with the Low impact Electronic Access Control requirement in CIP-003-7, yet still only protect the BCS. The first of these is described in the first concept diagram: the case in which host-based firewall technology is used to protect just the BES Cyber System(s) but nothing else on the network. The second is the case in which a network firewall is used, but access is restricted just for the BCS on the network; access to other devices is left unrestricted (or perhaps under-restricted, if there are huge port ranges open that aren’t justified).

So the “hole” in CIP-003-7 is that it’s possible to be perfectly compliant with the Low impact electronic access control requirement, yet still leave the Low impact BCS effectively unprotected, because the other Cyber Assets on the network are completely unprotected against external threats. Is it likely that many NERC entities will take advantage of this hole in order to reduce the work they need to perform at Low impact assets? I doubt it, but I’ve been surprised before.
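To make the gap concrete, here is a toy model in Python. It is only a sketch: the device names, the `Device` class and both functions are hypothetical, and the "compliance" check is a deliberately simplified reading of Section 3.1 (only communications to and from BCS must be restricted), not a substitute for the actual requirement language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    is_bcs: bool               # part of a Low impact BES Cyber System?
    external_restricted: bool  # is external access to this device filtered?


def meets_simplified_3_1(devices: list[Device]) -> bool:
    """Simplified reading of Section 3.1: only the BCS must be restricted."""
    return all(d.external_restricted for d in devices if d.is_bcs)


def exposed_neighbors(devices: list[Device]) -> list[str]:
    """Non-BCS devices on the same network with unrestricted external access.

    These are the would-be "PCAs" that an attacker could use as
    jumping-off points to reach the BCS.
    """
    return [d.name for d in devices
            if not d.is_bcs and not d.external_restricted]


if __name__ == "__main__":
    # Hypothetical Low asset network: host-based firewall on the BCS only.
    network = [
        Device("relay-1", is_bcs=True, external_restricted=True),
        Device("engineering-laptop", is_bcs=False, external_restricted=False),
        Device("printer", is_bcs=False, external_restricted=False),
    ]
    print("Meets simplified 3.1:", meets_simplified_3_1(network))
    print("Exposed neighbors:", exposed_neighbors(network))
```

In this example the network passes the simplified check while two devices next to the BCS remain exposed, which is exactly the situation described above. A network firewall that restricts access for every device would leave the second list empty.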

Will the hole be “patched” in the future? My guess is not, since the time to do it would have been in April, when FERC issued Order 843. Assuming they knew about the problem, the fact that FERC didn’t order NERC to patch the hole is probably due to one or more of these three reasons:

They knew that patching the hole would almost certainly mean doing away with the provision in CIP-002 and CIP-003 that an inventory of Low impact BCS isn’t required (it would be pretty hard to show that all the “PCAs” on the network had been protected if you didn’t know which devices were components of a BCS and which weren’t) – and this would provoke a huge fight.
They knew that the changes in CIP-003-7 had been very controversial among the NERC entities, and they wanted to avoid starting another such battle on the heels of that one.
They knew that the electronic access control requirement in CIP-003-6 would have avoided this problem, since it required a LEAP (low impact electronic access point, typically part of a network firewall) whenever there is LERC (low impact external routable connectivity). In requiring the hole be fixed in CIP-003-7, they would effectively have been saying “You know, we’ve decided that CIP-003-6 wasn’t quite as bad as we thought it was. We’d like you to bring back the LERC/LEAP idea, while still fixing the problem we asked you to fix in the first place in Order 822 – namely, the fact that the word ‘direct’ in the definition of LERC was unclear. Of course, we know that will require just as much – if not more – work than was required for CIP-003-7. But don’t worry – it’s all for the cause of BES security. Have a nice day.” This would probably have prompted most of the CIP Modifications drafting team members – who collectively spent many man-years drafting the new requirement and shepherding it through the perilous balloting and approval process – to commit group suicide.

So I doubt very much this hole in CIP-003-7 will ever be filled. What will save the grid from collapsing due to a massive cyberattack on Low impact assets? Owners of Low impact assets will simply have to – using the title of Spike Lee’s movie masterpiece – “Do the Right Thing” and put in a network firewall (or other protection that applies to the whole network), even though it’s not required. But this leads to an interesting question: If NERC CIP is so brittle that a gaping hole like this one can’t be fixed other than simply crossing our fingers and hoping for the best, does this mean there are fundamental problems with the whole NERC CIP compliance regime?

Yes, it does.

[i] And the reason they’re not required to designate an ESP is because this would lead to an implicit requirement to develop an inventory of Low impact BES Cyber Systems, which is explicitly ruled out by language in CIP-002 and CIP-003.

Friday, June 1, 2018

Are you High?

My longtime friend Trey Cross emailed me today about something that was mentioned in NERC’s weekly Standards, Compliance and Enforcement bulletin: the initial performance date for four CIP requirement parts is July 1, 2018. This means that, by that date:

At High impact Control Centers, recovery plans need to be tested with an operational test. Per CIP-009 R2.3, this needs to be done every 36 months.
At High impact Control Centers, there needs to be an active vulnerability assessment. Per CIP-010 R3.2, R3.2.1 and R3.2.2, this also needs to be done every 36 months.

I verified this by looking at NERC’s spreadsheet for CIP v5 effective dates (available here). Of course, the requirements in question became effective on July 1, 2016, along with the other CIP v5 requirements. But does this mean the entity has until July 1, 2019 to perform these things for the first time? No, it doesn’t.

When CIP version 1 was implemented, most entities assumed that the clock would start running on periodic requirements (like these) on the effective date of the requirement, yet some regions required that the vulnerability assessment be performed before the effective date. Since the v1 standards never said anything about initial performance dates, I doubt that any entities were given violations for not finishing their SVAs on time, but after that snafu the drafting teams always made sure to specify the “initial performance dates for periodic requirements”. Of course, this was done in the case of CIP v5 and v6, so here we are.
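The difference between the two readings can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and counting 36 months forward from the date of the last performance is my simplifying assumption about how the period runs, so check the requirement language and your region's guidance before relying on it.

```python
import calendar
from datetime import date
from typing import Optional

EFFECTIVE_DATE = date(2016, 7, 1)            # CIP v5 effective date
INITIAL_PERFORMANCE_DATE = date(2018, 7, 1)  # from NERC's bulletin
PERIOD_MONTHS = 36


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(last_performed: Optional[date]) -> date:
    """Deadline for the next test or assessment.

    If it has never been performed, the initial performance date applies,
    not the effective date plus 36 months.
    """
    if last_performed is None:
        return INITIAL_PERFORMANCE_DATE
    return add_months(last_performed, PERIOD_MONTHS)


if __name__ == "__main__":
    print("Mistaken reading:", add_months(EFFECTIVE_DATE, PERIOD_MONTHS))
    print("Actual first deadline:", next_due(None))
    print("After a test on June 15, 2018:", next_due(date(2018, 6, 15)))
```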

I would think that almost all High Control Centers would have done this, but if not…hey, you didn’t have anything else to do on weekends in June, did you?
