Tom Alrich's Blog: What does Colonial Pipeline mean for NERC CIP?

Sometimes it seems that all questions regarding OT security ultimately come down to questions about NERC CIP. Of course, this is because NERC CIP is the oldest set of cyber regulations – outside of the nuclear and military domains – that directly addresses OT, and is still one of only a handful of cyber regulations that focuses solely on OT.

So far, I’ve tried to focus on the general OT security implications of the Colonial Pipeline ransomware incident, but the fact is that there are some very important implications for CIP in the incident. This week, I received an email from someone who’s been working in CIP for quite a while, but who wishes to remain unidentified. This person asked what CIP changes would be needed, based on the lessons learned from the Colonial attack. I think it’s time that we discussed this.

The most important part of the Colonial Pipeline attack, from a CIP perspective, is that the ransomware was – according to Colonial – confined to the IT network. But even then, the attack ended up shutting down Colonial’s entire pipeline system. And according to news reports, the trigger seems to have been that Colonial’s customer billing system (which was on their IT network, of course) had to shut down; this somehow entailed the shutdown of the OT network.

Two questions inevitably arise in the mind of anyone (such as me) whose mind has been forever warped by spending too many hours pondering recondite NERC CIP issues:

1. If the pipeline industry had been subject to compliance with the NERC CIP standards, would this incident even have happened?

2. If something like this did happen in an electric utility – namely, that the utility had to shut down some or all of their transmission and/or distribution systems due to loss of a system on their IT network – would it have been because of a CIP violation, or could this have happened even in a utility that was completely CIP compliant? If the answer to the latter question is Yes, this implies that the CIP standards can’t prevent a successful cyberattack on the IT network from ever shutting down power operations. In turn, this would imply the CIP standards need to be expanded in some way to cover IT systems, as well as OT ones.

I brought this issue up (really for the second or third time, but for the first time as the subject of a whole post) in July, when I reported on a webinar I’d attended the day before. In that webinar, my longtime friend Jodi Jensen of WAPA had wondered how the loss of Colonial’s billing system could possibly have shut down pipeline operations. Why couldn’t operations have continued, on the understanding that Colonial would generate all outstanding bills once the billing system was back online?

Why was Jodi asking this question? I think it’s because NERC CIP (as well as good SCADA security practice, which Jodi knows a lot about) requires that every system whose loss or misuse could “affect” the Bulk Electric System in some way be included in the OT network, which is protected by an Electronic Security Perimeter. At least in the power industry, the ESP protections (for assets that are classified as Medium or High impact) are quite strong. I have never heard of an OT compromise that began with a frontal assault on the ESP (which isn’t to say it’s never happened, of course), although there have definitely been OT compromises that didn’t start with a breach of the “wall” between IT and OT. In fact, I cited two of those cases in my most recent post on the Colonial issue.

And how does an electric utility decide whether or not a system’s loss or misuse might affect the BES? It’s by running the systems through a bizarre methodology for classifying systems in operation at a NERC entity, which – if I didn’t know better, because I attended a number of the drafting team meetings when it was designed, and I engaged in not a few arguments with the then-chairman of the drafting team about this issue – I would guess was designed by a team consisting of Jack the Ripper, Suleiman the Magnificent and Rube Goldberg; this “system” is known commonly as CIP-002 R1 and Attachment 1.

The keystone of this classification system is the definition of BES Cyber System (well, really BES Cyber Asset, but don’t pay attention to details if you want to understand CIP-002 R1. You’ll go mad). I won’t bore you with the full definition of BCS, but its two main points are:

1. If the system were “rendered unavailable, degraded, or misused”, it would “affect the reliable operation of the Bulk Electric System”.

2. That effect needs to occur within 15 minutes. That is, a system isn’t a BCS if its loss or misuse won’t usually result in a BES impact within 15 minutes. Of course, when you’re dealing with electric power, if there’s any impact at all it’s usually going to occur within a second or two. The massive power outage in Florida in 2008 was detected literally within a couple seconds in Alberta.
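The two-part test above can be sketched in a few lines of Python. This only illustrates the simplified summary given here, not the full CIP-002 definition. The `System` class, its `minutes_to_bes_impact` field, and the example values are all hypothetical, chosen just to show how the 15-minute threshold divides systems.

```python
from dataclasses import dataclass
from typing import Optional

BCS_WINDOW_MINUTES = 15  # the 15-minute threshold described above


@dataclass
class System:
    name: str
    # Hypothetical field: minutes until loss or misuse would usually
    # affect the BES. None means no BES impact is expected at all.
    minutes_to_bes_impact: Optional[float]


def is_bcs(system: System) -> bool:
    """Apply the simplified two-part test: BES impact, within 15 minutes."""
    if system.minutes_to_bes_impact is None:
        return False  # fails point 1: no effect on the BES
    return system.minutes_to_bes_impact <= BCS_WINDOW_MINUTES  # point 2


# Illustrative values only (assumptions, not measurements).
ems = System("Energy Management System", minutes_to_bes_impact=0.05)
billing = System("billing system", minutes_to_bes_impact=7 * 24 * 60)

print(is_bcs(ems))      # True
print(is_bcs(billing))  # False
```

Note that the test is binary: a system that would affect the BES after an hour, a day or a week gets exactly the same answer as one that never would.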

Let’s say the Colonial Pipeline billing system were magically transformed to do electric power billing (although the two types of billing are very different) and installed at a large electric utility. Would it be classified as a BES Cyber System and thus be required to be installed within the ESP (i.e. on the OT network)? Or would it be just another IT system like payroll? Of course, if it were actually a BCS (again, at a Medium or High impact facility), the utility would be in a lot of trouble if a subsequent audit discovered it was on the IT network.

But how could a utility’s billing system (not the metering system needed to produce the data for bills. Metering is always part of the OT network, and of course metering is found everywhere power is distributed, including at your home) have a 15-minute impact on the BES? Ultimately, if an electric utility’s billing system is unavailable for an extended period of time, there might be some BES impact, but it certainly wouldn’t be within 15 minutes.

So the billing system wouldn’t be a BCS in a NERC CIP environment. This means it would be totally exempt from all CIP requirements. But does this mean the billing system doesn’t pose any risk at all to the BES? Does it really deserve to get off scot-free from CIP, even if it shouldn’t be subject to the same set of requirements as, for example, the utility’s Energy Management System, which has an undeniable 15-minute BES impact?

Rather than deal with a hypothetical system, let’s look at a real one that is found in literally every electric utility (in fact, it’s found in just about every industrial facility worldwide): the historian. This is a system that records what goes on inside the utility’s power network, so if there’s some sort of adverse event, the utility can go back through the data to trace what actually happened. Since the BES can run quite nicely, thank you, when the historian isn’t fulfilling this purpose, it’s normally not classified as a BCS.

But in some cases – as Kevin Perry pointed out in this post – the historian is used to provide a real-time view of what’s going on in the utility’s operations, meaning that the operators in the Control Center might make a split-second decision based on something the historian is telling them. Ever the auditor, Kevin correctly pointed out that, if the historian is used for that purpose, it is a BCS and needs to be installed inside the ESP, not on the IT network.

In the same post, Kevin described one case where he decided during an audit that a historian really was a BES Cyber System (actually a Critical Cyber Asset, the somewhat equivalent term used in CIP versions 1-4). He didn’t say whether that utility was fined or not (or whether they even received a violation), but at the least they had a very bad day when they found out they had to do all the re-engineering required to move their historian from the IT network into the ESP.

But what if the historian doesn’t perform any real-time monitoring function, so it’s correctly classified as not a BCS? Does that mean that its loss or compromise doesn’t affect the BES at all? Of course not. For example, if the historian malfunctioned (due to a cyberattack) at the same time a serious event occurred on the utility’s power network, the utility might not be able to determine the cause of the event, say a relay misconfiguration in a substation. Since the cause wasn’t detected, the misconfiguration might remain in place and cause another serious event. That’s a BES impact, although it’s not within 15 minutes.

So at least with the historian, we have an example of a system whose loss or misuse will usually impact the BES, but which doesn’t have a 15-minute impact (Kevin mentions ICCP servers as another example). This means the historian isn’t a BCS and isn’t in scope for NERC CIP at all. But is that right? Shouldn’t systems whose loss or misuse can impact the BES – even though the impact won’t always or even usually be within 15 minutes – be in some way in scope for CIP?

In the same post referred to a couple times above, I quoted from an email Kevin sent me, in which he said, “I would argue that any ‘IT’ system or system component that is essential to (sustaining OT operations) needs to be considered OT and kept isolated from the rest of the IT world.” In other words, Kevin is suggesting that systems whose loss or misuse can affect the BES, but not within 15 minutes, should be included in the OT network, not the IT network.

Does this mean that such a system should be subject to the same requirements as an actual BCS? One might initially be inclined to argue “No, such systems should be required to be on the OT network – that is, within the ESP – but they shouldn’t have to comply with all of the requirements that BCS have to comply with.”

But this ignores an important security fact and an important CIP fact. The security fact is that, if a system is connected to an IP network and it gets compromised, it can be used as a launching point for attacks on other systems connected to the same network, which might have higher intrinsic value than the system that was first compromised. The CIP fact is that every system that is on the same network as a BCS needs to be declared to be a Protected Cyber Asset; and PCAs are subject to almost exactly the same set of requirements as BCS are.
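The CIP fact can be made concrete with a small Python sketch. It is a deliberate simplification: it models only the rule as stated above (anything sharing a network with a BCS becomes a PCA) and ignores every other CIP asset category. The `(name, network, is_bcs)` record layout, the network names, and the label strings are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (name, network, is_bcs)
Asset = Tuple[str, str, bool]


def classify_assets(assets: List[Asset]) -> Dict[str, str]:
    """Label each asset as a BCS, a PCA, or neither (simplified rule)."""
    # Networks that host at least one BES Cyber System.
    bcs_networks = {network for _, network, is_bcs in assets if is_bcs}
    labels = {}
    for name, network, is_bcs in assets:
        if is_bcs:
            labels[name] = "BCS"
        elif network in bcs_networks:
            # Shares a network with a BCS, so it must be declared a PCA.
            labels[name] = "PCA"
        else:
            labels[name] = "not BCS or PCA"
    return labels


# Historian placed on the same network as the EMS.
shared = classify_assets([
    ("EMS", "ESP", True),
    ("historian", "ESP", False),
    ("billing", "IT", False),
])

# Historian placed on a hypothetical separate network instead.
separate = classify_assets([
    ("EMS", "ESP", True),
    ("historian", "separate-net", False),
    ("billing", "IT", False),
])

print(shared["historian"])    # PCA
print(separate["historian"])  # not BCS or PCA
```

In this sketch, the only thing that changes the historian's label is which network it sits on, not anything about the historian itself.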

So if we want to have a smaller set of CIP requirements that applies to these “intermediate systems” (i.e. systems whose loss will affect the BES, but not in 15 minutes. Of course, this is my name for them, and note that if Intermediate System is capitalized, it has a different meaning in CIP – so my name would never be usable in practice), we would probably need to put them on another network, separate from the IT and ESP networks and protected from both. At that point, we could apply a different set of CIP requirements just to these systems.

But which of the current CIP requirements should apply to the “intermediate systems”? I don’t know about your answer, but here’s mine: None of them. That is, I don’t want the current CIP standards extended any further than they already are. I totally agree that these intermediate systems do need to be included in a general OT cybersecurity compliance regime, and I’m fine if that regime is still called NERC CIP, but it needs to be a completely risk-based system.

I outlined the “new CIP” system I’d like to see in an article for a British publication in 2019 (this isn’t available online, but if you email me, I can send you a PDF of it). I also did a webinar on this topic for that same publication in 2018. Note my views have changed somewhat since then, but the general framework I laid out is still what I believe is needed.

And I’m not saying that what I’m proposing is the only way to do this. In fact, the current CIP Modifications drafting team proposed another idea in 2018, which I really liked. I wrote three posts on it: no. 1, no. 2, and no. 3. What happened to that idea? That’s a sad story: Briefly, the SDT was proposing some radical changes to CIP. These would have required NERC entities to revise a lot of their CIP compliance documents, as well as follow some new procedures. And it seems too many NERC entities simply ruled this out as impossible.

The result of this? The current CIP compliance regime (i.e. the standards and how they’re interpreted and enforced) is preventing CIP from being extended to intermediate systems, as well as to the cloud. The latter is especially unfortunate, since, even though I’m sure there are a few hardy souls who have actually outsourced Medium impact, and maybe even High impact, BCS to the cloud (e.g. in outsourced SCADA), they’re only doing this because they’re willing to live with the constant fear that an auditor will throw the book at them and make them shut down all BCS in the cloud, forcing these entities to reproduce their BCS in the nice, safe confines of a Control Center or substation that you can actually point to and walk into.

Folks, if you want to be able to put Medium or High impact BCS in the cloud and feel safe doing so (as well as other improvements, like much faster response to new threats and a much more efficient allocation of compliance resources, as well as of course incorporation of the intermediate systems we’ve just discussed), you’re going to have to accept that the CIP standards have to change. This means you’ll have to change your current procedures and documentation, period. Which do you want? BCS in the cloud or your current CIP procedures? You can’t have both.

Is it time to review your CIP-013 R1 plan? Remember, you can change it at any time, as long as you document why you did that. If you would like me to give you suggestions on how the plan could be improved, please email me.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

3 comments:

Tom Alrich, August 9, 2021 at 12:15 PM
Tim Kucey of PSEG (formerly with NERC) pointed out in an email that my statement that the 2008 Florida incident was felt almost immediately in Alberta can't be true, since AB isn't part of the Eastern Interconnect. He suggested I must have been thinking of Manitoba or Saskatchewan. He's completely correct. It's been a while since I heard that factoid. Some bit evidently got switched off or on (when it was either on or off originally) in my memory. Thanks for the correction, Tim!
Unknown, August 11, 2021 at 1:19 PM
I have concerns that there are so many different authorities that utilities and/or pipeline operators are regulated by, each handing down cyber security mandates without consulting or coordination with existing mandates. I guess it is reasonable to expect some amount of knee-jerk response, and as the dust settles and authorities mature there will be some level of coordination and orchestration in the subsequent mandates we receive.
Tom Alrich, August 11, 2021 at 9:00 PM
I'm not sure what all these regulators are, Unknown. The only cyber regulator in the power industry that I know of is NERC. And the only cyber regulator in the pipeline industry that I know of is the TSA.


Sunday, August 8, 2021
