Monday, July 26, 2021

Two wise men weigh in on Colonial’s billing system


My post on the billing system at Colonial Pipeline brought out great comments from two wise men of the power industry cybersecurity world: Kevin Perry and Tim Roxey. As you’ll see, they didn’t say the same thing at all, but they didn’t contradict each other, either. Rather, Tim’s comments built on Kevin’s.

Here’s a quick summary of my previous post, although I hope you’ll read it if you haven’t yet:

·        Even though the ransomware attack never reached Colonial’s OT network, it did bring down their billing system.

·        And even though it might seem odd that the loss of the billing system could bring down pipeline operations, there were actually good reasons why that happened (which I’ll let you read).

·        I concluded by pointing out that “Tom’s First Law of OT Networks says that an ‘operations-focused’ company – as opposed to an information-focused company like an insurance company or a consulting firm – will be forced to bring their OT network down if their IT network falls victim to a ransomware attack.”

I stand by what I said, but Kevin’s and Tim’s email comments made me realize that I hadn’t asked the more interesting questions:

1.      How can we identify systems that don’t directly control operations, yet can have a huge impact on operations just the same (i.e., IT systems that perform functions required for operations)? And when we’ve identified them, what measures can we take to protect them better than other systems on the IT network that clearly have no direct operational impact, like, say, the systems that run the utility’s charitable operations?

2.      Should those systems be regulated by OT-focused cybersecurity compliance regimes, such as the dreaded…(and here I cross myself, despite not being Catholic)…NERC CIP?

3.      Or maybe we need to go beyond all this talk about regulation and protecting systems, and think about what the real problem might be?

Briefly, Kevin addressed questions 1 and 2; Tim took question 3 (not that I even thought of these questions until now, of course). I’ll start with what Kevin said, and cover what Tim said in my next post.

On Thursday, Kevin wrote this to me:

I would argue that any “IT” system, or system component that is essential to keeping the OT operational needs to be considered OT and kept isolated from the rest of the IT world.  As you noted, electric metering, whether at the customer point of delivery or in a tie substation, is OT.  The data from the meters are fed into the IT billing systems.  If the billing systems are down, bills will be delayed, but the meter data collection will continue until it can be transferred to the billing systems.  It is inexcusable that the OT must be shut down because an essential IT system is down.

Here are the points that I infer Kevin is making:

1.      This problem wouldn’t have happened in the electric power industry, since an electric utility's operations (including metering) can continue, even when the bills can’t be generated (no pun intended).

2.      The billing system is “essential to operations” in the pipeline industry (or at least in Colonial’s case), although not in the electric power industry (meaning it isn’t a BES Cyber System, or BCS).

3.      If there were a cyber regulatory regime like NERC CIP in place in the pipeline industry, the billing system would need to be considered the equivalent of a BCS.

4.      Regulation or no, the pipeline industry should protect their billing systems using at least some of the same measures (including isolation) used to protect OT systems.

I responded to Kevin’s email with the question, “If you think certain IT systems should be isolated, would you favor an expansion of the CIP standards to require network isolation, as well as perhaps some (although not necessarily all) of the other CIP requirements?”

I want to make one point here: CIP already covers a large group of systems that many electric utilities consider to be part of IT, not OT. Those are systems located in Control Centers. While these systems certainly perform an OT (and in many cases BES) function, they aren’t Industrial Control Systems, since they’re implemented on standard Intel-based hardware and run standard IT operating systems: Windows™ and Linux. A lot of the management that needs to be done on them is the same as what needs to be done for, say, financial systems.

And interestingly enough, Control Centers aren’t included in NERC’s 80-page “definition of the BES”. That definition requires an asset to be connected to the grid at 100kV or higher. The only reason systems in Control Centers are even included in CIP is that Control Centers are specifically called out in CIP-002 R1.1. So it wouldn’t be unprecedented if other “IT systems” were in scope for CIP, although CIP-002 would have to be amended for that to happen.

Kevin (a member of the NERC teams that drafted Urgent Action 1200, the CIP predecessor, as well as CIP versions 1 and 2, and who was then Chief CIP Auditor for the SPP Regional Entity for about ten years, until his retirement in 2018) replied to my email by saying:

A proper CIP-002 assessment of all Cyber Assets linked to the proper functioning of the readily identifiable OT should be sufficient.  In the early days, some entities tried to move systems out of scope simply by moving them out of the ESP (Electronic Security Perimeter).  My team always took a hard look at the historians that were outside the ESP and also their map board display systems.  Most entities simply used their historians for temporal data storage and non-real time engineering analysis, and keeping them out of scope was OK.  

But I am also aware of at least one entity that used their historian to drive their map board displays and also used the historian data for real-time decision making.  Their historians were Critical Cyber Assets (now BCS) because they were used for real-time operations.  At least one entity had map board displays that were not readily available on the dispatcher console, thus the map board also became a CCA/BCS.  And my team did not stop with systems used for the entity’s real-time operations.  An entity who declared their ICCP servers out of scope because they were not using the outbound data (destined for their RC or another BA or TOP) themselves found their decision frowned upon.  Even though they might not be receiving real-time data from a remote association, they were supplying real-time data essential to the recipient(s).  When they argued to the contrary, my team referred them to the TOP and IRO standards that compelled them to send what was initially known as “Appendix 4B” data.

 

So, apply the same logic to the billing system and you will see the meter data collection subsystem is absolutely a BCS if its failure causes you to shut down your OT (SCADA/EMS) systems.  The part of the billing system that sends the invoices and payments is not.  Processing invoices and payments can wait until you get that system back up.

Here is what I take away from what Kevin says: He doesn’t favor expanding the CIP requirements to include systems located on the IT network, because no expansion is needed. If a system on the IT network meets the definition of BES Cyber System (which the different examples he used all do, even though the entities that operate them hadn’t classified them as such), it must be treated as a BCS, including being located within the ESP (i.e., the OT network). Of course, this only applies at Medium and High impact BES assets. Low impact assets aren’t required to have ESPs.
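Kevin’s examples (the historian driving the map board, the ICCP servers, the meter data collection subsystem) all come down to one question: if this system fails, does a readily identifiable OT system stop working properly? One way to picture that question is as a walk over a dependency map. What follows is only a minimal sketch in Python. The system names and the dependency map are invented for illustration, and nothing in CIP-002 prescribes this method.

```python
from collections import deque


def operationally_essential(depends_on, ot_roots):
    """Return every system the OT roots depend on, directly or indirectly.

    depends_on maps a system name to the systems it needs in order to work.
    ot_roots are the readily identifiable OT systems (e.g. SCADA/EMS).
    """
    seen = set(ot_roots)
    queue = deque(ot_roots)
    while queue:
        system = queue.popleft()
        for dependency in depends_on.get(system, ()):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    # The roots are already known to be OT; report only what they pull in.
    return seen - set(ot_roots)


# Hypothetical dependency map, loosely modeled on the examples in this post.
depends_on = {
    "scada_ems": ["meter_data_collection", "map_board"],
    "map_board": ["historian"],  # historian drives the map board display
    "invoicing": ["meter_data_collection"],  # nothing in OT depends on invoicing
    "charity_portal": [],
}

essential = operationally_essential(depends_on, ["scada_ems"])
print(sorted(essential))
# ['historian', 'map_board', 'meter_data_collection']
```

In this made-up map, invoicing and the charity portal never show up in the result, because nothing in OT depends on them. That matches Kevin’s split of the billing system: meter data collection is in, invoices and payments are out.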

So a system like the pipeline billing system – if it existed in the electric power world – would need to be treated as a BES Cyber System, subject to all the privileges (?) attendant on that august designation.

I then asked Kevin whether he thinks utilities should designate their meter data collection systems as BCS. His answer was nuanced, yet at the same time quite clear:

Inconsistent.  The meter data loss does not impact reliability within 15 minutes (Tom’s note: The definition of BES Cyber Asset/BES Cyber System requires that the loss or misuse of the system would have an impact on the Bulk Electric System within 15 minutes. If it has an impact but it will usually take longer than that to happen, it’s not a BCS).  But it also does not cause the utility to shut down the grid.  Loss of telemetry does not stop the revenue-quality meter from collecting data.  Loss of the meter itself does not stop the flow of electricity.  There are procedures for dealing with an occasional failure, including redundancy and inter-utility meter data reconciliation.

If the meter is only a revenue meter, then it does not need to be a BCS.  If the meter also reports real-time flows and/or voltage, then it is a BCS.  That is what I meant by inconsistent.

So Kevin is saying that, given the current NERC CIP requirements, there are only two choices: The meter data collection system is a BCS or it’s not. If it’s a BCS, it gets no break relative to any other BCS in the number or types of requirements that apply to it. If it’s not a BCS, it’s completely out of scope for CIP.
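The all-or-nothing nature of that choice, together with the 15-minute criterion in my note above, can be sketched as a simple yes/no check. This is only an illustration in Python. The CyberAsset class, its minutes_to_bes_impact field and the impact values assigned to meters are my own assumptions, not anything taken from the NERC definitions, and I am assuming that an impact at exactly 15 minutes counts as “within 15 minutes”.

```python
from dataclasses import dataclass
from typing import Optional

# The 15-minute window comes from the BCS definition as summarized in this post.
BES_IMPACT_WINDOW_MINUTES = 15


@dataclass(frozen=True)
class CyberAsset:
    name: str
    # Hypothetical field: estimated minutes until loss or misuse of the asset
    # affects the BES. None means no BES impact is expected at all.
    minutes_to_bes_impact: Optional[float]


def is_bcs(asset: CyberAsset) -> bool:
    """Binary outcome: the asset is either fully in scope or fully out."""
    t = asset.minutes_to_bes_impact
    # Assumption: an impact at exactly 15 minutes counts as "within" the window.
    return t is not None and t <= BES_IMPACT_WINDOW_MINUTES


def meter(name: str, reports_realtime: bool) -> CyberAsset:
    """Kevin's rule of thumb for meters (impact values are illustrative guesses).

    A revenue-only meter is treated as having no BES impact; a meter that also
    reports real-time flows and/or voltage is assumed to have immediate impact.
    """
    return CyberAsset(name, 0.0 if reports_realtime else None)


revenue_meter = meter("revenue-only meter", reports_realtime=False)
dual_use_meter = meter("revenue + real-time meter", reports_realtime=True)

for asset in (revenue_meter, dual_use_meter):
    print(f"{asset.name}: {'BCS' if is_bcs(asset) else 'out of scope for CIP'}")
```

Note that the function returns only True or False. There is no middle value, which is exactly the gap I come back to below.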

But there are certainly cases where a lack of good security on the IT network can result in an outage of the OT network. I described a dramatic example of that in this post: a ransomware attack shut down the IT network but didn’t touch the OT network (as in the case of Colonial), yet in the end two large Control Centers were completely shut down for up to 24 hours, with the grid in a multistate area being run by cell phone.

It’s safe to say that none of the systems on the IT network of this utility met the definition of BCS, so there was no single system that led to the Control Centers being brought down – yet they were brought down anyway. This seems to me to point to the need for CIP to be extended in some way to cover IT assets – perhaps as some sort of “halfway house” asset. But there’s no way that the current CIP standards should be extended to cover anything else. They first need to be completely rewritten as risk-based. Then we can look at extending them to IT, based on the relative risk levels of OT vs. IT.

I’ll turn to Tim Roxey’s comments in my next post. 

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 
