Sunday, August 1, 2021

Tim Roxey tells us what the real problem is. Now we have to interpret what he says.


Last Monday, I wrote a post about the comments that Kevin Perry, former Chief CIP Auditor of the SPP Regional Entity, made on this post, which discussed why it actually made sense for Colonial Pipeline to shut down their operations after a ransomware attack took down their billing system.

The whole point of that post, as well as the previous posts I’d written on Colonial – starting with this one – was that a “purely IT” incident can affect the OT network, even if there’s no direct connection. (BTW, you can find those posts by searching on the main page of my blog. Until last summer there was no way to search the blog, which made it hard for me to find previous references to a subject and close to impossible for readers to do so; I was quite glad when search was added.)

In the case of Colonial, the loss of their billing system meant that they couldn’t track who put how much gasoline into their pipeline and when, and who withdrew how much and when. For an electric utility, the loss of this capability wouldn’t require shutting down power transmission and distribution, since the utility can always bill for power used later (i.e. the meters will keep operating); and if the utility can’t bill later for some reason, they still need to provide power, because they’re…well, a utility.

But Colonial doesn’t own the gasoline in their pipeline; they’re transporting it, just as a mover transports your household goods to a new city. If the mover loses your goods on the way, they’re on the hook for the entire value of those goods. By the same token, if Colonial keeps shipping gasoline while their billing system is down, they’ll literally lose track of what any one shipper has put into the pipeline, and will end up owing every shipper the entire value of their gasoline.
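To make the reasoning concrete, here is a toy sketch of the carrier’s dilemma. This is entirely my own illustration – the class and method names are hypothetical, and Colonial’s actual billing system is of course far more complex – but it shows why losing the ledger’s state means the carrier no longer knows who owns what in the line, and is potentially liable for every shipper’s full balance:

```python
from collections import defaultdict

class PipelineLedger:
    """Hypothetical sketch: tracks each shipper's product in the pipeline."""

    def __init__(self):
        # barrels of product currently in the pipeline, per shipper
        self.balances = defaultdict(float)

    def inject(self, shipper, barrels):
        # shipper puts product into the line
        self.balances[shipper] += barrels

    def withdraw(self, shipper, barrels):
        # shipper takes product out; can't withdraw more than they own
        if barrels > self.balances[shipper]:
            raise ValueError(f"{shipper} has only {self.balances[shipper]} barrels in the line")
        self.balances[shipper] -= barrels

    def total_liability(self):
        # if this ledger's state is lost, the carrier owes shippers
        # the full value of everything still in the pipeline
        return sum(self.balances.values())

ledger = PipelineLedger()
ledger.inject("Shipper A", 50_000)
ledger.inject("Shipper B", 30_000)
ledger.withdraw("Shipper A", 20_000)
print(ledger.total_liability())  # 60000.0
```

The point of the sketch: without the `balances` state, every injection and withdrawal after the outage is unattributable – which is exactly why continuing to operate without the billing system would have exposed Colonial to the full value of the product in the line.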

In last Monday’s post, I started by saying there were three questions that needed to be answered:

1. How can we identify systems that don’t directly control operations, yet can have a huge impact on operations just the same (i.e., IT systems that perform functions required for operations)? And once we’ve identified them, what measures can we take to protect them better than other systems on the IT network that clearly have no direct operational impact – like, say, the systems that run the utility’s retirement plan?

2. Should those systems be regulated by OT-focused cybersecurity compliance regimes, such as the dreaded…(and here I cross myself, despite not being Catholic)…NERC CIP?

3. Or maybe we need to go beyond all this talk about regulation and protecting systems, and think about what the real problem might be?

To summarize what I think Kevin said in that post, he answered the first question by in effect saying, “Any system on the IT network whose loss or misuse can impact operations, like Colonial’s billing system, should be protected like OT systems are, including being isolated from other IT systems.”

Kevin answered the second question by in effect saying, “Any system whose loss or misuse can affect Bulk Electric System operations within 15 minutes (essentially, the BES Cyber Asset definition) should be classified as a BES Cyber System (BCS) and placed within the Electronic Security Perimeter (if the asset at which it’s installed is classified as Medium or High impact).”

An example he gave of this is a mistake he saw more than once in his ten-year NERC CIP auditing career: a NERC entity didn’t classify their historian as a BCS and installed it in the IT network, not the ESP. However, in the cases Kevin discusses, the historian was used for real-time monitoring purposes, and therefore should have been classified as a BCS. So it should have been installed in the ESP to begin with.

This is stretching what Kevin said a little, but one might draw the implication that, if a system’s loss or misuse doesn’t directly impact the process being controlled – for an electric utility subject to the NERC CIP standards, the smooth and uninterrupted operation of the BES; for Colonial Pipeline, the smooth and uninterrupted transport of gasoline and other refined products through their pipeline system – then a) it’s OK to install it on the IT network, and b) it doesn’t need to be subject to special regulation, beyond a general obligation to follow good cybersecurity practices.

However, there are two cases I can identify in which the shutdown of the IT network directly required shutting down OT, even though there were no systems on the IT network that directly impacted the process being controlled by OT. One case is from 2018, when a serious ransomware attack on a very large electric utility’s IT network required shutting down the control centers as well – even though the ransomware never spread there.

The other case was cited by Tim Conway of SANS in a webinar earlier this year (which was quoted in Utility Dive). In 2017, the NotPetya malware brought down the entire global operations of the Danish shipping giant Maersk. NotPetya was based on the Petya ransomware, except that NotPetya didn’t even bother to save the encryption key after encrypting the victim’s systems – it simply threw the key away. Its purpose was to cause havoc, pure and simple. And it did: about $10 billion worth of havoc – for which, naturally, Russia has never been held accountable. Do you notice a pattern here?

Tim pointed out in the webinar (reported in this post) that no operational systems like cranes were affected by the attack on Maersk’s IT network. However, because of the loss of its IT systems, Maersk no longer knew what was in the containers it was shipping – meaning it really couldn’t guarantee that a container shipped to Company A was actually picked up by the correct recipient, rather than somebody else. This is very close to the situation that Colonial Pipeline faced when they lost their billing system. In both cases, the company shut down operations (although in the case of Maersk, operations were down for two weeks, vs. less than a week for Colonial. On the other hand, given the devastation that Maersk suffered, the fact that it only took them two weeks to get up and running again isn’t much short of a miracle).

In other words, these two cases show us that the security of the IT network can be essential to the correct operation of the OT network, and – at least in the case of a complete loss of the IT network, as happened with Maersk and the utility in the 2018 incident – some IT incidents can require shutting OT down, even when there’s no particular system on the IT network whose loss requires the OT shutdown (as was the case with Colonial).

So we’re fooling ourselves if we think that our OT network is protected from all disturbances on the IT network, even though we may have made it impossible for an attacker to penetrate the OT network from IT – just like the French were fooling themselves when they built the Maginot Line after World War I to prevent another German invasion, even though there was no way the Germans could have crossed the line itself to enter France. And this is just as true with Electronic Security Perimeters. True, CIP-005 R1 and R2 provide formidable protections against an “invasion” that comes through the IT network. But they don’t protect against all compromises, especially ones that effectively bypass the ESP, as in the 2018 ransomware case.

So is the solution to apply the full NERC CIP requirements to IT systems, as well as OT systems? God forbid! I wouldn’t wish the current NERC CIP requirements – in all their prescriptive glory – on my worst enemy. However, if and when the NERC CIP standards are rewritten as risk-based, and when there are important changes made to NERC’s CIP compliance regime (as I discussed in this webinar in 2019), then it will be possible to regulate both IT and OT systems, but in different ways, commensurate with the risks posed by both types of systems.

To go back to my three original questions, Kevin and I answered the first two. But what about the third? That is, instead of just talking about regulating and protecting IT vs OT systems, maybe we need to think beyond that silo? What’s the real problem we need to address?

Fortunately, there’s someone who thinks about what the real problems are: Tim Roxey, who has appeared in this blog before. He replied to the same post that Kevin did, saying (in the inimitable English dialect known as Roxey-speak):

I was in Whole Foods couple of weeks ago. Heavy storms moving in but I was in underground parking. 


I’m pushing about my cart when an announcement comes over the speakers. Please all shoppers stop shopping. We have lost our cash registers due to lightning in the area. 


Me thinks. I have cash. I’m good. 


Me thinks wrongly. Somehow the Point Of Sale device can’t process the sales in cash cuz the credit side is down. 


Harumph. No, it was the people and a branch point in their processing that broke. 


We are so dependent on our “usual processes” that we fail to see the alternatives. 


Colonial failed as well. 


If you are CIKR then this is Wrong. Be CIKR AND operate as such. 

This was of course quite interesting, but it wasn’t…how can I say this?…definitive. So I wrote back to Tim and asked him two questions: “Do you think some sort of regulation of these systems is necessary? Or are you saying that changing the utility’s (or pipeline company’s) whole modus operandi is required to fix these problems?”

Tim replied:

Actually if we look at this differently, we see opportunity. 

Apply regulations that address People, Processes, and technology. Stop concerning ourselves with IT/OT as the technology of applicability.  If you can have the People pull the plug because their Processes (Recovery) or Technology (IT bleeding into OT) has led to a condition of uncertainty (The function of CEO is RISK) then the regulations were not so much fantastic. 

The regs in Colonial Pipeline simply do not exist. Their Issue was IT not OT and hence most NERC Regs would not apply even if they existed in TSA world. 

Requiring Baseline Regulations that hit all three factors:

  • the People that operate inside
  • Processes that control CI Functions that employ
  • Technology to perform the Critical Infrastructure functions (National Security Functions)

Good Regulations address all three.  

Bottom line – Regulations tend towards baselines. Centers of excellence (Think INPO) tend towards Ceilings of excellent performance (best practices). Ceilings tend to include a better, more mature understanding of Risk. Not just the usual Vulnerabilities, Threats and Consequences stuff but also internal risks of how the People and Processes Parts and Technology parts interact. The People being unduly influenced by their knowledge of the processes (or lack thereof) and the misunderstandings of the technology (IT really can touch OT) leads to enough uncertainty that conservative calls to pay Ransom are made.

As with all oracular statements (i.e. statements that a true oracle makes. And no, that’s not Larry Ellison), these are subject to many interpretations. I’ve reproduced Tim’s exact words (with a couple minor grammar corrections), so that each of us can draw our own interpretation from them. Here’s mine:

• You’re missing the boat if you focus all of your attention on the question of IT vs. OT. That’s not the issue.

• The real issue – for both cyber regulations and best practices – is people, processes, and technologies. Get those right, and you won’t have to worry about IT vs. OT.

• Don’t just pay attention to PPT in three silos, but look at how people, processes and technologies actually interact – as in the case of Whole Foods, where a needless dependence of the cash payment system on the credit card payment system made it impossible for the store to sell anything at all.

• And just as important, make sure that people understand how the processes and technologies actually work, since a belief that OT sits safe behind its Maginot Line defenses can lead to a pretty rude awakening, just as it did in France in 1940.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
