Sunday, August 1, 2021

Tim Roxey tells us what the real problem is. Now we have to interpret what he says.


Last Monday, I wrote a post about the comments that Kevin Perry, former Chief CIP Auditor of the SPP Regional Entity, made on this post, which discussed why it actually made sense for Colonial Pipeline to shut down their operations due to the loss of their billing system in a ransomware attack.

The whole point of that post, as well as the previous posts I’d written on Colonial (which BTW you can find by searching on the main page of my blog. Until last summer there was no way to search the blog, which made it hard for me to find previous references to a subject, and close to impossible for readers to do so. I was quite glad when search was added) – starting with this one – was that a “purely IT” incident can affect the OT network, even if there’s no direct connection between the two networks.

In the case of Colonial, the loss of their billing system meant that they couldn’t track who put how much gasoline into their pipeline and when, and who withdrew how much and when. For an electric utility, the loss of this capability wouldn’t require shutting down power transmission and distribution, since the utility can always bill for power used later (i.e. the meters will keep operating); and if the utility can’t bill later for some reason, they still need to provide power, because they’re…well, a utility.

But Colonial doesn’t own the gasoline in their pipeline; they’re transporting it, just as a mover transports your household goods to a new city. If the mover loses your goods on the way, they’re on the hook for the entire value of those goods. By the same token, if Colonial keeps shipping gasoline while their billing system is down, they’ll literally lose track of what any one shipper has put into the pipeline, and will end up owing every shipper the entire value of their gasoline.
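To make this concrete, here’s a minimal Python sketch of the kind of custody ledger a billing system maintains. To be clear, this is purely illustrative – I know nothing about Colonial’s actual systems – but it shows why losing the ledger means losing the ability to say whose gasoline is in the line:

```python
# A custody ledger in miniature. Hypothetical and simplified -- not drawn
# from Colonial's actual systems -- but it shows what was lost.

from collections import defaultdict

class CustodyLedger:
    """Tracks how many barrels each shipper currently has in the line."""

    def __init__(self):
        self.balances = defaultdict(float)

    def inject(self, shipper: str, barrels: float) -> None:
        self.balances[shipper] += barrels

    def withdraw(self, shipper: str, barrels: float) -> None:
        if barrels > self.balances[shipper]:
            raise ValueError(f"{shipper} only has {self.balances[shipper]:.0f} bbl in the line")
        self.balances[shipper] -= barrels

ledger = CustodyLedger()
ledger.inject("ShipperA", 50_000)
ledger.withdraw("ShipperA", 20_000)
print(ledger.balances["ShipperA"])  # 30000.0 bbl the carrier still owes back

# Lose this ledger and keep pumping, and every barrel moved is a barrel
# the carrier may owe some shipper in full -- so the pumping stops.
```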

In last Monday’s post, I started by saying there were three questions that needed to be answered:

1.      How can we identify systems that don’t directly control operations, yet can have a huge impact on operations just the same (i.e., IT systems that perform functions required for operations)? And when we’ve identified them, what measures can we take to protect them better than other systems on the IT network that clearly have no direct operational impact, like say the systems that run the utility’s retirement plan?

2.      Should those systems be regulated by OT-focused cybersecurity compliance regimes, such as the dreaded…(and here I cross myself, despite not being Catholic)…NERC CIP?

3.      Or maybe we need to go beyond all this talk about regulation and protecting systems, and think about what the real problem might be?

To summarize what I think Kevin said in that post, he answered the first question by in effect saying, “Any system on the IT network whose loss or misuse can impact operations, like Colonial’s billing system, should be protected like OT systems are, including being isolated from other IT systems.”

Kevin answered the second question by in effect saying, “Any system whose loss or misuse can affect Bulk Electric System operations within 15 minutes (essentially, the BES Cyber Asset definition) should be classified as a BES Cyber System (BCS) and placed within the Electronic Security Perimeter (if the asset at which it’s installed is classified as Medium or High impact).”

An example he gave of this is a mistake he saw more than once in his ten-year NERC CIP auditing career: a NERC entity didn’t classify their historian as a BCS and installed it in the IT network, not the ESP. However, in the cases Kevin discusses, the historian was used for real-time monitoring purposes, and therefore should have been classified as a BCS. So it should have been installed in the ESP to begin with.

This is stretching what Kevin said a little, but one might draw the implication that, if a system’s loss or misuse doesn’t directly impact the process being controlled (which, for an electric utility subject to the NERC CIP standards, is the smooth and uninterrupted operation of the BES; for Colonial Pipeline, it’s the smooth and uninterrupted transport of refined petroleum products in their pipeline system), then a) it’s OK to install it on the IT network, and b) it doesn’t need to be subject to special regulation, beyond a general obligation to follow good cybersecurity practices.

However, there are two cases I can identify in which the shutdown of the IT network directly required shutting down OT, even though there were no systems on the IT network that directly impacted the process being controlled by OT. One case is from 2018, when a serious ransomware attack on a very large electric utility’s IT network required shutting down the control centers as well – even though the ransomware never spread there.

The other case was cited by Tim Conway of SANS in a webinar earlier this year (which was quoted in Utility Dive). In 2017, the NotPetya malware brought down the entire global operations of the Danish shipping giant Maersk. NotPetya was based on the Petya ransomware, except that it didn’t even bother to save the encryption key after encrypting the victim’s systems – it simply threw the key away. Its purpose was to cause havoc, pure and simple. And it did: about $10 billion worth of havoc, for which, naturally, Russia has never been held accountable. Do you notice a pattern here?

Tim pointed out in the webinar (reported in this post) that no operational systems like cranes were affected by the attack on Maersk’s IT network. However, because of the loss of its IT systems, Maersk no longer knew what was in the containers it was shipping – meaning it couldn’t guarantee that a container shipped to Company A was actually picked up by the correct recipient, rather than somebody else. This is very close to the situation Colonial Pipeline faced when they lost their billing system. In both cases, the company shut down operations (although in the case of Maersk, operations were down for two weeks, vs. less than a week for Colonial. On the other hand, given the devastation Maersk suffered, the fact that it only took them two weeks to get up and running again is little short of a miracle).

In other words, these two cases show us that the security of the IT network can be essential to the correct operation of the OT network, and – at least in the case of a complete loss of the IT network, as happened with Maersk and the utility in the 2018 incident – some IT incidents can require shutting OT down, even when there’s no particular system on the IT network whose loss requires the OT shutdown (as was the case with Colonial).

So we’re fooling ourselves if we think that our OT network is protected from all disturbances on the IT network just because we’ve made it impossible for an attacker to penetrate the OT network from IT – just like the French were fooling themselves when they built the Maginot Line after World War I to prevent another German invasion. There was no way the Germans could have crossed the Line itself, so in 1940 they simply went around it, through Belgium. And this is just as true with Electronic Security Perimeters. True, CIP-005 R1 and R2 provide formidable protections against an “invasion” that comes through the IT network. But they don’t protect against all compromises, especially ones that never need to cross the ESP at all, as in the 2018 ransomware case.

So is the solution to apply the full NERC CIP requirements to IT systems, as well as OT systems? God forbid! I wouldn’t wish the current NERC CIP requirements – in all their prescriptive glory – on my worst enemy. However, if and when the NERC CIP standards are rewritten as risk-based, and when there are important changes made to NERC’s CIP compliance regime (as I discussed in this webinar in 2019), then it will be possible to regulate both IT and OT systems, but in different ways, commensurate with the risks posed by both types of systems.

To go back to my three original questions, Kevin and I answered the first two. But what about the third? That is, instead of just talking about regulating and protecting IT vs OT systems, maybe we need to think beyond that silo? What’s the real problem we need to address?

Fortunately, there’s someone who thinks about what the real problems are: Tim Roxey, who has appeared in this blog before. He replied to the same post that Kevin did, saying (in the inimitable English dialect known as Roxey-speak):

I was in Whole Foods couple of weeks ago. Heavy storms moving in but I was in underground parking.

I’m pushing about my cart when an announcement comes over the speakers. Please all shoppers stop shopping. We have lost our cash registers due to lightning in the area.

Me thinks. I have cash. I’m good.

Me thinks wrongly. Somehow the Point Of Sale device can’t process the sales in cash cuz the credit side is down.

Harumph. No, it was the people and a branch point in their processing that broke.

We are so dependent on our “usual processes” that we fail to see the alternatives.

Colonial failed as well.

If you are CIKR then this is Wrong. Be CIKR AND operate as such.

This was of course quite interesting, but it wasn’t…how can I say this?…definitive. So I wrote back to Tim and asked him two questions: “Do you think some sort of regulation of these systems is necessary? Or are you saying that changing the utility’s (or pipeline company’s) whole modus operandi is required to fix these problems?”

Tim replied:

Actually if we look at this differently, we see opportunity. 

Apply regulations that address People, Processes, and technology. Stop concerning ourselves with IT/OT as the technology of applicability.  If you can have the People pull the plug because their Processes (Recovery) or Technology (IT bleeding into OT) has led to a condition of uncertainty (The function of CEO is RISK) then the regulations were not so much fantastic. 

The regs in Colonial Pipeline simply do not exist. Their Issue was IT not OT and hence most NERC Regs would not apply even if they existed in TSA world. 

Requiring Baseline Regulations that hit all three factors:

  • the People that operate inside
  • Processes that control CI Functions that employ
  • Technology to perform the Critical Infrastructure functions (National Security Functions)

Good Regulations address all three.  

Bottom line – Regulations tend towards baselines. Centers of excellence (Think INPO) tend towards Ceilings of excellent performance (best practices). Ceilings tend to include a better, more mature understanding of Risk. Not just the usual Vulnerabilities, Threats and Consequences stuff but also internal risks of how the People and Processes Parts and Technology parts interact. The People being unduly influenced by their knowledge of the processes (or lack thereof) and the misunderstandings of the technology (IT really can touch OT) leads to enough uncertainty that conservative calls to pay Ransom are made.

As with all oracular statements (i.e. statements that a true oracle makes. And no, that’s not Larry Ellison), these are subject to many interpretations. I’ve reproduced Tim’s exact words (with a couple minor grammar corrections), so that each of us can draw our own interpretation from them. Here’s mine:

·        You’re missing the boat if you focus all of your attention on the question of IT vs. OT. That’s not the issue.

·        The real issue – for both cyber regulations and best practices – is people, processes, and technologies. Get those right, and you won’t have to worry about IT vs. OT.

·        Don’t just pay attention to PPT in three silos, but look at how people, processes and technologies actually interact – as in the case of Whole Foods, where a needless dependence of the cash payment path on the credit card payment systems made it impossible for this Whole Foods store to sell anything at all (see the sketch after this list).

·        And just as important, make sure that people understand how the processes and technologies actually work, since for example a belief that OT exists safe behind its Maginot Line defenses can lead to a pretty rude awakening, just like in France in 1940.
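Since the Whole Foods story is at bottom an architecture lesson, here’s a hedged sketch of that failure mode in Python. Every name and structure in it is invented for illustration – it’s not drawn from any real point-of-sale system – but it shows how routing every tender type through one credit authorization path means cash dies when credit does:

```python
# Illustrative only: names and structure are invented, not any real POS design.

def credit_authorize(amount: float) -> bool:
    """The credit network is down (lightning in the area)."""
    raise ConnectionError("authorization host unreachable")

def checkout_coupled(tender: str, amount: float) -> str:
    # Anti-pattern (the Whole Foods case): every tender type, cash
    # included, is routed through the same credit authorization path.
    credit_authorize(amount)
    return "sale completed"

def checkout_decoupled(tender: str, amount: float) -> str:
    # The cash branch has no dependency on the credit network; the sale
    # is logged locally and reconciled when the network comes back.
    if tender == "cash":
        return "sale completed (logged locally for later reconciliation)"
    return "sale completed" if credit_authorize(amount) else "declined"

try:
    checkout_coupled("cash", 42.00)
except ConnectionError:
    print("Coupled design: please all shoppers stop shopping")
print(checkout_decoupled("cash", 42.00))  # cash still works
```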

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

Friday, July 30, 2021

Allan is moving!

Dr. Allan Friedman, who has been running the Software Component Transparency (SBOM) Initiative of the National Telecommunications and Information Administration (NTIA) of the US Department of Commerce since its inception in 2018, announced recently that later in August he will move from NTIA to CISA (the Cybersecurity and Infrastructure Security Agency) in DHS.

Each time Allan has announced this (and I’m sure he’s done it at least ten times, in different meetings of the Initiative – including our Energy Proof of Concept meeting on Wednesday of this week), he has immediately followed it by saying that he will still be completely involved in the work of the Initiative. But the Initiative will change, simply because CISA is a very different organization from NTIA.

Of course, it’s too early to know how it will change. Allan has promised (and I believe him) that the whole group involved in the Initiative (there must be at least 200 people who attend at least one of the meetings in any given month, including participants from Europe and Japan) will meet with him in September to decide the way forward. This isn’t to say it will be a democratic process, but at least people will have their input.

Allan has pointed out many times over the past two weeks that the Initiative started from just about zero in 2018, and has now built up a substantial body of experience, knowledge and especially written guidance about SBOMs. This couldn’t have happened without the NTIA’s approach to launching a new technology (as they did with DNS in the 1980s and 1990s, and as they’re now doing with 5G).

To launch a new technology, NTIA doesn’t gather a bunch of wise people in a room (virtual or otherwise), who scratch their chins, offer profound thoughts, develop a very thoroughly-researched document describing in great detail all of the ins-and-outs and do’s-and-don’ts of the new technology, then go home and congratulate themselves on a job well done - whether or not anybody’s even looking at what they’ve written.

Rather, the NTIA gets the actual stakeholders together to figure out what’s needed for the new technology to succeed, and what’s the best way to get there; there are no preconditions, and all meetings and documents are completely public. In the case of SBOMs, a key tool is the industry-focused Proofs of Concept, of which there are currently three (healthcare, autos and energy). It’s possible the three PoCs will remain under NTIA’s auspices, simply because they’re working well and there’s no reason to mess with a good thing (the energy PoC is especially fortunate, since Idaho National Labs is providing support in many ways, including the web site and Ginger Wright, my very able co-leader in the effort). Of course, Allan will be able to participate in the meetings, no matter what agency they’re “under”.

So if everything was going so well, why is Allan making this switch? I believe (without having discussed it with him yet) that he looked at the number of cybersecurity professionals inside NTIA – a small number, certainly – vs. the number inside CISA (CISA had about 3,400 employees last year, and I’m sure that number has already jumped a lot, especially as they keep getting more jobs added to their portfolio). And he saw that both he and the SBOM “movement” (cult?) can expand, if they’re part of CISA, in all sorts of ways that they couldn’t even dream of under NTIA. There are some really huge possibilities, and Allan has just begun to explore them.

Good luck in the new gig, Allan! A new world is opening up for you and for SBOMs.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

Monday, July 26, 2021

Two wise men weigh in on Colonial’s billing system


My post on the billing system at Colonial Pipeline brought out great comments from two wise men of the power industry cybersecurity world: Kevin Perry and Tim Roxey. As you’ll see, they didn’t say the same thing at all, but they didn’t contradict each other, either. Rather, Tim’s comments built on Kevin’s.

Here’s a quick summary of my previous post, although I hope you’ll read it if you haven’t yet:

·        Even though the ransomware attack never reached Colonial’s OT network, it did bring down their billing system.

·        And even though it might seem odd that the loss of the billing system could bring down pipeline operations, there were actually good reasons why that happened (which I’ll let you read).

·        I concluded by pointing out that “Tom’s First Law of OT Networks says that an ‘operations-focused’ company – as opposed to an information-focused company like an insurance company or a consulting firm – will be forced to bring their OT network down if their IT network falls victim to a ransomware attack.”

I stand by what I said, but Kevin’s and Tim’s email comments made me realize that I hadn’t asked the more interesting questions:

1.      How can we identify systems that don’t directly control operations, yet can have a huge impact on operations just the same (i.e., IT systems that perform functions required for operations)? And when we’ve identified them, what measures can we take to protect them better than other systems on the IT network that clearly have no direct operational impact, like say the systems that run the utility’s charitable operations?

2.      Should those systems be regulated by OT-focused cybersecurity compliance regimes, such as the dreaded…(and here I cross myself, despite not being Catholic)…NERC CIP?

3.      Or maybe we need to go beyond all this talk about regulation and protecting systems, and think about what the real problem might be?

Briefly, Kevin addressed questions 1 and 2; Tim took question 3 (not that I even thought of these questions until now, of course). I’ll start with what Kevin said, and cover what Tim said in my next post.

On Thursday, Kevin wrote this to me:

I would argue that any “IT” system, or system component, that is essential to keeping the OT operational needs to be considered OT and kept isolated from the rest of the IT world.  As you noted, electric metering, whether at the customer point of delivery or in a tie substation, is OT.  The data from the meters are fed into the IT billing systems.  If the billing systems are down, bills will be delayed, but the meter data collection will continue until it can be transferred to the billing systems.  It is inexcusable that the OT must be shut down because an essential IT system is down.

Here are the points that I infer Kevin is making:

1.      This problem wouldn’t have happened in the electric power industry, since an electric utility's operations (including metering) can continue, even when the bills can’t be generated (no pun intended).

2.      The billing system is “essential to operations” in the pipeline industry (or at least in Colonial’s case), although not in the electric power industry (meaning it isn’t a BES Cyber System, or BCS).

3.      If there were a cyber regulatory regime like NERC CIP in place in the pipeline industry, the billing system would need to be considered the equivalent of a BCS.

4.      Regulation or no, the pipeline industry should protect their billing systems using at least some of the same measures (including isolation) used to protect OT systems.

I responded to Kevin’s email with the question, “If you think certain IT systems should be isolated, would you favor an expansion of the CIP standards to require network isolation, as well as perhaps some (although not necessarily all) of the other CIP requirements?”

I want to make one point here: CIP already covers a large group of systems that many electric utilities consider to be part of IT, not OT. Those are systems located in Control Centers. While these systems certainly perform an OT (and in many cases BES) function, they aren’t Industrial Control Systems, since they’re implemented on standard Intel-based hardware and run standard IT operating systems: Windows™ and Linux. A lot of the management that needs to be done on them is the same as what needs to be done for, say, financial systems.

And interestingly enough, Control Centers aren’t included in NERC’s 80-page “definition of the BES”. That definition requires an asset to be connected to the grid at 100 kV or higher. The only reason systems in Control Centers are even included in CIP is that Control Centers are specifically called out in CIP-002 R1.1. So it wouldn’t be unprecedented for other “IT systems” to be in scope for CIP, although CIP-002 would have to be amended for that to happen.

Kevin (a member of the NERC teams that drafted Urgent Action 1200, the CIP predecessor, as well as CIP versions 1 and 2, and who was then Chief CIP Auditor for the SPP Regional Entity for about ten years, until his retirement in 2018) replied to my email by saying:

A proper CIP-002 assessment of all Cyber Assets linked to the proper functioning of the readily identifiable OT should be sufficient.  In the early days, some entities tried to move systems out of scope simply by moving them out of the ESP (Electronic Security Perimeter).  My team always took a hard look at the historians that were outside the ESP and also their map board display systems.  Most entities simply used their historians for temporal data storage and non-real time engineering analysis, and keeping them out of scope was OK.  

But I am also aware of at least one entity that used their historian to drive their map board displays and also used the historian data for real-time decision making.  Their historians were Critical Cyber Assets (now BCS) because they were used for real-time operations.  At least one entity had map board displays that were not readily available on the dispatcher console, thus the map board also became a CCA/BCS.  And my team did not stop with systems used for the entity’s real-time operations.  An entity who declared their ICCP servers out of scope because they were not using the outbound data (destined for their RC or another BA or TOP) themselves found their decision frowned upon.  Even though they might not be receiving real-time data from a remote association, they were supplying real-time data essential to the recipient(s).  When they argued to the contrary, my team referred them to the TOP and IRO standards that compelled them to send what was initially known as “Appendix 4B” data.


So, apply the same logic to the billing system and you will see the meter data collection subsystem is absolutely a BCS if its failure causes you to shut down your OT (SCADA/EMS) systems.  The part of the billing system that sends the invoices and payments is not.  Processing invoices and payments can wait until you get that system back up.

Here is what I take away from what Kevin says: he doesn’t favor expanding the CIP requirements to include systems located on the IT network, because if a system on the IT network meets the definition of BES Cyber System (which the different examples he used all do, even though the entities that operate them hadn’t classified them as such), it must be treated as a BCS, including being located within the ESP (i.e. the OT network). Of course, this only applies at Medium and High impact BES assets. Low impact assets aren’t required to have ESPs.

So a system like the pipeline billing system – if it existed in the electric power world – would need to be treated as a BES Cyber System, subject to all the privileges (?) attendant on that august designation.

I then asked Kevin whether he thinks utilities should designate their meter data collection systems as BCS. His answer was nuanced, yet at the same time quite clear:

Inconsistent.  The meter data loss does not impact reliability within 15 minutes (Tom’s note: The definition of BES Cyber Asset/BES Cyber System requires that the loss or misuse of the system would have an impact on the Bulk Electric System within 15 minutes. If it has an impact but it will usually take longer than that to happen, it’s not a BCS).  But it also does not cause the utility to shut down the grid.  Loss of telemetry does not stop the revenue-quality meter from collecting data.  Loss of the meter itself does not stop the flow of electricity.  There are procedures for dealing with an occasional failure, including redundancy and inter-utility meter data reconciliation.

If the meter is only a revenue meter, then it does not need to be a BCS.  If the meter also reports real-time flows and/or voltage, then it is a BCS.  That is what I meant by inconsistent.

So Kevin is saying that, given the current NERC CIP requirements, there are only two choices: The meter data collection system is a BCS or it’s not. If it’s a BCS, it doesn’t get any break from any other BCS, in terms of the number or types of requirements that apply to it. If it’s not a BCS, it’s completely out of scope for CIP.
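Put in code form (my paraphrase of the BES Cyber Asset test, not NERC’s official classification logic), the binary nature of the test looks like this:

```python
# My paraphrase of the BES Cyber Asset test, not NERC's official logic:
# the classification is binary, with no middle tier between "full BCS"
# and "out of scope".

def is_bcs(affects_bes_operations: bool, impact_within_15_minutes: bool) -> bool:
    """A system is a BCS only if its loss or misuse would affect the BES
    within 15 minutes; otherwise CIP doesn't apply to it at all."""
    return affects_bes_operations and impact_within_15_minutes

# Kevin's meter example: a revenue-only meter fails the test...
print(is_bcs(affects_bes_operations=True, impact_within_15_minutes=False))  # False
# ...while the same meter reporting real-time flows or voltage passes it.
print(is_bcs(affects_bes_operations=True, impact_within_15_minutes=True))   # True
```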

But there are certainly cases where a lack of good security on the IT network can result in an outage of the OT network. I described a dramatic example of that in this post: a ransomware attack that shut down the IT network but didn’t touch the OT network (as in the case of Colonial) in the end resulted in two large Control Centers being completely shut down for up to 24 hours, with the grid in a multistate area being run by cell phone.

It’s safe to say that none of the systems on the IT network of this utility met the definition of BCS, so there was no single system that led to the Control Centers being brought down – yet they were brought down anyway. This seems to me to point to the need for CIP to be extended in some way to cover IT assets – perhaps as some sort of “halfway house” asset. But there’s no way that the current CIP standards should be extended to cover anything else. They first need to be completely rewritten as risk-based. Then we can look at extending them to IT, based on the relative risk levels of OT vs. IT.

I’ll turn to Tim Roxey’s comments in my next post. 

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

Wednesday, July 21, 2021

How could a billing system attack shut down an OT network?


Yesterday, I attended an excellent webinar on a topic I’ve been waiting to have someone explain to me, “Consequence-driven Cyber-informed Engineering (CCE) – Resilience Strategies”. It was sponsored by the Midwest Reliability Organization (MRO) and featured two longtime friends of mine: Jodi Jensen of WAPA and Sam Chanoski of INL. Since a recording will be available on MRO’s website soon, I won’t try to reiterate what was said in the webinar, other than to say it’s worth your while to listen to the recording.

What inspired me to write this post was Jodi’s statement, regarding the Colonial Pipeline ransomware attack, that Colonial had said that they shut down the actual pipeline (i.e. their OT network) because of the loss of their billing system (which was on the IT network). Of course, the IT network was compromised, so it had to be shut down and the machines rebuilt.

Colonial insisted that their OT network wasn’t affected by the ransomware, but they had to shut it down anyway due to the loss of their billing system. Jodi wondered why the billing system was essential to operations. In other words, couldn’t they have continued shipping petroleum products through the pipeline and worried about billing later?

I wrote three posts after the Colonial incident: Here, here, and here (in that order). In all three of them, I discussed possible reasons why the OT network (and pipeline) had to be shut down, even though the ransomware didn’t penetrate it. I also linked to a post I wrote last October, describing an incident in 2018 in which a major utility – a BA for a multi-state area – had to shut down their Control Centers (i.e. an important part of their OT network) for up to 24 hours and run the grid from cell phones, when their IT network was hit by a ransomware attack that required rebuilding 12,000 computers from scratch.

Just like in the case of Colonial, the utility swore the ransomware never penetrated their OT network (and I have no reason not to believe them), but they couldn’t take the chance that just one machine in the Control Center had been compromised. If that had happened, that one machine might have then compromised all of the IT network when it was restarted, requiring another huge shutdown and rebuild (and I’m told that this becomes much less fun the second time around, to say nothing of the third or fourth time). Which is why they shut down and rebuilt all the systems in the Control Centers as well.

I brought up that incident because the same consideration might have been another reason why Colonial shut down their pipeline. And after I wrote the second post, one of the most prolific commenters on my posts – a person named Unknown – wrote in once again to say:

Like you, I also believe that Colonial shut down because they could not accurately bill customers or track their customers' assets (i.e. refined petroleum products).

Pipelines are like banks and oil in the pipeline is like cash in the bank. If a bank loses its ability to track who gave them cash (or who they loaned it to), then there is no point opening the doors, even if they can safely store the money in the vault.

Unknown wrote this because I had pointed out in the post that the Washington Post had said in an editorial (which I paraphrase), “If they had kept their pipelines operating while the IT network was down, they wouldn’t have been able to invoice their customers.” I added, “And it’s safe to say that Colonial doesn’t feel that it should deliver gasoline through their pipeline solely as a charity.”

Unknown was pointing out that more than the wish to avoid operating as a charity motivated Colonial to shut down. They don’t own the gasoline they ship in their pipeline, any more than a trucking company owns the furniture it ships or a bank owns the money in its vaults. If the trucking company or the bank loses track of what’s been entrusted to them, they have to repay the entire amount (and certainly with consequential damages) to whoever shipped the furniture or deposited the money.

In other words, this isn’t like an electric distribution utility, which – at least for a brief period of time – owns the electric power they’re distributing to their customers (I’ll omit discussion of Retail Choice here). That utility has to keep the lights on, no matter what it costs them, and if they can’t bill during an emergency, they can usually bill later (the meters needed for billing are all on the OT network, so presumably an IT network shutdown wouldn’t affect them anyway). Colonial isn’t obligated to keep the cars in Georgia full of gas (nor are they paid to do that, of course). They obviously can’t keep shipping gasoline if it’s likely they’ll end up having to pay the full cost of the gas to the shippers.

I concluded my third post on Colonial by articulating the first law of nature that I’ve ever identified. Tom’s First Law of OT Networks says that “an ‘operations-focused’ company – as opposed to an information-focused company like an insurance company or a consulting firm – will be forced to bring their OT network down if their IT network falls victim to a ransomware attack.”

I’ve been told that this can’t be considered as a new law of nature because there are already enough of those. How about Newton’s Laws of Motion? They’ve been around since the 1600s, and Einstein showed they’re not applicable in extreme conditions. Why not drop one of them, and put my law in its place? Seems sensible to me…

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

Monday, July 19, 2021

How can we incentivize Transmission cybersecurity investment?


Next Thursday, July 29, I’ll be speaking on a panel with Ben Miller of Dragos at the “Transmission Infrastructure Investment, US” virtual conference; the panel will be asking (and at least trying to answer) the question above. The panel will be led by Jim Cunningham of Protect Our Power.

Our panel is just one of ten live sessions in the conference, all of which look quite interesting. You can get an agenda and sign up here. Our session will run from 2:40 to 3:30 ET.

I’ll hope to see you there. If you run into me in the hallway or at lunch, please say hello.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

Friday, July 16, 2021

Video of Josh Corman’s Great SBOM Proof of Concept talk

I anticipated that Josh Corman’s talk at this week’s Energy SBOM Proof of Concept meeting would be good, and I certainly wasn’t disappointed – in fact, it was great. I’m going to describe it a little here, but I’m pleased to announce that the video is available – so you don’t have to take my word on any of this. Josh’s talk starts a little after the 12-minute mark and goes on for 22 minutes (his connection went down at one point, but he came back very quickly).

The meeting was devoted to discussing use cases for SBOMs. It occurred to us in planning the meeting that one of the best ways to address this topic was to hear from the two people most responsible for the movement to make software bills of materials not just a nice concept, but a regular practice with well-understood (but not mandated) guidelines for production and use. These two people were Dr. Allan Friedman, leader of the National Telecommunications and Information Administration’s Software Component Transparency Initiative, and Josh, who coined the term SBOM, although he always points out that he wasn't the first one who had the idea of inventorying software components. They both spoke at this week’s meeting on how they came to see SBOMs as an important need, and why.

Allan spoke first (and led the meeting, as he usually does). His talk was very good, and you should listen to it. However, Josh’s was exceptional. He covered two topics: The events that led him (and others) to believe that SBOMs were needed, and SBOM use cases. The latter was based on the NTIA document whose development he led in 2019, Roles and Benefits for SBOMs across the Supply Chain, which is one of the three or four fundamental documents produced by the Initiative.

Below are some very interesting statements he made in the “history” part of his talk. They’re certainly nowhere near everything he said (he managed to pack a lot into a short amount of time, without rushing. Fortunately, in the video you'll be able to hear everything he says, if you’re not afraid to back up at a few points during his discussion), nor can I swear that I didn’t get a few things wrong.

1.      He remembers July 13, 2013 as the day that he woke up to the problem of software component vulnerabilities. On that day, servers running Apache Struts 2 – an open source component of many applications – were attacked through previously-unknown vulnerabilities.

2.      Josh’s reaction then was “It’s open season on open source. Who’s going to attack just one bank anymore, when they can attack lots of targets through one component?”

3.      At the time of that attack, Josh was in a high-level position at Akamai. However, he soon moved to Sonatype, an early leader in open source dependency (component) management, whose product is now one of the leading software composition analysis tools.

4.      Probably the event that woke most of the rest of us out of our blissful ignorance of the problem of component vulnerabilities was the 2014 disclosure of the Heartbleed vulnerability in the OpenSSL cryptography library, which was estimated to be present in about half a million “secure” servers.

5.      Heartbleed – as far as I know – didn’t lead to any major breaches, but it required a massive effort by a huge number of organizations just to find out whether they had any vulnerable web servers – and if so, where. Why was that? OpenSSL is a component of other software, and often a component of other components, etc. Many organizations never even found all the instances of OpenSSL they were running. For example, Josh says it took DHS six weeks just to determine which federal agencies were affected by Heartbleed.

6.      Meanwhile, some financial companies knew in literally minutes or hours both whether and where they were affected. Why was this the case? Because they had a kind of proto-SBOM (see the sketch after this list). Josh said the financial sector had woken up to this problem when he did – with the Apache Struts 2 attacks.

7.      After this, Josh decided to really dig into the idea of SBOMs and started reading Deming, who had stressed the importance of bills of materials for manufactured products. Having BOMs gave manufacturers the following advantages.

a.      They could have fewer, but better parts.

b.      They could compare the quality of parts from different suppliers and buy more from the high-quality ones.

c.      They could track which parts went in which products, so that if there were a problem with a part, it could be tracked down and replaced in any product in which it had been used.

8.      All of these benefits have direct analogues in SBOMs. 

9.      Another seminal event, both for Josh and for awareness of component vulnerabilities, was the 2015 SamSam ransomware attack on Hollywood Presbyterian Hospital. This attack exploited a vulnerability in the JBoss Java development platform (now called WildFly). The hospital had to shut down patient care for about one week (I'm told that isn't a good thing for a hospital to have to do).

10.   The hospital knew about SamSam, but didn’t have any idea whether it was affected by the JBoss vulnerability, and if so, where. Of course, this was because they had no SBOMs to provide them that information.

11.   It was this and the WannaCry attacks that caused the Food and Drug Administration, which regulates medical devices like pacemakers and infusion pumps, to put out a “Pre-market guidance” for those devices. While it didn’t require SBOMs immediately, it said they would be required in the future. This galvanized the medical community to start working on the problem of SBOMs, and led to the creation of the NTIA Initiative.
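Going back to item 6 above: here’s a minimal sketch of the lookup that those financial firms, with their proto-SBOMs, could run in minutes while everyone else scrambled for weeks. It assumes one CycloneDX-style JSON SBOM per deployed product; the field names and hard-coded version list are simplifications for illustration:

```python
# Field names follow the general shape of a CycloneDX JSON SBOM, and the
# version test is deliberately naive -- both are simplifications.

import json
from pathlib import Path

def affected_products(sbom_dir: str, component: str, bad_versions: set):
    """Scan one SBOM file per deployed product for a vulnerable component."""
    hits = []
    for sbom_file in Path(sbom_dir).glob("*.json"):
        sbom = json.loads(sbom_file.read_text())
        for comp in sbom.get("components", []):
            if comp.get("name") == component and comp.get("version") in bad_versions:
                hits.append((sbom_file.name, comp["version"]))
    return hits

# Heartbleed affected OpenSSL 1.0.1 through 1.0.1f:
heartbleed = {"1.0.1"} | {f"1.0.1{c}" for c in "abcdef"}
print(affected_products("./sboms", "openssl", heartbleed))
```

With one SBOM per product, “are we affected, and where?” becomes a single pass over a directory of files instead of a six-week scramble.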

But there’s a lot more. Watch the video!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

Monday, July 12, 2021

If you please, Sir, would you be kind enough to patch this serious vulnerability in that software you charged me a lot of money for?


Last week, the Wall Street Journal reported that Kaseya was warned in early April of the previously-unknown vulnerability used in the recent devastating ransomware attack on hundreds of organizations worldwide (including MSP customers of Kaseya and customers of those MSPs).

It had been previously reported that a Dutch security research organization had informed Kaseya of the vulnerability (along with others linked to it) some time before the attack. Now we know how long before: about three months. Kaseya patched some of the vulnerabilities in April and May, but unfortunately, they didn’t get around to this vulnerability (actually one in a chain of vulnerabilities) before the successful attack. Darn the luck! Moreover, Kaseya still hasn’t fully patched the vulnerability, because of some sort of technical issue.

At the same time, we’ve learned about the potentially devastating PrintNightmare vulnerability in the Windows print spooler. It’s a long story, but the gist is that in late June, some researchers mistakenly released a proof-of-concept exploit for the vulnerability. When the mistake became clear, they pulled the code back, but not before it had been copied and improved upon. Now the ambitious hacker has at least three sets of exploit code to choose from. So there is some good news in this story…for the hackers.

Of course, all this vulnerability does is allow attackers to take control of the Windows domain controller…nothing serious or anything like that. We have to assume that attackers (and probably our Russian government friends, as usual busy as beavers in their never-ending quest to make life hard for Western countries – all without having to resort to nuclear weapons, since using those is messy and is regarded as a real faux pas in polite company) have already penetrated as many targets as they possibly can, since they assume that Microsoft will eventually fix this vulnerability.

Indeed, Microsoft did issue a patch for the vulnerability last Tuesday. However, on Wednesday a researcher demonstrated online how exploits could bypass the patch. So it seems we’re not out of the woods yet.

Clearly, leaving important software companies – critical infrastructure, if the term has any meaning at all – to make all the decisions about when, or even if, they’ll patch important vulnerabilities isn’t working. This isn’t like your dry cleaners messing up one of your shirts. Both of the above failures have potentially huge consequences, just like SolarWinds did.

Maybe there should be fines that kick in X number of days after the company learns of a serious vulnerability, and that increase every day the vulnerability remains unpatched (and if there’s no way to patch the vulnerability for some reason, the software company should be required to take the vulnerable product down and to compensate their customers for whatever damage this causes them).
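For what it’s worth, here’s what the arithmetic of such a schedule might look like. Every number below is made up purely for illustration – the point is just that the penalty should compound the longer a known vulnerability goes unpatched:

```python
# Every number here is invented for illustration; this is the arithmetic
# of the idea, not a proposal for actual figures.

def unpatched_fine(days_since_report: int,
                   grace_days: int = 45,
                   base_per_day: float = 10_000.0,
                   daily_growth: float = 1.05) -> float:
    """Zero during the grace period, then a daily penalty that grows 5%
    per day the vulnerability stays unpatched (a geometric series)."""
    overdue = days_since_report - grace_days
    if overdue <= 0:
        return 0.0
    return base_per_day * (daily_growth ** overdue - 1) / (daily_growth - 1)

print(f"${unpatched_fine(46):,.0f}")   # 1 day overdue: $10,000
print(f"${unpatched_fine(90):,.0f}")   # 45 days overdue: about $1.6 million
```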

With great power comes great responsibility. The companies are quite happy with the former, but they’re not so keen on the latter.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.