Friday, October 30, 2020

Here’s how somebody could really impact the US grid

If you're looking for my pandemic posts, go here.

As I write this, US hospitals are being hit with an unprecedented ransomware attack by Russian-speaking criminals; patient care is already being affected. There’s no evidence that this is anything more than criminal activity at this point, and if anything the hospitals are just catching up with other sectors in terms of being targeted for ransomware – sectors like local government and school districts have already been hit very hard.

It seems the criminals have come to realize that a hospital faces much more pressure to return quickly to normal operations after a ransomware attack than, say, an insurance company, and thus may be more likely to pay the ransom rather than wipe its systems and restore from backup. Part of that pressure may stem from the fact that a death following a ransomware attack on a hospital in Germany may have been the first documented death attributable to a cyberattack.

However, I think the current attacks on hospitals are different and provide a warning signal for the operators of the US electric grid. What catches my attention is that these attacks are clearly coordinated. Sure, they’re probably coordinated by criminals, who aren’t likely to see much advantage in targeting the grid. But there’s nothing to prevent them from being coordinated by the Russian state instead. And there’s no doubt that the Russians want to have the power to cause big outages on the US grid, even if they don’t want to exercise it currently.

As you know, ransomware attacks aren’t addressed at all by the NERC CIP standards, and – given the current mostly prescriptive nature of those standards – I don’t think they should be now, either. But I do think there should be a NERC effort to make sure that electric utilities are taking the necessary steps to protect against ransomware, including both technical and non-technical steps (with anti-phishing training and testing being no. 1 on my list).

Some people will want to point out that ransomware affects IT networks, not OT ones. I’ll agree that’s true in the case of substations, where the most important programmable grid control devices – electronic relays and remote terminal units (RTUs) – are largely impervious to ransomware. But this isn’t the case with Control Centers, where the devices almost all run Windows or Linux. Control Center networks are much more like IT networks than “true” OT networks, and some utilities consider them part of IT, not OT. However, the fact is they play a crucial role in monitoring and controlling the grid, which is why they figure so prominently in the CIP standards.

And now someone might point out to me that, since Control Centers are well-protected by NERC CIP, it would be virtually impossible for ransomware to spread to them. I used to think that was the case, until I heard about this event in 2018. Anybody who thinks that Control Centers are immune to the effects of ransomware is living in a fool’s paradise. They would be a great vector for a serious ransomware attack aimed at disrupting the grid itself.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Thursday, October 29, 2020

The case for SBOMs


In yesterday’s post, I started what will most likely be a long series of posts on what I’ve learned about SBOMs recently. This is the second in that series.

Of course, before I start writing a lot of posts, I need to tell you why I think SBOMs are an important topic you should know about (if you’re in any way involved with cybersecurity), vs. say new developments in beekeeping. Here is the way I see it:

·        According to a 2020 report by Sonatype, the average software product includes 135 components.

·        If we assume that any piece of software (whether a component or an end-user product) has some probability X that a vulnerability will be identified in it in a given year, then each of a product’s components carries that same probability X. Roughly speaking, then, you should expect about 135 times as many vulnerabilities to be identified across a product’s components as in the code the supplier wrote itself.

·        However, the vast majority of vulnerabilities in components don’t end up being exploitable in the product itself, for various reasons (e.g. a library has three modules, but only two of them were incorporated into the product – and the vulnerability was in the third). Veracode recently estimated that only around 5% of component vulnerabilities are actually exploitable, which means exploitable component vulnerabilities should still show up about seven times as often as vulnerabilities in the product’s own code (the arithmetic is sketched after this list). While that’s a lot better than 135 times, it still means component vulnerabilities are about seven times as much of a threat as product vulnerabilities themselves.

·        This wouldn’t in itself be a huge problem. It means you would have to spend 7 or 8 times as much time patching or mitigating vulnerabilities in components as you now spend addressing product vulnerabilities, but in principle it could be addressed by hiring more warm bodies. However, the problem is that in many cases, you will never hear about the component vulnerabilities.

·        In fact, a 2017 Veracode study said “A mere 52 percent of companies reported they provide security fixes to components when new security vulnerabilities are discovered. Despite an average of 71 vulnerabilities per application introduced through the use of third-party components, only 23 percent reported testing for vulnerabilities in components at every release.”
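Here is a back-of-the-envelope version of the arithmetic behind those bullets, as a minimal Python sketch. The only inputs are the Sonatype and Veracode figures quoted above; X is whatever annual vulnerability rate you care to assume.

```python
# Rough arithmetic behind the "seven times" claim above.
# Assumptions (from the figures cited in this post): 135 components per product
# (Sonatype) and ~5% of component vulnerabilities exploitable in the product
# (Veracode). X is whatever annual vulnerability rate you assume.

COMPONENTS_PER_PRODUCT = 135
EXPLOITABLE_FRACTION = 0.05

def expected_vulns_per_year(x: float) -> dict:
    """Expected vulnerabilities per year for one product, under the assumptions above."""
    product = x                                   # the supplier's own code
    components = COMPONENTS_PER_PRODUCT * x       # all component vulnerabilities
    exploitable = components * EXPLOITABLE_FRACTION
    return {"product": product,
            "all_components": components,
            "exploitable_components": exploitable,
            "ratio_vs_product": exploitable / product}

# The ratio is 6.75 (about 7x) no matter what value of X you pick.
print(expected_vulns_per_year(0.10))
```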

And here is the problem: while there are certainly many software suppliers who make sure to patch (or otherwise mitigate) vulnerabilities in components, nearly half of them don’t. And it’s highly unlikely those suppliers will even tell you about the component vulnerabilities they haven’t patched, so that you could at least take some action on your own to mitigate them. You just won’t know about them, period.

Enter SBOMs. If you have an SBOM that shows the components of a software package your organization has installed on its network, you can then keep an eye out for vulnerabilities in those components. Currently, you’ll probably have to do that manually, but in the near future there will be services that do this for you (a rough sketch of the workflow appears after this list). That is, they will:

1.      Ask you up front for a list of the software packages you want to monitor for component vulnerabilities.

2.      Receive current SBOMs from suppliers (since an SBOM needs to be updated whenever the software changes, or whenever a component changes) and identify vulnerabilities that apply to the components.

3.      For each software package that you’ve listed, provide you a list of the components[i] in it, along with any vulnerabilities identified in those components (of course, this information would need to be regularly updated).
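To make those three steps a little more concrete, here is a minimal sketch of the kind of logic such a service might run. Everything in it is hypothetical: the package names, component names and CVE identifiers are invented, and the “vulnerability feed” is just a local dictionary standing in for NVD, OSV or whatever source a real service would query.

```python
# Hypothetical sketch of the three-step service described above.

from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    version: str

# Step 2: current SBOMs received from suppliers, reduced here to name/version pairs.
sboms = {
    "ExampleEMS 4.2": [Component("libalpha", "1.0.2"), Component("libbeta", "2.14")],
    "ExampleHMI 7.0": [Component("libgamma", "3.1")],
}

# Stand-in for a vulnerability feed; the CVE identifiers are fictitious.
vuln_feed = {
    Component("libalpha", "1.0.2"): ["CVE-0000-1111"],
    Component("libbeta", "2.14"): ["CVE-0000-2222", "CVE-0000-3333"],
}

def component_vuln_report(watched_packages):
    """Step 3: for each package on the customer's list (step 1), report each
    component and any vulnerabilities currently known for it."""
    report = {}
    for package in watched_packages:
        report[package] = {
            f"{c.name} {c.version}": vuln_feed.get(c, [])
            for c in sboms.get(package, [])
        }
    return report

# Step 1: the customer's watch list.
print(component_vuln_report(["ExampleEMS 4.2"]))
```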

Sounds pretty simple, right? Just three steps! However, I’m 99% certain that currently no such service exists, mainly because very few suppliers are providing SBOMs now. So how can we get suppliers to start providing SBOMs? Beat them over the head with sticks? Threaten to throw their CEOs in jail? Clearly not. Suppliers will provide SBOMs when it’s clear to them that their customers are demanding them.

But how can customers demand SBOMs if they don’t know what to ask for or how they will be able to use them? It seems clear to me that the software suppliers and the software consumers need to work together to figure this out. That’s what the NTIA does, as described in my previous post. They do it by conducting proofs of concept for particular industries, including (hopefully) the power industry in the near future.

But the lack of SBOMs is far from the only problem that needs to be addressed, before SBOMs are freely available and usable. I'll discuss other problems in further posts, coming soon to a blog near you.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] Initially, it will be very hard to get anything other than a two-level SBOM; the first level is the software package itself and the second is its immediate components. It will be a while before you will be able to see the components of those components, the components of the components of those components, etc. The holy grail would be a nested list that you could expand until you reached components that themselves have no components. But that’s probably impossible in any practical sense, since almost all software has components. And frankly, the farther down you go in the tree, the lower the likelihood that a vulnerability in a component will be exploitable in the software you’re running on your network. So it’s not likely that anyone would invest resources in going down more than, say, four or five levels. However, just having a two-level SBOM is challenging enough nowadays. Once that problem is nailed down in, say, 5-10 years, we can worry about more levels.
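To illustrate what “levels” means here, this is a toy sketch (all names invented) of walking a nested SBOM down to a chosen depth. A two-level SBOM corresponds to max_level=2: the product plus its direct components, with nothing deeper visible.

```python
# Toy illustration of SBOM "levels" (all names invented). Level 1 is the
# product itself; level 2 is its direct components; level 3 would be the
# components of those components, and so on.

from dataclasses import dataclass, field

@dataclass
class SbomEntry:
    name: str
    components: list = field(default_factory=list)  # sub-components, if known

def walk(entry: SbomEntry, max_level: int, level: int = 1):
    """Print the dependency tree down to max_level (2 = a 'two-level' SBOM)."""
    print("  " * (level - 1) + entry.name)
    if level < max_level:
        for child in entry.components:
            walk(child, max_level, level + 1)

product = SbomEntry("ExampleApp", [
    SbomEntry("libfoo", [SbomEntry("libbar")]),   # libbar only visible at level 3
    SbomEntry("libbaz"),
])

walk(product, max_level=2)   # shows ExampleApp, libfoo, libbaz - but not libbar
```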

Wednesday, October 28, 2020

What I’ve learned about SBOMs – part I


If you're looking for my pandemic posts, go here.

I’ve written about seven posts that touched on various aspects of software bills of materials, but the only post in which I’ve tried to set out the case for SBOMs is this one from late August. However, I can now say I know a lot more about SBOMs than I did in August. This is because, since that time, I have become involved with the Software Transparency Initiative led by Dr. Allan Friedman of the National Telecommunications and Information Administration (part of the Dept. of Commerce), working for a client that would like to see SBOMs become widely used in the electric power industry[i].

Thanks to that client, I can now say I have a much better understanding of SBOMs than I did in August, including a lot of important nuances that I had no idea about then. It would be nice if there were a single book that you could pick up to learn everything there is to know about SBOMs, but that simply isn’t the case now and I’m not sure it will ever be the case. If you really want to learn about SBOMs, there’s no substitute for getting involved (preferably at least a couple of hours a week) in the weekly meetings conducted by the NTIA, and reading the documents that are being produced by this initiative.

You can read about NTIA’s mission here. They help foster new technologies having to do with among other things the internet and cybersecurity. They don’t do this by publishing regulations or even guidelines, but by convening “multistakeholder processes” that include people from industries affected by the new technology (of course, every industry is affected by the internet and cybersecurity). These people meet and decide among themselves how they can best move forward the technology in question.

As an aside, I want to point out that NTIA has often been very successful in accomplishing missions like this. One example is something you use every day (in fact, often every waking hour, since you use it whenever you use the internet in any way): DNS. NTIA didn’t develop DNS, but they did help get its management operating at scale, and in the late 1990s they moved coordination of DNS – including the IANA functions – to the newly created ICANN.

In the case of SBOMs, what’s most needed now is for software suppliers (which includes open source communities, since 9/10 of software components are open source) to understand how and why to produce SBOMs, and for organizations that use software (i.e. just about every organization on the planet) to understand how they can use SBOMs to increase their security.

Most people, when they think of a government-affiliated organization that’s trying to foster a new technology, will immediately understand that to mean the organization is developing standards or guidelines (mandatory or not). That’s how I understood the NTIA Software Transparency Initiative when I wrote the August post linked at the top of this post. However, that’s definitely not what this group is doing! They have decided (perhaps implicitly) that the best way to accomplish their goal is to foster proofs of concept (PoCs) in particular industries.

I believe they’ve done this because the needs of different industries can differ a lot, so the formats and procedures agreed on for one industry won’t necessarily be usable for another industry. Even more importantly, the PoC itself will be the best way to achieve the twin goals of having suppliers provide SBOMs and software users ask for (or even demand) them. These are actually two sides of the same coin: no supplier is going to go to the trouble of producing SBOMs (and there will be a lot of work required, as I’ll discuss in future posts) if none of their customers are asking for them. And until those customers have some sort of concrete demonstration of how they can use SBOMs and benefit from them, they won’t ask for them.

The first (and so far the only) industry that has done an SBOM proof of concept is healthcare. In 2018, a group of medical device makers and hospitals (about five of each) started meeting regularly. Their initial goals were:

1.      To determine how suppliers would produce SBOMs – i.e. in what format. There are currently three primary formats, none of which is a “standard” or is “endorsed” by anyone (in fact, I’ve already found one vendor claiming their SBOM-related service is “endorsed by the NTIA”. There ain’t no such thing). The group considered all three, as well as the option of altering one of them to address specific industry needs. They ended up choosing the “vanilla” SPDX format for the first PoC, but for their current PoC (which is really their third, although they consider it the second iteration of their second PoC) they are considering the other two formats as well, along with industry-specific additions to whatever format they choose. A sketch of the kind of component information any of these formats carries appears after this list.

2.      To determine the most important use for SBOMs in the hospitals, and what would be required for them to achieve positive results. This has perhaps been the easiest part of the PoC to define: The primary purpose of having SBOMs is to be able to track vulnerabilities that are identified in software components of a medical device (like an infusion pump) in use at the hospital.
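For what it’s worth, whichever format wins out, the core information carried for each component is similar. Here is an illustrative sketch – deliberately written as plain Python, not in SPDX or any other real SBOM format – of the kinds of baseline fields the NTIA work discusses for each component. Every value is made up.

```python
# Illustrative only - not actual SPDX (or any other SBOM format) syntax.
# A single component entry showing roughly the baseline information discussed
# in the NTIA work: who supplied the component, what it is and what version,
# a unique identifier, a hash of the shipped artifact, how it relates to the
# parent product, and who produced the SBOM data. All values are fictitious.

component_entry = {
    "supplier_name": "Example Software Co.",
    "component_name": "libexample",
    "version": "2.3.1",
    "unique_identifier": "pkg:generic/libexample@2.3.1",  # e.g. a purl-style ID
    "hash_sha256": "0" * 64,                              # placeholder value
    "relationship": "included in ExampleDeviceFirmware 7.0",
    "sbom_author": "Example Software Co. build pipeline",
    "timestamp": "2020-10-28T00:00:00Z",
}

print(component_entry["unique_identifier"])
```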

So far, the only PoCs that have been started are the three (or two, depending on who’s counting) PoCs that apply to medical devices used in hospitals. However, a PoC for the auto industry (of course, tracking all of the software components that are included in your car’s engine) is supposed to begin soon. I’m waiting for stickers in auto showrooms to show, right under miles per gallon, statistics like “Number of software component vulnerabilities currently unpatched”. I’m sure that will be the deciding factor in a lot of purchase decisions. After all, who cares whether the car has a sunroof as long as there aren’t many unpatched component vulnerabilities?

But this is a power industry blog, and you may be wondering when a power industry PoC will start. Up until a couple weeks ago, I would have told you “It will be a while.” Now I’d say it’s something that you’ll at least hear about this year, although my guess is it won’t start until next year. I’m hoping there might be something to announce[ii] in a few weeks, although that might be over-optimistic.

I have decided that I’m going to be posting quite regularly what I’ve learned (and am learning) about SBOMs, since they will be getting a lot more attention from the power industry next year. You’ve been warned.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] I’m also now involved with another organization, the DBOM Consortium, that is pursuing a complementary technology path called Distributed Bill of Materials. I’ll discuss them in another post in the future. 

[ii] It’s actually much more certain that a power industry PoC will start soon for the DBOM Consortium, which would be complementary to – and perhaps coordinated with – the SBOM PoC for the industry; there is already a group meeting to discuss and frame this PoC. If you’re interested in participating in this group, send me an email and I’ll connect you to the people running it.

Saturday, October 24, 2020

Well, this is embarrassing…


If you're looking for my pandemic posts, go here.

This recent article in the Daily Beast, subtitled “Ich bin ein Red Flag”, describes how a German cybersecurity expert, known for his contacts with government officials throughout Europe and the US, was outed by the German government early in 2019 for being part of a Russian government influence operation. However, somehow six months later he was meeting in Washington with DHS cybersecurity officials, as well as Microsoft and Amazon executives. All of this at a time when a simple Google search would have discovered the true story about him.

But there’s more to the story. Not only did he meet with the above parties, he met in Washington with executives of a little-known power industry organization: the North American Electric Reliability Corporation. And he met with the California Public Utilities Commission. Have any of you heard of these two organizations?

To be fair, he had been traveling to the US for a number of years before he was outed, and the article doesn’t specifically say that he met with NERC and the CPUC after the outing. But this does go to show that you need to be careful about the people you meet with, not just the ones you hire.

 

 

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Thursday, October 22, 2020

A great supply chain security presentation – part I

If you're looking for my pandemic posts, go here.

I recently put up a post describing some excellent points regarding firmware security that were made in a paper written by Matt Wyckhouse of Finite State. However, I’m not done with Matt; he seems to be a fount of interesting observations. In this post and the next two parts (which won’t necessarily appear consecutively), I want to discuss Matt’s excellent presentation at the Midwest Reliability Organization’s annual security conference (virtual this year, of course) two weeks ago. That presentation made some really good points about supply chain cybersecurity, although as you’ll see I think Matt also missed – or at least underemphasized – an important point.

The title of Matt’s presentation was “Don’t trust – verify everything: Scaling product and supply chain risk assessments through firmware analysis”. The slides – and maybe the recording – should be posted on MRO’s website in the near future, but they’re not there now. Matt addressed three important points[i], which I’ll discuss in the three parts of this post.

The first point has to do with determining “foreign adversary” influence. As you probably know, the May 1 Executive Order is focused on that problem. Its stated purpose is to place DoE in charge of finding threats to the Bulk Power System that might be found in products with some degree of “influence” from six countries, including China and Russia.

First off, there are no products sold by companies from China (or any of the other five “foreign adversary” countries) that are used to control or monitor elements of the US Bulk Electric System, as Kevin Perry and I documented in this post five days after the EO came out. Moreover, the only BES products we could identify that might even be assembled in China were Dell and HP servers and workstations used in Control Centers. And since the same servers and workstations are used by literally every industry, it’s hard to see how the Chinese could launch an attack on the US grid through such devices.

So now, we need to look at “influence” in a general way – that is, cases where a hardware or software manufacturer is somehow under the influence of the Chinese, even though it isn’t headquartered in China and even though its products aren’t manufactured or even assembled there.

But in his presentation, Matt pointed out that there’s no good way to state what constitutes influence, without including many companies that are highly unlikely to be under foreign adversary “influence” in any meaningful sense. Here are some of the points he made:

·        Not buying Huawei is an easy choice; the US government has already made that choice for you. But no Huawei products are now used to monitor or control the US BES.

·        But how about Honeywell? They’re a longtime critical infrastructure and defense supplier, and some of their products are definitely used on the grid. Yet some of their products include components from Huawei, Dahua, Hikvision and HiSilicon – all banned Chinese companies. Should you stop buying from Honeywell? (Tom says: I hope not. They’re my former employer and I still think highly of them)

·        How about Siemens? They’re a huge supplier to the power industry and many other industries. Yet they have 21 R&D hubs in China and over 5,000 R&D and engineering staff there. Should you stop buying from Siemens because of that?

·        Let’s say somebody located in China contributed to an open source software product (which probably happens every day, of course). Do we need to ban that product? More importantly, how would we ever verify the location or even the identity of all contributors? And remember, the whole idea of open source is that there are many eyes looking at the code. Even if say a Chinese People’s Liberation Army soldier placed some sort of backdoor in an open source project, there’s no guarantee at all that it wouldn’t be removed before the code was made available for download.

As Matt points out, the bottom line is that it’s a losing proposition to try to ban products based on connections to a particular country. Connected devices have complex global hardware and software supply chains. If we don’t want adversary countries in our supply chains, where do we draw the line? Any line we draw will only change the attackers’ tactics. Of course, it’s perfectly acceptable to say that no Chinese products can ever be installed in a position where they could be remotely controlled to execute an attack on the BES. As I’ve already said, there aren’t any Chinese products in such a position now, but I have no problem in saying they should never be installed in the future as well.

However, it’s important that no devices from China be banned, if there’s absolutely no way they could be controlled remotely – or even pre-programmed to misoperate at a certain time in the future – to impact the BES. Any device that could be misoperated to have a BES impact at a later date would have to have some sort of logical engine built into the device: a microprocessor, FPGA, etc. Yet about 20 of the 25-odd devices listed in the EO are controlled by an external device like a relay (if at all); they don’t have any logic built into them. So how could they possibly be subject to a supply chain cyberattack?

One of these “non-logical” devices is a transformer. A transformer, taken by itself, operates purely according to the laws of physics; it neither requires external commands to operate, nor is there any built-in logic that could change its behavior.[ii] Yet, in the wake of the EO, transformers manufactured in China have been pointed to as some sort of dagger pointed at the heart of the US grid, when in fact they are no more likely to be subject to a supply chain attack than my steam iron is.

I want to point out that I’ve never heard of a vulnerability or backdoor that was deliberately planted in critical infrastructure equipment by a nation-state in order to attack the US. On the other hand, the US has definitely done this to other countries. For example, the mother of all supply chain attacks was conducted by the US and resulted in a huge pipeline explosion in the Soviet Union in 1982, which played a role in the collapse of the USSR nine years later. And there are suspicions that the backdoor found in Juniper routers in 2015 was actually planted by the NSA.

This doesn’t mean that other countries wouldn’t try to do the same thing to us. But the question is why they would do this, given that the discovery that an adversary had deliberately caused a serious critical infrastructure disruption would very likely be taken as an act of war. And there’s no country in the world, other than perhaps Russia, who would be able to “win” a war with the US.

Of course, there are many supply chain risks due to nation-states that have nothing to do with cybersecurity. For example, if your organization gets in a dispute with a supplier in a country that doesn't have an independent legal system, you may find that your company won't be treated fairly in the courts. You definitely need to learn all you can about risks in foreign countries, whether on the list of foreign adversaries or not.

But it simply amazes me that people talk about supply chain cyber attacks by foreign adversaries as if they’re very likely to happen, when they’re almost impossible to carry out. Sure, we need to take steps to prevent such attacks, no matter how improbable they are. But what’s much more probable (although even then, not very probable) is that a couple bright teenagers in India would be able to cause a grid outage by exploiting a garden-variety vulnerability in firmware or software. This is a much greater risk. But of course, it’s not addressed at all in the EO.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] And other points as well. But I found these three points the most interesting. Not surprisingly, they all tie in with issues that have been discussed a lot in the industry lately, especially in relation to the May 1 Executive Order. 

[ii] Transformers sometimes have load tap changers (or dissolved gas analyzers), which are controlled by built-in microprocessors and might themselves be attacked to impact the BES, although it’s very hard to imagine how an attack on either one could result in anything more than a small local outage. But load tap changers (LTCs) are often manufactured by third parties, so any attack would really be on the LTC, not on the transformer itself. However, load tap changers aren’t even listed in the EO. One big manufacturer of LTCs is GE. Maybe LTCs from China should be banned, but not Chinese transformers themselves.

Monday, October 19, 2020

Still a long road to the cloud

 If you're looking for my pandemic posts, go here.

A little more than a month ago, I wrote a post about the fact that it’s currently highly “illegal” for NERC entities to implement BES Cyber Systems in the cloud, e.g. using outsourced SCADA. I described what would need to be done with the CIP standards to change this – essentially, rewrite them entirely as risk-based. I provided one example of how this could be done: replacing the patch management requirement (CIP-007 R2) with a requirement to address risks due to unpatched software vulnerabilities (patch management is the most important mitigation for software vulnerabilities, but it only helps once a patch has been released).

I was essentially saying in this post that all of the existing CIP requirements should be pulled apart to see which risks they address, and a risk-based requirement could be written to replace each one. And how would that point the way for NERC entities to place BES Cyber Systems in the cloud? Because once the requirements are all risk-based, there are many ways to prove compliance, rather than just the single way specified in the requirement. That opens the way for the NERC entity to point to a cloud provider’s FedRAMP certification (or perhaps another such as SOC 2) as evidence that the provider is taking appropriate steps to mitigate each of the risks addressed by the existing CIP requirements.

I discussed the idea of using the FedRAMP certification as compliance evidence in this post, but I closed with a hard question that still needs to be answered, before even cloud providers with FedRAMP certification should be considered safe:

Are there any serious cyber risks that apply to cloud providers, that aren’t addressed either by CIP or by FedRAMP? If so, doesn’t that mean there might need to be some new CIP requirements before the Good Housekeeping Seal of Approval is bestowed on the cloud providers, FedRAMP or no FedRAMP?

The answer to both of these questions is yes, of course. I discussed one of those risks – the risk that led to the Capital One breach in 2019 – in this post a few weeks ago. Now I’d like to point out one very big risk I identified in this post at the end of last year, based on a great article in the Wall Street Journal.

I just reread the article, and it’s quite impressive. It describes in great detail how the Cloud Hopper attacks discovered in 2016 were actually much larger than originally known. Essentially, a Chinese team (two members of which were indicted by the US in 2018, but of course remain at large in China) had penetrated more than a dozen cloud providers and compromised far more than the 14 companies named in the indictment. They had stolen lots of data from some very big companies.

Most importantly, they didn’t break into all of these companies from the internet, like Paige Thompson did with Capital One. They were able to hop from one company to another within a single cloud provider. As the article says, “Once they got in, they could freely and anonymously hop from client to client, and defied investigators’ attempts to kick them out for years.”

This is clearly not a problem with cloud customers not configuring their systems properly, as AWS had alleged about the Capital One breach; it seems the “walls” between cloud customers were like Swiss cheese, as far as the Chinese attackers were concerned. And once again, this is clearly a problem that isn’t addressed in FedRAMP, since so many cloud providers proved vulnerable to the attackers.

The bottom line is that allowing BES Cyber Systems to be placed safely in the cloud will require more thought than “just” rewriting the CIP standards or making it easy for a cloud provider to prove compliance based on their FedRAMP certification. It will require a careful examination of the real risks to be found in the cloud.

 

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Thursday, October 8, 2020

Will there ever be supply chain attacks on firmware?

Last Friday, I watched an excellent webinar on software supply chain security, which focused on aDolus – a really interesting company offering a potentially very useful service for supply chain security. The webinar started with an excellent presentation on supply chain security by Patrick Miller – who I believe needs no introduction to anybody in the North American electric utility cybersecurity/CIP compliance world. In his presentation, Patrick focused on both hardware and software supply chain security risks.

After Patrick spoke, there was an active chat session conducted on the webinar site. Somehow it moved to the subject of hardware vs. software supply chain risks (or maybe I moved it there myself; I can’t remember, and I don’t have access to the chat text now). I expressed the opinion that supply chain risks to software are much more pervasive than supply chain risks to hardware (I’ve hinted at this idea in several posts, but never devoted a post to it; I hope to do so in the not-too-distant future). I also told Patrick that, as far as I knew, there had never been a supply chain attack on firmware.

Was this last statement simply an intuition due to pure brilliance on my part? I’m afraid not. It was definitely due to brilliance, but not mine – rather, that of Matt Wyckhouse, the founder and CEO of Finite State, which is another very interesting company. I’ve known Matt since the beginning of this year, but what I said about firmware attacks came from a very good paper from Finite State entitled “Huawei Supply Chain Assessment”, and specifically the section titled “SUPPLY CHAIN SECURITY CHALLENGES”, which starts on page 12.

I will summarize the discussion I think is most relevant from this section, but I strongly recommend you read the whole section, because there is a lot more in there. What I said about firmware is based on the following chain of logic, which isn’t exactly in the order it’s presented in the paper:

·        A vulnerability is a flaw in software or firmware, and a backdoor is a vulnerability that is uniquely known to the attacker. Of course, at least 95% of backdoors (or so I would think) are intentionally inserted by the manufacturer to make it easier for them to troubleshoot the device later.

·        There are three kinds of supply chain cybersecurity attacks: hardware, firmware and software. A hardware supply chain attack requires physically altering the microcode of a microprocessor or a field programmable gate array, or adding another component onto a board that can enable access or data exfiltration. These attacks are fiendishly difficult to execute; moreover, “No software defenses can truly overcome a hardware backdoor, and they cannot be patched after detection.” 

·        Of course, we know there are software supply chain attacks. They usually involve insertion of a backdoor or a hidden user account. These happen regularly. I discussed two examples in this post and this one.

·        And then there are supply chain attacks on firmware. To understand why I told Patrick there’s never been a supply chain attack on firmware, read the section titled “Modern Electronics Supply Chains” on pages 12 and 13 of the report. That section describes the complex web of suppliers and integrators behind each component that goes into an electronics product – each of them contributing to the final firmware image, with no one supervising the overall process.

·        The section concludes by saying “In the end, that image could contain software written by thousands of engineers at dozens of companies across many different countries.” Of course, this inevitably results in lots of vulnerabilities (certainly many more than are found in most software products).

I need to confess that I told Patrick on Friday that there’s never been a supply chain cyberattack on firmware; that isn't accurate. It is accurate to say that it would be close to impossible, if a vulnerability were exploited in a cyberattack, for an investigator ever to conclude it was due to a supply chain attack (i.e. a backdoor) - rather than just due to somebody exploiting one of the numerous vulnerabilities found in your average firmware package.

And it’s those numerous vulnerabilities that point to another reason why it’s unlikely there will ever be a clear supply chain attack on firmware: With so many different vulnerabilities to exploit in firmware, why would your average Joe Hacker – or even your average Vladimir Nation-State – go to all the trouble of crafting and executing a supply chain attack? As Patrick pointed out in his presentation, supply chain attacks are awfully hard to execute and usually take a lot of resources; it’s better to go in the wide-open front door, not the back door with multiple locks, a security camera and guard dogs. After all, cyber-attackers need to pay attention to costs, just like the rest of us do. 

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Sunday, October 4, 2020

When will a ransomware attack impact the Bulk Electric System? 2018

If you're looking for my pandemic posts, go here.

You probably saw the news story last week about the massive ransomware attack on Universal Health Services, a large chain of 400 hospitals. About 250 of those hospitals lost partial or complete use of their computer and phone systems.  While the official announcement said that no patient data or services were disrupted, Bleeping Computer reported, based on examination of an online employee bulletin board, that there were at least four deaths due to lab results arriving too late to take actions required to save patients (since the results had to be delivered by hand, not electronically).

Moreover, I was in an online meeting with a number of healthcare cybersecurity people last Thursday when they started discussing this. They all agreed that four deaths is probably an underestimate of the total from this attack, given how many hospitals were involved and the many ways in which loss of computer or phone systems can lead to a death, even if it isn’t the immediate cause. For example, the patient who was turned away two weeks ago from a hospital in Germany – because the hospital had been crippled by a ransomware attack – didn’t directly die of the attack; she died of her underlying illness. However, her death could have been avoided had the hospital been able to receive her, since she died on the way to the next-nearest hospital. In fact, these people said that, statistically speaking, there must already have been a number of patient deaths due to cyberattacks on hospitals, such as the worldwide WannaCry attack of 2017, which had a devastating impact on the UK’s National Health Service yet was officially reported not to have led to any deaths. It was only because the German hospital directly attributed the death there to the attack that the unfortunate woman became the first person officially to die from a cyberattack anywhere in the world.

So it won’t surprise you if I say that ransomware is almost without a doubt the number one cyber threat worldwide, including in the US. But since this post is about the Bulk Electric System, let me ask you: have you ever heard of a ransomware attack affecting the BES? Like most of us in the industry – and like me until about a year ago – you would probably point out that utilities might have been compromised on the IT side of the house, but that it’s virtually impossible for a ransomware attack to affect the BES, at least at a Medium or High impact BES asset. After all, there’s no email within Electronic Security Perimeters, and it’s almost impossible for ransomware to spread into an ESP through an Electronic Access Point (which has lots of protections); moreover, since all interactive remote access needs to go through an Intermediate System, that wouldn’t be a likely attack vector either.[i]

But what would you say if I told you that a few years ago, there was a huge ransomware attack on a major electric utility that did in fact impact the BES, although it didn’t lead to an outage? And what would you think of me if I told you that the ransomware had a huge BES effect on two High impact Control Centers, yet the utility was correct when they asserted that the ransomware didn’t actually penetrate those Control Centers? Would you want to have me locked up (you might want to see that for other reasons, but please confine yourself to the example at hand)? Would you think I was describing something like quantum tunneling in physics, where just the laws of quantum mechanics allow a particle (or wave, same thing) to penetrate through a barrier – in fact, to be on both sides of the barrier at the same time?

No, I’m not (excessively) crazy when I say this. Just listen to my story:

In 2018, a major utility reported publicly that they had been the victim of a large malware attack (they didn’t use the term ransomware, but it wasn’t as fashionable then to be a ransomware victim as it is nowadays) that had affected a large number of systems on their IT network. However, they swore up and down that there had been no operational impact. They issued a statement saying "There is no impact on grid reliability or employee or public safety. The systems used to operate…(our)…transmission and distribution systems are on separate networks, and are not impacted by this issue.”

I read that and thought “Well, it’s certainly honorable that they went the extra mile and reported this incident, since they really didn’t have to. I’m glad the BES wasn’t impacted.” I imagine most of you thought the same thing.

However, just about a year ago a friend of mine told me an interesting story. At the time of this incident, he worked for a smaller utility that was in the control area of the utility that was attacked. Of course, his utility was always in contact with the large one, since they were constantly exchanging data, both electronically and verbally.

He said that on the day this attack happened, they were called by people in the main Control Center of that utility and told that all communications with them would need to be by cell phone for the time being, since all of the systems in the Control Center – as well as the backup Control Center – were down; moreover, the VOIP system was down as well - hence the cell phones. And indeed, it wasn’t until the next day that all systems seemed to finally be restored.

Fortunately, an event where a Control Center is totally down, or loses all connectivity, is something that utilities rehearse for all the time. There don’t seem to have been any serious operational issues caused by the fact that the Control Center operated by cellphone for 24 hours. So why do I call this a BES incident?

Because, as anybody familiar with NERC CIP compliance knows, the Guidance and Technical Basis section of CIP-002-5.1a (the version currently in effect) lists nine “BES Reliability Operating Services”, affectionately known as BROS. “Impact on the BES” - in the CIP-002 sense - means a loss or compromise of one of the BROS. If this impact is caused by a cyber incident, it needs to be reported to the E-ISAC (and probably to DoE on form OE-417) as a BES cyber incident.

One of the nine BROS is “monitoring and control”. Of course, this is what Control Centers do, and these CCs lost the ability to fulfill this BROS during the outage; ergo this was a BES cybersecurity incident. You might argue that the utility still had control of the BES, since they continued to be able to – and did – call their partner utilities to issue instructions. But they had definitely lost the capability to monitor the grid in real time in their control area. 

A year later, a renewables operator in the West reported to DoE on Form OE-417 that they had lost connection with their wind or solar farms for brief periods during one day, due to what appeared to be a random cyberattack on their Cisco™ routers – in other words, they briefly lost the ability to “monitor and control” their remote assets. Unlike the 2018 incident, this was reported to DoE (and also the E-ISAC), so it was made public. Note that in both cases there was a loss of the ability to "monitor and control", although in only one of those cases was this reported as the BES incident that it was.

At the time, most of us thought this was the first true cyber attack on the grid, yet it turns out that the first attack was really a year earlier. What was lost in 2018 was real time monitoring of the grid within a multi-state area, not just monitoring of some wind or solar farms as in 2019 (also, wind and solar farms are usually quite happy to operate completely on their own, and being limited to phone communications with the control center wouldn't usually cause a problem). 

You might wonder why, given that there was no grid event in the case of either attack, either of them should have been reported. The problem is that, had the right event come along in 2018 (e.g. some disturbance that cut off two important transmission lines), it might have overwhelmed the control center staff’s ability to control it – or even understand it – simply through talking on cell phones. Fortunately, that didn’t happen.

Why do I say the 2018 BES incident was caused by ransomware, when the ransomware probably never even touched the Control Center networks? Here’s what my friend told me about the incident:

1.     A user on the utility’s IT network clicked on a phishing email, and ransomware quickly spread throughout the IT network. The IT department decided there was no alternative but to wipe over 10,000 systems on that network, then re-image them and restore key systems from backups. They also had to require all of the thousands of employees in the entire company to log off their corporate (IT) computer accounts for 24 hours during the restore process. Furthermore, they had to deploy malware scans of thousands of end user computers for employees and contractors during the 24 hours of down time. Of course, this was a huge, expensive operation.

2.     The primary and backup Control Centers didn’t appear to have been affected by the ransomware, but here’s the problem: IT realized that, if the ransomware had spread to just one system in the Control Center, that system alone might end up reinfecting the whole organization again – both the IT and the OT (ESP) networks - once the IT network came up again.

3.      So IT decided they had to wipe all of the Control Center systems and restore them as well, even though there was no indication that any of them had been compromised; not doing so was too big of a risk to take.

4.      They also decided they had to do this to the systems in the backup Control Center as well, since it’s likely that any ransomware in the primary CC would have been quickly replicated to the backup CC. Again, it would simply be taking too big a risk if they didn’t do this.

And that, Dear Reader, is why the Control Center staff had to run the grid by cell phone for many hours that day. Their employer was technically correct in the first sentence of their statement: “There is no impact on grid reliability or employee or public safety.” However, the second sentence - “The systems used to operate…(our)…transmission and distribution systems are on separate networks, and are not impacted by this issue.” – is definitely not true.

Unless, of course, you think that being shut down for 12-24 hours is the same as not being “impacted”. The fact is, this utility is damn lucky there wasn’t a big outage due to this incident. And being lucky isn’t one of the BES Reliability Operating Services, at least the last time I checked.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] There is still the possibility that ransomware could spread into the ESP by means of a computer that was conducting machine-to-machine remote access into the ESP, since until three days ago that was specifically exempted from Requirement Parts CIP-005-5 R2.1 – R2.3. However, on October 1 CIP-005-6 R2.4 and R2.5 went into effect, which offer at least some protection against compromise through machine-to-machine remote access.

Friday, October 2, 2020

Report says 92% of US CISOs surveyed experienced a supply chain attack in the last year

My longtime friend Jim Ball, CISO of the Western Area Power Administration, sent me an interesting article from SC Magazine this week. It described a report by the cybersecurity services company BlueVoyant. The report was based on a survey of CISOs in five countries, although this version of the report discussed mainly the US results. The CISOs were in a range of industries, including “utilities and energy”.

The article had some pretty striking findings, so being skeptical of them, I downloaded the report itself. I found it was based on a survey of over 300 CISOs in all industries, and it seems to be credible. Among the most interesting findings were (quoting from the article):

·        “92 percent of U.S. organizations suffered a breach in the past 12 months as a result of weakness in their supply chain.” The report didn’t say exactly what form these supply chain breaches took, which was disappointing but doesn’t indicate the report shouldn’t be believed.

·        “When four other countries (the U.K., Singapore, Switzerland and Mexico) are included in the research, 80 percent of the more than 1,500 CIOs, CISOs and CPOs suffered a third-party-related breach in the past 12 months.” So it seems the US isn’t alone in having this problem.

·        “ ‘Time and again, as organizations investigate the sources and causes of malicious cyber attacks on their infrastructures, they discover that more often than not, the attack vector is within the infrastructure owned by third-party partners,’ said Debora Plunkett, who sits on the BlueVoyant board of directors and was formerly the NSA’s director of information assurance.”

·        “A third of the survey respondents said they had no way of knowing if a risk emerged in a third-party’s operations, while only 31 percent said they monitor all vendors, and only 19 percent monitor just critical vendors. (According to the report, U.S. organizations use an average of 1,420 vendors.)”

·        Only 42% of respondents said they work with their vendors to fix problems they have identified, which is interesting. If you know an important vendor has security problems, why wouldn’t you at least follow up with them to see how they are coming on fixing them? That’s the best way to make sure they do something about the problems. Simply requesting that the vendor do something – whether in contract language or just by a phone call – doesn’t in itself mitigate any risk. The vendor has to do what they said they’d do. Maybe you won’t be able to get them to keep their promise, but you need to try (and BTW, not following up could result in a CIP-013 violation).

·        On the other hand, 86% of respondents said their budgets for third party risk management were increasing. Which is good, of course.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.