Sunday, May 31, 2020

Joe Weiss solves the mystery!



Note from Tom: If you’re only looking for today’s pandemic post, please go to my new blog. If you’re looking for my cyber/NERC CIP post, you’ve come to the right place.


The Wall Street Journal, E&E News and this blog have all written about the fact that a large transformer, custom-made in China for the Western Area Power Administration, was transferred from the port of Houston – where it arrived last summer – to Sandia National Laboratory (owned by the Department of Energy) in Albuquerque, New Mexico; it evidently remains there as of today, presumably being examined for…what? What could be planted in a transformer that would pose a threat to the Bulk Power System? And was it something found during this examination that led to the May 1 Executive Order? None of us knows the answers to those questions.

However, it seems that longtime control system security guru Joe Weiss knew about the transformer being diverted more than two weeks before the WSJ article, judging by the date on this post by him. Not only that, he says he knows exactly what was found by the examiners – and he is quite definite that a serious problem was found. There’s only one problem with what he says: it doesn’t make sense.

I’ll let you read Joe’s post (and he certainly makes some good general points in it; I’m not disputing those). Here are the problems I’ve found with it:

1.      In the fourth paragraph, he says “Government and public utility procurement rules often push organizations into buying equipment due to price and without regard to origin or risk. In this case, it resulted in a utility having to procure a very large bulk transmission transformer from China.” I pointed out recently that utilities definitely don’t procure sensitive grid equipment based just on cost. But in this case, there are two additional problems with Joe’s statement.
2.      The first of these is that the utility, the Western Area Power Administration (WAPA), isn’t strictly speaking a utility at all. It’s one of four Power Marketing Administrations owned and run by the Department of Energy; WAPA’s job is to distribute power from federal dams to cooperative and municipal utilities in the West. And I can assure you that WAPA wouldn’t think twice about what to do if someone told them that the transformer they were about to buy could be purchased for less somewhere else, but with perhaps a lesser degree of security. Of course, Joe didn’t seem to know when he wrote the post that the “utility” was WAPA, but the same can be said for any other utility. It’s too bad to see this old canard still alive.
3.      Joe continues to say “When the Chinese transformer was delivered to a US utility, the site acceptance testing identified electronics that should NOT have been part of the transformer – hardware backdoors.” First off, the WSJ article makes clear that the transformer was never even delivered to WAPA – it went right from the port of Houston to Sandia National Labs. But this in itself doesn’t invalidate Joe’s point that a “hardware backdoor” was discovered, since he may not have known this.
4.      But I’ve never heard of a “hardware backdoor”. I have only heard of software backdoors; these are a big supply chain risk, as various entities like Juniper and Delta Airlines have found out to their chagrin. Since I’m sure Joe doesn’t mean a literal back door in the housing of the transformer, he must mean firmware (i.e. software that is embedded in chips, not read from a storage device like a hard drive) that controls a microprocessor performing some function within the transformer. But as Kevin Perry and I have pointed out in this post and this one, there is no microprocessor[i] that controls the transformer in any way; at most, there’s usually one that reports operational data out to the control center. So Kevin’s and my question from yesterday remains: Where is the microprocessor that’s going to be affected by this “backdoor”?
5.      Joe goes on to say, in the same paragraph “It is unclear just how widespread the impact of compromised transformers and other grid equipment are (sic) though it is safe to say it is more than just one transformer. Could this be considered an act of war?” Sure it could, if this “hardware backdoor” were found in multiple transformers. But first I want to know what this miraculous hardware backdoor is, which seems to be able to cripple a transformer without having a microprocessor to run on.
6.      The next paragraph begins “The need for having spare transformers started almost 20 years ago because it was recognized these very expensive, long-term procurement items could have a major impact on grid availability. However, unless the devices that are inside or supporting the operation of the transformers (and generators, motors, valves, capacitor banks, etc.) are also addressed, the pool of spare transformers and other large equipment can be quickly exhausted by damaging the equipment from “within”.”
7.      Wow! This one is the mother of all FUD. Let’s try to unpack it. Joe talks about “devices that are inside or supporting the operation of the transformers”. Then he lists four “devices”; none of them are either found inside a transformer or support it. He’s correct that all of these devices have something to do with electricity, but that’s about all they have in common with a transformer. And his phrase “the pool of spare transformers and other large equipment can be quickly exhausted by damaging the equipment from ‘within’” seems to say that spare transformers – which of course won’t be connected to the grid at all – will be “exhausted” because of some unnamed attacks (perhaps the “hardware backdoor” attacks?). Or something like that. But who cares what this means? It sure sounds serious!

Finally, Joe brings up the Aurora vulnerability, which was used in a demonstration by Idaho National Labs in 2007 to cause a generator to literally blow itself to pieces. In fact, what could be considered the summation of his whole argument is printed in boldface type: "What the Chinese did was install hardware backdoors that can cause an Aurora or other type of damaging event at a time of their choosing." However, the Aurora vulnerability affects rotating equipment like generators. It couldn’t affect a large transformer at all, since there are no moving parts in a transformer[ii], rotating or not.

Joe is obviously aware of this objection, since he goes on to say “Remotely accessing the protective relays can cause an Aurora event damaging the transformer and AC rotating equipment such as generators and motors connected to that substation. What the Chinese did was install hardware backdoors that can cause an Aurora or other type of damaging event at a time of their choosing.” So it seems the “hardware backdoor” – embedded in firmware that controls the non-existent processor that “controls” the transformer, even though the latter is controlled by nothing other than the laws of physics – is somehow able to damage not only the transformer, but generators and motors “connected” to the substation. Yet nobody has even suggested before that Aurora could damage anything more than the generator it directly attacks. I sure don’t understand this, but that obviously means this is a super-serious problem! Maybe we should call in the air force...

As Kevin pointed out in an email, “…the Aurora test is designed to destroy large rotating machines, such as generators, by connecting them to the grid out of phase.  120 degrees out of phase produces maximum damage.  No such vulnerability exists with breakers, transformers, and the like.  I have never seen a phase synchronization process for closing a breaker and energizing a transformer.”

But here’s another reason why it’s not believable that the people at Sandia found something really amiss with the transformer: there would surely have been some sort of notice to the industry, since presumably this whatever-it-is would be found in other Chinese transformers as well. If it’s such a big threat to the grid, you don’t want to hide the news. Of course, since the details would undoubtedly be classified, the authorities wouldn’t publish them in the newspaper; but they would set up classified briefings, etc. And the notifications of those briefings would go to the entire utility community. Neither Kevin nor I have heard anything about this.

And here’s yet another reason: DoE held a couple briefings for the industry after the EO came out. In those briefings, they bent over backwards to assure the listeners that nothing needs to be done now, other than what they’ve always been doing. This hardly sounds like the EO was issued in response to some grave danger.

So definitely take everything that Joe says in his post with a grain of salt. Unless, like me, you’re on a low-sodium diet. Then skip the salt.

Tom 5/31: Orlando Stevenson of NERC pointed out in a comment on Friday's post that tap changers have their own microprocessor-based controllers. If one of those were compromised and the tap changer itself malfunctioned, there could be a BES impact, although this would probably have to occur in multiple substations simultaneously. Kevin agrees with that, although he points out that the controller is always external to the transformer itself, and sometimes it resides in the substation control house (and it is sometimes made by a different manufacturer than the transformer's manufacturer; for example, GE makes a tap changer controller that works with multiple manufacturers' transformers, not just its own). And BTW, Kevin - ever the auditor! - adds that these tap changer controllers should be identified as BES Cyber Assets, since they could have a 15-minute BES impact.

So this means there might be a way for the Chinese to affect the BES through a transformer, by planting malware in the external tap changer controller (and remember, they'd have to do this in multiple transformers in multiple substations, in order to have a BES impact). But now I have to go back to the question I asked in yesterday's post: Why on earth would the Chinese want to do this, since it would likely be interpreted as an act of war?


Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Are you working on your CIP-013 plan and you would like some help on it? Or would you like me to review what you’ve written so far and let you know what could be improved? Just drop me an email!


[i] Dick Brooks pointed out to me that Field Programmable Gate Array (FPGA) chips can execute commands, so a microprocessor isn’t necessarily required. But it still comes down to the fact that the transformer doesn’t operate based on controls; it operates according to the laws of physics. The only thing that Kevin and I can think of, that could impact the operation of the transformer, is if the microprocessor/FPGA activated a bomb to blow the transformer up.

[ii] Other than a tap changer. But these aren’t found in large transformers like the type in question.

Saturday, May 30, 2020

How could someone remotely attack a transformer?


Note from Tom: If you’re only looking for today’s pandemic post, please go to my new blog. If you’re looking for my cyber/NERC CIP post, you’ve come to the right place.


After I wrote yesterday’s post, Kevin Perry provided some important information to me about how a large transformer could or couldn’t be attacked remotely. As I pointed out in the post, the transformer isn’t controlled by a microprocessor but acts on its own, guided by the laws of physics. However, there is at least one processor included with the transformer, and that’s the one that gathers data from the sensors monitoring the transformer’s operations (including dissolved gases, temperature and the crucial oil level) and transfers it back to the control center.

Kevin pointed out to me that the processor that does this monitoring work isn’t normally part of the transformer but is external to it, in a separate device. Sometimes this is a fairly intelligent device that can communicate back to the control center or substation engineering, where the data are analyzed. These devices usually aren't made by the transformer manufacturer; in fact, the device he’s seen most often in his audits is made by GE, even when the transformer itself isn't one of GE's.

There are also less-intelligent devices that gather sensor data from the transformer and send it to the RTU (Remote Terminal Unit) in the substation. The RTU then forwards the data to the SCADA/EMS system in the control center, where it is analyzed - and an alarm is generated if something is found to be suspicious. In either case, the analysis of the data is looking for indications that the transformer is operating outside its normal bounds, such as by having a low oil level or overheating (which of course could be caused by a low oil level).
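To make that concrete, here is a rough sketch of the kind of limit-checking such an analysis might perform. This is purely illustrative – the field names and limit values below are hypothetical, not taken from any actual SCADA/EMS product:

```python
# Illustrative sketch of threshold-based alarming on transformer telemetry.
# All field names and limit values are hypothetical examples.

NORMAL_LIMITS = {
    "top_oil_temp_c": (0, 95),      # top-oil temperature band, degrees C
    "oil_level_pct": (80, 100),     # oil level as a percent of full
    "dissolved_gas_ppm": (0, 700),  # combustible gas concentration
}

def check_readings(readings):
    """Return a list of alarm strings for any reading outside its band."""
    alarms = []
    for name, (low, high) in NORMAL_LIMITS.items():
        value = readings.get(name)
        if value is None:
            alarms.append(f"{name}: no data (possible comms loss)")
        elif not (low <= value <= high):
            alarms.append(f"{name}: {value} outside [{low}, {high}]")
    return alarms
```

The point of the sketch is that this kind of analysis simply compares reported values against expected bounds. If the device doing the reporting has been compromised and lies about the values, nothing downstream will catch it.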

Kevin believes the processor in this external device is the only processor associated with the transformer, either inside or outside the transformer’s casing itself (some transformers have a tap changer, which is controlled by a microprocessor, but those are smaller models, not the type we're talking about here). Obviously, if this device is made by a non-Chinese company like GE, it’s very hard to see how the Chinese could embed malicious code into it. It would only be if the device were made by the same company that made the transformer that there would be an opportunity to do this.

Yet since this device doesn’t control the transformer itself (as I’ve said, the laws of physics do that), what damage could be done, even if it did have malicious code embedded in it? The only thing that Kevin can think of is that the device would somehow manipulate the data gathered from the sensors to present a false picture of the transformer’s health to the control center.

But what would the attack do to the data? Would it make it look like the transformer is in trouble? In that case the control center would just dispatch someone to find out what’s wrong. They’d see the transformer is working fine and they’d instruct the control center not to trust the data until the problem is found.

The only way for the monitoring data to actually cause a problem would be if it were changed in such a way as to make the control center believe there's no problem, when in fact the transformer is having a problem. That way, the control center wouldn’t send someone to check a problem out, since they wouldn’t know about it. And if the problem were due to the transformer overheating, the transformer might fry itself before the control center knew anything was wrong. But the problem with this scenario is it requires someone physically damaging the transformer itself, since as I’ve said there’s no way to attack it by purely cyber means.

There is a good analogy to this situation: the Metcalf attack in 2013. In that attack, someone fired high-powered rifles at the transformers in the Metcalf substation that serves Silicon Valley. Their goal was to drain the oil out of the transformers, so they would be in danger of overheating. Before the attack started, the attackers cut the communications cables that let the control center "see" what was going on with the transformers. They did this so that the control center wouldn't remotely shut down the transformers before they'd fried themselves.

The attackers succeeded in draining the oil out of most of the transformers, but when the control center realized communications with the Metcalf substation were lost, they dispatched people to find out why. They arrived quickly and saw the drained oil – then shut the transformers (and the whole substation) down, preventing the disastrous outcome of the transformers frying. That would have been hugely expensive - much more than the actual $30MM total cost of the attack - but most importantly would have resulted in the substation being out of commission for many months or even a year, because of the long lead time for getting new large transformers. Of course, this is because these transformers are always custom built, and no large transformers are currently made in the US. They're made either in Europe or - dare I say it - China.

Metcalf was a “successful” attack, since it shut down the substation for months and cost PG&E a lot of money to fix. But it never caused even a local outage, let alone a BES incident. And most importantly for our story, it required someone to be onsite shooting at the transformers. There was no way this attack could have been executed purely remotely.

The only way that Kevin and I can currently conceive of a purely remote attack on a transformer would be if a microprocessor with a bomb were attached inside the transformer housing, coupled with a satellite or cell phone transceiver. That way, a signal could be sent by satellite or through the cellular network, and the transformer would blow up. But as I said in my post yesterday, an attack on the BES would require having a number of rigged transformers already deployed on the grid (meaning the Chinese company would have to have been installing the bombs in transformers going to the US for at least a few years), and sending the signal to at least a few of them. One lost transformer probably won't result in any outage at all, or at the most a short local one.

If this happened, it would be immediately recognized as China's responsibility (since all the transformers that blow up would be Chinese) and would be taken as an act of war; this would inevitably go badly for China. This might make sense if we were in an active war with China now. But remember, they would have to have been installing these bombs in transformers for at least a few years. If a single one of those bombs had been discovered, that itself would probably have been considered an act of war. It's very hard to see why China would ever even consider doing this.

Bottom line: Kevin and I don’t see a way to cause a BES incident through a purely cyber attack on large transformers, like the one seized last year by the Feds.

Tom 5/31: Orlando Stevenson of NERC pointed out in a comment on Friday's post that tap changers have their own microprocessor-based controllers. If that were to be compromised and the tap changer itself malfunctioned, there could be a BES impact, although this would probably have to occur in multiple substations simultaneously. Kevin agrees with that, although again I see the second-to-last paragraph of this post as being the overriding one: Why on earth would the Chinese want to do this, since it would likely be interpreted as an act of war?


Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Are you working on your CIP-013 plan and you would like some help on it? Or would you like me to review what you’ve written so far and let you know what could be improved? Just drop me an email!



Friday, May 29, 2020

The plot thickens


Note from Tom: If you’re only looking for today’s pandemic post, please go to my new blog. If you’re looking for my cyber/NERC CIP post, you’ve come to the right place.


On Wednesday evening, Rebecca Smith of the Wall Street Journal published a great article on what has to be one of the stranger events in the ongoing story of efforts to improve the cybersecurity of the US electric power grid: last summer, federal officials took control of a very large transformer that had arrived at the Port of Houston. It was custom built by the Jiangsu Huapeng Transformer Company in China for the Western Area Power Administration (WAPA) and was intended for installation in a WAPA-owned substation outside of Denver. It was taken to Sandia National Laboratory (part of the Department of Energy) outside of Albuquerque, NM and closely examined there. It most likely is still there.

Why did the authorities do this? Of course, nobody at Sandia or DoE would comment. The article does say “Other people, with more limited knowledge of the situation, said federal officials probably commandeered the transformer because they suspected its electronics had been secretly given malicious capabilities, possibly allowing a distant adversary to monitor or even disable it on command. But these people said they didn’t know whether any such alterations were found.” But exactly how would “malicious capabilities” – presumably malicious cyber capabilities – be embedded in a transformer?

Three weeks ago, in a post written with Kevin Perry, I wrote

…transformers are extremely important to the grid, since the grid wouldn’t work without them – in fact, news articles I’ve seen consider these to be a big target of the order. Yet these don’t have microprocessors. They don’t need direction in order to do their job, either; the laws of physics give them almost all the instructions they need. My friend Kevin Perry wrote “There may be some new, smart transformers that have microprocessors, but as a general rule, I don’t think the high voltage transformer has electronic systems that can be hacked.  At best, there are sensors throughout the transformer that allow operating conditions to be monitored.  That is not much different than the transducers scattered around a generating plant.  To the extent the transducer voltage output can be recalibrated to produce false readings is about the only issue I am aware of.  But usually you need to be in close proximity to be able to manage such a device.”

Rebecca says something similar in her article:

Federal officials have long worried that foreign adversaries might hack into the utility computer networks that control power flows on transmission lines and cause blackouts.

However, transformers hadn’t typically been seen as products that could be easily isolated and hacked. That is because they don’t contain the software-based control systems that foreign actors could access. They are passive devices that increase or reduce voltages in switchyards, substations and on power poles according to the laws of physics.

So it seems the problem that the government was looking for wasn’t a cyber problem at all, but a physical one. Of course, there are certainly lots of ways a transformer might be rigged to suddenly start malfunctioning at some point, or even to blow up. If a transformer like this failed, it would certainly cause problems for the power distribution system in the area served by that transformer; there might even be an outage as a result of it.

But local outages happen all the time. If you’re looking to greatly reduce the threat of local outages, I recommend you focus on the number one cause of those outages: squirrels (although Kevin pointed out to me that in larger substations, snakes are a bigger problem). If a genetic modification were introduced into the squirrel population so they no longer see insulated wires as a possible food source, that would be a huge step forward in the fight against local outages. But local outages that are caused by substation events are rare, since the grid has tremendous redundancy built in. Even if all the transformers in a substation are brought down (as was the case with the Metcalf attack in California in 2013), that usually wouldn’t lead to an actual outage (for example, there was no outage due to the Metcalf attack).

However, when we talk about attacks on the power grid, we’re not talking about a local outage, but some sort of event – the worst being a cascading outage like the 2003 Northeast Blackout - on the Bulk Electric System (or Bulk Power System, the term used in the Executive Order) itself. This is the network of high voltage power lines and substations that moves power around the country and feeds it into local distribution substations. A true BES attack would have to affect multiple transformers in multiple substations at the same time.

In principle, a BES event could be accomplished by a cyber attack. If a number of these transformers were microprocessor-controlled and connected to a routable (IP) network, and if all of these had some sort of malicious logic embedded in software or firmware, a foreign attacker could in theory send a signal and cause a number of these to go down or malfunction at the same time, which might cause a large-scale grid event. However, transformers aren’t microprocessor-controlled, so this isn’t a realistic scenario.

The WSJ article pointed out that the transformer has sensors to monitor the level of insulating oil (and presumably some sort of microprocessor and communications link to relay that information to the control center), since transformers generate a lot of heat, which needs to be carried away to keep the unit from frying. In theory, the sensors could be recalibrated to report false readings, but as Kevin pointed out in the last three sentences of his quote above, that would almost certainly require someone being onsite to recalibrate the sensors.

Note from Tom 5/29: Kevin provided some good clarifying information to me this afternoon, which I'll pass on in a post tomorrow. It doesn't change any of the conclusions of this post, but it does fill in a few logical gaps.


There are also some smaller transformers that have tap changers, which are controlled by a microprocessor. It's unclear what harm could be caused by misusing a tap changer, but in any case, we don't believe one would be found in a large transformer like the one in question.

Since there’s nothing in the transformer normally that could be the vehicle for a successful cyberattack, it seems the people at Sandia must be looking for some logical device that was implanted in the system in China. What would this device do? It might trigger a bomb to blow the transformer up. It’s hard to see anything else that it could do, since the transformer operates according to the laws of physics – it doesn’t need any sort of commands to operate.

But remember, just having one transformer blow up isn’t going to cause a BES event, and it probably won’t cause even a local outage. There would have to be a coordinated attack on multiple transformers. The article says that Jiangsu Huapeng has installed about 100 of these units in the last decade in the US and Canada. If a significant number of these had some sort of microprocessor attached to a bomb inside of them, some of them might blow up at the same time – which would probably cause a big problem for the grid. But for that to happen, they would all have to receive some signal telling them it’s time to blow up. Since this implanted processor is unlikely to have an external communications port (which would be immediately noticed), it would need some sort of satellite receiver embedded in it.

So yes, if Jiangsu Huapeng has been implanting these devices in their transformers for some time, and if nobody has ever noticed them before this, then a BES attack might be possible. But here’s the bigger question: What possible benefit would the Chinese reap from conducting such an attack? If it happened, it would be immediately traced back to China and would be rightly treated by the US as an act of war.  And given the relative sizes of the two countries’ militaries, I’d say China’s guaranteed to come out on the worse end of the deal.

Of course, the real question is whether the folks at Sandia have found anything wrong in the transformer. If they had, US utilities – especially the ones who have purchased these transformers – would presumably have been notified immediately. And that would explain why the Executive Order was released very precipitously about a month ago. The industry would have been placed on high alert.

Did that happen? E&E News published a very good article today that quotes a number of industry executives saying they were left completely in the dark about the EO until it was published. And the Department of Energy made clear in a couple of calls with the industry recently that the EO doesn’t require utilities to do anything different now than what they were doing before the EO was issued - this despite what seems to be clear language in the EO saying that all procurements of Bulk Power System equipment need to be cleared with DoE as of the day of the order. So this is another good indication that no problems have been found so far with the transformer at Sandia.

Ironically, while the government is making a big effort to find a problem which seems unlikely to exist, there’s a serious foreign cybersecurity threat that the government itself has warned about multiple times (as described in this post, this one and this one, and in this WSJ article from 2018), which still has never even been investigated. Why don’t we investigate that one, too? Either we’ll find something, or the industry can sleep a lot more soundly, knowing that all of these reports were wrong.


Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Are you working on your CIP-013 plan and you would like some help on it? Or would you like me to review what you’ve written so far and let you know what could be improved? Just drop me an email!



Thursday, May 28, 2020

A different perspective on cloud risks


Note from Tom June 3: I received an email from an AWS spokesman yesterday that disagreed with the statements in this post. Over the next few days, I'll work with AWS and Dick Brooks to figure out whether any of the statements below are wrong. I'll put out a post when this is done, hopefully next week. 


I’ve been carrying on a number of good email conversations with Dick Brooks of Reliable Energy Analytics, who has been dealing with software vulnerabilities and malware since the mid-80’s. In one of them he mentioned some things he’d discovered when he moved his product to the AWS cloud platform. In his words:

Amazon’s PAAS offering is called EC2. This is where you’re given a choice of platforms to choose, which form the basis of your operating environment. In my case I chose the most up to date offering, which is a Linux platform running Python 3.6. NOTE: this version of Python is 2 releases behind the current offering, 3.8. I could upgrade to the latest version of Python after selecting the platform, but that requires me to perform the upgrade.

The default web server that automatically starts on the platform has no protection. I received a deluge of web vulnerability attempts from lots of different IP addresses, which I had to guard against.

I don’t know about other cloud offerings, e.g. SalesForce which offers SAAS solutions, but I can say without a doubt that I had to take security and upgrades into my own hands on Amazon’s EC2 PAAS offering.

Of course, the fact that Amazon’s customers need to take security into their own hands isn’t a surprise to most of us. I discussed that in my posts prompted by the Capital One breach last year, including this one. But it’s certainly worrisome that they don’t even bother to put minimal protections on the default web server that comes with their PaaS solution, or to run the current version of Python (which would presumably have the most recent security updates) in that solution.
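If you do deploy on a platform image like this, one cheap safeguard is to have the application check the runtime version at startup and complain loudly (or refuse to start) if it's older than what you've tested and patched against. A minimal sketch – the minimum version here is just an example, not a recommendation:

```python
import sys

# Hypothetical minimum version the application has been tested and patched
# against; adjust to whatever you actually support.
MIN_PYTHON = (3, 8)

def runtime_ok(min_version=MIN_PYTHON, current=None):
    """Return True if the running interpreter meets the minimum version."""
    if current is None:
        current = sys.version_info[:2]
    # Tuples compare element by element, so (3, 6) < (3, 8) < (3, 10).
    return tuple(current) >= tuple(min_version)

if not runtime_ok():
    # In a real deployment you might exit here instead of just warning.
    print(f"WARNING: Python {sys.version_info[:2]} is older than {MIN_PYTHON}; "
          "security fixes may be missing.", file=sys.stderr)
```

This doesn't fix anything by itself, of course, but it turns a silent platform-level omission into something visible in your own logs.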

The problem is that questions like this are probably not asked in FedRAMP, yet these are omissions that could come back to bite the Amazon customer. Where is that customer going to learn about problems like this? Note that this isn’t a question of whether the customer will use Amazon or not – it’s a question of how they can learn about the problems beforehand, rather than after they’ve been hacked.


Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Are you working on your CIP-013 plan and you would like some help on it? Or would you like me to review what you’ve written so far and let you know what could be improved? Just drop me an email!



Wednesday, May 27, 2020

The NATF Questionnaire: Deciding which questions to use (part II)



Note from Tom: If you’re only looking for today’s pandemic post, please go to my new blog. If you’re looking for my cyber/NERC CIP post, you’ve come to the right place.


This is the second part of last week’s post on NATF’s recently released set of questions to vendors, which is designed for electric utilities – especially those that have to comply with the upcoming NERC CIP-013-1 Reliability Standard. That post discussed the fact that a large number of NATF’s questions (more than half) don’t, in my opinion, address significant risks to operations – and CIP-013 is all about operational risks, namely risks to the Bulk Electric System.

At the beginning of that post, I listed three reasons why I try not to include questions that don’t address significant BES risks in my questionnaire (which I’ve developed with input from my CIP-013 clients). The first was simply that it wastes a lot of time – both the vendor’s time, since most of these questions require research to develop an answer (sometimes significant research), and the NERC entity’s time, since each answer needs to be evaluated (and if you’re not going to evaluate the answer, why did you ask the question in the first place?). I addressed this reason in part I.

But while wasting time is certainly undesirable, this is the least important of my three reasons. The other two are:

  1. There are important risks that I and my clients have identified that aren’t in NATF’s list, although we believe they’re important to address with questions. More generally, every NERC entity faces its own set of risks, and they shouldn’t feel they can’t ask any more than what’s in the NATF questionnaire.
  2. As I pointed out in this post, asking unnecessary questions increases compliance risk for CIP-013. So asking questions about risks that don’t impact the BES can literally lead you into non-compliance, as I’ll discuss in part III of this post.

Let’s look at the first of these two reasons. After going through all of the NATF questions, I found about 25 of my questions that aren’t addressed at all in NATF’s questionnaire. These include:

a.      Does your product require authentication of firmware updates? The fact that the RTUs in the substations attacked in Ukraine in 2015 didn’t require authentication for firmware updates allowed the Russians to brick them.
b.      Do you require separate authentication for access to your software development network and/or hardware manufacturing network?
c.      Will you inform us within 5 days of any new vulnerability discovered in any third-party or open source component of your software or firmware, whether patched or not?
d.      Does your security policy prohibit the use of binary or machine executable code for which you are unable to verify the integrity of the software?
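To make question a concrete: at minimum, an update mechanism should refuse any image that doesn’t verify against vendor-supplied authentication data. Here’s a hedged sketch of that gate – a real implementation would verify a cryptographic signature against the vendor’s public key, but a bare digest check is shown just to illustrate the check the Ukraine RTUs lacked:

```python
import hashlib

def firmware_is_authentic(image_bytes: bytes, expected_sha256_hex: str) -> bool:
    """Reject any firmware image whose digest doesn't match the vendor's
    published value. (This is integrity checking only; real authentication
    would verify a signature, not just a hash.)"""
    return hashlib.sha256(image_bytes).hexdigest() == expected_sha256_hex
```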

Of course, nobody has to agree with me that these questions address significant risks to the BES. But if you do, wouldn’t you want to ask them, instead of questions that you may not believe pose a significant BES risk?

You might ask “What’s to stop me from asking these questions, along with all the NATF questions?” Of course, there’s nothing to stop you, and in fact two of the vendors listed in the webinar as being on board with the NATF questionnaire (SEL and OSI) told me recently (in the posts just linked) that they’ll be glad to answer any questions provided to them. On the other hand, I’m sure they’d both like you to look through their answers to the NATF questionnaire (which I’m sure they’ll make available to customers, although perhaps not the general public) first, to see if some of your questions have already been answered there.

However, I know at least a few vendors have said at various times – and this was said during the NATF webinar, although not by a vendor – that they just want to have a single questionnaire that the whole industry will use, and they’d prefer not to answer any questions not on that questionnaire. This raises the possibility that some vendors will simply refuse to answer any questionnaires that include questions not in the NATF questionnaire.

Here’s my opinion on this issue: this is a free country (at least it was as of this afternoon at 5:06 PM Central Time). If a vendor doesn’t want to answer one of your CIP-013 questions – even though you think it addresses an important BES risk – that’s their prerogative. However, CIP-013 still requires you to assess the vendor on this risk; if they won’t cooperate, then you should probably assume (unless you have good reason not to, of course) that they pose a high level of risk in the areas those questions address.

This means you should take steps to mitigate these risks on your own (as NERC asserted in their CIP-013 FAQ). The strongest mitigation is to stop buying from the vendor altogether, although that’s often impossible. Failing that, you should implement a mitigation that you can control entirely yourself. To use the example of a product that allows unauthenticated firmware updates (the basis for the first of the four questions I listed above), you could restrict physical and electronic access to the facility where the product will be located.

And this happens to be one mitigation that you almost certainly have already implemented, since any Medium or High impact BES asset is subject to compliance with CIP-005 and CIP-006, and presumably has very good controls in place for physical and electronic access. You just need to document that fact, and IMHO you shouldn’t feel obligated to implement any further controls. But you won’t always be this lucky.






Tuesday, May 26, 2020

Interesting comment on yesterday’s post





This morning, I got a very interesting comment on my post from yesterday, in which I took back – based on comments from Kevin Perry – my assertion in a post last week that any NERC entity who decided to challenge an assessed NERC CIP violation in the administrative law courts would very likely win. I said last week that this is because of the many cases in the CIP standards in which key words or requirements were left out – but are nonetheless required for compliance, as in the case of the missing word “mitigate” in CIP-013-1 R1.1.

I concluded with this paragraph:

I agree that Kevin’s right. However, I’m not backing away from the last sentence of the post: “This means that sooner or later, the NERC community is going to realize that the standards need to be rewritten from the bottom up, as I discussed in this webinar last year.” My case for saying the CIP standards need to be rewritten doesn’t rest at all on legal grounds, and I shouldn’t have implied that it does.

The comment came from a senior cybersecurity officer at a large organization. I have known him for a number of years and have great respect for his opinions. He appeared to be cueing off that last paragraph, when he said “A very interesting short post.  I think the larger point is the fact that the RE’s (Regional Entities) are so unsure of the CMEP program that they have to resort to these tactics to protect it.  That in and of itself argues for a reconsideration of the standards.”

What does this person mean when he says, “these tactics”? I think he means Kevin’s statement, “The Regions will often give the benefit of the doubt to the entity if there is any chance that the entity reasonably interpreted the expectations of a vaguely or incompletely worded Requirement.” In other words, Kevin was saying that any Potential non-Compliance (PNC) finding, which is based on an ambiguously or incompletely worded Requirement (possibly including CIP-013-1 R1.1), would be dismissed during the Region’s review of the PNC, before an actual violation was identified.

In his comment about “these tactics”, my friend was saying something like, “It’s distressing that Regional Entities would run into so many cases in which they were faced with the choice between dropping a potential violation and having an Administrative Law Judge rule in the entity’s favor because the requirement was ambiguous or omitted key points. This shows the CIP standards need to be reconsidered and rewritten.”

However, I also want to point out that, when I’m talking about reconsideration of the CIP standards, I’m not talking just about restoring words that were left out, fixing ambiguous phrases, etc. I’m saying that the CIP standards should all be written something like CIP-013 (although with the word “mitigate” in place!). Specifically, they should all state a goal (e.g. securing the BES supply chain, in the case of CIP-013) and allow the NERC entity to determine the best way to achieve that goal, based on a) their own particular environment and b) considerations of risk. If you would like to hear more about this, take a look at this webinar.





Monday, May 25, 2020

What happens if you take NERC to court?





In my post last Thursday, I concluded with this paragraph: “But that’s not the end of this story. This just demonstrates that a good part (or even all) of the NERC CIP regulatory program hangs on very tenuous legal grounds. If one or two entities want to seriously challenge NERC on these grounds, the whole NERC CIP program might be brought crashing down. This means that sooner or later, the NERC community is going to realize that the standards need to be rewritten from the bottom up, as I discussed in this webinar last year.”

Earlier in that post, I’d pointed out that a NERC entity that gets a violation, and has made no headway getting NERC’s Enforcement group to change its mind, can always file suit in the administrative court system (since NERC standards are regulatory law). I opined that, on a question like the status of “mitigation” in CIP-013 – where the standard clearly assumes the NERC entity will mitigate the risks it identifies, but where the word “mitigate” was actually left out of the requirements – an administrative law judge (ALJ) would probably rule in the entity’s favor without having to think too hard about it.

However, Kevin Perry, former Chief CIP Auditor of SPP Regional Entity, emailed me over the weekend that he very much disagrees with that position. He said:

In your scenario, you suggest that the entity will prevail before the ALJ and their violation (and fine) will be thrown out.  To that point, I very much disagree.  I am very confident that no ALJ would overturn this violation unless the Region totally bolloxed up their case in front of the judge.

My point is the Region Enforcement staff are not likely to allow a contested violation to get that far, if there is any chance the entity will prevail. In my experience, Enforcement has overturned violations found at audit – some Regions more than others. It is all part of the checks and balances built into the CMEP process. It is very unlikely a violation will ever get to a hearing (before an ALJ) unless the Region is confident that its view of the compliance issue is correct and can be persuasively argued in court. The Regions will often give the benefit of the doubt to the entity if there is any chance that the entity reasonably interpreted the expectations of a vaguely or incompletely worded Requirement.

So Kevin’s point is that neither the Region nor NERC wants to have to defend a less-than-solid case in front of an ALJ, mainly because of the huge cost in time and money of doing so. This means that, if there’s any question about whether they’ll win or not, they’re likely to drop the violation before it even gets that far. But in a case where they’re quite sure their position is correct, they’re not likely to lose in court – unless they totally botch their case.

I agree that Kevin’s right. However, I’m not backing away from the last sentence of the post: “This means that sooner or later, the NERC community is going to realize that the standards need to be rewritten from the bottom up, as I discussed in this webinar last year.” My case for saying the CIP standards need to be rewritten doesn’t rest at all on legal grounds, and I shouldn’t have implied that it does.


Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Are you working on your CIP-013 plan and you would like some help on it? Or would you like me to review what you’ve written so far and let you know what could be improved? Just drop me an email!



Friday, May 22, 2020

The NATF Questionnaire: Deciding which questions to use (part I)




This is my second post on the NATF questionnaire. As I said in the first post, I think it’s a very important document, but it’s certainly not perfect. Its biggest problem is that it has too many questions – by my count, about 230, and some of these will require a lot of work to answer (moreover, as discussed below, each question has to be answered three times, so there are actually 690 answers required). A supplier will have to put in a significant amount of work to answer this questionnaire, and that will be an especially heavy burden for small suppliers.

But my objection isn’t really the sheer number of questions. My objection is threefold:

  1. A large number of these questions don’t address important risks to operations, and specifically to the BES.
  2. There are important risks that I and my clients have identified that aren’t in NATF’s list, although we believe they’re important to address with questions. More generally, every NERC entity faces its own set of risks, and they shouldn’t feel they can’t ask any more than what’s in the NATF questionnaire.
  3. As I pointed out in this post, asking unnecessary questions increases compliance risk for CIP-013 (which I’ll elaborate on in part III of this post). So don’t ask any question that doesn’t address an important risk to operations.
I’ll discuss each of these points in turn. In this post, I’ll discuss the first point; I’ll discuss the other two in parts II and III respectively, coming soon to a blog near you.

I’m assuming you’re using the NATF questionnaire either for compliance with NERC CIP-013-1, or else more generally for your program to address cyber security of your operational technology supply chain (either in the power industry or in another critical infrastructure industry like gas pipelines or oil refining). As I’ve said before, I don’t think you should ask any supplier a question if it doesn’t address an important risk to the BES. No risk (or not a significant risk), no question.

Just about every cyber security questionnaire I’ve seen addresses primarily IT security risks. While those certainly need to be addressed, they shouldn’t be addressed in the same questionnaire as OT security risks. But how do you identify an IT risk? For every question in the questionnaire, I asked myself “Is there a real possibility that, if a supplier didn’t mitigate this risk, there could be an impact on the BES?” And by “real possibility” I mean: if you scored the likelihood of a BES impact as low, moderate or high, would it come out moderate or high?
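That screening rule can be stated in a couple of lines of code. A sketch – the question names and likelihood ratings below are made-up illustrations, not NATF content:

```python
def keep_question(bes_impact_likelihood: str) -> bool:
    """Keep a question only if the likelihood of a BES impact, should the
    supplier fail to mitigate the risk, is moderate or high."""
    return bes_impact_likelihood in ("moderate", "high")

# Hypothetical ratings, purely for illustration
ratings = {
    "unauthenticated firmware updates": "high",
    "corporate data privacy policy": "low",
}
kept = [q for q, likelihood in ratings.items() if keep_question(likelihood)]
```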

The main difference between IT and OT risks is that the former focus on the confidentiality and, to some extent, the integrity legs of the CIA triad, whereas the latter focus on the availability and, to some extent, the integrity legs. This is because the primary goal of IT security is to protect data stored in IT systems, while the primary goal of OT security is to protect the availability of OT systems. This isn’t to say that protecting data is unimportant with OT systems, since a supplier might be holding confidential information on those systems (e.g. how they’re configured or their IP address), but the fact is that OT systems are in place to operate machinery, etc. in the real world, not to store and process data.

Therefore, after first running through the questions in the questionnaire (I consider the questions to start with the “Qualifiers” section and include everything beyond that), I divided them into the following groups:

The first group was questions based on the NATF Criteria v1. I consider that all of the Criteria address significant risks to the BES, so I had already incorporated them into my list of questions (and also my list of risks). However, I found that in some cases NATF’s wording of the question was an improvement on the wording of the criterion itself, so in those cases I used NATF’s wording. I counted 57 of these questions in the NATF spreadsheet.[i] One example is “Do you implement encryption or technologies to restrict access to and obfuscate data in transit (e.g., cryptography, public key infrastructure (PKI), fingerprints, cipher hash)?” This is question DATA-03, which addresses criterion 42.

The second group was questions in the NATF spreadsheet that I identified as addressing significant risks to the BES (by asking myself the likelihood question above), that aren’t included in the NATF criteria. Some of these were already identified in my lists of risks and questions, although even in those cases I often combined my question with NATF’s (I found a number of cases where I didn’t think NATF’s wording adequately described the risk). I counted 26 of these. An example of one of these is “Have your systems undergone third-party penetration testing?” This is VULN-18.

The third group is “essay questions”, which I discussed in this post. I listed three of these, but there were more that fell into one of the other groups, so this is an undercount. My guess is there were 10-20 in all.

The fourth group is duplicates, of which I found four (i.e. they were almost the same as other questions in the same spreadsheet – perhaps they were drawn up by two different people).

The fifth group is general business questions, such as “Describe how long your organization has conducted business in this product area.” (COMP-04) These don’t address risks at all. Of course, they definitely need to be asked, but I assume that every NERC entity already has a standard set of questions like this that it asks all suppliers, OT or not. Business questions belong in a general business questionnaire, not one that assesses OT risks. I counted 16 of these.

The sixth group is product feature questions, such as “Does the computing system support client customizations from one release to another?” (CHNG-11) Of course, it’s very important to ask questions like this before you buy any product, because you need to decide whether the product meets your needs. Again, I can’t imagine that any NERC entity isn’t already asking questions like this; they belong in a questionnaire on product features, not this one. I counted 25 of these.

I do want to point out that there are some questions that on the surface appear to address security risks but really don’t; they also fall in the above category – i.e. these are really questions about product features, which happen to be security features. An example of this is “Does your computing system support role-based access control (RBAC) for end-users? (Depending on type of computing system, this may be your users internally, or potentially client users of your product.)” This is IAM-26.

Of course, RBAC is always a good capability to have, from a security point of view. But it certainly isn’t always necessary. For example, if there are only a few people who will ever be allowed to access a particular system, it’s obviously a waste of time to go through the process of creating a special role for them and assigning only that role to this system; it’s much easier just to list those people as the only ones who can access the system. There’s certainly no problem with having this question in your standard feature questionnaire, if it’s not there already. But including this question in your CIP-013/OT risk assessment questionnaire both wastes your and the supplier’s time and increases your compliance risk, for no good reason.
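The contrast is easy to see in code. Both sketches below are hypothetical (the names are invented); the point is that for a handful of users, a plain allow-list achieves the same control with far less machinery than RBAC:

```python
# Approach 1: a plain allow-list – adequate when only a few named people
# will ever be permitted to access the system.
ALLOWED_USERS = {"alice", "bob"}

def can_access_allowlist(user: str) -> bool:
    return user in ALLOWED_USERS

# Approach 2: role-based access control – users map to roles, and the
# system grants access to roles rather than to individuals.
ROLE_OF = {"alice": "substation_operator", "carol": "billing_clerk"}
SYSTEM_ROLES = {"substation_operator"}  # roles granted access to this system

def can_access_rbac(user: str) -> bool:
    return ROLE_OF.get(user) in SYSTEM_ROLES
```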

The last category is questions that address legitimate security risks that aren’t likely to have an impact on the BES. This category is by far the most numerous, with 97 questions by my count. Almost all of these are “IT” questions that would be perfectly legitimate in a questionnaire used for IT suppliers, which I don’t think should be included in an OT-focused questionnaire. I’ll give a few examples of these.

First example: Question THRD-01 reads “Describe how you perform security assessments of third-party companies with which you share data (i.e., hosting providers, cloud services, PaaS, IaaS, SaaS, etc.). Provide a summary of your practices and/or controls that assure the third party will be subject to the appropriate standards regarding security, service recoverability, and confidentiality.” Of course, this is a very important question to ask a supplier of say data services. They’ll presumably store some of your data, and they may well do it in the cloud. But I can’t think of a data services provider that would be considered to provide services for BES Cyber Systems, which is of course what brings a service vendor into scope for CIP-013.[ii]

OT suppliers will sometimes want to store information about your BES-related systems or networks on their own systems. In general, I think you should push back when they ask to do that (in fact, one of my questions asks whether they will need to store data at all, and if so, whether they will talk to the NERC entity and conduct a risk assessment before doing so). If they convince you that they definitely need to store this data and their plan for mitigating the risks looks adequate, then you should give them permission. But if you have doubts about cloud security, you should simply forbid them to store the data in the cloud, period.

Second example: Question CSPM-04 reads “Does your organization have a data privacy policy that applies to your computing systems?” Again, this would be a good question for an IT services supplier who is likely to have data that needs to be kept private, in this case especially data regarding your employees’ health, financial information, etc. But unless you’re storing personal health information on systems within your ESP (!), this question has close to zero relevance to the BES.

A third and last example: Question CHNG-06 reads “Do you have a systems management and configuration strategy that encompasses servers, appliances, and mobile devices (company and employee owned)?” Once again, this would be a very important question to ask any supplier of IT or data services – in those cases, you definitely want to make sure that the supplier manages configurations of all their servers, appliances and mobile devices. But given that an OT supplier is unlikely to need to store much data from you – and if they do, you will want to have some sort of agreement with them that addresses how they will protect whatever data they ask to store – IMO this risk, while real, isn’t likely to have significant impact on the BES.

But of course, if you disagree with me on any of these questions and you believe they do address significant BES risks, by all means include them in your questionnaire! The point is that you shouldn’t assume up front that every question in the NATF questionnaire addresses a significant BES risk.

Before I leave this point (and this post), I want to mention that the questions in the NATF questionnaire are actually all “times three” (i.e. there are really 690 answers required, not 230). This is because each question has to be answered three times: once for “Supplier Corporate Systems”, once for “Supplier Product” (i.e. whatever you are buying from them), and once for “Supplier Product Development Systems”. Of these, the most relevant for OT/CIP-013 purposes are the latter two. If you were to tell OT suppliers to answer just for those two areas, they would “only” have to answer about 460 questions. That’s still a lot!

But beyond the sheer number of questions, I think it’s a bad idea to ask a single question for all three areas. This is because the question will often need to be worded differently for each area. The risk will be different in each area, and therefore the question needs to be different as well.

As an example of this, let’s look at CHNG-03, which reads “Do you have a process to assess and apply security patches in your environment within a predetermined timeframe?” Let’s think about how it applies to each of the three areas. For Supplier Corporate Systems, your concern is that the supplier is regularly patching all systems in the company. This question might be fine for that area.

However, for Supplier Product, this question makes no sense. The real question is whether the supplier will provide patches to the NERC entity on a regular basis, or else within a certain (short) amount of time (of course, there are a number of other NATF questions – based on the NATF Criteria – that directly address patching for products; as I said, I’ve already incorporated every one of the NATF Criteria into my list of questions).

For Supplier Product Development Systems – which could pose a substantial risk to the BES if someone attacked them and planted malware or a backdoor in a product – the basic question format would be OK, although it would be important to ask more pointedly what exactly their timeframe for applying patches is. If it’s, say, six months, you would definitely want to talk to them about this!
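The “predetermined timeframe” in CHNG-03 ultimately reduces to a simple date comparison. A minimal sketch – the 35-day window is an arbitrary illustration, not a NATF or NERC number:

```python
from datetime import date, timedelta

def patch_overdue(released: date, as_of: date, max_days: int = 35) -> bool:
    """True if a patch has been available longer than the predetermined
    timeframe without being applied. The 35-day default is illustrative
    only; use whatever window the supplier has committed to."""
    return (as_of - released) > timedelta(days=max_days)
```

A six-month lag on the supplier’s development systems – the scenario above – would fail a check like this by a wide margin.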




[i] Although there isn’t always a one-to-one relationship between my questions and the NATF criteria. For example, I decided that most of the criteria that deal with incident response plans could be combined into one question, of the form “Do you do each of the following…?”

[ii] I’ll admit this is kind of a complicated question, and I might be missing something in making such a blanket statement. If anybody knows of a provider of data services that could actually be considered in scope for CIP-013, I’d love to hear about it (no name needed, of course).