Thursday, April 29, 2021

Finally, some information on the upcoming software EO

 

Today, NPR ran this story about the upcoming Executive Order on software security. And it looks even better than it did when I first wrote about it. Here is what I learned from this article:

1.      I especially like the idea of a cyber NTSB (National Transportation Safety Board). The article points out that there’s currently no agency whose job includes investigating cyber events to find out their causes and recommend remedies. Having this group will be a huge help, although it would be even better if CISA were given this power (the article says CISA doesn’t have investigative powers, but I think what they really meant is that CISA isn’t currently given explicit responsibility for investigating all major cyber incidents). If CISA had that, they’d build up a body of cross-industry and cross-discipline expertise like the NTSB has – although I hope whichever body ends up with the job doesn’t conduct investigations at the same speed as the NTSB, which seems to investigate on geologic time.

2.      Make no mistake about it: The EO will order regulation of software suppliers. The article says “The administration is trying to change the way we all think of code: It isn't just zeroes and ones — it is critical infrastructure.” Those last two words are dear to my heart. I’ve been saying ever since the SolarWinds attack that at least some software suppliers need to be regulated as critical infrastructure.

3.      Of course, the administration can’t impose requirements on software suppliers in an Executive Order; that requires legislation. But they can impose requirements on suppliers to the federal government. And I think this amounts to effectively regulating all suppliers, since any supplier that sells to – or wants to sell to – the Feds will need to follow these requirements. And if a supplier follows these regulations for one part of their business, they’ll follow them for the whole business. After all, it would probably be much more expensive to have two different software development organizations, both developing the same software but for different end users, than to just have one, no matter how regulated it is.

4.      What will the feds require of their software suppliers? The only clue from the article is that the EO will define “a set of requirements for the way software is built. Federal contractors will have to prove that they have secure practices like separating where they develop software from the internet, and things like requiring proof of multifactor authentication.”

5.      This doesn’t exactly tell you a lot, but I hope the EO turns out to be something like the IoT Cybersecurity Improvement Act of 2020, which similarly regulates suppliers of IoT devices to the Feds. That law doesn’t do this by imposing prescriptive requirements, but by directing NIST to develop “standards and guidelines” for “use and management” of IoT devices.

6.      The guidelines will fall into four areas: “(i) Secure Development, (ii) Identity management, (iii) Patching, and (iv) Configuration management.”

7.      Note that only the first of these areas applies to suppliers, and even then the regulations won’t apply directly to those suppliers – unlike the requirements contemplated under the EO, which will apply directly (although not to a supplier that doesn’t plan to sell anything more to the Feds, even if it has sold software to them in the past). But what I like about this is that the “guidelines” will probably turn into something like a framework for IoT suppliers.

8.      Similarly, I hope the requirements in the EO will also be developed by NIST and will be more guidelines than prescriptive requirements. But they need to come down heavily on the need for secure software development. The fact that the Russians were inside the SolarWinds network for about 16 months, and weren’t discovered until somebody at FireEye noticed that an unknown device had shown up on their account, shows that there’s – ahem! – room for improvement there.

9.      However, there’s one thing I hope the new standards don’t mandate, and that’s software bills of materials. I’m told it’s 100% certain that the EO will strongly encourage software suppliers to produce SBOMs, which is good. But actually trying to regulate them now would be a disaster, just as it would be a disaster to try to regulate, say, solar sails for travel to other solar systems. First let the technology get off the ground (no pun intended) and become widely used; then regulate it. But by all means, do everything possible to encourage SBOMs – maybe even set a date for rulemaking to begin, perhaps three years from now. That will get people’s attention without suffocating the baby.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Tuesday, April 27, 2021

Back at the ranch – part 3


This is the third installment of my post following up on the Texas power (and financial and human) disaster, which I call the Valentine’s Day Storm. In the second installment, I posted the comments of Andrew Gallo, a former ERCOT employee, on the first post; he primarily pointed out mistakes (nicely, of course) in my discussion of the ERCOT blackstart plan and of what FERC does and doesn’t regulate in Texas.

Now I’m posting the comments of Kevin Perry, former Chief CIP Auditor of the SPP Regional Entity (next door to Texas, of course). He provides tons of detail on blackstart plans and gives a great explanation of how the restoration process works after a blackstart (which of course wasn’t needed in Texas, but almost was). He also addresses my assertion (based on news articles I read during the crisis) that the whole ERCOT grid might have been down for months, had a particular system (generation ride-through, which works in conjunction with UFLS) not worked as designed between about 1:50 and 2:00 AM on February 15. My comments on what he wrote are in red.

Here is some enlightenment to pass along to Tom Alrich, Boy Engineer, when you next see him…(oh, the snark! I self-deprecatingly referred to myself this way, and now he repeats it like it’s true! The ingratitude is astounding…After I taught him everything he ever knew about NERC CIP…😊).

Texas RE, the Regional Entity that provides oversight of the ERCOT entities, does enforce the NERC Reliability Standards.  ERCOT is registered as a Balancing Authority, Planning Authority/Planning Coordinator, Reliability Coordinator, Resource Planner, Transmission Operator, and Transmission Service Provider.  So, ERCOT is subject to a large number of NERC Reliability Standards, including those applicable to BAs, TOPs, and RCs.  There are 19 TOPs in the ERCOT Interconnection, and I will admit I am not sure what role ERCOT plays in this space.  I suspect they are contractually the TOP for some TOs that are not TOPs themselves.  That is more common than one might think.  They are the only BA and the only RC.  To save you some time attempting the near impossible task of finding anything on the NERC website (hear, hear! I’ve often said that I know a surefire way to protect critical infrastructure information from ISIS: post it on the NERC website. They’ll never find it there!), the registration files are found at Organization Registration and Organization Certification (nerc.com).  Scroll down to the Registration: Compliance Registry Files ‎(3) link, expand the link, and download the “NCR Active Entities List” spreadsheet.

 So, why is this important?  It is important because system restoration (black start) requirements and applicability are spelled out in EOP-005.  EOP-005 is applicable to Transmission Operators, Generator Operators, Transmission Owners identified in the Transmission Operator's restoration plan, and Distribution Providers identified in the Transmission Operator's restoration plan.  ERCOT is a TOP and therefore must have a system restoration plan.  ERCOT is also a Reliability Coordinator.  The Reliability Coordinator used to be required to develop a detailed Regional Restoration Plan, which was basically a merging of the individual TOP plans.  That was done away with some years ago and today the RC role is to have a high-level restoration plan that mostly addresses TOP restoration coordination, and to review and approve the TOP plans on an annual (or as otherwise agreed to) schedule (per EOP-006). 

So, getting back to the TOP where the heavy lifting is performed, the TOP plan must also be submitted to the RC when it substantively changes.  The TOP must verify that its plan will work through testing, steady state and dynamic simulations, or analysis of actual events at least every five years.  And the TOP must test each black start resource at least every three years to make sure it will start without BES support and can energize a bus to provide the expected cranking power. 

Each TOP has to perform system restoration training to its operators annually.  And applicable TOs and DPs (referenced in the TOP plan) have to have at least two hours of training for their field switching personnel every two calendar years.  SPP (the RC) conducts Regional training annually, which accomplishes the TOP training requirement.  If ERCOT (the RC) is smart, they will do the same.  GOPs with black start resources must provide a minimum of two hours training in starting the resource every two years.  TOPs and GOPs are required to participate in the RC restoration drills as requested by the RC.  O&P violations are published with detail by FERC, so if ERCOT was not complying with EOP-005 (and EOP-006), that information would be readily available within a couple of years of the violation. 

Now, system restoration plans are very detailed, with unit starting sequences and cranking path switching procedures a small portion of the overall plan.  They usually anticipate problems and thus have options.  One entity I am familiar with will try starting each one of its black start units in a defined sequence until they get one to come up, and if all else fails, will wait on their neighbor to provide sufficient energy on the grid to provide cranking power. Another entity I am familiar with relies entirely on its neighbors.  Relying on the neighbor requires defined cranking power switching procedures from defined inter-utility connection substations (tie line points). 

So, what if all of ERCOT fails and none of the black start units can fire up for some reason (and I suspect you have greater chances of getting a blood clot from the J&J COVID vaccine)?  The DC Ties between the Eastern/Western Interconnections and ERCOT come to the rescue (You may believe the ERCOT grid is totally isolated from the two big Interconnects, but there are actually DC ties that can import and export power. However, DC ties require manual activation – they don’t pass power solely according to the laws of physics, like AC ties do. At the height of the crisis on February 15, the DC ties wouldn’t have helped since they take some time to activate – AC ties would have. But DC ties can certainly help in restoring the grid after a disaster, as Kevin describes, since timing isn’t so critical then).

You need roughly 40-50 MW power to energize the station auxiliaries.  The DC ties typically pass several hundred MW of power through the tie station.  So, you build a cranking path to start up a fossil unit (and more importantly to restore external station power to the nukes) and once you get one or more fossil units up, you rebuild more and more of the grid from there.  Regardless of how you do the restoration, you have to carefully balance generation and load to keep the energized system stable, or the whole thing collapses and you have to start all over again.  But if all works to plan, you will bring up islands across the blacked out area and the RC then coordinates the knitting of the islands together.
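(A quick illustration from me, not Kevin: here’s a minimal sketch of the balancing problem he just described. The roughly 45 MW of station auxiliary load and the several hundred MW available over a DC tie come from his description; every other number is invented, and a real restoration plan is far more detailed.)

```python
# Minimal, purely illustrative sketch of the balancing problem during restoration.
# The ~45 MW of station auxiliaries and the several hundred MW of DC-tie import
# come from Kevin's description; every other number is invented.

CRANKING_POWER_MW = 300          # assumed import available over a DC tie
AUX_LOAD_MW = 45                 # station auxiliaries needed to start a fossil unit
units_mw = [600, 450, 750]       # hypothetical fossil units, in starting order
load_blocks_mw = [50, 100, 150, 200, 250, 300]   # hypothetical blocks of customer load

generation = CRANKING_POWER_MW
load = 0

for unit_mw in units_mw:
    if generation - load < AUX_LOAD_MW:
        print("Not enough margin to crank the next unit - wait, or plan another path")
        break
    load += AUX_LOAD_MW          # the auxiliaries are themselves load
    generation += unit_mw        # the unit synchronizes and adds its capacity
    # Pick up load in blocks, keeping generation comfortably above load;
    # picking up too much at once drags frequency down and collapses the island.
    while load_blocks_mw and (generation - load) >= 2 * load_blocks_mw[0]:
        load += load_blocks_mw.pop(0)
    print(f"Started a {unit_mw} MW unit: generation {generation} MW, load served {load} MW")
```

The point is simply that you can only pick up as much load as the generation you’ve already restored can carry; get greedy and the island collapses and you start all over again, exactly as Kevin warns.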

 The generating unit stators, magnets, etc., (what you referred to as electro-magnets) do not need power themselves in order to generate power.  It is the pumps, blowers, pressure systems, even the control room gear, what are referred to as station auxiliaries, that need the power in order to fire up the fossil unit.  The turning of the turbine shaft connected to the generator itself creates power through the magic of coiled wires and magnetic fields.  Maybe I am wrong, but I am not aware that the magnets in the generator are themselves energized in order to create the necessary magnetic field.  And, oh by the way, even hydro units need station power to operate the wicket gates.  At a hydro unit, you will find a very small water turbine that always runs.  It provides enough station power to keep the lights on and bring the big turbines online. 

Moving on to under-frequency load shedding, I am not ready to give all the folks at ERCOT a congratulatory shout out just yet.  Under frequency load shedding (UFLS) requirements are defined in PRC-006, which is applicable to Planning Coordinators and all entities responsible for the ownership, operation, or control of UFLS equipment as required by the UFLS program established by the Planning Coordinators.  That includes TOs, DPs, and UFLS-only DPs (not sure what the distinction is there).  The Planning Coordinator designs and coordinates the UFLS program and load shed schedules, and has to assess the UFLS plan at least once every five years using dynamic simulation.  It is not that the ERCOT RC staff had nine minutes in which to scramble and shed load.  UFLS relaying is automatic.  So maybe ERCOT, as the PC, gets a shout out for designing a plan that worked.  The UFLS relay owners get a shout out for properly implementing the UFLS schedules in the relays they own.  SPP’s plan, as an example, is three tiered.  More and more load is automatically dumped as frequency declines.  You might even see deliberate separation (islanding) as part of the plan in order to avoid a complete collapse. 
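(Another aside from me: since Kevin’s key point is that UFLS relaying is automatic – no operator has to scramble – here’s a minimal sketch of what a single UFLS relay does. The setpoint and delay below are placeholders, not anyone’s actual settings; the real ERCOT setpoints are in Andrew Gallo’s summary in the post below this one.)

```python
# Minimal sketch of a single under-frequency load shedding relay. The setpoint
# and time delay below are placeholders, not anyone's actual settings; the
# point is that the trip decision is made locally and automatically.

SETPOINT_HZ = 59.3     # placeholder trip threshold
DELAY_SAMPLES = 6      # placeholder intentional delay, in frequency samples

def ufls_relay(frequency_samples_hz):
    """Return the index at which the relay trips its feeder breaker,
    or None if frequency never stays below the setpoint long enough."""
    below = 0
    for i, f in enumerate(frequency_samples_hz):
        if f < SETPOINT_HZ:
            below += 1
            if below >= DELAY_SAMPLES:
                return i       # breaker opens; this relay's block of load is shed
        else:
            below = 0          # brief dips are ridden through
    return None

# A made-up frequency trace: normal operation, then a sustained decline.
trace = [60.0, 59.9, 59.7, 59.5, 59.35, 59.28, 59.25, 59.22, 59.20, 59.18, 59.15]
print(ufls_relay(trace))       # prints 10: the relay trips on its own, no operator
```

Multiply this by hundreds of relays, each assigned its own setpoint and block of load by the Planning Coordinator’s program, and you get the tiered shedding Kevin describes.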

What I have the hardest time with is the notion that the ERCOT grid would be down for possibly months if it totally collapsed.  When the grid collapses, the generating units are tripped offline and substations go on battery backup.  Fossil units disconnect from the grid and big valves open up with a tremendous bang and an ear splitting shriek (you really need to be there and experience it one time) as they dump steam pressure.  The fossil unit turbines come to a near stop (they need to be kept slowly turning because they are hot and will distort if completely stopped).  Now, maybe if all of ERCOT collapsed, that is the concern – no power to turn the turbines and resulting turbine damage will prevent restart (the news articles I based this on said that there would be damage to the equipment in the plants, which probably meant the turbines)

Otherwise, a typical fossil unit can warm-start in roughly 24 hours, often less.  Cold starting takes a bit longer because you gotta boil water to turn the turbine.  Nukes have to be cleared by the NRC before they can restart, and that could be days once external station power is restored.  Wind turbines are simply parked and locked until ready to turn back on.  So, unless there were significant turbine damage, I cannot conceive how ERCOT would be down for months (but I think that was the point of the articles I read – there could have been turbine damage. Fortunately, this theory wasn’t put to the test).  And I suspect it takes longer than a few months to get a replacement turbine and install it, much less one for each and every unit in ERCOT-land (yes, that’s the idea. Once you have widespread turbine damage, you’re in big trouble. And of course, we’re not talking about wind turbines here. We’re talking about huge monsters costing sometimes tens of millions of dollars, that rotate at 1800-3600 rpm, i.e. 30-60 times a second. You don’t just order a new one on Amazon and have it show up in a week).  In the event there is some damage, you may see Texas turn into post-9/11 Baghdad with rotating load shed (rolling blackouts for the lay person), until enough generation is back online to fully serve load.  But I cannot see Texas being in the dark for months (I never thought I’d see planes fly into buildings in NYC, but it happened. Again, the UFLS system did what it was supposed to do at 1:50 AM on February 15, so this is now an academic question)

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Saturday, April 24, 2021

Back at the ranch, part 2

On Tuesday, I received emails from two knowledgeable people, commenting on Monday’s post. The emails were both fairly long and detailed, but both people made some very good points (they both have a lot of experience to draw on). Here are the comments from the first of those people. I’ll put up the comments from the second person next week.

The first commenter was Andrew Gallo of Proven Compliance. He was formerly the Assistant General Counsel of ERCOT and was the Director of Corporate Compliance of Austin Energy for 11 years. He commented on particular statements in my post, so below I’m reproducing first a statement from my post, and then his comments on it in red.

1) In reference to this sentence in my post: “Had the operators not been able to bring frequency above 59.4 Hz by nine minutes later, a protection scheme called ‘generator under frequency ride-through’ (I hadn’t heard of that, either) would have been automatically activated”, he says:

I’m not an engineer but this is my understanding as a person who has worked in the industry a long time.  I understand generator “ride through” to be when a generator does “ride through” a system event (like a frequency drop). Generator control systems are set in such a way as to not trip during momentary fluctuations of voltage or frequency. When voltage or frequency recovers quickly, the generator “rode through” the event. If it’s not a short-lived event, the generator’s protection scheme will trip off the generator to protect it from physical damage.

Andrew is clarifying that this protection scheme really works at the generator level – it’s not something triggered on a system-wide level, as I had thought.

2) In my post, I said “ERCOT is technically not regulated by FERC, although they do enforce a lot of the FERC and NERC requirements – including all of the NERC CIP requirements – anyway.” In this statement, I misspoke. I really meant to say that ERCOT follows FERC and NERC reliability requirements (including all NERC CIP requirements) voluntarily – although I was wrong about the “voluntarily” part, since they are a NERC entity and have to follow them. Enforcement of those requirements in ERCOT’s service area (i.e. on the utilities as well as ERCOT itself) is the responsibility of the Texas Reliability Entity, which at one point was part of ERCOT but separated about ten years ago.

However, Andrew pointed out that FERC doesn’t regulate markets in ERCOT (so they bear no blame for the $9,000 price!), because Texas’ grid is separated from the Eastern and Western Interconnects. This makes it exempt from FERC’s market regulations. But the NERC Reliability Standards (including the CIP standards) potentially apply to all power entities in the US (and certain provinces in Canada, at the discretion of the individual province) – except for those that don’t own or operate assets that are part of the Bulk Electric System.

3) Regarding my discussion of ERCOT’s blackstart plan, Andrew says:

ERCOT does have a system-wide blackstart plan and ERCOT is regulated by FERC (for reliability; the ERCOT market is not regulated by FERC). ERCOT must comply with the NERC Reliability Standards per the Energy Policy Act of 2005.

4) Andrew continued by making notations on my post in red (so what’s in black below is from my post):

The writer of the article wondered about this and checked out what ERCOT has said. He found that “there is a reference saying ERCOT has a black start plan, but it has never been used since there has never been a system-wide blackout.” Fair enough, but the plan should be tested regularly through non-intrusive means. Was this done?  [Yes. See below] The writer couldn’t find any reference to drills. [See below]

…In fact, he found another reference that said “…there are 13 units capable of black start operations in ERCOT, but six of those units experienced outages because of the extreme weather.” In other words, even if the blackstart plan had been tested, it might not have worked if needed, probably because the plan was written in anticipation of a hot-weather outage, when the generation would still all have been available. [Yes, all the blackstart units would have to be available to fully restore the grid; the units that were available could have re-energized their “islands.” Additionally, I believe there are other black start units in the ERCOT Region but ERCOT may not have contracted with them for black start service. Those units may have been available (although I don’t know if they were).]

He later provided some information on ERCOT’s blackstart tests and the generator under-frequency ride-through protections:

  • I believe ERCOT tests its blackstart plan every two years w/ a simulation
  • ERCOT has a blackstart plan template as Section 8(E) of the Nodal Operating Guides for utilities to use when they provide black start service
  • Per the ERCOT Operating Guides (Sec. 2.6.1), Transmission Operators must have Under-Frequency Load Shedding (UFLS) relays set to shed load as follows:
    • 5% of load automatically sheds at 59.3 Hz
    • 15% of load automatically sheds at 58.9 Hz
    • 25% of load automatically sheds at 58.5 Hz
  • Per ERCOT Nodal Operating Guides Sec. 2.6.2, generators must have UFLS relays set as follows:
    • > 59.4 Hz – generator cannot trip (i.e. the generator must ride through the frequency drop)
    • 58.4 – 59.4 Hz: generator can be set to trip after 9 minutes or more
    • 58.0 – 58.4 Hz: generator can be set to trip after at least 30 seconds
    • 57.5 – 58.0 Hz: generator can be set to trip after 2 seconds
    • < 57.5 Hz: No delay 
  • Thus, the information in your blog about ERCOT having ~9 minutes to correct the frequency drop seems correct (so long as the frequency was <59.4; in which case generators would have begun automatically tripping off-line after ~9 minutes). 

Andrew then came back with some further clarification of the generator ride-through:

…the concept of “ride through” applies to generators. The gist of it is ERCOT wants to ensure generators don’t trip randomly and in an unsynchronized way. Thus, the ERCOT Nodal Operating Guides (Sec. 2.6.2) require generators to ride through frequency drops down to 59.4 Hz. Once the frequency gets to (or below) 59.4 Hz (and above 58.4 Hz), the generator must ride through at least 9 minutes. If the frequency drop lasts > 9 minutes, the generator can trip.
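Putting Andrew’s two lists together, here’s a minimal sketch of the schedule as a pair of lookup functions. The thresholds and times are his numbers from the Nodal Operating Guides; the code around them is just illustrative packaging, and I’ve read the load-shed percentages as cumulative totals.

```python
# Andrew's two tables, expressed as small lookup functions. The thresholds and
# times are his numbers from the ERCOT Nodal Operating Guides; the wrappers are
# just illustrative. The load-shed percentages are read as cumulative totals.

UFLS_STAGES = [(59.3, 5), (58.9, 15), (58.5, 25)]   # Hz threshold -> % of load shed (Sec. 2.6.1)

# Hz threshold -> minimum seconds a generator must ride through before it may
# trip (None = may not trip at all) (Sec. 2.6.2).
RIDE_THROUGH = [(59.4, None), (58.4, 9 * 60), (58.0, 30), (57.5, 2)]

def load_shed_percent(freq_hz):
    """Cumulative percentage of load the UFLS relays will have shed at freq_hz."""
    shed = 0
    for threshold, pct in UFLS_STAGES:
        if freq_hz <= threshold:
            shed = pct
    return shed

def min_ride_through_seconds(freq_hz):
    """How long a generator must ride through at freq_hz before it may trip."""
    for threshold, seconds in RIDE_THROUGH:
        if freq_hz > threshold:
            return seconds
    return 0    # below 57.5 Hz: no delay

print(load_shed_percent(59.35))         # 0    (above the first UFLS stage)
print(load_shed_percent(58.7))          # 15   (two stages have operated)
print(min_ride_through_seconds(59.6))   # None (must ride through)
print(min_ride_through_seconds(59.0))   # 540  (the ~9-minute window of February 15)
print(min_ride_through_seconds(57.8))   # 2
```

Run it and you can see why the nine-minute figure mattered on February 15: at 59.0 Hz the generators were obligated to hang on for 540 seconds, and no longer.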

Now you know why you read my blog: People have to explain stuff to me in great detail – and repeatedly – for me to finally understand it. If I were really “Tom Alrich, Boy Engineer” (rather than just playing him on TV), you wouldn’t see half of this. Worth every penny you pay me, right?


Comments by Kevin Perry, former Chief CIP Auditor, SPP Regional Entity

Andrew said all of the black start units needed to be available to fully restore the grid.  I disagree.  All units would have to be available to fully restore the grid as quickly as possible per the written and tested plan.  Otherwise, as Andrew noted, there will be multiple islands that then need to be knitted together.  And there will be segments of the grid that are not energized but will need to be in order to bring the island boundaries together.  That will take some time to accomplish because the engineers will need to plan a new cranking path from an island to get energy to a unit to be started.  Once you get a unit up and stable in a blacked out TOP’s grid segment, “initial restoration” is complete and the TOP continues to bring up load and generation as the assets and conditions permit.

Remember, some entities plan on receiving cranking power from a neighbor if all else fails.  And some only have neighbors in their plan, having no declared black start units of their own.  But each TOP is normally expected to bring up its own island and await coordination at the regional level to knit the islands together.

Knitting together islands is planned for in the regional plan and the TOP’s plan.  It requires phase angle synchronization, which is accomplished at substations equipped with synchroscopes and protective relays that will not allow breakers to be closed unless the phase is within synchronization tolerance.  Those are usually the tie substations.  The same equipment is used at a generating plant when bringing a unit online.  Now, if you are connecting to a de-energized grid segment, synchronization is not necessary.  In all cases, you still have to be balancing load and generation in order to maintain frequency and voltage stability as you bring the grid back up.
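Here’s a minimal sketch (mine, not Kevin’s) of the sync-check permissive he describes. The tolerances are placeholders, not any utility’s actual relay settings; the logic is just the idea that the breaker may close only if both sides are close in voltage, frequency and phase angle – or if one side is dead.

```python
# Minimal sketch of a synchronism-check permissive: the tie breaker may close
# only if the two sides are close enough in voltage, frequency, and phase
# angle -- or if one side is de-energized. Tolerances below are placeholders.

MAX_ANGLE_DEG = 20.0     # placeholder phase-angle tolerance
MAX_SLIP_HZ = 0.1        # placeholder frequency-difference tolerance
MAX_VOLT_DIFF_PU = 0.05  # placeholder voltage-difference tolerance (per unit)
DEAD_BUS_PU = 0.1        # below this, treat the bus as de-energized

def permit_close(v1_pu, f1_hz, angle1_deg, v2_pu, f2_hz, angle2_deg):
    """True if the relay would permit the tie breaker to close."""
    # Closing onto a dead (de-energized) segment needs no synchronization.
    if v1_pu < DEAD_BUS_PU or v2_pu < DEAD_BUS_PU:
        return True
    angle_diff = abs((angle1_deg - angle2_deg + 180) % 360 - 180)
    return (angle_diff <= MAX_ANGLE_DEG
            and abs(f1_hz - f2_hz) <= MAX_SLIP_HZ
            and abs(v1_pu - v2_pu) <= MAX_VOLT_DIFF_PU)

print(permit_close(1.0, 60.02, 12.0, 1.0, 59.98, 5.0))   # True: islands in step
print(permit_close(1.0, 60.3, 12.0, 1.0, 59.7, 5.0))     # False: too much slip
print(permit_close(1.0, 60.0, 0.0, 0.0, 0.0, 0.0))       # True: dead bus, no sync needed
```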

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com


Thursday, April 22, 2021

Will someone please drive a stake through the heart of this lie?

This morning, my good friend Mark Weatherford sent me this article about an interview he’d given in conjunction with notable ICS security consultant Joe Weiss.

I read through most of it, thinking that both Mark and Joe made good points and this was a very reasonable discussion. However, that changed when I came to the part where Joe suddenly said “I'd like to mention one thing. And this actually goes back to what Mark first brought up about supply chain. The Executive Order 13920 came out because there was a Chinese-made transformer that had hardware backdoors preinstalled coming from China.”

This immediately set off alarm bells for me. Let’s be clear: Joe has been pressing this fable about the Chinese-made transformer since early last May. While Joe’s story is based on a real incident, first reported by Rebecca Smith in the Wall Street Journal on May 27 (and written about by me in this post on May 29), it’s 100% false.

I had thought this story was dead on each of these occasions:

1.      When I wrote a post about this on May 31.

2.      When Robert M. Lee, Tim Conway and Jeff Shearer of SANS wrote a Defense Use Case that stated that Joe’s report had zero credibility and was based on zero evidence (although other than that, they thought it was great – for instance, they just loved the font it was written in).

3.      When I put up a post describing the SANS document and proposing what I thought was a quite reasonable alternative explanation.

4.      And finally, when I put up a post on February 1 in response to an interview in Forbes where Joe repeated this lie.

Let me be clear: At no point has Joe ever pointed to any evidence to verify his claims, other than the easily debunked “evidence” he has provided in his posts and interviews. Yet he keeps bringing this lie up again and again, and he did so again in his interview with Mark. I’m going to go through what Joe said on this topic – and Mark’s responses, which never yielded an inch, bless his heart – blow by blow, in the hope that I can finally drive a stake through the heart of this zombie lie:

1.      When Joe first brings the WAPA transformer up and Mark immediately challenges him on it, Joe says “I was on a call with people who were physically at the substation where that transformer was, as it was being installed.” Joe’s point being that the people installing the transformer realized something was wrong with it.

2.      Mark then correctly notes that the transformer was never even delivered to WAPA (it was intended for WAPA’s Ault substation in Colorado). As Rebecca Smith’s great article points out, it was transported directly from the port of Houston to Sandia National Laboratories when it arrived in early 2020. Moreover, the Jiangsu Huapeng Transformer Company, which made the transformer, was ordered by WAPA in June of 2019, while the transformer was still being manufactured, to change the delivery point for the order from the Ault substation in Colorado to a warehouse at the port of Houston. From there, it was transported – presumably by the US government – to Sandia. So obviously Joe’s statement was wrong.

3.      But Joe wasn’t fazed by this. He then said “There were two transformers involved. The first transformer was installed in the WAPA [Western Area Power Administration] Ault substation, Mark, not far from you, outside of Denver. It was installed in August 2019. When WAPA was doing the site acceptance testing, the mechanical and electrical engineers found the extra electronics in that transformer.”

4.      Of course, it’s hard to see how this squares with Joe’s first statement, and it certainly doesn’t square at all with his statement (in bold type) in his blog post of May 11, 2020, to the effect that “When the Chinese transformer was delivered to a US utility, the site acceptance testing identified electronics that should NOT have been part of the transformer – hardware backdoors.”

5.      I’ll set aside the issue of the “hardware backdoor”, which seems to have no discernible meaning, as Robert, Jeff and Tim described in their DUC and as I pointed out in my post a few days after the WSJ article came out.

6.      Of course, this didn’t faze Joe! He replies “I have pictures of both transformers—Ault and Houston. As a result of that, the next transformer that arrived at the Port of Houston in early 2020 was intercepted by DOE and taken to Sandia [National Laboratories]. There is a utility missing a transformer. It would have never, ever happened if DOE wasn't so concerned about what they found with the first. What’s missing is what DOE found at Sandia.”

7.      Mark asks for evidence, and Joe provides a masterpiece of obfuscation: “Mark, you were within the government. Go ask DOE.” Note he does what I’ve seen other people do when they have just told a lie and are challenged for evidence: They tell the challenger that they can easily find the evidence for themselves; in other words, it’s an insult to them to even be asked for evidence.

8.      But Joe “backs this up” by continuing “I can read you—I won't even mention the country—an email I got from one of our closest allies. From someone very senior. And it's saying, ‘I am hoping you can help me with something. Regarding the transformer issue you discuss, can you please tell me to what level that information is confirmed?’”

9.      In other words, Joe was challenged for evidence for his story by someone overseas. He doesn’t describe any evidence he gave that person – since he didn’t have any – but he seems to be saying that the fact that this person asked him for evidence somehow indicates the evidence exists in the first place. I don’t quite understand his reasoning in this, but of course the whole idea was to shut Mark up, not to answer his question. 

10.   However, Mark didn’t shut up. He said “Well, I think you just confirmed my point, Joe, and that is, if they don't know, we don't know. Maybe there's nothing to know.”

11.   Joe’s reply to Mark is very interesting. He says “We have a utility missing a transformer. Mark, that has never, ever, ever happened. You don't buy a transformer like it without an absolute need to have it installed.” In other words, he seems to be asking “Why would WAPA have ordered the transformer in the first place, if they just wanted to have it shipped to a warehouse and torn apart?” I’ll address that in a moment. Let’s continue to Joe’s next lie.

12.   Joe goes on to say “When you look at Executive Order 13920, they give a detailed list of all of the equipment that is in scope for Executive Order 13920. Every single item in that executive order is out of scope for NERC CIP. Every single thing in NERC CIP, and in the supply chain, is out of scope for the executive order. We have a problem here. This is a real, honest hardware implant. There are over 200 large Chinese electric transformers in our electric grids today. We have no idea how many of them have these hardware backdoors.” I do have an idea how many large Chinese transformers have hardware backdoors: zero, since there’s no such thing.

13.   But Joe is absolutely right that EO 13920 provides a detailed list of equipment in scope – in fact, there are around 25 items on the list. He’s also right that almost all of those items (at least 21, but not all of them) aren’t in scope for the NERC CIP standards. But there’s a good reason for that: The NERC CIP standards only apply to devices that are operated by – or at least contain – a microprocessor or some other logical hardware (e.g. in a PLC), since only a processor can be subject to a cyberattack. Almost all of the devices in the EO don’t have a processor at all – meaning they are no more subject to a cyberattack than a 1920 automobile, my $5 steam iron, or for that matter a brick. Kevin Perry and I documented that in this post.

14.   This includes transformers. They operate according to the laws of physics, period, not the commands from some processor. They run day and night and don’t need external power to operate their core function. The last time I checked, the laws of physics still apply in China. It’s true that a transformer can have ancillary devices with processors, including load tap changers and dissolved gas analyzers (the former are often external to the transformer itself, and are often made by a different manufacturer than the one that made the transformer). But it’s very hard to see how they could be attacked. Moreover, it’s just as hard to see how a successful attack on one of them could lead to anything more than a brief local outage – and if you’re concerned about local outages, I suggest you figure out a way to address the number one cause of those, which is squirrels. The big national security concern is a widespread, cascading outage, not a local one.

Now, let’s get down to the question Joe (implicitly) asked: “Why would WAPA have ordered the transformer if they had no intention of using it?” That was certainly a question I asked myself when I read the WSJ article last May. It didn’t take too long to figure out, but I haven’t raised the point until now, since there was no need to. However, since Joe has kindly asked, I’ll answer it now. Here’s what I think happened:

1.      It’s no secret that the Trump administration in 2019 was looking for ways to decrease imports from China. Someone pointed out that WAPA – part of DoE – had bought Chinese large transformers and had one on order at the time.

2.      Someone had the bright idea that they could take a look at the transformer when it arrived, to find out if it contained some sort of “hardware backdoor” that would allow the Chinese to compromise it through some sort of cyberattack (never mind that transformers don’t have a processor to attack) launched over some sort of internet connection (never mind that a device without a processor can’t be connected to any communications network, any more than your living room sofa can).

3.      This also ignored a fact that was pointed out in the WSJ article: Since WAPA isn’t staffed with dumb bunnies, they knew perfectly well they had to be very careful when ordering any grid equipment from China; they left nothing to chance. The article says “…the transformer had been built to WAPA’s exact specifications, down to the parts numbers for the electronics that were sourced from companies WAPA chose in the U.S. and U.K.”

4.      Of course, a privately owned utility would have raised big objections to diverting a huge piece of equipment that was – as Joe points out – needed to maintain grid reliability. But since WAPA is 100% controlled by DoE, they had to comply – although, knowing a number of people who work for WAPA, I can assure you they must have been furious, both at losing the transformer and at the implicit judgment that they were too stupid to know they should be very careful when ordering any grid hardware from China (the WSJ article points out that a number of US utilities buy transformers from the same supplier, and there are other transformer suppliers in China as well).

5.      So the effort to find a “hardware backdoor” failed, but of course the transformer was totally destroyed.

6.      As I described in this post, shortly after the EO came out, DoE held two briefings for utility executives and made a point of declaring that the utilities didn’t have to do anything differently now (many utility executives were under the naïve impression that, because the EO required all purchases of equipment for the Bulk Power System to be stopped pending review of the risk by the Secretary of Energy, they actually had to do that. How silly of them); if anything did change, they’d be given plenty of warning before they had to act.

7.      If DoE had just discovered a serious hardware backdoor in the WAPA transformer at Sandia (which of course is owned by DoE), don’t you think they would have phrased this a little differently? In fact, wouldn’t they have already held a series of briefings – both classified and unclassified - for the industry? That’s what DHS (I believe) did in 2016, in the wake of the first Ukraine attacks.

So I hope Joe stops peddling this lie. The irony is that he’s done a lifetime of good work and is really one of the founders of ICS security. To have all of that tainted this way is really a shame.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com

 

 

Wednesday, April 21, 2021

This time, with the URL!

Note 4/21: I forgot to include the connection information in this post yesterday; I've done it now. I'll also put this up as a new post, so that it gets picked up by the email feed tonight.

As we announced at the last of the four introductory webinars last Monday, we will hold the kickoff meeting of the Energy SBOM PoC next Monday April 26 from 12-1 PM EST. The meeting is open to energy industry asset owners, suppliers, and service providers who want to learn more about SBOMs. I’m also very pleased to point out that the PoC is officially expanding to include oil and gas, along with electric power. So we’re truly the “Energy PoC” now.

The agenda is:

  • Why are we here?
  • SBOM Use Cases for the Energy community
  • Potential Roles in a Proof of Concept
  • Goals: What does "good" look like?
  • Logistics for moving forward

Regarding use cases, you’re encouraged to review the following document before Monday's meeting. https://www.ntia.gov/files/ntia/publications/ntia_sbom_use_cases_roles_benefits-nov2019.pdf 

Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Or call in (audio only)
+1 202-886-0111,,42757105#   United States, Washington DC

Phone Conference ID: 427 571 05#

This meeting will be recorded, and we'll send out a short summary for those who cannot make it. 

No signup is required, but if you aren’t currently on our mailing list, we request that you send an email to Dr. Allan Friedman at afriedman@ntia.gov. If possible, please include your name and the organization you’re affiliated with.

See you then!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

Tuesday, April 20, 2021

The Energy SBOM Proof of Concept kickoff is on Monday!

Note 4/21: I forgot to include the connection information in this post yesterday; I've done it now. I'll also put this up as a new post, so that it gets picked up by the email feed tonight.

As we announced at the last of the four introductory webinars last Monday, we will hold the kickoff meeting of the Energy SBOM PoC next Monday April 26 from 12-1 PM EST. The meeting is open to energy industry asset owners, suppliers, and service providers who want to learn more about SBOMs. I’m also very pleased to point out that the PoC is officially expanding to include oil and gas, along with electric power. So we’re truly the “Energy PoC” now.

The agenda is:

  • Why are we here?
  • SBOM Use Cases for the Energy community
  • Potential Roles in a Proof of Concept
  • Goals: What does "good" look like?
  • Logistics for moving forward

Regarding use cases, you’re encouraged to review the following document before Monday's meeting. https://www.ntia.gov/files/ntia/publications/ntia_sbom_use_cases_roles_benefits-nov2019.pdf 

Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Or call in (audio only)
+1 202-886-0111,,42757105#   United States, Washington DC

Phone Conference ID: 427 571 05#

This meeting will be recorded, and we'll send out a short summary for those who cannot make it. 

No signup is required, but if you aren’t currently on our mailing list, we request that you send an email to Dr. Allan Friedman at afriedman@ntia.gov. If possible, please include your name and the organization you’re affiliated with.

See you then!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Monday, April 19, 2021

Meanwhile, back at the (Texas) ranch…

I haven’t written about the Texas power crisis lately, but I’ve been following the story in the press – to the extent possible, since this isn’t exactly my day job. I’ve learned some interesting things lately, and I’d like to share them with you. I’ll do that in this post and a few subsequent ones (although not immediately following this one).

To summarize where I think things stand, I’ll say that it’s becoming clear what a huge event this was: from the power engineering standpoint, the human impact standpoint, and the financial standpoint. Are any of these three concerns closer to being at least comprehended – let alone resolved – than they were, say, a month ago? On the power engineering front, I’d say the event is at least understood now, even if what’s not understandable is how the blackout was allowed to happen in the first place. And as far as resolution goes, none of the three is anywhere close to being resolved.

Let’s start with the power engineering standpoint. I’m very far from being a power engineer, but I was quite impressed with a few of the things I learned from reading this recent article in T&D World:

·        The “theme” of the article is stated early: “In the aftermath of the events taking place from Feb. 14 through Feb. 17, the ERCOT leadership made a series of statements. One quote caught some attention: The Texas power grid had come within four minutes and 37 seconds of total collapse. They went on to say that had the system collapsed, it would have required what is called a “black start” of the entire ERCOT system. Furthermore, the resulting blackout could have lasted weeks or even months: Rebuilding a grid from scratch takes time.”

·        The article goes on to ask and answer three questions: a) “What exactly does ERCOT’s statement mean?”; b) “Is it probably an exaggeration, or did the grid really come close to total collapse?”; and finally c) “Had ‘total collapse’ actually occurred, what would have been the consequences for Texas?”

·        The article answers a) and b) together. It describes (and graphs, using synchrophasor data) the minute-by-minute sequence of events that occurred on the ERCOT grid (monitored from the control center in Round Rock, Texas) between around 1:43 and 2:00 AM on February 15. At around 1:50 AM, the ERCOT system frequency fell below 59.4 Hz. This triggered a nine-minute window. Had the operators not been able to bring frequency above 59.4 Hz by nine minutes later, a protection scheme called “generator under frequency ride-through” (I hadn’t heard of that, either) would have been automatically activated.

·        This scheme is designed to keep generators from suffering permanent damage if the frequency stays too low for too long. It does this by shutting all generators down. As the article states, “The entire Texas power grid controlled by ERCOT would have collapsed and approximately 26 million customers would have been without power.”

·        Why is this scheme necessary? If this protection scheme weren’t in place and frequency didn’t rise sufficiently, the grid would have collapsed anyway, probably because the protection relays on most of the generators would have tripped them. But if those protection relays hadn’t worked, the ERCOT grid wouldn’t just have been shut down; it might have taken literally months (and huge expenditures) to restore it (the article doesn’t go as far as these last two sentences do, but it seems to me – Tom Alrich, Boy Engineer - that this in fact is what would happen in the worst case scenario).

·        Spoiler alert: The Texas grid didn’t collapse. The article describes how, after the frequency dropped below the critical level, the ERCOT operators (or really the under-frequency load shedding system, or UFLS) dropped 6,500 MW of load (a huge amount, needless to say – 6.5 gigawatts, which for comparison is more than half of New York City’s peak demand of roughly 11 gigawatts).

·        Unlike two earlier 1 GW load sheds, this one (it was really two, spaced a few minutes apart) did the trick and reversed the trend in frequency. Frequency passed out of the critical range with four minutes and 37 seconds remaining in the nine-minute window, which prevented the generator under frequency ride-through protection scheme from kicking in.

·        Had they not succeeded and the nine-minute clock had run out, “The entire Texas power grid controlled by ERCOT would have collapsed and approximately 26 million customers would have been without power.” The article continues, “What the men and women of ERCOT did was not an easy task. Their efforts kept the power flowing and saved their grid to operate another day.” Can you imagine the sigh of relief that was collectively let out in the control room when the frequency went back above 59.4 Hz?

·        Of course, this story doesn’t have a happy ending, since the financial meltdown kicked in in the early afternoon, when the Texas PUC made the disastrous decision that the current $1,200/megawatt-hour wholesale power price wasn’t high enough (despite the normal level being about $25/MWh) and bumped it up to $9,000. And ERCOT decided to keep that rate in place through Friday, even though the market price returned to about $25 on Thursday. So while the men and women who worked in the ERCOT control center were genuine heroes, the men and women who worked in the rates department (or whatever it’s called) were…not heroes.

·        But this isn’t the whole story. If the 9-minute window had passed without frequency going back above the critical level, the protection system would have brought all generation in ERCOT to a “graceful” shutdown – meaning the fact that the protection was in place prevented permanent damage to a lot of generation, which would have taken months to repair. But once the graceful shutdown had happened, how would ERCOT have restarted the grid? Would they just hit another switch and all the generators would immediately kick in and start producing power?

·        Unfortunately, it’s not that simple. This is because of a dirty little secret: It takes power to produce power. Generators require electromagnets in order to work. But electromagnets require…electric power! So if there’s no electric power to be had for love or money – as would have been the case if the ERCOT grid had totally collapsed – how does any power get produced?

·        This is where “blackstart”, referred to in the ERCOT statement quoted at the beginning of this post, comes in. I described blackstart in more detail in this post, but suffice it to say that the grid can be restarted if you start small and build up. You start with smaller plants that can be restarted with a backup generator or a big battery (as well as hydro plants, whose power source never stops, so the electromagnets are always energized. Of course, Texas isn’t known for an abundance of hydro power, the way the Pacific Northwest is). They energize particular lines, which then energize larger plants, which energize other lines…until finally the grid is operating again (there’s a simple sketch of this build-up after this list).

·        Doing this requires that very detailed procedures be followed in the proper order, which is all laid out in a blackstart plan. Every grid operator is required by FERC and NERC to have a blackstart plan. Did ERCOT have one?

·        The writer of the article wondered about this and checked out what ERCOT has said. He found that “there is a reference saying ERCOT has a black start plan, but it has never been used since there has never been a system-wide blackout.” Fair enough, but the plan should be tested regularly through non-intrusive means. Was this done? The writer couldn’t find any reference to drills.

·        In fact, he found another reference that said “…there are 13 units capable of black start operations in ERCOT, but six of those units experienced outages because of the extreme weather.” In other words, even if the blackstart plan had been tested, it might not have worked if needed, probably because the plan was written in anticipation of a hot-weather outage, when the generation would still all have been available.

·        So the people in the ERCOT service area dodged one bullet, due to the quick-thinking of people in the ERCOT control center. But had they not dodged that bullet and there had been a total (but graceful) grid shutdown, they might have found they had to wait a few days without power (in a severe cold snap, of course), while the ERCOT staff tried to improvise to get the grid running again. So an outage of hopefully just a few hours would have turned into one of at least a few days, and maybe longer.

·        And multi-day outages, especially over most of a big state like Texas, aren’t pretty. 
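As promised in the bullet on blackstart above, here’s a minimal sketch of the “start small and build up” idea. Every unit name, size and cranking-power figure is invented; a real blackstart plan names actual units, cranking paths and switching steps, and has to keep frequency and voltage stable at every step.

```python
# Minimal sketch of the "start small and build up" idea behind blackstart.
# Every unit, size, and cranking-power figure here is invented for
# illustration; a real plan names actual units, paths, and switching steps.

units = {
    # name: (capacity MW, can self-start, cranking power needed MW)
    "hydro_1":  (30,  True,  0),    # blackstart resource: starts on its own
    "gas_ct_1": (80,  True,  0),    # blackstart resource with a backup generator
    "gas_cc_1": (500, False, 25),   # needs cranking power over an energized path
    "coal_1":   (650, False, 45),
    "coal_2":   (700, False, 45),
}

online = []
available_mw = 0

# Pass 1: fire up the blackstart-capable resources.
for name, (cap, self_start, _) in units.items():
    if self_start:
        online.append(name)
        available_mw += cap

# Pass 2+: keep starting units whose cranking power we can now supply,
# energizing more of the grid each round (islands grow and are later knitted).
progress = True
while progress:
    progress = False
    for name, (cap, self_start, crank) in units.items():
        if name not in online and not self_start and available_mw >= crank:
            online.append(name)
            available_mw += cap - crank   # cranking load stays served while the unit runs
            progress = True

print(online)          # the order in which units could be brought back in this toy case
print(available_mw)    # rough capacity restored in this toy example
```

In the toy example, two small blackstart-capable units end up bootstrapping every larger unit on the list; in real life, each of those steps is a carefully planned, drilled and documented switching sequence.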

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.