Thursday, June 19, 2025

I changed yesterday’s post on NERC CIP


Kevin Perry is the retired Chief CIP Auditor for the SPP Regional Entity and co-chair of the NERC Standards Drafting Team that drafted CIP versions 2 and 3; he was also a member of the team that drafted NERC Urgent Action 1200, the voluntary predecessor to the NERC CIP standards (and still very much the foundation of those standards). I’ve known Kevin well since he introduced himself to me at an SPP meeting on CIP in, I believe, 2011.

Kevin and I had huge email discussions – our replies were all in different colors; we ran through all the primary colors and most of the secondary ones as well – about the many big issues that came up as the CIP version 5 standards were being drafted and implemented from 2011 to 2015. (CIP version 5 is essentially the version we still follow today; it’s where terms like BCS, ESP, PACS, EACMS, ERC and IRA were introduced into CIP.) He often ruined my day by telling me that the post I’d just taken almost a day to write was flawed and needed to be corrected. You’ll be pleased to know that we’re still having some of the same arguments we had then – of course, he continues to be very unreasonable in not accepting my positions (😊). The nerve of that guy!

True to form, he ruined my day today by telling me that the post I put up yesterday (which took at least eight hours to write on Tuesday and Wednesday, and which turned out to be the 1,200th post I’ve written since I started this blog in 2013) had a serious flaw. However, in this case I can’t be blamed for it – it turns out the NERC auditors made a decision I didn’t know about until I received Kevin’s email. Of course, I certainly wouldn’t expect the auditors to tell me about this, but Kevin knows most of them very well (he mentored a number of them when they worked for him at SPP RE).

You can learn all the gory details (or most of them, anyway) in the italicized text I’ve inserted into yesterday’s post. However, the main takeaway is this: the NERC Regional auditors decided there is no need for a “CMEP Practice Guide” to remove what many of us believed might become a “showstopper” impediment to NERC entities using SaaS with BES Cyber System Information (BCSI). They decided this because they think the problem was already adequately dealt with – specifically, by a document that NERC endorsed in December 2023 as Implementation Guidance for CIP-004-7 and CIP-011-3, the two revised standards that came into effect on 1/1/2024 and were expected (prematurely, as it turns out) to lead to NERC entities feeling comfortable using SaaS with BCSI.

Thus, the moral of yesterday's story is unchanged: SaaS providers (and software developers who want to start delivering their software as a service) shouldn’t be afraid of using BCSI with their products, and NERC entities with high and/or medium impact BES Cyber Systems shouldn’t be afraid of giving SaaS providers access to their BCSI. However, both SaaS provider and NERC CIP customer need to keep in mind that they will still have to provide the required compliance evidence for CIP-004-7 R6, CIP-011-3 R1 and CIP-011-3 R2.[i] 

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve often been told that I should either accept advertising or charge a subscription fee or both. However, neither of those options appeals to me. It would be great if everyone who appreciates my posts could donate a $20-$25 “subscription fee” once a year (of course, I welcome larger amounts as well!). Will you do that today?

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] I didn’t emphasize this point in my post yesterday. I probably will in the future, although perhaps just at a high level. If you would like to discuss this topic with me, let me know.

Wednesday, June 18, 2025

NERC CIP in the Cloud: Is it time to start using SaaS?

I’ve written at least a few times about the difference between SaaS (software as a service, although “software that runs in the cloud” is a more accurate description) and BCS (BES Cyber Systems) that are deployed in the cloud. For those unfamiliar with NERC CIP, BCS are the systems that the 13 current NERC CIP standards, CIP-002 through CIP-014, are there to protect; they are the systems whose loss or compromise would impact the Bulk Electric System (BES) within 15 minutes, although the impact will usually be instantaneous.

Note from Tom 6/19: Kevin Perry, former Chief CIP Auditor for the SPP Regional Entity and co-chair of the team that drafted CIP versions 2-4, pointed out a couple of problems with this post as I wrote it yesterday. I've discussed them in italics below, although I decided not to change what I wrote yesterday - just point out where I was wrong.

Both SaaS and cloud BCS consist of software running in the cloud, and both provide advice and/or monitoring data to their operators. However, since the loss of a BCS, whether deployed in the cloud or not, will by definition impact the BES within 15 minutes, a BCS can provide more than advice: it can provide control or real-time monitoring. In other words, a BCS always has some sort of connection with a device (like an electronic relay that controls a circuit breaker) that impacts or monitors the power grid. Most importantly, the BCS needs to directly control, or monitor the output of, the grid-connected device. If a system’s impact on the grid is dependent on a human being taking some action first, then it’s not a BCS.

Note: Kevin Perry pointed out an important qualification to this last statement: if a system's (negative) impact on the grid is dependent on a human being not taking a particular action (and that negative impact occurs within 15 minutes), then the system is a BCS. He used the example of a SCADA system that notifies an operator when there's a problem that needs attention, so the operator can take actions to fix the problem. If that system is compromised and doesn't notify the operator of a problem - and that lack of notification leads to a negative impact on the BES within 15 minutes - then clearly the system can have a 15-minute BES impact and is therefore a BES Cyber Asset, and also part of a BES Cyber System.

This means that the same software product could be deployed in the cloud in two quite different ways. Let’s use the example of software that monitors power flows and can detect a dangerous anomaly in real time. If the software simply sets off an alarm and warns the operator of the anomaly – on the assumption that the operator will perform the steps necessary to protect the BES - then the software is SaaS. However, if the software is directly connected to an electronic relay in a substation, which itself directly controls the circuit breaker, then it may be a BCS. This is because it affects the BES within 15 minutes, without requiring human intervention.
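To pull the last few paragraphs together, here is a rough sketch in Python of the classification logic just described, including Kevin's qualification. To be clear, this is purely my own illustration, not anything found in the CIP standards or in NERC guidance; all of the names and fields below are my inventions.

```python
from dataclasses import dataclass

@dataclass
class CloudSystem:
    controls_grid_device_directly: bool    # e.g., tied to a relay that controls a breaker
    silence_causes_impact_in_15_min: bool  # failing to alert the operator impacts the BES within 15 minutes

def is_bcs(system: CloudSystem) -> bool:
    """Classify a cloud-deployed system as a BCS vs. "just SaaS" (illustrative only)."""
    if system.controls_grid_device_directly:
        # Direct control, or real-time monitoring, of a grid-connected device:
        # the system can impact the BES within 15 minutes without a human in the loop.
        return True
    if system.silence_causes_impact_in_15_min:
        # Kevin's SCADA example: a compromised system that fails to notify the
        # operator can still have a 15-minute BES impact, so it is a BCS too.
        return True
    # Otherwise a human must act before any grid impact occurs: SaaS, not a BCS.
    return False

# The same monitoring software, deployed two ways:
alarm_only = CloudSystem(controls_grid_device_directly=False, silence_causes_impact_in_15_min=False)
tied_to_relay = CloudSystem(controls_grid_device_directly=True, silence_causes_impact_in_15_min=False)

assert not is_bcs(alarm_only)   # warns the operator -> SaaS
assert is_bcs(tied_to_relay)    # drives the relay directly -> BCS
```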

When I mentioned earlier that the purpose of the CIP standards is to protect BES Cyber Systems, I omitted the fact that the CIP standards also protect BES Cyber System Information (BCSI)[i]. Since there are well over ten times as many requirements (and “requirement parts”, NERC’s term for what are usually called subrequirements) that apply to BCS as there are that apply to BCSI, the main emphasis is almost always on protecting BCS. A BCS consists of one or more hardware devices and the software that runs on those devices.

However, when it comes to the cloud, hardware essentially disappears. Of course, everyone knows that the software in the cloud runs on hardware, but the big advantage of using the cloud is that the end user doesn’t need to ensure protection of the hardware – just the software. There are two basic ways to utilize software installed in the cloud.

1. One way is to install and manage the software yourself (although in many cases the OS and other supporting software products are managed by the CSP). This is how BES Cyber Systems can be deployed in the cloud. Currently, a small number of low impact BCS are installed in the cloud; the CIP standards don’t pose any impediment to doing that.

However, it is close to impossible to “legally” install medium and high impact BCS in the cloud. This isn’t because the current CIP requirements directly forbid it, but because it would be literally impossible for the CSP (by which I mean a “platform” CSP) to provide the required evidence of compliance with requirements like CIP-007 R2 patch management and CIP-010 R1 configuration management. Any NERC entity that utilizes cloud-based BCS but can’t provide the required evidence of compliance with all current CIP requirements that apply to BCS is likely to be hit with a lot of violations. This is almost certainly why I have never heard of a NERC entity that has knowingly installed medium or high impact BCS in the cloud.

2. On the other hand, since SaaS consists of just software and is completely abstracted from the hardware it runs on, the only CIP requirements that apply to a NERC entity’s use of SaaS are the three requirements (along with a total of seven requirement parts) that apply to BCSI: CIP-004-7 R6, CIP-011-3 R1 and CIP-011-3 R2. NERC entities with high or medium impact BCS are free to use SaaS all they want, but if the information they store and/or utilize meets the BCSI definition, they must comply with those requirements.

In previous versions of CIP-004, several requirement parts were worded so that they effectively prohibited storing or using BCSI in the cloud. This wasn’t intentional, of course. However, until recently the NERC community and FERC considered the cloud too untrustworthy to serve as a home for anything having to do with the power grid, so it was never even considered an option for systems subject to CIP compliance. CIP-004-6 (the version of CIP-004 that was in effect until January 1, 2024) made no provision for encrypting BCSI at rest, since all storage locations for physical or electronic BCSI were assumed to be under the direct control of the NERC entity. For this reason, anyone with physical or logical access to the server(s) where BCSI was stored was considered to have access to the BCSI itself, whether or not it was encrypted.

Fortunately, this attitude was changing by 2018, when a new NERC Standards Drafting Team was constituted to fix the wording problems in CIP-004. The team was tasked with making it clear that encryption of BCSI makes it inaccessible to anyone without access to the decryption key, even if they have physical and/or electronic access to the server(s) where the BCSI is stored.

Besides making changes to CIP-004, the drafting team also changed CIP-011 R1, which requires that NERC entities with medium and/or high impact BCS develop an Information Protection Program for all BCSI, no matter where it resides or is used. The changes (to the Methods section of R1.2) made it clear that encryption – along with some other methods of data obfuscation – renders BCSI unusable to anyone that does not have access to the decryption key. Encryption wasn’t considered to be a protection in previous CIP versions; in fact, encryption was never even mentioned in the CIP standards until CIP-011-3 came into effect (along with CIP-004-7) on January 1, 2024.

In 2018, I thought that the changes needed to fix the BCSI-in-the-cloud problem would be extremely complicated. However, the SDT came up with an ingenious solution that involved minimal changes, including:

1.      Modifying the existing CIP-004 requirements to remove the language that effectively prohibited cloud storage of BCSI;

2.      Adding a new requirement CIP-004-7 R6 that introduced a concept called “provisioned access”. R6 states that provisioned access occurs when “…an individual has both the ability to obtain and use BCSI. Provisioned access is to be considered the result of the specific actions taken to provide an individual(s) the means to access BCSI (e.g., may include physical keys or access cards, user accounts and associated rights and privileges, encryption keys).” (my emphasis added)

3.      In other words, someone with physical or electronic access to a server that contains BCSI, who does not also have access to the decryption key, does not have provisioned access to the BCSI. By the same token, someone with access to both the server that stores the BCSI and the key does have provisioned access, even if the BCSI is still encrypted; this is because the person could decrypt the data if they wanted to (the sketch after this list illustrates the logic).

4.      Modifying the wording of CIP-011-3 R1.1 and R1.2 to separate identification of BCSI (in R1.1) from protection of BCSI (in R1.2).

5.      Modifying the Methods section of R1.2 to add a new category of BCSI called “off-premise BCSI”, and to make it clear that encryption is an option for protecting that new category (this was also the first time that encryption was mentioned in the CIP standards, as well as the first time that use of the cloud for any purpose was mentioned).  
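Since the “provisioned access” logic in items 2 and 3 trips up a lot of people, here is a minimal sketch of it in Python. This is purely my own illustration (the names and fields are invented), not anything that appears in CIP-004-7 or in NERC guidance.

```python
from dataclasses import dataclass

@dataclass
class Individual:
    name: str
    can_reach_storage: bool   # physical or electronic access to the server(s) holding BCSI
    has_decryption_key: bool  # holds the key that renders the BCSI usable

def has_provisioned_access(person: Individual) -> bool:
    """Provisioned access = the ability to both obtain AND use the BCSI."""
    return person.can_reach_storage and person.has_decryption_key

# A cloud administrator who can reach the storage but holds no key:
admin = Individual("CSP storage admin", can_reach_storage=True, has_decryption_key=False)
# A SaaS staff member who holds both:
operator = Individual("SaaS operator", can_reach_storage=True, has_decryption_key=True)

assert not has_provisioned_access(admin)    # access to encrypted data alone is not provisioned access
assert has_provisioned_access(operator)     # provisioned access, even while the data sits encrypted
```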

The two revised standards, CIP-004-7 and CIP-011-3, including the changes described above, came into effect on January 1, 2024. Since the reason for making these changes was to enable storage and use of BCSI in the cloud, and since use by SaaS applications was the primary intended use for BCSI in the cloud, I and some others in the NERC community thought that NERC entities would be happy that they could finally move computationally intensive data analysis tasks that require some use of BCSI to the cloud.

Even more importantly, I thought that vendors of the software that enables those tasks would be even happier. After all, instead of having to individually support each of their customers using their software with on-premises hardware, they could become a SaaS provider. Thus, they would just need to maintain one big instance of the software[ii] for all their customers.

However, what happened was quite different from what I expected: after being told for years that storing or using BCSI in the cloud was verboten, few NERC entities were ready to take the leap to using BCSI in a SaaS application – absent strong encouragement from NERC and/or their own Region(s). But that encouragement, strong or otherwise, was as far as I know totally absent. For example, even though some previous changes to the CIP standards have been accompanied by multiple NERC webinars and presentations at Regional Entity meetings, I have yet to hear of a single webinar or presentation that dealt with the two revised standards that came into effect on 1/1/2024.

But there is another explanation for why NERC entities have been reluctant to start using SaaS that requires use of BCSI: Late in 2023, some NERC Regional Entity staff members realized that there was a potential “showstopper” problem regarding the wording of CIP-004-7 R6. I described that problem in detail in this post in January 2024, but here is a quick summary:

1.      BCSI must be encrypted from the moment it is transmitted to the cloud. It needs to remain encrypted while it is stored in the cloud and while it is utilized in a SaaS application.

2.      Few SaaS applications can make use of encrypted data. Therefore, some person who is an employee or contractor of the SaaS provider will need to decrypt the BCSI and “feed it in” to the application. That person will need to have access to both the BCSI and the decryption key. At first glance, that appears to meet the “definition” of provisioned access included in the first section of R6: “…an individual has both the ability to obtain and use BCSI.”

3.      Requirement Part CIP-004-7 R6.1 makes it clear that provisioned access must be authorized by the NERC entity. Therefore, if Staff Member Y of the SaaS provider needs to feed BCSI from Electric Utility ABC into the application, ABC will have to authorize the provider to provision Y with access to their BCSI. Similarly, if Utility 123 needs to have their BCSI fed into the same SaaS application, they will also need to authorize provisioning of the staff member that does this, even if that staff member is also Y.

4.      Since SaaS is used day and night and since staff members get sick, take vacations and are transferred, there will always need to be multiple people with provisioned access to the BCSI of each NERC entity customer. If the provider has 100 NERC entity customers and at any time there are six staff members who need access to each customer’s BCSI (corresponding to three 8-hour weekday shifts and three 8-hour weekend shifts), that means there will need to be up to 600 individuals with provisioned access to BCSI at any one time.

5.      For each of these individuals, the supplier will need to provide evidence to each NERC entity customer that they are in continual compliance with CIP-004-7 Requirement R6 Parts R6.1, R6.2 and R6.3. Needless to say, this will require a huge amount of paperwork on the part of the SaaS provider; many (if not most) SaaS providers will be unwilling to undertake this responsibility. Therefore, it isn’t surprising that no NERC entity I know of decided to take the leap to using SaaS with BCSI after 1/1/2024.[iii]

Fortunately, help is on the way with this problem. A group I am a (small) part of, the informal NERC Cloud Technology Advisory Group (CTAG), discussed the above problem at length and realized that the BCSI access required by a SaaS provider staff member does not need to be explicitly provisioned; therefore, Requirement CIP-004-7 R6 does not apply in the use case described (which is fundamental to almost all SaaS use, of course).

However, knowing this might not in itself be helpful; it might simply be filed away in the “useless, but still nice to know” category if the only way to change the situation were to change one of the NERC CIP requirements (presumably CIP-004-7 R6). I say this because changing an existing CIP requirement can easily require three years or even longer, starting from the day the change is first proposed to the NERC Standards Committee (in a Standards Authorization Request, or SAR).

However, no change to a NERC CIP requirement (or definition) is needed. The CIP auditors (who are all staff members of one of the six NERC Regional Entities) occasionally produce “CMEP[iv] Practice Guides” that provide direction to audit staff on “approaches to carry out compliance monitoring and enforcement activities.” Our group turned over our findings – which are in part based on collateral NERC documents – to the committee in charge of drafting new CMEP Practice Guides. While that outcome is not guaranteed, we are optimistic they will develop a new Guide for BCSI that makes a recommendation on this issue (the last Guide for BCSI was published in 2019; since it is based on the previous version of CIP-004, it is now obsolete).

Of course, it may be six months (or even longer) before a new CMEP Practice Guide is published. Does this mean that software vendors with current CIP customers should wait six months before they start offering SaaS services to those customers? Or that current SaaS providers need to wait six months before they start approaching NERC entities that are subject to CIP compliance about using their applications? Or that NERC entities that would like to move certain data intensive applications to the cloud should wait six months before they even start talking to SaaS providers about doing that?

In all these cases, the answer is no. The important thing to remember is that the concerns that were raised in late 2023 about provisioned access being necessary for SaaS application provider staff members were in retrospect overblown. The “default position” should always have been that provisioned access was not necessary.

Tom 6/19: Kevin also pointed out a recent development that I didn't know about. Without going into a lot of detail, it seems the NERC Regional auditors, who would need to prepare a CMEP Practice Guide, don't think a new one is needed in this case. In effect, they say that statements in a document that NERC endorsed in December 2023 as Implementation Guidance for CIP-004-7 Requirement R6 and CIP-011-3 Requirement R1 sufficiently undercut the concerns that were being raised about the wording regarding "provisioned access" in the first part of CIP-004-7 Requirement R6.

Moreover, they said the question is more properly dealt with by the NERC Responsible Entity in the Information Protection Program that is mandated by CIP-011. In other words, they agree that the problem I described above shouldn't be a concern in most cases, but they also don't think a CMEP Practice Guide is required to explain this. Auditors don't like to waste time, especially when it's their own!

Thus, the new CMEP Practice Guide will do nothing more than restore the status quo before the concerns arose. It does not in any way amount to a change in how CIP requirements are normally interpreted. After all, had the concerns been correct, it would effectively have meant that use of BCSI with SaaS was still completely off limits for NERC entities with high and/or medium impact BES Cyber Systems, and that the work of the drafting team from 2019 to 2023 had been all for naught.

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve often been told that I should either accept advertising or charge a subscription fee or both. However, neither of those options appeals to me. It would be great if everyone who appreciates my posts could donate a $20-$25 “subscription fee” once a year (of course, I welcome larger amounts as well!). Will you do that today?

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] The NERC definition of BCSI is “Information about the BES Cyber System that could be used to gain unauthorized access or pose a security threat to the BES Cyber System. BES Cyber System Information does not include individual pieces of information that by themselves do not pose a threat or could not be used to allow unauthorized access to BES Cyber Systems, such as, but not limited to, device names, individual IP addresses without context, ESP names, or policy statements. Examples of BES Cyber System Information may include, but are not limited to, security procedures or security information about BES Cyber Systems, Physical Access Control Systems, and Electronic Access Control or Monitoring Systems that is not publicly available and could be used to allow unauthorized access or unauthorized distribution; collections of network addresses; and network topology of the BES Cyber System.”

[ii] Of course, in practice I’m sure that SaaS providers maintain multiple instances of their software in the cloud, for various reasons. But that is still a big improvement over maintaining a single instance for each customer, as they did before they moved to SaaS delivery of their product.

However, I’m deliberately overlooking the “multi-tenant problem”. This arises when a standalone enterprise software product that includes its own database is moved without modification to the cloud – with the result that users from different organizations and even different countries might end up sharing a single database. Even though there are protections between different users in the database, they are not likely to be equivalent to the protections that exist when each organization in each country operates its own database instance. I hope to address this topic soon.

[iii] While this sentence is accurate, it’s misleading. The fact is, there are at least one or two SaaS applications that NERC entities have been using to document CIP-010 R1 (configuration management) compliance since 2016 or 2017; of course, configuration data on BES Cyber Systems will almost certainly include BCSI. It is likely those NERC entities are still using those SaaS applications today.

[iv] CMEP stands for Compliance Monitoring and Enforcement Program.

Saturday, June 14, 2025

Will NERC have to audit the CSPs?

 

The NERC Standards Drafting Team that is working on what I call the “cloud CIP” problem seems to be making progress on CIP-016. This is a new standard that will include requirements that “apply” to the cloud service providers (CSPs) – although compliance responsibility will fall on the NERC entity, of course. When CIP-016 comes into effect (which I continue to believe will be 2031, give or take a year), most of the existing CIP standards will be changed in some way as well.

I believe we’re at least 4-5 years away from having all the required changes to the CIP standards (and also to the NERC Rules of Procedure – see below) drafted, balloted (multiple times) and approved by NERC and FERC. Thus, it’s too early to be overly concerned about the details of the new CIP requirements that are being discussed today. However, I’m pleased to see that the SDT is at least starting to debate draft requirements, since doing that will lead them into confronting the big issues they will need to settle before they can even think of drafting the final requirements.

One of the biggest of those issues is that of requirements that apply to the CSPs. In this regard, the SDT is facing a situation almost exactly like the one faced by the SDT that drafted what became CIP-013-1. That SDT knew that the new standard should require good cyber behavior on the part of third-party suppliers of BES Cyber Systems, but at the same time they knew that neither NERC nor FERC has any jurisdiction over those suppliers; any new requirements would have to apply to the NERC entities themselves.

However, they also knew that FERC had made clear in their order that the new standard couldn’t dictate contract terms to NERC entities. So, how were they going to require the entities to ensure their suppliers put in place adequate cybersecurity protections?

FERC had said they didn’t want NERC to develop “one size fits all” requirements that take no account of the individual situation of either the NERC entity or the supplier. While FERC didn’t explicitly use the word “risk-based”, they were clearly asking NERC to develop risk-based[i] requirements.

This is the course the CIP-013 SDT took; in fact, they took it to a fault. CIP-013-1 R1 Part R1.1 required the NERC entity to develop a “supply chain cyber security risk management plan” (SCCSRMP, although that acronym never caught on) that included “process(es) for the procurement of BES Cyber Systems to identify and assess cyber security risk(s) to the Bulk Electric System from vendor products or services resulting from…procuring and installing vendor equipment and software…”

In other words, CIP-013-1 left it to the NERC entity to “identify and assess” risks posed by each supplier of BCS; left unsaid, but certainly intended, was the implicit requirement to work with the supplier to remediate any risks revealed by the entity’s assessment (e.g., in the supplier’s answers to the questions in a questionnaire).

However, one of the big problems with CIP-013 was that R1.1 didn’t provide any suggestion of what areas of supplier risk might be addressed in the SCCSRMP - the risk of compromise due to improper identity and access management controls, the risk of compromise due to inadequate remote access security, etc. As a result, I hear that a large percentage of NERC entities (with high or medium impact BES environments) simply considered the six items in CIP-013-1 Requirement R1 Part R1.2 to be indicators of the only risks that needed to be addressed in CIP-013; while those six items certainly address real risks, they were never intended to be the only ones the entity should be concerned about.

By contrast, the Cloud CIP SDT seems to be writing requirements for the CSPs directly. Of course, they understand that compliance with the requirements needs to be the responsibility of the NERC entity; however, it won’t be hard to rewrite the requirements so that the entity is responsible for making sure their CSP follows them. It seems that the requirements they’re developing now aren’t risk-based, but they are objective-based (which is NERC’s preferred term). Since you can’t achieve an objective without taking account of risk, I consider the two terms to be roughly equivalent.

The requirements seem to be written under the assumption that each NERC entity will need to negotiate with its CSP (by which I mean their platform CSP – i.e., one of the big boys) regarding what evidence they will provide to the entity come audit time. However, it’s highly unlikely that the platform CSPs will be willing to negotiate with individual NERC entities. After all, their business model is based on offering the same hamburger to every customer, not having a discussion with each one about what they want on it.

On the other hand, if the CSPs are going to have to “comply” with the requirements in CIP-016, there will need to be some compliance assessment process for them. It probably won’t be a true audit, but it will be a review of evidence of compliance with each requirement. However, it’s very likely that each platform CSP will demand to be audited by just one organization, not 100.

This is why I’ve already said that I see no alternative to having NERC (or a third party engaged by NERC) conduct an audit of each CSP on behalf of all NERC entities. Of course, the audit will just cover whatever CIP standard(s) specifically targets CSPs (the current CIP standards will hopefully survive virtually unchanged, but for on-premises systems only). NERC will gather all the evidence from each CSP and make sure it’s complete and relevant, but they won’t pass judgment on whether the CSP is in compliance with each requirement.

Instead, NERC will pass the evidence from each CSP to entities who utilize that CSP for their medium or high impact BES systems. It will be up to each NERC entity to determine whether their CSP has complied with each of the requirements; if they determine their CSP has not complied with more than a few requirements, it will be up to them to decide whether, and under what conditions, they will continue to utilize that CSP.

For example, if a CSP has multiple deficiencies, the entity will need to decide whether to a) switch to another CSP, b) continue with this CSP but try to work with them to mitigate the deficiencies, or c) ignore the deficiencies entirely and keep utilizing the CSP. All three of these options are acceptable courses of action for a NERC entity, but they will need to justify their decision to the CIP auditors.

Most importantly, NERC will not issue (or deny) a certification for a CSP based on their audit results; if NERC did that, it would most likely be a violation of antitrust law. Moreover, the decision whether to use, or to continue using, a CSP will always be subject to many factors that are specific to the NERC entity. There is no way that NERC or any other organization could make the decision for a NERC entity whether to contract with a particular CSP.

Therefore, I think it is very likely that the SDT will conclude at some point (although it might take them 1-2 years to get there) that NERC will have to conduct the audits of the platform CSPs. However, there’s one huge fly in this ointment: This whole process is almost certainly not allowed (either explicitly or implicitly) by the current NERC Rules of Procedure. This means the RoP will need to be revised before the new or revised Cloud CIP standards come into effect.

What’s the process for changing the Rules of Procedure? If there is a defined process, it must be in the Rules of Procedure now. And if there isn’t a defined process, it will likely have to be drafted and approved (by both NERC and FERC) and inserted in the RoP; only then will it be possible to follow the new process and make whatever changes are required to permit NERC to audit the CSPs.  Of course, that change will also have to be approved by both NERC and FERC.

All of this is to say that granting NERC the authority to audit the CSPs will most likely require multiple years (two at a minimum, but perhaps three to four); this is why I think that 2031 is, if anything, an overly optimistic estimate of when the Cloud CIP standards will be enforced.[ii]

Since it’s likely that the Rules of Procedure changes will need to be dealt with by some other group within NERC (such as an RoP drafting team?), it would speed up the whole process if the RoP changes were pursued at the same time as the changes in the CIP standards. However, I don’t believe anyone is even discussing RoP changes now, so we can’t count on that happening. This is why I continue to believe that, barring a special intervention by the NERC Board, it will be 5-6 years at least before the “Cloud CIP” standards are implemented.

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve been told I should either accept advertising or charge a subscription fee, or both. However, neither of those options appeals to me. It would be great if everyone who appreciates my posts could donate a $20-$25 “subscription fee” once a year (of course, I welcome larger amounts as well!). Will you do that today?

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] NERC’s term for risk-based is “objectives based”. I think they’re effectively the same thing, since it’s impossible to achieve any objective without taking risk into account.

[ii] This includes an estimate of at least one year for FERC to approve the new or revised standards, plus an implementation period of more than one year. These are in addition to at least two years for the SDT to draft, ballot, respond to comments, and revise the standards; that whole cycle will most likely need to be repeated three times after the first ballot, as it has with all major changes in CIP in the past. At a bare minimum, each cycle will take three months.

I will point out that there is some likelihood that pressure will build on NERC to exercise an “in case of emergency, break glass” provision now included in the Rules of Procedure. This allows the Board of Trustees, in an emergency, to order an expedited process to develop new standard(s) that will bypass the normal process. Since there’s currently not even any discussion about doing this, it’s safe to say that even this scenario will result in multiple years passing before full cloud use by NERC entities for their OT environments is permitted by the CIP Reliability Standards.

Thursday, June 12, 2025

Please subscribe!

 

My name is Tom Alrich. In 2013, I started writing a blog about upcoming changes in the NERC CIP cybersecurity standards for the electric power industry. Since then, I have written over 500 posts about CIP and about 700 on other cybersecurity topics. I estimate that I have around 1,000 to 2,000 regular readers worldwide, with 20,000-30,000 pageviews per month.

I also lead the OWASP SBOM Forum and the OWASP Vulnerability Database Working Group. These groups are currently focused on two issues, which I also discuss extensively in my blog. The two issues are:

  • How to address the lack of machine-readable software identifiers in most new CVE vulnerability records, especially in the National Vulnerability Database (NVD).
  • How to design, fund and implement a free Global Vulnerability Database (GVD). This will provide a single "intelligent front end" to major vulnerability databases worldwide, without requiring creation of a hugely expensive single database.

The other area on which I have been, and will continue to be, focused is the NERC CIP cybersecurity standards. The biggest concern in CIP compliance today is the fact that the larger electric utilities and IPPs are currently "forbidden" to utilize cloud services for their OT assets - while at the same time, software developers are continually moving toward cloud-only delivery of their software.

This is obviously not a sustainable situation. Last year, a new NERC Standards Drafting Team started working on new and/or revised CIP standards to address this problem. I will continue to write about the major issues involved with the new standards, as well as how electric utilities can utilize the cloud today.

In my 12 years of writing this blog, I have been told many times that I should either accept advertising or charge a subscription fee. Neither of those options appeals to me. However, this is becoming an increasingly untenable situation, since I can't continue writing the blog without some financial support.

I would very much appreciate if everyone who reads my posts could donate a $20-$25 “subscription fee” once a year (of course, I welcome larger amounts as well!). Can you help this blog continue?

Thank you! 

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Tuesday, June 10, 2025

Rebecca Smith warned us, but we didn’t listen – and paid a big price. Now she’s gone.

Note: there are multiple links to Wall Street Journal articles in this post, most of which are behind a paywall. If you would like to read any of these articles, please email me and I’ll send you a PDF of them.

On July 23, 2018, the Wall Street Journal published an article[i] by a reporter named Rebecca Smith (whom I had never heard of at the time) titled “Russian Hackers Reach U.S. Utility Control Rooms, Homeland Security Officials Say”. The article described an online web presentation that had been given by Jonathan Homer of DHS that day. That presentation was repeated three more times within two weeks.

The first four paragraphs of the article were quite startling:

Hackers working for Russia claimed “hundreds of victims” last year in a giant and long-running campaign that put them inside the control rooms of U.S. electric utilities where they could have caused blackouts, federal officials said. They said the campaign likely is continuing.

The Russian hackers, who worked for a shadowy state-sponsored group previously identified as Dragonfly or Energetic Bear, broke into supposedly secure, “air-gapped” or isolated networks owned by utilities with relative ease by first penetrating the networks of key vendors who had trusted relationships with the power companies, said officials at the Department of Homeland Security.

“They got to the point where they could have thrown switches” and disrupted power flows, said Jonathan Homer, chief of industrial-control-system analysis for DHS.

DHS has been warning utility executives with security clearances about the Russian group’s threat to critical infrastructure since 2014. But the briefing on Monday was the first time that DHS has given out information in an unclassified setting with so much detail. It continues to withhold the names of victims, but it now says there were hundreds of victims, not a few dozen, as had been stated previously.

The article went on to describe how the attackers fairly easily penetrated vendors to the utility industry, pointing out that “It was a relatively easy process, in many cases, for them to steal credentials from vendors and gain direct access to utility networks.” While the article makes it abundantly clear that attackers penetrated the control networks (“control rooms”, presumably meaning Control Centers) of electric utilities, it isn’t completely clear whether the “hundreds of victims” were all utilities or whether some of them were vendors. In any case, it does seem that many utility control networks were penetrated and that DHS thought the attackers were in a position to cause outages if they wanted to.

The next day, Tuesday, July 24, Rebecca’s article appeared in the WSJ’s print edition. It is no exaggeration to say that it caused a firestorm of reactions in the media, both in the US and abroad; there was general agreement that Rebecca had described a huge threat to US national security. My initial reaction was in a post on Wednesday. I expressed skepticism about the article, although my main point was that these attacks showed the vital importance of CIP-013, the supply chain security standard that FERC had ordered in 2016. By that time, it had been drafted and approved by NERC, but not yet by FERC.

But there was another reaction on Wednesday. I described it in my post on Thursday: “I learned today, from an article on Power Magazine’s web site, and confirmed with a source who knew the contents of Congressional briefings by DHS, that the true number of assets compromised was… envelope, please… one. And by the way, it was a very insignificant generating plant whose loss would have no impact on the grid.”

This was interesting. It seems that DHS was suddenly trying to take back the presentation they had just given on Monday. However, despite their efforts, the same presentation was repeated unchanged by Mr. Homer once that week and twice the following week. I won’t go through everything that happened in the next few weeks, but I will point out that DHS, now that they had walked back what Homer said once, decided that wasn’t enough.

The next week, DHS held a meeting in New York, attended by Vice President Pence, Secretary Nielsen of DHS, Secretary of Energy Perry, CEO Tom Fanning of Southern Company, and Chris Krebs of DHS (a year before he put together CISA). In that meeting, Mr. Krebs went one better than the story DHS had put out the previous week: He said it wasn’t even an insignificant generating plant that was compromised. It was just two wind turbines!

Despite this contradiction, I believed the general tenor of both the DHS and Krebs statements was true. On August 7, I published this post. It asserted that either Jonathan Homer had been lying in his presentation, or Rebecca had completely misunderstood what he was talking about. In the post, I made some unfair, and – frankly - misogynistic, statements about Rebecca that I regret to this day. My next two posts on this topic continued to drink the DHS Kool-Aid and deprecate what Rebecca had written.

However, on September 4 I changed my tune in this post. I did this partly because the contradictions in the DHS story had become impossible to reconcile. However, the main reason I changed my position was that Rebecca had emailed me the previous Friday (it turns out she had been a longtime reader of my blog). I apologized to her for my statements in the August 7 post and she accepted my apology. Since she was one of the two WSJ reporters who disclosed the Enron scandal and wrote the definitive book on Enron’s collapse (of which her reporting was one of the main causes), I imagine she’d seen far worse statements written about her. I described what she said on the phone in my September 4 post, although she wouldn’t let me reveal her name.

In that call, Rebecca made it clear that she didn’t misunderstand Jonathan Homer’s statements (in fact, she had listened to the two repeat webinars that Homer presented the following week. She said all three webinars were close to identical, although the one she wrote about in her article had some technical problems). In the post, I paraphrased what she said:

…the people at DHS who made the statements at the briefings…really meant what they were trying to say (notwithstanding the fact that they confused control centers with control rooms): that the Russians had penetrated more than one utility control center, where they actually could control the flow of power on the grid itself. That meant to me that they had penetrated the Energy Management System (EMS). This system forms the core of the mission of most electric utilities, since it allows them to control power flows over their network.

The same day that I talked with Rebecca, a well-respected CIP auditor (who also had to remain anonymous) pointed out to me that a screenshot captured by the Russian attackers and displayed in Jonathan Homer’s webinar showed that the Russians had penetrated a combustion turbine (CT) gas plant. CT plants are usually fairly large, so most industry observers wouldn’t consider one to be just an “insignificant plant” – the phrase DHS had used to describe the single plant that was penetrated by the Russians. Of course, a CT plant would also never be mistaken for “just two wind turbines”.

In other words, the one piece of documentary evidence contradicted both of the stories that DHS had told so far, as they struggled mightily (but unsuccessfully) to walk back the main content of Jonathan Homer’s presentation.

From that day on, I believed Rebecca, and we became good friends, although we only met once in person – at the 2019 RSA Conference. I was devastated to read in the WSJ just before Christmas of 2023 that she had passed away. I strongly recommend you read the WSJ obituary on her. She had a very remarkable career and took the time to really understand the electric power industry. The last piece of hers that I read was one of a series in 2021 about the disastrous pricing decisions made by the Texas Public Utilities Commission and ERCOT (the Texas grid operator) at the height of the crisis early in the morning of February 15, 2021. As always, it was well researched and thought out.

There is a lot more to the story of the Russians and the US power grid in the next couple of years (2019 and 2020). While I hope to write a full blog post (or even two) about those events soon, I want to point out three highlights, as well as the lesson that I think can be learned from all of this:

1. On January 10, 2019, Rebecca and Rob Barry of the WSJ published a journalistic tour de force: a very well-researched article on how the Russians had penetrated a number of electric utilities (five were named, but there were clearly more victims) using supply chain attacks (you can read my post about the article here). What was quite interesting about this was that the Russians didn’t seem to have been trying to cause the Big One: a cascading outage like the 2003 Northeast Blackout (I had always assumed this was their Holy Grail). Thus, they didn’t have to limit themselves to trying to penetrate vendors of Energy Management Systems (EMS), which are presumably the only vendors that would have access to those systems.

Instead, the Russians penetrated an excavating company, a technical magazine publisher, a small construction company, some small power generation companies, and other small vendors. Their ultimate targets were still control systems operated by electric utilities, but now the utilities targeted were small ones. None of these utilities would by themselves have been able to cause a cascading grid outage if compromised, but most of them served military bases. Clearly, the Russians were positioning themselves so they could disrupt a US military response in case of war.

2. A while after the July/August 2018 blowup caused by Rebecca’s article, it became clear to me that the various statements by DHS trying to walk back Jonathan Homer’s presentation were probably not just the result of individuals trying to “defend DHS” as best they could (although Mr. Homer remained in the employ of DHS for at least another year). Instead, an order must have come down from the highest level of the federal government to the effect that a) Homer’s story needed to be discredited, and b) DHS should stop investigating Russian cyber attackers, as they had been doing for several years. It’s very likely the penalty for disobeying the order was being immediately fired.

Indeed, it seems that order was very successful. Before July 2018, I had read a lot of news stories about what DHS had learned about Russian cyberattacks on the grid; after July 2018, those stories ceased. In fact, at a meeting in conjunction with the 2019 RSA Conference, I asked a Director in DHS, who had been discussing their cyber capabilities, about Jonathan Homer’s presentation and the multiple walkbacks that DHS had attempted. When I brought this up, he turned white as a sheet and stammered out some incoherent statements, including "We don't do technical investigations." Really, DHS (this was still before CISA was formed) isn’t capable of doing technical investigations?

3. However, there were a huge number of news articles and blog posts (including mine) starting in December 2020 about by far the most successful targeted Russian attacks on the US federal government (as well as many other organizations, both US-based and international): the SolarWinds supply chain attacks.

What fascinated me most about those attacks was the amazing degree of organization and planning that went into the attack on the SolarWinds development environment; here is my post on that topic. In fact, Microsoft stated that probably 1,000 Russian engineers worked on the attack (which took a year and a half and only ended when the attack was discovered by chance by FireEye). But here’s the big question: With such a massive effort underway in the Russian hacker community, why didn’t the US have any clue that this was going on until the Russians had been helping themselves to secrets they found inside US government and private networks for more than six months?

Although it took me a while to realize it, I now see that the answer to that question is simple: If you tell your investigators not to investigate one country anymore on pain of losing their job, you’ll end up with what you wanted – no investigations. Given how destructive the SolarWinds attacks were and the fact that we’ll never know how many secrets walked out the door into Putin’s hands during those attacks, we have all paid (and will continue to pay) a big price for that policy. I wish we all, including me, had listened to Rebecca in 2018. 

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve been told I should either accept advertising or charge a subscription fee, or both. However, neither of those options appeals to me. It would be great if everyone who appreciates my posts could donate a $20-$25 “subscription fee” once a year (of course, I welcome larger amounts as well!). Will you do that today? 

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] If you run into a paywall on this or any other WSJ link in this post, send me an email and I’ll send the PDF of the article to you.

Sunday, June 8, 2025

Rules for the Global Vulnerability Database


I recently described my idea for a Global Vulnerability Database. The GVD won’t be a database at all, but rather an “intelligent switching hub” that accepts vulnerability queries of the form:

“What Vulnerabilities are found in Product ABC?”, or

“What Products are affected by Vulnerability 123?”

The Product and Vulnerability fields are both intended to be as universal as possible; that is, they should accept all major machine-readable identifiers. For example, the Vulnerability field will accept CVE, OSV, GHSA (GitHub Security Advisory), and other vulnerability identifiers. The Product field will accept CPE, purl, OSV, and perhaps other product identifiers.
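To make the idea concrete, here is a minimal sketch of the two query shapes the GVD front end would accept. Since the GVD doesn't exist yet, everything here – the type names, the field names, the schemes listed – is my own assumption, not a published API.

```python
from dataclasses import dataclass
from typing import Literal

# Any of the major machine-readable identifier schemes may appear in a query.
ProductScheme = Literal["cpe", "purl", "osv"]
VulnScheme = Literal["cve", "osv", "ghsa"]

@dataclass
class ProductQuery:
    """'What vulnerabilities are found in Product ABC?'"""
    scheme: ProductScheme
    identifier: str   # e.g., "pkg:pypi/django@5.2" or a full CPE 2.3 name

@dataclass
class VulnQuery:
    """'What products are affected by Vulnerability 123?'"""
    scheme: VulnScheme
    identifier: str   # e.g., "CVE-2025-12345" or a GHSA ID

q1 = ProductQuery(scheme="purl", identifier="pkg:pypi/django@5.2")
q2 = VulnQuery(scheme="cve", identifier="CVE-2025-12345")
```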

While this was not always the case, it is safe to assume that today every major vulnerability database accepts and/or outputs machine-readable vulnerability identifiers, product identifiers, or both. However, in this regard there are two important differences between the GVD and other vulnerability databases:

1.      With one notable exception[i], it is unlikely there is any vulnerability database today that, in response to a query for vulnerabilities that affect Product ABC, will provide more than one type of vulnerability identifier - for example, both CVE and GHSA. Moreover, with the same exception, it is unlikely there is any vulnerability database today that, in response to a query for products that are affected by a particular vulnerability (e.g., CVE-2025-12345), will provide more than one type of product identifier, e.g. purl and CPE. This is because most vulnerability databases are designed to associate a single type of product identifier with a single type of vulnerability identifier. For example, the NVD only associates CPE names for products with CVE numbers for vulnerabilities; the OSS Index open source database only associates purl identifiers with CVE numbers; etc.

2.      It is also safe to say there is no vulnerability database today that will respond to a query like “Show me vulnerabilities of all types that affect Product ABC”, by displaying all major types of vulnerability identifiers. It’s also safe to say there’s no vulnerability database today that will respond to a query like, “Show me products of all types that are affected by CVE-2025-12345”, by displaying all major types of product identifiers. Yet, my ambition is that the GVD will do both of those things.

However, there is a potential fly in this ointment: there is no way to create an unambiguous mapping either between different types of vulnerability identifiers (e.g., CVE to OSV) or between different types of product identifiers (e.g., CPE to purl). Here are several examples:

A. Most vulnerabilities are assigned to products as part of a coordinated vulnerability disclosure process. For example, an open source project (“Project 1”) might report a new vulnerability they have identified in their product to the CVE Program. A CVE Numbering Authority (CNA) will create a new CVE record for the vulnerability and assign it a CVE number like CVE-2024-56789. If the project team also registers the new vulnerability with GitHub, it will receive a GHSA identifier as well. Given that the same team is responsible for both registrations for the vulnerability (CVE and GHSA), the two registrations will usually be considered to identify the same vulnerability.

B. However, if a separate open source project registers a similar vulnerability as a GHSA and asserts it is the same as the vulnerability described in CVE-2024-56789, this assertion may meet with skepticism in the CVE Program, since the two registrations were not by the same team. Since there is no easy way to resolve a dispute like this, the only safe policy is to accept two registrations as being for the same vulnerability only if they were both created by the same organization or person. If that is not the case, the two registrations need to be considered different vulnerabilities.

C. Libraries are widely used by both open source and commercial developers. Usually, a vulnerability will be present in just one module of a library, not all of them. However, since CPE names identify the product that contains the vulnerability, and the library itself is the product, a CPE name will not usually refer to the vulnerable module[ii].

By contrast, purl (“package URL”) identifies a package. Since each module of a library is its own package, this makes it possible to identify the location of a vulnerability with much more precision.[iii] Thus, there can be no CPE “equivalent” of a purl that references a single library module.
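To make example C concrete, consider the log4j library. The identifier formats below follow the public CPE 2.3 and purl specifications, but the example itself is mine, chosen purely for illustration:

```python
# Illustration of example C, using the log4j library.

# A CPE name identifies the library (the "product") as a whole:
cpe_name = "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*"

# A purl identifies one package. log4j is distributed as separate Maven
# packages, so the module containing a given vulnerability (here, log4j-core)
# gets its own purl:
purl_vulnerable = "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"
# ...while a sibling module that does not contain that vulnerability has a
# different purl entirely:
purl_other = "pkg:maven/org.apache.logging.log4j/log4j-api@2.14.1"

# There is no CPE "equivalent" of purl_vulnerable: nothing in the CPE name
# distinguishes log4j-core from log4j-api.
```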

The primary lesson to be drawn from the above examples is that, because there are so many reasons why one type of vulnerability or product identifier will not be “translatable” to another type, it would be a bad idea to try to “harmonize” the identifiers into one type – for instance, making purl the “universal” product identifier or CVE the “universal” vulnerability identifier, with all other identifiers “translated” to one or the other. On the other hand, when it might benefit a user to learn about a vulnerability or vulnerable product that is similar to the one included in the response to their query, the GVD will usually provide both the exact match and the similar one.

This means that, even though the user will usually enter a straightforward query that lists just one or two product identifiers, the response will not necessarily be limited to the same identifiers. The GVD will always assume that the user is interested in seeing as much relevant information as possible, even if they end up discarding some of what they are shown.[iv]

Here are two examples of how a single query might work:

Query 1: “What current vulnerabilities have been identified in the open source project Django version 5.2?”

The query is parsed into three queries to three vulnerability databases:

·        To the NVD: “What vulnerabilities affect Django version 5.2?” The response to this query is this list of four CVE numbers. Each of those can be queried separately for more information on the vulnerability.

·        To GitHub Advisory Database (GAD): “What vulnerabilities affect Django version 5.2?” The response to this query is this list of two CVE numbers, which are both included in the NVD response. The first of the two CVEs corresponds to the GitHub ID GHSA-7xr5-9hcq-chf9, which can be searched on separately. The second CVE corresponds to GHSA-8j24-cjrq-gr2m, which can also be searched on separately.   

·        To Sonatype OSS Index: “What vulnerabilities apply to purl pkg:pypi/django@5.2?”[v] The response to this query is this list of two CVEs. These are the same CVEs shown by the GitHub Advisory Database. However, clicking on either of the CVE lines provides additional information not provided by either the NVD or GAD.

All three results will be provided to the user, as well as results from queries to any other vulnerability database, such as OSV, if different results are obtained. Note that, while the NVD and GAD queries are identical, the OSS Index query uses the purl for Django v5.2.[vi]
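Here is a minimal Python sketch of that fan-out, using the public NVD 2.0 and Sonatype OSS Index endpoints. Treat the parameter and field names as assumptions to be verified against each service’s current documentation; a real front end would also add an API key, rate limiting, and the GitHub Advisory Database query.

```python
import requests

def query_nvd(keyword: str) -> dict:
    """Keyword search of the NVD (API 2.0)."""
    r = requests.get(
        "https://services.nvd.nist.gov/rest/json/cves/2.0",
        params={"keywordSearch": keyword},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()

def query_oss_index(purl: str) -> list:
    """Component report from Sonatype OSS Index, keyed by purl."""
    r = requests.post(
        "https://ossindex.sonatype.org/api/v3/component-report",
        json={"coordinates": [purl]},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()

# The same user question, expressed once per database:
responses = {
    "NVD": query_nvd("django 5.2"),
    "OSS Index": query_oss_index("pkg:pypi/django@5.2"),
}
# A real GVD front end would merge these (plus the GAD response)
# before displaying them to the user.
```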

Query 2: “What products are affected by CVE-2021-45046?”

The query is parsed into two queries to two vulnerability databases:

·        To the NVD: “What products are affected by CVE-2021-45046?” The response to this query identifies twelve “Known affected software configurations”, which among them list over 50 CPE names.

·        To the GitHub Advisory Database: “What products are affected by CVE-2021-45046?” The response to this query illustrates that a list of machine-readable software identifiers is not always available. The primary feature of this page is its set of references – security advisories from various developers and manufacturers, including patch URLs. These references need to be parsed “manually”.

Of course, even though the response from the NVD includes machine-readable software identifiers and the response from the GAD does not, that doesn’t mean the two responses should not be displayed together. Both responses provide a set of references, and it is unlikely that the two sets are identical. Since most queries about CVE-2021-45046 are probably motivated by a search for a patch (this is one of the vulnerabilities associated with the log4shell vulnerability in the log4j library), users will want to see as many references as possible.
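Here is the corresponding sketch for Query 2, again against the NVD 2.0 API. The JSON field names follow my reading of the published schema and should be verified against it.

```python
import requests

r = requests.get(
    "https://services.nvd.nist.gov/rest/json/cves/2.0",
    params={"cveId": "CVE-2021-45046"},
    timeout=30,
)
r.raise_for_status()
cve = r.json()["vulnerabilities"][0]["cve"]

# Walk the "Known affected software configurations" down to the CPE names.
cpe_names = [
    match["criteria"]
    for config in cve.get("configurations", [])
    for node in config.get("nodes", [])
    for match in node.get("cpeMatch", [])
]
# Collect the references, which often include patch URLs.
references = [ref["url"] for ref in cve.get("references", [])]

print(f"{len(cpe_names)} CPE names, {len(references)} references")
```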

The moral of this story is that a query to the Global Vulnerability Database will usually yield multiple responses. These will include

1.      Responses from databases other than the one originally intended in the query, as well as

2.      Responses generated from queries using identifiers that are similar to, but not the same as, the identifier used in the query.

Of course, the additional queries will not be generated by some mechanistic process, but rather by an intelligent process that will run in the “front end” of the GVD. Does this mean that the front end will run a large language model – i.e., generative AI? No. My opinion (which I’ll be glad to discuss with anybody who thinks differently) is that the decisions on alternative queries in the GVD need to be based on a set of identifiable rules that can be audited.[vii]
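To show what I mean, here is a minimal Python sketch of a rules-based query expander: each alternative query records the rule that produced it, so any rule change can be audited (and reverted if it doesn’t help). The two rules shown are invented for illustration and are certainly not GVD policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AltQuery:
    query: str    # the alternative query to run
    rule_id: str  # the rule that generated it, for the audit trail

def expand(product: str, version: str) -> list[AltQuery]:
    """Apply each rule in order; every alternative records its rule."""
    alternatives = []

    # Rule R1 (illustrative): also search the previous minor version,
    # since advisories sometimes lag a release.
    major, minor = version.split(".")[:2]
    if int(minor) > 0:
        alternatives.append(
            AltQuery(f"{product} {major}.{int(minor) - 1}", "R1-prev-minor")
        )

    # Rule R2 (illustrative): try the purl form for ecosystems the
    # product is known to ship in (a hypothetical lookup table).
    known_types = {"django": ["pypi"]}
    for ptype in known_types.get(product.lower(), []):
        alternatives.append(
            AltQuery(f"pkg:{ptype}/{product.lower()}@{version}", "R2-purl-type")
        )
    return alternatives

for alt in expand("Django", "5.2"):
    print(alt.rule_id, "->", alt.query)
# R1-prev-minor -> Django 5.1
# R2-purl-type -> pkg:pypi/django@5.2
```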

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve been told I should either accept advertising or charge a subscription fee, or both. However, neither of those options appeals to me. It would be great if everyone who appreciates my posts could donate a $20-$25 “subscription fee” once a year (of course, I welcome larger amounts as well!). Will you do that today?

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. And please donate as well!


[i] The exception is the OSV vulnerability database.

[ii] In some cases, the person who creates the CPE name creates a “product name” that includes the names of both the library and the vulnerable module. However, there is no consistent procedure for doing this, so it cannot be used for an automated response.

[iii] Because software developers often do not install library modules that are not directly used by their product, many library patches are issued and applied needlessly: the vulnerable module was never included in the product in the first place. This was the case with the log4shell vulnerability in the log4j library.

Log4shell affected just the log4j-core module, meaning any developer that had not installed that module didn’t need to patch the library. However, since vulnerability advisories that referred to the CPE name (and thus designated only the log4j library as vulnerable, not the log4j-core module) didn’t capture this subtlety, many developers in this category probably patched anyway.

[iv] Since some users will not be interested in seeing close matches, a GVD user will be able to suppress display of any match except an exact one. In that case, the output they receive will be close to what they will receive from a search on a single database.

[v] A purl can be easily created using a simple formula and information that a user should have readily available (or else be able to find quickly). In this case, the user just needs to know the package name, version number, and the repository from which they downloaded the package. The repository (known as the purl “type”) is PyPI, which stands for Python Package Index.

[vi] Every purl has a “type” that usually indicates the repository from which the software was downloaded. The purl in this example has the type “pypi”, which refers to PyPI, the Python Package Index. If Django is not available in repositories other than PyPI, there is only one possible purl to use in a search for Django in OSS Index. However, if Django were available in other repositories (e.g., other package managers), each of those could be used for a separate search in OSS Index, simply by replacing “pypi” with the type for the other package manager and re-running the search.

While it might seem odd to search the same vulnerability database multiple times for the same product name and version number, there is a good reason for doing this: there can be no assurance that a vulnerability that applies to a particular product/version in one package manager will also apply to the “same” product/version in a different package manager. In other words, purl treats products with the same name and version number as different products if they are found in different repositories.
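As a small illustration of this repeated search (again using packageurl-python), re-running the query per repository just means swapping the purl type; the list of candidate types below is hypothetical.

```python
from packageurl import PackageURL  # pip install packageurl-python

candidate_types = ["pypi", "conda"]  # hypothetical list of repositories carrying Django
for ptype in candidate_types:
    purl = PackageURL(type=ptype, name="django", version="5.2").to_string()
    print(purl)  # each purl would drive its own OSS Index search
# pkg:pypi/django@5.2
# pkg:conda/django@5.2
```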

[vii] This is like an early type of AI called an “expert system”. These systems were literally created by interviewing an expert in a certain process (e.g., operation of a machine in a manufacturing plant) and codifying their advice into a set of rules. A simulation of the process would then be run, governed by these rules; the rules would be iteratively tweaked to improve the outcome of the process. Once the process was running smoothly in the simulation, the rules would be tested on the physical process itself.

The most important aspect of this procedure was that any change in the rules could be audited. If a rule was changed but that didn’t improve the process, the change would be backed out and a different change would be tried.
