Monday, February 24, 2025

Why I don’t like security risk scores very much


In some disciplines, risk is an ancillary consideration, meaning that mitigation of risk is more of a necessary distraction than the primary goal. For example, the main objective of a contractor building an office building is to follow best construction practices, basing their work on the architect’s design. However, there are always imperfections in a design, in the materials used, in the people charged with performing the work, etc.

These imperfections introduce risk into the construction process. The contractor needs to constantly identify these risks and take steps to mitigate them, e.g. by closely supervising less experienced employees. However, mitigating these risks isn’t the contractor’s main job; their main job is getting the building up on time and within budget.

However, security (whether physical or cybersecurity) is all about risk mitigation. The world is full of security risks. A security professional’s main – in fact, their only – job is to mitigate those risks to the greatest extent possible. Of course, the professional is doing this to make other processes run more efficiently and with less danger to participants in the process.

For example, a cybersecurity professional at a bank is charged with ensuring that the daily activities of the bank are carried out without funds being lost in cyberattacks. If there were no cybersecurity threats to these activities – as would have been the case 100 years ago – there would be no need for this person to be employed by the bank.

Since there are almost an infinite number of cybersecurity risks to be mitigated, it turns out that one of the primary jobs of a cybersecurity professional is prioritizing those risks based on their magnitude. It stands to reason that big risks should be mitigated before small risks – and some small risks aren’t worth mitigating at all. This means it is hard for this person to do their job if they can’t easily measure the risks they face.

If cybersecurity were a physical science, measuring risk would not be a problem. For example, if a bank cyber professional wanted to determine the degree of risk that a particular type of account would be hacked, they would “simply” a) identify all the possible vectors by which a hacker could penetrate the account, b) determine exactly the degree of damage that is possible from exploitation of each of those vectors (measured in dollars lost), and c) add all those damage amounts together.

The sum would be the maximum possible loss from hacking attacks on that type of account. The cybersecurity professional would then design controls that would block each possible attack vector, keeping in mind that the cost of the controls can’t exceed the maximum possible loss (if it did, the bank would be better off buying insurance against cyberattacks, while implementing minimal controls).
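To see what this idealized calculation would look like, here is a toy sketch; the attack vectors and dollar amounts are invented purely for illustration, not taken from any real bank.

```python
# A toy version of the "physical science" calculation described above: sum the
# maximum loss per attack vector, then compare the total to the cost of controls.
# All names and numbers are invented for illustration.
attack_vectors = {
    "Credential stuffing against online banking": 2_000_000,
    "Phishing of account managers":               1_500_000,
    "Compromise of a third-party processor":      3_500_000,
}

max_possible_loss = sum(attack_vectors.values())
cost_of_controls = 1_200_000

print(f"Maximum possible loss: ${max_possible_loss:,}")
if cost_of_controls > max_possible_loss:
    print("Controls cost more than the maximum loss; insurance may be the better buy.")
else:
    print("Controls are justified under this (idealized) calculation.")
```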

However, it is almost never possible to identify all possible attack vectors, especially because hackers are creative people and discover new vectors every day. And even if you could identify all vectors, you could never determine the maximum loss from exploitation of each of the vectors.

Because of this uncertainty, security professionals need shortcuts to measure degree of risk. Perhaps the most popular shortcut is a security score. The score incorporates components that can contribute to the risk. For example, the CVSS Base Score for a software vulnerability includes attack vector, attack complexity, privileges required, user interaction, confidentiality impact, integrity impact, availability impact, and scope. Each component is assigned a weight, which presumably can be adjusted (by the members of FIRST.org, which maintains CVSS) based on experience.
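For concreteness, here is a small sketch that pulls those components out of a CVSS v3.1 vector string. The vector shown is only an example and isn’t tied to any particular CVE.

```python
# A minimal sketch: split a CVSS v3.1 vector string into its named metrics.
# The vector below is an arbitrary example, not taken from any real CVE Record.
vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"

names = {
    "AV": "Attack Vector", "AC": "Attack Complexity", "PR": "Privileges Required",
    "UI": "User Interaction", "S": "Scope", "C": "Confidentiality Impact",
    "I": "Integrity Impact", "A": "Availability Impact",
}

version, *metrics = vector.split("/")
for metric in metrics:
    key, value = metric.split(":")
    print(f"{names[key]}: {value}")
```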

Other security scores are similar. For example, security scores for product vendors are based on previous attack history, beaconing activity from bots that are “phoning home” from the vendor’s network, controls that are in place or should be in place, etc.

All these components contribute to the vendor’s cybersecurity risk, but not in the deterministic way that, for example, the number of cars crossing a bridge every day contributes to the risk that the bridge will fail. It’s at least possible in principle to calculate exactly how much stress each car puts on the bridge and determine how strong the bridge’s supports need to be. Obviously, a bridge that has a small amount of traffic does not need to be built to the same specifications as one that carries thousands of cars and trucks a day; that’s just physics.

However, there’s no way, even in principle, to say that the values of the CVSS components listed above, and the weights assigned to each value by the CVSS formula, lead to a certain value on a 1-10 scale, which of course is what CVSS provides. This is because the CVSS formula is based just on prior statistics and a lot of guesswork. There is no way to precisely model how the degree of risk of a vulnerability is determined.

In other words, risk scores are inherently black boxes that ingest a bunch of inputs and output a score. Even if you know the weight assigned to every component in determining the score, there is no way you can determine that one weight should be X, while another should be Y. Risk scores are inherently not understandable. That’s what bothers me most about them. The fact that CVSS scores have two significant digits is, as far as I can see, just a way to show the creators had a sense of humor. And in fact, most users of CVSS just group the scores into critical, high, medium and low categories.

Of course, I’m certainly not advocating that risk scores be done away with, nor am I saying they shouldn’t be used in most cases. But I do want to point out that it’s possible to develop a simple and understandable model that will track actual security (or lack thereof), if you’re OK with not having a numerical score.

I recently came across an example of this. It’s from a series of YouTube videos by Ralph Langner, who is best known for his analysis of Stuxnet. He was talking about how he finds the components of the CVSS score to be more helpful than the score itself (I’ve heard this from other people as well). He pointed out that tracking the three items below provides “a pretty good filter” for the likelihood that a CVE will be exploited in the wild:

1.      Presence of the CVE in the CISA KEV Catalog, meaning the CVE has already been actively exploited;

2.      Whether exploitation can be accomplished through a routed connection, or whether it requires local access. This is the “attack vector” metric in the CVSS vector string.

3.      Whether exploitation can be accomplished without a complex attack. This is the “attack complexity” metric.

The difference between these three items and a CVSS score (or an EPSS score, which also measures exploitability) is that they’re all readily understandable. They’re not combined in a score and they’re not intended to be. I think what Ralph was saying is that it’s helpful just to look up all three items for every vulnerability that is of concern and consider them together – without worrying about normalizing them, determining weights, etc. Leave that to the scoremakers. 
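Here is a minimal sketch of that three-item filter. It assumes two hypothetical inputs you would already have on hand: a local copy of the KEV catalog as a set of CVE IDs, and the CVE’s CVSS v3.1 vector string. The point isn’t the code; it’s that each output is individually understandable, with nothing normalized or weighted.

```python
# A rough sketch of the three-item "filter" described above. The KEV set and the
# vector string are hypothetical inputs supplied by the caller.

def exploitation_flags(cve_id: str, vector: str, kev_ids: set[str]) -> dict:
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    return {
        "in_kev_catalog": cve_id in kev_ids,                 # already exploited in the wild
        "network_attack_vector": metrics.get("AV") == "N",   # routed connection, no local access needed
        "low_attack_complexity": metrics.get("AC") == "L",   # no complex attack required
    }

# Example with made-up inputs:
kev = {"CVE-2021-44228"}
print(exploitation_flags("CVE-2021-44228",
                         "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H", kev))
```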

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

My book "Introduction to SBOM and VEX" is available in paperback and Kindle versions! For background on the book and the link to order it, see this post.

 

Monday, February 17, 2025

NERC CIP: What lessons can we learn from the failure (so far) of the new BCSI requirements?


I’ve mentioned many times that last year a new NERC Standards Drafting Team started the long process of developing new or revised CIP standards to make full use of the cloud completely “legal” for systems and information subject to NERC CIP compliance. This is the second SDT that has addressed the cloud problem, in whole or in part. In 2019, a similar team started meeting to draft changes to multiple CIP standards that would enable BES Cyber System Information (BCSI) to be stored in the cloud, as long as it was appropriately protected and “securely handled”.

I wrote about that drafting team in this post in 2020. The team ultimately produced two revised CIP standards, CIP-004-7 and CIP-011-3; these came into effect on January 1, 2024. When I wrote the post in 2020 and when the standards came into effect, I thought they were perfect for what they were intended to do: require logical (as opposed to physical) protection for BCSI stored in the cloud. This change was needed to make it possible for a NERC entity to store BCSI in the cloud without falling out of compliance with the CIP standards.

That drafting team was constituted to solve a problem caused by a requirement part (in the then-current CIP-004-6) that mandated physical protection for servers in which BCSI might be stored, if these servers were outside the NERC entity’s Physical Security Perimeter (PSP). Of course, any requirement to protect information, by applying special physical protection for individual devices that it’s stored on, won’t work in the cloud. In the cloud, information moves constantly from server to server and data center to data center; this is required by the cloud business model.

When that drafting team first met, there wasn’t much disagreement about what they needed to do: remove the requirement for special physical protection of BCSI stored outside of a PSP and replace it with a requirement that allowed BCSI to be logically protected instead. In other words, the revised CIP standards would allow NERC entities to protect BCSI at rest or in transit by encrypting it (other methods of protecting the data are permitted, but it’s likely that encryption will almost always be the option of choice). If someone can access the encrypted data but they don’t have the keys needed to decrypt it, they obviously don’t really have “access” to the data.
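As a minimal illustration (not compliance guidance), here is what logical protection by encryption looks like in practice. The example uses the third-party Python cryptography package, and the scrap of “BCSI” is invented.

```python
# A minimal illustration: encrypting a piece of BCSI before it leaves the
# entity's control. Requires the third-party "cryptography" package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()       # the key stays with the NERC entity
cipher = Fernet(key)

bcsi = b"ESP firewall ruleset, relay settings, network diagram..."  # invented example
encrypted = cipher.encrypt(bcsi)  # this ciphertext is what the cloud provider stores

# Anyone who obtains the ciphertext but not the key has no usable "access":
print(cipher.decrypt(encrypted) == bcsi)   # True only for the key holder
```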

It seemed to me that CIP-004-7 and CIP-011-3 were just what the doctor ordered. Therefore, starting in January 2024 I expected to see lots of BCSI moving to the cloud. However, it turns out that the drafting team – and a lot of other people like me – didn’t recognize that merely storing BCSI in the cloud isn’t much of a use case. BCSI is never voluminous, so it can easily be stored within the on-premises ESP and PSP, where it’s very well protected and easily backed up. In itself, cloud storage of BCSI doesn’t solve a problem.

It turns out that the real problem was that the previous requirement for physically protecting BCSI prevented NERC entities from using SaaS that required access to BCSI - especially online configuration and network management applications. After all, SaaS applications never reside in a defined physical location in the cloud, any more than data does. Since BCSI had to be confined to particular physical locations with special protection, cloud-based SaaS could never use BCSI without putting the NERC entity into a position of non-compliance, even if the BCSI was encrypted. This was unfortunate, since few would dispute that encryption provides a much higher level of data security than merely preventing unauthorized physical access.

Thus, I expected there would be a big surge in use of SaaS applications that utilize BCSI when the two revised CIP standards came into effect on New Year’s Day 2024. Yet as far as I know - i.e., relying on statements made by NERC entities and SaaS providers – today (more than one year later) there is literally zero use of SaaS by NERC entities with high or medium impact CIP environments.

Why is this the case? The answer is simple: Ask yourself when you last saw guidance (or at least guidelines) from NERC or any of the NERC Regions on use of BCSI in the cloud…You’re right, no official guidance has yet been published since the two standards came into effect.[i] In fact, I’ve seen close to nothing written by non-official sources, other than my blog posts.

And NERC entities aren’t the only organizations that need guidance on complying with CIP-004-7 and CIP-011-3. The SaaS provider needs to furnish their NERC CIP customers with some simple compliance documentation; there’s been no official guidance on that, either.

However, the group that I feel sorriest for isn’t the NERC entities or the SaaS providers – it’s the drafting team members. How would you feel if you’d dedicated a good part of a couple of years of your life to making changes to the CIP standards, only to find that those changes aren’t being used at all more than a year after they came into effect? This shows that just drafting new or revised CIP standards and getting them approved by NERC and FERC isn’t enough. NERC entities need to clearly understand what they need to do to comply. Moreover, they also need to understand what compliance evidence to require from third parties – in this case, SaaS providers.

This is an especially good lesson for the members of the current CIP/cloud drafting team. They already have a long road ahead of them. If they reach the end of that road and find that the NERC community is rushing to make full use of the cloud, that will be a great feeling. On the other hand, if they come to the end of their road and realize that few NERC entities are even trying to use the cloud in their OT environments – because the new standards are too complicated or because nobody has made an effort to explain them to the community - how do you think they’ll feel? I know how I would feel…

If you are involved with NERC CIP compliance and would like to discuss issues related to “CIP in the cloud”, please email me at tom@tomalrich.com.


[i] NERC endorsed this existing document as “compliance guidance” in late December 2023. However, it wasn’t originally written to be compliance guidance, and its implications for compliance aren’t always clearly stated.

Thursday, February 13, 2025

Better living through purl!

 

The CVE Program is getting ready to consider adopting purl as an alternate software identifier to CPE in CVE Records. If this goes through, software users will be able to use purl to look up open source software products and components that are affected by CVEs. They will be able to do this in several major vulnerability databases, and perhaps later they will be able to do this in the NVD.

However, end users aren’t the only organizations that will benefit from purl being used in the “CVE ecosystem”; in fact, they’re not even the biggest beneficiaries. Here are what I believe are the most important groups who will benefit and why:

1. Software developers that utilize open source components in their products. Unlike CPE, purl currently focuses on just one type of software: open source software distributed by package managers. 90% of the code in most software products today, whether open source or proprietary, consists of open source components. Therefore, it’s important that developers be able to learn about vulnerabilities in those components.

To look up an open source component in a vulnerability database, the developer needs to know the identifier for the product. If the developer wants to use the National Vulnerability Database (NVD), they will first need to search for the CPE name using the CPE Search function in the NVD. However, finding the correct version of the component is challenging. For example, here is the search result for the popular Python product “django-anymail”. You will have to figure out which of the 34 CPE names is the one you need.

On the other hand, if the developer wants to learn the purl for django-anymail, they don’t need to look anything up in an external database. Instead, they just need to know three pieces of information, which they presumably already have:

1.      The purl type, which is based on the package index, PyPI;

2.      The package name, django-anymail; and

3.      The (optional) version string, e.g. 1.11.1.

Using these three pieces of information, the developer can easily create the purl: “pkg:pypi/django-anymail@1.11.1” (“pkg” is the prefix to every purl). Note that no database lookup was required!
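Here is a tiny sketch of that assembly. Plain string formatting is enough to show the idea (the packageurl-python library offers a fuller implementation, including the encoding rules); either way, no lookup of any kind is involved.

```python
# A minimal sketch of assembling a purl from the three pieces of information
# listed above. Real purl construction also involves percent-encoding rules,
# which are omitted here for clarity.
def make_purl(purl_type: str, name: str, version: str | None = None) -> str:
    purl = f"pkg:{purl_type}/{name}"
    if version:
        purl += f"@{version}"
    return purl

print(make_purl("pypi", "django-anymail", "1.11.1"))
# -> pkg:pypi/django-anymail@1.11.1
```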

2. CVE Numbering Authorities (CNAs). As described in this post, a CNA is an organization that reports software vulnerabilities to CVE.org in the form of CVE Records. Today, the only software identifier found in CVE Records is CPE. Unfortunately, more than half the CVE Records created last year – about 22,000 out of 39,000 – don’t contain a CPE and thus can’t be found with simple searches.

However, if purl were also supported in CVE Records[i], the CNA could create the purl for an open source product, exactly as the developer in the earlier example would create it. Meanwhile, a user searching a vulnerability database for the product could create the same purl. Barring a mistake, the user should always be able to locate the same product and thus learn of any vulnerabilities reported for it.

CNAs that report vulnerabilities for open source software will benefit from two other features that are unique to purl:

a.      Every module in an open source library can have its own purl, not just the library itself (which is all a CPE can identify). If a vulnerability is found in only one module of a library, it is much better for the CNA to report the vulnerability as applicable just to that module, because developers often include in their products only the module(s) they require, rather than the whole library. If the CNA reports just the module as vulnerable, a developer that didn’t include that module in their product can (perhaps proudly) announce to their customers that the vulnerability doesn’t affect their product and no customer needs to patch it.[ii]

b.      Similarly, a product may be distributed through multiple package managers, while a vulnerability may affect only the version distributed through one of them. Because a purl identifies the package manager (its “type”), a CVE Record that includes the purl for just that package manager tells users who obtained the product elsewhere that they don’t need to patch. CPE has no field for a package manager, so most users will never learn this; again, the result is often wasted patching effort.

Another advantage that CNAs cite for purl is the fact that a purl doesn’t have to be created by any central authority. Every product available in a package manager (or another repository that has a purl “type”) already has a purl, even if nobody has written it down yet. By contrast, CPEs must be created by a NIST contractor that works for the NVD. They often take days to create (if they are created at all). This delays the CNA in submitting a new CVE Record.

3. End users. Of course, this means the ultimate consumers of vulnerability information, even if they receive it through some sort of service provider. Their primary concern is completeness of the data. That is, they need to know about all the vulnerabilities that affect the software products they use. Of course, there’s hardly any end user today that doesn’t have about a year’s backlog of patches to apply, so it’s not like they are breathlessly anticipating more vulnerabilities.

Any organization with such a backlog owes it to themselves to learn about every newly released vulnerability. They (or perhaps a service provider acting on their behalf) need to feed all those vulnerabilities into some algorithm that prioritizes patches to apply, based on a combination of a) attributes of those vulnerabilities like CVSS or EPSS score and b) attributes of the assets on which the affected software product resides, such as proximity to the internet or criticality for the business.
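Here is a hedged sketch of that kind of prioritization. The weights, the asset attributes, and the second CVE ID are all invented for illustration; a real program would tune these to its own environment and data sources.

```python
# An illustrative prioritization function combining vulnerability attributes
# (CVSS, EPSS) with asset attributes (internet exposure, business criticality).
# The weights and inputs below are made up for the example.
def patch_priority(cvss: float, epss: float, internet_facing: bool,
                   business_critical: bool) -> float:
    score = cvss / 10 * 0.4 + epss * 0.4           # vulnerability attributes
    if internet_facing:
        score += 0.1                                # asset proximity to the internet
    if business_critical:
        score += 0.1                                # asset criticality to the business
    return round(score, 2)

backlog = [
    ("CVE-2021-44228", 10.0, 0.97, True,  True),
    ("CVE-2023-00000", 5.3,  0.02, False, False),   # made-up CVE ID for illustration
]
for cve, cvss, epss, exposed, critical in sorted(
        backlog, key=lambda v: patch_priority(*v[1:]), reverse=True):
    print(cve, patch_priority(cvss, epss, exposed, critical))
```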

CVE is of course the most widely followed vulnerability identifier worldwide. However, just learning about a CVE doesn’t do an organization much good unless they also learn which product or products are affected by the vulnerability – and whether the organization utilizes one of those products. Because of this, CVE Records always identify the affected product(s). When the CNA creates a new CVE Record, they do this in textual form.

After the CNA creates a new CVE Record, they submit it to the CVE.org database. From there, the NVD downloads all new CVE Records. Up until last year, NVD staff members (or contractors working for the NVD, which is part of NIST) added one or more CPE names to almost every CVE Record. They do (or did) this because, with over 250,000 CVE Records today, it’s impossible to learn the products affected by each CVE simply by browsing through the text included in the record. There needs to be a machine-readable identifier like CPE or purl to search on.

But here’s the problem (already referred to above): Almost exactly a year ago, the NVD drastically reduced the number of CVE Records into which they inserted CPE names, with the result that as of the end of the year, fewer than half of new CVE Records contained a CPE name. The problem seems to have continued into this year. A CVE Record that doesn’t include a machine-readable software identifier will be invisible to an automated search using a CPE name. In other words, if someone searches using the CPE name for a product they utilize, the search will miss any CVE Record that describes the same product in a text field, if the record doesn’t include the product’s CPE. Moreover, the person searching won’t have a way to learn about the missing CPE names.

Because of this problem, end users can’t count on being able to learn about all CVEs that affect one of their products. If purl were implemented as an alternative identifier in CVE Records, and if the CNA had included a purl in a CVE Record for the product, then a search using that purl would point out that the product was affected by the CVE. Implementing purl is needed for ensuring completeness of the vulnerability data, at least for open source software (and open source components in proprietary software).

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Also email me If you would like to participate in the OWASP SBOM Forum or donate to it (through a directed donation to OWASP, a 501(c)(3) nonprofit organization).

My book "Introduction to SBOM and VEX" is available in paperback and Kindle versions! For background on the book and the link to order it, see this post.


[i] Currently, the CVE Record Format includes a field for purl. However, it’s not being used at all today, mainly because there’s been no training or encouragement for the CNAs. That will hopefully change soon. 

[ii] In fact, this was the case with the log4j vulnerability CVE-2021-44228. It was reported as affecting the log4j library, but in fact it only affected the log4j-core module. Since a CPE for a library refers to the library itself, not to one of its modules, a lot of time was wasted worldwide in 2021-2022 patching products that included log4j modules other than log4j-core.

Monday, February 10, 2025

How can NERC audit risk-based requirements?

The Risk Management for Third-Party Cloud Services Standards Drafting Team is off to a running start this year.  I say “start”, because during the 5-6 months that the SDT met last year, they didn’t even start to draft any standards. Instead, they were doing what NERC drafting teams sometimes do: re-think the Standards Authorization Request (SAR) that forms their “charter”. They spent the entire fall discussing what they are going to discuss when they start drafting the new (or revised) cloud standards.

There is nothing wrong with doing this, especially when the SDT has an especially weighty burden – and it’s hard to think of a NERC CIP SDT that’s had a weightier burden than this one has, except perhaps the team (called “CSO706” for Cyber Security Order 706) that drafted CIP versions 2, 3, 4 and 5. In fact, one member of that team, Jay Cribb, is a member of the cloud team. The CSO706 team first met in 2008. Their last “product”, CIP version 5 (essentially, the version that’s still in effect today), came into effect in 2016.

In my opinion, one essential attribute for any requirements they create is that they be risk-based. That’s my term, but NERC refers to them as “performance-based”. While some CIP requirements today are truly risk-based (even though they may not mention the word “risk”), others are not.

In fact, a small number of CIP requirements, like CIP-007 R2 patch management and CIP-010 R1 configuration management, are highly prescriptive and require compliance at the level of the individual physical or virtual device. Cloud service providers don’t track systems based on the devices on which they reside, since doing so would break the cloud model. This means they will never be able to provide the evidence a NERC entity customer needs to prove compliance with these prescriptive requirements.

This is why I think all CIP requirements going forward, but especially requirements having to do with use of the cloud, need to be risk-based, and can’t refer (even implicitly) to devices at all. In fact, since CIP v5 came into effect in 2016, I believe that all subsequent CIP requirements and some entire standards, including CIP-012, CIP-013, CIP-014, CIP-003-2, CIP-010-4 and others, have been risk-based (some more than others, truth be told).

The problem with risk-based NERC CIP requirements today is there has been very little guidance to NERC entities or Regional auditors on how to comply with or audit risk-based CIP requirements. This was most vividly demonstrated in the Notice of Proposed Rulemaking (NOPR) that FERC issued in September regarding CIP-013, which is an entirely risk-based standard. In my post on the NOPR, I quoted the following section found near the end:

…we are concerned that the existing SCRM Reliability Standards lack a detailed and consistent approach for entities to develop adequate SCRM (supply chain risk management) plans related to the (1) identification of, (2) assessment of, and (3) response to supply chain risk.  Specifically, we are concerned that the SCRM Reliability Standards lack clear requirements for when responsible entities should perform risk assessments to identify risks and how those risk assessments should be conducted to properly assess risk.  Further, we are concerned that the Reliability Standards lack any requirement for an entity to respond to supply chain risks once identified and assessed, regardless of severity. 

In other words, FERC issued the NOPR because they do not think NERC did a good job of drafting either CIP-013-1 or CIP-013-2. They are considering ordering NERC to revise CIP-013 so it truly requires NERC entities to develop and implement a supply chain cyber risk management plan.

I agree with FERC’s opinion, but I want to point out that just asking NERC to re-draft CIP-013 will not necessarily fix the problem. This is because today NERC entities don’t know how to comply with a risk-based standard within the NERC auditing framework. It is also because most CIP auditors have limited experience in auditing risk-based requirements.

Rather than repeat this sorry story with the new cloud standards, it’s important that NERC and the Regions figure out how risk-based requirements can be audited.

What I call risk-based requirements are what NERC calls “objective-based” requirements. I used to think the two terms were synonymous, but I now realize they’re complementary. A requirement to achieve an objective inherently requires the entity to identify risks it faces; the entity must formulate a plan to assess those risks and mitigate them. Of course, “mitigate” doesn’t mean “eliminate”; it just means “make better”. Since no entity has unlimited resources available, a plan to mitigate risks will always leave some risk on the table; of course, that is called residual risk.

This will be easier to understand if we make up an example. Suppose a contractor has agreed to build a new building. The customer requires them to develop a plan to identify and mitigate the risks that could prevent them from finishing the building on time: inclement weather, materials shortages, etc.

The contractor lists all the risks, assesses the likelihood that each risk will be realized, and determines the impact (in this case, days of delay) if the risk is realized. For each risk, the contractor multiplies the likelihood by the impact; this product is the expected delay contributed by that risk. Summing the products across all the risks gives the total expected delay for the project.
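A toy version of that calculation follows; the risks, likelihoods and impacts are invented for the example.

```python
# Expected delay = likelihood of the risk occurring * impact (days of delay) if it does.
# All values below are made up for illustration.
risks = [
    # (risk, likelihood, impact in days of delay)
    ("Inclement weather",    0.60, 10),
    ("Materials shortage",   0.25, 20),
    ("Key crew unavailable", 0.10, 15),
]

for name, likelihood, impact_days in risks:
    print(f"{name}: expected delay {likelihood * impact_days:.1f} days")

total = sum(likelihood * impact for _, likelihood, impact in risks)
print(f"Total expected delay: {total:.1f} days")   # 6.0 + 5.0 + 1.5 = 12.5
```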

In this example, the objective is finishing the building on time, while the risks are the different possible causes of delay. So, “risk-based” and “objective-based” always go hand in hand. If you’re required to achieve an objective, you must always mitigate risks; and if you need to mitigate risks, the only way to identify them is to know the objective you’re trying to achieve. If there’s no objective, there’s no risk, and vice versa.

In the case of CIP-013, the objective is to secure BES Cyber Systems (and also EACMS and PACS) by making sure the suppliers of those systems follow secure policies and practices. Of course, the risks have to do with suppliers not following secure policies and practices – for example, in software development or adequately vetting their employees.

However, what does CIP-013 require today? Only that NERC entities develop a plan to “identify and assess” supply chain cybersecurity risks to BCS, EACMS and PACS. There is no indication of what those risks are. R1.2 lists six specific controls that NERC entities must practice, but those were never intended to be the entirety of supply chain security risks. Rather, they were six items that FERC mentioned at various places in their 2016 order directing NERC to develop a supply chain standard; the drafting team simply gathered them in one place. However, far too many entities limited their CIP-013 programs to just those six controls and ignored the requirement to “identify and assess” risks altogether. This was one of the main reasons why FERC issued their NOPR last year.

Here's how I would rewrite CIP-013 (and I’ve been saying this for years): I wouldn’t require the entity to take specific actions (although it wouldn’t be the end of the world if R1.2.1 – R1.2.6 were allowed to remain in the standard). However, I would require that the plan address specific areas of risk. These can include secure software development practices, vetting employees for security risks, policies to ensure secure remote access to devices inside an ESP (i.e., not just what CIP-005 R2 requires), etc.

For each of those areas, the entity would need to identify and assess supply chain risks. If they say one of those areas doesn’t apply to them, they would need to explain why. For example, an entity’s reason for not addressing remote access risks might be that it only allows its own employees to access devices in its ESPs remotely.

For all the other areas of risk, the entity will need to identify a set of risks that they will address in their plan. For example, in the software security area of risk, individual risks include an insecure development environment, the supplier not reporting vulnerabilities when the software is being used by customers, etc.
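To make this concrete, here is a purely hypothetical sketch of what such a plan might look like as a data structure. The area names, risks, mitigations and justification are invented for illustration; they are not drawn from any actual CIP-013 plan or guidance.

```python
# Illustrative structure: each area of risk either gets a list of identified
# risks with planned mitigations, or a documented reason it doesn't apply.
supply_chain_plan = {
    "Secure software development": {
        "applies": True,
        "risks": [
            {"risk": "Insecure development environment",
             "mitigation": "Require supplier attestation to secure build practices"},
            {"risk": "Supplier does not report vulnerabilities in fielded software",
             "mitigation": "Contract language requiring coordinated disclosure"},
        ],
    },
    "Remote access to devices inside the ESP": {
        "applies": False,
        "reason": "Only the entity's own employees are permitted remote access",
    },
}

for area, detail in supply_chain_plan.items():
    status = "addressed" if detail["applies"] else f"N/A: {detail['reason']}"
    print(area, "->", status)
```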

How will this process be audited? It will come down to the auditors’ judgment of whether the entity did a good job of identifying and assessing risks in each area of risk. However, a lot of NERC entities are deathly afraid of having to rely on the judgment of their auditors. This isn’t because the auditors don’t exercise good judgment in general (it’s not as if they have lots of car accidents); it’s because NERC won’t ever take a stand on what a requirement means and provide true guidance.

If NERC did that, auditor judgment would become a non-issue, since both the entity and the auditor would rely on NERC’s guidance. A guidance document would list the major areas of supply chain cybersecurity risk, as well as the major risks in each of those areas. For each major risk, the entity would need to a) present a credible plan for mitigating that risk, or b) explain why the risk doesn’t apply in their case.

However, NERC doesn’t issue its own guidance on compliance, because to do so would amount to “auditing itself”. That is, if NERC tells the entity how to comply with a requirement, they would…what? Take all the fun out of compliance by making it a dull paint-by-numbers exercise? If the entity thereby achieves exactly what NERC wants them to achieve by following their guidance, why not encourage that behavior?

I’ve also heard other excuses for NERC’s policy, including that the prohibition is part of GAGAS (Generally Accepted Government Auditing Standards) - although nobody has shown me where it says that. What this discussion usually comes down to is someone saying that the NERC Rules of Procedure (RoP) include an admonition against NERC developing its own guidance. Nobody has shown me that either, although I’ll admit I haven’t asked a lot of people about this.

But let’s stipulate that the RoP does prevent NERC from providing compliance guidance to NERC entities. Why would that be so terrible? After all, since NERC’s guidance will presumably reflect their view of what’s best for the Bulk Electric System, wouldn’t it be better for the BES if all NERC entities followed that guidance than if they didn’t? What is gained by withholding that information?

I think the problem here is that NERC has based their auditing on financial auditing, where it’s very important that auditors not offer guidance that dishonest financial institutions could distort to justify improper practices. However, cybersecurity is inherently a risk management exercise, in which one practice might mitigate one risk but not another; therefore, an auditor needs to exercise judgment regarding whether a particular control is effective in a particular situation. Finance isn’t that sort of exercise.

The moral of this story is that auditing risk-based requirements won’t work without the auditors being able to exercise judgment. Of course, the auditing rules (presumably in the RoP) will need to require that auditors distinguish between an entity that made an honest mistake in managing a risk and an entity that decided to ignore a risk entirely because they didn’t feel like bothering with it.

And this, boys and girls, is why I think the “cloud SDT” needs to be prepared for a very long ride. I think they have to deal with this problem of auditing risk-based requirements, which may require changes to the Rules of Procedure. If they don’t do that, they’ll most likely end up repeating the CIP-013 experience: creating a standard that, even if it’s initially approved by FERC, turns out not to be very effective and ends up requiring a complete rewrite.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

My book "Introduction to SBOM and VEX" is available in paperback and Kindle versions! For background on the book and the link to order it, see this post.


Monday, February 3, 2025

AI and Critical Infrastructure

The time isn’t far off when critical infrastructure (CI) industries, including the electric power industry, will face overwhelming pressure to start using AI to make operational decisions, just as AI is probably already being used to make decisions on the IT side of the house. Even the North American Electric Reliability Corporation (NERC), which drafts and audits compliance with the NERC CIP (Critical Infrastructure Protection) standards, acknowledged that fact in a very well-written document they released last fall.

However, while it’s certain there will be lots of pressure for this in all CI industries, it’s also certain this won’t happen without some sort of regulations being in place, either mandatory (as in the case of NERC CIP) or voluntary, as is likely in CI industries without mandatory cyber regulations in place, like manufacturing. My guess is those industries will develop their own regulations through industry bodies like the ISACs, since the manufacturers themselves are probably as afraid of the harm that aberrant LLMs could cause as everyone else is.

I used to think that AI security regulations for CI would need to be very much in the weeds, with restrictions on how the models can be trained, etc. However, I now realize that trying to do that will be a fool’s errand, since in fact there only need to be four rules:

1.      An AI model can never be allowed to make an operational decision on its own. It can only advise a human, not make the decision for them.

2.      The human can’t face a time limit – that is, there can’t be an arrangement under which, if they don’t decide within X minutes, the model decides for them.

3.      If the human doesn’t make the decision at all, the model can’t raise any objections. We don’t need humans succumbing to “peer pressure” from LLMs!

4.      The human can’t be constrained by policies to accept the model’s recommendation. The decision must be theirs alone, including the decision not to do anything for the time being.

Of course, you might be wondering about time-critical decisions, like the millisecond-level “decisions” that are sometimes required in power substations. Those decisions need to be made like they are today: by devices like electronic relays or programmable logic controllers that operate the old-fashioned way: deterministically.

Perhaps one day AI will be so reliable that it can be trusted to make even those decisions on its own. But that day is probably far in the future and may never come at all. Once AI can be as intelligent as the nematode worm Caenorhabditis elegans – a large share of whose genes have counterparts in humans and almost all other animals – I might be persuaded to change my mind.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Also email me If you would like to participate in the OWASP SBOM Forum or donate to it (through a directed donation to OWASP, a 501(c)(3) nonprofit organization).

My book "Introduction to SBOM and VEX" is available in paperback and Kindle versions! For background on the book and the link to order it, see this post.