This post is available to free and paid subscribers to my Substack blog. If you're not already a subscriber, you can sign up when you open the post.
Tuesday, September 2, 2025
Q: Is the NVD digging out of the deep hole it dug for itself since Feb. 2024? A: No
This post is available to free and paid subscribers to my Substack blog. If you're not already a subscriber, you can sign up when you open the post.
Saturday, August 30, 2025
The “Cloud CIP” drafting team takes on vulnerability management!
This post is available to free and paid subscribers to my Substack blog. If you're not already a subscriber, you can sign up when you open the post.
Friday, August 29, 2025
A serious new cloud vulnerability – fortunately, it’s fixed. We might not be so lucky the next time.
This post is available to free and paid subscribers to my Substack blog. If you're not already a subscriber, you can sign up when you open the post.
Monday, August 25, 2025
What’s happening with CVE? Where is it going?
This post is available to both free and paid subscribers to my Substack blog. Go here to subscribe and read the post.
Thursday, August 21, 2025
You want answers about software security? No problem! Oh, you want reliable answers?…Why didn’t you tell me?
Note from Tom: I am now putting up all my new posts in my
Substack blog, but only select ones in Blogspot. If you would like to read the
full version of the post below, please subscribe to my Substack blog on a paid
($30 per year) basis by clicking on the link for a 7-day free trial below. This
will let you see all of my new posts. It will also let you see all 1200+ legacy
posts that I originally put up in Blogspot, but which are now available to all
Substack subscribers.
This week, a friend of mine, Bill Jacqmein, sent me a link
to this article
by Adam Isles, a cybersecurity consultant with the Chertoff Group. While it’s a
good article and I don’t find any factual problems with it, the problem I do find
is one that I’ve found in a lot of articles and blog posts by consultants,
including more than a few of my posts: a failure to disclose that the fancy
tools and services being recommended have about a snowball’s
chance in hell of being successfully used by more than a few readers or, even
worse, are simply unusable.
Of course, consultants do this because they
think their readers, after they realize the recommendations don’t work, will
immediately hire them to provide those same services. However, ask the man who
knows: It almost never works that way. I have identified multiple examples of this
problem in Adam’s article. I will describe the first example in this post and other
examples in subsequent posts.
Here's my first example. The last sentence of the paragraph
that begins with “Legacy code…” points out that modern software contains many
components written by third parties; those components in turn contain many
components, etc. The paragraph concludes with the sentence, “That means the
software supply chain security problem goes back dozens or even hundreds of
upstream software suppliers, requiring each of the next-hop upstream suppliers
to manage this risk.”
The implicit message of this sentence is, “With all these
upstream tiers of suppliers, you shouldn’t even think of trying to learn about
vulnerabilities in the software you use on your own. Just think of all the upstream
components, components of components, etc. To get a realistic inventory of all
the vulnerabilities in just one software product that you use, you will
probably need to learn about the vulnerabilities found in thousands of components.”
….
Keep reading with a 7-day free trial
Subscribe to Tom Alrich's Blog, too to keep reading this post and get 7 days of free access to the full post archives. A paid subscription gets you:
Access to all posts starting in 2013
Post comments and questions in the group chat for this blog
Know you're helping Tom continue to write his posts
Sunday, August 17, 2025
What cloud risks does CIP need to address? (full version)
Note from Tom: I am now putting up all of my full posts in my
Substack blog, but only select ones in Blogspot. Even though I put this up as a paywalled post on Blogspot two days ago, I'm now making the full post available because of its importance for the NERC CIP community. Enjoy this post, but please
also subscribe to my Substack blog on a paid ($30 per year) basis. You will be able to read all of my new posts, as well as all 1200+ “legacy” posts that I originally put
up in Blogspot - but which are now also available to paid Substack subscribers.
Two days ago, I participated in two lengthy web
conversations regarding NERC CIP and the cloud. The first was the bi-weekly
meeting of the informal Cloud Technical Advisory Group (CTAG), a group led by
Lew Folkerth of RF and Chris Holmquest of SERC. The group discusses how NERC
entities can make use of the cloud for systems that are subject to compliance
with the CIP standards.
A lot of this group’s discussions are about why there is so
little cloud use by NERC entities today. There are two main reasons:
1. Most use of the cloud by high or medium impact BES
Cyber Systems (BCS) is impossible today, because the cloud service provider
(CSP) will not be able to provide the NERC entity with the required compliance
evidence for the over 100 CIP Requirements and Requirement Parts that apply to
high and medium impact BCS.
2. At least two uses of the cloud for BCS are
possible today, including low impact Control Centers and medium or high impact
BES Cyber System Information (BCSI) used in SaaS applications. However, I know
of only a few instances of these uses today. I believe this is because neither NERC
nor the NERC ERO has provided guidance on how to do this while maintaining CIP
compliance.
The second meeting was a normal meeting of the NERC Risk
Management for Third Party Cloud Services Standards Drafting Team. Of
course, this is very much an official NERC group; they are charged with
drafting new and/or revised CIP standards that should, when put in place, allow
much more extensive use of the cloud by systems subject to CIP compliance than
is possible today. This team has been meeting for more than a year and a few
months ago started putting virtual pen to virtual paper to draft some real
requirements.
How long will it take before what I call the “Cloud CIP”
standards (i.e., the standards the SDT is drafting) come into effect? The new standards
will be in effect once they have a) been drafted (which will take at least
1-2 more years), b) gone through at least four ballots by NERC members
(including a two-month comment period after each ballot, plus another month for
the SDT to respond to the comments and make changes to the draft standards), c)
been approved by the NERC Board of Trustees, d) been submitted to FERC for
their approval, e) been approved by FERC, most likely more than a year after
they are submitted, and f) gone through a 2-3 year implementation period.
With some luck, all
the above steps will be completed by…drumroll, please…2031. However, I
think even this estimate may be over-optimistic. I say this because I haven’t allowed
for changes to the NERC Rules of Procedure (RoP) – and I think they are likely
to be needed. Since I don’t believe any new or revised CIP standard has ever
required an RoP change, nobody I know can even tell me how this can be
accomplished, except that it’s certain the SDT does not have the power to
do it themselves. Therefore, I can’t quantify the time required for this step. The
best case is that it might be possible to make the RoP changes during the
multiyear implementation period for the “Cloud CIP” standards. If that happens,
my estimated implementation year would remain 2031.
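For what it’s worth, here is the arithmetic behind that 2031 estimate, as a minimal sketch. The durations are my own rough readings of steps a) through f) above, not official NERC or FERC figures:

```python
# Back-of-the-envelope timeline for the "Cloud CIP" standards.
# All durations are rough estimates drawn from steps a) through f) above,
# not official NERC or FERC figures.
start = 2025.5                # mid-2025, roughly where the SDT is now

drafting = 1.5                # a) at least 1-2 more years of drafting
balloting = 4 * (2 + 1) / 12  # b) four ballots, each with a two-month
                              #    comment period plus a month for SDT
                              #    responses: 12 months in total
board_and_ferc = 1.0          # c)-e) NERC Board approval, submission to
                              #    FERC, and FERC approval (optimistic end)
implementation = 2.0          # f) the low end of a 2-3 year implementation

effective = start + drafting + balloting + board_and_ferc + implementation
print(f"Earliest plausible effective year: ~{effective:.0f}")  # ~2031
```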
However, one thing I didn’t allow for in my 2031 estimate is
that the SDT might spend a year or two in what turns out to be totally
unproductive activity – for example, if the team spends 1-2 years polishing
a set of standards that are subsequently rejected by NERC or FERC for being
unenforceable. Were this to happen, the SDT would need to start over from the
beginning. This might sound like a joke, but it isn’t – it could very well
happen. Of course, this would be a big disappointment, since it would push the
implementation date for the new standards to 2033 or later.
This is why I’ve decided to create an online Cloud CIP Working
Group whose goal will be to help move up the implementation date for the Cloud
CIP standards. This group will be open to all members of the NERC community who
are concerned about both the arrival time and the quality of the Cloud CIP
standards; this includes NERC entities, NERC ERO staff members, software
vendors, CSPs, consultants, etc.
Fortunately, there is a way to accelerate development
of the new or revised standards. It is to start performing tasks that the SDT
will otherwise need to perform themselves, if they are to produce a quality
product. In other words, I propose to have this new group “break a path” for
the SDT by working on steps that the SDT will inevitably have to take in the
future, if they’re going to produce a worthwhile set of standards.
I want this group to start at the very beginning. The first question
that needs to be answered is, “What is the problem to be addressed in the new
standards?” Cybersecurity is about risk mitigation. A cybersecurity standard
needs to mitigate a certain set of risks. The risks addressed by CIP in general
are those that apply to systems used to maintain the reliability of the Bulk
Electric System.
The problem that the SDT is addressing is the fact that the
current CIP standards, except for three requirements having to do with BCSI
that came into effect on January 1, 2024, were written on the assumption that
all systems in scope will be on premises. At first glance, the solution to that
problem appears to be to rewrite the existing requirements so they take account
of the cloud (in fact, I believe this is the approach the SDT is currently
taking).
However, that approach doesn’t take account of the fact that,
while the wording of the current standards needs to be changed, the bigger
issue is that the entity will need to provide, for each requirement, evidence
that the CSP has complied with the requirement. The CSPs are more than willing
to make available their audit reports for ISO 27001, SOC 2 Type 2, and
FedRAMP, but they are not willing or able to provide evidence for any
requirement that relates to specific devices or network configurations. This is
because cloud data is always spread across devices and data centers. In order
to comply with a requirement like CIP-007 R2 Patch Management, the CSP would
need to identify each device in any data center that might hold just one part
of a BCS during a 3-year audit period; the CSP would need to apply all of the
parts of CIP-007 R2 to every one of those devices.
Of course, this would literally be impossible, but even if
it were, the CSPs would – rightly – never agree to do it, because the cost of
doing so would be astronomical. Moreover, they certainly won’t be able to pass
that cost on to their NERC entity customers, who expect to be on the same rate
schedule as any other customer.
The cloud business model is based on providing the same
services to huge numbers of customers. If 100 NERC entities start requiring
compliance information for over 100 NERC CIP Requirements and Requirement Parts
(which is the number of requirements in scope for each BES Cyber System
or Electronic Access Control or Monitoring System – EACMS – deployed in the
cloud), that will be 10,000 pieces of information per system in scope. If
each of those entities has 100 BES Cyber Systems deployed in the cloud, that
will be one million pieces of compliance information required for just those
100 NERC entities.
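To make the combinatorics concrete, here is a minimal sketch using the illustrative numbers above (none of these are official figures):

```python
# Evidence volume if every NERC entity must collect per-requirement
# compliance evidence from the CSP. Illustrative numbers from the text.
entities = 100       # NERC entities deploying systems in the cloud
requirements = 100   # CIP Requirements/Parts in scope per BCS or EACMS
systems = 100        # cloud-based BES Cyber Systems per entity

per_system = entities * requirements  # evidence items per system in scope
total = per_system * systems          # evidence items across all entities
print(f"{per_system:,} per system; {total:,} in total")
# -> 10,000 per system; 1,000,000 in total
```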
This is one reason why any CIP requirement that refers to
particular systems or network configurations will need to be changed, if it is
to apply to cloud-based systems. Moreover, even requirements that don’t apply
to particular systems at all, such as the CIP-008 and CIP-009 requirements, have
a similar problem. This is because they require specific deliverables from the
CSP – e.g., an incident response plan and a backup and recovery plan, as well
as testing of those plans. The CSPs are no more inclined to prepare those plans
for individual NERC entities than they are to provide information on individual
devices.
However, even if the CSPs were able to provide compliance
evidence for the existing CIP requirements, I don’t understand why the drafting
team would even waste their time making the existing CIP requirements
“cloud-friendly”. This is because almost all those requirements duplicate
provisions that are already covered in ISO 27001 and SOC 2 Type 2 certifications,
as well as FedRAMP authorizations.
Of course, changing the existing CIP requirements won’t
work, since those requirements still need to apply to on-premises systems. What’s
needed is a separate set of Cloud CIP requirements that just apply to BES Cyber
Systems deployed in the cloud (I’ll call them “Cloud BCS”). Each of these would
be intended to address a similar risk to the “equivalent” on-premises CIP
requirement, but it would be based on wording found in a roughly equivalent
requirement in ISO 27001, SOC 2 or FedRAMP. If that is done, the only compliance
evidence required would be evidence that the CSP’s audit for the standard in
question didn’t produce any findings for the requirement in question.
For example, Requirement CIP-004-7 Part 5.1 reads, “A
process to initiate removal of an individual’s ability for unescorted physical
access and Interactive Remote Access upon a termination action, and complete
the removals within 24 hours of the termination action (Removal of the ability
for access may be different than deletion, disabling, revocation, or removal of
all access rights).” FedRAMP AC-2.h reads:
The organization notifies account managers when accounts are
no longer required; when users are terminated or transferred; and when
individual information system usage or need-to-know changes.
Of course, this isn’t exactly the same as Part 5.1 – it is
both more comprehensive and less so (there are other FedRAMP requirements that also
correspond to Part 5.1). However, let’s say the drafting team decides that the
wording is close enough that this can be considered a rough equivalent of Requirement
CIP-004-7 Part 5.1. The language of the FedRAMP requirement would be the basis
for a CIP requirement applicable to Cloud-based BCS. The “Measures” section of
the requirement would require evidence of the absence of audit findings for
FedRAMP Requirement AC-2.h.
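To illustrate the kind of crosswalk I have in mind, here is a minimal sketch. The single entry reflects the CIP-004-7 Part 5.1 / FedRAMP AC-2.h example above; the data structure, field names and helper function are my own hypothetical illustration, not anything the SDT has produced:

```python
# Hypothetical crosswalk from a Cloud CIP requirement to its rough
# equivalent in an existing compliance regime (illustration only).
crosswalk = {
    "CIP-004-7 Part 5.1": {
        "regime": "FedRAMP",
        "control": "AC-2.h",
        "measure": "No audit findings on AC-2.h in the CSP's most recent "
                   "FedRAMP assessment report",
    },
}

def compliance_evidence(cip_requirement, audit_findings):
    """Evidence is simply the absence of findings on the mapped control."""
    mapped = crosswalk[cip_requirement]
    if mapped["control"] in audit_findings:
        return f"Open finding on {mapped['regime']} {mapped['control']}"
    return mapped["measure"]

# The CSP's latest FedRAMP report had no findings on AC-2.h:
print(compliance_evidence("CIP-004-7 Part 5.1", audit_findings=set()))
```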
By using this approach, the SDT will avoid the fate of
developing a bunch of CIP requirements that can’t be audited, since the CSP
will never provide evidence of compliance with a requirement that they consider
to be already covered in ISO 27001, SOC 2 Type 2, or FedRAMP. Compliance
evidence using the approach I’m suggesting will consist solely of pointing to a
particular section in an audit report.
Of course, if a CSP’s evidence is found to comply with a
Cloud CIP requirement by one NERC entity, all other NERC entities should have
the same finding. This means it would be silly to have each NERC entity require
the CSP to provide the same evidence for each Cloud CIP requirement. Instead,
there needs to be a mechanism in which the CSP provides the evidence to NERC
(or some third party designated by them), which then makes it available to each
NERC entity.[i]
Once the SDT drafts “cloud versions” of the existing CIP
requirements, will they be finished? Hardly. Those cloud requirements are based
on just one type of risk: risks addressed in existing CIP requirements. There
are two other types of risk that the SDT (or the group I’m proposing) should
examine, to determine which of those risks are important enough to merit their
own requirements in Cloud CIP.
The second risk type consists of risks addressed by ISO 27001, SOC 2
Type 2 and FedRAMP requirements that don’t have “near-equivalent” CIP requirements, but that
are important enough to include in the Cloud CIP requirements. For example,
FedRAMP requirement AC-2.k reads, “The organization establishes a process for
reissuing shared/group account credentials (if deployed) when individuals are
removed from the group.”
This requirement doesn’t match any existing CIP requirement,
but our working group (or the SDT) might decide it addresses a source of risk
that is important enough to warrant its own CIP requirement (applicable to
Cloud BCS, not onsite BCS). Therefore, this requirement could be reworded to be
applicable to Cloud BCS. The “Measures” section of the requirement would call
for evidence of “no findings” in the FedRAMP audit report.[ii]
The third risk type is by far the most important of the
three, since there is no “up front” evidence (such as a certification or
authorization) that the CSP has already mitigated risks of this type. This
third type consists of risks that only apply to cloud-based systems.
Since both ISO 27001 and FedRAMP can apply to both on-premises and cloud-based
systems, I believe they don’t include cloud-only risks (although I’m open to
correction if someone knows otherwise).
Three cloud-only risks that I’ve identified just by reading
the news are:
1. The CSP doesn’t make sure their customers are
adequately trained in the security measures required to protect their cloud
environment. As I described in this
post, Paige Thompson, the woman who single-handedly almost brought down
Capital One, was a technical staff member who had recently been fired by a CSP.
She got revenge by breaking into the cloud environments of at least 30
customers of that CSP, one of which was Capital One. She bragged online that all
those customers had made the same mistake in configuring their security
controls; moreover, she said there were many other customers that made that
mistake.
Of course, the CSP shouldn’t be held responsible for every
configuration mistake made by a customer. However, if 30+ customers have all
made the same mistake, that’s clearly a problem that needs to be addressed by
the CSP. And the answer can’t be, “We offered a class for $695 that included
discussion of this issue, but they didn’t take it.” The CSP should provide the
training for free, if reasonably security-proficient customers are prone to make
a serious mistake like this one.[iii]
If this risk were to be addressed in a CIP requirement, the
requirement might call for the CSP to explain what they are doing to make sure
their customers understand how to securely configure their cloud environment.
2. The CSP didn’t vet their online access
providers’ security properly. The Russian attackers that perpetrated the
SolarWinds attack were quite smart. Perhaps the smartest thing they did was to mount
a classic supply chain attack: they reached the SolarWinds
development environment through a popular SaaS application used
by SolarWinds. They did that by first compromising
a third party that sells access to that SaaS application. Through that third
party, they gained access to the cloud environment on which the SaaS
application was running (the SaaS app itself was not compromised). From that
environment, they launched the entire attack on SolarWinds (which was without
much doubt one of the most
sophisticated cyberattacks of all time).
Of course, the solution to this problem is for the platform
CSP to tighten security requirements on the third party access brokers. A CIP
requirement to address this risk might ask the CSP to describe the security
requirements they place on third party access brokers and how they enforce
them, as well as whether they have experienced any breaches through these
access brokers.
3. Cloud Hopper. This attack was revealed in a
Wall
Street Journal article[iv]
by Rob Barry and Dustin Volz in 2018. What’s most scary about it is that the
attackers were able to jump from customer to customer within the clouds of
multiple CSPs, and that they used a variety of techniques to penetrate
different customers. This shows that your security in the cloud is at least
partially dependent on whether your fellow cloud customers also practice good
security – i.e., it’s a kind of herd immunity.
Of course, a platform CSP can’t police the general
cybersecurity practices of all companies that utilize their cloud. However, the
CSP should have measures in place today to detect and counteract
attempts to hop from one customer to another. A CIP requirement might ask the
CSP to describe (at least in general terms) the measures they have in place, as
well as whether those measures have been successful in preventing Cloud
Hopper-type breaches.
Of course, the three risks listed above are not the only
cloud risks faced by NERC entities! The primary task of the group I want to
form will be to review cloud risks identified by various organizations –
federal agencies, the military, CSPs themselves, etc. – and decide which of
them should be included as the basis for Cloud CIP requirements.
You may have noticed that, when I reached cloud-only risks
(the third type of cloud cybersecurity risks), I abandoned my earlier concern
that the CSPs won’t be willing to provide answers to unique questions like these.
This is because the first two types of risks are already addressed in ISO 27001
and FedRAMP, with which the CSP is presumably already compliant. To provide
evidence of compliance for CIP requirements based on those two types of risks, the
CSP can simply point the customer to the audit reports.
However, risks of the third type – cloud-only risks – aren’t
addressed by those other standards. Therefore, the CSP should feel obligated to
provide compliance evidence for CIP requirements based on those risks. Again,
since the CSP’s response to any of these requirements will be the same for all
NERC entities, NERC (or a third party designated by them) should be the single
point of contact for the CSP.
NERC will “audit” each of the CIP cloud-only risk requirements
by evaluating the CSP’s response to a question or small set of questions, e.g.,
“Describe the security requirements you place on third party access brokers and
how you enforce them. Have you experienced any security breaches that came through
one of these access brokers?” The response will be evaluated by asking
whether the CSP has adequately mitigated the risk that
is the basis for the CIP requirement. If the response has not convinced NERC
(or the third party) that the CSP has mitigated that risk, they will need to
ask the CSP for more evidence.
The problem with the process I’ve just described is that it’s
not currently allowed by anything in the NERC Rules of Procedure. As I’ve
already said, I doubt there is any way to implement this process without making
RoP changes.
However, in my opinion these cloud-only risks should be the
primary focus of evaluation of the CSPs. This is because the fact that the
major CSPs have all passed ISO 27001 and SOC 2 Type 2 audits, and have all been
authorized for use by federal agencies under FedRAMP, means they don’t present a
big problem when it comes to “normal” risks, like lack of patch management or
configuration management programs. It’s fine to have Cloud CIP requirements
that apply to normal risks, but the only compliance evidence the CSPs should
have to provide is what’s in their audit reports based on those three
compliance regimes (and perhaps others as well).
On the other hand, I strongly doubt that the cloud-only
risks I’ve listed (and many more identified by others) are found in any
standard compliance regimes today. The working group I want to put together
will have one primary responsibility: compile a list of cloud-only risks that
are not currently addressed by standard compliance regimes, then decide which
of these are important enough to be addressed in the Cloud CIP standards. You’re
welcome to participate in this effort if you are with a NERC entity, a vendor
of cloud or software services to NERC entities (including SaaS providers and
platform CSPs), NERC or the NERC ERO, a consulting organization that provides
services based on NERC CIP, or if you’re just a user of electricity.
If you don’t use electricity, you obviously have no stake in
what this group will do, so you’re not welcome. On the other hand, if you don’t
use electricity, I’d like to know how you’re reading this blog post.
If you would like to comment on
what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com or
comment on this blog’s Substack community chat.
[i]
There’s no mechanism today by which NERC can receive audit evidence from a
third party and distribute it to individual entities. This is one reason why I
think there will need to be changes to the NERC Rules of Procedure. As you’ll
see later in this post, this isn’t the only reason why I think that.
[ii] It’s
possible that the SDT (or our Cloud CIP Working Group) might decide that, if
the CSP has received the appropriate certifications (ISO 27001 and SOC 2 Type
2) and authorization (FedRAMP), there’s no need to consider the other
requirements in the certification or authorization, besides those that map to
CIP requirements. This is because the certification means the CSP has either received
no audit finding on each requirement, or they received a finding but have already
mitigated the risk satisfactorily.
Therefore, the SDT and/or the Cloud CIP Working
Group might just skip this second risk type altogether and address the third risk
type. The third type is much more important than the first two, since risks of
the third type presumably are not already addressed by any of the three
compliance regimes.
[iii] After
my posts on this incident appeared in 2018, Dick Brooks of Reliable Energy Analytics and I talked
with someone from that CSP, who had contacted me about those posts. They convinced
me that they had already addressed the problem (the meeting was at least a year
after the incident).
[iv] The linked article was originally
made open access but may have slipped behind the paywall. Rob Barry gave me a
link to a PDF of the article on his personal website; if you can’t access the
article itself, email me and I’ll send you that link.
Saturday, August 9, 2025
CISA affirms they support the CVE Program. Is that good or bad news?
Note from Tom: As of August 11, my new posts will only be available to paid subscribers on Substack. Subscriptions cost $30 per year (or $5 per month); anyone who can’t afford to pay that should email me, since I want everyone to be able to read the posts. To have uninterrupted access to my new posts, please open a paid Substack subscription or upgrade your free Substack subscription to a paid one.
Last Thursday, at the Black Hat conference in Las Vegas, two
CISA officials committed
to “supporting the MITRE-backed Common Vulnerabilities and Exposures Program,
just months after it faced a near complete lapse in funding” (quoting from Nextgov/FCW).
Given that someone at CISA almost cut off funding for the program in April
(although others tried – very
unconvincingly – to deny this was anything more than an administrative
glitch), it was good to hear this.
A MITRE[i]
official set off this firestorm with his letter to the CVE Board members on
April 15. The letter stated that the contract wasn’t going to be renewed and
the program would be cancelled. However, this was followed shortly afterwards by
an announcement that a group of CVE Board members (and others) were already putting
together the framework (and funding) for a privately-run nonprofit organization
called the CVE Foundation. Over
the next few weeks, the group proceeded to fill in many of the details of their
story (this effort had been ongoing for a few months, but it hadn’t been announced
previously. Of course, this was because at the time there didn’t seem to be
any need to rush the announcement).
The Foundation is an international effort, which already –
from what I hear – has more than enough funding promised for them to take over
the MITRE contract when it comes up for renewal next March (the funding will
come from both private and government sources, although I’m guessing that the
US government isn’t currently supporting it). However, they intend to be much
more than an “In case of emergency, break glass” option if CISA doesn’t renew
the contract (which I still think is very likely, no matter what the two
gentlemen – neither of whom has been at CISA very long – said at Black Hat).
The CVE Foundation was founded (and is led) by a few CVE
Board members who have been involved with the CVE Program since its early days.
Since then, they have been part of the numerous discussions about how the
program can be improved (the Foundation is now led by Pete Allor, former
Director of Product Security for Red Hat. Pete has been very involved with the
CVE Program since 1999. He is an active Board member).
While the CVE Program, in my opinion, has done an
exceptional job and continues to do so, the fact is that government-run
programs almost without exception are hampered by the constraints imposed by
the same bureaucracy that often makes government agencies a stable,
not-terribly-challenging place to work. That is, they don’t exactly welcome
new, innovative ideas and they make it hard to get anything done in what most
of us consider a reasonable amount of time.
This week, one well-regarded person who has worked with the CVE
Program for 10-15 years and is a longtime Board member, wrote on an email thread
for one of the CVE working groups that he was happy to be part of the CVE
Foundation from now on. He wrote that, while he enjoyed working with the CVE program,
“…we measure progress in months and years instead of weeks.” Like others, he
has many ideas for improvements that can be made to the program, but hasn’t
seen it make much progress in
implementing them so far. I’m sure he’s quite happy to have the chance to have
a serious discussion about these and other changes, assuming the CVE Foundation
is placed in charge of the CVE Program.
However, if CISA somehow remains in control of the CVE
Program (i.e., the contract remains with them), it will be a very different picture.
I don’t think CISA ever had a big role in the operation of the program (beyond
having one or two people on the CVE Board and of course paying MITRE under
their contract). Moreover, CISA is unlikely to take a big role if it remains as
the funder of the program.
If CISA retains control of the contract, MITRE will remain in
day-to-day charge of the program. As I said, I think MITRE has done a good job
so far, but like any government contractor, they must adhere strictly to the terms
of their contract. If someone comes up with a great new idea that requires more
money, or even just re-deploying people from what they’re doing now, the only thing
that can be done is to put it on the to-do list for the next contract negotiation.
My guess is that, when MITRE’s contract comes up for negotiation
next year, the CVE Foundation will take it over from CISA; it’s hard to imagine
that, given the huge personnel cuts that are being executed now in the agency, there
will be a big effort to retain control of a contract that costs CISA around $47
million a year.
There’s also no question that the CVE Foundation will write their
own contract with MITRE. It will require MITRE staff members to do the
day-to-day work of the CVE Program, but it will give the Foundation a big role
in determining its priorities. Frankly, I think the MITRE people – who are all
quite smart, at least the ones I’ve worked with – will be just as happy as anyone
else to see the program achieve more of its potential than it does now.
I also think the CVE Foundation will try to resolve some serious
problems with the current CVE Program. Doing that has been put off so far, because
the problems are very difficult to fix. For example, up until about ten years
ago, MITRE created all new CVE records. That meant that CVE Records were fairly
consistent, but as the number of new records increased every year, MITRE simply
couldn’t keep up with the new workload.
At that point, the CVE Program moved to a “federated” approach,
in which CVE Numbering Authorities (CNAs) were appointed. These included some
of the largest software developers, who reported vulnerabilities in their own
software as well as vulnerabilities in the products of other developers (in
their “scope”). Today, there are 463 CNAs of many types (including GitHub, ENISA,
JP-CERT and the Linux Foundation).
Of course, it’s good that so many organizations have volunteered
to become CNAs; the problem is that this has led to huge inconsistencies in CVE
records. For example, a lot of CNAs don’t include CVSS scores or CPE names in
the new records they create[ii];
the CVE Program (i.e., MITRE staff members) has been reluctant to press them to
do this. If CISA had made this problem a priority, they could have addressed it
during contract negotiations with MITRE.
So, I see good things ahead for the CVE Program. However,
that requires moving MITRE’s contract from CISA to the CVE Foundation next
March. I confess I don’t want this to happen next March; I want it to happen
tomorrow.
If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com, or even better, sign up as a free subscriber to this blog’s Substack community chat and make your comment there.
[i]
MITRE is a nonprofit corporation that operates Federally Funded Research and Development Centers (FFRDCs); it has
operated the CVE program on behalf of DHS since the program’s inception in 1999 (CISA
came into being six years ago). The idea for CVE came from MITRE researchers.
[ii]
Many CNAs will tell you that the National Vulnerability Database (NVD) had longstanding
policies that they would create CVSS scores and CPE names, and add them to the record;
in fact, if the CNA created either of these items, the NVD would discard what
the CNA created and substitute their own. Fortunately, the NVD now has a new
leader. Hopefully, that will lead to a lot of change there; it’s sorely needed.
Friday, August 8, 2025
One of many good reasons to fix the cloud problem in NERC CIP
Note from Tom: As of August 11, all but a few of my new
posts will only be available on Substack
to paid subscribers. Subscriptions cost $30 per year (or $5 per month); anyone
who can’t afford to pay that should email me, since I want everyone to be able
to read the posts. To have uninterrupted access to my new posts, please open a
paid Substack subscription or upgrade your free Substack subscription to a paid
one.
On Wednesday evening, Microsoft and CISA announced a “high-severity vulnerability” in on-premises versions of Exchange; although the flaw resides on premises, it can also be leveraged to attack the Entra cloud-based authentication system.
I won’t discuss the details of the vulnerability, since they’re not important for this post. What is important is the fact that this high-severity vulnerability resides in the on-premises version of Exchange, not the cloud version (Exchange Online). Since the software is on-premises, users have to a) see the patch availability notification, b) locate and download the patch, and c) apply the patch to fix the vulnerability. None of these steps is hard, but since human beings miss emails, forget to follow up on them, leave on vacation without performing all 1,963 items on their to-do list, etc., it’s certain that some users still won’t have applied the patch even a year from now.
This is a reminder of one of the biggest reasons for using the cloud (especially SaaS applications): the CSP only needs to apply a patch once for all of its users to be protected. The users don’t necessarily need to be told about the patch, although they should be informed for peace of mind.
Of course, this is one of many reasons why it’s important
that the “Cloud CIP” problem be solved as soon as possible, so that full use of
the cloud will be possible for NERC entities with medium and high impact CIP
environments. Fortunately, I think the
solution is right around the corner in…2031.
What, you say it’s unacceptable that we need to wait so long for the solution? If it will make you feel better, I’ll point out that it’s possible that 1) the current Standards Drafting Team will produce their first draft of the new standards sometime next year, 2) the standards will take just a year to be debated and balloted at least four times by the NERC ballot body (I believe four has historically been the minimum number of ballots required to pass any major change to the CIP standards), 3) FERC will approve the standards in six months, and 4) the ballot body will agree to a one-year implementation period.
If all of these things come to pass, and with a helping of good luck, the new and/or revised CIP standards will be in place in mid-2029. You might think even that is slow, but I can assure you it’s lightning-fast by NERC standards; it took five and a half years for the last major change to CIP – CIP version 5 – to go through these same steps. To be honest, I consider the above to be a wildly over-optimistic scenario. In fact, I think that, if the required processes are all followed, even the 2031 target may be over-optimistic.
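For what it’s worth, the arithmetic behind that best case is easy to lay out. Here is a rough tally (the durations are the ones listed above; the mid-2026 starting point is my assumption for “sometime next year”):

```python
# Rough tally of the best-case "Cloud CIP" timeline described above.
# Assumption (mine): the first draft lands mid-2026, i.e. "sometime next year".
draft_done = 2026.5

steps = [
    ("Balloting (at least four ballots)", 1.0),  # one year
    ("FERC approval", 0.5),                      # six months
    ("Implementation period", 1.0),              # one year
]

t = draft_done
for name, years in steps:
    t += years
    print(f"{name} complete: ~{t:.1f}")

# Lands around 2029.0; allowing for the inevitable slippage between steps,
# mid-2029 really is the lightning-fast best case.
```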
What can be done to shorten this time period? There is an “In case of emergency, break glass” provision in the NERC Rules of Procedure that might be used to speed up the whole process. However, it would require a well-thought-out plan of action that would need to be approved by the NERC Board of Trustees. I doubt they’re even thinking about this now.
The important thing to remember here is that some influential NERC entities not only swear they will never use the cloud (on either their IT or OT sides), but are also opposed to use of the cloud by any NERC entity – even though they know they won’t be required to use the cloud themselves.
Another thing to remember: Unlike almost any other change in
the CIP standards, FERC didn’t order this one. This means they might take a
long time to approve the new standards (I believe it took FERC at least a year
and a half to approve CIP version 1); it also means they might order a number of
changes. These changes would be included in version 2 of the “Cloud CIP”
standards, which would appear 2-3 years after approval of the version 1
standards. FERC could also remand the v1 standards and send NERC back to the
drawing board. However, since one or two FERC staff members are closely
monitoring the standards development process, that is unlikely.
The danger is that, if the standards development process is
rushed and the standards are watered down to get the required supermajority
approval by the NERC ballot body, what comes out in the end won’t address the
real risks posed by use of the cloud by medium and high impact CIP environments.
In fact, this is what happened with CIP-013-1: It didn’t address most of the
major supply chain security risks for critical infrastructure. The fault in
that case was FERC’s, since they gave NERC only one year to draft and approve
the new standard - which was one of the first supply chain security standards
outside of the military.
This is why FERC put out a new Notice
of Proposed Rulemaking (NOPR) last fall. Essentially, it said, “We admit we
should never have approved CIP-013-1 mostly as is. Now we intend to rectify
that error.” The NOPR suggested a few changes, but its main purpose was to
request suggestions for improving the standard by early December 2024. I thought
that, once that deadline had passed, FERC would quickly come out with a new
NOPR – or even an Order – that laid out what changes they want to see in
CIP-013-3 (CIP-013-2 is the current version, although its only changes were adding
EACMS and PACS to the scope of CIP-013-1). However, as my sixth grade teacher
often said, “You thought wrong.” There’s been nary a peep from FERC on this
topic since December. In my opinion, a revised CIP-013 is still very much
needed.
So, I hope the current SDT doesn’t feel rushed to put out a first draft of the new or revised standard(s) they’re going to propose. Just as with on-premises systems, there are big risks for systems deployed in the cloud – but few of them are the same as the on-premises risks. It’s those cloud-only risks that the new standards need to address. There’s more to be said about this topic, coming soon to a blog near you.
If you would like to comment on
what you have read here, I would love to hear from you. Please email me
at tom@tomalrich.com,
or even better, sign
up as a free subscriber to this blog’s Substack community chat and make
your comment there.
Wednesday, August 6, 2025
AI is already powering half the US economy. And that’s only half the story.
Note from Tom: Since 2013, I’ve been publishing “Tom
Alrich’s blog” on Blogspot. I’m now publishing my posts in this Substack blog, named
“Tom Alrich’s blog, too”. I’m posting for free on Substack now, but after
August 11, new posts on Substack will only be available to paid subscribers. A subscription
to this blog costs $30 per year (or $5 per month); anyone who can’t pay that should
email me. To have uninterrupted access to my new posts, please open a paid
account on Substack, or upgrade your free account to a paid one. There are lots
of good posts to come!
My latest post,
which was based mostly on last Saturday’s column by Greg Ip of the Wall
Street Journal, described three negative societal impacts of the massive AI
buildout that is going on:
1. Investment in other tech areas besides AI is being squeezed because of the huge amounts that companies like Microsoft and Meta are spending on the AI rollout (Microsoft alone is likely to spend $80 billion this year, mostly on new data centers; I’ve been told they’re opening a new data center almost every day).
2. The huge amounts of cash being spent on the AI buildout are starting to raise interest rates. Given the minuscule revenues now coming in to the big AI players, they need to finance a lot of the buildout with borrowed money – whether from a bank or the bond market – or with cash from other revenue streams (e.g., I’m sure revenue from Facebook finances at least some of Meta’s AI buildout). If anything, this trend will accelerate; for example, Microsoft is likely to spend over $100 billion on AI next year.
What’s the third negative societal impact? While Greg didn’t
mention this in his column, I wrote in a blog
post last year that the huge power needs of AI data centers are causing
more and more electric utilities to postpone retirement of coal plants. Of
course, this will damage our (i.e., humanity’s) ability to combat climate
change.
However, I noted at the end of my latest post that my next
post would talk about the benefits of AI. That goal has been aided by two new newspaper
articles, one
in the Wall Street Journal (this time not by Greg Ip) and the other in
the Washington
Post. Both articles discuss huge economic benefits that are
accruing to the US today, due to the current AI boom.
The fact that these are accruing today is important, since Greg
Ip’s column had spoken of AI’s benefits as coming far in the future. This isn’t
a contradiction, because Greg discusses capital markets; his big concern in
this article is whether the stock market is justified in its apparent belief
that the huge AI buildouts will return concomitant benefits in a reasonable
time frame (say, 5-10 years). He is clearly skeptical that this will happen; he
thinks the full benefits to the companies doing the buildouts won’t arrive for 10-15
years.
On the other hand, both the WaPo article and the new WSJ
article point out that just about half of the growth projected for the US
economy this year will be due to the AI buildout, since most of that money
stays in the US. For example, lots of people are employed in that buildout (at decent
wages, hopefully); those people eat at restaurants, buy clothes for their kids,
buy new TVs, etc. I don’t know how often in the past a single industry has accounted for half of GDP growth – other than in World War II, when I’m sure the military was the dominant industry (for example, a lot of factories that made cars, planes, etc. were converted to wartime production).
Of course, a lot of the chips, motherboards, and pieces of furniture those people are installing are manufactured overseas. Do these imports mean that the GDP benefits of the buildout are being overstated? No; if anything, the opposite. Imports are subtracted from GDP, so the buildout’s measured contribution to GDP growth reflects only the domestic portion of the spending – the labor and the domestically produced goods and services. The gross spending on the buildout is therefore even larger than what shows up in GDP, which makes the fact that the buildout will account for half of this year’s GDP growth even more impressive.
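A toy calculation (my numbers, purely illustrative) shows how the expenditure identity handles the imported gear:

```python
# Toy illustration with invented numbers. In the expenditure approach,
# GDP = C + I + G + (X - M): imported gear is counted in investment (I)
# but subtracted back out as imports (M), so only the domestic slice of
# the buildout spending ends up in GDP.
ai_spend_domestic = 60.0  # construction labor, US-made equipment ($bn, hypothetical)
ai_spend_imported = 40.0  # overseas chips and boards ($bn, hypothetical)

gross_ai_spend = ai_spend_domestic + ai_spend_imported  # 100.0
gdp_contribution = gross_ai_spend - ai_spend_imported   # 60.0

print(f"Gross buildout spending:   ${gross_ai_spend:.0f}bn")
print(f"Measured GDP contribution: ${gdp_contribution:.0f}bn")
# The half-of-GDP-growth figure rests on the smaller, domestic-only number;
# the gross spending behind it is even larger than the headline suggests.
```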
To quote the article,
“The AI complex seems to be carrying the economy on its
back now,” said Callie Cox, a market strategist with investment firm Ritholtz
Wealth Management. “In a healthy economy, consumers and businesses from all
backgrounds and industries should be participating meaningfully. That’s not the
case right now.”
“AI executives argue the spending boom will create more
jobs and bring about scientific breakthroughs with advancements in the
technology. OpenAI has said that
once its AI data centers are built, the resulting economic boom will create
“hundreds of thousands of American jobs.”[i]
The WSJ becomes Mr. Softee
The Wall Street Journal usually focuses on hard numbers that can be easily verified – closing stock prices, trade statistics, etc. True to form, this WSJ article starts by focusing on a hard economic number: productivity. This is defined as the ratio of output to input – that is, the amount by which output changes from one period to the next after accounting for changes in the “factors of production”, usually grouped into labor and capital.
For example, suppose a plant has 100 workers in period 1 and 200 in period 2. The plant also has $1,000 of capital (machinery, buildings, cash on hand, etc.) in period 1, which increases to $2,000 in period 2. If output increases from 300 widgets in period 1 to 600 in period 2, both inputs and output have doubled; the ratio of output to input doesn’t change, so productivity stays the same.
On the other hand, if the inputs doubled but output only increased from 300 to 450, productivity fell, since output didn’t keep pace with the growth in inputs. Of course, this isn’t a good thing. Conversely, if inputs doubled but output increased from 300 widgets to 750, output more than doubled and productivity increased, which is a good thing: there is more money for raises for workers and bonuses for management, as well as for investment.
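All three widget scenarios reduce to a single ratio. Here is a back-of-the-envelope version (a deliberately crude index that lumps labor and capital into one number that doubles; real productivity statistics weight the two inputs separately):

```python
# Back-of-the-envelope productivity index for the widget plant.
# Crude simplification: labor and capital are lumped into one input
# index that doubles between periods.
def productivity(output: float, input_index: float) -> float:
    return output / input_index

base = productivity(300, 1.0)  # period 1: 100 workers, $1,000 of capital

for output_2 in (600, 450, 750):      # the three period-2 scenarios
    p2 = productivity(output_2, 2.0)  # inputs doubled in period 2
    change = (p2 / base - 1) * 100
    print(f"Output {output_2}: productivity change {change:+.0f}%")

# Output 600:  +0% (inputs and output both doubled)
# Output 450: -25% (output lagged the doubled inputs)
# Output 750: +25% (output more than doubled)
```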
When you look at an entire economy, output needs to grow at a certain rate every year just to keep up with growth of the population. Let’s assume population grows at 2% per year. Output will then need to grow at 2% just to allow the population to maintain its current standard of living. If productivity gains push output growth above that rate, the standard of living can increase; conversely, if output growth falls short, the standard of living will decrease, unless the government increases its borrowing to maintain living standards. But as the US is learning now, there are limits to the borrowing strategy.
The best way to increase productivity in the short term is
to grow the amount and/or quality of capital that is used for production (it
takes much longer to “grow” workers). For example, if productive capital grows by
10% but the labor force only grows by 2%, then output per worker will grow
enough that the standard of living can increase.
But the increased capital needs to be the kind that will allow
more output to be produced. For example, suppose there are two types of capital:
Type A machines that produce clothes and food, and Type B machines that produce
pencils. Obviously, if the entire capital investment is in B machines, the
increase in output will consist entirely of pencils; meanwhile, the workers
will all be naked and starve to death.
As Greg Ip pointed out, the AI buildout isn’t designed to raise economic output much in the near term; therefore, it’s much more like Type B investment than Type A. What keeps valuations of the AI companies high is the widespread expectation that there will be a huge increase in economic output (due to productivity gains brought on by AI) at some point in the future – but when that point will arrive is unknown. Therefore, traditional economic analysis, which assumes that productivity is the key to prosperity, finds the AI buildout to be a colossal waste.
However, the authors of the second WSJ article point
out that there’s another economic measure that paints a completely different
picture of the AI buildout. This measure can’t be quantified exactly but can be
estimated through surveys. It’s called “consumer surplus”; it’s the difference between
the price a consumer would be willing to pay for a product or service and its
actual price. Of course, this quantity varies by the consumer, the product, and
even the time of day, so it can never be directly measured. However, the
authors (both academics) have conducted surveys that allow them to estimate the
consumer surplus from AI products at $97 billion (here, “consumers” means
individuals and organizations).
Of course, AI products today are mostly free, or at least
free enhancements to existing for-charge products (e.g., Microsoft’s CoPilot
add-on to its Office 365 suite). The authors point out that free AI products are
almost never included in GDP, which is based almost entirely on sales data. However,
they definitely produce benefits for consumers, just like for-pay products do:
“When a consumer takes advantage of a free-tier chatbot
or image generator, no market transaction occurs, so the benefits that users
derive—saving an hour drafting a brief, automating a birthday-party invitation,
tutoring a child in algebra—don’t get tallied. That mismeasurement grows when
people replace a costly service like stock photos with a free alternative like
Bing Image Creator or Google’s ImageFX.”
In other words, the consumer surplus can be considered a
quantity that should be maximized just like GDP should be maximized, even though
it will probably never be possible to include it in GDP. They describe how they
arrived at the $97 billion estimate in this passage:
“Rather than asking what people pay for a good, we ask
what they would need to be paid to give it up. In late 2024, a nationally representative survey of U.S. adults
revealed that 40% were regular users of generative AI. Our own survey found
that their average valuation to forgo these tools for one month is $98.
Multiply that by 82 million users and 12 months, and the $97 billion surplus
surfaces.”
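The arithmetic is easy to reproduce from the figures just quoted:

```python
# Reproducing the authors' back-of-the-envelope consumer-surplus estimate,
# using only the figures quoted above.
value_per_user_month = 98  # dollars a user would need to forgo the tools for a month
users = 82_000_000         # regular generative-AI users in the US
months = 12

surplus = value_per_user_month * users * months
print(f"${surplus / 1e9:.1f} billion")  # ~$96.4 billion, which rounds to the $97 billion figure
```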
They continue,
“William Nordhaus calculated that, in the 20th
century, 97% of welfare gains from major innovations accrued to
consumers, not firms. Our early AI estimates fit that pattern. While the
consumer benefits are already piling up, we believe that measured GDP and
productivity will improve as well. History shows that once complementary
infrastructure matures, the numbers climb.
“Tyler Cowen forecasts a
0.5% annual boost to U.S. productivity, while a report by the National Academies puts the figure at
more than 1% and Goldman Sachs at 1.5%. Even if the skeptics prove
right and the officially measured GDP gains top out under 1%, we would be wrong
to call AI a disappointment. Life may improve far faster than the spreadsheets
imply, especially for lower-income households, which gain most,
relative to their baseline earnings, from free tools.”
To paraphrase these two paragraphs, the authors estimate
there will eventually be a big boost in GDP due to AI use, even though today the
boost is mostly outside of GDP. Of course, they are talking about an increase
in GDP due to use of AI, whereas the earlier estimate that half of GDP growth
this year will be due to AI is referring to the massive spending for
infrastructure rollout going on now.
In other words, AI will produce two big boosts to GDP: one due to the rollout (starting this year, but certainly not ending anytime soon) and one due to the productivity gains caused by widespread use of AI products. The latter gains can’t be measured today, but they will be in the future.
The authors conclude,
“As more digital goods become available free, measuring
benefits as well as costs will become increasingly important. The absence of
evidence in GDP isn’t evidence of absence in real life. AI’s value proposition
already sits in millions of browser tabs and smartphone keyboards. Our
statistical mirrors haven’t caught the reflection. The productivity revolution
is brewing beneath the surface, but the welfare revolution is already on tap.”
If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com, or even better, sign up as a free subscriber to the Substack community chat for my subscribers and make your comment there.
[i] The WaPo article points out that a large portion of the growth due to AI is simply Nvidia’s profits; however, those profits are certainly not the lion’s share of that growth.