I have been attending VulnCon 2025 remotely this week, although I haven’t attended all the sessions. Even
though the first conference was only last year, VulnCon has clearly found its niche
as the premier gathering place for people interested in or involved with
vulnerability management. The conference is well designed and well executed.
The sessions I’ve been attending
are those that have to do with software naming in what I call the “CVE
ecosystem”, but which most people think of as the National Vulnerability
Database (NVD). If you have been reading my recent
posts, you know that:
1. Learning about a software
vulnerability isn’t very helpful if you don’t know what products are affected
by it; ideally, you want to be able to search on a product name in a
vulnerability database and immediately be shown all the vulnerabilities that
have recently been identified in that product. Moreover, since CVE is by far the most widely
cited vulnerability type and there are now over 280,000 CVEs in the official
list, affected products need to be referred to using a machine-readable
software identifier. The only identifier currently supported by CVE.org (the organization
funded by DHS that creates and manages CVE Records) is CPE, which stands for
Common Platform Enumeration.
2. When a CVE Numbering
Authority (CNA), working for CVE.org, produces a CVE Record to report a new software
vulnerability, it does not usually include CPE names for the affected
products listed in the text of the record. The reason for this is that the NVD[i] has always wanted to be in
control of CPE creation. This didn’t previously cause a big problem, since
until last year, the NVD almost always created a CPE for every affected product
described in the text of a CVE Record; they did this within a few days of
receiving the record from CVE.org.
3. However, starting on
February 12, 2024, the NVD drastically slowed their production of CPE names,
for a reason that has never been clearly explained. This has produced an
ever-growing backlog of CVE Records without a CPE name. Despite several
promises that they would fix the problem by a certain date, the backlog has
continued to grow. Today, the backlog stands at well over 40,000 CVE Records (although
a well-known vulnerability researcher estimated in the VulnCon chat that the
backlog is now 52,000 records). Of course, this is far more than 50% of the
total new CVEs identified since February 2024. The NVD no longer even talks
about eliminating the backlog for good. My guess is they would be happy just to
stop it from growing, but even that doesn’t seem likely now.
4. Why is it bad that so
many CVE Records don’t contain CPE names? It’s bad because a CVE Record without
a CPE name is invisible to an automated search of the NVD. If a user of Product
ABC wants to learn what vulnerabilities (CVEs) are currently present in that
product, they might enter “Product ABC” in the search bar of the NVD. The user
should see every CPE name that contains that text string. The user can
determine which of those CPEs matches the product they use; then they can
search for CVEs that apply to that CPE.
5. However, if there are
no CPE names that contain the text string, the user will receive the message,
“There are 0 matching records.” The user will receive this message even if
there is a CVE Record that states in its text that Product ABC is affected by
the vulnerability, as long as that record doesn’t include Product ABC’s CPE
name. The lack of the CPE name in the record means that searching on a CPE name
will not inform the user that their product is affected by the vulnerability
described in that record.
6. But there’s a
worse problem than not learning about vulnerabilities that affect the product
being searched for: The above message is the same one that the user will
receive if the product in fact has no identified vulnerabilities. Human nature
alone dictates that most users will interpret the message this way. That is, most
people will believe the product they use has no vulnerabilities, when in fact
it may have a lot of them.
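The problem described in points 4 through 6 can be illustrated with a toy model. This is only a sketch under stated assumptions: the record IDs, product names, and CPE string are placeholders, and the two search functions are hypothetical stand-ins, not the NVD's actual search logic.

```python
# Toy model only: the record IDs, product names, and CPE string are placeholders,
# and this is not how the NVD's search is actually implemented.
RECORDS = [
    {
        "id": "CVE-EXAMPLE-0001",
        "text": "Product ABC version 1.2 is affected by this vulnerability.",
        "cpes": [],  # no CPE name has been added to this record
    },
    {
        "id": "CVE-EXAMPLE-0002",
        "text": "Product XYZ version 3.0 is affected by this vulnerability.",
        "cpes": ["cpe:2.3:a:xyz_vendor:product_xyz:3.0:*:*:*:*:*:*:*"],
    },
]


def search_by_cpe(records, product_name):
    """Return IDs of records that carry a CPE name containing the product name."""
    needle = product_name.lower().replace(" ", "_")
    return [r["id"] for r in records if any(needle in cpe for cpe in r["cpes"])]


def search_by_text(records, product_name):
    """Return IDs of records whose free text mentions the product name."""
    needle = product_name.lower()
    return [r["id"] for r in records if needle in r["text"].lower()]


affected_but_no_cpe = search_by_cpe(RECORDS, "Product ABC")
never_reported = search_by_cpe(RECORDS, "Product QRS")
print(affected_but_no_cpe, never_reported)  # both lists are empty
print(search_by_text(RECORDS, "Product ABC"))  # ['CVE-EXAMPLE-0001']
```

In this sketch, the CPE-based search returns the same empty result for a product that is affected but has no CPE name in its record as it does for a product that was never reported at all. Only reading the record text tells the two cases apart.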
In my opinion, everyone in the CVE
ecosystem needs to assume that CPE will never be a reliable identifier, even
though nobody is saying that CPE should go away. What’s Plan B? Plan B is purl, which has gone from literally nowhere eight years ago to
being one of the two or three most widely used software identifiers in the world.
However, purl cannot currently be used in CVE Records, so people in the CVE
ecosystem cannot yet benefit from using it.
This is why I’m pleased to announce
that purl will soon (let’s say in 6-9 months) be available in the CVE
ecosystem. I’ve been advocating for purl for more than two years; interest in it has
clearly been growing, but the day when it would become an officially accepted
part of the CVE ecosystem has always seemed far away. Now, I can say with
confidence that CNAs will be able to identify vulnerable products in CVE
Records – and end users will be able to search for them – using purl within a
year, and perhaps less than that.
Purl was discussed in at least
four different sessions at VulnCon, but perhaps the most interesting was a
two-hour workshop led by Chris Coffin of MITRE, leader of the CVE Quality
Working Group, and Pete Allor, Senior Director of Product Security at Red Hat
(both of them are members of the CVE.org Board, which runs the CNA Program within
DHS). When the idea for the workshop first came up early in the year – it was
primarily the brainchild of Christopher Robinson, aka “CRob”, of the Linux
Foundation – the point of the workshop was to have a kind of “face-off” between
purl and CPE.
At that time, the question was
whether there was enough support for purl in the CVE community for the CVE
Board to seriously consider moving forward with it as a second possible
software identifier along with CPE. The point of the workshop was to get a
“sense of the room” on this subject.
However, I was surprised (and
others were, too) by the fact that in the past one or two months, the CVE
Program has decided to at least start laying the groundwork for incorporating
purl in the CVE Record Format. How did this change come about? While I have no
specific knowledge of the reason, I attribute it in large part to the fact that
in March it became clear that the NVD was not only failing to make progress
on eliminating their backlog of CVE Records without CPE names, but was in
fact allowing it to grow at a much more rapid pace. Indeed, at
the end of March, I was told that the backlog had grown from 55% of CVE Records
issued since February 12, 2024 – its size at the end of 2024 – to over 70%.
In other words, searching the NVD
for new vulnerabilities applicable to a software product has increasingly
become an exercise in futility: You will most likely just get a message saying,
“There are 0 matching records.” If you want a lift to your day, you can believe
that means your product has zero vulnerabilities and you have nothing to worry
about. Or if you want to be realistic, you can say this more likely means that
any CVE Record that mentions the product you are searching for in its text does
not include a CPE name for the product. If you want to verify this for
yourself, you can always read the text of each of the 40,000 new CVE Records added
to the NVD since February 12, 2024.
The CVE Program intends to change
the CVE Record Format (the format used by CNAs to create CVE Records) to enable
CNAs to use purl to identify a vulnerable software product, not just CPE. You
might ask why that is such a big deal. After all, if the NVD is struggling to
create CPE identifiers, why won’t they also struggle to create purl
identifiers?
The answer is that purl
identifiers don’t need to be “created”. Today, purl is mainly used to identify
open source software distributed in package managers and similar repositories
(of course, this includes a huge percentage of open source software products,
especially of software components found in SBOMs). A typical purl is: “pkg:pypi/django@1.11.1”.
The values of the fields in this purl are:
“pkg” – This field does not
currently have a use, but it will in the future. Currently, all purls start
with these three letters.
“pypi” – This is called the purl “type”.
The package manager is designated in the type. In this case, the package
manager (or more correctly, the package index) is PyPI.
“django” – This is the product name in that
package manager.
“1.11.1” – This is the version
number (or “version string”) in that package manager.
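As a rough illustration of the four fields just listed, here is a minimal sketch that splits a purl of this simple form into its parts. The function name is hypothetical, and the label "scheme" for the "pkg" field is the term used in the purl specification. The sketch assumes the purl has no namespace, qualifiers, or subpath, which full purls may also carry.

```python
def split_simple_purl(purl):
    """Split a purl of the simple form pkg:type/name@version into four fields.

    Assumption: the purl has no namespace, qualifiers, or subpath.
    """
    scheme, _, rest = purl.partition(":")
    purl_type, _, name_and_version = rest.partition("/")
    name, _, version = name_and_version.partition("@")
    return {"scheme": scheme, "type": purl_type, "name": name, "version": version}


fields = split_simple_purl("pkg:pypi/django@1.11.1")
print(fields)
# {'scheme': 'pkg', 'type': 'pypi', 'name': 'django', 'version': '1.11.1'}
```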
If you are a CNA creating a new
CVE Record that reports a vulnerability found in django v1.11.1 as it exists in
PyPI, you can easily create the purl using the values for those four fields. If
you’re not sure about one of the fields (e.g., you’re not sure about the
spelling of django), you can verify it by checking in PyPI. Similarly, if
you’re a user of django and want to learn about current vulnerabilities found in
that product/version, you can get the values for the purl from the product itself, or else verify
them in PyPI.
The most important feature of this
process is that the purl for django 1.11.1 as found in PyPI will always be
globally unique. There are some open source products, like OpenSSL, that exist
in multiple package managers, so the name and version string might be the same
for all those instances. However, the package manager will be different in each
instance. This means every purl is guaranteed to be globally unique.
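The role of the type field in keeping purls distinct can be shown with a short sketch. The helper function and the package name "example-lib" are made up for illustration; "pypi" and "npm" are existing purl types. The sketch ignores any per-type normalization rules that real tooling would apply.

```python
def make_simple_purl(purl_type, name, version):
    """Assemble a purl of the form pkg:type/name@version (no namespace or qualifiers)."""
    return f"pkg:{purl_type}/{name}@{version}"


# "example-lib" is a made-up package that is assumed to exist, with the same
# name and version string, in two different package managers.
in_pypi = make_simple_purl("pypi", "example-lib", "2.0.0")
in_npm = make_simple_purl("npm", "example-lib", "2.0.0")

print(in_pypi)            # pkg:pypi/example-lib@2.0.0
print(in_npm)             # pkg:npm/example-lib@2.0.0
print(in_pypi == in_npm)  # False
```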
By contrast, CPE names include at
least two fields that are inherently ambiguous: product name and vendor name. Everyone
knows that products are renamed regularly, due to M&A as well as various
marketing and rebranding campaigns. But even the company name is hardly
unambiguous. A consultant who worked at Microsoft once asked people there what
company they worked for; she received over 20 different answers. This is
compounded by the fact that software identifiers are based on a single spelling
of a name, so “Microsoft, Inc” is different from “Microsoft”, which is
different from “Microsoft, Inc.” with a period, etc.
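To make the spelling point concrete, here is a small sketch of what an exact-match comparison does with the three spellings above. The function name is hypothetical, and plain string equality is assumed as the matching rule.

```python
spellings = ["Microsoft", "Microsoft, Inc", "Microsoft, Inc."]


def same_vendor(a, b):
    """Exact string comparison: the only test a plain identifier match performs."""
    return a == b


# Compare every pair of spellings and keep the pairs that match.
matching_pairs = [
    (a, b)
    for i, a in enumerate(spellings)
    for b in spellings[i + 1:]
    if same_vendor(a, b)
]

print(matching_pairs)       # []
print(len(set(spellings)))  # 3
```

Under that assumption, a human reader sees one company, but the comparison sees three unrelated vendors.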
The NVD mostly leaves it up to a
staff member – usually a contractor – to decide what values to include in the
product name and vendor name fields of a CPE name they are creating. It is
likely that the only direction they give the contractor is to adhere as closely
as possible to existing values in the “CPE Dictionary” (which isn’t a
dictionary at all, but simply a list of every CPE ever created). Of course, the
product and vendor names vary greatly in the “dictionary”, even when they
probably refer to the “same” product or vendor. So, the CPE dictionary is a
very weak reed to lean on.
In discussions about this problem
(which is the infamous software “naming problem”, in case you didn’t realize
that), someone always asks, “Why don’t we just build a database of all software
products and/or all software vendors? That database can have a canonical name
for each product or vendor; every staff member creating a new CPE name will
need to adhere as closely as possible to similar names that are located near it
in the database.”
That idea sounds attractive until
you start thinking about it. Then you quickly realize:
1. Creating, and even
more so maintaining, a database like that would be fantastically expensive –
many times the cost of maintaining the NVD itself. Remember, the database will
include not just big- or medium-sized software companies, but one-person shops
that ship a single product. These will have to be tracked all the time for name
changes, acquisitions, etc.[ii]
2. As my friend the
consultant found out, there is no agreement on either product or vendor naming among
employees of a large software company. Who will oversee decisions regarding canonical
names? Since I’m sure there’s no employee at Microsoft who even knows every product
they make (let alone can track all the changes in product names), it’s not
likely one person, or even one department, can make that decision. The decision
will have to be delegated. How will that be done, and what criteria will be
provided for the people that make these decisions? Just developing training for
these people – which will have to be constantly repeated, of course – will be a
monumental task.
3. I will point out one area
of agreement that I’ve found in these discussions: The person who advocates for
an approach like this will usually end up saying their department should oversee
software naming, because they are the only department with the right
perspective to make these decisions. This is expected behavior, since there’s probably
no objective way to decide who should oversee software naming.
To summarize the above, trying to definitively
fix CPE name creation will usually lead to requiring at least two separate
databases: one for software names and one for vendor names. I don’t know of any
other way that it would be possible to enforce a policy like, ‘Any software
developer whose name begins with the word “Microsoft” will be called “Microsoft
Corporation” (and not “Microsoft Corp.”, “Microsoft, Inc.”, etc.).’
Moreover, it’s likely those
two databases will themselves require other databases. After all, if a company
like Microsoft is going to designate certain people to oversee naming for
certain types of software, there will need to be a database that lists each of
those people, as well as the types of products over which they have authority. And
that database might itself require another database, etc.
How does purl decide the “correct” name for a software
product found in a package manager? It follows a simple rule: the name of the
product in the package manager is under the control of the operator
of the package manager. Whatever name the operator decides on is the correct one
for that package manager, although another package manager may decide to give
the “same” product a different name. The operator can be counted on to
maintain a “controlled namespace”, in which no product name/version string
combination duplicates the name/version of another product in the same package
manager.
That way, the name of a product distributed through PyPI or
Maven Central will always be the same for anyone who wants to look at the
package manager (or even read the “About…” section on the main page of a
software product they use); no centralized database lookup is required. Two
different people (say, the CNA that creates a CVE Record that includes a purl
for Product ABC version 1.2 and the user who wants to search for
vulnerabilities in that product/version) should always, barring a mistake, create
the same purl.
Problem solved.
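The matching step described above can be sketched as follows. This is an illustration under stated assumptions, not a description of how CVE.org or any vulnerability database will implement purl support: the index structure, the function names, and the record ID are all hypothetical.

```python
def publish(index, cve_id, purl):
    """CNA side: file a (hypothetical) CVE Record under the purl it names."""
    index.setdefault(purl, []).append(cve_id)


def find(index, purl):
    """User side: exact-match lookup, with no central name database involved."""
    return index.get(purl, [])


index = {}

# The CNA reads the type, name, and version from PyPI and writes the purl.
publish(index, "CVE-EXAMPLE-0001", "pkg:pypi/django@1.11.1")

# The user assembles the purl independently from the same public values.
purl_type, name, version = "pypi", "django", "1.11.1"
user_purl = f"pkg:{purl_type}/{name}@{version}"

print(find(index, user_purl))                 # ['CVE-EXAMPLE-0001']
print(find(index, "pkg:pypi/django@1.11.2"))  # []
```

Because both parties derive the identifier from the same values in the package manager, the lookup succeeds without either of them consulting a dictionary of previously created names.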
If you would like to comment on
what you have read here, I would love to hear from you. Please email me
at tom@tomalrich.com.
My book "Introduction to SBOM and VEX" is available in paperback and Kindle versions! For background on the book and the link to order it, see this post.
[i] The
National Vulnerability Database is part of NIST, which is part of the Department
of Commerce. The CVE.org organization, which used to be called MITRE and is
still staffed by contractors from the MITRE Corporation, is funded by the
Department of Homeland Security (DHS).
[ii] Steve Springett is advocating an idea called “common
lifecycle enumeration”. This can be thought of as an online ledger of
changes in names and versions of a software product.