I’ve written a lot about the purl (Package URL) software identifier in the past year. While I’ve been a big fan of purl since the OWASP SBOM Forum wrote its white paper on software naming in the NVD in 2022, my biggest reason for pushing the idea now is that the only major “competitor” to purl, the CPE identifier used in the National Vulnerability Database (NVD), has in the last 15 months become very hard to find in new CVE records, even though almost every new record used to include one. This is causing huge problems for the vulnerability management community.
As I and others have pointed out in multiple posts over the
past year, well over half of new CVE records added to the NVD since February
2024 don’t include a CPE name. NIST, which runs the NVD, has never provided a
thorough explanation for this situation. The problem is that a CVE record
without a CPE name is literally invisible to automated searches. Thus, someone searching
the NVD today for vulnerabilities that apply to a particular software product/version
is likely to be shown fewer than half of the vulnerabilities that have been
identified as affecting that product/version since early 2024; the other CVEs that
apply to the product never show up in a search, since there’s no
machine-readable identifier that links their records to the product.
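To make the search gap concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the record IDs, the CPE string and the `search_by_cpe` helper are placeholders, and this is a toy model of identifier-based matching rather than a description of how the NVD is actually implemented.

```python
# Hypothetical CVE records; the IDs and the CPE string are placeholders, not real data.
EXAMPLE_CPE = "cpe:2.3:a:example_vendor:example_product:1.0:*:*:*:*:*:*:*"

records = [
    {"id": "CVE-0000-0001", "cpes": [EXAMPLE_CPE]},
    {"id": "CVE-0000-0002", "cpes": []},  # affects the same product, but no CPE was added
    {"id": "CVE-0000-0003", "cpes": []},  # affects the same product, but no CPE was added
]

def search_by_cpe(records, cpe):
    """Return the IDs of records that list exactly this CPE name."""
    return [r["id"] for r in records if cpe in r["cpes"]]

found = search_by_cpe(records, EXAMPLE_CPE)
print(found)  # only the record that carries a CPE can be matched
```

In this toy data set, two of the three records can never be returned by the search, no matter how the query is phrased, because they contain nothing for the query to match against.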
Of course, this isn’t a
sustainable situation. When the NVD started having these problems in 2024, the
initial reaction among most people in the vulnerability management community
was that this was only temporary; it was certain that the NVD would bounce back
in a few months. Indeed, the NVD itself put out stories to that effect.
However, fifteen months later
the problem has grown, not improved, despite repeated assurances from the NVD
that they were on their way to solving their problems. The NVD is no longer
saying that, but they are pointing to longer-term improvements (like AI) that
they hope will save them. At this point, it’s certainly safe to say that
the cavalry isn’t going to arrive anytime soon, if ever. What’s Plan B?
I’m pleased to say that the CVE Program is now making noises
to the effect that purl is coming as an alternative identifier in CVE records
(currently, CPE is the only identifier that can be used). In fact, I’m guessing
it may be an option by the end of the year. This means that the CVE Numbering
Authorities (CNAs) that create new CVE records will have the option of identifying
vulnerable products using purl.
Since a CPE name is only added after
a new CVE record has been incorporated into the CVE.org database and downloaded
by the NVD (assuming it is added at all, of course), it is still possible that
a CPE will be added to the record, even if the record already contains a purl. That’s
not a problem, since the CPE and purl can be used as checks on each other. The
point is that, once purl is available, there should be fewer CVE records that
don’t include a machine-readable software identifier.
The most important concept in purl
is that of a “controlled namespace”. This term refers to an important tool for
combating the dreaded software “naming problem”, which is perhaps the biggest
impediment to automated software vulnerability management. The essence of this
problem is that software products are referred to with many different names
throughout their lifecycle. Even at a single point in time, employees of a
software developer will refer to their products with different names, depending
on the division they work for, whether they worked for the predecessor company
that originated the product, etc.
CPE names reflect this problem. A
CPE must include a product name, but which name to include is left entirely to
the discretion of the person that creates the CPE name, usually a contractor
working for the NVD. There is no established name for a software product in all
contexts, so the contractor is left to simply do their best. However, this
means there is almost never a way for a software user to predict with certainty
what the CPE name for a product is. Instead, the user will need to look through
NIST’s CPE Dictionary – which isn’t
a real dictionary, but simply an alphabetical list of every CPE ever created.
This provides the user with a set of candidate names, but
provides no good means to determine which one applies in any instance.
The situation is very different
with purl. A software user that wants to learn about vulnerabilities found in
an open source product can usually create its purl by combining the package
manager name with the package name and version string in that package
manager.[i]
These are all pieces of information that the user should have on hand. However,
if they don’t, they can easily look them up in the package manager.
What is most important is that, unlike
CPE and other identifiers like Social Security number or phone number[ii],
the user doesn’t first have to search for the purl in a central database. Since
the operator of the package manager makes sure no two packages have the same
name, the package manager has a controlled namespace. This means that, no
matter who creates a purl for an open source product distributed by a package
manager, it will always be the same, because it’s based on the name (and
version string, if applicable) in the package manager. Anyone can verify the
name at any time, simply by looking in the package manager.
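A minimal sketch of that idea in Python follows. The `make_purl` helper is my own illustration, not part of any purl library, and it deliberately skips the type-specific normalization rules in the purl specification (for example, how individual package managers treat case). It only shows that the same three inputs always yield the same string.

```python
from urllib.parse import quote

def make_purl(pkg_type, name, version=None, namespace=None):
    """Assemble a basic purl of the form pkg:type/namespace/name@version."""
    parts = [pkg_type.lower()]
    if namespace:
        # A namespace may have several segments; encode each one separately.
        parts.extend(quote(seg, safe="") for seg in namespace.split("/"))
    parts.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl

# Two parties who never talk to each other start from the same package manager data.
purl_from_cna = make_purl("pypi", "django", "1.11.1")
purl_from_user = make_purl("pypi", "django", "1.11.1")

print(purl_from_cna)                    # pkg:pypi/django@1.11.1
print(purl_from_cna == purl_from_user)  # True
```

Neither party had to consult a central dictionary; the package manager's own name and version string were enough.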
The NVD lists vulnerabilities
found in both open source and commercial software products. CPE can identify
either type of software. However, today purl primarily identifies open source products
distributed by package managers. While there are other vulnerability databases
besides the NVD that identify vulnerabilities in open source products[iii],
currently the NVD and a group of other databases that are built on top of the
NVD are the only vulnerability databases for commercial software products.
Commercial software products are
seldom distributed through package managers. Instead, a customer first
completes a commercial transaction (e.g., a credit card purchase or a PO) with the
supplier. Then the supplier makes the product they have purchased available to
them using various means, such as a SaaS subscription, download of the
binaries, etc.
With so many ways to distribute
a commercial product, the supplier cannot control the product’s namespace through
the distribution point, in the same way that the operator of a package manager
can control an open source package’s namespace. How will a user learn the purl
for a commercial product? Will they have to look it up in a central database,
as they do for CPE?
As the OWASP SBOM Forum
discussed on pages 4-6 of our 2022 white
paper on software naming in the NVD, having to rely on a central database
creates many problems. However, on pages 11-12 of that white paper, we
described an idea that would allow a commercial software supplier to control
the namespace for each of their products, without having to restrict their
distribution to a single internet location like a package manager.
Our idea was for the supplier to
create a small document, called a tag, that contains the fewer than ten
pieces of information required to create a purl for the product. These include
at a minimum the supplier name (“software creator”), product name and version
string (if used). Because the existing SWID (“Software Identification”) tag
standard, originally developed by NIST to be the replacement for CPE as the software
identifier in the NVD, could easily accomplish that task, we decided in 2022 to
use that as the format for the document. We created a new purl “type”[iv]
called SWID.
In that paper, we suggested
that, when a commercial supplier releases a new software product or a new
version of an existing product, they will create a SWID tag that contains all
the required and optional fields for creating a purl for the product. The
supplier will make the tag available at least to their customers, but ideally
to anyone who wants it (in a subsequent post, I’ll discuss options for sharing the
tag).
However, based on discussions
with industry groups about the purl SWID type, it now appears that using the SWID
format for the software tag may have confused people. SWID is described in the ISO/IEC
19770-2 standard. That standard lists around 80 fields for a SWID tag, yet fewer
than ten of those fields are required to create a purl (the supplier only needs
to fill in those few fields in their tag, but this might not always be apparent).
Another problem is that access to the standard is not free, but costs around
$150. Even though there is no need to download the standard just to create
purls, some people take offense at even being asked to do so.
For that reason, the OWASP SBOM
Forum has decided to lead development of our own tag format – although it will
of course be made available to the entire purl and CVE communities, and anyone
from those communities is welcome to join us in this effort. It will only
include fields that are necessary, or at least optional, to include in the purl
for a commercial product.
Regardless of tag format, perhaps
the most important party to receive the software tag will be the CVE Numbering
Authority (CNA) that creates new CVE Records to report vulnerabilities identified
in the product. They will follow the purl specification and create a
purl for the product, utilizing the product metadata included in the tag.[v]
When a customer of the product wants
to learn about vulnerabilities identified in it, they will create a purl based
on the same tag. Because the tag for a product/version does not change
until the version changes (e.g., the product is upgraded to a newer version), the
purl the customer derives from it stays valid for as long as they run that
version. Barring an error, the
user’s purl should always match the purl the CNA used when they created the CVE
Record, meaning that purl searches in a vulnerability database should have
a higher chance of success than CPE searches.
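Here is a rough sketch of that workflow in Python. Since the new tag format has not been defined yet, the field names (`supplier`, `product`, `version`, `tag_id`) are assumptions on my part, as is the exact layout of the resulting purl, which is only loosely modeled on the existing swid type. The CVE ID and the little `vuln_db` dictionary are placeholders too.

```python
from urllib.parse import quote

# Hypothetical tag contents; the field names are assumptions, not a published format.
tag = {
    "supplier": "Example Corp",
    "product": "Example Server",
    "version": "2.4.0",
    "tag_id": "example-tag-0001",
}

def purl_from_tag(tag):
    """Derive a purl from tag metadata (the layout is an assumption for illustration)."""
    purl = "pkg:swid/{}/{}".format(
        quote(tag["supplier"], safe=""), quote(tag["product"], safe="")
    )
    if tag.get("version"):
        purl += "@" + quote(tag["version"], safe="")
    purl += "?tag_id=" + quote(tag["tag_id"], safe="")
    return purl

# The CNA derives the purl when it writes the CVE record...
cna_purl = purl_from_tag(tag)
vuln_db = {cna_purl: ["CVE-0000-0001"]}  # placeholder ID, not a real CVE

# ...and the customer derives it independently from the same tag when searching.
customer_purl = purl_from_tag(tag)
print(customer_purl)
print(vuln_db.get(customer_purl, []))
```

Because both sides start from the same tag and apply the same rules, the customer's lookup key is identical to the one the CNA stored.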
However, there’s an easier way
to make sure that a customer (or any other software user) always uses the
correct purl: The supplier can publish the purl for a product/version along
with the tag. Since neither the purl nor the tag will need to change until the version
changes, a customer who has both can just use the purl, and won’t have to
create it. However, someone who only has the tag – or someone who wants to validate
the purl they’ve been given – will still be able to use it to create the purl.
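Validating a published purl against a tag could look something like the following Python sketch. The tag field names and the sample purl are hypothetical, and `split_purl` is a deliberately tiny parser that only handles the type, namespace, name and version components. It ignores qualifiers and subpaths, so it is not a substitute for a real purl library.

```python
from urllib.parse import unquote

def split_purl(purl):
    """Tiny purl splitter: handles pkg:type/namespace/name@version only."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError("not a purl: " + purl)
    # Qualifiers (?...) and subpath (#...) are ignored in this sketch.
    rest = rest.split("?", 1)[0].split("#", 1)[0]
    path, _, version = rest.partition("@")
    segments = [unquote(s) for s in path.split("/")]
    return {
        "type": segments[0],
        "namespace": "/".join(segments[1:-1]),
        "name": segments[-1],
        "version": unquote(version),
    }

def purl_matches_tag(purl, tag):
    """Check that the purl's supplier, product and version agree with the tag."""
    parts = split_purl(purl)
    return (parts["namespace"], parts["name"], parts["version"]) == (
        tag["supplier"], tag["product"], tag["version"]
    )

# Hypothetical tag and supplier-published purl.
tag = {"supplier": "Example Corp", "product": "Example Server", "version": "2.4.0"}
published = "pkg:swid/Example%20Corp/Example%20Server@2.4.0?tag_id=example-tag-0001"

print(purl_matches_tag(published, tag))  # True
```

A mismatch, such as a purl published for a different version than the tag describes, would make the check return False and tell the user something is off before they search with it.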
It’s important to keep in
mind the problem: Because CPE is the only software identifier currently used to
identify commercial software products, the fact that so many CVE records today don’t
include a CPE name means that users of commercial software products are likely
to learn about fewer than half of the recent vulnerabilities (i.e., those
identified since February 2024) that apply to those products. While there is no
magic wand that can fix this problem immediately, the purl identifier, along
with the enhancement described in this post, may well be the best permanent
solution.
If
you would like to comment on what you have read here, I would love to hear from
you. Please email me at tom@tomalrich.com. And don’t forget to donate!
[i]
For example, the purl for the product django version 1.11.1, found in the PyPI
package manager, is “pkg:pypi/django@1.11.1”. Note that every purl begins
with the scheme “pkg:”.
[ii]
The 2022 white paper referenced earlier discusses the difference between
intrinsic identifiers like purl, which don’t require lookup in a central
database, and extrinsic identifiers like CPE that do require a database lookup.
[iii] These
databases almost all use purl to identify the open source products.
[iv] Every
purl has a “type”; there are over 1,000 types, some of which aren’t used very
much. For most open source products, the type is based on the name of the
package manager where the product is found.
[v] There
will probably be one or more tools in the future that ingest a SWID tag and
output a purl for the product/version described by the tag.