Thursday, May 22, 2025

Purls for commercial software

I’ve written a lot about the purl (package URL) software identifier in the past year. While I’ve been a big fan of purl since the OWASP SBOM Forum wrote this white paper in 2022, my biggest reason for pushing the idea now is the fact that the only major “competitor” to purl, the CPE identifier used in the National Vulnerability Database (NVD), has in the last 15 months become very hard to find in new CVE records – even though almost every new record used to include one. This is causing huge problems for the vulnerability management community.

As I and others have pointed out in multiple posts over the past year, well over half of new CVE records added to the NVD since February 2024 don’t include a CPE name. NIST, which runs the NVD, has never provided a thorough explanation for this situation. The problem is that a CVE record without a CPE name is literally invisible to automated searches. Thus, someone searching the NVD today for vulnerabilities that apply to a particular software product/version is likely to be shown fewer than half of the vulnerabilities that have been identified as affecting that product/version since early 2024; the other CVEs that apply to the product never show up in a search, since there’s no machine-readable identifier that links their records to the product.

Of course, this isn’t a sustainable situation. When the NVD started having these problems in 2024, the initial reaction among most people in the vulnerability management community was that this was only temporary; it was certain that the NVD would bounce back in a few months. Indeed, the NVD itself put out stories to that effect.

However, fifteen months later the problem has grown, not improved - despite repeated assurances from the NVD that they are on their way to solving their problems. The NVD is no longer saying that, but they are pointing to longer term improvements (like AI) that they hope will save them. At this point, it’s certainly safe to say that the cavalry isn’t going to arrive anytime soon, if ever. What’s Plan B?

I’m pleased to say that the CVE Program is now making noises to the effect that purl is coming as an alternative identifier in CVE records (currently, CPE is the only identifier that can be used). In fact, I’m guessing it may be an option by the end of the year. This means that the CVE Numbering Authorities (CNAs) that create new CVE records will have the option of identifying vulnerable products using purl.

Since a CPE name is only added after a new CVE record has been incorporated into the CVE.org database and downloaded by the NVD (assuming it is added at all, of course), it is still possible that a CPE will be added to the record, even if the record already contains a purl. That’s not a problem, since the CPE and purl can be used as checks on each other. The point is that, once purl is available, there should be fewer CVE records that don’t include a machine-readable software identifier.

The most important concept in purl is that of a “controlled namespace”. This term refers to an important tool for combating the dreaded software “naming problem”, which is perhaps the biggest impediment to automated software vulnerability management. The essence of this problem is that software products are referred to with many different names throughout their lifecycle. Even at a single point in time, employees of a software developer will refer to their products with different names, depending on the division they work for, whether they worked for the predecessor company that originated the product, etc.

CPE names reflect this problem. A CPE must include a product name, but which name to include is left entirely to the discretion of the person who creates the CPE name, usually a contractor working for the NVD. There is no established name for a software product in all contexts, so the contractor is left to simply do their best. However, this means there is almost never a way for a software user to predict with certainty what the CPE name for a product is. Instead, the user will need to look through NIST’s CPE Dictionary – which isn’t a real dictionary, but simply an alphabetical list of every CPE ever created. This provides the user with a set of suggestions for the name, but provides no good means to determine which name to use in any instance.

The situation is very different with purl. A software user that wants to learn about vulnerabilities found in an open source product can usually create its purl by combining the package manager name with the package name and version string in that package manager.[i] These are all pieces of information that the user should have on hand. However, if they don’t, they can easily look them up in the package manager.
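To make this concrete, here is a minimal sketch in Python of how a user might assemble a purl from those three pieces of information. The function name make_purl is my own, not part of any purl library, and the sketch deliberately ignores namespaces, qualifiers and the type-specific normalization rules in the purl specification, which remains the authoritative source.

```python
from typing import Optional
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: Optional[str] = None) -> str:
    """Build a simple purl of the form pkg:<type>/<name>@<version>.

    Illustrative only: real purls may also carry a namespace, qualifiers
    and a subpath, and some types normalize the name (not handled here).
    """
    # The type is lowercased; name and version are percent-encoded.
    purl = f"pkg:{pkg_type.lower()}/{quote(name, safe='')}"
    if version:
        purl += f"@{quote(version, safe='')}"
    return purl


print(make_purl("pypi", "django", "1.11.1"))  # pkg:pypi/django@1.11.1
```

Because the inputs come straight from the package manager, two people running this independently for the same package and version end up with the same string.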

What is most important is that, unlike CPE and other identifiers like Social Security number or phone number[ii], the user doesn’t first have to search for the purl in a central database. Since the operator of the package manager makes sure no two packages have the same name, the package manager has a controlled namespace. This means that, no matter who creates a purl for an open source product distributed by a package manager, it will always be the same, because it’s based on the name (and version string, if applicable) in the package manager. Anyone can verify the name at any time, simply by looking in the package manager.

The NVD lists vulnerabilities found in both open source and commercial software products. CPE can identify either type of software. However, today purl primarily identifies open source products distributed by package managers. While there are other vulnerability databases besides the NVD that identify vulnerabilities in open source products[iii], currently the NVD and a group of other databases that are built on top of the NVD are the only vulnerability databases for commercial software products.

Commercial software products are seldom distributed through package managers. Instead, a customer first completes a commercial transaction (e.g., a credit card purchase or a PO) with the supplier. Then the supplier makes the product they have purchased available to them using various means, such as a SaaS subscription, download of the binaries, etc.

With so many ways to distribute a commercial product, the supplier cannot control the product’s namespace through the distribution point, in the same way that the operator of a package manager can control an open source package’s namespace. How will a user learn the purl for a commercial product? Will they have to look it up in a central database, as they do for CPE?

As the OWASP SBOM Forum discussed on pages 4-6 of our 2022 white paper on software naming in the NVD, having to rely on a central database creates many problems. However, on pages 11-12 of that white paper, we described an idea that would allow a commercial software supplier to control the namespace for each of their products, without having to restrict their distribution to a single internet location like a package manager.

Our idea was for the supplier to create a small document – called a tag - that contains the fewer than ten pieces of information required to create a purl for the product. These include at a minimum the supplier name (“software creator”), product name and version string (if used). Because the existing SWID (“Software Identification”) tag standard, originally developed by NIST to be the replacement for CPE as the software identifier in the NVD, could easily accomplish that task, we decided in 2022 to use that as the format for the document. We created a new purl “type”[iv] called SWID.

In that paper, we suggested that, when a commercial supplier releases a new software product or a new version of an existing product, they will create a SWID tag that contains all the required and optional fields for creating a purl for the product. The supplier will make the tag available at least to their customers, but ideally to anyone who wants it (in a subsequent post, I’ll discuss options for sharing the tag).
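As a rough illustration of the idea, the sketch below turns the contents of a tag into a purl. Everything specific here is an assumption on my part: the tag is shown as a plain Python dictionary, the field names (supplier, product, version, tag_id) are hypothetical, the supplier and product are made up, and the layout of the resulting purl only loosely follows the existing swid type. The purl specification, not this sketch, defines the actual mapping.

```python
from urllib.parse import quote

# Hypothetical tag contents; the field names and values are illustrative only.
tag = {
    "supplier": "Example Corp",
    "product": "Widget Server",
    "version": "2.4.0",
    "tag_id": "example-widget-server-2.4.0",
}


def purl_from_tag(tag: dict) -> str:
    """Derive a purl from a tag, assuming a swid-style layout."""
    # Refuse to guess: these fields must be present and non-empty.
    for field in ("supplier", "product", "tag_id"):
        if not tag.get(field):
            raise ValueError(f"tag is missing required field: {field}")
    purl = "pkg:swid/{}/{}".format(
        quote(tag["supplier"], safe=""), quote(tag["product"], safe="")
    )
    # The version is optional ("if used"), so only append it when present.
    if tag.get("version"):
        purl += "@" + quote(tag["version"], safe="")
    purl += "?tag_id=" + quote(tag["tag_id"], safe="")
    return purl


print(purl_from_tag(tag))
# pkg:swid/Example%20Corp/Widget%20Server@2.4.0?tag_id=example-widget-server-2.4.0
```

The point of the design is that the function takes nothing but the tag as input, so anyone holding the same tag gets the same purl.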

To produce this blog, I rely on support from people like you. If you appreciate my posts, please show that by donating here. Any amount is welcome. Thanks!

However, based on discussions with industry groups about the purl SWID type, it now appears that using the SWID format for the software tag may have confused people. SWID is described in the ISO/IEC 19770-2 standard. That standard lists around 80 fields for a SWID tag, yet fewer than ten of those fields are required to create a purl (the supplier only needs to fill in those few fields in their tag, but this might not always be apparent). Another problem is that access to the standard is not free, but costs around $150. Even though there is no need to download the standard just to create purls, some people take offense at even being asked to do so.

For that reason, the OWASP SBOM Forum has decided to lead development of our own tag format – although it will of course be made available to the entire purl and CVE communities, and anyone from those communities is welcome to join us in this effort. It will only include fields that are necessary, or at least optional, to include in the purl for a commercial product.

Regardless of tag format, perhaps the most important party to receive the software tag will be the CVE Numbering Authority (CNA) that creates new CVE Records to report vulnerabilities identified in the product. They will follow the purl specification and create a purl for the product, utilizing the product metadata included in the tag.[v]

When a customer of the product wants to learn about vulnerabilities identified in it, they will create a purl based on the same tag. Given that the tag for a product/version will never change until the version changes (e.g., the product is upgraded to a newer version), a user who received the tag from the supplier will use that to create a purl to search a vulnerability database for the product/version. Barring an error, the user’s purl should always match the purl the CNA used when they created the CVE Record, meaning that purl searches in a vulnerability database will likely have a higher chance of success than CPE searches.
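Here is a small sketch of why this matters for searching. The records, IDs and purls below are all made up, and the record layout is a hypothetical simplification rather than the real CVE record format. The lookup is an exact string match; a real tool would more likely parse and normalize purls before comparing them.

```python
from collections import defaultdict

# Made-up records for illustration; the IDs and purls are not real.
RECORDS = [
    {"id": "CVE-EXAMPLE-0001",
     "purls": ["pkg:swid/Example%20Corp/Widget%20Server@2.4.0"
               "?tag_id=example-widget-server-2.4.0"]},
    # A record with no machine-readable identifier can never be matched.
    {"id": "CVE-EXAMPLE-0002", "purls": []},
    {"id": "CVE-EXAMPLE-0003", "purls": ["pkg:pypi/django@1.11.1"]},
]


def build_index(records):
    """Map each purl to the IDs of the records that list it."""
    index = defaultdict(list)
    for record in records:
        for purl in record.get("purls", []):
            index[purl].append(record["id"])
    return index


def find_cves(index, purl):
    """Return the record IDs whose purl exactly matches the query."""
    return index.get(purl, [])


index = build_index(RECORDS)
# The customer derives this purl from the same tag the CNA used.
customer_purl = ("pkg:swid/Example%20Corp/Widget%20Server@2.4.0"
                 "?tag_id=example-widget-server-2.4.0")
print(find_cves(index, customer_purl))  # ['CVE-EXAMPLE-0001']
```

The second record illustrates the current problem: with no identifier in the record, no query will ever return it.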

However, there’s an easier way to make sure that a customer (or any other software user) always uses the correct purl: The supplier can publish the purl for a product/version along with the tag. Since neither the purl nor the tag will need to change until the version changes, a customer who has both can just use the purl, and won’t have to create it. However, someone who only has the tag – or someone who wants to validate the purl they’ve been given – will still be able to use it to create the purl.
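A validation step of that kind might look something like the sketch below, which splits a published purl into its parts and compares them with the tag. As before, the tag fields, the supplier and product names, and the swid-style layout are assumptions for illustration, and the parser is a simplified one that ignores subpaths and other details of the purl specification.

```python
from urllib.parse import parse_qsl, unquote


def parse_simple_purl(purl: str) -> dict:
    """Split a purl into type, namespace, name, version and qualifiers.

    Simplified parser for illustration; it does not handle subpaths.
    """
    if not purl.startswith("pkg:"):
        raise ValueError("not a purl: missing 'pkg:' prefix")
    rest, _, query = purl[len("pkg:"):].partition("?")
    path, _, version = rest.partition("@")
    parts = [unquote(p) for p in path.split("/")]
    if len(parts) < 2:
        raise ValueError("purl needs at least a type and a name")
    return {
        "type": parts[0],
        "namespace": parts[1:-1],
        "name": parts[-1],
        "version": unquote(version),
        "qualifiers": dict(parse_qsl(query)),
    }


def purl_matches_tag(purl: str, tag: dict) -> bool:
    """Check a published purl against hypothetical tag fields."""
    parsed = parse_simple_purl(purl)
    return (
        parsed["type"] == "swid"
        and parsed["namespace"] == [tag["supplier"]]
        and parsed["name"] == tag["product"]
        and parsed["version"] == tag.get("version", "")
        and parsed["qualifiers"].get("tag_id") == tag["tag_id"]
    )


tag = {"supplier": "Example Corp", "product": "Widget Server",
       "version": "2.4.0", "tag_id": "example-widget-server-2.4.0"}
published = ("pkg:swid/Example%20Corp/Widget%20Server@2.4.0"
             "?tag_id=example-widget-server-2.4.0")
print(purl_matches_tag(published, tag))  # True
```

Comparing parsed components rather than raw strings means harmless differences in percent-encoding don't cause a false mismatch.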

It’s important to keep in mind the problem: Because CPE is the only software identifier currently used to identify commercial software products, the fact that so many CVE records today don’t include a CPE name means that users of commercial software products are likely to learn about fewer than half of the recent vulnerabilities (i.e., those identified since February 2024) that apply to those products. While there is no magic wand that can fix this problem immediately, the purl identifier, along with the enhancement described in this post, may well be the best permanent solution.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. And don’t forget to donate!


[i] For example, the purl for the product django version 1.11.1, found in the PyPI package manager, is “pkg:pypi/django@1.11.1”. Note that every purl is preceded by “pkg:”.

[ii] The 2022 white paper referenced earlier discusses the difference between intrinsic identifiers like purl, which don’t require lookup in a central database, and extrinsic identifiers like CPE that do require a database lookup.

[iii] These databases almost all use purl to identify the open source products.

[iv] Every purl has a “type”; there are currently a few dozen types, some of which aren’t used very much. For most open source products, the type is based on the name of the package manager where the product is found.

[v] There will probably be one or more tools in the future that ingest a SWID tag and output a purl for the product/version described by the tag.
