Saturday, March 22, 2025

Implementing purl in the CVE ecosystem


As discussed in this recent blog post, software vulnerability management is facing a serious problem: the National Vulnerability Database (NVD) appears to be neglecting one of its two primary responsibilities, namely adding “CPE names” to new CVE records.

This leads to two problems that need to be addressed as soon as possible.

Part I: The first problem

· Software users need to be able to learn about vulnerabilities that have been reported in the software they use. They do this by searching a software vulnerability database.

·       The National Vulnerability Database (NVD) is by far the most widely used vulnerability database in the world. However, just learning about a new software vulnerability does not help a user, unless they know what product or products are affected by the vulnerability.

·       In the NVD, vulnerabilities are identified in CVE records using a format like “CVE-2024-12345”. Products that are affected by a CVE should be identified in the CVE record using a machine-readable “CPE name” like “cpe:2.3:a:microsoft:internet_explorer:8.0.6001:beta:*:*:*:*:*:*”.

·       Currently, NVD contractors are responsible for adding CPE names to new CVE records. Before February 2024, this was almost always done within a few days of when the NVD received the CVE record.

·       If a CVE record does not contain a CPE name for a software product affected by the vulnerability described in the text of the record, a user searching for vulnerabilities that have been identified in the product will not learn it is affected by that CVE.

·       The NVD’s problem is that, starting on February 12, 2024, the number of CPE names it created dropped drastically. As a result, about 55% of new CVE records in 2024 were never given a CPE name. This means the products affected by those CVEs are invisible to a search. Various observers have pointed out that in 2025 the problem is getting even worse, since only about 25% of new CVE records contain a CPE name. As of this writing (the end of March 2025), there is a serious question whether the NVD is creating new CPE names at all.

·       This means that any search for a particular product in the NVD is more likely than not to miss any vulnerabilities that have been reported for that product since February 2024, making NVD searches more and more useless – as well as misleading – as time goes on.

·       What other vulnerability databases are there, besides the NVD? For open source software products, the answer is “a lot”. These include OSS Index, OSV, GitHub Security Advisories and others; in fact, a software user is more likely to learn about vulnerabilities that apply to open source software products (or open source components in an SBOM) in these databases than in the NVD.

·       However, for commercial software products, there is currently only one vulnerability database: the NVD[i]. Because the NVD can no longer be called reliable, this means there is currently no reliable source of vulnerability data for commercial software products. Obviously, this isn’t a good thing, given how dependent business and government organizations are on commercial software.

·       When will this problem be fixed? If this question asks when CPE names will be added to all the over 30,000 “CPE-less” CVE records currently in the NVD’s backlog, the answer is “probably never”. Currently, the best that can be hoped for is to slow the rate of growth of the backlog.

·       Since the CPE backlog may never go away, what measures can be taken in the longer term? There isn’t much question: CVE records can no longer be restricted to containing CPE identifiers. While CPE should continue to be one option for new CVE records, it should no longer be the only one. The best alternative is purl.[i]

·       If purl were implemented in the CVE record format, it would immediately improve identification of open source products, since purl has – in only about eight years – become a major software identifier used in open source vulnerability databases.

·       The most important feature of purl is that the user never has to look up the purl for a product before they search for the product in a vulnerability database. This is because they should always be able to create the purl for a product by using information they already have. This includes the package manager from which they downloaded the software, as well as the package name and version string in that package manager.

·       Since each purl is globally unique, a purl for an affected product in a CVE record should always match a purl created by a software user before they search for the product in a vulnerability database. This means that searches using purl will have a high success rate.
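To make the last two points concrete, here is a minimal Python sketch of how a user might build a purl from information they already have (package manager type, optional namespace, name and version) and compare it with the purls listed in a vulnerability record. The function name `make_purl`, the package names and the `affected` set are hypothetical, and the sketch leaves out the type-specific normalization rules and qualifiers that the purl specification defines.

```python
from urllib.parse import quote


def make_purl(pkg_type, name, version, namespace=None):
    """Build a basic purl of the form pkg:type/namespace/name@version.

    Simplified illustration: type-specific normalization rules and
    qualifiers from the purl specification are not handled here.
    """
    segments = [pkg_type.lower()]
    if namespace:
        # A namespace may consist of several segments separated by "/".
        segments.extend(quote(part, safe="") for part in namespace.split("/"))
    segments.append(quote(name, safe=""))
    return "pkg:" + "/".join(segments) + "@" + quote(version, safe="")


# Hypothetical purls listed as affected in a single vulnerability record.
affected = {
    "pkg:npm/example-lib@1.2.3",
    "pkg:maven/com.example/example-core@2.0.0",
}

# The user builds the purl from what they already know about the package.
user_purl = make_purl("maven", "example-core", "2.0.0", namespace="com.example")
print(user_purl)
print(user_purl in affected)
```

Because the user and the record author derive the string from the same inputs, the lookup is a plain equality check rather than a fuzzy search over vendor and product names.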

How can we address the first problem in Part I of the project? 

Introducing purl into the CVE ecosystem requires making it possible for CVE Numbering Authorities (CNAs) to designate software products in CVE records using purls. CNAs are mostly larger software developers and organizations like GitHub; they are responsible for reporting vulnerabilities to CVE.org using the CVE Record Format.

Three tasks are required to address the first problem. In each of these tasks, we will coordinate with the CVE.org Quality Working Group.

A.      Develop a new version of the CVE Record Format to accommodate use of purl and submit it as a pull request to CVE.org. The SBOM Forum will work with the CVE.org Quality Working Group (QWG) and the Python Foundation to accomplish this goal.

B.      Develop plans for an end-to-end proof of concept for use of purl in the CVE ecosystem.

C.     Conduct that proof of concept. The PoC will involve software suppliers, end user organizations, CVE Numbering Authorities (CNAs) and vulnerability database operators.


Part II: The second problem

·       Today, purl can only be used to identify open source software in package managers, not commercial software. Since most private and government organizations utilize commercial software to run their businesses, it is important that purl be expanded to identify commercial, as well as open source, software products. In 2022, the OWASP SBOM Forum suggested[iii] a way to fix this problem by having a supplier create a “SWID tag” for each of their products. A new “type” called SWID was developed and implemented in purl.

·       A SWID tag is a small document containing 5-10 pieces of metadata about a software product. These pieces of information can be used to create the purl for the product, which will always be globally unique.

·       The only three mandatory fields for a purl using the SWID type are “name”, “version” and “tag_id” (taken from the “tagId” field of the SWID tag). Note that the tag ID can be almost anything. For example, it could be the URL from which the product is downloaded.

·       To illustrate this, the purl for Fedora version 29 is “pkg:swid/Fedora@29?tag_id=org.fedoraproject.Fedora-29”. Note that every purl starts with “pkg:” followed by the type. For open source software, the type usually indicates the package manager or other repository – for example, “npm” for Node npm packages and “maven” for Maven JARs and related artifacts. However, for commercial software, the type will normally be “swid”.

·       The supplier will usually make both the SWID tags and the purls for their products available on their website or by other means. If a user wants to look up a product in a vulnerability database, they can download the purl for it, if that is available; otherwise, they can download the SWID tag and use that to create the purl (of course, various tools will automate this process). Neither the purl nor the SWID tag will need to change until the product is upgraded to a new version.

·       As in the case of purls for open source software products, the purl included in the CVE record for a commercial product should always match the purl a user creates when they want to search for that product; this is because both purls will be based on the contents of the same SWID tag. 
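The derivation described above can be sketched in a few lines of Python. The sample tag below is an illustrative, minimal SWID tag rather than a real supplier's file, and the function name `purl_from_swid_tag` is hypothetical. The sketch uses only the three mandatory fields and ignores the optional namespace and qualifiers that the swid purl type also allows.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Illustrative minimal SWID tag (assumed content, not a real supplier's tag).
SAMPLE_TAG = """<SoftwareIdentity
    xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd"
    name="Fedora" version="29" tagId="org.fedoraproject.Fedora-29"/>"""


def purl_from_swid_tag(tag_xml):
    """Derive a swid-type purl from the name, version and tagId attributes."""
    root = ET.fromstring(tag_xml)
    name = quote(root.attrib["name"], safe="")
    version = quote(root.attrib["version"], safe="")
    tag_id = quote(root.attrib["tagId"], safe="")
    return f"pkg:swid/{name}@{version}?tag_id={tag_id}"


# A CNA and an end user who start from the same tag get the same string.
cna_purl = purl_from_swid_tag(SAMPLE_TAG)
user_purl = purl_from_swid_tag(SAMPLE_TAG)
print(cna_purl)
print(cna_purl == user_purl)
```

The point of the sketch is that the purl is a pure function of the tag contents, so no lookup service or central registry is needed for the two parties to agree on the identifier.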

How can we address the second problem in Part II of the project?

We can address the second problem – purl’s current lack of support for commercial software products – with three tasks. To accomplish Part II, we will work with a group of industry participants, including commercial software developers, CNAs, and vulnerability database operators.

A.      Develop “rules of the road” for production, distribution, and use of SWID tags to allow purl to identify commercial software.

B.      Test those rules in a small-scale proof of concept. In that PoC,

                                        i.               A supplier will create SWID tags (perhaps using this tool) for certain products and make them available to their customers;

                                      ii.               CNAs will create test CVE records containing purls based on those SWID tags to report test “vulnerabilities” in their products[v];

                                   iii.               One or more vulnerability databases (that support both CVE and purl) will ingest the test CVE records; and

                                   iv.               End users will utilize purls created from the SWID tags to search the vulnerability databases. If all the CVEs that were recorded for a product are revealed when the user searches using the product’s purl, the PoC is successful.

C.     Develop educational webinars and videos on use of purl in the CVE Record Format for CNAs and other participants in the CVE ecosystem.
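The success criterion in step iv of the proof of concept could be checked with something like the following Python sketch. Everything in it is an assumption made for illustration: the record layout (a dict with `cve_id` and `purls` keys), the placeholder IDs, the example purl, and the in-memory index that stands in for a real vulnerability database.

```python
from collections import defaultdict


def build_index(records):
    """Stand-in for database ingestion: map each purl to its record IDs."""
    index = defaultdict(set)
    for record in records:
        for purl in record["purls"]:
            index[purl].add(record["cve_id"])
    return index


def missing_cves(index, user_purl, reported_ids):
    """Return the reported IDs that a search on user_purl fails to reveal."""
    found = index.get(user_purl, set())
    return set(reported_ids) - found


# Hypothetical test records created by a CNA for one product.
product_purl = "pkg:swid/ExampleApp@1.0?tag_id=example.com.ExampleApp-1.0"
records = [
    {"cve_id": "TEST-0001", "purls": [product_purl]},
    {"cve_id": "TEST-0002", "purls": [product_purl]},
]
index = build_index(records)

# The end user searches with the purl derived from the same SWID tag.
missed = missing_cves(index, product_purl, {"TEST-0001", "TEST-0002"})
print("PoC successful" if not missed else f"Missed: {sorted(missed)}")
```

If the user's purl differed from the CNA's in any character, the search would return nothing and every reported ID would appear in the missed set, which is the failure the PoC is designed to detect.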

Note: While Part 2 of the project follows Part 1 in this document, the two parts do not need to be executed in that order, because nothing in Part 1 is an absolute prerequisite for accomplishing Part 2. The project might be significantly accelerated if Parts 1 and 2 could be accomplished at the same time.

The goal of Part 1 is to conduct a proof of concept to demonstrate how purl, as it is used today, can be incorporated as an optional software identifier in the CVE ecosystem. Since purl currently is used mostly to identify open source software found in package managers, that will be its use when it becomes an option in CVE Records.

However, the goal of Part 2 is to allow purl to become an identifier for commercial software products by having commercial developers create SWID tags to carry metadata for their products. Users searching for vulnerabilities in a commercial product that they own can utilize the product’s SWID tag to create a purl for it. This should always exactly match the purl used by the CNA when they reported the vulnerability in a new CVE Record (because both purls will be based on the same SWID tag).

The goal of Part 2 will be to develop “rules of the road” for creating and using SWID tags and the purls based on them, as well as test these in a small-scale proof of concept. Because SWID is just one of hundreds of purl types and the types can be used interchangeably, no changes should be required to the CVE Record Format or any other component of the CVE ecosystem.

Therefore, if funding permits, Part 2 of this project should be executed at the same time as Part 1. For example, while the proof of concept in Part 1 Step C is being executed, either (or both) of steps A and B of Part 2 could be executed. Since Part 1 might itself require nine months to one year, starting both parts simultaneously could save up to a year of total project time.

One reason why this should be considered is that as of late March 2025, the NVD is not creating new CPE names in anywhere near the volume required to reduce their backlog of “unenriched” CVE Records, let alone eliminate it. While there are alternate vulnerability databases for open source software (almost all based on purl), there are no vulnerability databases for commercial software that are not themselves based on the NVD.

As previously stated, this means there is no reliable vulnerability database for commercial software products today. Given that private enterprises and government agencies mainly utilize commercial software, this is a serious problem. The sooner that purls based on SWID tags can be used in new CVE Records, the sooner that users of commercial software products will be made fully aware of the risks they face due to recently identified vulnerabilities.

 

Conclusion

It is possible that the National Vulnerability Database may soon cease to exist. However, no matter what happens, it is clear that CNAs and software end users need an alternative software identifier to CPE. While there are one or two experimental alternatives (such as OmniBOR), purl is already in heavy use. For example, the open source software composition analysis (SCA) tool Dependency-Track alone is used over 20 million times every day to look up a dependency from a software bill of materials (SBOM) in the OSS Index vulnerability database, which is based on purl.

Purl’s availability in the CVE record format will quickly make identification of open source software much easier and more accurate in the NVD and other vulnerability databases based on CVE. And, when the policies and procedures for use of the SWID purl type have been worked out and tested in a proof of concept, identification of commercial software products in the same databases will be much easier, as well as much more accurate.

Of course, it will probably be 1-2 years before purl is in widespread use in the CVE ecosystem. But there’s no excuse for waiting any longer; two years in the future will still be two years in the future six months from now. The six tasks listed above (three in each part) are mostly non-technical; they mainly require getting agreement among a group of participants in the CVE ecosystem.

The OWASP SBOM Forum will be pleased to lead this effort; we will start out with an initial project to perform the first two Part 1 tasks: development of plans for the proof of concept and identification of changes to the CVE record format that are required to accommodate purl.

Anyone interested in contributing their time and/or resources to this project should contact Tom Alrich at tom@tomalrich.com. Donations to OWASP of $1,000 and up can be “restricted” to use by the SBOM Forum. OWASP is a 501(c)(3) nonprofit organization. We welcome all contributions. 


[i] There are several commercial vulnerability databases that are based on the NVD; these include the data currently in the NVD (which can be downloaded in about ten minutes). That data has been augmented and “cleaned up”. These databases are all trying to remedy the NVD’s current shortfalls in various ways. However, none of them have the resources to do more than try their best to fill the most serious gaps.
