Thursday, July 10, 2025

Is CPE on the way out?


In September 2022, a group that I lead, the SBOM Forum – now the OWASP SBOM Forum – published a white paper that described a number of serious problems with CPE (Common Platform Enumeration), the machine-readable software identifier on which the National Vulnerability Database (NVD) is based. Machine-readable software identifiers are important because software products have many different names in different contexts. In vulnerability databases, it’s essential that both the person who reported the vulnerability and the person who looks it up in the NVD or another vulnerability database have the same product in mind; if both are using the same software identifier, there is normally no question about this.

Our white paper pointed to multiple reasons (described on pages 4-6) why searching the NVD using a CPE name often produces an erroneous result or no result at all. We proposed a way to address these problems by utilizing the purl (package URL) identifier, which is already widely used in vulnerability databases for open source software. We weren’t proposing that CPE be replaced with purl, but that both be options for searches in the NVD and other vulnerability databases that use CVE to identify vulnerabilities.

When we wrote the paper, we knew that:

1. The NVD is based on CVE records. Each record describes a vulnerability, as well as one or more software products that are affected by the vulnerability. New vulnerabilities are reported by CVE Numbering Authorities (CNAs) who work with the CVE Program; many of them are software developers like Oracle, Microsoft and Red Hat. New CVE records are regularly downloaded from the CVE.org database by the NVD and other vulnerability databases that are based on CVE.

2. Since early in the CVE Program, CPE has been the only machine-readable software identifier used in CVE records. Even though CVE records don’t originate in the NVD, the NVD has almost always created CPE names for the products described in the text of each record and has added those names to the record.

3. Until 2024, the NVD usually added CPE names to new CVE records within a few days of downloading them. Because of this, automated NVD searches for software products were likely to find most recently reported vulnerabilities that applied to those products.

However, there was one important thing we couldn’t have known in 2022 when we wrote the white paper: In February 2024, the NVD would experience problems that drastically reduced the number of CPE names it could add to new CVE records. Despite attempts to fix the problems, the NVD has not yet done so, with the result that fewer than half of CVE records created since February 2024 contain a CPE name. This means that more than half of the vulnerabilities (CVEs) that have been identified since that month do not usually show up in an automated vulnerability search of the NVD.

When the NVD first started experiencing problems in February 2024, there was a lot of sympathy for them; their parent organization, NIST (part of the US Department of Commerce), shook their piggy bank and found some funds to pay for contractors to help them dig themselves out of the hole they’d gotten into. As a result, on May 29, 2024, the NVD proudly announced that, “We are confident that this additional support will allow us to return to the processing rates we maintained prior to February 2024 within the next few months.” They also announced that, working with CISA, they had started to reduce the big backlog they had built since February, and expected to eliminate it by the end of 2024.

Of course, both problems have since gotten much worse, not better. By the end of 2024, it had become painfully obvious that the NVD had completely dropped the ball on both promises they had made on May 29; as a result, they were starting to lose support from other agencies that had supported them so far. In December, CISA quietly announced that they would no longer add CPE names to some new CVE records; they had been doing this as part of the “Vulnrichment” program that they announced soon after the NVD’s problems appeared in February.

In April, I reported (remotely) from VulnCon that the CVE Program had decided to add purl as an alternative software identifier to CPE in CVE records, most likely by the end of 2025 or early 2026. Of course, it was a good thing that CNAs would be given the opportunity to include purls in their CVE records, with the same status as CPE names.

However, there’s been a further turn of the screw. Recently, Andrew Lilley Brinker of the Quality Working Group (QWG) of the CVE Program (the CVE working group that is responsible for originating changes to the CVE Record Format, aka the CVE Schema) put up this document in the CVE-schema repository in GitHub. The document described how both purl and another software identifier, OmniBOR, will be added as options to the CVE Schema in the next few months – without waiting until the end of 2025 or early 2026, as originally planned.

The document didn’t stop there. It went on to make it clear that NIST has decided it’s time to pull the plug on CPE. The section titled “Problem statement” includes this paragraph:

For CPE, the key challenges are its reliance on a central dictionary and the processes used to update that dictionary. NIST, the United States' National Institute of Standards of Technology, stewards the CPE specification and maintains the CPE Dictionary, which is the central registry of defined terms which may be used to identify vendors, products, and more within a CPE identifier. The reliance on this central dictionary means that the issuance of new CPEs for vendors or products not present in the dictionary requires NIST to update the dictionary to support them[i]. While anyone can request the creation of a CPE from NIST, NIST may at times be slow to respond to these requests due to resource limitations. (my emphasis)

I hereby nominate the last sentence of the above paragraph as Understatement of the Year. It should be clear now that CPE is a dead end, as evidenced by the fact that fewer than 50% of the CVE records added to the NVD since February 2024 include a CPE name. Now, it seems possible that figure may drop to around 0%.

Because the CVE Program and the NVD only support CPE as a software identifier today, a CVE record that doesn’t include a CPE name for the vulnerable product is invisible to automated searches in the NVD, even to normal command line searches. Because of this, NVD searches for vulnerabilities in a software product today on average yield fewer than half the CVEs that have been reported for that product.
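
To make the mechanism concrete, here is a minimal Python sketch of why a record without a CPE name is invisible to an automated search. Everything in it is hypothetical: the record layout, the placeholder CVE IDs, the vendor and product names, and the `search_by_cpe` helper are invented for illustration and are not the NVD's data model or API. Real CPE matching also handles wildcards and version ranges, which this exact-match sketch ignores.

```python
# Hypothetical, simplified records: IDs, product names and CPE strings are
# invented for illustration and do not follow the full CVE Record Format.
EXAMPLE_CPE = "cpe:2.3:a:examplevendor:exampleproduct:1.0:*:*:*:*:*:*:*"

RECORDS = [
    {
        "id": "CVE-0000-0001",
        "description": "Flaw in ExampleProduct 1.0",
        "cpes": [EXAMPLE_CPE],
    },
    {
        "id": "CVE-0000-0002",
        "description": "Another flaw in ExampleProduct 1.0",
        "cpes": [],  # no CPE name was ever added to this record
    },
]


def search_by_cpe(records, cpe):
    """Return the IDs of records that list exactly this CPE name."""
    return [record["id"] for record in records if cpe in record["cpes"]]


hits = search_by_cpe(RECORDS, EXAMPLE_CPE)
missed = [r["id"] for r in RECORDS if r["id"] not in hits]
print("found:", hits)
print("missed:", missed)
```

Both records describe the same product, but only the one carrying a CPE name can be matched by identifier. The second can be found only by someone reading the description text.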

Just as importantly, the problem is getting worse, not better. The “Problem statement” section continues:

Mechanical applicability determinations, especially searches of CVE data based on software identifiers, are compromised if the searcher cannot rely on the identifiers to be available when and where they are needed.

Moreover, some vulnerability conditions cannot be expressed adequately using CPE. For example, sometimes a vulnerability is only present when certain modules or files are present, but CPEs do not capture software at the module or file level. To put it another way, CPE is a relatively coarse-grained software identifier, identifying software “products,” potentially constrained with version information, but not components or materials within those software products….

CPEs are also not used universally across different software ecosystems. Open source software projects are generally less well represented in the NIST-maintained CPE dictionary than closed source software. This means sole reliance on CPE as the mechanism for identifying software within the CVE record format leaves CVE less able to identify open source software affected by a vulnerability.

Near the end of the same document, in the section titled “Related Issues or Proposals”, the author states:

For over a decade, NIST has tried to manage CPEs to keep pace with the needs of CVE. However, the challenge and expense ha(ve) proven to be significant and NIST has expressed a desire to end its role as the provider of CPEs for CVEs. Without a massive investment, it is unlikely that any party could produce CPEs quickly enough to meet CVE’s needs. Moreover, even a complete CPE library would not address CPE’s inability to capture vulnerabilities that depend on files or modules, since those are beyond CPE’s ability to capture.

Thus, it’s possible that NIST will withdraw funding for the NVD to create CPE names altogether. Instead of 50% of CVE records not having CPE names, we might be faced with 100% not having CPE names. Of course, once purls are being regularly added to CVE records to identify vulnerable open source software products, that won’t create a big problem, since a user could use the purl for an open source product to find vulnerabilities in any vulnerability database that supports both CVE and purl.

Currently, Sonatype’s OSS Index database – one of the largest open source vulnerability databases – is the best example of such a database. The Dependency Track open source software composition analysis (SCA) tool is used over 25 million times every day to look up an open source component from an SBOM in OSS Index.

Purls for commercial software

However, even though there is growing dissatisfaction with CPE, purl can’t replace it anytime soon. Currently, purl primarily identifies open source software products distributed through package managers. Since few commercial software products are distributed that way, this means purl can’t identify most commercial software products.[ii]

This leaves CPE as the only major software identifier that can identify commercial products today. Yet the fact that CPE names are missing from most recent CVE records means that CPE is an increasingly unreliable identifier. In other words, commercial software suppliers like Oracle, Microsoft, HPE, Cisco, Schneider Electric and others don’t have a reliable identifier with which to report vulnerabilities in their products. Of course, this is a distressing situation.

Purl follows the “razor and blades” model. The base purl specification is simple and changes very slowly, but every use of purl (e.g., every package manager that serves as a purl namespace) requires its own purl type; there are about 1500 of those (although many of them are used very little). Expanding purl to address commercial software will just require developing and adding a new type; it won’t require any changes to the base specification.
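
To illustrate how the type fits into the base syntax, here is a minimal Python sketch that splits a purl of the general form `pkg:type/namespace/name@version` into its main parts. The `Purl` class and `parse_purl` function are hypothetical helpers written for this post, not part of any purl library. The sketch drops qualifiers and subpaths and applies none of the type-specific rules that a real implementation would. The type `exampletype` in the last line is made up to show that an unfamiliar type parses with the same grammar; it says nothing about what the SCID type will look like.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Purl(NamedTuple):
    type: str                 # e.g. "maven", "npm", "pypi"
    namespace: Optional[str]  # type-specific grouping, e.g. a Maven group ID
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Split a purl into type, namespace, name and version.

    Simplified: qualifiers ("?...") and subpath ("#...") are discarded, and
    any "@" other than the version separator is assumed to be percent-encoded.
    """
    scheme, sep, rest = purl.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError(f"not a purl: {purl!r}")
    rest = rest.split("#", 1)[0].split("?", 1)[0].strip("/")
    ptype, sep, path = rest.partition("/")
    if not sep or not path:
        raise ValueError(f"purl has no name: {purl!r}")
    version = None
    if "@" in path:
        path, _, version = path.rpartition("@")
    namespace, _, name = path.rpartition("/")
    return Purl(
        ptype.lower(),
        unquote(namespace) or None,
        unquote(name),
        unquote(version) if version else None,
    )


print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0"))
print(parse_purl("pkg:pypi/django@4.2"))
print(parse_purl("pkg:exampletype/acme/widget@1.0"))  # made-up type
```

The parser never needs a list of known types, which is the sense in which a new type leaves the base specification untouched. What a new type adds is its own rules for what the namespace, name and version mean.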

Steve Springett recently developed a Type Definition for a new purl type called SCID, which stands for Software Component Identification. Steve is one of the original developers of purl (working with Philippe Ombredanne, who came up with the original idea and continues to lead the purl project) and a current maintainer, as well as the leader of the OWASP CycloneDX and Dependency Track projects. I will put up a separate blog post on this development, but I want to point out that SCID defines a set of metadata fields that a supplier will publish in a tag. The tag can be distributed with the software binaries, made available at a well-known location on the supplier's website, emailed to customers, made available through the Transparency Exchange API (now in Beta1 phase), etc.

SCID not only expands purl to address commercial software, but it also expands it to cover non-packaged open source software. Note that non-packaged software that is authoritatively available through a repository with a supported purl type like GitHub is already addressed by purl.

How will this get rolled out?

I believe the CVE Quality Working Group has developed a couple of pull requests to implement purl in the CVE Record Format (CVE Schema). They will need to be finalized and submitted to the CVE Board for approval; once approved, they will be merged into the Schema. However, there are further steps that need to be taken before purls-in-CVE-records can be considered a success, including the following:

  1. There needs to be an end-to-end proof of concept. It will start with CNAs creating test CVE records that include purls (these could be for newly identified vulnerabilities or for fake ones).
    1. The new records will be submitted to CVE.org and will appear in searches using purls.
    2. The records will be downloaded from CVE.org by vulnerability databases that can utilize CVE records that include purls (of course, ultimately every database that supports CVE will need to support both CPE and purl. This may include the NVD, but it also may include OSS Index, VulnDB, VulnCheck, VulDB, and others).
    3. End users and service providers will test searches in the vulnerability databases. They will create purls for vulnerable products that have been included in the test CVE records. If a user searches for a product/version using a purl, the purl they search with should always match the purl created by the CNA when they created the record. Any mismatches need to be investigated.
  2. There needs to be training for the groups involved in the vulnerability management process: software suppliers, CNAs, vulnerability database operators, end users, service providers, etc. It will include general training in purl and CVE, as well as training in specific topics like how a CNA can create a new CVE record containing a purl. This training will mostly be in the form of webinars and YouTube videos.
  3. Use of the SCID format needs to be tested in a proof of concept (or tabletop exercise), although this could be combined with the PoC in step 1 above.
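
The mismatch check described in step 1 could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the `canonical` and `find_mismatches` helpers and the product labels are hypothetical, and the normalization is deliberately crude (it lowercases the scheme and type, decodes percent-escapes, and ignores qualifiers and subpaths). A real proof of concept would apply the normalization rules defined for each purl type.

```python
from urllib.parse import unquote


def canonical(purl):
    """Crude normalization so trivially different spellings compare equal."""
    core = purl.strip().split("#", 1)[0].split("?", 1)[0]
    scheme, _, rest = core.partition(":")
    ptype, _, path = rest.lstrip("/").partition("/")
    return f"{scheme.lower()}:{ptype.lower()}/{unquote(path)}"


def find_mismatches(cna_purls, user_purls):
    """Both arguments map a product label to a purl.

    Returns the labels for which the purl built by the user differs from
    (or is missing compared with) the purl the CNA put in the test record.
    """
    return sorted(
        label
        for label, cna_purl in cna_purls.items()
        if canonical(cna_purl) != canonical(user_purls.get(label, ""))
    )


# Purls the CNA placed in the test CVE records (labels are illustrative).
cna = {
    "commons-lang3 3.12.0": "pkg:maven/org.apache.commons/commons-lang3@3.12.0",
    "django 4.2": "pkg:pypi/django@4.2",
}
# Purls that a tester constructed independently for the same products.
user = {
    "commons-lang3 3.12.0": "PKG:Maven/org.apache.commons/commons-lang3@3.12.0",
    "django 4.2": "pkg:pypi/django@4.2.0",
}

mismatches = find_mismatches(cna, user)
print(mismatches)
```

Here the first pair differs only in letter case of the scheme and type, so it is treated as a match. The second pair differs in how the version was written, which is exactly the kind of discrepancy the proof of concept needs to surface and investigate.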

Steps 1 and 2 will implement existing purl capabilities (i.e., addressing open source software made available in package managers), but step 3 will involve extending those capabilities to commercial software. Thus, steps 1 and 2 may need to be executed before step 3, although they could all be executed together. Step 1 could start before any changes are made to the CVE Schema.

It’s safe to say that, until at least steps 1 and 2 are executed, the purl rollout to the CVE Record Format is unlikely to be considered a success. The OWASP SBOM Forum is ready to take the lead on all three of these steps when we can secure at least some of the funding required (which will not be huge). If your organization would like to help with funding, you can do that through a donation to the OWASP Foundation that is “restricted” to our project. OWASP is a 501(c)(3) nonprofit organization. Please email me if you would like to discuss this.

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve often been told that I should either accept advertising or put up a paywall and charge a subscription fee, or both. However, I really don’t want to do either of these things. It would be great if everyone who appreciates my posts could donate a $20-$25 (or more) “subscription fee” once a year. Will you do that today?

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] The fact that CPE relies on an external dictionary brings up an important discussion in the SBOM Forum’s 2022 white paper: the difference between “extrinsic” identifiers like CPE, which depend on an external dictionary, and intrinsic identifiers like purl, which don’t require an external dictionary. Like all intrinsic identifiers (chemical formulas are another example used in the paper), the user can construct them using information they either have on hand or can easily look up. In the case of a purl for an open source package in the Maven Central package manager, the user can create the purl if they just know the name and version number of the package, as well as the fact that it was downloaded from Maven Central.
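
As a small illustration of what “intrinsic” means in practice, the following Python sketch builds a purl for a Maven Central package purely from information the user already has. The `maven_purl` helper is hypothetical. Note that for the maven type, the package “name” is really two pieces of information, the group ID (used as the purl namespace) and the artifact ID (used as the purl name).

```python
from urllib.parse import quote


def maven_purl(group_id: str, artifact_id: str, version: str) -> str:
    """Build a purl for a Maven package from its coordinates.

    No dictionary lookup is involved; each part is percent-encoded and
    placed into the pkg:type/namespace/name@version pattern.
    """
    return (
        f"pkg:maven/{quote(group_id, safe='')}"
        f"/{quote(artifact_id, safe='')}"
        f"@{quote(version, safe='')}"
    )


purl = maven_purl("org.apache.commons", "commons-lang3", "3.12.0")
print(purl)
```

Two people who start from the same coordinates will arrive at the same string without consulting each other or any central registry, which is the property the white paper contrasts with CPE.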

[ii] Because online software stores like Google Play and the Apple App Store function a lot like package managers – that is, they make software available for download in a fixed location, and control the namespace of the products being offered – Steve Springett and Tony Turner of the OWASP SBOM Forum have both suggested that software stores could have purl types that are closely analogous to types for package managers like Maven Central and NPM.

Of course, a large percentage of commercial software products (and certainly the more strategic ones) are not available in software stores. However, given the huge numbers that are available (Google Play offers over 3 million products for sale and download), enabling developers of mobile apps to report vulnerabilities in their products using purl identifiers could result in a huge improvement in mobile app security.
