Tom Alrich's Blog: It’s even worse than I thought

Before our biweekly OWASP SBOM Forum meeting on Friday, I asked Andrey Lukashenkov of Vulners for an update on where the National Vulnerability Database's (NVD) backlog of “unenriched” CVE records stands.[i] Andrey said the backlog is now over 21,300. That is at least 2,000 more than it was just over a month ago, and of course it’s a record high for this year. Since a total of 37,000 new CVE records have been added to the NVD this year, only about 42% of them (roughly 15,700) contain a CPE name.

In other words, on average a simple search of the NVD using a known CPE name will discover only about 42% of the vulnerabilities identified since February 12, the day the NVD's problems started. Even though CVE records for the other 58% of vulnerabilities are present in the NVD, they don’t contain CPE names and are therefore invisible to such searches. If you want to learn whether any of those vulnerabilities apply to your product, you need to do a text search of the 21,300 “unenriched” (i.e., CPE-less) CVE records. You would need to do that for every product of concern to you, and you would have to repeat it as often as you want to learn about newly reported vulnerabilities, which ideally is daily. Of course, nobody is going to do this.
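To make the mechanics concrete, here is a minimal Python sketch of why an identifier-based search misses unenriched records. The two records, their IDs (EXAMPLE-0001, EXAMPLE-0002), the product, and its CPE name are all hypothetical, and the record layout is a deliberate simplification, not the NVD's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class CveRecord:
    """Simplified, hypothetical stand-in for a CVE record in the NVD."""
    cve_id: str
    description: str  # free text naming the affected product
    cpe_names: list[str] = field(default_factory=list)  # empty if unenriched


def search_by_cpe(records: list[CveRecord], cpe_name: str) -> list[str]:
    """Identifier search: IDs of records whose CPE list contains cpe_name exactly."""
    return [r.cve_id for r in records if cpe_name in r.cpe_names]


def search_by_text(records: list[CveRecord], phrase: str) -> list[str]:
    """Fallback: case-insensitive substring search of the description text."""
    return [r.cve_id for r in records if phrase.lower() in r.description.lower()]


CPE = "cpe:2.3:a:example_vendor:example_product:2.1.0:*:*:*:*:*:*:*"
records = [
    # Enriched: a CPE name has been added to the record.
    CveRecord("EXAMPLE-0001", "Buffer overflow in Example Product 2.1.0", [CPE]),
    # Unenriched: the product is named only in the description.
    CveRecord("EXAMPLE-0002", "SQL injection in Example Product 2.1.0"),
]

print(search_by_cpe(records, CPE))                 # ['EXAMPLE-0001']
print(search_by_text(records, "example product"))  # ['EXAMPLE-0001', 'EXAMPLE-0002']
```

The CPE search finds only the enriched record; the unenriched one turns up only in the text search, which someone would have to rerun for every product, every day.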

Andrey also pointed out something even more startling: during the first four days of the week of December 2 (and presumably also on the day we met, December 6), the NVD did not add CPE names to a single new CVE record. Since their problems started on February 12, the NVD had always enriched at least a few CVE records every day (other than a single day in May).

Of course, I assume the NVD will resume adding CPE names to CVE records sooner or later. But the idea that the NVD can eliminate their backlog in 2025 (or perhaps ever) looks more and more like fantasy. CISA has added about 2,000 CPE names for exploited vulnerabilities, but the backlog figure of 21,300 presumably takes those into account. In addition, a few firms like Vulners (Andrey’s employer) and VulnCheck have taken it upon themselves to add their own CPE names to some of the unenriched CVE records. Unfortunately, neither of these firms has official “Authorized Data Publisher” (ADP) status, so it isn’t clear what will happen to the CPE names they created if and when the NVD resumes enrichment.

In other words, today automated searches of the NVD, and presumably vulnerability scanner output, will normally identify no more than half of the vulnerabilities that have been reported since February 12. If you went to a doctor for a diagnosis and they told you up front that they knew about fewer than half of the new diseases discovered this year, would you keep going to them? That’s essentially the problem the software security community faces now.

What’s the solution to this problem? Some people have pointed to the CVE Numbering Authorities (CNAs) as the solution. These are the organizations that create the CVE records in the first place, including a number of large software developers (e.g., Oracle, Microsoft, and Schneider Electric) and organizations like GitHub, MITRE, and Japan's JPCERT/CC. They report vulnerabilities in products they have developed themselves, as well as in products from developers (including open source projects) that are not themselves CNAs.

The question is why the CNAs aren’t adding CPE names to the CVE records they create. Since the SBOM Forum includes several large CNAs, we have discussed this question a lot. I have heard two main answers:

1. In the past, the NVD has usually rejected CPE names created by anyone other than the NVD, presumably on the grounds that only the NVD knows how to create them. Unfortunately, if the NVD follows some sort of secret process to create CPE names, they have never revealed it. Moreover, that process seems to include at least a few purely random elements, since nobody has ever found a way to predict a CPE name with certainty. For a discussion of some of the problems with CPE, as well as how they might be addressed, see this 2022 paper by the SBOM Forum (the problems with CPE are discussed on pages 4-6).
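Why prediction matters: a CPE 2.3 formatted string packs eleven attributes (part, vendor, product, version, and so on) after the `cpe:2.3:` prefix, and a naive lookup compares them as strings. The Python sketch below is illustrative only. It assumes no backslash-escaped colons in the input, and both CPE names are hypothetical: one stands in for the name a database assigned, the other for a user's plausible guess that differs only in the vendor spelling.

```python
CPE_FIELDS = ["part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other"]


def parse_cpe23(cpe: str) -> dict[str, str]:
    """Split a CPE 2.3 formatted string into named attributes.

    Simplification: assumes no backslash-escaped colons in any attribute.
    """
    parts = cpe.split(":")
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


def same_product(a: str, b: str) -> bool:
    """Naive exact comparison of part, vendor, product, and version."""
    pa, pb = parse_cpe23(a), parse_cpe23(b)
    return all(pa[k] == pb[k] for k in ("part", "vendor", "product", "version"))


# Hypothetical: the name a database assigned vs. a user's plausible guess.
assigned = "cpe:2.3:a:acme_corp:widget_server:3.2:*:*:*:*:*:*:*"
guessed = "cpe:2.3:a:acme:widget_server:3.2:*:*:*:*:*:*:*"

print(parse_cpe23(assigned)["vendor"])  # acme_corp
print(same_product(assigned, guessed))  # False
```

Nothing in the product itself tells the user whether the vendor field should read `acme` or `acme_corp`, so a single guessed attribute is enough to make the search come back empty.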

2. To be honest, there seems to be little if any enthusiasm among the CNAs for creating CPE names, precisely because so much of the process seems to be arbitrary. Nobody can be expected to invest a lot of time creating a CPE name when it has all the durability of a Jell-O sculpture.

Fortunately, there is an alternative to CPE called purl, which stands for “package URL”. In less than a decade, purl has gone from nowhere to completely conquering the open source software world. It is used as the software identifier in almost all vulnerability databases for open source software worldwide. The notable exception is the NVD and the databases derived from it, which of course use CPE.

Why has purl been so successful in the open source world? This post discusses several reasons, but the most important is this: any user who wants to know the purl for an open source product they downloaded from a package manager will always create the same purl as any other user, as long as it is for the same version of the same product in the same package manager.
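As a rough illustration of why this works, the Python sketch below builds a purl from nothing more than the package manager (the purl "type"), an optional namespace, the package name, and the version. It is a simplified sketch, not a full implementation of the purl specification: it ignores qualifiers and subpaths, uses plain percent-encoding, and applies only one of the specification's type-specific rules (PyPI names are lowercased, with underscores replaced by dashes). Maintained purl libraries are the better choice for real use.

```python
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str, namespace: str = "") -> str:
    """Build a purl of the form pkg:type/namespace/name@version.

    Simplified sketch: no qualifiers or subpath, and only one
    type-specific normalization rule (PyPI).
    """
    pkg_type = pkg_type.lower()  # canonical purl types are lowercase
    if pkg_type == "pypi":
        # PyPI names are case-insensitive and treat "_" and "-" alike.
        name = name.lower().replace("_", "-")
    # Percent-encode each namespace segment and the name (e.g. "@" -> "%40").
    segments = [quote(s, safe="") for s in namespace.split("/") if s]
    segments.append(quote(name, safe=""))
    return f"pkg:{pkg_type}/{'/'.join(segments)}@{quote(version, safe='')}"


print(make_purl("maven", "log4j-core", "2.14.1", namespace="org.apache.logging.log4j"))
# pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1
print(make_purl("npm", "core", "16.2.0", namespace="@angular"))
# pkg:npm/%40angular/core@16.2.0
```

Because every input comes straight from the package manager, two users who type the name differently (for example `Django` on `PyPI` versus `django` on `pypi`) still end up with the identical string.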

Moreover, the CNA that reports a vulnerability in that product will create the purl for the CVE record from the same information. This means a purl used to search a vulnerability database for an open source project should always (barring human error) match the purl in the CVE record. Unlike the NVD today, where a search for CVEs applicable to a product will probably miss half of the vulnerabilities identified in that product this year, a search in a purl-based open source vulnerability database like OSS Index should always yield every vulnerability that has ever been reported for the product.

However, there are two important tasks (each with sub-tasks) that need to be accomplished before purl can be placed on an equal footing with CPE in CVE records.[ii] They are:

First task: CVE Numbering Authorities need to start including purls in CVE records when the product being referenced is an open source product in a package manager. While that has been technically possible since the CVE 5.1 specification came into effect this past spring, it turns out that virtually none of the CNAs are actually doing it. The biggest reason is undoubtedly that neither of the two major US government-run databases, the NVD and CVE.org, currently accepts any software identifier other than CPE. So, a CVE record with a purl identifier is all dressed up with nowhere to go.

How can this situation be changed? Some group needs to conduct extensive outreach to the CNAs and to CVE.org (which runs the CVE Program, including recruiting and managing the CNAs). That outreach will include “evangelizing” about the advantages of including purls in CVE records, as well as training on the details of doing so. Just as importantly, the group needs to work with the CNAs and CVE.org to identify the policies and procedures that must be in place for purls to be successfully used in the CVE context.

One important part of this effort will be conducting an end-to-end proof of concept, in which:

1. CNAs will include a purl whenever they create a CVE record to report a new vulnerability in an open source product found in a package manager. The purl will be based on the package manager name, as well as the product name and version string in that package manager.

2. A purl-based vulnerability database will ingest the CVE record, just as the NVD does for CVE records now.

3. A user who has downloaded an open source product from a package manager can easily create a purl using the package manager name, as well as the product name and version string as registered in the package manager. Since the user’s purl should always match the purl the CNA included in the CVE record, their search should always return every CVE that has been reported for that product.
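The three proof-of-concept steps can be sketched in a few lines of Python. Everything here is an assumption for illustration: the record layout (an `id` plus a `purls` list) is a simplified stand-in, not the real CVE JSON 5.x schema; EXAMPLE-0003 and its package are hypothetical; and the user-side builder skips percent-encoding because the sample inputs need none. The real entry is CVE-2021-44228 (Log4Shell), which affects log4j-core 2.14.1.

```python
from collections import defaultdict

# Step 1 (hypothetical layout): records as a CNA might publish them with purls.
cna_records = [
    {"id": "CVE-2021-44228",
     "purls": ["pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"]},
    {"id": "EXAMPLE-0003", "purls": ["pkg:pypi/example-lib@1.0.0"]},  # hypothetical
]


def ingest(records):
    """Step 2: index each CVE ID under every purl its record lists."""
    index = defaultdict(set)
    for rec in records:
        for purl in rec.get("purls", []):
            index[purl].add(rec["id"])
    return index


def user_purl(pkg_type, namespace, name, version):
    """Step 3: the user builds a purl from package manager metadata.

    Simplification: no percent-encoding (the sample inputs need none).
    """
    prefix = f"{namespace}/" if namespace else ""
    return f"pkg:{pkg_type}/{prefix}{name}@{version}"


index = ingest(cna_records)
query = user_purl("maven", "org.apache.logging.log4j", "log4j-core", "2.14.1")
print(sorted(index.get(query, set())))  # ['CVE-2021-44228']
```

The search in step 3 is a plain exact-string lookup: if the CNA and the user start from the same package manager metadata, their purls match. A query for a version the CNA did not list returns nothing; this sketch ignores version ranges, which real records would need.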

The results of this proof of concept should help convince CVE.org and the CNAs that purl is a much better identifier for open source software than CPE.

Second task: Purl needs to be able to identify commercial software, not only open source software found in package managers. A scheme for doing this was suggested in 2022 by Steve Springett, leader of the OWASP Dependency-Track and CycloneDX projects and a founding member of the OWASP SBOM Forum, in the above-referenced white paper on CPE naming in the NVD. Steve’s idea is that commercial software suppliers will create standardized short documents called “SWID tags”. These will provide authoritative metadata for a software product, including the supplier name, product name and version string.

Whenever the supplier wishes to report a new vulnerability in their product, they will provide the SWID tag to the CNA that creates the new CVE record. The CNA will create the product’s purl from the information in the SWID tag and include it in the CVE record. Later, when an end user wants to learn about new vulnerabilities identified in a commercial product they use, they will be able to locate and download[iii] the same SWID tag the CNA used to create the purl in the CVE record. Because the CNA and the end user base their purls on the same SWID tag, the purls should be identical (barring human error), just as with purls for open source software distributed in package managers.
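A minimal Python sketch of this idea follows. The SWID tag, the supplier (Acme), its regid, and the tagId are hypothetical, and so is the exact mapping from SWID fields to purl components: the purl specification does define a `swid` type, but this sketch should not be read as a faithful implementation of it or of Steve's proposal. What it does show is the property the scheme depends on: because the CNA and the end user derive the purl from the same tag with the same rules, they get the same string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

SWID_NS = {"swid": "http://standards.iso.org/iso/19770/-2/2015/schema.xsd"}


def purl_from_swid(tag_xml: str) -> str:
    """Derive a purl from a SWID tag using a hypothetical field mapping.

    namespace = software creator's entity name and regid, name = product
    name, version = product version, tag_id qualifier = the tag's tagId.
    """
    root = ET.fromstring(tag_xml)
    creator = next(
        (e for e in root.findall("swid:Entity", SWID_NS)
         if "softwareCreator" in e.get("role", "").split()),
        None,
    )
    if creator is None:
        raise ValueError("SWID tag has no softwareCreator Entity")
    parts = [creator.get("name"), creator.get("regid"), root.get("name")]
    path = "/".join(quote(p, safe="") for p in parts)
    return (f"pkg:swid/{path}@{quote(root.get('version'), safe='')}"
            f"?tag_id={quote(root.get('tagId'), safe='')}")


# Hypothetical tag published by a hypothetical supplier.
TAG = """<SoftwareIdentity xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd"
    name="Enterprise Server" version="1.0.0"
    tagId="75b8c285-fa7b-485b-b199-4745e3004d0d">
  <Entity name="Acme" regid="example.com" role="tagCreator softwareCreator"/>
</SoftwareIdentity>"""

cna_purl = purl_from_swid(TAG)   # the CNA builds the purl for the CVE record
user_purl = purl_from_swid(TAG)  # the end user builds it from the same tag
print(cna_purl)
# pkg:swid/Acme/example.com/Enterprise%20Server@1.0.0?tag_id=75b8c285-fa7b-485b-b199-4745e3004d0d
print(cna_purl == user_purl)  # True
```

The design choice worth noting is that neither party has to guess anything: every component comes from a document the supplier published, which is exactly what CPE naming lacks.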

The three primary goals of this second task are:

1. To work with commercial software developers, vulnerability management service providers, and end users to identify policies and procedures for creation and use of purls based on SWID tags.

2. To evangelize and train CVE.org staff members and CNAs on creation and use of the new SWID-based purls. Of course, this effort will build on the evangelization and training in the first task.

3. To conduct an end-to-end proof of concept that essentially mirrors the one described in the first task, except that the purl will always be based on the contents of the SWID tag prepared by a commercial software supplier, not the name and version string for an open source product distributed through a package manager.[iv]

Tom Alrich and Tony Turner of the OWASP SBOM Forum have developed a white paper that proposes a project to implement both of the above tasks, as well as a project plan[v] for doing this. The project is called “Purl Expansion Design and Proof of Concept”. Because this project will almost certainly take more than a year to accomplish, and because neither of us is able to donate that amount of time, we are requesting donations to fund at least part of this effort. While we believe the whole project will require over $100,000 in funding, we are willing to start the project with a much more modest donation or donations.

If you or your organization are able to donate any amount over $1,000, you can donate to OWASP (a 501(c)(3) nonprofit organization) and have your donation “directed” to the SBOM Forum; this can be done either online or directly. Donations are often tax deductible.

If you would like to discuss this, please email Tom Alrich at tom@tomalrich.com.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

[i] CVE records (i.e., records of newly discovered software vulnerabilities) are supposed to include one or more machine-readable software identifiers called CPE names. A CPE name identifies a software product that is affected by the vulnerability identified by the CVE number. Before February 12, 2024, the NVD always created a CPE name for every product named (in a text field) in the CVE record. However, on that day the NVD’s production of CPE names dropped precipitously, and it has not recovered since.

[ii] There should be no problem with having both a CPE name and a purl in a single CVE record, since there is no intention of purl “replacing” CPE. As long as somebody (perhaps the NVD staff, or perhaps some CNAs who prefer CPE) is willing to keep creating new CPE names, they will continue to be used. Moreover, the huge set of CPE names already created (at least 250,000, and probably more than that) will not disappear, since there is no good way to replace them with purls in existing CVE records.

[iii] End users will be able to locate and download a SWID tag, as well as other types of software supply chain artifacts like SBOMs and VEX documents, by utilizing the upcoming Transparency Exchange API. It will be fully available in 2025.

[iv] Package managers almost never distribute commercial software.

[v] The project plan primarily focuses on the second task, since the need for the first task was not apparent until very recently.

Monday, December 9, 2024