Wednesday, June 26, 2024

Breaking free of the NVD


In April, I announced the new OWASP Vulnerability Database Working Group, part of the OWASP SBOM Forum. The group was formed to make sense of the many options available for vulnerability databases. The seeming collapse of the National Vulnerability Database (NVD) has made it imperative for all members of the software security community to learn what their options are. There have always been lots of options, but when the NVD was working reasonably well, many organizations were happy to put all their eggs in that basket. Unfortunately, in case you didn’t know, the NVD is no longer working well.

That group, which meets biweekly, has had some very interesting discussions (the meeting notes and chats are here). In the meeting this week, we discussed the problems caused by the fact that the NVD has stopped “enriching” CVE reports by adding CPE names. That discussion revealed a problem that I didn’t know existed. Since understanding the problem requires understanding how information gets into the NVD in the first place, I’ll start there.

The NVD is a database of software vulnerabilities, which are identified by a CVE number (e.g., CVE-2021-44228, the famous log4j – actually, log4shell – vulnerability). CVE numbers are maintained in a database operated by the MITRE Corporation under a contract with DHS. The database used to be called just “MITRE”, but now it’s officially known by its URL: cve.org. While MITRE personnel run cve.org day-to-day, they report to an independent board composed of representatives from private industry and government (including CISA and the NVD).

Like most people, probably, I used to think that vulnerabilities were reported by independent researchers and white hat hackers directly to MITRE, and that the developer of the software was not usually involved in this process. However, that’s close to the opposite of the truth. In fact, almost all CVEs are reported, in a CVE report, by the supplier of the software itself.

A CVE report needs to be created by a “CVE Numbering Authority” (CNA), which assigns a CVE number to the vulnerability. In most cases, the CNA is a large software developer – Oracle, Red Hat, Microsoft, HPE, Schneider Electric, etc. Some CNAs just report vulnerabilities discovered in their own software. Others, like Red Hat and GitHub (a subsidiary of Microsoft), advertise that they will help other developers (within a certain scope, like “open source projects” or a particular industry or country) create CVE reports for vulnerabilities those developers have discovered in their own products.

A developer that isn’t a CNA but wants to report a vulnerability in one of their products can contact a CNA that has them within their advertised scope. And if a developer can’t find a CNA that seems likely to be able to help them, they can contact MITRE itself, which is the “CNA of Last Resort” (CISA is the CNA of Last Resort for Industrial Control Systems and medical devices).

Of course, the CVE report doesn’t just describe a vulnerability. It always needs to point to at least one product (software or an intelligent device) that is subject to the vulnerability. In at least 80% of cases, the product in the report was developed by the CNA that created the report.

There are two ways in which the product subject to the CVE can be referred to in the report. The default is always a textual description, e.g. “Cisco Crosswork Network Controller version 3.0.0”, and it’s safe to say that every CVE report includes such a description. However, a user searching for vulnerabilities in a product they utilize will almost never be able to find it in a vulnerability database like the NVD simply by searching on a text description, because there are many ways in which the product can be identified textually. To use the above example, the product might be described as “Cisco Crosswork Network Controller v3.0.0”, “Cisco, Inc. Crosswork Network Controller version 3.0.0”, “Cisco Crosswork Network Controller version 3.0”, etc. None of these would find a match if entered in the NVD.
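To make the point concrete, here is a minimal Python sketch of what happens when a text description has to match exactly. It is a toy comparison, not the NVD's actual search logic, and the `recorded` string simply stands in for whatever text the CNA happened to put in the report.

```python
# Toy illustration only: this is NOT how the NVD searches.
# "recorded" stands in for the text the CNA put in the CVE report.
recorded = "Cisco Crosswork Network Controller version 3.0.0"

# Ways a user might reasonably type the same product.
queries = [
    "Cisco Crosswork Network Controller version 3.0.0",
    "Cisco Crosswork Network Controller v3.0.0",
    "Cisco, Inc. Crosswork Network Controller version 3.0.0",
    "Cisco Crosswork Network Controller version 3.0",
]


def exact_match(query: str, recorded_name: str) -> bool:
    """Case-insensitive exact comparison, ignoring surrounding whitespace."""
    return query.strip().lower() == recorded_name.strip().lower()


results = {q: exact_match(q, recorded) for q in queries}
for q, hit in results.items():
    print(f"{'match' if hit else 'miss '}  {q}")
```

Only the first query matches; the other three name the same product but miss, which is why a text description alone can't support automated lookups.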

This is why there should always be a machine-readable software identifier on the CVE report; a user who knows the identifier for a product can search for it in a vulnerability database like the NVD by entering that identifier. Currently, the only identifier supported by the NVD is the CPE name. If the user enters the correct CPE name for the product, the search will either return any vulnerabilities to which the product is subject or return a null result, which the user can take as an indication that no vulnerabilities have been reported for that product.[i] If they don’t know the correct CPE name for the product (and, unlike the purl identifier, the CPE name can’t be definitively predicted from information available to the user), they’re out of luck.
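For readers who haven't looked closely at a CPE name, the sketch below splits a CPE 2.3 formatted string into its eleven attributes (part, vendor, product, version and so on). The two sample names are illustrative: the second uses a made-up vendor string to show how a plausible guess differs from what a database might actually hold. The `Cpe` type and `parse_cpe23` helper are names invented for this example.

```python
import re
from typing import NamedTuple


class Cpe(NamedTuple):
    """The eleven attributes of a CPE 2.3 formatted string, in order."""
    part: str
    vendor: str
    product: str
    version: str
    update: str
    edition: str
    language: str
    sw_edition: str
    target_sw: str
    target_hw: str
    other: str


def parse_cpe23(cpe: str) -> Cpe:
    """Split a CPE 2.3 formatted string into its attributes."""
    # Split on colons that are not escaped with a backslash.
    fields = re.split(r"(?<!\\):", cpe)
    if len(fields) != 13 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return Cpe(*fields[2:])


# Illustrative name for log4j, plus a hypothetical variant a user might guess.
a = parse_cpe23("cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*")
b = parse_cpe23("cpe:2.3:a:apache_software_foundation:log4j:2.14.1:*:*:*:*:*:*:*")

print(a.vendor, a.product, a.version)
print("same identifier?", a == b)
```

The two names differ only in the vendor field, yet they are different identifiers, so a search on one will not find records filed under the other.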

When the CNA creates the CVE report, they should include a CPE name for the product or products affected by the vulnerability. However, in the past the CNAs have often not done that. One reason for this is that the CNA may not feel comfortable creating the CPE name, because the specification isn’t easy to understand. Another reason is that the NVD, when they receive the CVE report from CVE.org, is supposed to “enrich” it with information that they provide; one of those pieces of information is the CPE name. In many cases, if a CNA included a CPE name in a CVE report, it was overwritten when the NVD enriched the report (this has also happened a lot with the CVSS score). The result was that, even when the CNA included a CPE name in the report, the CPE name in the NVD was the one that a NIST employee had created, not the one the CNA had provided.

Of course, as long as a user can learn the CPE name in the NVD (perhaps through the vendor of the product), this isn’t a terrible situation. However, on February 12, 2024, the NVD abruptly reduced the number of CVE reports that they enriched to almost zero; while this has recovered to some degree, it’s still far below where it should be[ii].

Even that wouldn’t be a terrible problem if the CNAs simply created their own CPEs. CVE.org is pressing them to do that, and the five or six largest CNAs (which account for the vast majority of CVE reports) are doing this, at least for reports of vulnerabilities in their own products. The problem is that most of the CNAs aren’t including CPE names in their CVE reports. This makes the reports unusable in most widely-used applications, since they all require the ability to automatically find a product in the NVD; manual searches are close to useless.

We discussed this issue in the meeting of the OWASP Vulnerability Database Working Group this week. The Directors of Product Security of two of the largest software developers in the US (both large CNAs) were in the meeting, and both pointed to a big reason why many CNAs aren’t including CPE names in their reports. The NIST people who enrich the CVE reports almost always must choose among many different vendor names (Microsoft, Microsoft Inc., Microsoft Europe, etc.), product names (Microsoft Word, Microsoft Office Word, Word, etc.), and more, so there is no way to know for certain up front what choices they’ll make. If the CNA enters the CPE name it believes is appropriate, the NVD staff may override that with their own CPE name (and this has happened a lot in the past).
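A quick sketch shows how fast the possibilities multiply. The vendor and product strings below are hypothetical CPE-style spellings of the examples above, and the version is made up; none of them is claimed to be what the NVD actually uses.

```python
from itertools import product

# Hypothetical CPE-style spellings of the vendor and product names above.
vendors = ["microsoft", "microsoft_inc", "microsoft_europe"]
products = ["word", "office_word", "microsoft_word"]
version = "16.0"  # made-up version string for illustration

# Every vendor/product pairing yields a different, equally plausible CPE name.
candidates = [
    f"cpe:2.3:a:{v}:{p}:{version}:*:*:*:*:*:*:*"
    for v, p in product(vendors, products)
]

print(len(candidates), "candidate CPE names, for example:")
for name in candidates[:3]:
    print(" ", name)
```

Three vendor spellings and three product spellings already give nine candidates for a single version, and only one of them (at most) will match what ends up in the database.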

These two large CNAs (and many other people, of course) would like to learn what rules the NVD staff members follow when they create CPE names, so they can make sure their staff members follow those same rules when they create CPE names for CVE reports. However, nobody has been able to get that information from the NVD (my guess is this is because the NVD doesn’t have rules to follow, but won’t admit that).

Unfortunately, there’s probably no near-term solution to this problem, except for CVE.org to provide training to the CNAs on how they should be creating CPE names, and hope the NVD doesn’t suddenly start creating their own CPE names again.

However, given that there’s no definitive way to identify values for the fields included in a CPE name (vendor name, product name, etc.), there will never be a real solution to this problem as long as CPE is the only option for naming software in the NVD. The ultimate solution to this problem is to take advantage of the fact that the new CVE version 5.1 specification (formerly the “CVE JSON specification”) includes the capability to utilize purl identifiers.

If a CNA adds a purl identifier to the CVE report (they probably have to include a CPE also), and if the vulnerability database supports purl, a user will be able to find recent vulnerabilities for a product by searching on its purl. The NVD doesn’t support purl now and for sure won’t anytime soon; CVE.org should be supporting it now, although there probably aren’t any purls in that database yet. The purl should always be predictable, based on information the user should already have: the location from which they downloaded the package (e.g. Maven Central), the name of the package in that location, and the version string for the package in that location (the SBOM Forum white paper goes into a lot of depth in explaining why this is the case).
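The predictability is easy to see in code. The sketch below assembles a basic purl of the form `pkg:type/namespace/name@version` from the three pieces of information just listed. The `make_purl` helper is a name invented for this example, and it deliberately skips qualifiers, subpaths and the type-specific normalization rules in the purl specification, so treat it as an illustration rather than a complete implementation.

```python
from urllib.parse import quote


def make_purl(pkg_type: str, namespace: str, name: str, version: str) -> str:
    """Build a basic purl: pkg:type/namespace/name@version.

    Simplified sketch: no qualifiers, no subpath, and no type-specific
    normalization (some purl types have extra rules, such as lowercasing).
    """
    # Percent-encode each namespace segment separately so "/" stays a separator.
    ns = "/".join(quote(seg, safe="") for seg in namespace.split("/") if seg)
    parts = [pkg_type.lower()]
    if ns:
        parts.append(ns)
    parts.append(quote(name, safe=""))
    return "pkg:" + "/".join(parts) + "@" + quote(version, safe="")


# Where it came from (Maven Central), what it is called there, and its version.
purl = make_purl("maven", "org.apache.logging.log4j", "log4j-core", "2.14.1")
print(purl)  # pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1
```

Two users who downloaded the same package from the same place will arrive at the same string without consulting anyone, which is exactly what isn't true of a CPE name.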

As I discussed in this post last year, purl is already used to identify software in almost every vulnerability database in the world that isn’t based on CPE (the CPE-based ones being the NVD and the databases built on it). However, currently, purl can only be used to find open source software packages in vulnerability databases, not proprietary (“closed source”) products.

In the SBOM Forum’s paper, we described a scheme – based on a suggestion from Steve Springett, the leader of the OWASP CycloneDX and Dependency Track projects and also a purl maintainer – in which a software developer will create a SWID tag for each new product and version and make that tag available with the binaries. As we were writing the paper, Steve got a new purl type added (each download location, usually a package manager, has its own purl type), called SWID. If a user has the SWID tag for a product and wants to find out about vulnerabilities in it, they will be able to create a purl using just 3 or 4 fields from the SWID tag (the SWID spec supports about 80 fields, but only a few of them are needed to create the purl).
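As a rough sketch of that step, the Python below reads a SWID tag and builds a purl from four values: the creator entity, the software name, the version and the tag ID. Everything specific here is an assumption for illustration: the tag is for a made-up product, the XML namespace is the one I understand the SWID schema to use, and the field-to-purl mapping is my reading of the purl `swid` type, so check the purl specification before relying on it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Assumed SWID XML namespace; the sample tag below uses the same value.
SWID_NS = "{http://standards.iso.org/iso/19770/-2/2015/schema.xsd}"

# Hypothetical SWID tag for a made-up product.
SAMPLE_TAG = """<SoftwareIdentity
    xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd"
    name="ExampleServer" version="4.2.0"
    tagId="example.com.ExampleServer-4.2.0">
  <Entity name="ExampleCorp" regid="example.com"
          role="tagCreator softwareCreator"/>
</SoftwareIdentity>"""


def _creator(root: ET.Element) -> ET.Element:
    """Prefer the softwareCreator entity; fall back to the tagCreator."""
    for role in ("softwareCreator", "tagCreator"):
        for entity in root.findall(f"{SWID_NS}Entity"):
            if role in entity.get("role", "").split():
                return entity
    raise ValueError("SWID tag has no softwareCreator or tagCreator entity")


def purl_from_swid(tag_xml: str) -> str:
    """Build a pkg:swid purl from a handful of SWID tag fields."""
    root = ET.fromstring(tag_xml)
    creator = _creator(root)

    def enc(value: str) -> str:
        return quote(value, safe="")

    # Namespace: creator name, then regid if present.
    segments = [enc(creator.get(k)) for k in ("name", "regid") if creator.get(k)]
    segments.append(enc(root.get("name")))
    return (
        "pkg:swid/" + "/".join(segments)
        + "@" + enc(root.get("version"))
        + "?tag_id=" + enc(root.get("tagId"))
    )


print(purl_from_swid(SAMPLE_TAG))
```

Because every input comes straight from the tag, anyone holding the same tag produces the same purl.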

If the CNA that created the CVE report (say it’s for CVE-2024-12345) included a purl in the report for one of their proprietary products, they presumably based it on the SWID tag (since they probably work for the developer that created both the product and its tag). Thus, the purl the user enters in their search (which was developed using the SWID tag they found on the developer’s website) should always match the purl associated with the CVE number. This means the user should always be able to find out that their product is vulnerable to CVE-2024-12345. This sort of certainty is never possible with CPE.

The big fly in the ointment currently is that what I’ve just described only applies to new software products or versions, not to existing or legacy ones. There needs to be some mechanism by which a user of a legacy product version can find a SWID tag for their version as well. The good news is that it shouldn’t be hard to create such a mechanism. For example, I’ve suggested that a software supplier could have a known location on their website called maybe “SWID.txt”. It would provide a list of products and versions, along with a SWID tag for each. A tool could search on the product and version and find the SWID tag; using that, the tool could create the purl for the product/version.[iii]
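Here is one way such a lookup might work, sketched in Python. No format for “SWID.txt” has been defined anywhere, so the pipe-separated layout, the example.com URLs and the helper names below are all hypothetical; the file contents are held in a string rather than fetched from a website.

```python
from typing import Dict, Optional, Tuple

# Hypothetical SWID.txt contents: "product | version | SWID tag location".
SWID_TXT = """\
# product | version | swid tag
ExampleServer | 4.1.0 | https://example.com/swid/exampleserver-4.1.0.swidtag
ExampleServer | 4.2.0 | https://example.com/swid/exampleserver-4.2.0.swidtag
ExampleClient | 1.0.3 | https://example.com/swid/exampleclient-1.0.3.swidtag
"""

Index = Dict[Tuple[str, str], str]


def load_index(text: str) -> Index:
    """Parse the hypothetical SWID.txt layout into a lookup table."""
    index: Index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, version, tag_url = (field.strip() for field in line.split("|"))
        index[(name.lower(), version)] = tag_url
    return index


def find_tag(index: Index, name: str, version: str) -> Optional[str]:
    """Return the SWID tag location for a product version, or None."""
    return index.get((name.strip().lower(), version.strip()))


index = load_index(SWID_TXT)
print(find_tag(index, "exampleserver", "4.2.0"))
print(find_tag(index, "ExampleServer", "9.9.9"))
```

A tool would then download the tag it found and derive the purl from it; an unknown product or version simply returns nothing, which the tool can report to the user.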

Of course, there would be other ways to make the SWID tag information available to users of legacy products and versions. In fact, there’s no reason why SWID tags even need to be used for this purpose. There just needs to be a way for the supplier to make information required to identify their products available to users of both current and legacy products. There are lots of ways this could be done.

I would love to see the OWASP Vulnerability Database Working Group address this task, but currently it’s beyond our means. If your organization might be interested in supporting this work through man (or woman) power or a donation to OWASP (or both), please drop me an email.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum, which works to understand and address issues like what’s discussed in this post; please email me to learn more about what we do or to join us. You can also support our work through easy directed donations to OWASP, a 501(c)(3) nonprofit. Email me to discuss that.

My book "Introduction to SBOM and VEX" is now available in paperback and Kindle versions! For background on the book and the link to order it, see this post.


[i] Unfortunately, because of deficiencies in the NVD, a null result for a vulnerability search can mean many other things, such as that the user unknowingly fat-fingered the CPE name. This and other problems with CPE are described on pages 4-6 of the OWASP SBOM Forum’s 2022 white paper linked above. 

[ii] CISA has tried to help out with their Vulnrichment program, but that only addresses a small fraction of the non-enriched records. In addition, some CNAs report that there are problems with CISA’s work. 

[iii] In fact, this could be simplified if the supplier listed the purl along with the SWID tag, since there should always be a one-to-one correspondence between the two.
