Friday, November 10, 2023

The Global Vulnerability Database won’t be a “database” at all


I have written about a Global Vulnerability Database before, by which I meant a database that would be funded and run internationally. However, I’ve come to realize more recently that “global”, in the context of vulnerability databases, means a lot more than simply “international”. It means a database that relates multiple types of vulnerabilities (there are a number of vulnerability types, although CVE is by far the dominant type) and multiple types of software names.

As far as I know at the moment, the only two software identifiers used in vulnerability databases are CPE, which is used in the NVD and other databases derived from the NVD, and purl, which is just about the only good option in the open source world. But there is room for more. Specifically, a new identifier is needed for proprietary software, since I (and others) regard CPE as a dead end, even though it was pioneering in its time[i]. The OWASP SBOM Forum’s paper from September 2022 describes the problems with CPE and the advantages of purl in great detail. However, I will be the first to admit that our idea for including proprietary software products in purl using SWID tags is still rudimentary and requires a lot more thought.

In any case, I have expanded on my GVD idea in the document below. I’d welcome any comments. I think the next step is to start an open source project to design the “database”. Now that I’ve done it once (with a lot of help from my friends), I think it would be easy to get that project going in OWASP. And it’s easy for organizations to make restricted donations to this project through OWASP (a 501(c)(3) non-profit corporation). If you think your company might want to help out with this effort (obviously, designing the database will not be an expensive effort at all), please let me know.

 

Toward a Global Vulnerability Database

Tom Alrich, November 2023

Currently, there is no easy way to identify vulnerabilities of all types (CVE, OSV, etc.) that apply to a single software product or component of a software product. Also, because of the naming problem, there is no easy way to identify all products affected by a particular vulnerability. Achieving either of these goals requires multiple database searches and manual correlation of the results; even after doing that, there is no guarantee that the user will be able to achieve either goal.

The solution to these problems is usually described as some sort of “harmonization” of vulnerability and/or product identifiers. In other words, “All we need is a single means of identifying products and a single means of identifying vulnerabilities. Then we can simply correlate the vulnerabilities with the products and create a database that’s searchable using both fields. What could be simpler?”

Unfortunately, an effort to harmonize either the different types of vulnerability identifiers or the different types of product identifiers, let alone both, is very likely to fail. This is because, in many if not most cases, vulnerability or product identifiers of different types simply can’t be harmonized. For example, there can be only one CPE name for an open source project, but there can be a separate purl for each repository in which the project’s code is found, so directly mapping a CPE to each unique purl would make no sense. Moreover, there is no assurance that the code in each repository is exactly the same as in the other repositories, even though the repositories may all have the same project name.
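To make the CPE-versus-purl mismatch concrete, here is a minimal Python sketch. It is only an illustration: the parser is hand-rolled and deliberately simplified (it is not the official purl library and it ignores qualifiers, subpaths and percent-encoding), and the names SimplePurl, parse_simple_purl, CPE_NAME and PURLS are my own inventions. The identifiers are illustrative examples of what one project might look like in each scheme, not records copied from any database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SimplePurl:
    type: str                 # package type, e.g. "pypi" or "github"
    namespace: Optional[str]  # optional prefix such as a GitHub organization
    name: str
    version: Optional[str]


def parse_simple_purl(purl: str) -> SimplePurl:
    """Parse only the common pkg:type/namespace/name@version shape.

    Simplification: qualifiers (?...) and subpaths (#...) are discarded and
    percent-decoding is skipped. A real purl parser must handle all of these.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    rest = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    if "@" in rest:
        path, _, version = rest.rpartition("@")
    else:
        path, version = rest, ""
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"a purl needs at least a type and a name: {purl!r}")
    pkg_type, *namespace, name = segments
    return SimplePurl(pkg_type.lower(), "/".join(namespace) or None, name, version or None)


# Illustrative identifiers for one open source project (assumed, not looked up).
CPE_NAME = "cpe:2.3:a:djangoproject:django:1.11.1:*:*:*:*:*:*:*"
PURLS = [
    "pkg:pypi/django@1.11.1",           # the package as distributed through PyPI
    "pkg:github/django/django@1.11.1",  # the source repository on GitHub
]

# One CPE name, several purls: the relationship is one-to-many, and nothing
# guarantees that the code behind each purl is identical.
purls_by_type: Dict[str, List[str]] = {}
for p in PURLS:
    purls_by_type.setdefault(parse_simple_purl(p).type, []).append(p)
```

The point of the sketch is that the single CPE name carries no information about where the code was obtained, while each purl is tied to one download location. A “harmonized” identifier would have to discard one of those two views.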

There needs to be a globally accessible vulnerability database that incorporates all major vulnerability sources (including CVE, OSV, Python security advisories, etc.), as well as all major product identifiers (all product identifiers that are referenced by a major vulnerability source, an elite club that now includes, as far as I know, only CPE and purl). The database should not even try to provide harmonized vulnerability and product identifiers, because this simply can’t be done now.

In fact, the data don’t even need to reside in a single database. The various constituent databases (NVD, OSV, OSS Index, etc.) can simply be referenced through a single smart query engine, which would be called the “Global Vulnerability Database” (GVD). A query could refer to any supported vulnerability or product identifier; for example, “What are all the vulnerabilities that apply to purl pkg:pypi/django@1.11.1?” or “To which products does CVE-2023-12345 apply?” The query engine would decide which queries to make to which vulnerability databases. In some cases, performance considerations may dictate that at least parts of the databases (e.g., the CPE dictionary from the NVD) be downloaded regularly to a central location.
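The routing idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not a design: the class names (Source, QueryEngine), the stub sources and the rule for which source accepts which identifier type are all hypothetical, and the data in the two stub dictionaries are invented placeholders. A real engine would call each database’s own API instead of an in-memory dictionary.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")


def identifier_kind(identifier: str) -> str:
    """Classify a query term as a CVE ID, a purl or a CPE name."""
    if CVE_RE.match(identifier):
        return "cve"
    if identifier.startswith("pkg:"):
        return "purl"
    if identifier.startswith("cpe:"):
        return "cpe"
    raise ValueError(f"unsupported identifier: {identifier!r}")


@dataclass
class Source:
    name: str                            # label for a constituent database
    accepts: Set[str]                    # identifier kinds it can be queried with
    lookup: Callable[[str], List[str]]   # stand-in for a real API call


class QueryEngine:
    """Send each query only to the sources that understand its identifier."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def query(self, identifier: str) -> Dict[str, List[str]]:
        kind = identifier_kind(identifier)
        # Results stay grouped by source; nothing is merged or "harmonized".
        return {s.name: s.lookup(identifier) for s in self.sources if kind in s.accepts}


# Invented placeholder data standing in for two constituent databases.
NVD_STUB = {"CVE-2023-12345": ["cpe:2.3:a:example_vendor:example_product:1.0:*:*:*:*:*:*:*"]}
OSV_STUB = {"pkg:pypi/django@1.11.1": ["EXAMPLE-VULN-0001"]}

engine = QueryEngine([
    Source("nvd-stub", {"cve", "cpe"}, lambda ident: NVD_STUB.get(ident, [])),
    Source("osv-stub", {"purl"}, lambda ident: OSV_STUB.get(ident, [])),
])

products = engine.query("CVE-2023-12345")               # answered by nvd-stub only
vulnerabilities = engine.query("pkg:pypi/django@1.11.1")  # answered by osv-stub only
```

Note that the engine returns each source’s answer under that source’s name. That mirrors the argument here: the query engine coordinates the searches, but it does not pretend that the identifiers from different sources are interchangeable.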

Of course, it would be more satisfying if every vulnerability type could reference every product identifier and vice versa, but trying to do that would require such a massive effort that it is effectively impossible. What is possible is to undertake particular improvement projects like adding purls to existing CVE reports; however, these may be expensive, and there will probably always be significant issues with the GVD data. The consolation is that the GVD will improve the current situation by providing a central location from which to query multiple vulnerability databases, without removing or degrading any currently available capability. Meanwhile, improvements like the CVE JSON 5.1 spec can be introduced that will bring the GVD much closer to being a universal vulnerability database.

For example, currently no CVE report identifies a purl, so a user who searches the NVD for vulnerabilities applicable to a particular purl won’t see any CVEs at all. They will see them once CVE reports start including purls, which requires that the CVE JSON 5.1 spec be implemented and that the NVD adopt it; that is not likely to happen for at least the next couple of years. The GVD might support the JSON 5.1 spec before the NVD does.

The best way to achieve the goal of a GVD is through a global effort, funded by private industry, nonprofit organizations and government. It is likely that, as long as one or two well-known organizations lead the initial effort, there will be substantial interest worldwide. Therefore, obtaining adequate funding may not pose a big problem.

The first step should be the high-level database design. When that is finished, a group will develop the detailed design, as well as a roadmap for implementing the GVD. Implementation can be done in stages, with validation of each stage before moving on to the next one. While it would certainly be advantageous to obtain funding for the entire project from the beginning, that is probably unrealistic. Instead, the project team should assume that each stage will need to be funded separately.

Below are likely goals to be achieved by this project:

1.      Access to the database needs to be free, although high-volume commercial uses may be tariffed in some way.

2.      The database should be easily accessible worldwide, apart from remote areas with limited connectivity. In general, no country should have its access to the database restricted, although there might be reasons to do so in some cases, such as active support of terrorism.

3.      The database needs to be able to scale easily, meaning it can be built out in stages.

4.      Because there are errors in the current databases (e.g., off-spec CPE names), there should be an ongoing effort to clean up errors. There should also be an effort to make strategic enhancements to the database, such as adding purl identifiers to existing and new CVE reports. However, these efforts need to be undertaken as funds and time permit. Volunteers, such as college cybersecurity majors, might be found to assist in these efforts.
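As one small example of what the cleanup effort in goal 4 might automate, the Python sketch below flags CPE names that are obviously malformed. It rests on my reading of the CPE 2.3 formatted-string layout (the prefix cpe:2.3: followed by eleven colon-separated attributes, the first of which is the part: a, o or h). The function name cpe23_problems is hypothetical, and the check is far from a full validation against the spec; for instance, it treats any colon preceded by a backslash as escaped.

```python
import re
from typing import List

CPE_PREFIX = "cpe:2.3:"
VALID_PARTS = {"a", "o", "h", "*", "-"}     # application, OS, hardware, ANY, NA
UNESCAPED_COLON = re.compile(r"(?<!\\):")   # a colon not preceded by a backslash


def cpe23_problems(cpe: str) -> List[str]:
    """Return a list of obvious format problems; an empty list means none found."""
    if not cpe.startswith(CPE_PREFIX):
        return ["missing 'cpe:2.3:' prefix"]
    problems: List[str] = []
    fields = UNESCAPED_COLON.split(cpe[len(CPE_PREFIX):])
    if len(fields) != 11:
        problems.append(f"expected 11 attribute fields, found {len(fields)}")
    if fields[0] not in VALID_PARTS:
        problems.append(f"unexpected part value {fields[0]!r}")
    if any(field == "" for field in fields):
        problems.append("empty attribute field")
    if any(ch.isspace() for ch in cpe):
        problems.append("contains whitespace")
    return problems


# A well-formed name, an older-style 2.2 URI and a truncated 2.3 name.
well_formed = cpe23_problems("cpe:2.3:a:djangoproject:django:1.11.1:*:*:*:*:*:*:*")
old_style = cpe23_problems("cpe:/a:djangoproject:django:1.11.1")
truncated = cpe23_problems("cpe:2.3:a:vendor:product:1.0")
```

A check like this only catches names that break the format. It cannot tell whether a well-formed CPE uses the right vendor or product string, which is the harder part of the naming problem.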

The most important aspect of the GVD is that it needs to be truly global. While individual governments will be welcome to contribute both funds and human resources to the project, no government will exercise control over the GVD; governance will be by an independent board. Ultimately, once the GVD is operating smoothly and is being used heavily, the project might be turned over to an international organization, as the NTIA (part of the US Dept. of Commerce) did when it began transferring coordination of DNS, including the IANA functions, to ICANN in the late 1990s (DNS was developed at a university research institute in California, and I believe the US government oversaw it under contract for many years).

To sum this up, there needs to be a single searchable database of vulnerabilities worldwide. This will probably not be a single physical database implemented in a single facility. Instead, it might in effect be an AI-based “switching center”, through which searches would be coordinated among different vulnerability databases, using diverse identifiers for software and vulnerabilities. Twenty years ago, the technology required for this probably wasn’t available. However, it is likely there are no significant technical obstacles to constructing this database today. In fact, this seems to be the approach that makes sense and is doable, rather than creating a massive uber-database that combines all the other databases. We’ll leave the uber-database for another day, if ever.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.


[i] CPE will be around for a long time, since there’s so much information about vulnerabilities and the software products they apply to, all identified with CPEs, in the CVE reports. Open source software should now always be identified using purls, but that still leaves the question of how we identify proprietary software products, if not with CPEs.

Steve Springett (leader of the OWASP CycloneDX and Dependency Track projects) has suggested that purl could be applied very easily to software in online “stores” like the Apple Store and Google Play, since purl is based on the idea of a download location. Given the huge amount of proprietary software available in those stores (and many other online software stores, of course), creating new purl types to incorporate them into the purl world would go a long way toward addressing the problem of proprietary software. Perhaps the idea in the SBOM Forum paper about SWID tags (which was also Steve’s idea) could be fleshed out, to accommodate proprietary software that isn’t found in online stores.
