Sunday, June 8, 2025

Rules for the Global Vulnerability Database


I recently described my idea for a Global Vulnerability Database. The GVD won’t be a database at all, but rather an “intelligent switching hub” that accepts vulnerability queries in one of two forms:

“What Vulnerabilities are found in Product ABC?”, or

“What Products are affected by Vulnerability 123?”

The Product and Vulnerability fields are both intended to be as universal as possible; that is, they should accept all major machine-readable identifiers. For example, the Vulnerability field will accept CVE, OSV, GHSA (GitHub Security Advisory), and other vulnerability identifiers. The Product field will accept CPE, purl, OSV, and perhaps other product identifiers.
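
As a rough illustration of what “accepting all major identifiers” could involve, here is a minimal Python sketch that decides which field and which scheme a raw input string belongs to. The function name `classify_identifier` and the patterns are my own simplified assumptions; they are not full validators for any of these schemes, and nothing here is an existing GVD component.

```python
import re

# Simplified patterns (assumptions for illustration, not complete validators).
VULN_PATTERNS = {
    "CVE": re.compile(r"^CVE-\d{4}-\d{4,}$"),
    "GHSA": re.compile(r"^GHSA(-[0-9a-z]{4}){3}$"),
}

# Product identifiers are recognized here only by their leading prefix.
PRODUCT_PREFIXES = {
    "CPE": "cpe:2.3:",
    "purl": "pkg:",
}


def classify_identifier(value: str) -> tuple[str, str]:
    """Return (field, scheme), e.g. ("vulnerability", "CVE")."""
    text = value.strip()
    for scheme, pattern in VULN_PATTERNS.items():
        if pattern.match(text):
            return ("vulnerability", scheme)
    for scheme, prefix in PRODUCT_PREFIXES.items():
        if text.lower().startswith(prefix):
            return ("product", scheme)
    return ("unknown", "unknown")


print(classify_identifier("CVE-2021-45046"))
print(classify_identifier("pkg:pypi/django@5.2"))
```

An input that matches none of the schemes is reported as unknown rather than guessed at, which seems the safer default for a hub that routes queries to other databases.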

While this was not always the case, it is safe to assume that every major vulnerability database today accepts or outputs machine-readable vulnerability identifiers, product identifiers, or both. However, in this regard there are two important differences between the GVD and other vulnerability databases:

1.      With one notable exception[i], it is unlikely that any vulnerability database today will respond to a query for vulnerabilities that affect Product ABC with more than one type of vulnerability identifier, for example both CVE and GHSA. With the same exception, it is also unlikely that any vulnerability database will respond to a query for products affected by a particular vulnerability (e.g., CVE-2025-12345) with more than one type of product identifier, e.g. purl and CPE. This is because most vulnerability databases are designed to associate a single type of product identifier with a single type of vulnerability identifier. For example, the NVD only associates CPE names for products with CVE numbers for vulnerabilities, and the OSS Index open source database only associates purl identifiers with CVE numbers.

2.      It is also safe to say there is no vulnerability database today that will respond to a query like “Show me vulnerabilities of all types that affect Product ABC”, by displaying all major types of vulnerability identifiers. It’s also safe to say there’s no vulnerability database today that will respond to a query like, “Show me products of all types that are affected by CVE-2025-12345”, by displaying all major types of product identifiers. Yet, my ambition is that the GVD will do both of those things.

However, there is a potential fly in this ointment: There is no way to create an unambiguous mapping either between different types of vulnerability identifiers (e.g., CVE to OSV) or between different types of product identifiers (e.g., CPE to purl). Here are several examples:

A. Most vulnerabilities are assigned to products as part of a coordinated vulnerability disclosure process. For example, an open source project (“Project 1”) might report a new vulnerability they have identified in their product to the CVE Program. A CVE Numbering Authority (CNA) will create a new CVE record for the vulnerability and assign it a CVE number like CVE-2024-56789. If the project team also registers the new vulnerability with GitHub, it will receive a GHSA identifier as well. Given that the same team is responsible for both registrations for the vulnerability (CVE and GHSA), the two registrations will usually be considered to identify the same vulnerability.

B. However, if a separate open source project registers a similar vulnerability as a GHSA and asserts it is the same as the vulnerability described in CVE-2024-56789, this assertion may meet with skepticism in the CVE Program, since the two registrations were not by the same team. Since there is no easy way to resolve a dispute like this, the only safe policy is to accept two registrations as being for the same vulnerability only if they were both created by the same organization or person. If that is not the case, the two registrations need to be considered different vulnerabilities.
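
The “same registrant” policy in examples A and B can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Registration` record and its `registrant` field are hypothetical, the GHSA IDs are made-up placeholders, and in practice the hard part would be learning reliably who created each record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    identifier: str  # e.g. "CVE-2024-56789" or a GHSA ID
    registrant: str  # organization or person that created the record (hypothetical field)


def same_vulnerability(a: Registration, b: Registration) -> bool:
    """Accept two records as one vulnerability only if the registrant matches."""
    return a.registrant.strip().lower() == b.registrant.strip().lower()


cve = Registration("CVE-2024-56789", "Project 1")
ghsa_same_team = Registration("GHSA-aaaa-bbbb-cccc", "Project 1")   # placeholder ID
ghsa_other_team = Registration("GHSA-dddd-eeee-ffff", "Project 2")  # placeholder ID

print(same_vulnerability(cve, ghsa_same_team))   # same team: treated as one vulnerability
print(same_vulnerability(cve, ghsa_other_team))  # different team: treated as different
```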

C. Libraries are widely used by both open source and commercial developers. Usually, a vulnerability will be present in just one module of a library, not all of them. However, since CPE names identify the product that contains the vulnerability and the library itself is the product, this means a CPE name will not usually refer to the vulnerable module[ii].

By contrast, purl (“package URL”) identifies a package. Since each module of a library is its own package, this makes it possible to identify the location of a vulnerability with much more precision.[iii] Thus, there can be no CPE “equivalent” of a purl that references a single library module.
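
To make the difference in precision concrete, here is a small Python sketch using the Log4j example from endnote [iii]. The CPE name identifies the library as a whole, while each Maven module has its own purl. The version number 2.14.1 is only illustrative, and the helper `includes_vulnerable_module` is a hypothetical name of mine, not part of any tool.

```python
# One CPE name covers the whole library (version is illustrative).
LIBRARY_CPE = "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*"

# Each module of the library is a separate package with its own purl.
VULNERABLE_PURL_PREFIX = "pkg:maven/org.apache.logging.log4j/log4j-core@"


def includes_vulnerable_module(component_purls: list[str]) -> bool:
    """True if any component purl refers to the log4j-core module."""
    return any(p.startswith(VULNERABLE_PURL_PREFIX) for p in component_purls)


# A product that ships only the API module, not the core module.
api_only = ["pkg:maven/org.apache.logging.log4j/log4j-api@2.14.1"]

# A product that ships both modules.
api_and_core = api_only + ["pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"]

print(includes_vulnerable_module(api_only))
print(includes_vulnerable_module(api_and_core))
```

A check like this is possible with purl because the module name is part of the identifier; the CPE name above gives no way to tell the two products apart.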

The primary lesson to be drawn from the above examples is that there are many reasons why one type of vulnerability or product identifier will not be “translatable” to another type. It would therefore be a bad idea to try to “harmonize” the identifiers into one type, for instance by making purl the “universal” product identifier or CVE the “universal” vulnerability identifier, with all other identifiers “translated” to one or the other. On the other hand, if a user might benefit from learning about a vulnerability or vulnerable product that is similar to the one that exactly matches their query, the GVD will usually provide both the exact and the similar match.

This means that, even though the user will usually enter a straightforward query that lists just one or two product or vulnerability identifiers, the response will not necessarily be limited to the same identifiers. The GVD will always assume that the user is interested in seeing as much relevant information as possible, even if they end up discarding some of what they are shown.[iv]

Here are two examples of how a single query might work:

Query 1: “What current vulnerabilities have been identified in the open source project Django version 5.2?”

The query is parsed into three queries to three vulnerability databases:

·        To the NVD: “What vulnerabilities affect Django version 5.2?” The response to this query is a list of four CVE numbers. Each of those can be queried separately for more information on the vulnerability.

·        To the GitHub Advisory Database (GAD): “What vulnerabilities affect Django version 5.2?” The response to this query is a list of two CVE numbers, both of which are included in the NVD response. The first of the two CVEs corresponds to the GitHub ID GHSA-7xr5-9hcq-chf9 and the second to GHSA-8j24-cjrq-gr2m; each GHSA ID can be searched on separately.

·        To Sonatype OSS Index: “What vulnerabilities apply to purl pkg:pypi/django@5.2?”[v] The response to this query is a list of two CVEs. These are the same CVEs shown by the GitHub Advisory Database. However, clicking on either of the CVE lines provides additional information not provided by either the NVD or GAD.

All three results will be provided to the user, along with results from queries to any other vulnerability database (such as OSV) that returns different results. Note that, while the NVD and GAD queries are identical, the OSS Index query uses the purl for Django v5.2.[vi]
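
The way Query 1 is split up can be sketched as a small planning step in Python. This is only an illustration: `SubQuery` and `plan_product_query` are hypothetical names, the plan does not call any real database API, and the list of three databases simply mirrors the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    database: str
    identifier_kind: str  # "name+version" or "purl"
    identifier: str


def plan_product_query(name: str, version: str, purl_type: str) -> list[SubQuery]:
    """Turn one product question into one sub-query per database."""
    name_version = f"{name} {version}"
    purl = f"pkg:{purl_type}/{name.lower()}@{version}"
    return [
        SubQuery("NVD", "name+version", name_version),
        SubQuery("GitHub Advisory Database", "name+version", name_version),
        SubQuery("OSS Index", "purl", purl),
    ]


for sub_query in plan_product_query("Django", "5.2", "pypi"):
    print(f"{sub_query.database}: {sub_query.identifier}")
```

Keeping the plan separate from the code that actually contacts each database would make it easy to add another database later without changing how the user's question is interpreted.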

Query 2: “What products are affected by CVE-2021-45046?”

The query is parsed into two queries to two vulnerability databases:

·        To the NVD: “What products are affected by CVE-2021-45046?” The response to this query identifies twelve “Known affected software configurations”, which among them list over 50 CPE names.

·        To GitHub Advisory Database: “What products are affected by CVE-2021-45046?” The response to this query illustrates the fact that there is not always a list of machine-readable software identifiers available. The primary feature of the advisory page is the set of references: security advisories by various developers and manufacturers, including patch URLs. These references need to be parsed “manually”.

Of course, even though the response from the NVD includes machine-readable software identifiers and the response from the GAD does not, that doesn’t mean the two responses should not be displayed together. Both responses provide a set of references; it is unlikely that the two sets are identical. Since most queries about CVE-2021-45046 are probably motivated by a search for a patch (this is one of the vulnerabilities associated with the Log4Shell vulnerability in the Log4j library), users will want to see as many references as possible.

The moral of this story is that a query to the Global Vulnerability Database will usually yield multiple responses. These will include

1.      Responses from databases other than the one originally intended in the query, as well as

2.      Responses generated from queries using identifiers that are similar to, but not the same as, the identifier used in the query.
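
Here is a minimal Python sketch of how the multiple responses might be combined, including the option described in endnote [iv] to suppress anything but exact matches. The `Result` record, the `combine` function and the identifiers “CVE-A” and “CVE-B” are placeholders of my own, not real data or a real design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    database: str
    identifier: str
    match: str  # "exact" or "similar"


def combine(results: list[Result], exact_only: bool = False) -> dict[str, list[Result]]:
    """Group results by identifier, optionally dropping non-exact matches."""
    grouped: dict[str, list[Result]] = {}
    for result in results:
        if exact_only and result.match != "exact":
            continue
        grouped.setdefault(result.identifier, []).append(result)
    return grouped


# Placeholder identifiers, not real CVE numbers.
responses = [
    Result("NVD", "CVE-A", "exact"),
    Result("GitHub Advisory Database", "CVE-A", "exact"),
    Result("OSS Index", "CVE-B", "similar"),
]

print(sorted(combine(responses)))
print(sorted(combine(responses, exact_only=True)))
```

Grouping by identifier keeps every database's answer visible side by side, so the user can see where the sources agree and where one of them offers something the others do not.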

Of course, the additional queries will not be generated by some mechanistic process, but rather by an intelligent process that will run in the “front end” of the GVD. Does this mean that the front end will run a large language model or some other form of generative AI? No. My opinion (which I’ll be glad to discuss with anybody who thinks differently) is that the decisions on alternative queries in the GVD need to be based on a set of identifiable rules that can be audited.[vii]
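
To show what “identifiable rules that can be audited” might look like, here is a minimal Python sketch. Each rule has an ID and a description, and every alternative query it generates is recorded in an audit log. The `Rule` structure, `expand_query` and the single example rule are all hypothetical; a real rule set would need far more care.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    apply: Callable[[str], Optional[str]]  # returns an alternative query, or None


def expand_query(query: str, rules: list[Rule]) -> tuple[list[str], list[str]]:
    """Apply each rule once; return all queries plus a log of which rule fired."""
    queries = [query]
    audit_log = []
    for rule in rules:
        alternative = rule.apply(query)
        if alternative is not None and alternative not in queries:
            queries.append(alternative)
            audit_log.append(f"{rule.rule_id}: {query!r} -> {alternative!r}")
    return queries, audit_log


def django_to_purl(query: str) -> Optional[str]:
    """Hypothetical rule: a 'Django <version>' query also gets a PyPI purl query."""
    prefix = "django "
    if query.lower().startswith(prefix):
        return f"pkg:pypi/django@{query[len(prefix):].strip()}"
    return None


rules = [Rule("R1", "Add the PyPI purl for Django name+version queries", django_to_purl)]
queries, audit_log = expand_query("Django 5.2", rules)
print(queries)
print(audit_log)
```

Because every additional query is tied to a named rule, a rule that turns out to be unhelpful can be found in the log and backed out, which is much harder to do with the output of a language model.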

My blog is more popular than ever, but I need more than popularity to keep it going. I’ve been told I should either accept advertising or charge a subscription fee, or both. However, neither of those options appeals to me. It would be great if everyone who appreciates my posts could donate a $20-$25 “subscription fee” once a year (of course, I welcome larger amounts as well!). Will you do that today?

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. And please donate as well!


[i] The exception is the OSV vulnerability database.

[ii] In some cases, the person who creates the CPE name creates a “product name” that includes the names of both the library and the vulnerable module. However, there is no consistent procedure for doing this, so it cannot be used for an automated response.

[iii] Because software developers often do not install library modules that are not directly used by their product, this means that a lot of patches for libraries are issued and applied needlessly, since the vulnerable module was never included in the product in the first place. This was the case with the Log4Shell vulnerability in the Log4j library.

Log4Shell affected just the log4j-core module, meaning any developer that had not installed that module didn’t need to patch the library. However, vulnerability advisories that referred to the CPE name designated only the Log4j library as vulnerable, not the log4j-core module, and so didn’t capture this subtlety. As a result, many developers probably patched needlessly.

[iv] Since some users will not be interested in seeing close matches, a GVD user will be able to suppress display of any match except an exact one. In that case, the output they receive will be close to what they will receive from a search on a single database.

[v] A purl can be easily created using a simple formula and information that a user should have readily available (or else be able to find quickly). In this case, the user just needs to know the package name, version number, and the repository from which they downloaded the package. The repository (known as the purl “type”) is PyPI, which stands for Python Package Index.
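
The “simple formula” can be sketched in Python as follows. This is a simplified illustration rather than a full implementation of the purl specification: `make_purl` is a hypothetical helper that handles only type, name and version, and it applies the PyPI naming convention (lowercase, underscores replaced by dashes) but no other type-specific rules.

```python
from urllib.parse import quote


def make_purl(purl_type: str, name: str, version: str) -> str:
    """Build a minimal purl of the form pkg:<type>/<name>@<version>."""
    purl_type = purl_type.lower()
    if purl_type == "pypi":
        # PyPI package names are lowercased, with underscores replaced by dashes.
        name = name.lower().replace("_", "-")
    # Percent-encode any characters that are not safe in a purl component.
    return f"pkg:{purl_type}/{quote(name, safe='')}@{quote(version, safe='')}"


print(make_purl("pypi", "Django", "5.2"))
```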

[vi] Every purl has a “type” that usually indicates the repository from which the software was downloaded. The purl in this example has the type “pypi”, which refers to PyPI, the Python Package Index. If Django is available only in PyPI, there is only one possible purl to use in a search for Django in OSS Index. However, if Django were available in other repositories (e.g. package managers), each of those could be used for a separate search in OSS Index, by simply replacing “pypi” with the type for the other package manager and then re-running the search.

While it might seem odd to search the same vulnerability database several times for the same product name and version number, there is a good reason for doing this: There can be no assurance that a vulnerability that applies to a particular product/version in one package manager will also apply to the “same” product/version in a different package manager. In other words, purl treats products with the same name and version number as different products if they are found in different repositories.

[vii] This is like an early type of AI called an “expert system”. These systems were literally created by interviewing an expert in a certain process (e.g., operation of a machine in a manufacturing plant) and codifying their advice into a set of rules. A simulation of the process would then be run, governed by these rules; the rules would be iteratively tweaked to improve the outcome of the process. After the process was running smoothly in the simulation, the rules would then be tested on the physical process itself.

The most important aspect of this procedure was that any change in the rules could be audited. If a rule was changed but that didn’t improve the process, the change would be backed out and a different change would be tried.
