Thursday, May 15, 2025

What I mean by a federated vulnerability database


I’ve been writing about a Global Vulnerability Database for at least a year; my most recent post on that topic is this one. What most people probably think when they hear that term is that I’m proposing one big database that will somehow combine all or most of the existing vulnerability databases. Since a vulnerability database requires machine-readable identifiers for software (e.g., purl and CPE) and vulnerabilities (CVE, OSV, GHSA, etc.), and since different vulnerability databases often use different identifiers, combining these databases into one usually means “harmonization” of the identifiers – i.e., “mapping” multiple identifiers into one.

For example, harmonization of software identifiers might mean mapping CPE names to “equivalent” purl identifiers, or vice versa. Or maybe both purl and CPE names will be mapped to a single third identifier to be named later. But here’s the thing about identifiers, whether we’re talking about identifiers for vulnerabilities, identifiers for software products, or both: They can almost never be cleanly mapped to each other. If they could, why would there be multiple identifiers in the first place?

Here's an example of what I mean: I’ve written a lot about purl and CPE. In this post, I described how a purl usually identifies a particular software package which is made available in a package manager. Since sometimes the same (or closely similar) software is made available in multiple package managers, a purl includes the name of the package manager. This ensures that purls will be unique, since the operator of a package manager makes sure there are no duplicate names; this is called a controlled namespace.

This also ensures that, if a “single” package is distributed through multiple package managers (as sometimes happens), there will be no confusion about which package manager we’re talking about. Since the purl includes the “type” that corresponds to the package manager, the purl always tells us which package manager is being referred to.

This is especially important, because usually there will be slight variations in the product between package managers, even if they’re in theory the “same” package – e.g., OpenSSL version 3.1.8. Since the purl differs between the package managers, and since a vulnerability might be present in the same product in one package manager but not another, it’s important to know which package manager is the source of the codebase your organization uses.
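To make the role of the purl “type” concrete, here is a minimal Python sketch. It is not the official purl parser: `parse_purl` and the `Purl` tuple are hypothetical names used only for illustration, the sketch handles only the common pkg:type/namespace/name@version shape, and the two purl strings are illustrative examples that have not been checked against either registry.

```python
from typing import NamedTuple, Optional


class Purl(NamedTuple):
    type: str                  # the package manager / ecosystem, e.g. "conan"
    namespace: Optional[str]   # optional grouping, e.g. a Maven group ID
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Parse only the common pkg:type/namespace/name@version shape."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")
    # This sketch ignores qualifiers (?...) and subpaths (#...).
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    path, _, version = rest.partition("@")
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"purl needs a type and a name: {purl!r}")
    ptype, name = parts[0].lower(), parts[-1]
    namespace = "/".join(parts[1:-1]) or None
    return Purl(ptype, namespace, name, version or None)


# Illustrative strings only: the "same" package in two package managers.
a = parse_purl("pkg:conan/openssl@3.1.8")
b = parse_purl("pkg:conda/openssl@3.1.8")

print(a.type, b.type)                                   # the types differ
print(a.name == b.name and a.version == b.version)      # name and version match
print(a == b)                                           # the identifiers do not
```

Because the type is part of the identifier, the two purls stay distinct even though the name and version match, so a vulnerability record can point at one package manager's build without implicating the other.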

However, there usually will be confusion with CPE, since CPE doesn’t have a field for “package manager”. Sometimes, the person who creates the CPE builds the package manager name into the product name in the CPE, but more often there is no way to tie the CPE name to a particular package manager. This means there’s no way to directly map a purl for an open source product distributed in a package manager to a particular CPE. There are many other examples in which a software product identified with CPE or purl (or with the ecosystem and package name that OSV records use) can never cleanly map to another identifier.
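The missing field is easy to see by splitting a CPE 2.3 formatted string into its eleven attributes. The Python sketch below is a naive illustration, not a conforming CPE parser: `parse_cpe23` is a hypothetical helper, it does not handle escaped colons, and the CPE string is an illustrative example rather than a record taken from the NVD.

```python
# The eleven attributes that follow "cpe:2.3:" in a formatted string.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict:
    """Naively split a CPE 2.3 formatted string; escaped colons are not handled."""
    pieces = cpe.split(":")
    if pieces[:2] != ["cpe", "2.3"] or len(pieces) != 13:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, pieces[2:]))


# Illustrative string only.
fields = parse_cpe23("cpe:2.3:a:openssl:openssl:3.1.8:*:*:*:*:*:*:*")

for field_name, value in fields.items():
    print(f"{field_name:>10}: {value}")
```

None of the eleven attributes is dedicated to the package manager, so a string like this one cannot say which package manager's build of the product it refers to.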

The same holds true for vulnerability identifiers. CVE is by far the most widely used vulnerability identifier, but there are others like GHSA (GitHub Security Advisories), Snyk ID and OSV. There’s no way to say upfront that CVE XYZ maps directly to GHSA ABC. However, often the organization that identifies a vulnerability will report it as a new CVE Record and at the same time report an ICSA (CISA’s ICS Security Advisory), for example. If the same organization made both reports (and especially if that organization is also the supplier of the product being reported on), there shouldn’t be any objection to treating the two identifiers as referring to the same vulnerability, even though identifiers of different types usually can’t be directly mapped to each other. They’re “mapped” because they came from the same organization.

This is all a long way of saying that there’s no such thing as “harmonization” of either software or vulnerability identifiers. And if there’s no harmonization, this means the Global Vulnerability Database (GVD) can’t be a single database.

That’s why I call the GVD a “federated” database. Offhand, that term – federated database – might seem like an oxymoron. A database usually gives a single answer, but a federated database must inherently give multiple answers. However, when I use that term, I mean there are multiple databases, but they (almost) speak with one voice. There needs to be an “intelligent front end” that takes all the queries, routes them to the relevant individual database(s), and routes the answers back to the user.

What the federated database doesn’t do is somehow combine the answers from the different databases into a single “harmonized” answer. When there are different identifiers involved, there can’t be a harmonized answer. But that doesn’t mean it’s not worthwhile to receive multiple answers. 

For example, suppose a GVD user entered a purl for an open source product and requested all vulnerabilities – of all types – that affect that purl. They might get four different responses: 

1. The front end could query OSS Index, an open source database that supports purl and identifies vulnerabilities using CVE. That query would return one or more CVEs that affect the product designated by the purl.

2. The front end could query GHSA, which also supports purl. GHSA might return a CVE Record, an OSV advisory, a GHSA advisory, or even two or three of those.

3. The front end could query OSV, which also supports purl. OSV will usually return an OSV advisory, but it could also return a CVE.

4. Since the front end is intelligent, it might query the National Vulnerability Database (NVD) and notice that there’s a CPE identifier that probably corresponds closely to the purl in the original query. Therefore, it would conduct a query using that CPE, and return one or more CVE Records that reference that CPE.

In other words, my Global Vulnerability Database won’t even attempt to deliver harmonized responses. Instead, it will provide you with every response it receives from any of the federated databases. If you’re the sort of person who wants just one answer, you might not appreciate this arrangement. But if you understand that vulnerability management is an inexact science – in fact, it isn’t a science at all – you might appreciate having a diversity of information sources to compare. 
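The arrangement described above can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: `federated_query`, `Answer`, and `guess_cpe` are hypothetical names, the four backends are stub functions rather than calls to the real OSS Index, GHSA, OSV, or NVD services, and the advisory IDs are placeholders, not real data. The point is only the shape of the front end: it routes the query, then hands back every answer labeled by source, without merging them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Callable[[str], List[str]]  # identifier in, advisory IDs out


@dataclass(frozen=True)
class Answer:
    source: str                 # which federated database answered
    queried_as: str             # the identifier actually sent to it
    vuln_ids: Tuple[str, ...]   # advisory IDs exactly as that database gave them


def federated_query(
    purl: str,
    purl_backends: Dict[str, Lookup],
    cpe_backends: Dict[str, Lookup],
    guess_cpe: Callable[[str], Optional[str]],
) -> List[Answer]:
    """Route one purl query to every backend and return all answers unmerged."""
    answers = [
        Answer(name, purl, tuple(lookup(purl)))
        for name, lookup in purl_backends.items()
    ]
    # The "intelligent" step: look for a CPE that probably matches the purl.
    cpe = guess_cpe(purl)
    if cpe is not None:
        for name, lookup in cpe_backends.items():
            answers.append(Answer(name, cpe, tuple(lookup(cpe))))
    return answers


# Stub backends with placeholder IDs (not real advisories).
purl_backends = {
    "OSS Index": lambda purl: ["CVE-EXAMPLE-1"],
    "GHSA": lambda purl: ["GHSA-EXAMPLE-1", "CVE-EXAMPLE-1"],
    "OSV": lambda purl: ["OSV-EXAMPLE-1"],
}
cpe_backends = {"NVD": lambda cpe: ["CVE-EXAMPLE-2"]}


def guess_cpe(purl: str) -> Optional[str]:
    # Hypothetical lookup table standing in for a real purl-to-CPE heuristic.
    known = {
        "pkg:conan/openssl@3.1.8": "cpe:2.3:a:openssl:openssl:3.1.8:*:*:*:*:*:*:*",
    }
    return known.get(purl)


answers = federated_query(
    "pkg:conan/openssl@3.1.8", purl_backends, cpe_backends, guess_cpe
)
for answer in answers:
    print(answer.source, answer.queried_as, answer.vuln_ids)
```

Each `Answer` keeps the source and the identifier that was actually queried, so the user can compare the responses side by side and can see that the NVD result came from a guessed CPE rather than from the purl itself.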

Someday, it may really be possible to harmonize the responses from the GVD, so that people who want a single answer and people who value diversity might both be satisfied. But we’re not there now.

To produce this blog, I rely on support from people like you. If you appreciate my posts, please make that known by donating here. Any amount is welcome. Thanks!


If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. And while you’re at it, please donate as well!

 
