I’ve been writing about a Global Vulnerability Database for
at least a year; my most recent post on that topic is this
one. What most people probably think when they hear that term is that I’m
proposing one big database that will somehow combine all or most of the
existing vulnerability databases. Since a vulnerability database requires
machine-readable identifiers for software (e.g., purl and CPE) and
vulnerabilities (CVE, OSV, GHSA, etc.), and since different vulnerability databases
often use different identifiers, combining these databases into one usually
means “harmonization” of the identifiers – i.e., “mapping” multiple identifiers
into one.
For example, harmonization of software identifiers might
mean mapping CPE names to “equivalent” purl identifiers, or vice versa. Or
maybe both purl and CPE names will be mapped to a single third identifier to be
named later. But here’s the thing about identifiers, whether we’re talking
about identifiers for vulnerabilities, identifiers for software products, or both:
They can almost never be cleanly mapped to each other. If they could, why would
there be multiple identifiers in the first place?
Here's an example of what I mean: I’ve written a lot about
purl and CPE. In this
post, I described how a purl usually identifies a particular software
package which is made available in a package manager. Since sometimes the same
(or closely similar) software is made available in multiple package managers, a
purl includes the name of the package manager. This ensures that purls will be
unique, since the operator of a package manager makes sure there are no
duplicate names; this is called a controlled namespace.
This also ensures that, if a “single” package is distributed
through multiple package managers (as sometimes happens), there will be no
confusion about which package manager we’re talking about. Since the purl
includes the “type” that corresponds to the package manager, the purl always
tells us which package manager is being referred to.
This is especially important, because usually there will be
slight variations in the product between package managers, even if they’re in
theory the “same” package – e.g., OpenSSL version 3.1.8. Since the purl differs
between the package managers, and since a vulnerability might be present in the
same product in one package manager but not another, it’s important to know which
package manager is the source of the codebase your organization uses.
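To make the role of the purl "type" concrete, here is a minimal sketch in Python. It assumes a simplified purl of the form pkg:type/namespace/name@version and ignores qualifiers, subpaths and percent-encoding; the `parse_purl` helper and the package name `example-lib` are hypothetical, made up for illustration, and this is not a full implementation of the purl specification.

```python
from typing import NamedTuple, Optional


class Purl(NamedTuple):
    type: str                  # the package manager / ecosystem, e.g. "npm" or "pypi"
    namespace: Optional[str]   # optional prefix, e.g. a Maven group id
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Simplified parser (an assumption, not the full purl spec).

    Qualifiers (?...) and subpaths (#...) are dropped, and
    percent-encoding is not decoded.
    """
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    version: Optional[str] = None
    if "@" in rest:
        rest, version = rest.rsplit("@", 1)
    parts = rest.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"purl needs a type and a name: {purl!r}")
    namespace = "/".join(parts[1:-1]) or None
    return Purl(parts[0].lower(), namespace, parts[-1], version)


# Hypothetical package distributed through two package managers.
from_npm = parse_purl("pkg:npm/example-lib@1.2.3")
from_pypi = parse_purl("pkg:pypi/example-lib@1.2.3")

# Same name and version, but the type keeps the two purls distinct.
same_name = from_npm.name == from_pypi.name
distinct = from_npm != from_pypi
```

Because the type is part of the identifier, a vulnerability reported against one of these purls says nothing by itself about the other.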
However, there usually will be confusion with CPE, since CPE
doesn’t have a field for “package manager”. Sometimes, the person who creates
the CPE builds the package manager name into the product name in the CPE, but
more often there is no way to tie the CPE name to a particular package manager.
This means there’s no way to directly map a purl for an open source product distributed
in a package manager to a particular CPE. There are many other cases in
which a software product identified with CPE or purl (or with the ecosystem and
package names that OSV uses) can't be cleanly mapped to another identifier.
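The missing field is easy to see if you split a CPE name into its parts. The sketch below assumes the CPE 2.3 formatted-string layout and a naive split on colons (so it will not handle escaped colons); the `split_cpe23` helper is hypothetical and the CPE name is only illustrative.

```python
# Attribute names of a CPE 2.3 formatted string, in order.
CPE23_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def split_cpe23(cpe: str) -> dict:
    """Naive split on ':' (an assumption: no escaped colons in the name)."""
    pieces = cpe.split(":")
    if pieces[:2] != ["cpe", "2.3"] or len(pieces) != 2 + len(CPE23_FIELDS):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, pieces[2:]))


# Illustrative CPE name for the OpenSSL example above.
fields = split_cpe23("cpe:2.3:a:openssl:openssl:3.1.8:*:*:*:*:*:*:*")

# None of the eleven attributes is dedicated to a package manager.
has_package_manager_field = any("package" in name for name in fields)
```

Nothing in the result says whether this OpenSSL came from a particular package manager, which is why a purl can't be mapped to it with confidence.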
The same holds true for vulnerability identifiers. CVE is by
far the most widely used vulnerability identifier, but there are others like
GHSA (GitHub Security Advisories), Snyk ID and OSV. There’s no way to say upfront
that CVE XYZ maps directly to GHSA ABC. However, often the organization that
identifies a vulnerability will report it as a new CVE Record and at the same
time report an ICSA (CISA’s ICS Security Advisory), for example. If the same
organization made both reports (and especially if that organization is also the
supplier of the product being reported on), it shouldn’t matter that the two
identifiers can’t usually be mapped directly to each other. In this case, they’re
“mapped” because they came from the same organization.
This is all a long way of saying that, today, there’s no such thing
as clean “harmonization” of either software or vulnerability identifiers. And
without harmonization, the Global Vulnerability Database (GVD) can’t
be a single database.
That’s why I call the GVD a “federated” database. Offhand,
that term – federated database – might seem like an oxymoron. A database
usually gives a single answer, but a federated database must inherently give
multiple answers. However, when I use that term, I mean there are multiple
databases, but they (almost) speak with one voice. There needs to be an “intelligent
front end” that takes all the queries, routes them to the relevant individual
database(s), and routes the answers back to the user.
What the federated database doesn’t do is somehow combine the answers from the different databases into a single “harmonized” answer. When there are different identifiers involved, there can’t be a harmonized answer. But that doesn’t mean it’s not worthwhile to receive multiple answers.
For example, suppose a GVD user entered a purl for an open source product and requested all vulnerabilities – of all types – that affect that purl. They might get four different responses:
1. The front end could query OSS Index, an open source database that supports purl and identifies vulnerabilities using CVE. That query would return one or more CVEs that affect the product designated by the purl.
2. The front end could query GHSA, which also supports purl. GHSA might return a CVE Record, an OSV advisory, a GHSA advisory, or even two or three of those.
3. The front end could query OSV, which also supports purl. OSV will usually return an OSV advisory, but it could also return a CVE.
4. Since the front end is intelligent, it might query the National Vulnerability Database (NVD) and notice that there’s a CPE identifier that probably corresponds closely to the purl in the original query. Therefore, it would conduct a query using that CPE, and return one or more CVE Records that reference that CPE.
In other words, my Global Vulnerability Database won’t even attempt to deliver harmonized responses. Instead, it will provide you with every response it receives from any of the federated databases. If you’re the sort of person who wants just one answer, you might not appreciate this arrangement. But if you understand that vulnerability management is an inexact science – in fact, it isn’t a science at all – you might appreciate having a diversity of information sources to compare.
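Here is a rough sketch of how such a front end might fan out one purl query and hand back every response unmerged. Everything in it is an assumption made for illustration: the `federated_query` function, the `Response` record, the stub backends, the `guess_cpe` helper and the placeholder vulnerability IDs are all hypothetical, and no real database API is being called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Backend = Callable[[str], List[str]]  # identifier in, vulnerability IDs out


@dataclass(frozen=True)
class Response:
    source: str           # which federated database answered
    query_id: str         # the identifier actually sent to that database
    vuln_ids: List[str]   # returned as-is, in that database's own terms


def federated_query(
    purl: str,
    purl_backends: Dict[str, Backend],
    cpe_backends: Dict[str, Backend],
    guess_cpe: Callable[[str], Optional[str]],
) -> List[Response]:
    """Route one purl query to every backend; never merge the answers."""
    responses = [
        Response(name, purl, lookup(purl))
        for name, lookup in purl_backends.items()
    ]
    cpe = guess_cpe(purl)  # a best guess, not a clean mapping
    if cpe is not None:
        for name, lookup in cpe_backends.items():
            responses.append(Response(name, cpe, lookup(cpe)))
    return responses


# Stub backends with placeholder IDs (hypothetical, not real advisories).
purl_backends: Dict[str, Backend] = {
    "OSS Index": lambda p: ["CVE-0000-0001"],
    "GHSA": lambda p: ["GHSA-xxxx-xxxx-xxxx", "CVE-0000-0001"],
    "OSV": lambda p: ["OSV-0000-1"],
}
cpe_backends: Dict[str, Backend] = {"NVD": lambda c: ["CVE-0000-0001"]}

EXAMPLE_PURL = "pkg:npm/example-lib@1.2.3"
EXAMPLE_CPE = "cpe:2.3:a:example:example-lib:1.2.3:*:*:*:*:*:*:*"


def guess_cpe(purl: str) -> Optional[str]:
    # Hypothetical lookup table standing in for the front end's "intelligence".
    return EXAMPLE_CPE if purl == EXAMPLE_PURL else None


results = federated_query(EXAMPLE_PURL, purl_backends, cpe_backends, guess_cpe)
```

The user gets four separate responses, each labeled with its source and the identifier that was queried. The same placeholder CVE shows up in three of them, and the sketch deliberately leaves it to the user to compare them.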
Someday, it may really be possible to harmonize the responses from the GVD, so that people
who want a single answer and people who value diversity might both be satisfied.
But we’re not there now.
If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. And while you’re at it, please donate as well!