I have written about a Global
Vulnerability Database before, by which I meant a database that would be
funded and run internationally. However, I’ve come to realize more recently
that “global”, in the context of vulnerability databases, means a lot more than
simply “international”. It means a database that relates multiple types of
vulnerability identifiers (there are a number of them, although CVE is by
far the dominant type) to multiple types of software identifiers.
As far as I know at the moment,
the only two software identifiers used in vulnerability databases are CPE,
which is used in the NVD and other databases derived from the NVD, and purl,
which is just about the only good option in the open source world. But there is
room for more. Specifically, a new identifier is needed for proprietary
software, since I (and others) regard CPE as a dead end, even though it was
pioneering in its time[i]. The OWASP SBOM Forum’s paper
from September 2022 describes the problems with CPE and the advantages of purl
in great detail. However, I will be the first to admit that our idea for including
proprietary software products in purl using SWID tags is still rudimentary and
requires a lot more thought.
In any case, I have expanded on my
GVD idea in the document below. I’d welcome any comments. I think the next step
is to start an open source project to design the “database”. Now that I’ve done
it once (with a lot of help from my friends), I think it would be easy to get
that project going in OWASP. And it’s easy for organizations to make restricted
donations to this project through OWASP (a 501(c)(3) non-profit corporation).
If you think your company might want to help out with this effort (obviously, designing
the database will not be an expensive effort at all), please let me know.
Toward a Global Vulnerability Database
Tom Alrich, November 2023
Currently, there is no easy way to identify vulnerabilities
of all types (CVE, OSV, etc.) that apply to a single software product or
component of a software product. Also, because of the naming problem, there is
no easy way to identify all products affected by a particular vulnerability. Achieving
either of these goals requires multiple database searches and manual
correlation of the results; even after doing that, there is no guarantee that
the user will be able to achieve either goal.
The solution to these problems is usually described
as some sort of “harmonization” of vulnerability and/or product identifiers. In
other words, “All we need is a single means of identifying products and a
single means of identifying vulnerabilities. Then we can simply correlate the
vulnerabilities with the products and create a database that’s searchable using
both fields. What could be simpler?”
Unfortunately, an effort to harmonize either the different
types of vulnerability identifiers or the different types of product
identifiers, let alone both, is very likely to fail. This is because, in many
if not most cases, vulnerability or product identifiers of different types
simply can’t be harmonized. For example, since there can be only one CPE name
for an open source project but there can be a separate purl for each repository
in which the project’s code is found, directly mapping a CPE to each unique
purl would make no sense (plus, there is no assurance that the code in each
repository is exactly the same as in the other repositories, even though the
repositories may all have the same project name).
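To make the one-to-many problem concrete, here is a minimal Python sketch. Every identifier in it is a made-up placeholder (there is no real "example_project"), and the helper names `build_mapping` and `is_one_to_one` are my own inventions for illustration. All it shows is that a single CPE name fans out to several purls, so no one-to-one mapping exists.

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only; none come from a real database.
CPE_NAME = "cpe:2.3:a:example_vendor:example_project:1.0.0:*:*:*:*:*:*:*"

# One purl for each repository or package manager in which the code appears.
PURLS = [
    "pkg:pypi/example-project@1.0.0",
    "pkg:npm/example-project@1.0.0",
    "pkg:github/example-vendor/example-project@1.0.0",
]


def build_mapping(cpe, purls):
    """Record every purl that appears to describe the same project as the CPE."""
    mapping = defaultdict(set)
    for purl in purls:
        mapping[cpe].add(purl)
    return dict(mapping)


def is_one_to_one(mapping):
    """True only if each CPE corresponds to exactly one purl."""
    return all(len(purls) == 1 for purls in mapping.values())


mapping = build_mapping(CPE_NAME, PURLS)
print(len(mapping[CPE_NAME]), "purls for one CPE name")
print("one-to-one:", is_one_to_one(mapping))
```

Even this toy mapping glosses over the second problem mentioned above: nothing guarantees that the code behind each of those purls is identical.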
There needs to be a globally accessible vulnerability
database that incorporates all major vulnerability sources (including CVE, OSV,
Python security advisories, etc.), as well as all major product identifiers
(that is, every product identifier referenced by a major vulnerability source, an
elite club that, as far as I know, currently includes only CPE and purl). The database
should not even try to provide harmonized vulnerability and product identifiers,
because this simply can’t be done now.
In fact, the data don’t even need to reside in a single
database. The various constituent databases (NVD, OSV, OSS Index, etc.) can
simply be referenced through a single smart query engine, which would be called the “Global
Vulnerability Database” (GVD). A query could refer to any supported
vulnerability or product identifier; for example, “What are all the
vulnerabilities that apply to purl pkg:pypi/django@1.11.1?” or “To which
products does CVE-2023-12345 apply?”. The query engine would decide which
queries to make to which vulnerability databases (in some cases, performance
considerations may dictate that at least parts of the databases - e.g., the CPE
dictionary from the NVD - be downloaded regularly to a central location).
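The routing idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a design: the `Backend` class, the `classify` and `query` functions, the source names "nvd-like" and "osv-like", and all of the stored records are hypothetical stand-ins. A real engine would call each database's actual API rather than look things up in memory.

```python
import re
from dataclasses import dataclass, field

# Pattern for CVE identifiers such as CVE-2023-12345.
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")


def classify(identifier):
    """Guess which kind of identifier the user supplied."""
    if identifier.startswith("pkg:"):
        return "purl"
    if identifier.startswith("cpe:"):
        return "cpe"
    if CVE_RE.match(identifier):
        return "cve"
    return "unknown"


@dataclass
class Backend:
    """Stand-in for one constituent database (hypothetical, in-memory)."""
    name: str
    supports: set  # identifier kinds this source can be queried with
    records: dict = field(default_factory=dict)  # identifier -> related identifiers

    def lookup(self, identifier):
        return self.records.get(identifier, [])


def query(identifier, backends):
    """Send the query only to sources that understand this identifier kind."""
    kind = classify(identifier)
    results = {}
    for backend in backends:
        if kind in backend.supports:
            hits = backend.lookup(identifier)
            if hits:
                results[backend.name] = hits
    return kind, results


# Placeholder data only; "EXAMPLE-VULN-1" is not a real vulnerability ID.
nvd_like = Backend(
    "nvd-like", {"cpe", "cve"},
    {"CVE-2023-12345": ["cpe:2.3:a:example:product:1.0:*:*:*:*:*:*:*"]},
)
osv_like = Backend(
    "osv-like", {"purl"},
    {"pkg:pypi/django@1.11.1": ["EXAMPLE-VULN-1"]},
)

print(query("pkg:pypi/django@1.11.1", [nvd_like, osv_like]))
print(query("CVE-2023-12345", [nvd_like, osv_like]))
```

The point of the sketch is that the user never needs to know which database understands which identifier; the engine decides, and a source that cannot answer a given kind of query is simply skipped.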
Of course, it would be more satisfying if every vulnerability
type could reference every product identifier and vice versa, but trying to do
that would require such a massive effort that it is effectively impossible.
What is possible is to undertake particular improvement projects like
adding purls to existing CVE reports; however, these may be expensive, and
there will probably always be significant issues with the GVD data. The
consolation is that the GVD will improve the current situation by providing a
central location from which to query multiple vulnerability databases, without removing
or degrading any currently available capability. Meanwhile, improvements like
the CVE JSON 5.1 spec can be introduced, which will bring the GVD much closer to
being a universal vulnerability database.
For example, currently no CVE report includes a purl. When
a user looks in the NVD for vulnerabilities applicable to a particular purl,
they won’t see any CVEs at all. They will see them once CVE reports
start including purls, which requires that the CVE JSON 5.1 spec be implemented
and that the NVD adopt it; that is not likely to happen for at least the next couple
of years. Perhaps the GVD will support the JSON 5.1 spec before the NVD does.
The best way to achieve the goal of a GVD is through a
global effort, funded by private industry, nonprofit organizations and
government. It is likely that, as long as one or two well-known organizations
lead the initial effort, there will be substantial interest worldwide. Therefore,
obtaining adequate funding may not pose a big problem.
The first step should be the high-level database design. When
that is finished, a group will develop the detailed design, as well as a
roadmap for implementing the GVD. Implementation can be done in stages, with
validation of each stage before moving on to the next one. While it would
certainly be advantageous to obtain funding for the entire project from the
beginning, that is probably unrealistic. Instead, the project team should
assume that each stage will need to be funded separately.
Below are likely goals to be achieved by this project:
1. Access to the database needs to be free,
although high-volume commercial use may be charged for in some way.
2. The database should be easily accessible
worldwide, except perhaps in remote areas. In general, no country should have
its access to the database restricted, although there might be reasons to do
so in some cases, like active support of terrorism.
3. The database needs to be able to scale easily,
meaning it can be built out in stages.
4. Because there are errors in the current
databases (e.g., off-spec CPE names), there should be an ongoing effort to
clean up errors. There should also be an effort to make strategic enhancements
to the database, such as adding purl identifiers to existing and new CVE
reports. However, these efforts need to be undertaken as funds and time permit.
It is possible that volunteers, such as college cybersecurity majors, can be
found to assist in these efforts.
The most important aspect of the GVD is that it needs to be
truly global. While individual governments will be welcome to contribute both
funds and human resources to the project, no government will exercise control
over the GVD; governance will be by an independent board. Ultimately, once the
GVD is operating smoothly and is being used heavily, the project might be
turned over to an international organization like ICANN, to which the NTIA (part of the US
Dept. of Commerce) began handing over management of DNS in the late 1990s (I believe the NTIA took over DNS
from the universities in California where it was developed, and that the NTIA was
effectively the first domain registrar).
To sum this up, there needs to be a single searchable database
of vulnerabilities worldwide. This will probably not be a single physical
database implemented in a single facility. Instead, it might in effect be an
AI-based “switching center”, through which searches would be coordinated among
different vulnerability databases, using diverse identifiers for software and
vulnerabilities. Twenty years ago, the technology required for this probably wasn’t
available. However, it is likely there are no significant technical obstacles
to constructing this database today. In fact, this approach, rather than a massive
uber-database combining all other databases, seems to be the one that makes sense
and is doable. We’ll leave the uber-database for another day, if ever.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com.
I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.
[i] CPE will be around for a long time, since there’s so much information about vulnerabilities and the software products they apply to (all identified with CPEs) in the CVE reports. Open source software should now always be identified using purls, but that still leaves the question of how we identify proprietary software products, if not with CPEs.
Steve Springett (leader of the OWASP CycloneDX and
Dependency Track projects) has suggested that purl could be applied very easily
to software in online “stores” like the Apple Store and Google Play, since purl
is based on the idea of a download location. Given the huge amount of proprietary
software available in those stores (and many other online software stores, of
course), creating new purl types to incorporate them into the purl world would
go a long way toward addressing the problem of proprietary software. Perhaps
the idea in the SBOM Forum paper about SWID tags (which was also Steve’s idea)
could be fleshed out, to accommodate proprietary software that isn’t found in
online stores.