Given the proliferation of serious software vulnerabilities like Log4Shell in the Log4j library, software vulnerability management is an important component of any organization’s security program. Successful vulnerability management starts with successful vulnerability identification. This requires that:
1. The supplier of the software reports vulnerabilities
they find in their products. These reports are incorporated into vulnerability
databases, especially the US National
Vulnerability Database (NVD). Almost all software vulnerabilities are
reported by the supplier of the software, not a third party.
2. Later, users of the software can search the NVD
for new vulnerabilities that apply to software products they use. Learning
about these vulnerabilities enables the user to coordinate with the suppliers
of those products, to learn when they will patch the vulnerabilities and
encourage them to speed up patches for the most important vulnerabilities.
However, one important assumption underlies these two
requirements: that the user will always be able to learn about vulnerabilities
that apply to a product they use when they search a vulnerability database like
the NVD. The user will only be able to do this if they know how the supplier
has identified the product in the database.
It might seem like the solution to this problem is obvious:
The supplier will report the vulnerability using the name of the product and the
user will search for that name. The problem is that software products are notorious
for having many names, due to being sold under different brands or in different
sales venues, acquisition by a different supplier, etc. Even among the employees
of a large software supplier, their own products may be known by different
names. Trying to create – and especially maintain – a database that lists all
the names for a particular software product would be hugely expensive and would
ultimately fail, due to the rapidly increasing volume of new software products.
Given there will never be a definitive database of all the
names by which a single software product is known, how can a user be sure their
search will find the correct product in a vulnerability database? There needs
to be a single machine-readable identifier for the product, which the supplier includes
in the vulnerability report and the user searches for in the vulnerability database.
We have already ruled out the idea of a centralized database that lists all the
possible names for a single software product. How can we accomplish this goal
without a central database?
The solution is for the identifier to be based on something
that the supplier will always know before they report a vulnerability for their
product, and that the user will also know (or can easily learn) before they
search for that product in a vulnerability database. A good analogy is
the formula for a chemical compound.
If a chemist has identified a compound whose molecules
consist of two hydrogen atoms and one oxygen atom, the chemist will write it as
“H2O” (of course, the “2” is normally written as a subscript). Every other
chemist will recognize that as water. Similarly, a compound of one sodium and
one chlorine atom is NaCl, which is table salt. Note that all chemists can create
and interpret these identifiers, without having to look them up in a central
database. A chemist who reads “NaCl” always knows which compound that refers
to.
There is a software identifier that works in the same way.
It’s called “purl”, which stands for “package URL”. It is in widespread use as
an identifier in vulnerability databases for open source software that is
distributed through package managers (these are the primary channels for
downloading open source software, although not all open source software is
available in a package manager).
To create a purl for an open source product, the supplier or
user only needs to know the product name, the version number (usually called a “version
string”) and the package manager name (such as PyPI). Because every product
name/version string combination will always be unique within one package manager
(although the same product/version might be available in a different package manager),
the purl that includes those three pieces of information is guaranteed to be
unique; it is also guaranteed always to point to the same product, since the
combination of product name and version string will never change for that
product/version.
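To show how little is needed, here is a minimal Python sketch that assembles a purl from those three pieces of information. The function name `build_purl` is made up for this post. The sketch assumes the name and version contain no characters that need encoding, and it skips the per-package-manager normalization rules that the full purl specification defines.

```python
def build_purl(package_manager: str, name: str, version: str) -> str:
    """Assemble a simple purl from its three basic parts (hypothetical helper)."""
    if not (package_manager and name and version):
        raise ValueError("package manager, name and version are all required")
    # The purl "type" (here, the package manager) is written in lowercase.
    purl_type = package_manager.lower()
    # Every purl starts with the "pkg" scheme, followed by type/name@version.
    return f"pkg:{purl_type}/{name}@{version}"


print(build_purl("PyPI", "django", "1.11.1"))  # pkg:pypi/django@1.11.1
```

Both the supplier and the user can run the same steps independently and arrive at the same string, which is the whole point: no lookup in a central database is required.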
For example, the purl for version
1.11.1 of the package named “django” in the PyPI package manager is “pkg:pypi/django@1.11.1”.
If a user wants to learn about vulnerabilities for version 1.11.1 of django in
the PyPI package manager, they will always be able to find them using that
purl. If they upgrade their instance of django to version 1.11.2, they will
search for “pkg:pypi/django@1.11.2” (every purl begins with “pkg”). Since
the supplier will always use the same purl to report vulnerabilities, the user
can be sure their search will find all reported vulnerabilities for that
product/version.
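The search side can be sketched just as simply. The snippet below is only an illustration: the in-memory index, the function name `find_vulnerabilities`, the “EXAMPLE-” advisory IDs and the package “example-package” are all made up, and a real vulnerability database would be queried through its own interface. What it shows is that the lookup is an exact match on the purl string, so no guessing about product names is involved.

```python
# Made-up records for illustration only; these are not real advisories.
REPORTED = {
    "pkg:pypi/django@1.11.1": ["EXAMPLE-0001", "EXAMPLE-0002"],
    "pkg:pypi/example-package@2.0.0": ["EXAMPLE-0003"],
}


def find_vulnerabilities(index: dict, purl: str) -> list:
    """Return the advisory IDs recorded for exactly this purl."""
    return list(index.get(purl, []))


# The user builds the same purl the supplier used and gets every record.
print(find_vulnerabilities(REPORTED, "pkg:pypi/django@1.11.1"))

# A purl with nothing reported against it simply returns an empty list.
print(find_vulnerabilities(REPORTED, "pkg:pypi/example-package@9.9.9"))
```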
Besides purl, the only software
identifier in widespread use in vulnerability databases is CPE, which stands for
“Common Platform Enumeration”. CPE is the identifier used in the National
Vulnerability Database. It was developed more than 20 years ago by the National
Institute of Standards and Technology (NIST), which operates the NVD.
A CPE is created by a NIST employee or contractor and added
to a vulnerability (CVE) record in the NVD. Unfortunately, there is no way that
anyone can predict with certainty the CPE that this person will create. Some of
the reasons why this is the case are described on pages 4-6 of the OWASP SBOM
Forum’s 2022 white paper titled “A proposal to operationalize component
identification for vulnerability management”.
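One way to picture the problem: a CPE name contains, among other fields, a vendor field and a product field, and the user has to guess exactly how each one was written. The Python sketch below uses the CPE 2.3 formatted-string layout with a made-up vendor and product. None of these names come from the NVD, and `format_cpe` is a hypothetical helper that wildcards every field after the version.

```python
def format_cpe(vendor: str, product: str, version: str) -> str:
    """Format a CPE 2.3 name for an application ("a"), wildcarding the rest."""
    return f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"


# Three equally plausible ways of writing the same (made-up) vendor name.
vendor_guesses = ["example_corp", "example_corporation", "examplecorp"]
candidates = [format_cpe(v, "example_product", "1.0") for v in vendor_guesses]

for cpe in candidates:
    print(cpe)

# Each guess is a different string, so only one of them (at most) can match
# the CPE name that was actually attached to the CVE record.
print(len(set(candidates)))  # 3
```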
Currently (as of the fall of 2024), there is an even more
serious problem with CPE: since February, the NVD staff has drastically
reduced the number of CPEs it creates. The result is that over two thirds of new
CVE records entered in 2024 do not have a CPE name attached to them. This makes
those CVEs invisible to automated searches using a CPE name. A user who searches
with a CPE name today may never learn about more than two thirds of the
vulnerabilities that apply to their product/version.
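The effect on an automated search can be sketched with a few made-up records. In this Python illustration, all three records are assumed to describe vulnerabilities in the same (hypothetical) product, but only one has had a CPE name attached. The record IDs, field names and the function `search_by_cpe` are invented for the example and do not reflect how the NVD stores its data.

```python
TARGET_CPE = "cpe:2.3:a:example_corp:example_product:1.0:*:*:*:*:*:*:*"

# Made-up records; assume every one of them applies to the target product.
records = [
    {"id": "EXAMPLE-0001", "cpes": [TARGET_CPE]},
    {"id": "EXAMPLE-0002", "cpes": []},  # no CPE name was ever attached
    {"id": "EXAMPLE-0003", "cpes": []},  # no CPE name was ever attached
]


def search_by_cpe(all_records: list, cpe: str) -> list:
    """Return the IDs of records that list exactly this CPE name."""
    return [r["id"] for r in all_records if cpe in r["cpes"]]


found = search_by_cpe(records, TARGET_CPE)
missed = [r["id"] for r in records if r["id"] not in found]

print(found)   # ['EXAMPLE-0001']
print(missed)  # ['EXAMPLE-0002', 'EXAMPLE-0003']
```

The search runs without error and returns a result, so nothing tells the user that two of the three applicable records were never considered.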
The upshot of this situation is that, if truly automated software
vulnerability management is going to be possible again, purl needs to be the
default software identifier, both in CVE records and the National Vulnerability
Database. While most of the groundwork for achieving this result has already
been laid, there remains one big obstacle: Currently, there is no workable way
for purl to identify proprietary software. Since the majority of private and
public sector organizations in the world rely primarily on proprietary software
to run their businesses, this obstacle needs to be removed, so that users of
proprietary software products can easily learn about vulnerabilities present in
those products.
The OWASP SBOM Forum has identified two methods by which the
purl specification can be expanded to make vulnerabilities in proprietary
software products as easily discoverable as are vulnerabilities in open source
products today. We will soon be starting a working group to address this problem.
If you would like to participate in that group and/or provide financial support
through a donation to OWASP, please email me.
Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.