CISA recently published a white paper on “Software Identification Ecosystem Option Analysis”. This paper is almost a textbook example of the above three principles, and especially of the corollary. You may know that CISA has been promising a white paper to address the software naming problem for at least a year. But the thing about the naming problem is that it’s a really…hard…problem. It’s not something that can be solved with a paper or two.[i]
I think CISA’s
paper could have been much more successful if the writers had taken time up
front to ask themselves, “Why does the naming problem need to be solved?” Obviously,
the fact that there isn’t a consistent, universal naming scheme for software
products by itself shouldn’t keep anybody awake at night. What is the real
problem this causes?
While software naming issues show
up in many areas – e.g., cataloguing software products of different types – there
is one area where the naming problem is causing significant and ongoing harm:
that is in software vulnerability management. Specifically, the naming problem
makes it difficult – and often impossible – for a user organization to learn
about vulnerabilities that are present in a software product it uses (whether
in the product itself or in one of its components).
This is best illustrated in the
case of CPE names in the NVD, discussed on pages 4-6 of the SBOM Forum’s (now the
OWASP SBOM Forum’s) white
paper on solving the naming problems in the NVD. If a software product can’t
be accurately identified in a vulnerability database, the user will never be
able to learn about vulnerabilities they need to remediate (most likely by regularly
contacting the supplier’s help desk until they release a patch for the vulnerability).
Thus, if I had been asked, I would
have suggested that the CISA paper ask and answer the question, “How can we make it more likely that users trying to learn
about vulnerabilities in the software they use will be successful?” The
answer to this question would certainly involve questions regarding the different
identifiers available and how they can be properly utilized in vulnerability
management, but also other problems like the structure and governance of
vulnerability databases.
Unfortunately, this wasn’t the
question that the CISA team asked – and answered – in their paper. What was the
question they actually answered? While it was never stated directly, I would
summarize it as the following:
“Any solution to the naming problem
requires a single global uber-identifier, into which all other software
identifiers can be mapped. What is that identifier?”
On the last page (page 22), they
give their answer: There are three options that “can serve as starting points
to refine the merits of various operational models.” They are:
1. OmniBOR, which used to be known as GitBOM. Ed Warnicke, co-founder
of GitBOM, provided a really interesting presentation to one of the NTIA
working groups in (I believe) 2021, and I got quite excited after seeing it.
The idea behind GitBOM was really intriguing, although it was clearly focused
almost entirely on open source software. I’m sure there was some way that proprietary
software could be handled by GitBOM, but it’s hard to call an identifier
“universal” if it treats the software that runs probably 99% of organizations
worldwide as kind of a second-class citizen. And if OmniBOR/GitBOM is
restricted to just open source software, it immediately runs into the problem
that one identifier, purl, has already conquered the open source world.
2. CPE, the identifier on which the National Vulnerability
Database (NVD) is based – as well as a small number of other databases that are
based on the NVD but purport to make up for some of the NVD’s problems. To be
fair, the CISA team doesn’t give CPE a whole-hearted endorsement. This is a
good thing, since, far from being a solution to the naming problem, CPE is
probably the biggest contributor to it.
3. purl, which is now undoubtedly the most widely used software
identifier worldwide and is very unlikely to be dislodged from that post. This
is evidenced by the fact that I don’t know of any vulnerability database, other
than the NVD and its derivatives, that is not based on purl. On the
other hand, the vulnerability databases that use purl are all 100% focused on
open source software. Since probably at least 90% of software products
worldwide are open source (including at least 90% of components in proprietary
software), this shows that purl is already close to being a universal
identifier. But there’s no denying that it doesn’t now address proprietary
software[ii] and that it doesn’t even
fit all open source software perfectly.
However, CISA’s paper doesn’t even
ask the real questions: a) whether it would ever be possible to have a
truly universal software identifier (which I doubt, at least in most of our
lifetimes), and b) whether it’s even necessary to have a universal identifier to
address the naming problem.
Of course, b) is the really
interesting question. I used to think it would be impossible to
have multiple software identifiers in a single database. Thus, the NVD and its
imitators are based on CPE, while the databases that focus on open source are
based on purl. Yea verily, never the twain shall meet – or at least that’s what
I used to think.
However, I now realize that a
single vulnerability database can easily utilize multiple software identifiers.
For example, the OWASP SBOM Forum’s 2022 paper on the naming problem advocated
incorporating purl identifiers into the NVD, but it also acknowledged that CPE identifiers
will need to remain in the database for years, since there is such a wealth of
information embedded with the CPEs now (more specifically, embedded in the CVE
reports that call out those CPEs). While it’s nice to fantasize about transferring
information now in CPEs to whatever will replace CPEs later on, the resources necessary
to do this on the large scale that would be required are simply not available. For
the foreseeable future, both CPE and purl will remain in active use, often in
the same database, each including whatever data is now included with them.
There’s another type of identifier that
also comes in different flavors: vulnerability identifiers (e.g., CVE,
Google OSV, and GitHub Security Advisories or GHSA). As with software product
identifiers, the different vulnerability identifiers will need to continue to be
available, often in the same database.
Why do I say that both software
and vulnerability identifiers need to continue to be used as they are today? After
all, the CISA paper repeatedly discusses the need to “harmonize” the different software
identifiers, meaning (of course) that they should be consolidated into one of
the three identifier options listed at the end of the paper.
I used to agree with this idea,
since it seemed out of the question that it would be advantageous to combine multiple
identifiers for the “same” thing (e.g., software products or vulnerabilities) in
one database. Why not choose one uber-identifier and map each name in
the other identifiers to that one?
This would make sense if the items
identified by the different identifiers were truly interchangeable. For example,
it would make no sense to have different identifiers for different types of animals;
they can all have a name that fits into a single taxonomy, which was initially
developed by Linnaeus.
However, there are reasons why the
different software identifiers can’t be easily consolidated into one. For example,
take the case of CPE and purl. They’re both software identifiers, but what do
they identify? CPE is a centrally administered identifier. CPE names are created by members
of the NIST NVD team, when a CVE report is submitted that refers to a software
product for which the organization submitting the report (usually a proprietary
software supplier that is also a CVE Numbering Authority or CNA) does not know
of an existing CPE name. CPEs were designed with proprietary software suppliers
in mind, since most CVE reports are submitted by such a supplier.
On the other hand, purl isn’t
centrally administered at all, and it would make no sense to change it to be
centrally administered (as the CISA paper suggests should happen). The whole
point of purl is that the person who wants to learn about vulnerabilities in an
open source software (OSS) product that they utilize (or an OSS component of a product
they utilize) just needs to know three things about the product: the package
manager (or similar ecosystem) from which they downloaded the product, the name
of the product in that package manager, and the version that they downloaded
(other information may be included, but is optional).
If they have these three pieces of
information, the user can create a purl that should always match the purl for
that same product (from the same package manager) in a vulnerability database. The
fact that no centralized name database is required makes purl the ideal
identifier in the open source world, which changes very rapidly and doesn’t
rely on paid maintainers. Obviously, if a centralized database were required, someone
would have to come up with a huge chunk of change to finance that effort.
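To make the three-piece idea concrete, here is a minimal Python sketch that assembles a purl from the package type, the name and the version. The function name make_purl is my own invention for illustration. The sketch also skips the type-specific normalization rules, qualifiers and subpaths that the purl specification defines, so please treat it as an illustration and not as a conforming implementation.

```python
from typing import Optional
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str,
              namespace: Optional[str] = None) -> str:
    """Assemble a basic purl: pkg:type/namespace/name@version.

    Assumption: type-specific rules (e.g. per-ecosystem name
    normalization), qualifiers and subpaths are deliberately ignored.
    """
    parts = [pkg_type.lower()]  # the purl type is written in lowercase
    if namespace:
        # Percent-encode each namespace segment separately.
        parts.extend(quote(seg, safe="") for seg in namespace.split("/"))
    parts.append(quote(name, safe=""))
    return "pkg:" + "/".join(parts) + "@" + quote(version, safe="")


# Two users who downloaded the same package from the same ecosystem
# arrive at the same string without consulting any central registry.
print(make_purl("pypi", "django", "4.2.1"))
print(make_purl("maven", "log4j-core", "2.17.1",
                namespace="org.apache.logging.log4j"))
```

The point of the sketch is that nothing in it requires a lookup: everything needed to build the identifier is already known to whoever downloaded the package.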
Since purl requires knowledge of
the package manager from which the software was downloaded, and since one open
source project can be available in multiple package managers with slightly
different code, this means that the single project can have multiple purls. And
if the project consists of multiple modules (e.g., a library), each of those
modules can have its own purl as well. Yet there can be only one CPE for the project
(product). This means there’s no good way to map a single CPE to a single purl,
unless some arbitrary decision is made about which purl maps to the CPE[iii].
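The mapping problem can be sketched in a few lines of Python. Everything in the table below is hypothetical: “examplelib”, “singletool” and “example_vendor” are invented names, and the function names are mine. The only thing the sketch is meant to show is that a CPE-to-purl lookup is one-to-many in the general case, so a function that promises a single purl has to give up unless exactly one candidate exists.

```python
from typing import Dict, List, Optional

# Hypothetical data, invented purely for illustration.
CPE_TO_PURLS: Dict[str, List[str]] = {
    # One project distributed through several package managers.
    "cpe:2.3:a:example_vendor:examplelib:1.4.0:*:*:*:*:*:*:*": [
        "pkg:pypi/examplelib@1.4.0",
        "pkg:npm/examplelib@1.4.0",
        "pkg:maven/com.example/examplelib-core@1.4.0",
    ],
    # The exceptional case: only one known distribution channel.
    "cpe:2.3:a:example_vendor:singletool:2.0.0:*:*:*:*:*:*:*": [
        "pkg:pypi/singletool@2.0.0",
    ],
}


def purls_for_cpe(cpe: str) -> List[str]:
    """Return every purl known to correspond to the given CPE name."""
    return list(CPE_TO_PURLS.get(cpe, []))


def unique_purl_for_cpe(cpe: str) -> Optional[str]:
    """Return a purl only when the mapping is unambiguous, else None."""
    candidates = purls_for_cpe(cpe)
    return candidates[0] if len(candidates) == 1 else None
```

Any rule that picks one of the three candidates in the first entry would be exactly the kind of arbitrary decision described above.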
Let’s go back to the question I
would have asked, “How can we make it more likely that users trying to learn
about vulnerabilities in the software they use will be successful?” The answer
to this question now seems simple to me: We need to develop a vulnerability
database that can accept queries made with any major software identifier (e.g.,
CPE or purl) or any major vulnerability identifier (e.g., CVE or OSV), and
return whatever results the user would receive today if they were to make a
query to a database that was designed around that identifier (for example, a user
that queries the database for CVEs that correspond to a particular CPE name
would receive the same response they would have received if they had queried
the NVD using that same CPE name).
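A first step for such a database would be recognizing which kind of identifier a query contains. Here is a rough Python sketch of that step. The function name classify_identifier is hypothetical, the GHSA pattern is only an approximation of the real format, and other vulnerability identifier schemes (including the various ID styles used in OSV) are left out to keep it short.

```python
import re

# CVE IDs have the form CVE-<year>-<four or more digits>.
CVE_RE = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)
# Approximate shape of a GHSA ID: GHSA-xxxx-xxxx-xxxx (assumption:
# the real character set is narrower than the one accepted here).
GHSA_RE = re.compile(r"^GHSA(-[0-9a-z]{4}){3}$", re.IGNORECASE)


def classify_identifier(query: str) -> str:
    """Guess the identifier type of a query string.

    Returns one of "cpe", "purl", "cve", "ghsa" or "unknown".
    """
    q = query.strip()
    if q.startswith("cpe:2.3:"):   # CPE 2.3 formatted string
        return "cpe"
    if q.startswith("pkg:"):       # every purl starts with this scheme
        return "purl"
    if CVE_RE.match(q):
        return "cve"
    if GHSA_RE.match(q):
        return "ghsa"
    return "unknown"
```

Once the type is known, the database can answer the query the same way a database built around that identifier would answer it today.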
In fact, the new central database
might not, strictly speaking, be a database at all but more of a “switchboard”
that would relay each query to an appropriate “client” database (or even multiple client databases). It would then
return to the user whatever response it received from the other databases (with an
AI-based front-end module that would determine how best to reformulate and re-route
each query). While this approach would probably not initially yield any more information
than the user would have received had they queried the client database
individually, it would at least centralize (and perhaps standardize)
vulnerability queries. As time went on and additional funding became available,
more efforts to harmonize and clean up the data (including the CVE reports in CVE.org)
could be made.
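The switchboard idea can also be sketched briefly. In the Python below, the Switchboard class, its method names and the two stand-in client functions are all hypothetical, and plain prefix matching takes the place of the AI-based front end, which I am not attempting to illustrate. The stand-in clients return placeholder records, not real vulnerability data.

```python
from typing import Callable, Dict, List

# A client database is modeled as a function: query string in, records out.
Client = Callable[[str], List[dict]]


class Switchboard:
    """Relay each query to every client registered for its prefix."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Client]] = {}

    def register(self, prefix: str, client: Client) -> None:
        # Several clients may serve the same identifier type.
        self._routes.setdefault(prefix.lower(), []).append(client)

    def query(self, q: str) -> List[dict]:
        results: List[dict] = []
        for prefix, clients in self._routes.items():
            if q.lower().startswith(prefix):
                for client in clients:
                    # Pass the response through unchanged, as described above.
                    results.extend(client(q))
        return results


# Stand-ins for real client databases (placeholder output only).
def cpe_client(q: str) -> List[dict]:
    return [{"source": "CPE-based client", "query": q}]


def purl_client(q: str) -> List[dict]:
    return [{"source": "purl-based client", "query": q}]


switchboard = Switchboard()
switchboard.register("cpe:", cpe_client)
switchboard.register("pkg:", purl_client)
```

Because the switchboard only relays, each client keeps its own data and its own identifier, which is why no harmonization has to happen before the first query can be answered.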
In past months, I’ve advocated the
idea of a Global
Vulnerability Database, meaning one that’s sourced and supported globally.
However, I’m now expanding my understanding of “global” to include the ability
to accept queries for multiple software and vulnerability identifiers. I’m
also giving up my idea that the GVD could be built on top of an existing
database like the NVD; it will have to be built from scratch, but it can well incorporate
data and features from the existing vulnerability databases – and, of course,
the existing databases would continue to do what they do now, since they would
now, at least for many queries, become clients of the GVD.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com.
I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.
[i] The OWASP SBOM Forum – at the time just the SBOM Forum – produced a paper in September 2022 that unabashedly aimed directly at the naming problem. We obviously weren’t following the lesson I’d learned in college, because we called the document “A Proposal to operationalize component identification for vulnerability management”. This paper was a direct assault on the naming problem, or at least the most prominent manifestation of this problem: CPE (Common Platform Enumeration) names found in the National Vulnerability Database (NVD). Not surprisingly, the paper didn’t lead to the CPE problem being solved, but it has proven to be very useful in discussions with various groups like the NVD team at NIST and the team at ENISA that is building a vulnerability database from scratch – in compliance with Article 12 of the EU NIS 2 cybersecurity directive, which was adopted in 2022.
[ii] The SBOM Forum’s paper includes a short description, on pages 12 and 13, of our idea for how to identify proprietary software using purl; there are certainly many other ways to do that. But it’s also true that purl identifiers for proprietary (or “closed source”) software will never be as robust as those for open source.
[iii] If the person that is mapping CPEs to purls
knows from which package manager the software on which a CPE is based was
downloaded, they could in theory map the CPE to the purl. But having that
knowledge will always be the exception, never the rule.