I probably don’t need to tell you that vulnerability
management is important for any organization, public or private, that uses
software. If you’re not convinced of this, all you need to do is look at
devastating ransomware attacks like WannaCry, NotPetya and Ryuk. All of these
exploited known vulnerabilities for which patches were available.
I also probably don’t need to tell you it is impossible to
manage vulnerabilities that affect software you use, if you can’t learn about
them using frequent, fully automated searches – in which you enter an
identifier for a software product and version and immediately discover all
recently identified vulnerabilities that affect that product and version.
Yet, that is the situation today: The most widely used
vulnerability database in the world is the US National Vulnerability Database (NVD). However,
because of the NVD’s currently huge
backlog of “unenriched” CVE (vulnerability) records dating from February of
this year, any search for vulnerabilities that apply to a particular software
product and version will yield on average fewer than one third of the
vulnerabilities that have been identified this year for that product and
version. Even worse, the NVD provides no warning about this situation.
This is analogous to a doctor who stopped studying new
diseases eight months ago and can only diagnose diseases that were identified
before then, yet never warns his patients that they could have
contracted a disease he hasn’t learned about. In both cases, the end
user or patient is harmed by not knowing about a vulnerability or disease
that they face.
Ignorance is not bliss.
However, the NVD’s biggest problem isn’t their current
backlog, but the fact that the CPE (“common platform enumeration”) software
identifier that is required for all vulnerability lookups in the NVD has many
problems - and there is no good solution for them. These problems cause
many searches to fail, without any explanation for the failure. Even worse, the
user will usually not even be informed that the search has failed.
In 2022, the OWASP SBOM Forum (which I co-lead) published a
white paper on the CPE
problem in the NVD. The central argument of that paper was that the purl
(product URL) software identifier is far superior to CPE, and that CVE.org (the agency of the Department of
Homeland Security that oversees the CVE
Program) and the NVD should move as quickly as possible toward supporting
both purl and CPE. After writing that paper, we submitted a “pull request” to
CVE.org to add purl support to CVE records. That request came into effect when
the CVE 5.1
specification was approved earlier this year.
However, the 5.1 specification alone didn’t solve the
problem. The CVE Numbering Authorities (CNAs) create CVE records; that is, they
report new vulnerabilities in software products, usually products developed by
their own organization. For example, Microsoft, Oracle, Red Hat, Schneider
Electric and HPE are all CNAs. The CNAs need to start adding purls to those
records, yet few if any have done so thus far. One reason is that, even if the
CNAs started adding purls, they would be “all dressed up with nowhere to go”,
since neither the NVD nor the CVE.org database currently allows a search using
purl.
But there’s an even bigger problem: While purl has literally
conquered the world of open source software, it can only be used to identify a tiny
percentage of proprietary software products with vulnerabilities today. This
means a user of a proprietary software product cannot look that product up in
the NVD using purl; instead, they must use CPE. Purl can never be on an equal
footing with CPE until it can be used to identify proprietary software products,
not just open-source products.
The OWASP SBOM Forum has decided
this is an unacceptable situation, especially since purl eliminates most of the
problems that affect CPE. We are asking, “What will it take to give purl the
capability to identify proprietary (closed-source) software, as well as
open-source?”
Fortunately, two very smart
individuals are members of the Forum. One is Steve Springett, creator and
leader of two of OWASP’s major projects: Dependency-Track and CycloneDX.
Dependency-Track (D-T) performs over 20 million automated vulnerability lookups
every day, although few of these use the NVD; D-T mainly uses Sonatype’s OSS
Index, an open source vulnerability database that is based on purl. The
other is Tony Turner, the cybersecurity expert and SANS instructor who co-leads
the SBOM Forum with me, along with Jeff Williams of Contrast Security.
Both Steve and Tony are quite
familiar with purl, since they are both part of the project team. In fact, in
the “early days” of purl (which were less than ten years ago, believe it or not),
Steve worked closely on the design with Philippe Ombredanne, the creator of
purl (who is also a member of the SBOM Forum). When the SBOM Forum developed
our paper in 2022, Steve described two ideas for how to expand purl to identify
proprietary software.
Before I explain Steve’s ideas
(one of which Tony came up with separately), I need to point out the most
important feature of purl: It isn’t based on a centralized “namespace” like CPE
is. CPE names are created by contractors who work for the NVD (which is part of
NIST). Unless one of those contractors creates the CPE name, it isn’t valid[i].
If a CNA or software user wants to
learn the CPE name for a software product, they must use a variety of methods
to find it – fuzzy logic, generative AI, prayer, etc. There is a centralized
“CPE database”, but it is simply a list of all the CPEs that have ever been
created, without any contextual information. As Bruce Lowenthal of Oracle has
pointed out, this would be like listing all the words in the Bible in
alphabetical order and calling that an English dictionary.
By contrast, purl creates a decentralized
namespace. Purl consists of a series of one-word types, which currently mostly refer to package managers for
open-source software (e.g., the “maven” type refers to the Maven Central
package manager). All you need to know about package managers now is that
they’re a single web location from which you can download software, if you know
the name of the product and its version string. Since a single product/version
pair can never be replicated within the package manager, each pair is unique. Therefore,
each package manager has a controlled namespace.
What’s more important is that the combination
of three pieces of information – package manager (type), product name within
the package manager, and version string - is guaranteed to be unique within the
entire purl namespace (i.e. across all purl types). What’s even more
important is that the user of the product doesn’t have to query a central database
to find out the purl for their product. The user can create the purl on their
own, using information they already have.
To create the unique purl, the
user just needs to know the type (package manager), and the name and version
string in that package manager. For example, the purl for version 1.11.1
of the Python package named “django” in the PyPI package manager is “pkg:pypi/django@1.11.1”.[ii]
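The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a full purl implementation: the function name make_purl is my own, and the sketch ignores namespaces, qualifiers, subpaths and the type-specific normalization rules that the purl specification defines.

```python
from urllib.parse import quote


def make_purl(purl_type: str, name: str, version: str) -> str:
    """Build a minimal purl of the form pkg:<type>/<name>@<version>."""
    # Percent-encode name and version so that characters such as '/' or '@'
    # cannot be mistaken for purl separators.
    encoded_name = quote(name, safe="")
    encoded_version = quote(version, safe="")
    # The type is written in lowercase.
    return f"pkg:{purl_type.lower()}/{encoded_name}@{encoded_version}"


# The user needs only three facts they already have: type, name and version.
print(make_purl("pypi", "django", "1.11.1"))  # pkg:pypi/django@1.11.1
```

Note that nothing in the function consults a central database; the same three inputs always produce the same string.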
Of course, even though the user can
always re-create the correct purl for the product, that will only help them
identify a vulnerability if the supplier reports vulnerabilities in that
product/version to CVE.org[iii] using the same purl;
that way, the purl the user enters in the vulnerability database will match the
purl on the CVE record. This is how CPE is supposed to work, but since it’s
impossible to know for certain what the NVD contractor actually created, there
can never be any certainty regarding CPE.
For example, if the contractor
used “Microsoft” as the vendor name, the resulting CPE will differ from the one
that uses “Microsoft, Inc.” A user who is trying to learn about vulnerabilities
in a Microsoft product, and who creates a CPE according to the CPE
specification, has to guess which of these vendor names the contractor used,
since each produces a different CPE.
What is worse is that if they
guess wrong and search on the wrong CPE, they will simply be informed that “There
are 0 matching records”. This is the same message they would receive if they
had guessed correctly, but there are no vulnerabilities listed in the NVD that
apply to that product/version (which might be interpreted to mean the product/version
has a “perfect record”). There is no way for the user to learn which is the
case.
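The ambiguity described above can be shown with a toy lookup. This is only a sketch under stated assumptions: the CVE ID and the product “foo” are invented, the in-memory dictionary stands in for a real database, and guess_cpe is a simplified stand-in for CPE 2.3 name formation, not a complete implementation of the CPE specification.

```python
# Invented data: one fictitious CVE record listing one CPE name.
CVE_TO_CPES = {
    "CVE-0000-0001": {"cpe:2.3:a:microsoft:foo:4.11.6:*:*:*:*:*:*:*"},
}


def guess_cpe(vendor: str, product: str, version: str) -> str:
    """Form a CPE-like name the way a user might guess it (simplified)."""
    def norm(value: str) -> str:
        # Lowercase, replace spaces with underscores, escape commas.
        return value.strip().lower().replace(" ", "_").replace(",", "\\,")
    return f"cpe:2.3:a:{norm(vendor)}:{norm(product)}:{norm(version)}:*:*:*:*:*:*:*"


def search(cpe: str) -> list:
    """Return the CVE IDs whose record lists exactly this CPE name."""
    return sorted(cve for cve, cpes in CVE_TO_CPES.items() if cpe in cpes)


right = guess_cpe("Microsoft", "Foo", "4.11.6")        # matches the record
wrong = guess_cpe("Microsoft, Inc.", "Foo", "4.11.6")  # wrong vendor guess
clean = guess_cpe("Microsoft", "Foo", "4.11.7")        # version with no CVEs

print(search(right))  # one match
print(search(wrong))  # empty list
print(search(clean))  # empty list, indistinguishable from the wrong guess
```

The last two searches return the same empty result, so the user cannot tell a failed guess from a product version with no reported vulnerabilities.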
With purl, as long as the user
knows the package manager they downloaded the product from and the product’s name
and version string in that package manager, they should always (barring a
mistake) be able to create the same purl that the supplier used when they
reported the vulnerability. This is why purl has literally conquered the open
source software world. In that world, it would be difficult even to say there
is a number two software identifier after purl.
Of course, the key to purl’s
success is the existence of package managers in the open source world; it would
be much more difficult to create a distributed namespace without them. That
raised the question in a few creative people’s minds: Is there an analogue to package
managers in the proprietary software world? At different times, both Steve and
Tony realized that the answer to this question is yes: it’s app stores.
Like package managers, app stores
(these include the Apple Store - which is in fact five stores - as well as
Google Play and the Microsoft Store, although there are many smaller stores as
well) do the following:
1. Provide a single location from which to download software;
2. Control the product namespace within the store, so that each product has a
unique name; and
3. Ensure that each version string is unique for the product to which it
applies. For example, the product named Foo won’t have two versions that have
the same version string, say “4.11.6”.
In other words, app stores can probably
be treated in purl like package managers are treated today. Each app store will
have its own purl type, just like package managers do now. Perhaps the most
impressive aspect of adding app stores to the purl ecosystem is that, as soon
as a purl type is created for a new store, all the products in that store (for example,
Google Play currently contains about 3.5 million products) will instantly have
a purl. No NVD employee or contractor (or anyone else) needs to do anything to
enable this to happen.
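To make the three properties above concrete, here is a toy model of a store that controls its own namespace. Everything in it is hypothetical: the class AppStore, the purl type “examplestore” and the product “foo” are made up for illustration, and no app store purl type is taken from the purl specification.

```python
class AppStore:
    """Toy model of a store that controls its own product namespace."""

    def __init__(self, purl_type: str):
        self.purl_type = purl_type
        self._versions = {}  # product name -> set of version strings

    def publish(self, name: str, version: str) -> str:
        """Register a product version and return its purl."""
        versions = self._versions.setdefault(name, set())
        # The store refuses a second release with the same version string,
        # so each (name, version) pair stays unique within the store.
        if version in versions:
            raise ValueError(f"{name} {version} already exists in this store")
        versions.add(version)
        return f"pkg:{self.purl_type}/{name}@{version}"


store = AppStore("examplestore")  # hypothetical purl type
print(store.publish("foo", "4.11.6"))  # pkg:examplestore/foo@4.11.6

try:
    store.publish("foo", "4.11.6")  # same product and version again
except ValueError as err:
    print("rejected:", err)
```

Because the store itself guarantees uniqueness, the purl for any product in it follows directly from the store’s type, the product name and the version string.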
What about proprietary products that aren’t in app stores?
The great majority of proprietary software
products are not available in app stores, but from the website of either the developer
or a distributor. How can purl be expanded to include them?
In the SBOM Forum’s 2022 paper, we
provided a two-paragraph high-level description of the purl solution we were
suggesting for proprietary software, based on an idea of Steve Springett’s:
1. When a developer releases a new software product or a new version of an
existing software product, they will create a short document (called a tag)
that provides important information on the product, especially the name,
supplier and version string.
2. When a user downloads that product from the developer’s website (presumably
after paying for it), the user will also receive the tag; they can use the
information in the tag to create the purl for the product (perhaps like the
purl described above)[iv].
Since
the supplier created the tag in the first place, when they report a
vulnerability for the product to CVE.org, they should use a purl that includes
the information from the tag. Thus, the purl created by the user will match the
one created by the supplier, since they are both based on the same tag. When
the user searches a vulnerability database using that purl, they are sure to
learn about any vulnerabilities the supplier has reported for the product.
Rather than create our own format for the product information tag, Steve suggested
that we use the existing SWID (“software identification”) format. SWID is a specification
(codified in the ISO/IEC 19770-2 standard, first published in 2009) that was developed by NIST. It was originally intended to be
the replacement for CPE in the NVD and to be distributed with the binaries for
a software product. However, it never gained much traction for that purpose.
NIST has dropped the idea of replacing CPE with SWID tags in recent years.
Steve realized that, since SWID is
an existing standard and a lot of software products have SWID tags now (for
example, for about two years, Microsoft distributed SWID tags with all their
new products and product versions), it would be better to use that than to
create a new format; this was especially important, since the SWID format
includes all the information required to create a usable purl. Steve defined a
new purl
type called “swid” and got it added to the purl specification in 2022. He
also developed a tool
that creates a purl based on information in a SWID tag.[v]
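As a rough illustration of how a purl can be derived from a SWID tag, here is a minimal Python sketch. It is not Steve’s tool. The sample tag is invented, the function name swid_to_purl is my own, and the mapping (namespace from the softwareCreator entity’s name and regid, name and version from the tag, and a tag_id qualifier) reflects my reading of the swid type definition, so details may differ from the specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Invented SWID tag, for illustration only.
SAMPLE_TAG = """<SoftwareIdentity
    xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd"
    name="Enterprise Server" version="1.0.0"
    tagId="75b8c285-fa7b-485b-b199-4745e3004d0d">
  <Entity name="Acme" regid="example.com" role="tagCreator softwareCreator"/>
</SoftwareIdentity>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def swid_to_purl(xml_text: str) -> str:
    """Derive a purl of type 'swid' from the text of a SWID tag."""
    root = ET.fromstring(xml_text)
    name, tag_id = root.get("name"), root.get("tagId")
    if not name or not tag_id:
        raise ValueError("SWID tag must carry both name and tagId")

    segments = []
    # Use the first entity with the softwareCreator role as the namespace.
    for child in root:
        roles = child.get("role", "").split()
        if local_name(child.tag) == "Entity" and "softwareCreator" in roles:
            if child.get("name"):
                segments.append(quote(child.get("name"), safe=""))
                if child.get("regid"):
                    segments.append(quote(child.get("regid"), safe=""))
            break
    segments.append(quote(name, safe=""))

    purl = "pkg:swid/" + "/".join(segments)
    if root.get("version"):
        purl += "@" + quote(root.get("version"), safe="")
    return purl + "?tag_id=" + quote(tag_id, safe="")


print(swid_to_purl(SAMPLE_TAG))
```

Since the supplier and the user both start from the same tag, running the same derivation on each side yields the same purl.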
However, our 2022 document didn’t
address two important questions:
1. For legacy products, if the supplier didn’t create a SWID tag originally,
who should create one now? Presumably, it will be the current supplier of the
product, even if the product has been sold to a different supplier in the
meantime.
2. How will the user of a product, for which the supplier has created a SWID
tag, locate and access the tag? While the supplier could develop a mechanism
through which a customer can automatically locate and download the tag from
their website, there will soon be a much more universal method for discovering
and accessing software supply chain artifacts: the Transparency Exchange API.
This is being developed by the CycloneDX project. It will be fully released by
the end of 2025, when it will also be approved as an ECMA standard.
How will all of this happen?
The OWASP SBOM Forum believes that,
once purl can represent proprietary software products (after the required new
types are implemented in the purl specification), the following set of steps[vi] will be set in motion:
1. A “purl expansion working group”, including members from many different
types of organizations, will meet regularly to work out required details for
expansion of purl to proprietary software products. The group will publish
these details (most likely as OWASP documents). The group will also:
   a. Recruit operators of app stores to participate in the purl community,
   along with creating a new purl type for each store and submitting the pull
   request to add that type to the purl specification; and
   b. Conduct tabletop exercises with software suppliers to test the formats
   and procedures required to implement the purl SWID tag program. This will
   include testing the purl SWID type definition. This definition was created
   more than two years ago, but it has only been tested by a few software
   developers. It needs to be subjected to broader “tabletop” testing.
2. Private and governmental security organizations (including CVE.org) conduct
awareness and training activities covering the steps described in this paper,
especially the development, distribution and use of SWID tags to create purls
for proprietary software products. These activities will target CNAs, software
suppliers, security tool vendors, vulnerability database operators and larger
end user organizations, including government agencies.
3. Suppliers create SWID tags for their products, starting with new products
and product versions and continuing with legacy products that do not yet have
SWID tags.
4. Suppliers make their SWID tags available through one (or more) of three
channels: a) directly to customers, b) in a machine-accessible format on their
website, and c) using the Transparency Exchange API, when it is available.
5. After being trained in purl and the new purl types for proprietary software,
CNAs start including purls in CVE records. The purls are based on the
suppliers’ SWID tags.
6. Vulnerability databases based on CVE records (perhaps including the NVD)
advertise the fact that users can now find vulnerabilities in proprietary
software using purl. They offer training materials (webinars, videos, website
content and hard-copy publications) for users.
7. Users begin to see the advantage of using purl. The primary advantage is
that they can deploy fully automated tools for vulnerability identification
without having to intervene regularly in the identification process, as is the
case with CPE.
8. As suppliers realize their SWID tags are being accessed by their customers,
they also see this is giving them a small but tangible marketing advantage over
competitors.
9. Purl-based open source vulnerability databases see increased traffic once
they start accepting the new purl types, as users realize they now have a
“one-stop-shop” for identifying vulnerabilities in both open source and
proprietary software.
10. Operators of CPE-based vulnerability databases (especially the NVD) notice
that not having to create at least one CPE for every new CVE record saves their
staff a lot of time. They also notice that users of those databases are
expressing more satisfaction with their experience, since a much higher
percentage of the purls they enter find a match in the CVE records than was the
case when CPE was the only software identifier available to them.
11. As CNAs begin to realize that users are taking purl seriously, they add
more purls, and fewer CPEs, to CVE records.
12. The above steps repeat continually, until the overall vulnerability
database “market” has grown to the point where use of both purl and CPE grows
steadily, with roughly constant “market shares”.
The OWASP SBOM Forum is under no illusion
that the above set of steps will be accomplished very quickly, given the current
rudimentary state of awareness regarding purl and its advantages. On the other
hand, the fact that truly automated vulnerability management is currently
almost impossible in the NVD makes it even more important that we start
implementing a real solution to those problems, while still hoping that the NVD
will eliminate their huge backlog of unenriched CVE records in the coming one
or two years.
There is good reason to believe
that if we start now, within 3-4 years purl will be widely accepted and used to
identify vulnerable proprietary software products in most vulnerability
databases. We say this because this will be the second time that purl
has been quickly accepted. Here is the story of the first time:
Steve Springett[vii] states that in 2017 and
2018, purl had little traction in the open source world, because it was so new.
Steve’s Dependency-Track and CycloneDX projects, along with Sonatype’s OSS
Index vulnerability database, were among the few early adopters of purl in 2018. Yet, purl
was in wide use in the open source community by 2022. Steve points out that today,
purl has been adopted by “most SCA vendors, hundreds of open source and
proprietary tools, and multiple sources of vulnerability intelligence.” I would
add that purl is used today by literally every major vulnerability database worldwide,
other than the NVD and databases based on NVD data. Indeed, purl has “won the
war”, when it comes to identifiers for open source software.
Of course, the world of
proprietary software is quite different from the open source world, since the
participating organizations are sometimes true competitors; that is not often the
case with open source software. However, once new purl types are developed to allow
identification of proprietary software, it should not require a heavy lift for databases
now based on purl to accommodate those new types. This means that, soon after
the new purl types for proprietary software have been incorporated into the
purl specification, big purl-based vulnerability databases like OSV and OSS
Index, which today only support open source software, may quickly support vulnerabilities
in proprietary software products as well.
Looking ahead
The OWASP SBOM Forum has recently
published a white paper that discusses all the above topics in more detail. It
is available for download here. We are actively discussing these topics in our meetings
and welcome new participants. Our meetings are every other Tuesday at 11AM ET
and every other Friday at 1PM ET. To receive the invitations for these
meetings, email tom@tomalrich.com.
We currently expect this to be a
two-phase project:
1. Planning and Design. This will consist of just the first of the above steps.
We believe this phase will require no more than 4-5 months of bi-weekly
meetings (plus online asynchronous work between meetings, including soliciting
participation by app stores and software suppliers and conducting the tabletop
exercise to test adequacy of the SWID purl type). This phase will require a
modest budget for coordination of those activities.
2. Rollout. All steps listed above other than the first are included in this
phase. This phase can be summed up as “training and awareness”. While training
and awareness activities are not inherently difficult, they require large
numbers of people to be involved, both on the “trainer” and “trainee” sides. We
estimate that this phase will require five to ten times the amount of resources
required for the first phase.
We estimate that the first phase
will require approximately $50,000 to $100,000 in funding, although we are
willing to start work on this phase with less than that amount committed. Since
the resources required for the second phase will depend on the design developed
in the first phase, we will wait until at least a high-level design is available
during the first phase, before estimating the second phase and seeking funding.
We invite all interested parties,
including software developers, software security service and tool providers,
and end users of software of all types, both to participate and to donate to
this effort. Donations (both online and directly) over $1,000 can be made to
OWASP and “restricted” to the SBOM Forum[viii]. Any such donations are
very welcome. OWASP is a 501(c)(3) nonprofit organization, meaning many
donations will be tax-deductible; however, it is always important to confirm
this with tax counsel. To discuss a donation of any size, please email tom@tomalrich.com
and tony@locussecurity.com.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com.
[i] Due to the NVD’s current problems in creating CPEs, CISA has been designated an
“Alternate Data Provider”, which can create authoritative CPEs that have the same
status as those created by the NVD contractors. CISA’s “Vulnrichment” program
has created many CPEs since their designation, but these are just a fraction of
the number required to reduce the backlog.
[ii] Every purl begins with the prefix “pkg”. This prefix is not needed today, but will
be in the future.
[iii] Many open source vulnerabilities are not reported to CVE.org, but instead to a
vulnerability database like GitHub Security Advisories (GHSA). Many of these
databases share their vulnerabilities with the OSV
database (managed by Google), where they are displayed using the OpenSSF
Vulnerability Format. Most OpenSSF vulnerabilities can be mapped to the CVE
format.
[iv] Of course, the user should not have to create the purl manually; the process
can be completely automated within a vulnerability management tool.
[v] Steve’s tool requires the user to manually input data from the SWID tag, but
the code can of course be adapted for automated use by a vulnerability
management tool.
[vi] These steps aren’t a strict “chain”, since they will ideally happen simultaneously, at
least after an initial “startup” period. However, in general, each step listed depends
on the previous step having at least begun.
[vii] In an email on October 19.
[viii] OWASP reserves ten percent of each “restricted” donation to fund administration.
That is, OWASP doesn’t simply pass the donation through to the project team –
in this case, the SBOM Forum. Instead, as the project team performs work or
incurs other expenses on the project, they submit invoices to OWASP, which determines
whether they are appropriate before paying them.