In September 2022, a group that I lead, the SBOM Forum – now
the OWASP SBOM Forum – published a white
paper that described a number of serious problems with the machine-readable
CPE (common platform enumeration) software identifier on which the National
Vulnerability Database (NVD) is based. The reason why machine-readable software
identifiers are so important is that software products have many different
names in different contexts. In vulnerability databases, it’s essential that both
the person who reported the vulnerability and the person who looks it up in the
NVD or another vulnerability database have the same product in mind; if both
are using the same software identifier, there is normally no question about
this.
Our white paper pointed to multiple reasons (described on
pages 4-6) why searching the NVD using a CPE name often produces an erroneous
result or no result at all. We proposed a way to address these problems by
utilizing the purl (package URL) identifier; this is already widely used in vulnerability
databases for open source software. We weren’t proposing that CPE be replaced
with purl, but that they both be options for searches in the NVD and other vulnerability
databases that use CVE to identify vulnerabilities.
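To make the two identifiers concrete, here is a minimal sketch that splits one example of each into its parts. The function names are my own, and both parsers are deliberately simplified (no escaped colons in the CPE, no qualifiers, subpath or percent-decoding in the purl), so treat this as an illustration rather than a conformant implementation of either specification.

```python
# Attribute names of a CPE 2.3 formatted string, in order, after the "cpe:2.3" prefix.
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition",
              "language", "sw_edition", "target_sw", "target_hw", "other")


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into named fields.

    Simplification: assumes no field value contains an escaped colon.
    """
    parts = cpe.split(":")
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 13:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


def parse_purl(purl: str) -> dict:
    """Split a simple purl of the form pkg:type/namespace/name@version.

    Simplification: qualifiers, subpath and percent-decoding are not handled.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    body, _, version = purl[4:].partition("@")
    ptype, _, rest = body.partition("/")
    namespace, _, name = rest.rpartition("/")
    return {"type": ptype, "namespace": namespace or None,
            "name": name, "version": version or None}


cpe = parse_cpe23("cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*")
purl = parse_purl("pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1")
print(cpe["vendor"], cpe["product"], cpe["version"])
print(purl["type"], purl["namespace"], purl["name"], purl["version"])
```

Note that the vendor and product strings in the CPE have to be agreed on in advance, while every part of the purl comes from the package's own coordinates in its package manager.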
When we wrote the paper, we knew that:
1. The NVD is based on CVE records. Each record
describes a vulnerability, as well as one or more software products that are
affected by the vulnerability. New vulnerabilities are reported by CVE
Numbering Authorities (CNAs) who work with the CVE Program; many of them are
software developers like Oracle, Microsoft and Red Hat. New CVE records are regularly
downloaded from the CVE.org database by the
NVD and other vulnerability databases that are based on CVE.
2. Since early in the CVE Program, CPE has been the
only machine-readable software identifier used in CVE records. Even though CVE
records don’t originate in the NVD, the NVD has almost always created CPE names
for the products described in the text of each record and has added those names
to the record.
3. Until 2024, the NVD usually added CPE names to
new CVE records within a few days of downloading them. Because of this, automated
NVD searches for software products were likely to find most of the recently
reported vulnerabilities that applied to those products.
However, there was one important thing we didn’t know in
2022 when we wrote the white paper: In February 2024, the NVD experienced problems
that drastically reduced the number of CPE names they could add to new CVE
records. Despite attempts to fix the problems, they have not yet done so, with
the result that fewer than half of CVE records created since February 2024
contain a CPE name. This means that more than half of the vulnerabilities (CVEs) that
have been identified since that month do not usually show up in an automated
vulnerability search of the NVD.
When the NVD first started experiencing problems in February
2024, there was a lot of sympathy for them; their parent organization, NIST
(part of the US Department of Commerce), shook their piggy bank and found some funds
to pay for contractors to help them dig themselves out of the hole they’d
gotten into. As a result, on May 29, 2024, the NVD proudly announced that, “We
are confident that this additional support will allow us to return to the
processing rates we maintained prior to February 2024 within the next few
months.” They also announced that, working with CISA, they had started to
reduce the big backlog they had built since February, and expected to eliminate
it by the end of 2024.
Of course, both problems have since gotten
much worse, not better. By the end of 2024, it had become painfully obvious
that the NVD had completely dropped the ball on both promises they had made on
May 29; as a result, they were starting to lose support from other agencies
that had supported them so far. In December, CISA quietly announced
that they would no longer add CPE names to some new CVE records; they had been
doing this as part of the “Vulnrichment” program that they announced soon after
the NVD’s problems appeared in February.
In April, I reported
(remotely) from VulnCon that the CVE Program had decided to add purl as an
alternative software identifier to CPE in CVE records, most likely by the end
of 2025 or early 2026. Of course, it was a good thing that CNAs would be given
the opportunity to include purls in their CVE records, with the same status as
CPE names.
However, there’s been a further turn of the screw. Recently,
Andrew Lilley Brinker of the Quality Working Group (QWG) of the CVE Program (the
CVE working group that is responsible for originating changes to the CVE Record
Format, aka the CVE Schema) put up this
document in the CVE-schema repository in GitHub. The document described how
both purl and another software identifier, OmniBOR,
will be added as options to the CVE Schema in the next few months – without
waiting until the end of 2025 or early 2026, as originally planned.
The document didn’t stop there. It went on to make it clear
that NIST has decided it’s time to pull the plug on CPE. The section titled
“Problem statement” includes this paragraph:
For CPE, the key challenges are its reliance on a central
dictionary and the processes used to update that dictionary. NIST, the United
States' National Institute of Standards of Technology, stewards the CPE
specification and maintains the CPE Dictionary, which is the central registry
of defined terms which may be used to identify vendors, products, and more
within a CPE identifier. The reliance on this central dictionary means that the
issuance of new CPEs for vendors or products not present in the dictionary
requires NIST to update the dictionary to support them[i].
While anyone can request the creation of a CPE from NIST, NIST may at times
be slow to respond to these requests due to resource limitations. (my
emphasis)
I hereby nominate the last sentence of the above paragraph
as Understatement of the Year. It should be clear now that CPE is a dead end,
as evidenced by the fact that fewer than 50% of the CVE records added to the
NVD since February 2024 include a CPE name. Now, it seems possible that figure
may drop to around 0%.
Because the CVE Program and the NVD only support CPE as a
software identifier today, a CVE record that doesn’t include a CPE name for the
vulnerable product is invisible to automated searches in the NVD, even to
normal command line searches. Because of this, NVD searches for vulnerabilities
in a software product today on average yield fewer than half the CVEs that have
been reported for that product since February 2024.
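The effect is easy to see in a toy example. The sketch below uses made-up records (the CVE IDs, product name and field names are all hypothetical, and real CVE records and NVD matching are considerably more involved) and an exact-match search on the CPE name.

```python
# Hypothetical, heavily simplified stand-ins for CVE records.
# The IDs, the product and the field names are invented for illustration.
RECORDS = [
    {
        "id": "CVE-0000-0001",
        "description": "Buffer overflow in ExampleApp 1.0",
        "cpes": ["cpe:2.3:a:examplecorp:exampleapp:1.0:*:*:*:*:*:*:*"],
    },
    {
        "id": "CVE-0000-0002",
        "description": "Authentication bypass in ExampleApp 1.0",
        "cpes": [],  # no CPE name was ever added to this record
    },
]


def search_by_cpe(records, cpe):
    """Return the IDs of records that list exactly this CPE name."""
    return [r["id"] for r in records if cpe in r["cpes"]]


found = search_by_cpe(RECORDS, "cpe:2.3:a:examplecorp:exampleapp:1.0:*:*:*:*:*:*:*")
missed = [r["id"] for r in RECORDS if r["id"] not in found]
print("found:", found)
print("missed:", missed)
```

Both records describe the same product, but only the one that carries a CPE name is returned. The second can be found only by a person reading the description text.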
Just as importantly, the problem is getting worse, not better. The “Problem statement” section continues:
Mechanical applicability determinations, especially searches
of CVE data based on software identifiers, are compromised if the searcher
cannot rely on the identifiers to be available when and where they are needed.
Moreover, some vulnerability conditions cannot be expressed
adequately using CPE. For example, sometimes a vulnerability is only present
when certain modules or files are present, but CPEs do not capture software at
the module or file level. To put it another way, CPE is a relatively
coarse-grained software identifier, identifying software “products,”
potentially constrained with version information, but not components or
materials within those software products….
CPEs are also not used universally across different software
ecosystems. Open source software projects are generally less well represented
in the NIST-maintained CPE dictionary than closed source software. This means
sole reliance on CPE as the mechanism for identifying software within the CVE
record format leaves CVE less able to identify open source software affected by
a vulnerability.
Near the end of the same document, in the section titled
“Related Issues or Proposals”, the author states:
For over a decade, NIST has tried to manage CPEs to keep pace
with the needs of CVE. However, the challenge and expense ha(ve) proven to be
significant and NIST has expressed a desire to end its role as the provider of
CPEs for CVEs. Without a massive investment, it is unlikely that any party
could produce CPEs quickly enough to meet CVE’s needs. Moreover, even a
complete CPE library would not address CPE’s inability to capture
vulnerabilities that depend on files or modules, since those are beyond CPE’s
ability to capture.
Thus, it’s possible that NIST will withdraw funding for the
NVD to create CPE names altogether. Instead of 50% of CVE records not having
CPE names, we might be faced with 100% not having CPE names. Of course, once
purls are being regularly added to CVE records to identify vulnerable open
source software products, that won’t create a big problem, since a user could use
the purl for an open source product to find vulnerabilities in any vulnerability
database that supports both CVE and purl.
Currently, Sonatype’s OSS Index database – one of the
largest open source vulnerability databases – is the best example of such a
database. The Dependency Track open
source software composition analysis (SCA) tool is used over 25 million times every
day to look up an open source component from an SBOM in OSS Index.
Purls for commercial software
However, even though there is growing dissatisfaction with
CPE, purl can’t replace it anytime soon. Currently, purl primarily identifies
open source software products distributed through package managers. Since few
commercial software products are distributed that way, this means purl can’t
identify most commercial software products.[ii]
This leaves CPE as the only major software identifier that
can identify commercial products today. Yet the fact that CPE names are missing
from most CVE records created since February 2024 means that CPE is an increasingly unreliable
identifier. In other words, commercial software suppliers like Oracle,
Microsoft, HPE, Cisco, Schneider Electric and others don’t have a reliable
identifier with which to report vulnerabilities in their products. Of course,
this is a distressing situation.
Purl follows the “razor and blades” model. The base purl specification is
simple and changes very slowly, but every use of purl (e.g., every package
manager that serves as a purl namespace) requires its own purl
type; there are about 1500 of those (although many of them are used very
little). Expanding purl to address commercial software will just require developing
and adding a new type; it won’t require any changes to the base specification.
Steve Springett, one of the original developers of purl
(working with Philippe Ombredanne, who came up with the original idea and
continues to lead the purl project) and a current maintainer (as well as the
leader of the OWASP CycloneDX and Dependency Track projects), recently
developed a Type Definition for a new purl type called
SCID; that stands for Software Component Identification. I will put up a
separate blog post on this development, but I want to point out that SCID defines
a set of metadata fields that a supplier will publish in a tag that can be
distributed with the software binaries, made available at a well-known location
on the supplier's website, emailed to customers, made available through the Transparency
Exchange API (now in Beta1 phase), etc.
SCID not only expands purl to address commercial software,
but it also expands it to cover non-packaged open source software. Note that
non-packaged software that is authoritatively available through a repository
with a supported purl type like GitHub is already addressed by purl.
How will this get rolled out?
I believe the CVE Quality Working Group has developed a
couple of pull requests to implement purl in the CVE Record Format (CVE Schema).
They will need to be finalized and submitted to the CVE Board for their approval.
At that point, they will be merged with the Schema. However, there are further steps
that need to be taken before purls-in-CVE-records can be considered a success,
including the following:
1. There needs to be an end-to-end proof of concept. It will start with CNAs
   creating test CVE records that include purls (these could be for newly
   identified vulnerabilities or for fake ones).
   - The new records will be submitted to CVE.org and will appear in searches
     using purls.
   - The records will be downloaded from CVE.org by vulnerability databases
     that can utilize CVE records that include purls (of course, ultimately
     every database that supports CVE will need to support both CPE and purl.
     This may include the NVD, but it also may include OSS Index, VulnDB,
     VulnCheck, VulDB, and others).
   - End users and service providers will test searches in the vulnerability
     databases. They will create purls for vulnerable products that have been
     included in the test CVE records. If a user searches for a
     product/version using a purl, the purl they search with should always
     match the purl created by the CNA when they created the record. Any
     mismatches need to be investigated.
2. There needs to be training for the groups involved in the vulnerability
   management process: software suppliers, CNAs, vulnerability database
   operators, end users, service providers, etc. It will include general
   training in purl and CVE, as well as training in specific topics like how a
   CNA can create a new CVE record containing a purl. This training will
   mostly be in the form of webinars and YouTube videos.
3. Use of the SCID format needs to be tested in a proof of concept (or
   tabletop exercise), although this could be combined with the PoC in step 1
   above.
Steps 1 and 2 will implement existing purl capabilities
(i.e., addressing open source software made available in package managers), but
step 3 will involve extending those capabilities to commercial software. Thus,
steps 1 and 2 may need to be executed before step 3, although they could all be
executed together. Step 1 could start before any changes are made to the CVE
Schema.
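The matching check in the end-to-end proof of concept (the purl a user searches with versus the purl the CNA put in the record) could be automated along the following lines. This is only a sketch under my own assumptions: the function names are invented, qualifiers and subpaths are ignored, and the only normalization applied is lowercasing the purl type. A real test harness would need the per-type normalization rules from the purl specification.

```python
FIELDS = ("type", "namespace", "name", "version")


def split_purl(purl):
    """Return (type, namespace, name, version) for a simple purl.

    Simplification: qualifiers and subpath are dropped, and no
    percent-decoding or per-type name normalization is applied.
    """
    if not purl.lower().startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    body = purl[4:].split("#", 1)[0].split("?", 1)[0]
    body, _, version = body.partition("@")
    ptype, _, rest = body.strip("/").partition("/")
    namespace, _, name = rest.rpartition("/")
    # The purl type is treated as case-insensitive and compared in lowercase.
    return (ptype.lower(), namespace, name, version)


def purl_mismatches(searched, recorded):
    """Return the names of the fields in which two purls differ.

    An empty list means the searcher and the CNA identified the same
    package and version.
    """
    a, b = split_purl(searched), split_purl(recorded)
    return [field for field, x, y in zip(FIELDS, a, b) if x != y]


print(purl_mismatches("pkg:npm/lodash@4.17.20", "pkg:npm/lodash@4.17.20"))
print(purl_mismatches("pkg:npm/lodash@4.17.20", "pkg:npm/lodash@4.17.21"))
```

Reporting which field differs, rather than a bare yes or no, gives the people investigating a mismatch a place to start.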
It’s safe to say that, until at least steps 1 and 2 are
executed, the purl rollout to the CVE Record Format is unlikely to be considered
a success. The OWASP SBOM Forum is ready to take the lead on all three of these
steps when we can secure at least some of the funding required (which will not
be huge). If your organization would like to help with funding, you can do that
through a donation to the OWASP Foundation that is “restricted” to our project.
OWASP is a 501(c)(3) nonprofit organization. Please email me if you would like
to discuss this.
My blog is more popular than
ever, but I need more than popularity to keep it going. I’ve often been told
that I should either accept advertising or put up a paywall and charge a
subscription fee, or both. However, I really don’t want to do either of these
things. It would be great if everyone who appreciates my posts could donate a $20-$25 (or more) “subscription fee” once a year. Will
you do that today?
If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
[i]
The fact that CPE relies on an external dictionary brings up an important discussion
in the SBOM Forum’s 2022 white paper: the difference between “extrinsic”
identifiers like CPE, which depend on an external dictionary, and intrinsic
identifiers like purl, which don’t require an external dictionary. As with all
intrinsic identifiers (chemical formulas are another example used in the paper),
the user can construct a purl using information they either have on hand or can
easily look up. In the case of a purl for an open source package in the Maven
Central package manager, the user can create the purl if they just know the
name and version number of the package, as well as the fact that it was
downloaded from Maven Central.
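As a rough illustration of what “intrinsic” means in practice, the sketch below builds a Maven purl from nothing but the package's own coordinates. The function name is mine. I am also assuming, per my understanding of the purl `maven` type, that the package “name” consists of a group ID (the purl namespace) and an artifact ID (the purl name).

```python
from urllib.parse import quote


def maven_purl(group_id: str, artifact_id: str, version: str) -> str:
    """Build a purl for a package downloaded from Maven Central.

    No dictionary lookup is involved: every component comes from
    information the user already has about the package.
    """
    # Percent-encode each component so reserved characters cannot
    # be confused with the purl's own separators.
    return "pkg:maven/{}/{}@{}".format(
        quote(group_id, safe=""),
        quote(artifact_id, safe=""),
        quote(version, safe=""),
    )


print(maven_purl("org.apache.commons", "commons-lang3", "3.12.0"))
```

Two people who know the same three facts about the package will produce the same string independently, which is exactly what a CPE name cannot guarantee.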
[ii] Because
online software stores like Google Play and the Apple Store function a lot like
package managers – that is, they make software available for download in a
fixed location, and control the namespace of the products being offered – Steve
Springett and Tony Turner of the OWASP SBOM Forum have both suggested that
software stores could have purl types that are closely analogous to types for
package managers like Maven Central and NPM.
Of course, a large percentage of commercial software
products (and certainly the more strategic ones) are not available in software
stores. However, given the huge numbers that are available (Google Play offers
over 3 million products for sale and download), enabling developers of mobile
apps to report vulnerabilities in their products using purl identifiers could
result in a huge improvement in mobile app security.