I confess that I and almost every other person who writes or speaks about software bills of materials have been misleading you for years. We have done this when we’ve spoken about SBOM as something that is well defined. The two main formats are well defined, but almost all the best practices regarding production, distribution and use of SBOMs are still far from being defined, or at least agreed upon.
We have also been misleading you
when we’ve implied or stated that SBOMs were being widely (or even narrowly)
used for the purpose that most people talk about (including in Executive Order
14028): to enable software end users to learn about the most important exploitable
component vulnerabilities in the software they utilize, so they can coordinate
with the suppliers to patch or otherwise mitigate them.
There are many issues standing in
the way of widespread SBOM use, but there are a few that are show-stoppers,
meaning I don’t see any way SBOMs will be used widely for this purpose until those
problems are addressed. One of those is the naming
problem, which can be summarized as, “When an SBOM is generated by an
automatic process, only a small percentage of component names will be found through
a search of the National Vulnerability Database (NVD).” In other words, in very
few cases will a user who looks for a component name in the NVD find the component
– meaning the user (or even a software tool acting on their behalf) will only
be able to learn about a small percentage of component vulnerabilities through
an NVD search, despite having an SBOM listing all the components.
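To make the naming problem concrete, here is a minimal sketch of what an exact-name lookup does. The component names and the set of "database" names are invented for illustration, and a Python set is only a stand-in for the NVD, which is searched by CPE name rather than by a plain string.

```python
# Hypothetical component names, as an SBOM tool might emit them.
sbom_components = ["OpenSSL", "log4j-core", "zlib", "libxml2-utils", "Apache Commons Text"]

# Hypothetical product names, as they might be recorded in a vulnerability database.
database_names = {"openssl", "log4j", "zlib", "libxml2", "commons_text"}


def exact_matches(components, known_names):
    """Return the components whose names appear verbatim in known_names."""
    return [name for name in components if name in known_names]


matched = exact_matches(sbom_components, database_names)
match_rate = len(matched) / len(sbom_components)
print(f"{len(matched)} of {len(sbom_components)} names matched ({match_rate:.0%})")
```

Every one of these components has a plausible counterpart in the toy database, yet only the one whose name happens to be spelled identically is found.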
How small a percentage is this? The
Director of Product Security for a very large software supplier, who has
participated for years in the SBOM and VEX discussions under the NTIA and now
CISA, as well as the informal group I started called the SBOM Forum, had previously
used the figure of 20%. Of course, that’s bad, since it means the user won’t be
able to learn (through searching the NVD) about vulnerabilities applicable to
80% of the components in an SBOM they receive.
However, that same person was challenged
regarding this figure by another very large supplier (of both software and intelligent
devices) at one of our SBOM Forum meetings. Did the other supplier question how
the 20% figure could be so low? No, they asked why it was so high,
because their experience was that it was below 5%. The original supplier
admitted that 20% was a very conservative number and agreed that 5% is closer
to their actual experience.
So, the good news is that it isn’t
true that 80% of component names can’t be found in the NVD; the bad news is the
figure is really more like 95%. This means you’ll only be able to find about 5%
of component names from an SBOM in the NVD.
Needless to say, SBOMs wouldn’t be
used at all if users could only find 5% of component vulnerabilities. Yet, even
though end users are barely using SBOMs at all today, suppliers are using them very
heavily. How can they do that? It’s because every supplier who needs to use
SBOMs – or in many cases, the consultants who help suppliers use SBOMs – has some
method, based on AI, fuzzy logic, throwing bodies at the problem, prayer, etc.,
to get around this problem. The good news is that every supplier or consultant that
I’ve talked to about this problem says they’ve been able to get the matches up
to an acceptable level, although it’s nowhere near 100%.
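As one illustration of the "fuzzy logic" style of workaround, here is a minimal sketch using Python's standard `difflib` module. It is not any particular supplier's method; the list of known product names and the 0.5 similarity cutoff are assumptions chosen only for the example.

```python
import difflib

# Hypothetical product names already present in a vulnerability database.
known_products = ["openssl", "openssl_framework", "zlib"]


def candidate_matches(component_name, known, cutoff=0.5):
    """Return known names that resemble component_name, best match first."""
    return difflib.get_close_matches(component_name.lower(), known, n=5, cutoff=cutoff)


print(candidate_matches("OpenSSL", known_products))
```

Note that the function returns candidates, not an answer. Someone (or something) still has to decide whether the second candidate refers to the same product, and the cutoff itself is a judgment call.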
So while this is an acceptable
workaround for those suppliers willing to invest the time, money or both that is
required, the fact is that it makes it impossible to “operationalize” SBOM
production; every SBOM produced will need its own care and feeding, rather than
being a completely automated process. SBOMs will never be produced in the
volume required, if every one of them needs to be massaged in this way.
This is why the SBOM Forum developed
a workable “solution” for maybe 70-80% of the naming problem in the NVD, which
we described in this paper
that we released in September. It’s now being evaluated by CISA and MITRE, and
I’m reasonably optimistic it will be implemented at least in part. In this post
and (probably) my next one, I’ll lay out the basics of the argument that we
made in the paper, since I’ll admit that it’s densely written and a lot of people
may have given up on reading it.
The NVD’s problems lie mostly with
CPE names, the only identifiers supported by the NVD. Here are the six main
problems:
- Vulnerabilities are identified in the NVD
with a CVE number, e.g. “CVE-2022-12345”. A CPE is typically not created
for a software product until a CVE is determined to be applicable to the
product. However, many software suppliers have never identified a CVE that
applies to their products, so they have never created a CPE for them. This
is almost certainly not because the products have never had
vulnerabilities, but because the suppliers, for whatever reason, have not
submitted any vulnerability reports for those products for inclusion in
the National Vulnerability Database.
The worst part of this problem is that the result of an
NVD search will be the same in both cases - the case where a vulnerability has
never been identified in a product and the case where the supplier has never
felt inclined to report a vulnerability, even if their product is loaded
with them. The search will yield “There are 0 matching records” in both cases.
Someone conducting a search won’t know which case applies, so they may believe
the product has no vulnerabilities, when the truth is very different.
- There is no error checking when a new CPE
name is entered in the NVD. Therefore, if the CPE name that was originally
created for the product does not properly follow the specification, a user
who later searches for the same product and enters a properly-specified
CPE will receive an error message. Unfortunately, it is, once again, the
same error message that they would receive if the original CPE name were
properly specified but there are no CVEs reported against it: “There are 0
matching records”.
In other words, when a user receives this message, they
might interpret this to mean that there is
a valid CPE for the product they’re seeking, but a vulnerability (CVE) has
never been identified for that product - i.e. it has a clean bill of health.
However, in reality it would mean the CPE name was created improperly. In fact,
there might be a large number of CVEs attached to the off-spec CPE, but without
knowing that name, the user will not be able to learn about those CVEs.
Another explanation for the “There are 0 matching
records” error message is that the user misspelled the CPE name in the
search bar. Again, the user would have no way of knowing whether this was the
reason for the message, or whether the message means the product has no
reported vulnerabilities.
It is to avoid problems like this that most organizations
that use the NVD employ advanced search techniques based on AI or fuzzy logic[1].
While that can greatly reduce the number of unsuccessful searches, having to
resort to this makes it impossible to conduct truly automated searches.
Considering that an average-sized organization might easily need to conduct
tens of thousands of NVD searches per day and a service provider doing this on
behalf of hundreds of customers would need to conduct some large multiple of
that number, the magnitude of this problem should be apparent.
- When a product or supplier name has changed
since a proprietary product was originally developed (usually because of a
merger or acquisition), the CPE name for the product may change as well.
Thus, a user of the original product may not be able to learn about new
vulnerabilities identified in it, unless they know the name of the current
supplier as well as the current name for the product. Instead, this user
will also receive the “There are 0 matching records” message.
- A similar consideration holds true for
supplier or product names that can be written in different ways, such as
“Microsoft(™)” and “Microsoft(™) Inc.”, or “Microsoft(™) Word” and
“Microsoft Office(™) Word”, etc. A user searching on one of the variants
of a supplier or product name may learn about just the CVEs that are
applicable to the variant they entered, rather than all of them.
- Sometimes, a single product will have many
CPE names in the NVD because they have been entered by different people,
each making a different mistake. For this reason, it will be hard to
decide which name is correct. Even worse, there may be no “correct” name,
since each of the names may have CVEs entered for it. This is the case
with OpenSSL (e.g. “OpenSSL” vs “OpenSSL_Framework”) in the NVD now.
Because there is no CPE name that contains all of the OpenSSL
vulnerabilities, the user needs to find vulnerabilities associated with
each variation of the product's name. But how could they ever be sure they
had identified all the CPEs that have ever been entered for OpenSSL?
- Often, a vulnerability will appear in one
module of a library. However, because CPE names are not assigned on the
basis of an individual module, the user may not know which module is
vulnerable, unless they read the full CVE report. Thus, if the vulnerable
module is not installed in a software product they use but other modules
of the library are installed (meaning the library itself is listed as a
component in an SBOM), the user may unnecessarily patch the vulnerability
or perform other unnecessary mitigations. In fact, it’s likely that at
least some of the patching performed for the log4j vulnerabilities was
unnecessary, for precisely this reason.
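The common thread in several of these problems is that very different situations produce the same message. The sketch below shows this with a toy dictionary standing in for the NVD. Every CPE-style name and CVE number in it is an invented placeholder, and the `search` function is a hypothetical helper, not an NVD interface.

```python
# Toy stand-in for a vulnerability database: CPE-style name -> CVE identifiers.
# Every name and CVE number here is an invented placeholder, not real NVD data.
toy_database = {
    # Off-spec name (wrong vendor string) that someone entered for "widget".
    "cpe:2.3:a:example_inc:widget:1.0:*:*:*:*:*:*:*": ["CVE-0000-0001", "CVE-0000-0002"],
}


def search(cpe_name):
    """Mimic a search that reports only the number of matching records."""
    count = len(toy_database.get(cpe_name, []))
    return f"There are {count} matching records"


searches = {
    "product with no vulnerabilities reported": "cpe:2.3:a:clean_vendor:tool:2.0:*:*:*:*:*:*:*",
    "supplier that never reports": "cpe:2.3:a:silent_vendor:app:3.1:*:*:*:*:*:*:*",
    "on-spec name for widget": "cpe:2.3:a:example:widget:1.0:*:*:*:*:*:*:*",
    "misspelled name": "cpe:2.3:a:exmaple_inc:widget:1.0:*:*:*:*:*:*:*",
}

for situation, name in searches.items():
    print(f"{situation}: {search(name)}")
```

All four searches print the same line, even though the widget product has two CVEs recorded under its off-spec name.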
What
is needed is to be able to name software and hardware components in a BOM with
an identifier that, when entered in the NVD, will
- Almost always match to the correct product,
if the product is listed.
- Almost never match to an incorrect product.
- Not require that the identifier already exist
in the NVD. This is almost always required today, in order for the user to
get a correct response. If the user searches on a CPE name that doesn’t
exist in the NVD, the error message they receive, “There are 0 matching
records”, is the same one they would receive if the CPE does exist, yet it
has no reported vulnerabilities.
- Never yield a result that might be
interpreted to mean the product was found but there are no applicable
vulnerabilities, when in fact one of the following is the case:
- The wrong identifier was entered in
the search bar; or
- An off-spec CPE was initially created
for the product, so the product cannot be found by searching on a CPE
that was created according to the spec; or
- The name and/or supplier of the
product has changed due to a merger or acquisition. Thus, the CPE entered
by a user of the original product won’t match the current CPE name.
- Identify the vulnerable module in a library
rather than just the entire library, so that, if that module isn’t
installed in a product but other modules are installed (meaning the
product will appear to be vulnerable when in fact it isn’t[i]),
users will not patch or perform other mitigations that are not necessary.
- When a supplier and/or product name has
changed for a product, allow there to be separate identifiers - and thus
separate locations to report CVEs - for the different supplier or product
names; thus the different supplier/product names will be treated as separate
products.
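To illustrate the requirement that a search never blur "not found" with "found, but clean", here is a minimal sketch of a lookup that reports those outcomes separately. This is only my illustration of the requirement, not the proposal in the SBOM Forum paper; the identifiers, the registry and the CVE number are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class LookupStatus(Enum):
    IDENTIFIER_UNKNOWN = "identifier not recognized"
    NO_REPORTED_VULNERABILITIES = "product known, no vulnerabilities reported"
    VULNERABILITIES_FOUND = "product known, vulnerabilities reported"


@dataclass
class LookupResult:
    status: LookupStatus
    cves: list = field(default_factory=list)


# Hypothetical registry: identifier -> reported CVEs (placeholder values).
registry = {
    "widget-1.0": ["CVE-0000-0001"],
    "tool-2.0": [],
}


def lookup(identifier):
    """Return a result that says explicitly whether the identifier was recognized."""
    if identifier not in registry:
        return LookupResult(LookupStatus.IDENTIFIER_UNKNOWN)
    cves = registry[identifier]
    if cves:
        return LookupResult(LookupStatus.VULNERABILITIES_FOUND, list(cves))
    return LookupResult(LookupStatus.NO_REPORTED_VULNERABILITIES)


for name in ["widget-1.0", "tool-2.0", "wdiget-1.0"]:
    print(name, "->", lookup(name).status.value)
```

With a result like this, a misspelled or unrecognized identifier can no longer be mistaken for a clean bill of health.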
Who or what is the hero that will tame the CPE beast? Stay tuned to this blog for the exciting conclusion!
Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
[1] Or, in the case of at least one third-party service provider, a “small army” of CPE-resolvers.
[i] A
VEX from the supplier, saying that the vulnerability isn’t exploitable even
though the component itself is present, would address this problem. However, an
identifier that applied at the module level would prevent this problem from
even occurring.