Friday, November 11, 2022

Who will tame the CPE beast?

I confess that I and almost every other person who writes or speaks about software bills of materials have been misleading you for years. We have done this whenever we’ve spoken about SBOM as something that is well defined. The two main formats are well defined, but almost all the best practices regarding production, distribution and use of SBOMs are still far from being defined, or at least agreed upon.

We have also been misleading you when we’ve implied or stated that SBOMs were being widely (or even narrowly) used for the purpose that most people talk about (including in Executive Order 14028): to enable software end users to learn about the most important exploitable component vulnerabilities in the software they utilize, so they can coordinate with the suppliers to patch or otherwise mitigate them.

There are many issues standing in the way of widespread SBOM use, but there are a few that are show-stoppers, meaning I don’t see any way SBOMs will be used widely for this purpose until those problems are addressed. One of those is the naming problem, which can be summarized as, “When an SBOM is generated by an automatic process, only a small percentage of component names will be found through a search of the National Vulnerability Database (NVD).” In other words, in very few cases will a user who looks for a component name in the NVD find the component – meaning the user (or even a software tool acting on their behalf) will only be able to learn about a small percentage of component vulnerabilities through an NVD search, despite having an SBOM listing all the components.

How small a percentage is this? The Director of Product Security for a very large software supplier, who has participated for years in the SBOM and VEX discussions under the NTIA and now CISA, as well as the informal group I started called the SBOM Forum, had previously used the figure of 20%. Of course, that’s bad, since it means the user won’t be able to learn (through searching the NVD) about vulnerabilities applicable to 80% of the components in an SBOM they receive.

However, that same person was challenged regarding this figure by another very large supplier (of both software and intelligent devices) at one of our SBOM Forum meetings. Did the other supplier question how the 20% figure could be so low? No, they asked why it was so high, because their experience was that it was below 5%. The original supplier admitted that 20% was a very conservative number and agreed that 5% is closer to their actual experience.

So, the good news is that it isn’t true that 80% of component names can’t be found in the NVD; the bad news is the figure is really more like 95%. This means you’ll only be able to find about 5% of component names from an SBOM in the NVD.

Needless to say, SBOMs wouldn’t be used at all if users could only find 5% of component vulnerabilities. Yet, even though end users are barely using SBOMs at all today, suppliers are using them very heavily. How can they do that? It’s because every supplier who needs to use SBOMs – or in many cases, the consultants who help suppliers use SBOMs – has some method, based on AI, fuzzy logic, throwing bodies at the problem, prayer, etc., to get around this problem. The good news is that every supplier or consultant that I’ve talked to about this problem says they’ve been able to get the matches up to an acceptable level, although it’s nowhere near 100%.

So while this is an acceptable workaround for those suppliers willing to invest the time, money or both that is required, the fact is that it makes it impossible to “operationalize” SBOM production; every SBOM produced will need its own care and feeding, rather than being a completely automated process. SBOMs will never be produced in the volume required, if every one of them needs to be massaged in this way.
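
To make the “care and feeding” concrete, here is a minimal sketch of the kind of fuzzy matching a supplier might start with. It uses Python's standard `difflib` module; the product list, the normalization rules and the 0.6 cutoff are all assumptions chosen for illustration, not anything the NVD or any particular supplier actually uses.

```python
import difflib

# Hypothetical product names, standing in for names found in a CPE dictionary.
KNOWN_PRODUCTS = ["openssl", "openssl_framework", "log4j", "libpng", "zlib"]


def fuzzy_candidates(component_name, known=KNOWN_PRODUCTS, cutoff=0.6):
    """Return up to three known names that resemble component_name, best first."""
    # Assumed normalization: lowercase, with spaces and hyphens as underscores.
    normalized = component_name.strip().lower().replace(" ", "_").replace("-", "_")
    return difflib.get_close_matches(normalized, known, n=3, cutoff=cutoff)


for name in ("OpenSSL", "log4j-core", "xyzzy"):
    print(name, "->", fuzzy_candidates(name))
```

The cutoff is the catch. Raise it and legitimate variants of a name are missed; lower it and unrelated products start to match. Someone has to pick the value and then review the results, which is exactly why this approach can't be a completely automated process.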

This is why the SBOM Forum developed a workable “solution” for maybe 70-80% of the naming problem in the NVD, which we described in this paper that we released in September. It’s now being evaluated by CISA and MITRE, and I’m reasonably optimistic it will be implemented at least in part. In this post and (probably) my next one, I’ll lay out the basics of the argument that we made in the paper, since I’ll admit that it’s densely written and a lot of people may have given up on reading it.

The NVD’s problems lie mostly with CPE names, the only identifiers supported by the NVD. Here are the six main problems:

  1. Vulnerabilities are identified in the NVD with a CVE number, e.g. “CVE-2022-12345”. A CPE is typically not created for a software product until a CVE is determined to be applicable to the product. However, many software suppliers have never identified a CVE that applies to their products, so they have never created a CPE for them. This is almost certainly not because the products have never had vulnerabilities, but because the suppliers, for whatever reason, have not submitted any vulnerability reports for those products for inclusion in the National Vulnerability Database.

The worst part of this problem is that the result of an NVD search will be the same in both cases - the case where a vulnerability has never been identified in a product and the case where the supplier has never felt inclined to report a vulnerability, even if their product is loaded with them. The search will yield “There are 0 matching records” in both cases. Someone conducting a search won’t know which case applies, so they may believe the product has no vulnerabilities, when the truth is very different.

  2. There is no error checking when a new CPE name is entered in the NVD. Therefore, if the CPE name that was originally created for the product does not properly follow the specification, a user who later searches for the same product and enters a properly-specified CPE will receive an error message. Unfortunately, it is, once again, the same error message that they would receive if the original CPE name were properly specified but there are no CVEs reported against it: “There are 0 matching records”.

In other words, when a user receives this message, they might interpret this to mean that there is a valid CPE for the product they’re seeking, but a vulnerability (CVE) has never been identified for that product - i.e. it has a clean bill of health. However, in reality it would mean the CPE name was created improperly. In fact, there might be a large number of CVEs attached to the off-spec CPE, but without knowing that name, the user will not be able to learn about those CVEs.

Another explanation for the “There are 0 matching records” error message is that the user had misspelled the CPE name in the search bar. Again, the user would have no way of knowing whether this was the reason for the message, or whether the message means the product has no reported vulnerabilities.

It is to avoid problems like this that most organizations that use the NVD employ advanced search techniques based on AI or fuzzy logic[1]. While that can greatly reduce the number of unsuccessful searches, having to resort to this makes it impossible to conduct truly automated searches. Considering that an average-sized organization might easily need to conduct tens of thousands of NVD searches per day and a service provider doing this on behalf of hundreds of customers would need to conduct some large multiple of that number, the magnitude of this problem should be apparent.

  3. When a product or supplier name has changed since a proprietary product was originally developed (usually because of a merger or acquisition), the CPE name for the product may change as well. Thus, a user of the original product may not be able to learn about new vulnerabilities identified in it, unless they know the name of the current supplier as well as the current name for the product. Instead, this user will also receive the “There are 0 matching records” message.
  4. A similar consideration holds true for supplier or product names that can be written in different ways, such as “Microsoft” and “Microsoft Inc.”, or “Microsoft Word” and “Microsoft Office Word”, etc. A user searching on one of the variants of a supplier or product name may learn about just the CVEs that are applicable to the variant they entered, rather than all of them.
  5. Sometimes, a single product will have many CPE names in the NVD because they have been entered by different people, each making a different mistake. For this reason, it will be hard to decide which name is correct. Even worse, there may be no “correct” name, since each of the names may have CVEs entered for it. This is the case with OpenSSL (e.g. “OpenSSL” vs “OpenSSL_Framework”) in the NVD now. Because there is no CPE name that contains all of the OpenSSL vulnerabilities, the user needs to find vulnerabilities associated with each variation of the product's name. But how could they ever be sure they had identified all the CPEs that have ever been entered for OpenSSL?
  6. Often, a vulnerability will appear in one module of a library. However, because CPE names are not assigned on the basis of an individual module, the user may not know which module is vulnerable, unless they read the full CVE report. Thus, if the vulnerable module is not installed in a software product they use but other modules of the library are installed (meaning the library itself is listed as a component in an SBOM), the user may unnecessarily patch the vulnerability or perform other unnecessary mitigations. In fact, it’s likely that at least some of the patching performed for the log4j vulnerabilities was unnecessary, for precisely this reason.
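
Several of the problems above come down to one thing: an exact-match search gives the same answer for very different situations. The sketch below shows this with a toy dictionary standing in for the NVD. The strings follow the general shape of CPE 2.3 names, but every supplier, product and CVE ID in it is invented, and `exact_search` and `result_message` are hypothetical helpers, not NVD functions.

```python
# Toy stand-in for the NVD. Every CPE name and CVE ID below is invented
# for illustration; none of them is a real NVD record.
TOY_NVD = {
    # The same hypothetical supplier, entered two different ways.
    "cpe:2.3:a:examplecorp:widget:1.0:*:*:*:*:*:*:*": ["CVE-EXAMPLE-0001"],
    "cpe:2.3:a:example_corp_inc:widget:1.0:*:*:*:*:*:*:*": ["CVE-EXAMPLE-0002"],
    # A product that has a CPE name but no CVEs attached to it.
    "cpe:2.3:a:examplecorp:gadget:2.0:*:*:*:*:*:*:*": [],
}


def exact_search(cpe_name):
    """Return the CVE IDs filed under exactly this CPE name."""
    return TOY_NVD.get(cpe_name, [])


def result_message(cpe_name):
    """Mimic the single message a user sees, whatever the underlying reason."""
    return f"There are {len(exact_search(cpe_name))} matching records"


listed_but_clean = "cpe:2.3:a:examplecorp:gadget:2.0:*:*:*:*:*:*:*"
misspelled = "cpe:2.3:a:examplecorp:gadgit:2.0:*:*:*:*:*:*:*"
never_entered = "cpe:2.3:a:othervendor:thing:3.1:*:*:*:*:*:*:*"

# Three different situations, one indistinguishable message.
for name in (listed_but_clean, misspelled, never_entered):
    print(result_message(name))

# Searching on one spelling of the supplier returns only part of the picture.
print(exact_search("cpe:2.3:a:examplecorp:widget:1.0:*:*:*:*:*:*:*"))
```

In this toy setup, the last search misses the CVE filed under the other spelling of the supplier name, and nothing in the result tells the user that a second name exists.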

What is needed is to be able to name software and hardware components in a BOM with an identifier that, when entered in the NVD, will

  1. Almost always match to the correct product, if the product is listed.
  2. Almost never match to an incorrect product.
  3. Not require that the identifier already exist in the NVD. This is almost always required today, in order for the user to get a correct response. If the user searches on a CPE name that doesn’t exist in the NVD, the error message they receive, “There are 0 matching records”, is the same one they would receive if the CPE does exist, yet it has no reported vulnerabilities.
  4. Never yield a result that might be interpreted to mean the product was found but there are no applicable vulnerabilities, when in fact one of the following is the case:
    1. The wrong identifier was entered in the search bar; or
    2. An off-spec CPE was initially created for the product, so the product cannot be found by searching on a CPE that was created according to the spec; or
    3. The name and/or supplier of the product has changed due to a merger or acquisition. Thus, the CPE entered by a user of the original product won’t match the current CPE name.
  5. Identify the vulnerable module in a library rather than just the entire library, so that, if that module isn’t installed in a product but other modules are installed (meaning the product will appear to be vulnerable when in fact it isn’t[i]), users will not patch or perform other mitigations that are not necessary.
  6. When a supplier and/or product name has changed for a product, allow there to be separate identifiers - and thus separate locations to report CVEs - for the different supplier or product names; thus the different supplier/product names will be treated as separate products.
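
As a rough illustration of the requirement that a search never yield a misleading result, the sketch below separates “identifier not found” from “identifier found, nothing reported”. This is only one possible reading of that requirement, not a description of how the NVD or any proposed replacement works; the `Lookup` enum, the `lookup` function and the sample identifiers are all hypothetical.

```python
from enum import Enum


class Lookup(Enum):
    UNKNOWN_IDENTIFIER = "identifier not found in the database"
    NO_REPORTED_VULNS = "identifier found, no vulnerabilities reported"
    HAS_VULNS = "identifier found, vulnerabilities reported"


def lookup(db, identifier):
    """Return a (status, cve_list) pair that keeps the empty cases distinct."""
    if identifier not in db:
        # Misspelling, off-spec name, renamed product, or never entered.
        return Lookup.UNKNOWN_IDENTIFIER, []
    cves = db[identifier]
    if not cves:
        return Lookup.NO_REPORTED_VULNS, []
    return Lookup.HAS_VULNS, list(cves)


# Invented identifiers and CVE IDs, for illustration only.
toy_db = {
    "examplecorp/widget@1.0": ["CVE-EXAMPLE-0001"],
    "examplecorp/gadget@2.0": [],
}

for ident in ("examplecorp/widget@1.0", "examplecorp/gadget@2.0", "examplecorp/gadgit@2.0"):
    status, cves = lookup(toy_db, ident)
    print(ident, "->", status.value, cves)
```

With a response like this, a user who mistypes an identifier is told that it is unknown, rather than being left to assume the product has a clean bill of health.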

Who or what is the hero that will tame the CPE beast? Stay tuned to this blog for the exciting conclusion!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[1] Or, in the case of at least one third-party service provider, a “small army” of CPE-resolvers.


[i] A VEX from the supplier, saying that the vulnerability isn’t exploitable even though the component itself is present, would address this problem. However, an identifier that applied at the module level would prevent this problem from even occurring.
