Tom Alrich's Blog: The naming problem

As I described in this recent post, I have learned quite a lot about software bills of materials in the last two months of meeting with the participants in the “multistakeholder process” for Software Transparency, promoted by the National Telecommunications and Information Administration (NTIA) of the Department of Commerce. I’ve especially learned a lot about the problems that will need to be solved before SBoMs can meet with wide acceptance, both by the software supplier community and by the software user community.

This group (consisting of participants from many industries and government organizations, although the healthcare industry is by far the best represented) doesn’t feel their job is simply to think wise thoughts and develop long guidelines that will gather cyber dust somewhere on the internet. Rather, they conduct proofs of concept (PoCs) focused on particular industries, in which the suppliers and end user organizations work together - first to determine exactly what they can demonstrate in a PoC, then to make all of the decisions on formats, procedures, etc. that are required to enable that to happen, and finally to conduct the PoC and see what works or doesn’t work as hoped.

Yet even when the participants in a PoC are all in agreement about what they want to do and how they want to do it, they sometimes run into roadblocks that require workarounds, for the sake of keeping the PoC itself moving. They then resolve to address these roadblocks in future PoCs.

The healthcare industry has so far conducted three PoCs (the third is ongoing, although it’s actually called the second iteration of the second PoC), and it’s very clear that there’s one problem that towers over the others. A complete solution to the problem is impossible, but even a reasonable workaround will be quite difficult, with the result that the group conducting the third PoC recently decided to kick this issue back to their next PoC. Of course, since the healthcare group is breaking new ground in so many other areas, I certainly don’t blame them at all for not wanting to tackle this one now.

The issue has to do with names of software packages. That might strike you (as it struck me, until recently) as an easy problem to solve. The name of the software package is what the supplier calls it. Just look at their web site, and you’ll have your answer.

This would be a great way to address the problem if users weren’t doing anything in particular with the SBoM, other than satisfying their curiosity about what’s in the software they use. However, the main use case for SBoMs (and the use case for the healthcare PoCs so far) is to identify vulnerabilities that apply to the third-party and open source components included in the software.

As I discussed in this recent post, the SBoM will provide you with a list of components in the software. However, it will not provide you with a list of vulnerabilities in those components, since they can change so frequently. In order to discover those, somebody – whether the end user organization or a third party providing this as a service – needs to consult the most recent feed of the National Vulnerability Database (NVD), which is published by NIST, and look for new vulnerabilities that apply to each of the components.

So far so good. If the SBoM lists a component called “Bill’s Browser”, all I have to do is look for Bill’s Browser in the NVD, and if I find any vulnerabilities for that component, I write down their CVE numbers. Soon I’ll have a list of all the component vulnerabilities in my software, and I’ll go out to validate and then mitigate those vulnerabilities – which will often involve calling my supplier and asking if there is a patch available (although there’s a big problem with this step, too, which will require another post). What’s hard about that?

The problem is that it’s guaranteed that the software component that your supplier has called Bill’s Browser in their SBoM isn’t called that in the NVD. The NVD entries are all in the Common Platform Enumeration (CPE) format. One typical CPE reads “cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:6.2.1047:*:*:*:*:*:*:*”. This is the CPE for the Report Pack for Inventory Solution for Windows, version 6.2.1047. Once you find the CPE name, you can find the CVEs that apply to that CPE.
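To make the CPE format a little less opaque, here is a minimal Python sketch that splits a CPE 2.3 name into its fields. The function name `parse_cpe23` is my own invention for illustration, and the sketch assumes a well-formed name; it does not validate the values or unescape special characters.

```python
import re

# The eleven attributes of a CPE 2.3 formatted string, in order.
CPE_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)

def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 name into a dict of its eleven attributes."""
    # Split on colons that are not escaped with a backslash.
    pieces = re.split(r"(?<!\\):", cpe)
    if len(pieces) != 13 or pieces[0] != "cpe" or pieces[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, pieces[2:]))

example = ("cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:"
           "6.2.1047:*:*:*:*:*:*:*")
fields = parse_cpe23(example)
print(fields["vendor"], fields["product"], fields["version"])
```

The "a" in the third position marks the entry as an application, and the asterisks mean "any value" for the attributes the entry doesn't pin down.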

Of course, if the supplier who provided the SBoM knows something close to the exact name of the component and the exact version number, you can use the NVD search function. That works fairly well, although it certainly doesn’t cover all possibilities for entries. For example, when I searched for “Report Pack for Inventory Solution for Windows, version 6.2.1047” it couldn’t find anything, whereas when I took out “version”, it did find it. And when I changed the 1047 to 1040 but left the rest of the search the same, it again came up with nothing; obviously, if you include a version number, it has to match exactly what’s in the CPE name.

If I were to stop here, you would think I’m an idiot. It’s not apparent so far that there’s a real problem with finding the CPE name, as long as the supplier provides you something close to the name listed in the CPE, and they know the exact version of the component that’s installed. And indeed in the first healthcare PoC, the participants reported good luck in finding the CPE names of components, even though the suppliers weren’t providing the full CPE names in their SBoMs.

But the problem is that no organization that has more than a few software packages will be interested in looking up all of the component CPE names by manual searching in the NVD database, given that – as I said in one of the posts linked above – the average software product has 135 components.

If your organization uses just 100 software packages (and this would be a small organization, obviously), there might be 13,500 component names (100 packages times 135 components each) to search on whenever new SBoMs are made available. The supplier should in theory release a new SBoM whenever there has been any change at all in the components or in the software itself; even if a supplier of a component was bought out and the only thing that changed was the supplier name, a new SBoM is needed.

Obviously, the only usable long-term solution will require automation of the process. Specifically, some system would need to ingest SBoMs from a supplier, match the component names with CPE names, then find any CVEs that apply to any of those components, using the daily NVD feed. But here’s the problem: If a component name doesn’t exactly match a CPE name, there will be a null result for that component. In these cases, someone will need to do a manual search on the database. And since probably no supplier uses CPEs to designate components in its software build process, this means that all 13,500 components in the above example will still have to be searched for manually. That’s probably more than I could do in a month. Moreover, I’d go crazy in the process (and who just asked “How would we know?” Wise guy).
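The failure mode described above can be sketched in a few lines of Python. Everything here is hypothetical: the `exact_lookup` function, the in-memory dictionary standing in for the NVD feed, and the placeholder CVE identifier are all made up for illustration, not taken from any real tool or feed.

```python
# Hypothetical stand-in for the NVD feed: CPE name -> CVE identifiers.
# The CVE identifier is a placeholder, not a real entry.
CPE_TO_CVES = {
    "cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:"
    "6.2.1047:*:*:*:*:*:*:*": ["CVE-XXXX-0001"],
}

def exact_lookup(component_names, cpe_to_cves):
    """Look up each SBoM component name as if it were a CPE name.

    Returns (found, unmatched): found maps names to their CVE lists,
    unmatched lists every name that would need a manual search.
    """
    found = {}
    unmatched = []
    for name in component_names:
        if name in cpe_to_cves:
            found[name] = cpe_to_cves[name]
        else:
            unmatched.append(name)
    return found, unmatched

# Component names the way a supplier might write them in an SBoM.
sbom_components = [
    "Report Pack for Inventory Solution for Windows 6.2.1047",
    "Bill's Browser 1.0",
]
found, unmatched = exact_lookup(sbom_components, CPE_TO_CVES)
print(len(found), "matched;", len(unmatched), "left for manual search")
```

Since neither supplier-style name is a CPE name, both end up in the manual pile, which is the point.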

Clearly, either someone will have to beat the suppliers with sticks to get them to provide full CPE names (and even that isn’t likely to succeed, since just getting them to provide SBoMs at all will be a challenge), or there will need to be some automated process in the middle that will read the component names on the SBoMs and develop proper CPE names, then search through the NVD feed for these CPE names - perhaps using AI or fuzzy logic - while recording any CVEs it finds associated with those CPEs. This process might be operated by the end user organization or it might be a service provided by a third party.
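As a rough idea of what the "fuzzy" middle step might look like, here is a minimal sketch using Python's standard difflib module. This is only one possible approach, not what the PoC group or any vendor actually does. The `normalize` and `candidate_cpes` functions, the 0.8 similarity cutoff, and the second CPE name in the sample list are all assumptions of mine.

```python
import difflib
import re

def normalize(name: str) -> str:
    """Lowercase a name and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

def candidate_cpes(component_name, version, known_cpes, cutoff=0.8):
    """Return CPE names whose product is close to component_name.

    The version must match exactly; only the product name is fuzzy.
    Naive colon splitting is used, so escaped colons are not handled.
    """
    by_product = {}
    for cpe in known_cpes:
        fields = cpe.split(":")
        product, cpe_version = fields[4], fields[5]
        if cpe_version == version:
            by_product.setdefault(product, []).append(cpe)
    close = difflib.get_close_matches(
        normalize(component_name), list(by_product), n=3, cutoff=cutoff
    )
    return [cpe for product in close for cpe in by_product[product]]

known = [
    "cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:"
    "6.2.1047:*:*:*:*:*:*:*",
    # Made-up entry for the fictional component used earlier in this post.
    "cpe:2.3:a:example_vendor:bills_browser:1.0:*:*:*:*:*:*:*",
]
print(candidate_cpes("Report Pack for Inventory Solutions (Windows)",
                     "6.2.1047", known))
```

Note that a sketch like this only returns candidates. Something, or someone, still has to decide whether a candidate is really the same component, and a wrong version number still produces nothing at all.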

This much is probably a solvable problem, and I know the healthcare PoC group had at least some success with this approach. However, it was recently pointed out that “some success” might be no better than “utter failure”. That’s because the great majority of the results of an automated search come back null. Maybe that’s because there really are no vulnerabilities for those CPEs – in fact, for the great majority of them, that’s just about certain. But maybe it’s because of a small problem with the way a CPE was entered (perhaps the supplier got one of the digits wrong in the version number, or the component supplier has an “A” at the end of the version number, yet the NVD didn’t register that). How will you ever know, except of course to have someone go through this manually and try to resolve each problem?
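One way to make the two kinds of null result distinguishable is to check each generated CPE name against a list of known CPE names before looking for CVEs, and to record which of the two checks failed. The sketch below is hypothetical throughout: the `LookupStatus` and `classify` names, the sample data and the placeholder CVE identifier are mine, and a real system would load the known names and CVE mappings from the NVD rather than hard-code them.

```python
from enum import Enum

class LookupStatus(Enum):
    HAS_CVES = "matched a CPE name that has CVEs"
    NO_KNOWN_CVES = "matched a CPE name with no CVEs recorded"
    NO_CPE_MATCH = "no CPE name matched, needs review"

def classify(cpe_name, known_cpe_names, cpe_to_cves):
    """Separate 'nothing is wrong' from 'nothing was found'."""
    if cpe_name not in known_cpe_names:
        # The name itself is unknown: possibly a typo or a version mismatch.
        return LookupStatus.NO_CPE_MATCH, []
    cves = cpe_to_cves.get(cpe_name, [])
    if cves:
        return LookupStatus.HAS_CVES, list(cves)
    return LookupStatus.NO_KNOWN_CVES, []

good = ("cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:"
        "6.2.1047:*:*:*:*:*:*:*")
# Same name with two digits of the version transposed by the supplier.
typo = good.replace("6.2.1047", "6.2.1074")

known_names = {good}
cves_by_cpe = {}  # placeholder: no CVEs recorded for this sample name

print(classify(good, known_names, cves_by_cpe)[0].name)
print(classify(typo, known_names, cves_by_cpe)[0].name)
```

Both lookups return an empty CVE list, but only the second one is flagged for a human to look at. That doesn't fix the glitch; it just shrinks the pile that needs manual attention.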

It currently seems to me that there needs to be some third party that runs this process. They would have a tremendous incentive to keep refining their AI instructions, to the point that almost all of these glitches could be corrected without human intervention. So I think this problem can be “solved”, in the sense that the need for manual intervention would be eliminated for all but a small percentage of the components listed in any SBoM.

But don’t ask me how to solve it. That’s above my pay grade.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

3 comments:

Unknown, November 17, 2020 at 5:45 PM
Hi Tom

I just noticed your blog and I must say that you really nailed the seriousness of the problem. We have been working on ML tools to address the matching of names in the CPE with names in the ICS software and are making some good progress in the FACT platform.

But it is not without a lot of pain. You mentioned the issues with errors in the CPE and we 100% agree with you on that, especially with regards to version numbers. Even getting the name of the vendor matched correctly is a challenge. For example, we did an analysis of the software products shipped by a major ICS vendor and found 17 different spellings of the vendor name in the software headers - even trivial differences like LTD, Ltd and LTD. all prevent normal matching algorithms from working. As one drills into the subcomponents we see much worse match rates using conventional techniques. And this matters because most CVEs involving 3rd-party components do not list all the affected OEM products.

Like you, we think that this can be solved and hope to have something available in early 2020.

Tom Alrich, November 17, 2020 at 5:56 PM
Thanks, Unknown! Could you drop me an email at tom@tomalrich.com? I'd like to learn more about what you're doing. I won't publish anything you say unless you agree to that.

Tom Alrich, November 18, 2020 at 7:37 AM
"Unknown" got back to me, and he's someone I know well: Eric Byres of Adolus (https://www.adolus.com/). Adolus is forging ahead on a very ambitious project to improve software cybersecurity. Since I haven't kept up with what they're doing for more than a year, I'm hoping to talk to Eric soon to bring myself up to speed: And I'll share what he says in a new post.

Sunday, November 1, 2020
