As I described in this
recent post, I have learned quite a lot about software bills of materials in
the last two months of meeting with the participants in the “multistakeholder
process” for Software
Transparency, promoted by the National Telecommunications and Information
Administration (NTIA) of the Department of Commerce. I’ve especially learned a
lot about the problems that will need to be solved before SBoMs can meet with
wide acceptance, both by the software supplier community and by the software
user community.
This group (consisting of participants from many industries
and government organizations, although the healthcare industry is by far the
best represented) doesn’t see its job as simply thinking wise thoughts and developing
long guidelines that will gather cyber dust somewhere on the internet. Rather,
they conduct proofs of concept focused on particular industries, in which the
suppliers and end user organizations work together: first to determine exactly
what they can demonstrate in a PoC, then to make all of the decisions on
formats, procedures, etc. that are required to enable that to happen, and
finally to conduct the PoC and see what works or doesn’t work as hoped.
Yet even when the participants in a PoC are all in agreement
about what they want to do and how they want to do it, they sometimes run into
roadblocks that require workarounds, for the sake of keeping the PoC itself
moving. They then resolve to address these roadblocks in future PoCs.
The healthcare industry has so far conducted three PoCs (the
third is ongoing, although it’s actually called the second iteration of the
second PoC), and it’s very clear that there’s one problem that towers over the
others. A complete solution to the problem is impossible, but even a reasonable
workaround will be quite difficult, with the result that the group conducting
the third PoC recently decided to kick this issue back to their next PoC. Of
course, since the healthcare group is breaking new ground in so many other
areas, I certainly don’t blame them at all for not wanting to tackle this one
now.
The issue has to do with names of software packages. That might
strike you (as it struck me, until recently) as an easy problem to solve. The
name of the software package is what the supplier calls it. Just look at their
web site, and you’ll have your answer.
This would be a great way to address the problem if users
weren’t doing anything in particular with the SBoM, other than to satisfy their
curiosity about what’s in the software they use. However, the main use case for
SBoMs (and what has been the use case for the healthcare PoCs) is to identify vulnerabilities
that apply to the third-party and open source components included in the
software.
As I discussed in this
recent post, the SBoM will provide you with a list of components in the software.
However, it will not provide you a list of vulnerabilities in those components,
since the list of known vulnerabilities changes so frequently. In order to discover those, somebody –
whether the end user organization or a third party providing this as a service –
needs to consult the most recent feed of the National Vulnerability Database
(NVD), which is published by NIST, and look for new vulnerabilities that apply
to each of the components.
So far so good. If the SBoM lists a component called “Bill’s
Browser”, all I have to do is look for Bill’s Browser in the NVD, and if I find
any vulnerabilities for that component, I write down their CVE numbers. Soon I’ll
have a list of all the component vulnerabilities in my software, and I’ll go
out to validate and then mitigate those vulnerabilities – which will often involve
calling my supplier and asking if there is a patch available (although there’s
a big problem with this step, too. That will require another post). What’s hard
about that?
The problem is that it’s guaranteed that the software
component your supplier has called Bill’s Browser in its SBoM isn’t
called that in the NVD. The NVD identifies affected products by names in the Common Platform
Enumeration (CPE) format. One typical CPE reads “cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:6.2.1047:*:*:*:*:*:*:*”.
This is the CPE name for Altiris’s Report Pack for Inventory Solution for Windows, version
6.2.1047. Once you find the CPE name, you can find the CVEs that apply to that
CPE.
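To make the structure of that string concrete: a CPE 2.3 formatted string is a colon-separated list of eleven attributes after the `cpe:2.3` prefix (part, vendor, product, version and so on, per NIST’s CPE naming specification). Below is a minimal Python sketch of a parser, assuming well-formed input; it keeps backslash-escaped colons inside a value but does not unescape other characters or validate attribute values.

```python
# Attribute names in CPE 2.3 formatted-string order.
CPE_FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]


def split_unescaped(text: str, sep: str = ":") -> list[str]:
    """Split on `sep`, ignoring separators escaped with a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the backslash and the escaped character together.
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def parse_cpe23(cpe: str) -> dict[str, str]:
    """Parse a CPE 2.3 formatted string into a dict of its 11 attributes."""
    parts = split_unescaped(cpe)
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


fields = parse_cpe23(
    "cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:6.2.1047:*:*:*:*:*:*:*"
)
print(fields["vendor"], fields["product"], fields["version"])
# altiris report_pack_for_inventory_solution_for_windows 6.2.1047
```

The `*` values are wildcards meaning “any value”, which is why most of the trailing attributes in a typical CPE name are just asterisks.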
Of course, if the supplier who provided the SBoM knows
something close to the exact name of the component and the exact version number,
you can use the NVD search
function. That works fairly well, although it is far from forgiving about how
you phrase a search. For example, when I searched for “Report Pack for Inventory
Solution for Windows, version 6.2.1047”, it couldn’t find anything, whereas when
I took out “version”, it did find it. And when I changed the 1047 to 1040 but left
the rest of the search the same, it came up with nothing again; obviously, if
you include a version number, it has to match exactly what’s in the CPE name.
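I don’t know exactly how the NVD search matches queries, so the following is only a toy model, assuming that every word in the query must appear in a product’s title. The catalog entry and its title text are made up for illustration. Under that assumption, the model reproduces all three results described above.

```python
import re

# Hypothetical catalog: CPE name -> human-readable title (title text is invented).
CATALOG = {
    "cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:6.2.1047:*:*:*:*:*:*:*":
        "Altiris Report Pack for Inventory Solution for Windows 6.2.1047",
}


def tokens(text: str) -> set[str]:
    """Lowercase words; dots are kept so version strings stay whole."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))


def keyword_search(query: str) -> list[str]:
    """Return CPE names whose title contains every word of the query (toy model)."""
    wanted = tokens(query)
    return [cpe for cpe, title in CATALOG.items() if wanted <= tokens(title)]


print(len(keyword_search("Report Pack for Inventory Solution for Windows, version 6.2.1047")))  # 0
print(len(keyword_search("Report Pack for Inventory Solution for Windows 6.2.1047")))           # 1
print(len(keyword_search("Report Pack for Inventory Solution for Windows 6.2.1040")))           # 0
```

The word “version” and the wrong build number fail for the same reason in this model: each is a query term with no counterpart in the title.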
If I were to stop here, you would think I’m an idiot. It’s
not apparent so far that there’s a real problem with finding the CPE name, as
long as the supplier provides you something close to the name listed in the
CPE, and they know the exact version of the component that’s installed. And indeed
in the first healthcare PoC, the participants reported good luck in finding the
CPE names of components, even though the suppliers weren’t providing the full
CPE names in their SBoMs.
But the problem is that no organization that has more than a
few software packages will be interested in looking up all of the component CPE
names by manually searching the NVD, given that – as I said in one of
the posts linked above – the average software product has 135 components.
If your organization uses just 100 software packages (and
this would be a small organization, obviously), there might be 13,500 component
names (100 × 135) to search on whenever new SBoMs are made available (in theory, the supplier should
release a new SBoM whenever there has been any change at all in the components
or in the software itself. Even if a supplier of a component was bought out and
the only thing that changed was the supplier name, a new SBoM is needed).
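To put the 13,500 figure in perspective, here is a back-of-envelope sketch. The 100 packages and 135 components come from the paragraph above; the two minutes per manual lookup is purely a hypothetical assumption, not a measured number.

```python
def manual_lookups(packages: int, components_per_package: int) -> int:
    """Component names to look up when every package ships a new SBoM."""
    return packages * components_per_package


def manual_hours(lookups: int, minutes_per_lookup: float) -> float:
    """Hours of manual NVD searching, given an assumed time per lookup."""
    return lookups * minutes_per_lookup / 60


lookups = manual_lookups(100, 135)   # figures from the text
hours = manual_hours(lookups, 2)     # 2 minutes per lookup is a hypothetical guess
print(lookups, hours, hours / 40)    # 13500 450.0 11.25
```

Under that assumption, one full round of lookups is about 11 forty-hour weeks of work.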
Obviously, the only usable long-term solution will require
automation of the process. Specifically, some system would need to ingest SBoMs
from a supplier, match the component names with CPE names, then find any CVEs that
apply to any of those components, using the daily NVD feed. But here’s the
problem: If a component name doesn’t exactly match a CPE name, there will be a
null result for that component. In these cases, someone will need to do a manual
search on the database. And since probably no supplier uses CPEs to designate
components in its software build process, this means that all 13,500 components
in the above example will still have to be searched for manually. That’s
probably more than I could do in a month. Moreover, I’d go crazy in the process
(and who just asked “How would we know?” Wise guy).
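Here is a minimal sketch of that naive pipeline, assuming a hypothetical `cve_index` dictionary (CPE name to list of CVE IDs) already built from the NVD feed, and SBoM components given as (vendor, product, version) tuples. The names `MatchReport`, `naive_cpe` and `match_sbom` are invented for this example, and `CVE-EXAMPLE-1` is a placeholder, not a real CVE. Any component whose constructed CPE name isn’t an exact key in the index lands in the manual-review pile.

```python
from dataclasses import dataclass, field


@dataclass
class MatchReport:
    matched: dict[str, list[str]] = field(default_factory=dict)  # CPE name -> CVE IDs
    needs_manual_review: list[tuple[str, str]] = field(default_factory=list)


def naive_cpe(vendor: str, product: str, version: str) -> str:
    """Build a CPE 2.3 string by lowercasing and replacing spaces with underscores."""
    def norm(s: str) -> str:
        return s.strip().lower().replace(" ", "_")
    return f"cpe:2.3:a:{norm(vendor)}:{norm(product)}:{version}:*:*:*:*:*:*:*"


def match_sbom(components, cve_index) -> MatchReport:
    """Exact-match each SBoM component against the index; queue the rest."""
    report = MatchReport()
    for vendor, product, version in components:
        cpe = naive_cpe(vendor, product, version)
        if cpe in cve_index:
            report.matched[cpe] = cve_index[cpe]
        else:
            report.needs_manual_review.append((product, version))
    return report


cve_index = {
    "cpe:2.3:a:altiris:report_pack_for_inventory_solution_for_windows:6.2.1047:*:*:*:*:*:*:*":
        ["CVE-EXAMPLE-1"],  # placeholder ID
}
sbom = [
    ("Altiris", "Report Pack for Inventory Solution for Windows", "6.2.1047"),
    ("Bill's Software", "Bill's Browser", "3.1"),
]
report = match_sbom(sbom, cve_index)
print(len(report.matched), report.needs_manual_review)
# 1 [("Bill's Browser", '3.1')]
```

With real SBoMs, where supplier names rarely line up with CPE names, almost every component would take the second branch.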
Clearly, either someone will have to beat the suppliers with
sticks to get them to provide full CPE names (and even that isn’t likely to
succeed, since just getting them to provide SBoMs at all will be a challenge),
or there will need to be some automated process in the middle that will read
the component names on the SBoMs and derive proper CPE names, then search
through the NVD feed for these CPE names, perhaps using AI or fuzzy logic,
while recording any CVEs it finds associated with those CPEs. This process
might be operated by the end user organization or it might be a service
provided by a third party.
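As one hedged illustration of the “fuzzy logic” option, Python’s standard `difflib.get_close_matches` can rank candidate CPE product strings by similarity to an SBoM component name. This is a stand-in technique, not what the PoC participants or any vendor actually use; the helper names and the 0.8 cutoff are assumptions, and in practice the candidate list would come from the NVD’s CPE dictionary.

```python
import difflib
import re


def to_cpe_token(name: str) -> str:
    """Approximate CPE product style: lowercase, runs of non-alphanumerics -> '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def propose_cpe_products(sbom_name: str, known_products: list[str],
                         n: int = 3, cutoff: float = 0.8) -> list[str]:
    """Return up to n known CPE product strings most similar to the SBoM name."""
    return difflib.get_close_matches(to_cpe_token(sbom_name), known_products,
                                     n=n, cutoff=cutoff)


known = [  # hypothetical candidates
    "report_pack_for_inventory_solution_for_windows",
    "inventory_solution",
    "asset_management_suite",
]
print(propose_cpe_products("Report Pack for Inventory Solution, Windows", known))
# ['report_pack_for_inventory_solution_for_windows']
```

A suggestion like this is only a candidate: someone (or something) still has to confirm it, and the vendor field needs the same treatment.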
This much is probably a solvable problem, and I know the healthcare
PoC group had at least some success with this approach. However, it was
recently pointed out that “some success” might be no better than “utter failure”.
That’s because the great majority of the results of an automated search come
back null. Maybe that’s because there really are no vulnerabilities for those
CPEs – in fact, for the great majority of them, that’s just about certain. But maybe
it’s because of a small problem with the way a CPE was entered (perhaps the
supplier got one of the digits wrong in the version number, or the component supplier
has an “A” at the end of the version number, yet the NVD didn’t register that).
How will you ever know, except of course to have someone go through this
manually and try to resolve each problem?
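One way to shrink that manual pile is to stop treating every null the same. The sketch below assumes a hypothetical index mapping `(product, version)` to a list of CVE IDs; it separates “CPE found, no CVEs” from “no CPE found”, and flags near misses (a trailing letter suffix like the “A” above, or a version one character off, as with a mistyped digit) for human review. These heuristics are illustrative assumptions; a near miss is a prompt to check, not proof that the versions are the same.

```python
import re
from enum import Enum


class Outcome(Enum):
    VULNERABLE = "CPE found, CVEs listed"
    CLEAN = "CPE found, no CVEs listed"
    NEAR_MISS = "no exact CPE match, but a similar version exists"
    UNMATCHED = "no CPE found for this product"


def core_version(version: str) -> str:
    """Drop a trailing letter suffix, e.g. '4.2A' -> '4.2' (heuristic)."""
    return re.sub(r"[A-Za-z]+$", "", version)


def one_char_apart(a: str, b: str) -> bool:
    """True if two equal-length strings differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def classify(product: str, version: str, cve_index: dict) -> Outcome:
    """cve_index maps (product, version) to a list of CVE IDs (hypothetical)."""
    key = (product, version)
    if key in cve_index:
        return Outcome.VULNERABLE if cve_index[key] else Outcome.CLEAN
    known_versions = [v for (p, v) in cve_index if p == product]
    for v in known_versions:
        if core_version(v) == core_version(version) or one_char_apart(v, version):
            return Outcome.NEAR_MISS  # worth a human look, not an automatic match
    return Outcome.UNMATCHED


cve_index = {
    ("report_pack_for_inventory_solution_for_windows", "6.2.1047"): [],
    ("bills_browser", "4.2"): ["CVE-EXAMPLE-1"],  # placeholder, not a real CVE
}
print(classify("bills_browser", "4.2A", cve_index).name)  # NEAR_MISS
print(classify("report_pack_for_inventory_solution_for_windows", "6.2.1040", cve_index).name)  # NEAR_MISS
```

Only the `UNMATCHED` and `NEAR_MISS` results would need a person; a `CLEAN` result is a genuine “no known vulnerabilities” rather than an ambiguous null.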
It currently seems to me that there needs to be some third
party that runs this process. They would have a tremendous incentive to keep
refining their AI instructions, to the point that almost all of these
glitches could be corrected without human intervention. So I think this problem
can be “solved”, in the sense that the need for manual intervention would be eliminated
for all but a small percentage of the components listed in any SBoM.
But don’t ask me how to solve it. That’s above my pay grade.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would
love to hear from you. Please email me at tom@tomalrich.com.
Hi Tom
I just noticed your blog and I must say that you really nailed the seriousness of the problem. We have been working on ML tools to address the matching of names in the CPE with names in the ICS software and are making some good progress in the FACT platform.
But it is not without a lot of pain. You mentioned the issues with errors in the CPE and we 100% agree with you on that, especially with regards to version numbers. Even getting the name of the vendor matched correctly is a challenge. For example, we did an analysis of the software products shipped by a major ICS vendor and found 17 different spellings of the vendor name in the software headers; even trivial differences like “LTD”, “Ltd” and “LTD.” all prevent normal matching algorithms from working. As one drills into the subcomponents we see much worse match rates using conventional techniques. And this matters because most CVEs involving 3rd-party components do not list all the affected OEM products.
Like you, we think that this can be solved and hope to have something available in early 2020.
Thanks, Unknown! Could you drop me an email at tom@tomalrich.com? I'd like to learn more about what you're doing. I won't publish anything you say unless you agree to that.
"Unknown" got back to me, and he's someone I know well: Eric Byres of Adolus (https://www.adolus.com/). Adolus is forging ahead on a very ambitious project to improve software cybersecurity. Since I haven't kept up with what they're doing for more than a year, I'm hoping to talk to Eric soon to bring myself up to speed. I'll share what he says in a new post.