About three months ago, a small
group of friends (old and new ones) who are involved in the “SBOM business” got
together to start talking about something that is on all of our minds: the
small but important set of serious problems that are currently holding back widespread
use of software bills of materials, and how these issues might be either
mitigated or (preferably) solved altogether.
We all agreed that the problem of
SBOM production by software suppliers is much smaller than the problem of use
by general organizations, since many of the suppliers are already making extensive
use of SBOMs today.
But the suppliers are by and large producing SBOMs for their own use, to help
them manage their own supply chain cybersecurity risks (i.e. the risks posed by
the many components they include in their products). They’re not distributing
them to their users, mostly because the users aren’t asking for them.
Why aren’t the users asking for
them? There are a number of reasons, especially the lack of low-cost or open
source tools and services that will help them identify risks found in
components included in the software they use.
However, there’s one problem that overshadows
all of the others. It’s one that was sometimes discussed by the NTIA Software
Component Transparency Initiative, but which was generally considered to be
insoluble – that is, insoluble without a persistent multiyear effort that would
involve a lot of…well, lobbying of various government agencies and
nonprofit organizations that would need to be involved in any permanent fix.
Since there were other more easily soluble problems that could be addressed without
such a huge effort, and since there are partial workarounds available that can
make the big problem at least tolerable, the general consensus was to let this
sleeping dog lie for the moment.
The problem has a number of names
because it has many facets, but most people refer to it as “the naming problem”.
Briefly, the “CPE” (Common Platform Enumeration) names that are required in
order to look up vulnerabilities (CVEs) that apply to a product in the
National Vulnerability Database (NVD) have a lot of problems. Very
often it will be difficult (or impossible) to look up a product, because the
user is unable to find the CPE name under which it was entered.
More generally, it turns out that
simply knowing the title and supplier name of a software product that you own won’t
provide you with a universal name that will be valid at all times and in all places.
A really striking example is a software supplier you might have heard of, named
Microsoft. Well, we may think of that supplier as “Microsoft”. However, there are
many different names used with different products that we would normally
consider “Microsoft products”.
If you search on “Microsoft” in the NVD, you’ll miss a lot of products that are listed under a different supplier name, like Microsoft Corporation, Microsoft Europe, etc. In fact, someone who works a lot on this issue told me they had asked people at Microsoft what company they worked for, and they received something like 27 different responses. Even more interesting, there is no central location where you can go to find all products produced by the various “Microsoft” entities.
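To make the lookup problem concrete, here is a minimal sketch of what happens when you search by exact supplier name. It assumes the CPE 2.3 formatted-string layout (the prefix `cpe:2.3` followed by eleven colon-separated attributes, starting with part, vendor, product and version). The two records, the product name `example_product` and the helper names `parse_cpe23` and `find_by_vendor` are all made up for illustration; they are not real NVD entries or part of any NVD tooling, and the parser ignores escaped colons for simplicity.

```python
from typing import List, NamedTuple


class Cpe(NamedTuple):
    part: str
    vendor: str
    product: str
    version: str


def parse_cpe23(name: str) -> Cpe:
    """Split a CPE 2.3 formatted string into its first four attributes.

    Simplification: this does not handle escaped colons inside a field.
    """
    fields = name.split(":")
    if len(fields) != 13 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {name!r}")
    return Cpe(part=fields[2], vendor=fields[3],
               product=fields[4], version=fields[5])


# Hypothetical records: the same product entered under two vendor spellings.
RECORDS = [
    "cpe:2.3:a:microsoft:example_product:1.0:*:*:*:*:*:*:*",
    "cpe:2.3:a:microsoft_corporation:example_product:2.0:*:*:*:*:*:*:*",
]


def find_by_vendor(records: List[str], vendor: str) -> List[Cpe]:
    """Return only the records whose vendor field matches exactly."""
    return [cpe for cpe in map(parse_cpe23, records) if cpe.vendor == vendor]


hits = find_by_vendor(RECORDS, "microsoft")
print(len(hits), "of", len(RECORDS), "records found")
```

An exact match on the vendor field finds only one of the two entries, even though a person would say both belong to the same supplier. That is the naming problem in miniature.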
I wrote about this problem in 2020, soon after I joined the NTIA initiative. However, at that time I agreed with the consensus that there were other fish to be fried before we turned to that one; so I didn’t write any more posts on it until today.
To be honest, when my friends and
I started meeting weekly, we didn’t really intend to tackle the naming problem
right away. But in one of our first few meetings, Tom Pace of NetRise gave a
presentation on a very serious problem
(along with two others, discussed here
and here)
that poses serious risks to intelligent device users; it turns out this is just
one of the many facets of the naming problem.
The week after Tom’s presentation,
the group decided (although I’m not sure “decided” is the right word; “stumbled
into” might be better) to start exploring the naming problem further. In the
four or so weeks since that meeting, to my surprise, we have made a lot of
progress. Here are some of the things we’ve decided on.
The problem can be divided into
short- and long-term aspects. While we explored the long-term problem for a
week or two and made some progress on at least the outlines of a possible
long-term solution, we decided that there are short-term steps we can take
that could lead to improvement in perhaps six months to a year. We’re going to
focus on those steps for the time being.
While it’s tempting to think the
solution is to have a central registry of suppliers or products – and while
this would actually be a true solution to the problem – in practice, that would
require a huge amount of resources, as well as close to endless lobbying,
discussion, arm-twisting, etc. to put it into place and keep it operating. This
simply isn’t going to happen.
The long-term solution has to be a
distributed one, in which different groups are responsible for their own
“namespaces”. What are those groups? They’re the groups responsible for different
types of software. Since about 90% of software components are open source, most
of these groups are open source repositories and package managers, including Maven,
PyPI, npm, etc. These all have their own naming schemes.
However, it turns out that open
source namespaces are the easiest to deal with. This is because each repository
maintains its own namespace – i.e. a list of all the open source products
stored in the repository. In theory, if you want to find the authoritative name
for an open source component and you know the repository it came from (which is
usually determined by the language it’s written in), you can just search there
and find the name.
The hard type of software is
proprietary software – i.e. software that you usually pay for, which comes from
Microsoft, Oracle, etc. One would generally think that the supplier could tell
you the authoritative name for one of their products, but in practice suppliers
will often have multiple names for the same product. For example, the product
might have been acquired by multiple suppliers over time; they each named it
using their own naming scheme; the versions that were produced by Supplier A
will all have completely different names than those produced by Supplier B,
etc.
And, since suppliers themselves
get acquired and divested, the product name (and of course the supplier name)
will often have changed multiple times for that reason. The current owner of a
product will likely have assigned it their own name; just knowing the name that
a previous owner assigned to it won’t necessarily be any help in learning the
current name – of either the product or the supplier.
If we’re not going to have a
universal namespace maintained by an army of registrars, how are we going to find
the name of a component we’re concerned about? While this isn’t necessarily the
solution, there is a naming scheme known as purl
(package URL) that in principle can cover all open source software. The core
fields in the purl are:
- type: the package "type" or package "protocol" such as maven, npm, nuget, gem, pypi, etc. Required.
- namespace: some name prefix such as a Maven groupid, a Docker image owner, a GitHub user or organization. Optional and type-specific.
- name: the name of the package. Required.
- version: the version of the package. Optional.
If you’re not sure of the “real”
name of an open source component, you can in theory use the information you
have about it – these four items – to construct its purl. You should then be
able to look it up in databases like Sonatype’s free OSS Index, the largest open source software
database. And while I’m sure there are various gotchas, you should almost
always be able to positively identify the software you’re looking for.
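As a rough illustration of constructing a purl from those four items, here is a minimal sketch in Python. The function name `build_purl` is hypothetical, and the sketch deliberately leaves out parts of the purl scheme that aren't discussed in this post (qualifiers, subpaths, and the type-specific normalization rules that some package types apply to names). It simply percent-encodes each piece and joins them in the `pkg:type/namespace/name@version` layout.

```python
from typing import Optional
from urllib.parse import quote


def build_purl(pkg_type: str, name: str,
               namespace: Optional[str] = None,
               version: Optional[str] = None) -> str:
    """Assemble a simplified purl from type, namespace, name and version.

    Assumption: no qualifiers, no subpath, no type-specific normalization.
    """
    if not pkg_type or not name:
        raise ValueError("type and name are required")
    parts = [pkg_type.lower()]
    if namespace:
        # A namespace may have several segments; encode each one separately.
        parts.extend(quote(seg, safe="")
                     for seg in namespace.strip("/").split("/"))
    parts.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl


print(build_purl("maven", "log4j-core",
                 namespace="org.apache.logging.log4j", version="2.17.1"))
print(build_purl("npm", "lodash", version="4.17.21"))
```

The first call prints `pkg:maven/org.apache.logging.log4j/log4j-core@2.17.1` and the second prints `pkg:npm/lodash@4.17.21`. The point is that nobody has to assign or register these names: anyone who knows where a component came from can derive the same string.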
Other vulnerability databases also
use purl to identify software, but guess which database doesn’t use it? You’re
right, it’s the NVD! The NVD only uses CPE names, meaning there’s a lot of open
source software that isn’t referenced in the NVD at all. Plus, there’s more
that isn’t referenced correctly, because of various problems constructing CPE
names.
So guess what the next item is on
my group’s agenda? It’s getting the NVD to start using purl, along with CPEs,
to identify the products in the database. Technically, this isn’t very hard to
implement, but politically it’s challenging, because of the various organizations
that have to be involved in different ways. We’re now putting together our
roadmap for doing this, and then we’ll start. I won’t give you an ETA for purl
in the NVD (I’m sure it’s more than one year, and probably more than two years),
but the point is someone is working on it.
As I mentioned, 90% of components
are open source, so what about the 10% that are proprietary? While it’s not
impossible that purl could be extended to them in some way, there are other
naming schemes, like SWID tags, that are already in wide use and might cover
proprietary software. There will
need to be cooperation from the large suppliers like Microsoft (which used to
include SWID tags in all of its products), and there will need to be some group
that handles products from suppliers that are out of business or just don’t
respond to inquiries on this. So, this won’t be a piece of cake, either.
It will definitely be many years
before every software product ever produced, and every product being produced
now and in the future, will in theory be capable of being identified definitively
with a single name (from one of only two or three naming schemes). But solving
the naming problem isn’t an all-or-nothing proposition. The various incremental
steps along the way (say implementing purl in just one part of the NVD) will make
life a lot easier, both for software developers and for the users who are
trying to identify and patch vulnerabilities (as well as other risks) in the
software they use.
Update May 6, 2023: The SBOM Forum proposed at least a 70% solution to the NVD portion of the naming problem in September 2022. We're now discussing with the NIST team which runs the NVD how we can partner with them to make these and other improvements, including possibly getting funding and expertise from private industry and perhaps other sources.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com.