Tuesday, June 7, 2022

Going after the big one


About three months ago, a small group of friends (old and new ones) who are involved in the “SBOM business” got together to start talking about something that is on all of our minds: the small but important set of serious problems that are currently holding back widespread use of software bills of materials, and how these issues might be either mitigated or (preferably) solved altogether.

We all agreed that the problem of SBOM production by software suppliers is much smaller than the problem of use by general organizations, since many of the suppliers are already making extensive use of SBOMs today. But the suppliers are by and large producing SBOMs for their own use, to help them manage their own supply chain cybersecurity risks (i.e. the risks posed by the many components they include in their products). They’re not distributing them to their users, mostly because the users aren’t asking for them.

Why aren’t the users asking for them? There are a number of reasons, especially the lack of low-cost or open source tools and services that will help them identify risks found in components included in the software they use.

However, there’s one problem that overshadows all of the others. It’s one that was sometimes discussed by the NTIA Software Component Transparency Initiative, but which was generally considered to be insoluble – that is, insoluble without a persistent multiyear effort that would involve a lot of…well, lobbying of various government agencies and nonprofit organizations that would need to be involved in any permanent fix. Since there were other more easily soluble problems that could be addressed without such a huge effort, and since there are partial workarounds available that can make the big problem at least tolerable, the general consensus was to let this sleeping dog lie for the moment.

The problem has a number of names because it has many facets, but most people refer to it as “the naming problem”. Briefly, the “CPE” (Common Platform Enumeration) names that are required in order to look up the vulnerabilities (CVEs) that apply to a product in the National Vulnerability Database (NVD) are unreliable in a number of ways. Very often it will be difficult (or impossible) to look up a product, because the user is unable to find the CPE name under which it was entered.
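To make the lookup difficulty a little more concrete, here is a minimal Python sketch that splits a CPE 2.3 formatted string into its named fields. The function name parse_cpe23 and the sample vendor and product values are made up for illustration, and the colon handling is a simplification (it respects a single backslash-escaped colon but not every escaping rule in the CPE specification).

```python
import re
from typing import Dict

# Field order of a CPE 2.3 formatted string, after the "cpe:2.3:" prefix.
CPE23_FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]


def parse_cpe23(cpe: str) -> Dict[str, str]:
    """Split a CPE 2.3 formatted string into a dict of its 11 fields.

    Simplification: splits on colons that are not preceded by a backslash.
    """
    pieces = re.split(r"(?<!\\):", cpe)
    if len(pieces) != 13 or pieces[0] != "cpe" or pieces[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, pieces[2:]))


# Hypothetical names: the same product entered under two vendor spellings.
a = parse_cpe23("cpe:2.3:a:example_vendor:example_product:1.0:*:*:*:*:*:*:*")
b = parse_cpe23("cpe:2.3:a:example_vendor_inc:example_product:1.0:*:*:*:*:*:*:*")

# The product and version agree, but the full names do not match,
# so a lookup under one spelling will not find the other.
print(a["vendor"], b["vendor"], a["vendor"] == b["vendor"])
```

The point of the sketch is that the vendor and product strings are free-form choices made by whoever created the CPE name, so a user has to guess them exactly to get a match.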

More generally, it turns out that simply knowing the title and supplier name of a software product that you own won’t provide you with a universal name that will be valid at all times and in all places. A really striking example is a software supplier you might have heard of, named Microsoft. Well, we may think of that supplier as “Microsoft”. However, there are many different names used with different products that we would normally consider “Microsoft products”.

If you search on “Microsoft” in the NVD, you’ll miss a lot of products that are listed under a different supplier name, like Microsoft Corporation, Microsoft Europe, etc. In fact, someone who works a lot on this issue told me they had asked people at Microsoft what company they worked for, and they received something like 27 different responses. Even more interesting, there is no central location where you can go to find all products produced by the various “Microsoft” entities.  
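Here is a small Python sketch of why a search on a single supplier name comes up short. The records, product names and alias list below are entirely made up for illustration; they are not taken from the NVD, and the two search functions are hypothetical helpers rather than any real NVD interface.

```python
from typing import Dict, Iterable, List

# Made-up records: three products filed under three supplier spellings.
RECORDS: List[Dict[str, str]] = [
    {"vendor": "microsoft", "product": "product_a"},
    {"vendor": "microsoft_corporation", "product": "product_b"},
    {"vendor": "microsoft_europe", "product": "product_c"},
]


def exact_vendor_search(records: List[Dict[str, str]], vendor: str) -> List[str]:
    """Return products whose vendor field matches one name exactly."""
    return [r["product"] for r in records if r["vendor"] == vendor]


def alias_vendor_search(records: List[Dict[str, str]],
                        aliases: Iterable[str]) -> List[str]:
    """Return products whose vendor field matches any known alias."""
    known = set(aliases)
    return [r["product"] for r in records if r["vendor"] in known]


found_exact = exact_vendor_search(RECORDS, "microsoft")
found_alias = alias_vendor_search(
    RECORDS, ["microsoft", "microsoft_corporation", "microsoft_europe"]
)
print(found_exact)  # only the product filed under the exact spelling
print(found_alias)  # all three, but only because the aliases were known
```

The second search only works if somebody already knows every alias, and that list of aliases is exactly what doesn't exist in one central place today.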

I wrote about this problem in 2020, soon after I joined the NTIA initiative. However, at that time I agreed with the consensus that there were other fish to be fried before we turned to that one; so I didn’t write any more posts on it until today.

To be honest, when my friends and I started meeting weekly, we didn’t really intend to tackle the naming problem right away. But, in one of our first few meetings, Tom Pace of NetRise did a presentation on a very serious problem (along with two others, discussed here and here) that poses serious risks to intelligent device users; it turns out this is just one of the many facets of the naming problem.

The week after Tom’s presentation, the group decided (although I’m not sure “decided” is the right word; “stumbled into” might be better) to start exploring the naming problem further. In the four or so weeks since that meeting, to my surprise, we have made a lot of progress. Here are some of the things we’ve decided on.

The problem can be divided into short- and long-term aspects. While we explored the long-term problem for a week or two and made some progress on at least the outlines of a possible long-term solution, we decided that there are short-term steps we can take that could lead to improvement in perhaps six months to a year. We’re going to focus on those steps for the time being.

While it’s tempting to think the solution is to have a central registry of suppliers or products – and while this would actually be a true solution to the problem – in practice, that would require a huge amount of resources, as well as close to endless lobbying, discussion, arm-twisting, etc. to put it into place and keep it operating. This simply isn’t going to happen.

The long-term solution has to be a distributed one, in which different groups are responsible for their own “namespaces”. What are those groups? They’re the groups responsible for different types of software. Since about 90% of software components are open source, most of these groups are open source repositories and package managers, including Maven, PyPI, npm, etc. These all have their own naming schemes.

However, it turns out that open source namespaces are the easiest to deal with. This is because each repository maintains its own namespace – i.e. a list of all the open source products stored in the repository. In theory, if you want to find the authoritative name for an open source component and you know the repository it came from (which is usually determined by the language it’s written in), you can just search there and find the name.

The hard type of software is proprietary software – i.e. software that you usually pay for, which comes from Microsoft, Oracle, etc. One would generally think that the supplier could tell you the authoritative name for one of their products, but in practice suppliers will often have multiple names for the same product. For example, the product might have been acquired by a succession of suppliers over time, each of which named it using its own naming scheme, so the versions that were produced by Supplier A will all have completely different names from those produced by Supplier B.

And, since suppliers themselves get acquired and divested, the product name (and of course the supplier name) will often have changed multiple times for that reason. The current owner of a product will likely have assigned it their own name; just knowing the name that a previous owner assigned to it won’t necessarily be any help in learning the current name – of either the product or the supplier.

If we’re not going to have a universal namespace maintained by an army of registrars, how are we going to find the name of a component we’re concerned about? While this isn’t necessarily the solution, there is a naming scheme known as purl (package URL) that in principle can cover all open source software. The core fields in the purl are:

  • type: the package "type" or package "protocol" such as maven, npm, nuget, gem, pypi, etc. Required.
  • namespace: some name prefix such as a Maven groupid, a Docker image owner, a GitHub user or organization. Optional and type-specific.
  • name: the name of the package. Required.
  • version: the version of the package. Optional.

If you’re not sure of the “real” name of an open source component, you can in theory use the information you have about it – these four items – to construct its purl. You should then be able to look it up in databases like Sonatype’s free OSS Index, the largest open source software database. And while I’m sure there are various gotchas, you should almost always be able to positively identify the software you’re looking for.
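As a rough illustration of constructing a purl from those four items, here is a minimal Python sketch. The function name build_purl is made up, and the sketch only covers the four core fields in the general pkg:type/namespace/name@version layout. It ignores the type-specific normalization rules that the purl specification defines for some package types, so treat it as a starting point rather than a conforming implementation.

```python
from typing import Optional
from urllib.parse import quote


def build_purl(pkg_type: str, namespace: Optional[str], name: str,
               version: Optional[str] = None) -> str:
    """Assemble pkg:type/namespace/name@version from the four core fields."""
    if not pkg_type or not name:
        raise ValueError("type and name are required")
    # The type is written in lowercase.
    parts = [pkg_type.lower()]
    if namespace:
        # A namespace may have several segments; encode each one separately.
        parts.extend(quote(seg, safe="") for seg in namespace.strip("/").split("/"))
    parts.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl


print(build_purl("maven", "org.apache.commons", "commons-lang3", "3.12.0"))
print(build_purl("npm", None, "lodash", "4.17.21"))
print(build_purl("pypi", None, "requests"))
```

Notice that nothing in the purl has to be assigned by a central authority: the type tells you which repository's namespace applies, and the repository has already settled the rest.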

Other vulnerability databases also use purl to identify software, but guess which database doesn’t use it? You’re right, it’s the NVD! The NVD only uses CPE names, meaning there’s a lot of open source software that isn’t referenced in the NVD at all. Plus, there’s more that isn’t referenced correctly, because of various problems constructing CPE names.

So guess what the next item is on my group’s agenda? It’s getting the NVD to start using purl, along with CPEs, to identify the products in the database. Technically, this isn’t very hard to implement, but politically it’s challenging, because of the various organizations that have to be involved in different ways. We’re now putting together our roadmap for doing this, and then we’ll start. I won’t give you an ETA for purl in the NVD (I’m sure it’s more than one year, and probably more than two years), but the point is someone is working on it.

As I mentioned, 90% of components are open source, so what about the 10% that are proprietary? While it’s not impossible that purl could be extended to them in some way, there are other naming schemes like SWID tags that are already in wide use and that might cover proprietary software. There will need to be cooperation from the large suppliers like Microsoft (which used to include SWID tags in all of its products), and there will need to be some group that handles products from suppliers that are out of business or just don’t respond to inquiries on this. So, this won’t be a piece of cake, either.

It will definitely be many years before every software product ever produced, and every product being produced now and in the future, will in theory be capable of being identified definitively with a single name (from one of only two or three naming schemes). But solving the naming problem isn’t an all-or-nothing proposition. The various incremental steps along the way (say implementing purl in just one part of the NVD) will make life a lot easier, both for software developers and for the users who are trying to identify and patch vulnerabilities (as well as other risks) in the software they use.

Update May 6, 2023: The SBOM Forum proposed at least a 70% solution to the NVD portion of the naming problem in September 2022. We're now discussing with the NIST team which runs the NVD how we can partner with them to make these and other improvements, including possibly getting funding and expertise from private industry and perhaps other sources. 

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.
