Thursday, September 15, 2022

At long last…


When I first started attending the meetings of the NTIA Software Component Transparency Initiative (the formal name for what everyone called the “SBOM initiative”) in the summer of 2020, I immediately started hearing about what was called the “naming problem”. I wrote a post about the problem in November 2020 and called it the “one problem that towers over the others”, as far as SBOMs are concerned.

In other words, at that time I considered the naming problem to be the most serious roadblock on the path to widespread (or even narrowspread, to be honest) distribution and use of SBOMs by the general community. However, the group as a whole seemed to have decided that this was too hard a problem to address at the time, given that they were still trying to a) figure out how SBOMs were going to work in general, and b) interest the developer community in producing them. This was still the accepted view when the NTIA Initiative wrapped up at the end of 2021, even though a lot of progress had been made in both of those areas.

I have good news and bad news. The good news is that I no longer consider the naming problem to be the biggest impediment to the spread of SBOMs. But the bad news is that I don’t say this because I think the problem has diminished. Rather, I’ve come to realize there are at least five or six just-as-serious impediments to SBOMs (and a host of not-so-serious impediments, which are impediments nevertheless); in other words, the naming problem now has company, instead of “towering” over the others. There’s progress for you!

Early this year, it seemed to me and a few others that it was time for the private sector to take the lead in addressing head-on the most serious problems for SBOMs. The idea was that we would meet weekly and discuss one problem until we at least understood how it might be solved. Then, if there was something we could do to put in motion the solution to the problem, we would do that. After that…on to the next problem.

Why were we doing this? I’ll admit that we all had selfish reasons. Our livelihoods are all tied in one way or another to the success of SBOMs. We will all benefit if SBOMs start to be widely used outside of the development community (where they’re already widely used for product risk management purposes).

The question came up: Should the participants worry about helping potential competitors? After all, when one of these problems is solved, it will benefit their competitors as well as themselves. My answer to this question (which, to be sure, I didn’t hear often) is that ultimately the SBOM “market” will include every organization on the planet. It’s difficult to imagine any organization of any size in the world today that doesn’t use software (even if it’s just the software in one person’s smartphone); soon it will be literally impossible to imagine that.

Given this, it follows that today’s market for SBOM services and tools is an almost infinitesimal fraction of what it can be. Doesn’t it make sense to focus on expanding the market so it’s at least a significant fraction of what it ultimately will be, rather than worrying about how you and your competitors are going to divide up the very small market that’s there today?

Of course, it made sense to my friends to take the former course. The only question was, which of the five or six serious problems would we start with? I wasn’t sure which one, but I was sure of one thing: it wouldn’t be the naming problem. I honestly thought that this problem would take a year even to understand, another year to develop at least a partial solution, and finally about eight long, grinding years, full of political battles, to see the near-solution implemented.

My friends and I created an informal group called the SBOM Forum and started meeting for an hour every Friday. In one of our earliest meetings, Tom Pace of NetRise described an eye-opening experience he had with the NVD: a device that, according to the NVD, had never had a vulnerability in fact had at least 1,237. Tom found these with a simple scan of just two of the firmware products installed in the device. Moreover, Tom later came to realize that the same device probably has 40,000 unpatched vulnerabilities, even though an NVD search will find nary a one of them.

The problem Tom came across was just one of the many manifestations of the naming problem. The naming problem in general refers to the fact that software products have many names in the many locations in which they’re found. However, the branch of the naming problem that causes the most consternation for the software community is centered on the National Vulnerability Database (NVD), by far the most heavily used vulnerability database in the world. The NVD uses a very problematic naming system called CPE (Common Platform Enumeration), which is the source of most aspects of the problem.

To my horror, after Tom’s discussions with our group, we stumbled into deciding to take on the naming problem first. But I was in for a big surprise: Instead of taking two years to identify the problem and document a solution to at least 70-80% of the problem, we took a little more than four months. We published the document this week.

The six most serious aspects of the problem with CPE are described on pages 5-7 of our document. I used to think that all those aspects were going to require their own specialized measures, which was why I was sure that any “solution” we proposed would be a godawful mess. However, I didn’t realize that our group was lucky enough to have the perfect Hero to slay the foul Naming Beast: Steve Springett, co-leader of the OWASP CycloneDX project and without much doubt the most creative person in the SBOM community.

As we started talking about the problem, Steve quickly realized that a big component (no pun intended) of any solution would have to be purl, a unique type of identifier that is already in wide use, although the NVD doesn’t use it so far (Steve is a maintainer of the purl project, which has already helped our group a lot). So far, purl has been used mostly with open source software, and used very extensively: just one tool, Dependency-Track, is currently used about 27 billion times every month to look up a vulnerability for an open source product in Sonatype’s OSS Index vulnerability database, which is based on purl.

Steve realized that purl’s unique extensibility would allow us to “incorporate” two other identifiers into purl: SWID for proprietary software, and Software Heritage ID for legacy open source software (since our document explains the various naming schemes that our solution will utilize, I won’t explain them here).
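One reason SWHID fits the “intrinsic” model is that it is computed from the software itself rather than assigned by a registry. As a rough illustration (this is not how our proposal embeds SWHID in purl; the document covers that), here is a minimal Python sketch that computes the SWHID of a single file’s contents. It assumes the published SWHID rule for content objects, which reuses Git’s blob hash; the function name `swhid_for_content` is hypothetical.

```python
import hashlib

def swhid_for_content(data: bytes) -> str:
    """Return the core SWHID for a file's raw contents (hypothetical helper).

    Content SWHIDs reuse Git's blob hash: SHA-1 over the header
    b"blob <length in bytes>" plus a NUL byte, followed by the bytes themselves.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()

if __name__ == "__main__":
    # The empty file has the same well-known hash in Git and Software Heritage.
    print(swhid_for_content(b""))
    # prints: swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because anyone holding the same bytes gets the same identifier, no lookup table is needed to map the file to its SWHID; identifiers for directories, revisions and releases follow the same idea but hash more structured data.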

The best thing about purl: for reasons described in the document (be sure to read the discussion of “intrinsic” vs. “extrinsic” identifiers), integrating purl into the NVD will not require any database linking; every open source product that’s in a package manager already has a purl. When searching for vulnerabilities in a particular version of such a product, the user can easily create the correct purl from the package manager name, the product name and the version string, 100% of the time.
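To make the “intrinsic” point concrete, here is a minimal Python sketch of how a tool might assemble a purl (`pkg:type/namespace/name@version`) from the coordinates a package manager already publishes. The helper `make_purl` is hypothetical and deliberately simplified: it ignores qualifiers, subpaths and the type-specific normalization rules in the purl specification, so a real tool would more likely use one of the existing purl libraries.

```python
from typing import Optional
from urllib.parse import quote

def make_purl(ptype: str, name: str, version: Optional[str] = None,
              namespace: Optional[str] = None) -> str:
    """Assemble a minimal Package URL: pkg:type/namespace/name@version.

    Hypothetical helper: qualifiers, subpaths and per-type normalization
    rules from the purl specification are intentionally left out.
    """
    segments = [ptype.lower()]  # the purl type is canonically lowercase
    if namespace:
        # Percent-encode each namespace segment on its own (e.g. "@" -> "%40").
        segments.extend(quote(seg, safe="") for seg in namespace.split("/"))
    segments.append(quote(name, safe=""))
    purl = "pkg:" + "/".join(segments)
    if version:
        purl += "@" + quote(version, safe="")
    return purl

if __name__ == "__main__":
    print(make_purl("maven", "log4j-core", "2.14.1",
                    namespace="org.apache.logging.log4j"))
    # prints: pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1
```

Nothing here is looked up in a database: given the same package manager, name and version, two independent tools arrive at the same string, which is what makes a purl-based search reliable in a way CPE names often are not.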

While purl, SWID and SWHID cover a good portion of the software universe, they don’t cover hardware. However, the NVD includes hardware, and devices also have CPE names (along with the problems belonging thereto). I would have been fine with declaring victory with software and moving on, but Steve insisted we could do hardware as well. He and Tony Turner of Fortress Information Security recognized that the two most widely used hardware identifiers, GTIN and GMN (both part of the “GS1” family of hardware standards), could be integrated with the NVD as well.

However, in order to allow those two identifiers to be used with the NVD, various databases need to be linked with the NVD. This is the only part of our proposal that requires substantial work. I think it would amount to just a few person-months, but since there will be various groups involved, it might take longer than that.

But the work will be worth it. One of the GTIN identifiers is the UPC code. Once our proposal is fully implemented, you should be able to find out about vulnerabilities in a device by entering its UPC code – either by scanning the code itself or entering the numbers by hand – in the NVD. Whoever thought using the NVD could be this much fun?
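Part of what makes a UPC lookup feasible is that a UPC-A code is simply a 12-digit GTIN (GTIN-12) with a built-in check digit. The Python sketch below shows only the client-side step a hypothetical lookup tool might perform before querying: validating the GS1 mod-10 check digit (to catch mis-keyed numbers) and left-padding the code to the 14-digit GTIN form. The function names are illustrative, and nothing here reflects an existing NVD interface; the NVD does not accept GTINs today.

```python
def gs1_check_digit(data: str) -> int:
    """GS1 mod-10 check digit for the digits that precede the check digit."""
    total = 0
    # Weights alternate 3, 1, 3, 1, ... starting from the rightmost data digit.
    for i, ch in enumerate(reversed(data)):
        total += int(ch) * (3 if i % 2 == 0 else 1)
    return (10 - total % 10) % 10

def upc_to_gtin14(upc: str) -> str:
    """Validate a UPC-A code and return it in 14-digit GTIN form (hypothetical helper)."""
    digits = upc.replace(" ", "").replace("-", "")
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a 12-digit UPC-A code")
    if gs1_check_digit(digits[:-1]) != int(digits[-1]):
        raise ValueError("check digit mismatch: the code was probably mis-keyed")
    # Leading zeros add nothing to the weighted sum, so the check digit still holds.
    return digits.zfill(14)

if __name__ == "__main__":
    print(upc_to_gtin14("036000291452"))  # prints: 00036000291452
```

The sample code 036000291452 is a commonly cited UPC-A example; changing its last digit makes `upc_to_gtin14` raise an error instead of sending a bad number to the database.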

I’m hoping that, two years from today, our “proposal” will be fully implemented; Steve has already started discussing this with the group that will be most involved in the implementation: MITRE Corporation. CISA at some point (perhaps not until early next year) will hold a meeting to discuss the question of software naming in the NVD.

What can you – and your organization - do to advance this proposal? I’d say the first thing is to read the document. I’ll admit it’s dense. I’m going to put up two or three blog posts trying to shed light on some of the harder parts. And OWASP will have a webinar soon to discuss the proposal.

Second, you can read the blog post by Steve Springett. At the bottom, he has a “How to Help” section. To what he has there, I would add that we would like to put together a set of short testimonials (1-3 sentences) on why your organization supports this proposal, including how it will enable your organization to do whatever it does better and more efficiently.

For example, the document includes a statement from a very large software maker (identified in the document) to the effect that they regularly create SBOMs for all of their products, but when they look at just the output of the tooling they use to create the SBOMs, only one in five components can be identified in the NVD. Of course, this means they can’t “operationalize” their SBOM production process. While they can often identify many of the remaining components, to do so they, like many other companies and consultants, need to employ AI, fuzzy logic, references to other databases, prayer…whatever will help. Being able to fully automate SBOM production will greatly increase the efficiency of their software development process.

But fully automating the SBOM production (or interpretation) process is out of the question for any organization, until the naming problem is solved.[i] Hopefully, that day is now closer than it’s ever been. 

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] I won’t pretend that the naming problem will ever be “solved”, since there are so many “corner cases” that will never all be addressed in the structure outlined in our proposal. However, it can be close to solved for two important areas: open source components (which of course make up 90% of software components) and hardware devices. The one area where our proposal is weakest is proprietary software, since that will require commercial developers to create SWID tags for all of their products, both new and existing (although they will only have to fill in 4 or 5 fields in each tag).
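For a sense of how small that job is, the Python sketch below emits a bare-bones tag in the ISO/IEC 19770-2:2015 SWID namespace: the product name, version and a unique tag ID, plus an Entity element naming the vendor and its registration ID. The fields shown are an assumption about what a minimal tag contains, not necessarily the exact set our proposal calls for; the function name and example values are hypothetical.

```python
import xml.etree.ElementTree as ET

# ISO/IEC 19770-2:2015 SWID tag namespace.
SWID_NS = "http://standards.iso.org/iso/19770/-2/2015/schema.xsd"

def minimal_swid_tag(name: str, version: str, tag_id: str,
                     vendor: str, regid: str) -> str:
    """Return a minimal SWID tag as an XML string (hypothetical helper)."""
    ET.register_namespace("", SWID_NS)  # serialize SWID_NS as the default namespace
    root = ET.Element(f"{{{SWID_NS}}}SoftwareIdentity",
                      {"name": name, "version": version, "tagId": tag_id})
    # The Entity element records who created the tag and the software.
    ET.SubElement(root, f"{{{SWID_NS}}}Entity",
                  {"name": vendor, "regid": regid,
                   "role": "tagCreator softwareCreator"})
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(minimal_swid_tag("ExampleApp", "3.2.1",
                           "example.com-exampleapp-3.2.1",
                           "Example Corp", "example.com"))
```

Once a supplier ships a tag like this with the product, the identifier is intrinsic to it in much the same way a purl is for an open source package.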

Sure, this will be a little work, and some developers will resist doing this. But the choice isn't necessarily theirs. If they don't put a SWID tag in their product, their users will be at the mercy of CPE names when they try to look up vulnerabilities for the product in the NVD. As with most consumer products, the consumers (whether organizations or individuals) "vote" for their favorite products in the marketplace, using dollars, euros, etc. A supplier that doesn't provide SWID tags may find themselves "losing" a lot of elections!
