Monday, August 19, 2024

The purl of great value

In the OWASP SBOM Forum (as well as the OWASP Vulnerability Database Working Group, a part of the SBOM Forum project), we have begun to focus our efforts on what I think is the most important issue in vulnerability management today: the need to extend the purl identifier so it covers proprietary as well as open source software.

We have made a lot of progress in our most recent two meetings, on August 9 and August 16. This is mainly because Steve Springett (leader of the OWASP Dependency Track and CycloneDX projects, as well as one of the early contributors to purl and still a purl maintainer) was able to join us along with Philippe Ombredanne, the creator of purl (and still leader of the purl project), who like Steve is a member of the SBOM Forum.

The question we’re trying to answer is how purl can be made an (almost) universal software identifier (it’s already by far the leading identifier for open source software). The most important part of this problem is developing a way to extend purl to cover proprietary (“closed source”) software products. You can read about our efforts so far in this recent blog post.

During the meetings on the 9th and 16th, we discussed two proposals for how purl can be extended to closed source software. These aren’t mutually exclusive, since they would allow purls to be created for different collections of closed source software products. One of these proposals will not be difficult to implement. The other will be difficult, but certainly not impossible; the difficulty is mainly from the human interaction point of view, since there are no technical challenges.

The less-difficult (I won’t call it “easy”) proposal is based on an idea that Steve Springett brought up while the SBOM Forum (not yet part of OWASP at the time) was developing this white paper on software identification in 2022. We didn’t include the idea in that paper, but Steve brought it up again when we started discussing how to extend purl two weeks ago.

A little background: purl is based on the concept of a repository for the software binaries; for open source software, this is usually a package manager. The name and version string of a software product can vary widely between package managers, but they never vary within a single package manager. That is, the product/version pair that identifies a product in one package manager is always the same, even though the same product in a different package manager may well have a different name or version string (and even though the binaries might be identical).

This means that someone who wishes to name a particular product/version available in a package manager like Maven Central will be able to do so using just three pieces of information (many other fields are allowed, but they aren’t mandatory):

1. The purl Type, in this case “maven”[i];

2. The name of the product; and

3. The version of the product (i.e., the version string).[ii]

This means that the purl created by the organization that reports the vulnerability (perhaps in a CVE report) should always exactly match the purl created by an end user or developer who wants to find out about vulnerabilities in a software product they use. If the two refer to the same product and version, found in the same package manager, the purl will always be the same, unless the person who created it made a mistake. No central database lookup is required to find the correct purl, as there is for the CPE identifier (and even finding a CPE name through an NVD search doesn’t guarantee that it’s the same product that the CVE applies to – see the discussion on pages 4-6 of our 2022 white paper).[iii]
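To make this concrete, here is a minimal Python sketch of two parties independently building a purl from the same facts and arriving at the same string. The `make_purl` helper is my own illustration, not part of any purl library, and the Maven coordinates are just an example. Note one assumption beyond the three fields listed above: in practice a Maven purl also carries the groupId as a namespace segment, so the sketch accepts it as an optional extra.

```python
from urllib.parse import quote


def make_purl(purl_type, name, version, namespace=None):
    """Build a basic purl string from its core components (illustrative helper)."""
    # The type is conventionally written in lowercase, so normalize it here.
    segments = [purl_type.lower()]
    if namespace:
        # Percent-encode each namespace segment separately.
        segments.extend(quote(part, safe="") for part in namespace.split("/"))
    segments.append(quote(name, safe=""))
    return "pkg:" + "/".join(segments) + "@" + quote(version, safe="")


# The vulnerability reporter and the end user each build the purl on their own,
# with no lookup in a central database.
reporter_purl = make_purl("maven", "commons-lang3", "3.12.0", namespace="org.apache.commons")
user_purl = make_purl("Maven", "commons-lang3", "3.12.0", namespace="org.apache.commons")

print(reporter_purl)
print(reporter_purl == user_purl)
```

Both calls yield `pkg:maven/org.apache.commons/commons-lang3@3.12.0`, so a simple string comparison is enough to match the user's query against the vulnerability report.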

Of course, proprietary software – software developed by commercial organizations like Microsoft™, which is not usually made available for free, at least not through Microsoft’s commercial distribution channels – isn’t available in package managers. However, Steve Springett realized that online software “stores”, like the Apple App Store, Microsoft Store and Google Play, are very similar to package managers: each offers a huge number of products, all of which can be downloaded from a single URL, that of the store.[iv]

Steve suggested that it wouldn’t be hard to add a purl Type for any software store that wishes to participate. (There are a lot of online software stores, although the three I just mentioned are probably the biggest; each probably has millions of individual product/versions available for download.) Since Steve helped Philippe Ombredanne develop purl originally and is now a purl maintainer, he knows what he’s talking about when he says this.

Since Steve is already working with Apple on something else, he will try to at least identify the person there we need to talk to about this. If they’re interested, maybe we could work with them as a guinea pig for this idea. However, we can certainly use multiple guinea pigs, so if you are part of an online software store, or know of a store that might want to work with us, please email me.

The second proposal for extending purl to cover proprietary software is the one that the SBOM Forum described on pages 11 and 12 of the white paper linked earlier, although there were no details on how this would be implemented. The idea (which was Steve’s, of course) was to create a new purl type called SWID. The software supplier will create a SWID tag and distribute it with the binaries for a new product/version. An end user, in order to search for vulnerabilities in a product they use, can create the correct purl, based on the information in the SWID tag.

The full SWID specification is complex, but fortunately creating a purl with the SWID Type is straightforward. In fact, Steve has developed a purl SWID type generator that just requires input of the required fields.
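As a rough illustration of how little is needed, here is a minimal Python sketch that assembles a swid-type purl from a handful of SWID tag fields. This is not Steve’s generator. The `swid_purl` helper, the field layout (tag creator name and regid as the namespace, plus a `tag_id` qualifier) and all of the sample values are my assumptions, based on my reading of the purl type definition; check the purl specification before relying on them.

```python
from urllib.parse import quote


def swid_purl(tag_creator, regid, name, version, tag_id):
    """Assemble a swid-type purl from SWID tag fields (illustrative helper)."""
    if not tag_id:
        # Assumption: the tag_id qualifier is required for this purl type.
        raise ValueError("tag_id must not be empty")
    # Assumption: the namespace is the tag creator name and regid, when present.
    namespace = [quote(part, safe="") for part in (tag_creator, regid) if part]
    path = "/".join(["swid", *namespace, quote(name, safe="")])
    return f"pkg:{path}@{quote(version, safe='')}?tag_id={quote(tag_id, safe='')}"


# Hypothetical values that a supplier might ship in a SWID tag with its binaries.
purl = swid_purl(
    tag_creator="Acme",
    regid="example.com",
    name="Enterprise Server",
    version="1.0.0",
    tag_id="75b8c285-fa7b-485b-b199-4745e3004d0d",
)
print(purl)
```

The point of the sketch is that an end user holding the supplier’s SWID tag could derive the purl mechanically from the tag’s contents, without having to guess at the supplier or product name.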

However, Steve pointed out in last Friday’s meeting that he isn’t sure this specification is really going to be a useful identifier; he needs some software developers to test the spec with their products – i.e., a proof of concept. If you work for a developer who might want to participate in this, please email me.

I was quite happy that last Friday’s meeting produced ideas for two concrete steps – finding a software store willing to test Steve’s first idea, and conducting a small proof of concept of Steve’s second idea – that will move us forward on extending purl to cover proprietary software. But implementing both of these ideas will take a fair amount of work.

Let me repeat why it’s important that we move in this direction. Because the NVD appears to be close to dead in the water, CPE is probably near death as well, since its existence is closely tied to the NVD. As the SBOM Forum explained in our 2022 white paper (pages 4-6), CPE is a very problematic identifier. While I’m not advocating that the 20-something years of CVE/CPE correlations currently found in the NVD be thrown away, I don’t want CPE to be the only show in town much longer. The sooner we can extend purl to proprietary software and have it take over as the primary software identifier worldwide, the sooner CPE will stop being the only option.

I want to point out that I would personally love to lead both the above efforts. However, I’m already donating a large amount of time to the SBOM Forum and the Vulnerability Database Working Group. Being an independent consultant, I can’t donate more than that, but if we can get financial support, I could lead both efforts. Organizations or individuals can give “restricted” donations to support these two efforts through OWASP (a 501(c)(3) nonprofit organization) and have them directed to the SBOM Forum. In many cases, this donation will be tax-deductible.

Please let me know if you or your organization can donate time, funds or both to this project!  


[i] For a complete list of purl types, go here.

[ii] The version string is technically optional, but it is hard to think of many use cases in which it would not need to be included. This especially applies to vulnerability databases (our main concern, of course), since a vulnerability should always be reported only for the product version(s) where it is found. For example, saying that a vulnerability is found just in “Oracle Server”, without specifying the version(s) of Oracle Server, is meaningless.

[iii] For an in-depth discussion of the importance of “intrinsic identifiers” like purl – as opposed to extrinsic identifiers like CPE – see the SBOM Forum’s 2022 white paper.

[iv] The fact that the software in an online store is for sale, whereas the software in a package manager is available for free, doesn’t change the fact that they are functionally identical: they both provide a single download location for many software products, in which the name of each product will not change between versions.
