Tom Alrich's Blog: How can we (really) automate software vulnerability identification?

I probably don’t need to tell you that vulnerability management is important for any organization, public or private, that uses software. If you’re not convinced of this, all you need to do is look at devastating ransomware attacks like WannaCry, NotPetya and Ryuk. All of these exploited known vulnerabilities for which patches were available.

I also probably don’t need to tell you it is impossible to manage vulnerabilities that affect software you use, if you can’t learn about them using frequent, fully automated searches – in which you enter an identifier for a software product and version and immediately discover all recently identified vulnerabilities that affect that product and version.

Yet, that is the situation today: The most widely used vulnerability database in the world is the US National Vulnerability Database (NVD). However, because of the NVD’s currently huge backlog of “unenriched” CVE (vulnerability) records dating from February of this year, any search for vulnerabilities that apply to a particular software product and version will yield on average fewer than one third of the vulnerabilities that have been identified this year for that product and version. Even worse, the NVD provides no warning about this situation.

This is analogous to a doctor that stopped studying new diseases eight months ago and can only diagnose diseases that were identified before then – yet never warns his patients that they could possibly have contracted a disease he hasn’t yet learned about. In both cases, the end user/patient is more likely to be harmed due to not knowing about a vulnerability/disease, than to benefit from knowing about one that they face. Ignorance is not bliss.

However, the NVD’s biggest problem isn’t their current backlog, but the fact that the CPE (“common platform enumeration”) software identifier that is required for all vulnerability lookups in the NVD has many problems - and there is no good solution for them. These problems cause many searches to fail, without any explanation for the failure. Even worse, the user will usually not even be informed that the search has failed.

In 2022, the OWASP SBOM Forum (which I co-lead) published a white paper on the CPE problem in the NVD. The central argument of that paper was that the purl (product URL) software identifier is far superior to CPE, and that CVE.org (the agency of the Department of Homeland Security that oversees the CVE Program) and the NVD should move as quickly as possible toward supporting both purl and CPE. After writing that paper, we submitted a “pull request” to CVE.org to add purl support to CVE records. That request came into effect when the CVE 5.1 specification was approved earlier this year.

However, the 5.1 specification alone didn’t solve the problem. The CVE Numbering Authorities that create CVE records (i.e., report new vulnerabilities in software products, usually products developed by their own organization. For example, Microsoft, Oracle, Red Hat, Schneider Electric and HPE are all CNAs) need to start adding purls to those records, yet few if any have done so thus far. One reason for this is that, even if the CNAs started doing that, the purls would be “all dressed up with nowhere to go”, since neither the NVD nor the CVE.org database currently allows a search using purl.

But there’s an even bigger problem: While purl has literally conquered the world of open source software, it can only be used to identify a tiny percentage of proprietary software products with vulnerabilities today. This means a user of a proprietary software product cannot look that product up in the NVD using purl; instead, they must use CPE. Purl can never be on an equal footing with CPE until it can be used to identify proprietary software products, not just open-source products.

The OWASP SBOM Forum has decided this is an unacceptable situation, especially since purl eliminates most of the problems that affect CPE. We are asking, “What will it take to give purl the capability to identify proprietary (closed-source) software, as well as open-source?”

Fortunately, two very smart individuals are members of the Forum. One is Steve Springett, creator and leader of two of OWASP’s major projects: Dependency-Track (which performs over 20 million automated vulnerability lookups every day - although few of these use the NVD. In fact, D-T mainly uses Sonatype’s OSS Index, an open source vulnerability database that is based on purl) and CycloneDX. The other is Tony Turner, the cybersecurity expert and SANS instructor who co-leads the SBOM Forum with me, along with Jeff Williams of Contrast Security.

Both Steve and Tony are quite familiar with purl, since they are both part of the project team. In fact, in the “early days” of purl (which were less than ten years ago, believe it or not), Steve worked closely on the design with Philippe Ombredanne, the creator of purl (who is also a member of the SBOM Forum). When the SBOM Forum developed our paper in 2022, Steve described two ideas for how to expand purl to identify proprietary software.

Before I explain Steve’s ideas (one of which Tony came up with separately), I need to point out the most important feature of purl: It isn’t based on a centralized “namespace” like CPE is. CPE names are created by contractors who work for the NVD (which is part of NIST). Unless one of those contractors creates the CPE name, it isn’t valid[i].

If a CNA or software user wants to learn the CPE name for a software product, they must use a variety of methods to find it – fuzzy logic, generative AI, prayer, etc. There is a centralized “CPE database”, but it is simply a list of all the CPEs that have ever been created, without any contextual information. As Bruce Lowenthal of Oracle has pointed out, this would be like listing all the words in the Bible in alphabetical order and calling that an English dictionary.

By contrast, purl creates a decentralized namespace. Purl consists of a series of one-word types, which currently mostly refer to package managers for open-source software (e.g., the “maven” type refers to the Maven Central package manager). All you need to know about package managers now is that they’re a single web location from which you can download software, if you know the name of the product and its version string. Since a single product/version pair can never be replicated within the package manager, each pair is unique. Therefore, each package manager has a controlled namespace.

What’s more important is that the combination of three pieces of information – package manager (type), product name within the package manager, and version string - is guaranteed to be unique within the entire purl namespace (i.e. across all purl types). What’s even more important is that the user of the product doesn’t have to query a central database to find out the purl for their product. The user can create the purl on their own, using information they already have.

To create the unique purl, the user just needs to know the type (package manager), and the name and version string in that package manager. For example, the purl for version 1.11.1 of the Python package named “django” in the PyPI package manager is “pkg:pypi/django@1.11.1”.[ii]

Of course, even though the user can always re-create the correct purl for the product, that will only help them identify a vulnerability if the supplier reports vulnerabilities in that product/version to CVE.org[iii] using the same purl; that way, the purl the user enters in the vulnerability database will match the purl on the CVE record. This is how CPE is supposed to work, but since it’s impossible to know for certain what the NVD contractor actually created, there can never be any certainty regarding CPE.

For example, if the contractor used “Microsoft” as the vendor name, that CPE will be different than if they used “Microsoft, Inc.” If a user, who is trying to learn about vulnerabilities in a Microsoft product, creates a CPE according to the CPE specification, they will have to guess which of these is the vendor name used by the contractor, since they will be different CPEs.

What is worse is that if they guess wrong and search on the wrong CPE, they will simply be informed that “There are 0 matching records”. This is the same message they would receive if they had guessed correctly, but there are no vulnerabilities listed in the NVD that apply to that product/version (which might be interpreted to mean the product/version has a “perfect record”). There is no way for the user to learn which is the case.

With purl, as long as the user knows the package manager they downloaded the product from and the product’s name and version string in that package manager, they should always (barring a mistake) be able to create the same purl that the supplier used when they reported the vulnerability. This is why purl has literally conquered the open source software world. In that world, it would be difficult even to say there is a number two software identifier after purl.

Of course, the key to purl’s success is the existence of package managers in the open source world; it would be much more difficult to create a distributed namespace without them. That raised the question in a few creative peoples’ minds: Is there an analogue to package managers in the proprietary software world? At different times, both Steve and Tony realized that the answer to this question is yes: it’s app stores.

Like package managers, app stores (these include the Apple Store - which is in fact five stores - as well as Google Play and the Microsoft Store, although there are many smaller stores as well) do the following:

1. Provide a single location from which to download software;

2. Control the product namespace within the store, so that each product has a unique name; and

3. Ensure that each version string is unique for the product to which it applies. For example, the product named Foo won’t have two versions that have the same version string, say “4.11.6”.

In other words, app stores can probably be treated in purl like package managers are treated today. Each app store will have its own purl type, just like package managers do now. Perhaps the most impressive aspect of adding app stores to the purl ecosystem is that, as soon as a purl type is created for a new store, all the products in that store (for example, Google Play currently contains about 3.5 million products) will instantly have a purl. No NVD employee or contractor (or anyone else) needs to do anything to enable this to happen.

What about proprietary products that aren’t in app stores?

The great majority of proprietary software products are not available in app stores, but from the website of either the developer or a distributor. How can purl be expanded to include them?

In the SBOM Forum’s 2022 paper, we provided a two-paragraph high level description of the purl solution we were suggesting for proprietary software, based on an idea of Steve Springett’s:

1. When a developer releases a new software product or a new version of an existing software product, they will create a short document (called a tag) that provides important information on the product, especially the name, supplier and version string.

2. When a user downloads that product from the developer’s website (presumably after paying for it), the user will also receive the tag; they can use the information in the tag to create the purl for the product (perhaps like the purl described above)[iv].

Since the supplier created the tag in the first place, when they report a vulnerability for the product to CVE.org, they should use a purl that includes the information from the tag. Thus, the purl created by the user will match the one created by the supplier, since they are both based on the same tag. When the user searches a vulnerability database using that purl, they are sure to learn about any vulnerabilities the supplier has reported for the product.

Rather than create our own format for the product information tag, Steve suggested that we use the existing SWID (“software identification”) format. SWID is a specification (codified in the ISO/IEC 19770-2 standard in 2006) that was developed by NIST. It was originally intended to be the replacement for CPE in the NVD and to be distributed with the binaries for a software product. However, it never gained much traction for that purpose. NIST has dropped the idea of replacing CPE with SWID tags in recent years.

Steve realized that, since SWID is an existing standard and a lot of software products have SWID tags now (for example, for about two years, Microsoft distributed SWID tags with all their new products and product versions), it would be better to use that than to create a new format; this was especially important, since the SWID format includes all the information required to create a usable purl. Steve defined a new purl type called “SWID” and got it be added to the purl specification in 2022. He also developed a tool that creates a purl based on information in a SWID tag.[v]

However, our 2022 document didn’t address two important questions:

1. For legacy products, if the supplier didn’t create a SWID tag originally, who should create one now? Presumably, it will be the current supplier of the product, even if the product has been sold to a different supplier in the meantime.

2. How will the user of a product, for which the supplier has created a SWID tag, locate and access the tag? While the supplier could develop a mechanism through which a customer can automatically locate and download the tag from their website, there will soon be a much more universal method for discovering and accessing software supply chain artifacts: the Transparency Exchange API. This is being developed by the CycloneDX project. It will be fully released by the end of 2025, when it will also be approved as an ECMA standard.

How will all of this happen?

The OWASP SBOM Forum believes that, once purl can represent proprietary software products (after the required new types are implemented in the purl specification), the following set of steps[vi] will be set in motion:

1. A “purl expansion working group” – including members from many different types of organizations – will meet regularly to work out required details for expansion of purl to proprietary software products. The group will publish these details (most likely as OWASP documents). The group will also:

a. Recruit operators of app stores to participate in the purl community, along with creating a new purl type for each store and submitting the pull request to add that type to the purl specification; and

b. Conduct tabletop exercises with software suppliers to test the formats and procedures required to implement the purl SWID tag program. This will include testing the purl SWID type definition. This definition was created more than two years ago, but it has only been tested by a few software developers. It needs to be subjected to broader “tabletop” testing.

2. Private and governmental security organizations (including CVE.org) conduct awareness and training activities for the activities described in this paper, especially regarding the development, distribution and use of SWID tags to create purls for proprietary software products. These activities will target CNAs, software suppliers, security tool vendors, vulnerability database operators and larger end user organizations, including government agencies.

3. Suppliers create SWID tags for their products, starting with new products and product versions and continuing with legacy products that do not yet have SWID tags.

4. Suppliers make their SWID tags available through one (or more) of three channels: a) directly to customers, b) in a machine-accessible format on their website, and c) using the Transparency Exchange API, when it is available.

5. After being trained in purl and the new purl types for proprietary software, CNAs start including purls in CVE records. The purls are based on the suppliers’ SWID tags.

6. Vulnerability databases based on CVE records (perhaps including the NVD) advertise the fact that users can now find vulnerabilities in proprietary software using purl. They offer training materials (webinars, videos, website content and hard-copy publications) for users.

7. Users begin to see the advantage of using purl. The primary advantage is that they can deploy fully automated tools for vulnerability identification without having to intervene regularly in the identification process, as is the case with CPE.

8. As suppliers realize their SWID tags are being accessed by their customers, they also see this is giving them a small but tangible marketing advantage over competitors.

9. Purl-based open source vulnerability databases see increased traffic once they start accepting the new purl types, as users realize they now have a “one-stop-shop” for identifying vulnerabilities in both open source and proprietary software.

10. Operators of CPE-based vulnerability databases (especially the NVD) notice that not having to create at least one CPE for every new CVE record saves their staff a lot of time. They also notice that users of those databases are expressing more satisfaction with their experience, since a much higher percentage of the purls they enter are finding their match in the CVE records, than was the case when CPE was the only software identifier available to them.

11. As CNAs begin to realize that users are taking purl seriously, they add more purls, and fewer CPEs, to CVE records.

12. The above set of steps cycles continually, until growth of the overall vulnerability database “market” results in continuous growth of both purl and CPE, with roughly constant “market shares”.

The OWASP SBOM Forum is under no illusion that the above set of steps will be accomplished very quickly, given the current rudimentary state of awareness regarding purl and its advantages. On the other hand, the fact that truly automated vulnerability management is currently almost impossible in the NVD makes it even more important that we start implementing a real solution to those problems, while still hoping that the NVD will eliminate their huge backlog of unenriched CVE records in the coming one or two years.

There is good reason to believe that if we start now, within 3-4 years purl will be widely accepted and used to identify vulnerable proprietary software products in most vulnerability databases. We say this because this will be the second time that purl has been quickly accepted. Here is the story of the first time:

Steve Springett[vii] states that in 2017 and 2018, purl had little traction in the open source world, because it was so new. Steve’s Dependency Track and CycloneDX projects, along with Sonatype’s OSS Index vulnerability database, were a few early adopters of purl in 2018. Yet, purl was in wide use in the open source community by 2022. Steve points out that today, purl has been adopted by “most SCA vendors, hundreds of open source and proprietary tools, and multiple sources of vulnerability intelligence.” I would add that purl is used today by literally every major vulnerability database worldwide, other than the NVD and databases based on NVD data. Indeed, purl has “won the war”, when it comes to identifiers for open source software.

Of course, the world of proprietary software is quite different from the open source world, since the participating organizations are sometimes true competitors; that is not often the case with open source software. However, once new purl types are developed to allow identification of proprietary software, it should not require a heavy lift for databases now based on purl to accommodate those new types. This means that, soon after the new purl types for proprietary software have been incorporated into the purl specification, big purl-based vulnerability databases like OSV and OSS Index, which today only support open source software, may quickly support vulnerabilities in proprietary software products as well.

Looking ahead

The OWASP SBOM Forum has recently published a white paper that discusses all the above topics in more detail. It is available for download here. We are actively discussing these topics in our meetings and welcome new participants. Our meetings are every other Tuesday at 11AM ET and every other Friday at 1PM ET. To receive the invitations for these meetings, email tom@tomalrich.com.

We currently expect this to be a two-phase project:

1. Planning and Design. This will consist of just the first of the above steps. We believe this phase will require no more than 4-5 months of bi-weekly meetings (plus online asynchronous work between meetings, including soliciting participation by app stores and software suppliers and conducting the tabletop exercise to test adequacy of the SWID purl type). This phase will require a modest budget for coordination of those activities.

2. Rollout. All steps listed above other than the first are included in this phase. This phase can be summed up as “training and awareness”. While training and awareness activities are not inherently difficult, they require large numbers of people to be involved, both on the “trainer” and “trainee” sides. We estimate that this phase will require five to ten times the amount of resources required for the first phase.

We estimate that the first phase will require approximately $50,000 to $100,000 in funding, although we are willing to start work on this phase with less than that amount committed. Since the resources required for the second phase will depend on the design developed in the first phase, we will wait until at least a high-level design is available during the first phase, before estimating the second phase and seeking funding.

We invite all interested parties, including software developers, software security service and tool providers, and end users of software of all types, both to participate and to donate to this effort. Donations (both online and directly) over $1,000 can be made to OWASP and “restricted” to the SBOM Forum[viii]. Any such donations are very welcome (OWASP is a 501(c)(3) nonprofit organization, meaning many donations will be tax-deductible. However, it is always important to confirm this with tax counsel). To discuss a donation of any size, please email tom@tomalrich.com and tony@locussecurity.com.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

[i] Due to the NVD’s current problems in creating CPEs, CISA has been designated an “Alternate Data Provider”, who can create authoritative CPEs that have the same status as those created by the NVD contractors. CISA’s “Vulnrichment” program has created many CPEs since their designation, but these are just a fraction of the number required to reduce the backlog.

[ii] Every purl begins with the prefix “pkg”. This prefix is not needed today, but will be in the future.

[iii] Many open source vulnerabilities are not reported to CVE.org, but instead to a vulnerability database like GitHub Security Advisories (GHSA). Many of these databases share their vulnerabilities with the OSV database (managed by Google), where they are displayed using the OpenSSF Vulnerability Format. Most OpenSSF vulnerabilities can be mapped to the CVE format.

[iv] Of course, the user should not have to create the purl manually; the process can be completely automated within a vulnerability management tool.

[v] Steve’s tool requires the user to manually input data from the SWID tag, but the code can of course be adopted for automated use by a vulnerability management tool.

[vi] These steps aren’t a “chain”, since they will ideally happen simultaneously, at least after an initial “startup” period. In general, each step listed depends on the previous step being accomplished.

[vii] In an email on October 19.

[viii] OWASP reserves ten percent of each “restricted” donation to fund administration. That is, OWASP doesn’t simply pass the donation through to the project team – in this case, the SBOM Forum. Instead, as the project team performs work or incurs other expenses on the project, they submit invoices to OWASP, which determines whether they are appropriate before paying them.

Tom Alrich's Blog

Sunday, October 20, 2024

How can we (really) automate software vulnerability identification?

No comments:

Post a Comment

Get new posts by email: