Monday, October 14, 2024

How can we truly automate software vulnerability identification?

Given the proliferation of serious software vulnerabilities like the log4shell vulnerabilities in the log4j library, software vulnerability management is an important component of any organization’s security program. Successful vulnerability management starts with successful vulnerability identification. This requires that:

1.      The supplier of the software reports vulnerabilities they find in their products. These reports are incorporated into vulnerability databases, especially the US National Vulnerability Database (NVD). Almost all software vulnerabilities are reported by the supplier of the software, not a third party.

2.      Later, users of the software can search the NVD for new vulnerabilities that apply to software products they use. Learning about these vulnerabilities enables the user to coordinate with the suppliers of those products, to learn when they will patch the vulnerabilities and encourage them to speed up patches for the most important vulnerabilities.

However, one important assumption underlies these two requirements: that the user will always be able to learn about vulnerabilities that apply to a product they use when they search a vulnerability database like the NVD. The user will only be able to do this if they know how the supplier has identified the product in the database.

It might seem like the solution to this problem is obvious: The supplier will report the vulnerability using the name of the product and the user will search for that name. The problem is that software products are notorious for having many names, due to being sold under different brands or in different sales venues, acquisition by a different supplier, etc. Even among the employees of a large software supplier, their own products may be known by different names. Trying to create – and especially maintain – a database that lists all the names for a particular software product would be hugely expensive and would ultimately fail, due to the rapidly increasing volume of new software products.

Given there will never be a definitive database of all the names by which a single software product is known, how can a user be sure their search will find the correct product in a vulnerability database? There needs to be a single machine-readable identifier for the product, which the supplier includes in the vulnerability report and the user searches for in the vulnerability database. We have already ruled out the idea of a centralized database that lists all the possible names for a single software product. How can we accomplish this goal without a central database?

The solution is for the identifier to be based on something that the supplier will always know before they report a vulnerability for their product, and that the user will also know (or can easily learn) before they search for that product in a vulnerability database. A good analogy for this is the case of the formula for a chemical compound.

If a chemist has identified a compound whose molecules consist of two hydrogen atoms and one oxygen atom, the chemist will write it as “H2O” (of course, the “2” is normally written as a subscript). Every other chemist will recognize that as water. Similarly, a compound of one sodium and one chlorine atom is NaCl, which is table salt. Note that all chemists can create and interpret these identifiers, without having to look them up in a central database. A chemist who reads “NaCl” always knows which compound that refers to.

There is a software identifier that works in the same way. It’s called “purl”, which stands for “package URL”. It is in widespread use as an identifier in vulnerability databases for open source software that is made available for download through package managers (these are the primary locations through which open source software is made available for download, although not all open source software is available in a package manager).

To create a purl for an open source product, the supplier or user only needs to know the product name, the version number (usually called a “version string”) and the package manager name (such as PyPI). Because every product name/version string combination will always be unique within one package manager (although the same product/version might be available in a different package manager), the purl that includes those three pieces of information is guaranteed to be unique; it is also guaranteed always to point to the same product, since the combination of product name and version string will never change for that product/version.

For example, the purl for version 1.11.1 of the package named “django” in the PyPI package manager is “pkg:pypi/django@1.11.1”. If a user wants to learn about vulnerabilities for version 1.11.1 of django in the pypi package manager, they will always be able to find them using that purl. If they upgrade their instance of django to version 1.12.1, they will search for “pkg:pypi/django@1.12.1” (the “pkg” field is found in all purls). Since the supplier will always use the same purl to report vulnerabilities, the user can be sure their search will find all reported vulnerabilities for that product/version.

Besides purl, the only vulnerability identifier in widespread use is CPE, which stands for “Common Platform Enumeration”. Without going into a lot of detail, CPE is the identifier used in the National Vulnerability Database. It was developed more than 20 years ago by the National Institute of Standards and Technology (NIST), which operates the NVD.

A CPE is created by a NIST employee or contractor and added to a vulnerability (CVE) record in the NVD. Unfortunately, there is no way that anyone can predict with certainty the CPE that this person will create. Some of the reasons why this is the case are described on pages 4-6 of the OWASP SBOM Forum’s 2022 white paper titled “A proposal to operationalize component identification for vulnerability management”.

Currently (as of the fall of 2024), there is an even more serious problem with CPE, in that since February the NVD staff has drastically reduced the number of CPEs it creates. The result is that over two thirds of new CVE records entered in 2024 do not have a CPE name attached to them. This makes those CVEs invisible to automated searches using a CPE name. A user that searches with a CPE name today may potentially never learn about two thirds of the vulnerabilities that apply to their product/version.

The upshot of this situation is that, if truly automated software vulnerability management is going to be possible again, purl needs to be the default software identifier, both in CVE records and the National Vulnerability Database. While most of the groundwork for achieving this result has already been laid, there remains one big obstacle: Currently, there is no workable way for purl to identify proprietary software. Since the majority of private and public sector organizations in the world rely primarily on proprietary software to run their businesses, this obstacle needs to be removed, so that users of proprietary software products can easily learn about vulnerabilities present in those products.

The OWASP SBOM Forum has identified two methods by which the purl specification can be expanded to make vulnerabilities in proprietary software products as easily discoverable as are vulnerabilities in open source products today. We will soon be starting a working group to address this problem. If you would like to participate in that group and/or provide financial support through a donation to OWASP, please email me.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. 

No comments:

Post a Comment