Saturday, November 4, 2023

CISA provides thoughtful answers to useless questions

 

In college, I learned three important principles (as well as an important corollary) regarding writing papers:

1.      A paper must always ask and answer a question. Otherwise, it’s at best an interesting narrative. While it might get a B, it won’t get an A.

2.      The paper will be considered a failure if it doesn’t answer the question it asks, no matter how well written it is.

3.      If you get most of the way through a paper and you realize to your dismay that you aren’t answering the question you asked at the beginning, you have two options. The first is to throw out what you’ve written so far and answer the original question, even though that will inevitably require a lot more work than you had planned. The second option is to throw out your original question and ask one that you know you can answer easily; preferably, it will be the question that you in fact were answering in what you’ve written so far. That way, you can still finish the paper in the time you allotted for it. Doesn’t that sound more appealing?

But the most important lesson I learned was a direct corollary of the third truth: You should always make sure that the question you ask in the beginning of your paper is one that you’ll be able to answer easily. That way, nobody can accuse you of not achieving the goal you set out to achieve. Even though you will still have to spend a lot of time gathering citations, etc., you won’t have to spend much time…you know…thinking about what you’re going to write. If you ask the right question, the paper will write itself.

Which brings me to CISA’s recently published white paper on “Software Identification Ecosystem Option Analysis”. This paper is almost a textbook example of the above three principles, and especially of the corollary. You may know that CISA has been promising a white paper to address the software naming problem for at least a year. But the thing about the naming problem is that it’s a really…hard…problem. It’s not something that can be solved with a paper or two. Had the paper been titled something like “How can we solve the naming problem?”, it would have violated principle number 2 as well as the corollary: it would have set the team that developed the paper up for failure, since they could never have provided even a partial answer to that question.[i]

But the people at CISA are nothing if not savvy. By stating that the paper was an “option analysis”, they almost guaranteed it would be considered a success, since an analysis of options can never be wrong; moreover, it can never be considered a failure. If you say you produced an “option analysis” about naming, and somebody points out five years from now that the naming problem is still around in some form (which it inevitably will be, although hopefully in diminished form), you can just say to them, “Of course, the naming problem is still here. We just provided an analysis of the options, but we never said we were going to solve the naming problem. Other people need to look at our analysis and decide which options to pursue.” Or something like that.

Indeed, the last page of CISA’s document (page 22) includes the stirring statement, “…the options can serve as starting points to refine the merits of various operational models…” In other words, the white paper is already successful, because it will help researchers refine their models. Who dares to say this isn’t success?

However, I happen to think that CISA’s paper could have been much more successful if the writers had taken time up front to ask themselves, “Why does the naming problem need to be solved?” Obviously, the fact that there isn’t a consistent, universal naming scheme for software products by itself shouldn’t keep anybody awake at night. What is the real problem this causes?

While software naming issues show up in many areas – e.g., cataloguing software products of different types – there is one area where the naming problem is causing significant and ongoing harm: software vulnerability management. Specifically, the naming problem makes it difficult – and often impossible – for a user organization to learn about vulnerabilities that are present in a software product it uses (whether in the product itself or in one of its components).

This is best illustrated in the case of CPE names in the NVD, discussed on pages 4-6 of the SBOM Forum’s (now the OWASP SBOM Forum’s) white paper on solving the naming problems in the NVD. If a software product can’t be accurately identified in a vulnerability database, the user will never be able to learn about the vulnerabilities they need to remediate (most likely by regularly contacting the supplier’s help desk until the supplier releases a patch for the vulnerability).

Thus, if I had been asked, I would have suggested that the CISA paper ask and answer the question, “How can we make it more likely that users trying to learn about vulnerabilities in the software they use will be successful?” Answering this question would certainly involve examining the different identifiers available and how they can be properly utilized in vulnerability management, but it would also involve other problems like the structure and governance of vulnerability databases.

Unfortunately, this wasn’t the question that the CISA team asked – and answered – in their paper. What was the question they actually answered? While it was never stated directly, I would summarize it as the following:

“Any solution to the naming problem requires a single global uber-identifier, into which all other software identifiers can be mapped. What is that identifier?”

On the last page (page 22), they give their answer: There are three options that “can serve as starting points to refine the merits of various operational models.” They are:

1.      OmniBOR, which used to be known as GitBOM. Ed Warnicke, co-founder of GitBOM, provided a really interesting presentation to one of the NTIA working groups in (I believe) 2021, and I got quite excited after seeing it. The idea behind GitBOM was really intriguing, although it was clearly focused almost entirely on open source software. I’m sure there was some way that proprietary software could be handled by GitBOM, but it’s hard to call an identifier “universal” if it treats the software that runs probably 99% of organizations worldwide as kind of a second-class citizen. And if OmniBOR/GitBOM is restricted to just open source software, it immediately runs into the problem that one identifier, purl, has already conquered the open source world.

2.      CPE, the identifier on which the National Vulnerability Database (NVD) is based – as well as a small number of other databases that are based on the NVD but purport to make up for some of the NVD’s problems. To be fair, the CISA team doesn’t give CPE a whole-hearted endorsement. This is a good thing, since, far from being a solution to the naming problem, CPE is probably the biggest contributor to it.

3.      purl, which is now undoubtedly the most widely used software identifier worldwide and is very unlikely to be dislodged from that post. This is evidenced by the fact that I don’t know of any vulnerability database, other than the NVD and its derivatives, that is not based on purl. On the other hand, the vulnerability databases that use purl are all 100% focused on open source software. Since probably at least 90% of software products worldwide are open source (including at least 90% of components in proprietary software), this shows that purl is already close to being a universal identifier. But there’s no denying that it doesn’t now address proprietary software[ii] and that it doesn’t even fit all open source software perfectly.

However, CISA’s paper doesn’t even ask the real questions, which are a) whether it would ever be possible to have a truly universal software identifier (which I doubt, at least in most of our lifetimes), and b) whether it’s even necessary to have a universal identifier to address the naming problem.

Of course, b) is the really interesting question. I used to think it would be impossible to have multiple software identifiers in a single database. Thus, the NVD and its imitators are based on CPE, while the databases that focus on open source are based on purl. Yea verily, never the twain shall meet – or at least that’s what I used to think.

However, I now realize that a single vulnerability database can easily utilize multiple software identifiers. For example, the OWASP SBOM Forum’s 2022 paper on the naming problem advocated incorporating purl identifiers into the NVD, but it also acknowledged that CPE identifiers will need to remain in the database for years, since there is such a wealth of information embedded with the CPEs now (more specifically, embedded in the CVE reports that call out those CPEs). While it’s nice to fantasize about transferring information now in CPEs to whatever will replace CPEs later on, the resources necessary to do this on the large scale that would be required are simply not available. For the foreseeable future, both CPE and purl will remain in active use, often in the same database, each including whatever data is now included with them.

There’s another identifier that is also available in different flavors: vulnerability identifiers (e.g., CVE, Google OSV, GitHub security advisories or GHSA, etc.). As with software product identifiers, the different vulnerability identifiers will need to continue to be available, often in the same database.

Why do I say that both software and vulnerability identifiers need to continue to be used as they are today? After all, the CISA paper repeatedly discusses the need to “harmonize” the different software identifiers, meaning (of course) that they should be consolidated into one of the three identifier options listed at the end of the paper.

I used to agree with this idea, since it seemed out of the question that it would be advantageous to combine multiple identifiers for the “same” thing (e.g., software products or vulnerabilities) in one database. Why not choose one uber-identifier and map each name in the other identifiers to that one?

This would make sense if the items identified by the different identifiers were truly interchangeable. For example, it would make no sense to have different identifiers for different types of animals; they can all have a name that fits into a single taxonomy, which was initially developed by Linnaeus.

However, there are reasons why the different software identifiers can’t be easily consolidated into one. For example, take the case of CPE and purl. They’re both software identifiers, but what do they identify? CPE is a centrally administered identifier. CPE names are created by members of the NIST NVD team when a CVE report is submitted that refers to a software product for which the submitting organization (usually a proprietary software supplier that is also a CVE Numbering Authority or CNA) does not know of an existing CPE name. CPEs were designed with proprietary software suppliers in mind, since most CVE reports are submitted by such a supplier.

On the other hand, purl isn’t centrally administered at all, and it would make no sense to change it to be centrally administered (as the CISA paper suggests should happen). The whole point of purl is that the person who wants to learn about vulnerabilities in an open source software (OSS) product that they utilize (or an OSS component of a product they utilize) just needs to know three things about the product: the package manager (or similar ecosystem) from which they downloaded the product, the name of the product in that package manager, and the version that they downloaded (other information may be included, but is optional).

If they have these three pieces of information, the user can create a purl that should always match the purl for that same product (from the same package manager) in a vulnerability database. The fact that no centralized name database is required makes purl the ideal identifier in the open source world, which changes very rapidly and doesn’t rely on paid maintainers. Obviously, if a centralized database were required, someone would have to come up with a huge chunk of change to finance that effort.
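To make the three-piece recipe concrete, here is a minimal Python sketch of how a user might assemble a purl from what they already know. It assumes the general purl layout pkg:type/namespace/name@version; the function name build_purl is my own invention for illustration, and the sketch deliberately skips the per-ecosystem normalization rules and the optional qualifiers and subpath that the real purl specification defines.

```python
from typing import Optional
from urllib.parse import quote


def build_purl(pkg_type: str, name: str, version: str,
               namespace: Optional[str] = None) -> str:
    """Assemble a purl from facts the user already has.

    Hypothetical helper, not part of any purl library. It applies only
    generic percent-encoding and lowercases the type; ecosystem-specific
    normalization is intentionally left out.
    """
    parts = [pkg_type.lower()]
    if namespace:
        # A namespace may itself contain several '/'-separated segments.
        parts.extend(quote(seg, safe="") for seg in namespace.split("/"))
    parts.append(quote(name, safe=""))
    return "pkg:" + "/".join(parts) + "@" + quote(version, safe="")


# The three required facts: ecosystem, name, version.
npm_purl = build_purl("npm", "lodash", "4.17.21")

# Some ecosystems also carry a namespace (the optional extra information).
maven_purl = build_purl("maven", "log4j-core", "2.14.1",
                        namespace="org.apache.logging.log4j")
```

The point of the sketch is that nothing here consults a central registry: two people who downloaded the same package from the same package manager should arrive at the same string on their own.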

Since purl requires knowledge of the package manager from which the software was downloaded, and since one open source project can be available in multiple package managers with slightly different code, a single project can have multiple purls. And if the project consists of multiple modules (e.g., a library), each of those modules can have its own purl as well. Yet there can be only one CPE for the project (product). This means there’s no good way to map a single CPE to a single purl, unless some arbitrary decision is made about which purl maps to the CPE[iii].
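The mismatch is easier to see as data. The Python sketch below uses entirely made-up identifiers (the vendor, product and package names are hypothetical, as are the function names) to show that the relationship between a CPE and the purls for the same project is one-to-many, so a one-to-one lookup table can only be filled in where exactly one purl exists.

```python
from collections import defaultdict

# Hypothetical identifiers, invented purely for illustration.
CPE = "cpe:2.3:a:example_vendor:example_lib:1.2.0:*:*:*:*:*:*:*"

# The same project as distributed through two package managers,
# plus one of its modules published as a separate package.
observations = [
    (CPE, "pkg:pypi/example-lib@1.2.0"),
    (CPE, "pkg:npm/example-lib@1.2.0"),
    (CPE, "pkg:maven/com.example/example-lib-core@1.2.0"),
]


def group_purls_by_cpe(pairs):
    """Collect every purl that has been associated with each CPE."""
    mapping = defaultdict(set)
    for cpe, purl in pairs:
        mapping[cpe].add(purl)
    return mapping


def unambiguous(mapping):
    """Keep only the CPEs that correspond to exactly one purl."""
    return {cpe: next(iter(purls))
            for cpe, purls in mapping.items()
            if len(purls) == 1}


by_cpe = group_purls_by_cpe(observations)
one_to_one = unambiguous(by_cpe)
```

Here by_cpe holds three purls under the single CPE, and one_to_one comes back empty: any single purl chosen for that CPE would be the arbitrary decision mentioned above.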

Let’s go back to the question I would have asked: “How can we make it more likely that users trying to learn about vulnerabilities in the software they use will be successful?” The answer now seems simple to me: We need to develop a vulnerability database that can accept queries made with any major software identifier (e.g., CPE or purl) or any major vulnerability identifier (e.g., CVE or OSV), and return whatever results the user would receive today from a database designed around that identifier. For example, a user who queries the database for CVEs that correspond to a particular CPE name would receive the same response they would have received had they queried the NVD using that same CPE name.

In fact, the new central database might not, strictly speaking, be a database at all but more of a “switchboard” that would relay each query to an appropriate “client” database (or even multiple client databases), using an AI-based front-end module to determine how best to reformulate and re-route each query. It would then return to the user whatever response it received from the client databases. While this approach would probably not initially yield any more information than the user would have received had they queried the client databases individually, it would at least centralize (and perhaps standardize) vulnerability queries. As time went on and additional funding became available, more efforts to harmonize and clean up the data (including the CVE reports in CVE.org) could be made.
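As a rough illustration of the switchboard idea, the Python sketch below routes each query to whichever client databases have registered for that kind of identifier and passes their answers back unchanged. Everything in it is an assumption made for the example: the class and function names are hypothetical, the client databases are in-memory stand-ins with invented records, and simple prefix matching takes the place of the AI-based front end, which is not modeled here.

```python
from typing import Callable, Dict, List

# A client database is modeled as any function from a query string
# to a list of result strings.
Backend = Callable[[str], List[str]]


def identifier_kind(query: str) -> str:
    """Guess the identifier type from its prefix (a deliberate simplification)."""
    lowered = query.strip().lower()
    for prefix, kind in (("cpe:", "cpe"), ("pkg:", "purl"),
                         ("cve-", "cve"), ("ghsa-", "ghsa")):
        if lowered.startswith(prefix):
            return kind
    raise ValueError(f"unrecognized identifier: {query!r}")


class Switchboard:
    """Relays each query to the client databases registered for its kind."""

    def __init__(self) -> None:
        self._routes: Dict[str, List[Backend]] = {}

    def register(self, kind: str, backend: Backend) -> None:
        self._routes.setdefault(kind, []).append(backend)

    def query(self, q: str) -> List[str]:
        results: List[str] = []
        for backend in self._routes.get(identifier_kind(q), []):
            results.extend(backend(q))  # answers are passed through as-is
        return results


def dict_backend(table: Dict[str, List[str]]) -> Backend:
    """Stand-in for a real client database: a plain dictionary lookup."""
    return lambda q: list(table.get(q, []))


# Invented records; none of these identifiers refer to real entries.
cpe_db = dict_backend({
    "cpe:2.3:a:example_vendor:example_lib:1.2.0:*:*:*:*:*:*:*": ["CVE-2099-0001"],
})
purl_db = dict_backend({"pkg:npm/example-lib@1.2.0": ["OSV-2099-1"]})

board = Switchboard()
board.register("cpe", cpe_db)
board.register("purl", purl_db)
```

A CPE query reaches only the CPE-based client and a purl query reaches only the purl-based client, so each user gets the answer they would have received by going to that database directly. Nothing is harmonized along the way, which matches the modest starting point described above.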

In past months, I’ve advocated the idea of a Global Vulnerability Database, meaning one that’s sourced and supported globally. However, I’m now expanding my understanding of “global” to include the ability to accept queries for multiple software and vulnerability identifiers. I’m also giving up my idea that the GVD could be built on top of an existing database like the NVD; it will have to be built from scratch, but it can well incorporate data and features from the existing vulnerability databases – and, of course, the existing databases would continue to do what they do now, since they would now, at least for many queries, become clients of the GVD.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.


[i] The OWASP SBOM Forum – at the time just the SBOM Forum – produced a paper in September 2022 that unabashedly aimed directly at the naming problem. We obviously weren’t following the lesson I’d learned in college, because we called the document “A Proposal to operationalize component identification for vulnerability management”. This paper was a direct assault on the naming problem, or at least the most prominent manifestation of this problem: CPE (Common Platform Enumeration) names found in the National Vulnerability Database (NVD). Not surprisingly, the paper didn’t lead to the CPE problem being solved, but it has proven to be very useful in discussions with various groups like the NVD team at NIST and the team at ENISA that is building a vulnerability database from scratch – in compliance with Article 12 of the EU NIS 2 cybersecurity directive, which was adopted in 2022.

[ii] The SBOM Forum’s paper includes a short description, on pages 12 and 13, of our idea for how to identify proprietary software using purl; there are certainly many other ways to do that. But it’s also true that purl identifiers for proprietary (or “closed source”) software will never be as robust as those for open source. 

[iii] If the person who is mapping CPEs to purls knows from which package manager the software on which a CPE is based was downloaded, they could in theory map the CPE to the purl. But having that knowledge will always be the exception, never the rule.
