Sunday, May 19, 2024

Clarifying the Global Vulnerability Database

Brian Martin recently put up a long, thoughtful post on LinkedIn that critiqued my post from last November about what I called (and still call) the Global Vulnerability Database (GVD). That post was one of many I’ve written that started out being about one thing (in this case, a GVD that would be a single database) but evolved as I wrote it into something else (something that isn’t a single database at all, but more of an “intelligent switching hub” for evaluating vulnerability database queries and routing them among the most appropriate existing databases).

This evolution didn’t bother me, since one of the advantages of calling something you write a blog post, rather than an essay or a white paper, is that you don’t have to go back and rewrite the whole post when such an evolution occurs – people expect inconsistency (and I deliver it, I’m proud to say). I figured that it wasn’t worth rewriting the post unless it drew a lot of interest, so I let it stand as I’d written it.

It didn’t draw much interest until Brian wrote his post a couple of weeks ago. Brian has a lot of experience with vulnerability databases and has a good following for his thoughts, so his post drew a lot of comments.

When I read Brian’s post, I realized that a number of his objections wouldn’t have been valid if I’d rewritten the post so that it focused on the single idea of a switching hub, not an actual database – although I presume I won’t be thrown in jail if users think of it as a single database. The point is that it should be possible in 2024 to field diverse queries – regarding different types of products (open source software, proprietary or “closed source” software, and intelligent devices), different types of vulnerabilities (CVE, OSV, GitHub Security Advisories, etc.), and different identifiers (CPE and purl) – and have an intelligent engine that decides, for each query, which is the best database or combination of databases to resolve the query.

Once that decision has been made, the appropriate queries will go out to the different individual databases (CVE.org, the NVD if it still exists, OSV, OSS Index, VulnCheck, VulnDB, etc.). Then, the results will be processed and returned to the user as an answer to their question. There would need to be a lot of intelligence behind both of these steps, since neither will be easy and both will require quite a lot of prior knowledge. For example, does a report in OSS Index that a particular software product – identified with a purl – is affected by a CVE have the same status as a report in CVE.org that the same purl is affected by the same CVE? The two reports will have been derived very differently.
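The first of those two steps, deciding where a query should go, can be sketched in a few lines. This is only an illustration: the routing table, the Query class and the function names are hypothetical, and the database assignments are assumptions made for the example, not a statement of what each database actually supports.

```python
from dataclasses import dataclass

# Hypothetical routing table: which databases to ask for each identifier type.
# These assignments are illustrative assumptions only.
ROUTES = {
    "purl": ["OSV", "OSS Index"],
    "cpe": ["CVE.org", "NVD"],
}


@dataclass
class Query:
    identifier: str  # a purl or a CPE name supplied by the user


def identifier_type(identifier: str) -> str:
    """Classify an identifier by its scheme prefix."""
    if identifier.startswith("pkg:"):
        return "purl"
    if identifier.startswith("cpe:"):
        return "cpe"
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def route(query: Query) -> list[str]:
    """Return the databases that should receive this query."""
    return ROUTES[identifier_type(query.identifier)]


print(route(Query("pkg:pypi/django@4.2.1")))
```

A real hub would need far more than a prefix check, and the harder second step (weighing and reconciling answers that were derived in different ways) is not shown here at all.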

To rectify my sin of last November in not rewriting my post before I put it up, I put up a new post on May 9. This makes a single, coherent statement, but still doesn’t include all the detail (such as what’s in the paragraphs above) that I would include if I had the time to write a white paper. I think it answers many of Brian’s questions, such as whether the GVD would be hugely expensive and require legions of volunteers. That would be the case if we tried to put up a single “harmonized” database that maintains data from all existing vulnerability databases, which is why last November I switched in mid-post to the idea of a switching hub.

However, Brian brought up one important issue that I want to address now. He brought up others, which I hope were mostly addressed in my May 9 post; if there are other issues that you still want me to address, Brian, please let me know.

Is CPE a dead end?

Brian quoted a sentence from my November post, “Specifically, a new identifier is needed for proprietary software, since I (and others) regard CPE as a dead end, even though it was pioneering in its time.” Brian ended up basically agreeing with that statement, but his reasoning isn’t mine, and I’d like to explain mine.

Pages 4-6 of the OWASP SBOM Forum’s 2022 paper on how to fix the naming problem in the NVD (which is still valid today, even though the NVD now seems to be on its way to extinction or worse: irrelevance) describe some serious problems with CPE. However, they don’t address what I consider to be the most serious problem: the fact that there will never be a way to populate fields like vendor and product name in a way that will be universally agreed on, without resorting to more databases – which themselves will need to be constructed, maintained, etc.

For example, a CPE listing “Microsoft” as the vendor will be different from one listing “Microsoft, Inc”, which will be different from one listing “Microsoft, Inc.” with a period, etc. Because a CPE won’t be easily found unless it matches the CPE that’s searched for, trying to search for a particular product or vendor always involves guessing about the choices that were made by the person (usually an NVD staff member, until recently) who created the CPE. The NVD may have some sort of criteria they follow (e.g., “Always put a comma before ‘Inc’ and a period after it.”), but they’re clearly just rough rules of thumb if they exist at all, since CPE names vary for seemingly random reasons.
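The guessing game can be shown with a deliberately simplified sketch. It compares bare vendor strings rather than full CPE names, and the recorded value is a made-up assumption standing in for whatever the CPE's creator happened to choose.

```python
# Hypothetical: the vendor string the CPE creator happened to choose.
RECORDED_VENDOR = "Microsoft, Inc."

# The spellings a searcher might reasonably try.
guesses = ["Microsoft", "Microsoft, Inc", "Microsoft, Inc."]

# An exact-match lookup succeeds only when the guess is character-for-character
# identical to the recorded string.
hits = [guess for guess in guesses if guess == RECORDED_VENDOR]

print(hits)  # ['Microsoft, Inc.']
```

Two of the three plausible spellings return nothing, and the searcher has no way to know in advance which one will match.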

As Brian points out, the CNAs will probably be creating most CPE names from now on. Since the CNA is often the developer of the product being named, this is in theory an improvement. Yet how is an end user going to find out whether Microsoft (which is a CNA) calls the product I’m using to write this post “Microsoft Word”, “Word”, “Word 365”, “Microsoft Office Word”, etc.? Even worse, the product name might vary by the division within Microsoft that creates the CPE.

You might say something like, “What does Microsoft call the product on their web site?” And I ask, which of the Microsoft web sites are you referring to? Is Microsoft going to enforce standard naming across all web sites worldwide? And what about blog posts on the Microsoft sites? Will they follow some sort of internal Microsoft standard? Etc., etc.

What some people, including some who should know better, have suggested is that there should be a centralized database of product names, company names, version strings (since versions can be identified in many ways), etc. Then “all you have to do” to find the correct CPE is look up the company name, product name, and version string (which also varies a lot) in the directory. The company will hopefully rigorously enforce use of their chosen names, and the CNAs will be severely disciplined if they use any other in naming their products in a CVE report…And while we’re at it, the lion will lie down with the lamb and I will study war no more and people will stop having loud cell phone conversations on trains; that is, all the world’s problems will be solved…

By the way, who will pay for that inordinately expensive database of product and company names? It will cost a huge amount of money, both to put together and to maintain – much more than the cost of the NVD and CVE.org databases combined. Face it: an identifier that requires an expensive auxiliary database to make it work is a dead end. Even if all the other problems with CPE didn’t exist, this alone would ultimately sink it.

This is why the OWASP SBOM Forum recommended purl as the replacement for CPE in our 2022 paper. The paper goes to inordinate lengths to explain why purl is better, but the main reason is that no lookup is required. As long as you know the package manager (or source repository) that you downloaded an open source component from, as well as the name and version string in that package manager, you can create a purl that will always let you locate the exact component in a vulnerability database. This is why purl has won the battle to be the number one software identifier in vulnerability databases worldwide, and is literally the only alternative to CPE.
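To make the "no lookup required" point concrete, here is a minimal sketch of building a purl from nothing but the package type, name and version. The make_purl helper is hypothetical, and it skips the type-specific normalization rules in the purl specification, so real tooling should rely on a maintained purl library instead.

```python
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str, namespace: str = "") -> str:
    """Build a basic purl of the form pkg:type/namespace/name@version.

    Simplified: qualifiers, subpaths and per-type normalization are omitted.
    """
    parts = [pkg_type.lower()]
    if namespace:
        # Each namespace segment is percent-encoded separately.
        parts.extend(quote(segment, safe="") for segment in namespace.split("/"))
    parts.append(quote(name, safe=""))
    return "pkg:" + "/".join(parts) + "@" + quote(version, safe="")


print(make_purl("pypi", "django", "4.2.1"))
print(make_purl("npm", "core", "16.0.0", namespace="@angular"))
```

Anyone who knows the same three or four facts about the component builds the same string, which is what lets the supplier's report and the user's query meet without a directory in between.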

Currently, there are no purls in CVE.org. However, the fact that CVE now supports purl in CVE Format 5.1 (formerly “CVE JSON spec v5.1”) – a change requested by the SBOM Forum two years ago – means there will be purls when the CNAs start adding them to their CVE reports (which unfortunately will probably not be soon, given the substantial training that will need to be conducted).

However, there is one big fly in the purl ointment: It currently doesn’t support proprietary (or “closed source”) software. Our 2022 paper did suggest a solution for that problem (proposed by Steve Springett, who is a purl maintainer, among many other things): There should be a new purl type called SWID, which will be based on the contents of a SWID tag created by the supplier. Anybody with the SWID tag for the product they want to inquire about (and for at least a few years, some big software suppliers like Microsoft included a SWID tag with the binaries for all of their products) will be able to create exactly the same purl that the supplier used to report the vulnerability. In fact, Steve got the SWID type added to purl.

What’s preventing this from being the solution for naming proprietary software is that there’s no good way for an end user, who might not have access to the binaries of a product they’re using – or who is using a legacy product that doesn’t have a SWID tag – to find the tag, if there is one.

I think this is a solvable problem, but it will depend – as a lot of worthwhile practices do – on a lot of people taking a little time every day to solve a problem for everybody. In this case, software suppliers will need to create a SWID tag for every product and version that they produce or that they still support. They might put all of these in a file called SWID.txt at a well-known location on their web site. An API in a user tool, when prompted with the name and version number of the product (which the user presumably has), would go to the site and download the SWID tag – then create the purl based on the contents (there are only about four fields needed for the purl, not the 80 or so in the original SWID spec).
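The last step, turning a downloaded SWID tag into a purl, might look something like the sketch below. The sample tag is invented, the swid_to_purl helper is hypothetical, and the mapping of tag fields to purl components is an assumption that should be checked against the purl definition of the SWID type. Fetching the tag from the supplier's site is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# XML namespace of ISO/IEC 19770-2:2015 SWID tags.
NS = "{http://standards.iso.org/iso/19770/-2/2015/schema.xsd}"

# Invented sample tag, trimmed to the handful of fields the purl needs.
SAMPLE_TAG = """<SoftwareIdentity
    xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd"
    name="Enterprise Server"
    tagId="75b8c285-fa7b-485b-b199-4745e3004d0d"
    version="1.0.0">
  <Entity name="Acme" regid="example.com" role="tagCreator softwareCreator"/>
</SoftwareIdentity>"""


def swid_to_purl(tag_xml: str) -> str:
    """Derive a pkg:swid purl from a SWID tag (assumed field mapping)."""
    root = ET.fromstring(tag_xml)
    # Use the entity that created the tag as the purl namespace.
    creator = next(
        entity for entity in root.iter(NS + "Entity")
        if "tagCreator" in entity.get("role", "").split()
    )
    namespace = "/".join(
        quote(part, safe="")
        for part in (creator.get("name"), creator.get("regid"))
        if part
    )
    name = quote(root.get("name"), safe="")
    version = quote(root.get("version"), safe="")
    tag_id = quote(root.get("tagId"), safe="")
    return f"pkg:swid/{namespace}/{name}@{version}?tag_id={tag_id}"


print(swid_to_purl(SAMPLE_TAG))
```

Because the supplier and the end user would both start from the same tag, both would arrive at the same purl without consulting any directory of names.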

There can be other solutions like this as well, and they don’t even have to be based on SWID tags (as long as they’re based on purl). The point is that we should no longer have to rely on a software identifier like CPE that requires a separate database (or databases) to work. Of course, since there are so many CVE reports that have only CPEs on them (in fact, I think they all do today), it will be years (if not decades) before we can finally be done with CPE. But we should try to move to purls as soon as possible, so we can at least stop the bleeding.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com. Also, if you would like to learn more about or join the OWASP SBOM Forum, please email me.

My book "Introduction to SBOM and VEX" is now available in paperback and Kindle versions! For background on the book and the link to order it, see this post.
