Brian Martin recently put up a long, thoughtful post on LinkedIn that critiqued my post from last November about what I called (and still call) the Global Vulnerability Database (GVD). That post was one of many I’ve written that started by being about one thing (in this case, a GVD that would be a single database) but evolved as I was writing the post into something else by the end (something that isn’t a single database at all, but more of an “intelligent switching hub” for evaluating vulnerability database queries and routing them among the most appropriate existing databases).
This evolution didn’t bother me, since one of the advantages
of calling something you write a blog post, rather than an essay or a white
paper, is that you don’t have to go back and rewrite the whole post when such
an evolution occurs – people expect inconsistency (and I deliver it, I’m proud
to say). I figured that it wasn’t worth rewriting the post unless it drew a lot
of interest, so I let it stand as I’d written it.
It didn’t draw much interest until Brian wrote his post a
couple of weeks ago. Brian has a lot of experience with vulnerability databases
and has a good following for his thoughts, so his post drew a lot of comments.
When I read Brian’s post, I realized that a number of his
objections wouldn’t have been valid if I’d rewritten the post so that it
focused on the single idea of a switching hub, not an actual database –
although I presume I won’t be thrown in jail if users think of it as a single
database. The point is that it should be possible in 2024 to field diverse
queries – regarding different types of products (open source software,
proprietary or “closed source” software, and intelligent devices), different
types of vulnerability identifiers (CVE, OSV, GitHub Security Advisories, etc.), and
different software identifiers (CPE and purl) – and have an intelligent engine that
decides, for each query, which is the best database or combination of databases
to resolve the query.
Once that decision has been made, the appropriate queries will
go out to the different individual databases (CVE.org, the NVD if it still
exists, OSV, OSS Index, VulnCheck, VulnDB, etc.). Then, the results will be processed
and returned to the user as an answer to their question. There would need to be
a lot of intelligence behind both of these steps, since they won’t be easy at
all (and they will require quite a lot of prior knowledge, such as whether a
report in OSS Index that a
particular software product – identified with a purl – is affected by a CVE has
the same status as a report in CVE.org that the
same purl is affected by the same CVE, since they will have been derived very
differently).
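The first of those two steps, deciding where a query should go, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the database names come from the text above, but the routing table, the `Query` class, and the `route` function are hypothetical, and the much harder second step (reconciling results that were derived differently) is left out entirely.

```python
from dataclasses import dataclass

# Hypothetical routing table. The database names come from the post; which
# identifier each one is asked about here is an illustrative assumption.
ROUTES = {
    "purl": ["OSV", "OSS Index"],
    "cpe": ["CVE.org", "NVD"],
}


@dataclass(frozen=True)
class Query:
    identifier: str  # a purl ("pkg:...") or a CPE name ("cpe:...")


def identifier_type(identifier: str) -> str:
    """Classify the identifier by its prefix."""
    if identifier.startswith("pkg:"):
        return "purl"
    if identifier.startswith("cpe:"):
        return "cpe"
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def route(query: Query) -> list[str]:
    """Return the databases this query should be sent to."""
    return list(ROUTES[identifier_type(query.identifier)])
```

A real hub would need far more than a prefix check (product type, vulnerability identifier type, and knowledge of how each database derives its data), but the shape is the same: classify the query, then fan it out.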
To rectify my sin of last November in not rewriting my post
before I put it up, I put up a new post
on May 9. This makes a single, coherent statement, but still doesn’t include all
the detail (such as what’s in the paragraphs above) that I would include if I
had the time to write a white paper. I think this answers many of Brian’s questions,
such as whether the GVD would be hugely expensive and require legions of
volunteers. That would be the case if we tried to put up a single “harmonized”
database that maintains data from all existing vulnerability databases, and
that’s the reason why last November I switched in mid-post to the idea of a
switching hub.
However, Brian brought up one important issue that I want to
address now. (He brought up others, which I hope were mostly addressed in my May
9 post. If there are other issues that you still want me to address, Brian,
please let me know.)
Is CPE a dead end?
Brian repeated a sentence from my November post, “Specifically,
a new identifier is needed for proprietary software, since I (and others)
regard CPE as a dead end, even though it was pioneering in its time.” Brian ended
up basically agreeing with that statement, but his reasoning isn’t mine, and I’d
like to describe my own.
Pages 4-6 of the OWASP SBOM Forum’s 2022 paper
on how to fix the naming problem in the NVD (which is still valid today, even
though the NVD now seems to be on its way to extinction or worse: irrelevance)
describe some serious problems with CPE. However, they don’t address what I
consider to be the most serious problem: the fact that there will never be
a way to populate fields like vendor and product name in a way that will be
universally agreed on, without resort to more databases – which themselves will
need to be constructed, maintained, etc.
For example, a CPE listing “Microsoft” as the vendor will be
different from one listing “Microsoft, Inc”, which will be different from one
listing “Microsoft, Inc.” with a period, etc. Because a CPE won’t be easily found
unless it matches the CPE that’s searched for, trying to search for a
particular product or vendor always involves guessing about the choices that
were made by the person (usually an NVD staff member, until recently) who
created the CPE. The NVD may have some sort of criteria they follow (e.g., “Always
put a comma before ‘Inc’ and a period after it.”), but they’re clearly just
rough rules of thumb if they exist at all, since CPE names vary for seemingly
random reasons.
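The guessing problem can be shown with a toy exact-match lookup. This is a simplified sketch, not the NVD’s actual search logic or the CPE specification’s matching rules: the index, its single entry, and the placeholder CVE ID are all hypothetical.

```python
# Toy index keyed the way an exact-match search behaves: the vendor and
# product strings must equal whatever the CPE's creator chose. The entry and
# the placeholder CVE ID are hypothetical.
INDEX = {("microsoft", "word"): ["CVE-XXXX-XXXX"]}


def search(vendor: str, product: str) -> list[str]:
    """Return the CVE IDs recorded under exactly this vendor/product pair."""
    return INDEX.get((vendor.lower(), product.lower()), [])


# Plausible guesses a user might try for the same product.
guesses = [
    ("Microsoft", "Word"),
    ("Microsoft, Inc", "Word"),
    ("Microsoft, Inc.", "Word"),
    ("Microsoft", "Microsoft Word"),
]

for vendor, product in guesses:
    print(vendor, "|", product, "->", search(vendor, product))
```

Only the first guess returns anything. The other three are reasonable names for the same product, and each comes back empty without any hint that a near miss exists.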
As Brian points out, the CNAs will probably be creating most CPE names
from now on. Since the CNA is often the developer of the product being
named, this is better in theory. Yet how is an end user who
wants to know whether the product I’m using to write this post is called
“Microsoft Word”, “Word”, “Word 365”, “Microsoft Office Word”, etc. going to
find which one of those Microsoft (which is a CNA) uses? Even worse, the
product name might vary by the division within Microsoft that creates the CPE.
You might say something like, “What does Microsoft call the
product on their web site?” And I ask, which of the Microsoft web sites are you
referring to? Is Microsoft going to enforce standard naming across all web
sites worldwide? And what about blog posts on the Microsoft sites? Will they follow
some sort of internal Microsoft standard? Etc., etc.
What some people, including some who should know better, have
suggested is that there should be a centralized database of product names,
company names, version strings (since versions can be identified in many ways),
etc. Then “all you have to do” to find the correct CPE is look up the company name,
product name, and version string in that database. The
company will hopefully rigorously enforce use of their chosen names, and the
CNAs will be severely disciplined if they use any other in naming their products
in a CVE report…And while we’re at it, the lion will lie down with the lamb and
I will study war no more and people will stop having loud cell phone
conversations on trains; that is, all the world’s problems will be solved…
By the way, who will pay for that inordinately expensive
database of product and company names? It will cost a huge amount of money,
both to put together and to maintain – much more than the cost of the NVD and
CVE.org databases combined. Face it: an identifier that requires an expensive
auxiliary database to make it work is a dead end. Even if all the other
problems with CPE didn’t exist, this alone would ultimately sink it.
This is why the OWASP SBOM Forum recommended purl as the replacement
for CPE in our 2022 paper. The paper goes to inordinate lengths to explain why
purl is better, but the main reason is that no lookup is required. As long as
you know the package manager (or source repository) that you downloaded an open
source component from, as well as the name and version string in that
package manager, you can create a purl that will always let you locate the
exact component in a vulnerability database. This is why purl has won the battle
to be the number one software identifier in vulnerability databases worldwide,
and is the only alternative to CPE.
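To make the “no lookup required” point concrete, here is a minimal sketch of building a purl from nothing but the package type, name, and version. The `make_purl` helper is hypothetical and deliberately incomplete: it skips the type-specific normalization rules, qualifiers, and subpaths defined in the purl specification, and the package names and versions are just illustrations.

```python
from typing import Optional
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str,
              namespace: Optional[str] = None) -> str:
    """Assemble pkg:type/namespace/name@version, percent-encoding each part."""
    parts = [pkg_type.lower()]
    if namespace:
        parts.append(quote(namespace, safe=""))
    parts.append(quote(name, safe=""))
    return "pkg:" + "/".join(parts) + "@" + quote(version, safe="")


print(make_purl("pypi", "django", "4.2.1"))
print(make_purl("npm", "core", "16.0.0", namespace="@angular"))
```

Two people who downloaded the same component from the same package manager end up with the same string without consulting each other or any central directory.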
Currently, there are no purls in CVE.org. However, the fact
that CVE now supports purl in CVE Format 5.1 (formerly “CVE JSON spec v5.1”) – a
change requested by the SBOM Forum two years ago – means there will be purls
when the CNAs start adding them to their CVE reports (which unfortunately will
probably not be soon, given the substantial training that will need to be
conducted).
However, there is one big fly in the purl ointment: It
currently doesn’t support proprietary (or “closed source”) software. Our 2022
paper did suggest a solution for that problem (proposed by Steve Springett, who
is a purl maintainer, among many other things): There should be a new purl type
called SWID, which will be based on the contents of a SWID tag
created by the supplier. Anybody with the SWID tag for the product they want to
inquire about (and for at least a few years, some big software suppliers like
Microsoft included a SWID tag with the binaries for all of their products) will
be able to create exactly the same purl that the supplier used to report the
vulnerability. In fact, Steve got the SWID type added to purl.
What’s preventing this from being the solution for naming
proprietary software is that there’s no good way for an end user, who might not
have access to the binaries of a product they’re using – or who is using a
legacy product that doesn’t have a SWID tag – to find the tag, if there is one.
I think this is a solvable problem, but it will depend – as
a lot of worthwhile practices do – on a lot of people taking a little time every
day to solve a problem for everybody. In this case, software suppliers will
need to create a SWID tag for every product and version that they produce or that
they still support. They might put all of these in a file called SWID.txt at a
well-known location on their web site. An API in a user tool, when prompted
with the name and version number of the product (which the user presumably
has), would go to the site and download the SWID tag – then create the purl
based on the contents (there are only about four fields needed for the purl,
not the 80 or so in the original SWID spec).
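The last step, turning a SWID tag into a purl, might look roughly like the sketch below. It assumes the tag is an ISO/IEC 19770-2:2015 XML document and that the purl takes the general shape of the swid type (tag creator, product name, version, and tag ID). The sample tag, its values, and the `purl_from_swid` helper are all hypothetical, and the exact field and encoding rules should be checked against the purl specification rather than taken from this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# XML namespace used by ISO/IEC 19770-2:2015 SWID tags.
SWID_NS = "{http://standards.iso.org/iso/19770/-2/2015/schema.xsd}"

# Hypothetical tag; every value in it is made up for illustration.
EXAMPLE_TAG = """<SoftwareIdentity
    xmlns="http://standards.iso.org/iso/19770/-2/2015/schema.xsd"
    name="Example Editor" version="1.0.0" tagId="example-tag-id-0001">
  <Entity name="Example Corp" regid="example.com"
          role="tagCreator softwareCreator"/>
</SoftwareIdentity>"""


def purl_from_swid(tag_xml: str) -> str:
    """Build a swid-type purl from the four fields this sketch relies on."""
    root = ET.fromstring(tag_xml)
    creator = None
    for entity in root.iter(SWID_NS + "Entity"):
        if "tagCreator" in entity.get("role", "").split():
            creator = entity.get("name")
            break
    if creator is None:
        raise ValueError("SWID tag has no tagCreator entity")
    name = quote(root.attrib["name"], safe="")
    version = quote(root.attrib["version"], safe="")
    tag_id = quote(root.attrib["tagId"], safe="")
    return f"pkg:swid/{quote(creator, safe='')}/{name}@{version}?tag_id={tag_id}"


print(purl_from_swid(EXAMPLE_TAG))
```

Because every field comes from the supplier’s own tag, the supplier and the end user derive the same purl independently, which is the whole point.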
There can be other solutions like this as well, and they don’t
even have to be based on SWID tags (as long as they’re based on purl). The
point is that we should no longer have to rely on a software identifier like
CPE that requires a separate database (or databases) to work. Of course, since
there are so many CVE reports that have only CPEs on them (in fact, I think
they all do today), it will be years (if not decades) before we can finally be
done with CPE. But we should try to move to purls as soon as possible, so we can
at least stop the bleeding.
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com. Also, if you would like to learn more about or join the OWASP SBOM Forum, please email me.
My book "Introduction to SBOM and VEX"
is now available in paperback
and Kindle versions! For background on the book and the link to order it,
see this post.