Vulnerability database expert Brian Martin and I have been having a good back-and-forth discussion on LinkedIn about vulnerability database issues in general, including discussion of my proposal for a Global Vulnerability Database.
Today, Brian put up a new post that moves the discussion forward. His post includes 6 or 7 passages that point to what I think are common misconceptions that haven’t been well articulated previously. Because Brian has articulated them so clearly, I want to comment on each one of them. I’ll quote each of Brian's passages in red and then comment in black italics:
While a Persistent Uniform Resource
Locators (PURL) is one solution, it isn’t the only one used by vulnerability
databases. So not only do you need to have an intelligent mapping from PURL to
PURL, you also need it from CPE to PURL, and possibly other identifiers. It’s
easy to have multiple valid PURLs all for the same piece of software.
BTW, purl in this context stands for “package URL”. Here
is a good description of purl, posted by Philippe Ombredanne, the creator of
purl.
Brian, when you say “mapping from purl to purl”, I think
you’re talking about my earlier comment about comparing a CVE-purl connection
in OSS Index with the same connection in CVE.org (once the CNAs start creating
those). That’s a very special case, which I’d prefer to discuss with you
offline.
However, “mapping CPE to purl” is literally impossible if
there is more than one package manager (PM) for a particular OSS project. This is
because most CPEs for open source software don’t refer to the package manager
(except sometimes as part of the product name), meaning the user has no way of
knowing which PM the vulnerability is found in.
Regarding the last sentence, “It’s easy to have multiple
valid PURLs all for the same piece of software”, the problem is there’s no way
to be certain that the code for a product named “log4core” in one package
manager is bit-for-bit identical to the code for the “same” product in another
package manager. Given that, the fact that CVE-12345 is found in one PM doesn’t
allow you to conclude that it will be found in another PM.
This is, in one way, a limitation of purl: you can’t state,
for example, that CVE-12345 applies to every package manager that
contains a product called “log4core”. You can only make that statement if you
have tested log4core in all of those package managers. Purl will keep the CNA honest,
meaning they will only list a purl in a CVE report if they have tested the
product in that package manager – and a user should never assume a CVE in one
PM will apply to another. In other words, CPE gives the user a false sense of
comprehensiveness.
Somewhere there are / were CPE
specifications, likely before NVD took control of it. Early in the VulnDB days,
we used them so we could generate our own CPE for products that didn’t appear
in NVD. The fact that a seasoned vulnerability practitioner isn’t sure
standards exist speaks volumes to how poorly NVD has managed CPE.
As unaccustomed as I am to defending NVD, I need to do so
now. There’s simply no way there can be a unique CPE for any product – i.e.,
one that any user will always be able to create accurately. Pages 7-9 of the OWASP
SBOM Forum’s 2022 document
on the naming problem differentiate extrinsic identifiers like CPE from
intrinsic identifiers like purl.
Briefly, an extrinsic identifier requires the user to do
a lookup to at least one external database, before they can be sure they have
the correct identifier. In the case of CPE, that database is the CPE Dictionary.
On the other hand, an intrinsic identifier like purl just requires the user to enter
information they already know with certainty: the package manager from which
they downloaded the software, the product name in that package manager, and the
version string in that package manager.
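To make the “three pieces of information” concrete, here is a minimal Python sketch that assembles a purl from a package manager type, a name and a version string. The function name `make_purl` is my own, and the sketch is deliberately simplified: it ignores qualifiers, subpaths and the type-specific normalization rules in the purl specification, so treat it as an illustration rather than a conforming implementation.

```python
from urllib.parse import quote


def make_purl(pkg_type: str, name: str, version: str, namespace: str = "") -> str:
    """Build a simplified purl of the form pkg:type/namespace/name@version."""
    parts = [pkg_type.lower()]
    if namespace:
        # Percent-encode each namespace segment separately.
        parts.extend(quote(seg, safe="") for seg in namespace.split("/"))
    parts.append(quote(name, safe=""))
    return "pkg:" + "/".join(parts) + "@" + quote(version, safe="")


# Everything passed in is something the user already knows from the download.
print(make_purl("maven", "log4j-core", "2.17.1", namespace="org.apache.logging.log4j"))
print(make_purl("pypi", "requests", "2.31.0"))
```

No external lookup is involved at any point, which is what makes the identifier intrinsic.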
CPE is ultimately unworkable because creating a CPE name
usually requires making arbitrary choices (e.g., “version
1.2” vs. “v1.2”), rather than only requiring information that a user can always
verify exactly. Nobody can know for sure what choice was made by
the person who created the CPE without searching the CPE Dictionary,
perhaps several times and perhaps with fuzzy matching.
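The guessing problem can be sketched in a few lines of Python. The vendor, product and dictionary entry below are invented for illustration (they are not real CPE Dictionary contents); the only thing taken from the CPE 2.3 format is the overall shape of the string.

```python
from itertools import product

# Hypothetical dictionary contents; a real lookup would query the CPE Dictionary.
CPE_DICTIONARY = {
    "cpe:2.3:a:example_corp:widget_server:1.2:*:*:*:*:*:*:*",
}


def candidate_cpes(vendors, products, versions):
    """Yield every CPE 2.3 string a user might plausibly guess."""
    for vendor, prod, version in product(vendors, products, versions):
        yield f"cpe:2.3:a:{vendor}:{prod}:{version}:*:*:*:*:*:*:*"


# Each list holds spellings that are all reasonable from the user's side.
guesses = list(candidate_cpes(
    vendors=["example", "example_corp"],
    products=["widget_server", "widgetserver"],
    versions=["1.2", "v1.2"],
))
matches = [g for g in guesses if g in CPE_DICTIONARY]
print(len(guesses), "plausible guesses,", len(matches), "in the dictionary")
```

With just two spellings per field, the user is already choosing among eight candidates, and only the external lookup reveals which one was actually used.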
(quoting Tom) “As long as you know
the package manager (or source repository) that you downloaded an open source
component from, as well as the name and version string in that package manager,
you can create a purl that will always let you locate the exact component in a
vulnerability database. This is why purl has literally won the battle to be the
number one software identifier in vulnerability databases worldwide, and
literally the only alternative to CPE.”
Unless… you end up having half a
dozen PURLs for the same package, because it is available on a vendor’s page,
GitHub, GitLab, Gitee, and every package manager out there.
And this is exactly the point about using purl in a vulnerability database: It only tells you what the CNA that created the CVE report with purl knows: the package manager, product name and version string of the software in which they found the vulnerability. The user can’t draw any conclusion about a product with the same name and version string in any other PM, unless the CNA that produced the report added purls for them as well (meaning they tested the same product and version in each PM).
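A small Python sketch of that lookup behavior, using the placeholder CVE-12345 from above and made-up purls (the `org.example` namespace and the `AFFECTED` table are assumptions for illustration only):

```python
# Hypothetical records: each CVE lists only the purls the CNA actually reported.
AFFECTED = {
    "CVE-12345": {"pkg:maven/org.example/log4core@1.0.0"},
}


def cves_for(purl: str) -> list[str]:
    """Return the CVEs whose report names this exact purl."""
    return sorted(cve for cve, purls in AFFECTED.items() if purl in purls)


# Same product name and version, different package manager.
print(cves_for("pkg:maven/org.example/log4core@1.0.0"))
print(cves_for("pkg:npm/log4core@1.0.0"))
```

The second query comes back empty. That does not mean the npm package is safe; it only means nobody has reported testing it.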
Who will maintain this epic list of
PURLs? As of this blog, there are only 379 CNAs with tens of thousands of
software companies out there. Not to mention the over one hundred million
repositories on GitHub alone. While a PURL may be an open standard where CPE is
not, it forces the community to set a PURL for every instance of the location
of that software. That sounds like the big database you don’t think is viable?
Again, that’s the point of purl: no list is required. Any user can create the correct purl just from the three pieces of information they already know. As Steve Springett often says, every open source product in a package manager already has a purl – there’s no need to create it.
(quoting Tom) “However, there is one
big fly in the purl ointment: It currently doesn’t support proprietary (or
“closed source”) software.”
And the other shoe drops. =) So, this
is not a critique by any means, just highlighting the problems the community
faces. The problems we faced 10 years have just compounded and here we are. Not
that there were realistic solutions to all of these problems back then, and
even if there were, we certainly didn’t address them then.
That’s correct. Currently, purl
only covers open source software, although Steve Springett (who worked with
Philippe to create purl, as mentioned in Philippe’s post that I linked above)
points out that any online software “store” (Google Play, the Apple Store,
etc.) could easily be made into a purl type, since the store controls the
namespace of the proprietary products that are for sale in the store (just like
a package manager controls the namespace of the packages in the PM).
In other words, what is needed
is a controlled namespace, so one product will always have one name. Steve also
suggested that SWID tags could be a more general way to identify proprietary
software. He wrote the purl PR for a new purl type called SWID, which was
adopted in 2022. See below.
(quoting Tom) “I think this is a
solvable problem, but it will depend – as a lot of worthwhile practices do – on
a lot of people taking a little time every day to solve a problem for
everybody. In this case, software suppliers will need to create a SWID tag for
every product and version that they produce or that they still support. They
might put all of these in a file called SWID.txt at a well-known location on
their web site. An API in a user tool, when prompted with the name and version
number of the product (which the user presumably has), would go to the site and
download the SWID tag – then create the purl based on the contents (there are
only about four fields needed for the purl, not the 80 or so in the original
SWID spec).”
Unfortunately, I think at this point,
this is a pipe dream. I am quite literally discovering new, well-known
“standards” only by seeing them as requests ending in a 404 response in my web
logs. So any such solution based on well-known I think isn’t viable now, and
likely won’t be moving forward.
Please read what the OWASP SBOM Forum proposed regarding
SWID on pages 11 and 12 of our 2022 paper. The point is that there needs to be
some unique user-discoverable source of information on the product. Otherwise,
the only alternative is to create (and maintain) a hugely expensive database of
all proprietary software, along with the different product names and vendor
names it was associated with through its lifetime – and that requires a huge
number of very subjective judgments.
For example, if Product A from Vendor X is sold to Vendor
Y who renames it Product B, is it the same product or not? If B is very
different from A, you would just say it’s different. But if B is literally just
A with a different name, you’d say it’s the same. Where do you draw the line
between these two cases? There’s simply no way to do so.
There are certainly other ways that information on proprietary software could be made user-discoverable, so that no big secondary database (probably much larger than the vulnerability database itself) is required. One way is Steve Springett’s Common Lifecycle Enumeration project. That will take much longer to put in place than our SWID proposal, but IMO is ultimately the correct thing to do. If you have other ideas, we’d love to hear them.
(Tom here) Of course, all of the above discussions are examples of the Naming Problem. There’s no question that this problem will be with us for a long time and will never be “solved” in any final way. However, the good news about the Global Vulnerability Database idea is that the naming problem doesn’t need to be solved first, precisely because the GVD won’t require “harmonization” of software names.
The software will be named what it’s named in the vulnerability databases to which queries are routed; it will be up to the individual databases to continue their (presumably ongoing) efforts to improve their naming. If there's reason to believe there are serious naming problems in one vuln DB, the GVD might suspend routing queries to it. The GVD will be no more accurate than the individual DBs, but it won’t be less accurate, either.
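To illustrate what “routing without harmonization” could look like, here is a rough Python sketch. Nothing in it is a GVD design: the two backend functions, the prefix-based routing rule, the `SUSPENDED` set and the CVE IDs are all placeholders I made up. The only idea taken from the text above is that each query is passed unchanged to a database that already uses that naming scheme.

```python
from typing import Callable, Dict, List

# A backend takes an identifier in its own naming scheme and returns CVE IDs.
Backend = Callable[[str], List[str]]


def purl_db(identifier: str) -> List[str]:
    # Stand-in for a purl-based vulnerability database.
    return {"pkg:npm/example-lib@1.0.0": ["CVE-0000-0001"]}.get(identifier, [])


def cpe_db(identifier: str) -> List[str]:
    # Stand-in for a CPE-based vulnerability database.
    return {"cpe:2.3:a:example:widget:1.2:*:*:*:*:*:*:*": ["CVE-0000-0002"]}.get(identifier, [])


BACKENDS: Dict[str, Backend] = {"pkg:": purl_db, "cpe:": cpe_db}
SUSPENDED = set()  # prefixes of databases with serious naming problems


def route(identifier: str) -> List[str]:
    """Pass the identifier, unmodified, to the database that uses its scheme."""
    for prefix, backend in BACKENDS.items():
        if identifier.startswith(prefix) and prefix not in SUSPENDED:
            return backend(identifier)
    return []


print(route("pkg:npm/example-lib@1.0.0"))
SUSPENDED.add("cpe:")
print(route("cpe:2.3:a:example:widget:1.2:*:*:*:*:*:*:*"))
```

The router never translates a name from one scheme to another, so its answers are exactly as accurate as the database it forwards to.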
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com. Also, if you would like to learn more about or join the OWASP SBOM Forum, please email me.
My book "Introduction to SBOM and VEX"
is now available in paperback
and Kindle versions! For background on the book and the link to order it,
see this post.