Chris Hughes recently put out a very well-written post (no exception for him, of course) on the naming problem, which focuses on the document that CISA and DHS published a couple of months ago[i]. The latter document was also well written, but both it and Chris’ post suffer from the fact that they don’t start with a specific use case, and therefore both end by throwing up their hands and saying the problem is “complicated”. I agree that the naming problem is complicated if you try to address multiple use cases, or worse, if you don’t specify any use case at all (in which case you implicitly assume the burden of addressing all of them). But solving the naming problem, at least in principle, is easy if you confine yourself to a specific use case.
To identify your use case, ask why software naming is important to you. After all, nobody loses sleep over the fact that software names are currently confusing and inconsistent, unless the confusion prevents them from doing something they need to do. Here are some (of many) use cases for which a solution to the naming problem would be important:
1. An end user recently used an open source software product that they found helpful, but they don’t know where to find it again. In fact, it is available in multiple repositories, under slightly different names and version numbers. How can they find the exact one they used?
2. A company has heard of a proprietary software product that they want to buy for their own use, but they don’t know the name of the supplier of that product.
3. A software user has received a software bill of materials (SBOM) from the supplier of a product they use heavily. They would like to learn about vulnerabilities that apply to the components found in the product, but the identifiers for the components in the SBOM don’t seem to appear in any vulnerability database they can find.
4. A software supplier needs to report a vulnerability in their product to CVE.org, which assigns CVE numbers, but they don’t know how to identify the product in the report.
These are very different use cases, and there are many others. There’s no assurance that any proposed solution to the “naming problem” will address all of them, or even more than one of them. Thus, any discussion of the naming problem needs to start by identifying the use case being addressed. Items 3 and 4 are both part of the use case discussed in this post.
The use case that drives this discussion is identifying vulnerabilities applicable to a software product, and I believe it’s behind both CISA’s and Chris Hughes’ arguments, although it’s not explicitly stated. Of course, software vulnerability management is very important for cybersecurity in general. The topic has taken on increasing importance in the last couple of years, in large part because of the focus on SBOMs. This is because:
1. Without an SBOM, a user of a software product who is concerned about the product’s security only needs one identifier: the one for the product itself (along with its version).
2. If the user gets an SBOM and wants to use it for vulnerability management, they suddenly need an identifier for each of the components in that product, not just for the product itself. Since the average software product has around 150 components, you could say the identifier problem is now 150 times larger.
3. If the SBOM doesn’t list component identifiers that can be found in a vulnerability database, it isn’t likely the user will ever learn about vulnerabilities due to those components. In other words, such SBOMs will be useless for vulnerability management. And this isn’t speculation: I’ve been told by multiple developers of software and intelligent devices that only in a small percentage of cases can the identifier for a component in an automatically produced SBOM be found in a vulnerability database. Specifically, I’ve heard from a couple of major software suppliers that, in a typical automated SBOM, fewer than 5% of component identifiers meet that standard.
4. This is why software suppliers who wish to provide their customers with SBOMs that are usable for vulnerability management (and if you look at articles and posts about SBOMs, a large percentage of them, certainly the majority, focus on vulnerability management as the only use case) have to spend a lot of time finding usable component identifiers. They need a whole grab bag of tools to do this: AI/ML, fuzzy logic, guesswork, collections of documents like GitHub commits, prayer, etc. Of course, none of this work can be fully automated. It’s as if you rode a high-speed train from Chicago to New York City (a guy can dream, right?), but had to travel the last mile to the station by oxcart.
Now that we know our use case is vulnerability management, what do we need to find out first in order to solve our naming problem? That’s not a hard question: we need to find out what identifiers are used in vulnerability databases today. It turns out there are just two of them: CPE, found in the National Vulnerability Database (NVD), and purl, found in almost every vulnerability database other than the NVD and databases derived from the NVD. In fact, some very knowledgeable people, including Philippe Ombredanne of nexB, the developer of the purl concept, have told me they don’t know of a single public vulnerability database that is based on anything other than CPE or purl, and the great majority of vulnerability databases are based on purl. Of course, there are lots of other software identifiers that are useful for payments to suppliers, licensing, etc., but those are all different use cases, and we don’t need to consider them any further now.
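To make the contrast concrete, here is a rough illustration (my own example, not taken from either document) of how the same well-known component, the Log4j logging library, is typically identified under each scheme:

    # CPE 2.3 name, roughly as it appears in the NVD; the vendor and product
    # strings are whatever was chosen when the CPE entry was created
    cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*

    # purl, as used in purl-based databases; it is derived mechanically from
    # the Maven coordinates the user already has
    pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1

The key difference is that the purl can be constructed directly from information in the package manager, while the CPE has to match, character for character, whatever vendor and product strings were chosen when the CPE name was created.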
At this point, let’s make the rest of our job easier. Let’s agree that, in considering possible identifiers for use in vulnerability databases, we confine ourselves to the ones that are already in use. The only reason not to do this would be if we examined both CPE and purl and decided that both of them suffer from serious problems, meaning we should look elsewhere for identifiers.
Let’s look at CPE first. Both Chris’ post and the CISA/DHS document do a good job of describing the concept of CPE. I agree that CPE sounds great (well, good at least) in concept, but what about in practice? Since CPE has been in existence for close to two decades, we don’t have to guess about how it will perform in practice; we can look at its record. That record is laid out in some detail on pages 4-6 of the OWASP SBOM Forum’s “Proposal to Operationalize Component Identification for Vulnerability Management”. (I led development of that document, although Steve Springett, Tony Turner, Kate Stewart and David Wheeler were responsible for the ideas. Chris provided a very intelligent description of the document in his post, which I greatly appreciated; it was definitely the best description I’ve seen so far by anybody who wasn’t involved in writing it.)
Please read those pages for yourself, but there’s one sentence that summarizes how suitable CPE is for the use case we’re most concerned with: looking up components found in an SBOM in a vulnerability database. The sentence is (p. 4): “Oracle Corporation estimates they can identify CPEs for no more than 20% of the components in their software products.” If you think about it, that’s quite an indictment. If Oracle can identify CPEs for no more than 20% of the components in their own products, what chance does a poor end user like you have of identifying CPEs for the components listed in an SBOM you receive from a supplier whose products you use? The chance that a snowball has in the Infernal Regions? Not even that?
I hope you get the idea: CPE is a big part of the problem. It’s definitely not part of the solution. Meanwhile, it’s remarkable that literally every vulnerability database in the world that isn’t the NVD, or one of the handful of direct derivatives of the NVD, uses purl. Knowing that, I have to say “Case closed.” Purl has won the battle for supremacy among software identifiers, although CPE won’t go away anytime soon, because of the huge existing base of CPE data, which is still quite valuable despite containing a lot of errors. However, what I would like to see is purl being used as much as possible going forward.
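To show what using purl looks like in practice, here is a minimal sketch of querying the OSV.dev vulnerability database, which accepts purls in its query API, for vulnerabilities affecting a specific component version. This is my own illustration, not something from Chris’ post or the CISA document, and the component shown is just an example.

    import requests

    # Ask OSV.dev for vulnerabilities affecting the component identified by this purl.
    # The purl below is only an example; substitute the component you care about.
    purl = "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"

    resp = requests.post(
        "https://api.osv.dev/v1/query",
        json={"package": {"purl": purl}},
        timeout=30,
    )
    resp.raise_for_status()

    # Each entry in "vulns" is an OSV record; CVE or GHSA identifiers appear in
    # its "id" or "aliases" fields.
    for vuln in resp.json().get("vulns", []):
        print(vuln["id"], vuln.get("summary", ""))

The point isn’t this particular database or script; it’s that the identifier the user already has (the purl) is the same identifier the database is keyed on, so no translation step is needed.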
For open source software vulnerabilities, purl is the unquestioned king. However, for proprietary software and for intelligent devices, there’s still work to be done. Regarding proprietary software, in the paper we (i.e., the OWASP SBOM Forum) proposed that SWID tags be made the basis for a new purl type. Steve Springett, who is a purl maintainer and worked with Philippe Ombredanne on some of the original development of the concept, has already taken care of that. What remains is to figure out the best way (or ways) to make SWID tags, or more specifically the information in SWID tags, available to users of proprietary software, especially legacy proprietary software. There are in fact many ways that could be done, but deciding which of them are best will be a challenge. To be honest, we haven’t started to work on that yet.
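For reference, the swid purl type that came out of that proposal builds the identifier from fields that already appear in a SWID tag: the software creator, the product name and version, and the tag’s unique ID. A purl of that type looks roughly like the following; the values here are invented for illustration:

    pkg:swid/Acme/Enterprise+Server@1.0.0?tag_id=75b8c285-fa7b-485b-b199-4745e3004d0d

The open question discussed above isn’t the format itself, but how a user of the proprietary product gets hold of the SWID tag information needed to fill in those fields.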
Regarding devices, we proposed in the paper that the existing GTIN and GMN naming conventions (which are proprietary and licensed by GS1) be used, since they are already widely used for trade purposes. However, I wonder how vulnerability reporting would work in that case, since the names may be proprietary. I would like to explore the idea of developing new purl types to handle devices.
It might seem strange that an identifier that works well with open source software (OSS) would also work well with proprietary software, let alone with hardware devices. After all, purl for OSS is based on where the software was downloaded from, and I don’t think anybody has figured out how to download a hardware device yet (perhaps using quantum teleportation?).
However, what is required for purl to work is for the user to be able to construct the identifier from information they already know. For OSS, the user knows where they got the software. For proprietary software, our proposal suggests that the user would know the contents of a SWID tag for the product they’re using (although we haven’t done the work to figure out the best way to make those contents available to the user; they may need to get them from a pre-specified location on the supplier’s web site).
For devices, I’m thinking there might be an information source similar to a SWID tag, but I admit I haven’t talked to anybody else about this yet.
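As a small illustration of what constructing the identifier from information the user already knows means in the OSS case, here is a sketch using the packageurl-python library (my example; any purl implementation would do). A user who knows they installed a package from PyPI, and knows its name and version, already has everything needed to build the purl:

    from packageurl import PackageURL

    # The user already knows where the package came from (PyPI), its name and its
    # version; nothing else is needed, and no central authority has to assign a name.
    purl = PackageURL(type="pypi", name="django", version="4.2.7")
    print(purl.to_string())   # pkg:pypi/django@4.2.7

The same mechanical construction works for Maven, npm, NuGet and the other repository types purl supports; that is exactly why it fits the “information the user already knows” requirement.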
The moral of this story is that there’s no longer any question what the best identifier for software (and maybe hardware) is when the use case is vulnerability management: it’s purl. Fortunately or unfortunately, there is still a lot of work left to be done for the vulnerability identification use case, including:
1. How to identify proprietary software and devices using purl, as described above.
2. How to report and track vulnerabilities for hardware devices, since the vulnerabilities aren’t in, say, the sheet metal or plastic the device is made of, but rather in the software and/or firmware installed in the device. I expect to write a post on that question soon.
3. Peripheral parts of the naming problem that need to be solved, including aliasing (which applies primarily to proprietary products). Steve Springett has a nifty idea for solving that problem, known as Common Lifecycle Enumeration. If you’re interested in working on that problem, I know he would love to hear from you. If you email me, I’ll put you in touch with Steve.
Any opinions expressed in this blog post
are strictly mine and are not necessarily shared by any of the clients of Tom
Alrich LLC. If you would like to comment on what you have read here, I
would love to hear from you. Please email me at tom@tomalrich.com.
I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.
[i] The post also focuses on the presentation on naming that Lindsey Cerkovnik of CISA gave at S4.