"He who defends everything defends nothing."
Frederick the Great
Steve Springett, leader of the
OWASP Dependency-Track and CycloneDX projects, led off last Friday’s OWASP SBOM
Forum meeting by saying that the biggest reason why suppliers aren’t regularly
producing VEX documents for customers is that it’s so expensive to do so. (The
meeting didn’t have a fixed topic. This often happens, and as usual it led to a
better meeting than we could ever have planned.)
This surprised me. I knew there
are multiple reasons why suppliers aren’t producing VEXes, but I had never even
thought about cost as a reason. Since there are already many open source tools
for producing and interpreting CycloneDX VEX documents (because the CDX VEX format
is based on the same base code that CDX SBOMs, HBOMs, OBOMs, MLBOMs, VDR, etc.
are built on), it certainly didn’t seem at first glance that there should be
much if any cost to producing VEXes. And since Steve’s day job is ensuring that
over 1,000 developers at ServiceNow follow best practices for software security,
it certainly seems that one of them might be assigned to producing VEXes on a
part-time basis. So, where does the cost come from?
Steve elaborated, saying that the
big problem is that nobody knows exactly what VEX is, and there is no fixed specification
for it in either of the two primary VEX formats, CSAF and CycloneDX. (The spec
will need to be specific to the format. The formats are very different, and
there’s no way to produce a common spec. Even the only SBOM spec, the 7 minimum
fields listed in the NTIA Minimum Elements document, kinda sorta
applies to both the SPDX and CDX SBOM formats, but it is so minimal that it is literally
useless by itself. If you want to produce useful SBOMs, you have to go
beyond those 7 fields.)
While Steve has no control over
CSAF of course, he can certainly find someone to develop a CDX VEX spec. But,
until someone can tell him exactly what a VEX is and – just as importantly –
what it isn’t, there’s no point in even trying to develop a spec. If you wanted
to address all the possible VEX use cases being discussed by the CISA VEX
working group, you would have to develop a bloated spec that covered all of
these use cases, which would then require complicated, bloated tools to produce
and interpret VEXes.
But why can’t somebody tell Steve
what a VEX is? Surely, he has friends that know. I’m his friend – he could ask
me. However, I’ve already told him that as far as I’m concerned, the term VEX
has no meaning anymore, other than that it’s a document that makes statements
about the status of vulnerabilities in software products. A good example of
this fact is an OpenSSF document
that was put in the chat at this week’s CISA VEX meeting. I haven’t read even
half of the document and don’t intend to read any more, but just by reading the
first 4 or 5 paragraphs, I’ve identified at least 5 separate use cases, all of which
the authors consider to be VEX.
You might ask, what’s the problem
with spreading a big tent and allowing a very diverse group in? I don’t have a
problem with diversity when it comes to human beings, but when it comes to a
format for machine-readable documents, it quickly leads to the need for tooling
(both to produce and consume the documents) that is hugely time-consuming to
develop – which, of course, is exactly what Steve was talking about.
Here's the problem. I’ve never
seen it articulated this way, and I would certainly like to hear from anybody
who says they see a problem with my logic (and is willing to articulate that
problem, of course. Idle sniping is not allowed in comments on my blog posts):
1.
The cost of developing
a software tool to produce or consume machine-readable documents depends on the
number of independent operations that need to be accommodated in the
tool. For example, if you develop a tool to add positive integers, but then you
decide to incorporate subtraction into the tool, you have doubled the number of
independent operations. And if you add to that a requirement to display the result
of the operation in red if it is over 100, you have tripled the number of independent
operations.
2.
It’s important to note
that having a bunch of mandatory fields in a document format does not in itself
increase the number of operations, as long as those fields just insert text at
various places in the document (which is all the same operation, even though
the location of the text changes).
3.
However, the cost of
developing the tool doesn’t go up linearly with the number of independent
operations; in other words, the cost doesn’t double when one new operation is
added, triple when two new operations are added, etc. Instead, the cost is proportional
to the factorial of the number of independent operations. The factorial
of X is the number of ways that a group of X independent (not identical)
objects can be arranged. 1 factorial (written 1!) equals one. 2 factorial (2!) equals
2 X 1 = 2. 3 factorial equals 3 X 2 X 1 = 6. 4 factorial equals 4 X 3 X 2 X 1 =
24. Etc.
4.
Why do the tool costs go
up according to the factorial of the number of independent operations? It’s because the
developer needs to take account of each possible arrangement of independent operations.
To take our example of 3 independent operations, namely addition, subtraction,
and a display rule that requires changing the result’s color to red, the tool
will have to be able to produce or consume a document that contains those 3
operations in any order. There are three choices for which operation comes
first, two for which comes second, and only one for which comes last.
Multiplying these together, you get 3 X 2 X 1 = 6 possible arrangements (please
check my math. It’s been a long time since Mrs. Clauser’s first grade class went
over arithmetic).
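The arrangements of the three example operations can be listed directly. The following is a minimal Python sketch; the operation labels are illustrative names chosen for this example only, not part of any VEX format.

```python
from itertools import permutations

# Illustrative labels for the three independent operations in the example.
operations = ["add", "subtract", "show_red_over_100"]

# Every possible ordering of the three operations.
arrangements = list(permutations(operations))

for arrangement in arrangements:
    print(" -> ".join(arrangement))

# The number of orderings is 3 X 2 X 1.
print(len(arrangements))
```

Adding a fourth label to the list makes the count jump to 24, which is the escalation the argument depends on.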
Of course, handling 6 possible arrangements
of operations doesn’t seem like a huge hurdle for most developers (although it
would be for me!). What about when the number is 5? Then it’s 120. How about
10? That’s 3.6 million. And how about 20? That’s 2.4 quintillion, give or take
a few dozen quadrillion. You get the idea…the cost of developing a tool to produce
or consume a machine-readable document will rapidly escalate as each new independent
operation is added.
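These figures are easy to check. A short Python sketch, assuming the factorial cost model described above (the function name `arrangement_count` is made up for this example):

```python
from math import factorial


def arrangement_count(n_operations: int) -> int:
    """Number of orderings of n independent operations under the factorial model."""
    return factorial(n_operations)


# Print the count for each number of operations mentioned in the text.
for n in (3, 5, 10, 20):
    print(f"{n:>2} operations: {arrangement_count(n):,} arrangements")
```

For 20 operations the result is 2,432,902,008,176,640,000, or about 2.4 X 10^18.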
So here’s the question: If we want
to develop a tool that creates or consumes VEX documents, how many independent
operations does it need to perform? Another way to put that is to ask, “If I
were to develop a tool to produce a VEX document from scratch, for a user (e.g.,
a software supplier) that doesn’t know anything about VEX, how many independent
questions would the tool need to ask them, in order to produce a VEX in one of
the two formats?” Note there should be no need to ask a question about a text
field, since the user can just fill it in themselves.
Let’s start with CycloneDX VEX. I
once asked myself how many questions would need to be asked, to produce a CDX
VEX document. To answer that question, I looked at the examples contained in
the CISA VEX
Use Cases document, which is the best document on VEX written so far. In
fact, I don’t recommend you even read any of the NTIA or CISA VEX documents
other than Use Cases and Status
Justifications. The answer was about nine. 9! is 362,880, meaning a tool to
produce or consume a CDX VEX document would need to be able to accommodate
362,880 independent use cases. Does that seem like a lot?
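To make the idea of "independent questions" a bit more concrete, here is a rough Python sketch of a tool that turns a handful of answers into a CycloneDX-style VEX document. Everything about it is an assumption for illustration: the function name `build_cdx_vex`, the placeholder identifiers, and the rule that a `not_affected` status must carry a justification (that rule belongs to this sketch, not necessarily to the schema). The field names are believed to follow the CycloneDX 1.4 vulnerabilities schema, but check them against the spec before relying on them.

```python
import json


def build_cdx_vex(vuln_id, product_ref, state, justification=None, detail=None):
    """Assemble a minimal CycloneDX-style VEX document from a few answers."""
    analysis = {"state": state}
    # The justification question only applies when the answer to the
    # state question is "not_affected". A field that depends on another
    # field like this is what the text counts as an independent operation.
    if state == "not_affected":
        if justification is None:
            raise ValueError("not_affected requires a justification in this sketch")
        analysis["justification"] = justification
    # Free text is just filled in by the user; it adds no new operation.
    if detail:
        analysis["detail"] = detail
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "vulnerabilities": [
            {"id": vuln_id, "analysis": analysis, "affects": [{"ref": product_ref}]}
        ],
    }


# Placeholder identifiers; neither refers to a real vulnerability or product.
doc = build_cdx_vex(
    "CVE-2099-0001",
    "hypothetical-product-ref",
    "not_affected",
    justification="code_not_reachable",
)
print(json.dumps(doc, indent=2))
```

Even this toy version has one branch that depends on an earlier answer. Each additional use case the spec tries to cover tends to add more branches of that kind.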
Not in comparison to CSAF. If you
look at the closest thing to a specification for a CSAF 2.0 VEX, the
VEX profile that Thomas Schmidt of the German BSI created, you will think that
this is a really simple format. (BTW, there was
no CSAF 1.0. CVRF, the predecessor to CSAF, was renamed CSAF 2.0 when it came
time to develop an update to CVRF 1.2. The OASIS committee that originally developed
CVRF was called CSAF, so they named the new version after themselves. Nothing wrong
with that, of course.) In
fact, I believe that the profile has only two or three independent operations. Any field that is mandatory in every VEX document just counts as a single operation, but there are a couple of fields in the CSAF VEX profile that depend on the contents of another field, and each of those is an independent operation.
However, there’s a huge omission
in the VEX profile: Every CSAF document requires the “product tree” and “branches”
fields. If these were simple text fields, that would be no problem; they would
add at most one new operation.
Unfortunately, these two fields
add a lot more operations than one. How many do they add? I have never even tried
to answer that question, since I have never felt like devoting the week or two
(I kid you not) that would probably be required to develop a good understanding
of those fields. In order to understand those two fields, you should open the 100-page
(or so) CSAF
2.0 specification, start reading at Section 2 (Design
Considerations), and read everything up to Section 4. At that point, you might
have an idea of how many independent operations are required to adequately
address all the possibilities in just those two fields. Of course, I have no
idea how many operations that is (I can’t even count the number of pages, since
they’re not numbered). But I’m sure there are at least 50 independent
operations.
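To give a feel for why these two fields are not simple text fields, here is a small Python sketch of a nested product tree and the recursive walk a tool needs just to find the products in it. The vendor, product names, and product IDs are hypothetical, and the key names (`branches`, `category`, `name`, `product`, `product_id`) are believed to follow CSAF 2.0 but should be verified against the specification. The sketch covers only one of the many shapes the tree can take.

```python
# Hypothetical product tree, nested the way CSAF 2.0 is understood to nest "branches".
product_tree = {
    "branches": [
        {
            "category": "vendor",
            "name": "Example Corp",
            "branches": [
                {
                    "category": "product_name",
                    "name": "Example App",
                    "branches": [
                        {
                            "category": "product_version",
                            "name": "1.0",
                            "product": {"name": "Example App 1.0", "product_id": "CSAFPID-0001"},
                        },
                        {
                            "category": "product_version",
                            "name": "2.0",
                            "product": {"name": "Example App 2.0", "product_id": "CSAFPID-0002"},
                        },
                    ],
                }
            ],
        }
    ]
}


def leaf_products(branches, path=()):
    """Walk nested branches and collect (product_id, path) for each product found."""
    found = []
    for branch in branches:
        here = path + ((branch["category"], branch["name"]),)
        if "product" in branch:
            found.append((branch["product"]["product_id"], here))
        # A branch may contain further branches, to any depth.
        found.extend(leaf_products(branch.get("branches", []), here))
    return found


for product_id, path in leaf_products(product_tree["branches"]):
    print(product_id, " / ".join(name for _, name in path))
```

A simple text field needs none of this. Because branches can nest to any depth and each carries its own category, a tool has to handle many different tree shapes.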
What’s 50 factorial? 3.0414093e+64;
for comparison purposes, the number of atoms in the universe is estimated to be between 1e+78
and 1e+82, although I admit I haven’t counted them lately. How many lifetimes
of the universe would it take for every software developer who’s ever lived, or
will live, to code every one of those operations in a tool? I don’t know that,
but I’m sure even that is a big number.
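The figure for 50 factorial can be checked with a few lines of Python. The low-end estimate of 10^78 atoms in this sketch is taken from the text above, not computed.

```python
from math import factorial, log10

fifty_factorial = factorial(50)

# Scientific notation, for comparison with the figure quoted in the text.
print(f"{fifty_factorial:.7e}")

# Order of magnitude of 50!, and how many orders of magnitude it falls
# short of the low-end atom estimate of 10**78 quoted in the text.
magnitude = int(log10(fifty_factorial))
print(magnitude, 78 - magnitude)
```

So 50! is a 65-digit number, about 14 orders of magnitude below the low-end atom estimate.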
But even if we go back to
CycloneDX VEX, with its lowly count of 362,880 arrangements of operations, you can
see it would be ridiculous even to try to account for all of them in any
tool. What needs to be done is to constrain both VEX specs (CDX and CSAF) so they
only include 1-3 independent operations each. This will of course mean that some
people will be disappointed that their favorite use case can’t be accommodated
in our spec; on the other hand, we will make sure we include the use cases in
the NTIA VEX One-pager, as well as in this
unpublished Google Docs paper, which Allan Friedman drafted and which I still
think is the best document (published or unpublished) about VEX.
Once we have the spec developed,
toolmakers will be able to begin developing both VEX production and consumption
tools – in fact, one of the members of our group, Anthony Harrison of
the UK, has already developed proof of concept tools for CSAF production and
consumption, so he might modify those to accommodate VEX.
At that point, we will hopefully
have a workable VEX spec!
Any opinions expressed in this
blog post are strictly mine and are not necessarily shared by any of the
clients of Tom Alrich LLC. If you would like to comment on what you have
read here, I would love to hear from you. Please email me at tom@tomalrich.com.
I lead the OWASP SBOM Forum. If
you would like to learn more about what that group does, please go here.