Thursday, October 19, 2023

Making VEX work

           "He who defends everything defends nothing."

                                                            Frederick the Great

Steve Springett, leader of the OWASP Dependency Track and CycloneDX projects, led off last Friday’s OWASP SBOM Forum meeting by saying that the biggest reason why suppliers aren’t regularly producing VEX documents for customers is that it’s so expensive to do so. (The meeting didn’t have a fixed topic. This often happens, and as usual it led to a better meeting than we could ever have planned.)

This surprised me. I knew there are multiple reasons why suppliers aren’t producing VEXes, but I had never even thought about cost as a reason. Since there are already many open source tools for producing and interpreting CycloneDX VEX documents (because the CDX VEX format is based on the same base code that CDX SBOMs, HBOMs, OBOMs, MLBOMs, VDR, etc. are built on), it certainly didn’t seem at first glance that there should be much if any cost to producing VEXes. And since Steve’s day job is ensuring that over 1,000 developers at ServiceNow follow best practices for software security, it certainly seems that one of them might be assigned to producing VEXes on a part-time basis. So, where does the cost come from?

Steve elaborated, saying that the big problem is that nobody knows exactly what VEX is, and there is no fixed specification for it in either of the two primary VEX formats, CSAF and CycloneDX. (The spec will need to be specific to the format. The formats are very different, and there’s no way to produce a common spec. Even the only SBOM spec, the 7 minimum fields listed in the NTIA Minimum Elements document, kinda sorta applies to both the SPDX and CDX SBOM formats, but it is so minimal that it is literally useless by itself. If you want to produce useful SBOMs, you have to go beyond those 7 fields.)

While Steve has no control over CSAF of course, he can certainly find someone to develop a CDX VEX spec. But, until someone can tell him exactly what a VEX is and – just as importantly – what it isn’t, there’s no point in even trying to develop a spec. If you wanted to address all the possible VEX use cases being discussed by the CISA VEX working group, you would have to develop a bloated spec that covered all of these use cases, which would then require complicated, bloated tools to produce and interpret VEXes.

But why can’t somebody tell Steve what a VEX is? Surely, he has friends that know. I’m his friend – he could ask me. However, I’ve already told him that as far as I’m concerned, the term VEX has no meaning anymore, other than that it’s a document that makes statements about the status of vulnerabilities in software products. A good example of this fact is an OpenSSF document that was put in the chat at this week’s CISA VEX meeting. I haven’t read even half of the document and don’t intend to read any more, but just by reading the first 4 or 5 paragraphs, I’ve identified at least 5 separate use cases, all of which the authors consider to be VEX.

You might ask, what’s the problem with spreading a big tent and allowing a very diverse group in? I don’t have a problem with diversity when it comes to human beings, but when it comes to a format for machine-readable documents, it quickly leads to the need for tooling (both to produce and consume the documents) that is hugely time-consuming to develop – which, of course, is exactly what Steve was talking about.

Here's the problem. I’ve never seen it articulated this way, and I would certainly like to hear from anybody who says they see a problem with my logic (and is willing to articulate that problem, of course. Idle sniping is not allowed in comments on my blog posts):

1.      The cost of developing a software tool to produce or consume machine-readable documents depends on the number of independent operations that need to be accommodated in the tool. For example, if you develop a tool to add positive integers, but then you decide to incorporate subtraction into the tool, you have doubled the number of independent operations. And if you add to that a requirement to display the result of the operation in red if it is over 100, you have tripled the number of independent operations.

2.      It’s important to note that having a bunch of mandatory fields in a document format does not in itself increase the number of operations, as long as those fields just insert text at various places in the document (which is all the same operation, even though the location of the text changes).

3.      However, the cost of developing the tool doesn’t go up linearly with the number of independent operations; in other words, the cost doesn’t double when one new operation is added, triple when two new operations are added, etc. Instead, the cost is proportional to the factorial of the number of independent operations. The factorial of X is the number of ways that a group of X independent (not identical) objects can be arranged. 1 factorial (written 1!) equals one. 2 factorial (2!) equals 2 X 1 = 2. 3 factorial equals 3 X 2 X 1 = 6. 4 factorial equals 4 X 3 X 2 X 1 = 24. Etc.

4.      Why do the tool costs go up according to the factorial of the number of independent operations? It’s because the developer needs to take account of each possible arrangement of those operations. To take our example of 3 independent operations, namely addition, subtraction, and a display rule that requires changing the result’s color to red, the tool will have to be able to produce or consume a document that contains those 3 operations in any order. There are 3 choices for the first operation, 2 for the second, and only 1 for the third. Multiplying these together, you get 3 X 2 X 1 = 6 possible arrangements (please check my math. It’s been a long time since Mrs. Clauser’s first grade class went over arithmetic).
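For readers who would rather see the arrangements than count them, here is a minimal sketch in Python. The three labels are hypothetical names for the addition, subtraction and "red if over 100" operations in the example; they are not taken from any VEX tool or spec.

```python
from itertools import permutations
from math import factorial

# Hypothetical labels for the three example operations in the list above.
OPERATIONS = ["add", "subtract", "red_if_over_100"]


def arrangements(ops):
    """Return every possible ordering of the given operations."""
    return list(permutations(ops))


orderings = arrangements(OPERATIONS)
for ordering in orderings:
    print(" -> ".join(ordering))

# The number of orderings equals the factorial of the number of operations.
assert len(orderings) == factorial(len(OPERATIONS))
```

Adding a fourth label to OPERATIONS makes the list of orderings four times as long, which is the growth pattern the argument relies on.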

Of course, handling 6 possible combinations of operations doesn’t seem like a huge hurdle for most developers (although it would be for me!). What about when the number is 5? Then it’s 120. How about 10? That’s 3.6 million. And how about 20? That’s 2.4 quintillion, give or take a hundred quadrillion. You get the idea…the cost of developing a tool to produce or consume a machine-readable document will rapidly escalate as each new independent operation is added.
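If you want to check these figures yourself, the short Python sketch below prints the exact factorial for each count mentioned above. The function name arrangement_count is just an illustrative choice of mine.

```python
from math import factorial


def arrangement_count(n):
    """Number of ways n independent operations can be arranged."""
    return factorial(n)


# Print the exact count for each number of operations discussed above.
for n in (3, 5, 10, 20):
    print(f"{n:>2} operations -> {arrangement_count(n):,} arrangements")
```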

So here’s the question: If we want to develop a tool that creates or consumes VEX documents, how many independent operations does it need to perform? Another way to put that is to ask, “If I were to develop a tool to produce a VEX document from scratch, for a user (e.g., a software supplier) that doesn’t know anything about VEX, how many independent questions would the tool need to ask them, in order to produce a VEX in one of the two formats?” Note there should be no need to ask a question about a text field, since the user can just fill it in themselves.

Let’s start with CycloneDX VEX. I once asked myself how many questions would need to be asked, to produce a CDX VEX document. To answer that question, I looked at the examples contained in the CISA VEX Use Cases document, which is the best document on VEX written so far. In fact, I don’t recommend you even read any of the NTIA or CISA VEX documents other than Use Cases and Status Justifications. The answer was about nine. 9! is 362,880, meaning a tool to produce or consume a CDX VEX document would need to be able to accommodate 362,880 independent use cases. Does that seem like a lot?

Not in comparison to CSAF. If you look at the closest thing to a specification for a CSAF 2.0 VEX, the VEX profile that Thomas Schmidt of the German BSI created (BTW, there was no CSAF 1.0. CVRF, the predecessor to CSAF, was renamed CSAF 2.0 when it came time to develop an update to CVRF 1.0. The OASIS committee that originally developed CVRF was called CSAF, so they named the new version after themselves. Nothing wrong with that, of course), you will think that this is a really simple format. In fact, I believe that the profile has only two or three independent operations. Any field that is mandatory in every VEX document just counts as a single operation, but there are a couple of fields in the CSAF VEX profile that depend on the contents of another field - that's an independent operation.

However, there’s a huge omission in the VEX profile: Every CSAF document requires the “product tree” and “branches” fields. If these were simple text fields, that would be no problem; they would add at most one new operation.

Unfortunately, these two fields add a lot more operations than one. How many do they add? I have never even tried to answer that question, since I have never felt like devoting the week or two (I kid you not) that would probably be required to develop a good understanding of those fields. In order to understand those two fields, you should open the 100-page (or so) CSAF 2.0 specification and start reading at least with Section 2 (Design Considerations); then read everything up to Section 4. At that point, you might have an idea of how many independent operations are required adequately to address all the possibilities in just those two fields. Of course, I have no idea how many operations that is (I can’t even count the number of pages, since they’re not numbered). But I’m sure there are at least 50 independent operations.

What’s 50 factorial? 3.0414093e+64; for comparison purposes, the number of atoms in the universe is between 1e+78 and 1e+82, although I admit I haven’t counted them lately. How many lifetimes of the universe would it take for every software developer who’s ever lived, or will live, to code every one of those operations in a tool? I don’t know that, but I’m sure even that is a big number.
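Here is a quick way to reproduce that comparison in Python. The two atom-count constants are assumptions: they are simply the commonly quoted lower and upper estimates, and the figure of 50 operations is my own guess from above.

```python
from math import factorial

fifty_factorial = factorial(50)

# Assumption: commonly quoted bounds for the number of atoms in the universe.
ATOMS_LOW = 10**78
ATOMS_HIGH = 10**82

print(f"50! = {fifty_factorial:.7e}")
print(f"Digits in 50!: {len(str(fifty_factorial))}")
print(f"Smaller than the low atom estimate: {fifty_factorial < ATOMS_LOW}")
```

So 50 factorial is still well short of the atom count, but it is far beyond anything a tool developer could enumerate.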

But even if we go back to CycloneDX VEX, with its lowly count of 362,880 possible arrangements of operations, you can see it would be ridiculous even to try to account for all of them in any tool. What needs to be done is to constrain both VEX specs (CDX and CSAF) so they only include 1-3 independent operations each. This will of course mean that some people will be disappointed that their favorite use case can’t be accommodated in our spec; on the other hand, we will make sure we include the use cases in the NTIA VEX One-pager, as well as in this unpublished Google Docs paper, which Allan Friedman drafted and which I still think is the best document (published or unpublished) about VEX.

Once we have the spec developed, toolmakers will be able to begin developing both VEX production and consumption tools – in fact, one of the members of our group, Anthony Harrison of the UK, has already developed proof of concept tools for CSAF production and consumption, so he might modify those to accommodate VEX.

At that point, we will hopefully have a workable VEX spec!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does, please go here.

 
