Thursday, March 30, 2023

When will I be able to verify an SBOM? Probably never.


A topic that comes up a lot in the CISA SBOM meetings is “verification” of SBOMs. That could mean a lot of things, but it usually seems to mean that the software customer doesn’t trust their supplier to accurately represent all the components of the software in the SBOM. For example, the supplier might not report a component that’s nine years old and is loaded with vulnerabilities, or they might list a component as version 4.5, which was released three months ago, when it’s in fact version 1.1, which was released in 2011.

Could this happen? Certainly it could. However, you need to keep in mind that, once you start receiving SBOMs on a regular basis for more than just one or two products, it’s inevitable there will be a lot of empty spaces and “NOASSERTION” statements. This is because there are so many problems with naming of components (although help seems to be on the way on this issue, I’m glad to report).

Might some of those empty spaces be the result of deliberate obfuscation by the supplier? That’s certainly possible. But, given that some large suppliers estimate that over 90% of components are either mis-identified or not identified at all in an SBOM that’s produced as part of their software build process (the most common scenario), how will you ever know if the lack of an identifier for a component is due to a deliberate act by the supplier, or just due to the normal wear and tear of the naming problem? Answer: you won’t.
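To make the scale of the naming problem concrete, here is a minimal sketch of how you might measure the share of unidentified components in an SBOM you receive. It assumes the SBOM has already been parsed into a list of package dictionaries using SPDX-style JSON field names ("versionInfo", "externalRefs", "referenceType", "referenceLocator"); the function names and the rule for what counts as “unidentified” are my own illustrative assumptions, not part of any standard.

```python
NOASSERTION = "NOASSERTION"

# Assumption: a purl or CPE external reference counts as a usable identifier.
IDENTIFIER_TYPES = ("purl", "cpe23Type")


def is_unidentified(package):
    """Return True if the package lacks a usable version or a purl/CPE reference."""
    version = package.get("versionInfo")
    has_version = bool(version) and version != NOASSERTION
    refs = package.get("externalRefs", [])
    has_identifier = any(
        ref.get("referenceType") in IDENTIFIER_TYPES and ref.get("referenceLocator")
        for ref in refs
    )
    return not (has_version and has_identifier)


def unidentified_share(packages):
    """Fraction of packages (0.0 to 1.0) that cannot be reliably identified."""
    if not packages:
        return 0.0
    return sum(is_unidentified(p) for p in packages) / len(packages)
```

Note what this number can’t tell you: a high share looks exactly the same whether it comes from the ordinary naming problem or from a supplier deliberately leaving things out.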

But there’s an even more important reason why verifying an SBOM may never be possible: How could you ever do that, even in principle? Here’s the problem: What you find in an SBOM will vary widely depending on when in the software lifecycle the SBOM is produced. One of the CISA workgroups recently finalized a document on this issue, which is awaiting final approval for publication. The document describes six SBOM Types. They’re all valid for specific use cases, but they’ll always differ from each other, sometimes radically.

In a large number of cases, the SBOM you get will be created during the final Build stage, when the software code (including components) is “set in stone”; i.e., the code contained in the binaries delivered to you, the customer, is exactly the code that went into the final build. If you want to verify what the supplier did with the greatest accuracy, you will need another SBOM created at the final Build stage. However, a particular version of a product only goes through one final build. This means that, unless you can roll back time and persuade the supplier to let you produce your own SBOM during the final build of the version you now use, you won’t have a truly comparable SBOM to set against the one the supplier provided.

If you want to produce your own SBOM and not have to time-travel, you could produce an Analyzed SBOM using a “binary analysis” tool. This is a tool that, starting with the binary files distributed to customers, attempts to decompile the supplier’s code[i] and create the SBOM using that code. Of course, this will never be a completely clean process and will usually result in more serious naming problems than occur with just a Build SBOM.

In other words, probably the only SBOM Type that will be within your power, as a customer, to produce will inevitably be substantially different from the one the supplier provided to you. The one apparent exception is when the supplier themselves used binary analysis to produce their SBOM, which they may have to do in some cases, especially if they use older languages like C or C++. But even then, the Analyzed SBOM produced by the supplier will differ a lot from the one that you produce, since the supplier brings to it a lot of inside knowledge known only to the developer.

Of course, an SBOM produced at any stage of the software lifecycle is interesting. In fact, some people who know a lot more about this than I do (a low hurdle to clear, to be sure!) say the best SBOM is one that blends two or more of the different types. For example, the Deployed SBOM is unique among the six SBOM Types, in that it includes not just the software itself but everything that is deployed with it: the installer, a container, runtime dependencies, etc. Since every artifact that’s deployed in the user’s environment can be a source of risk, knowing what’s inside all of these items is almost as important as knowing what’s inside the software itself. On the other hand, since the Deployed SBOM depends on binary analysis, it will never provide as good a description of the software itself as the Build SBOM does. It might be best to combine the two of them, although that in itself requires a lot of skill.

I hope you get the idea: To truly verify an SBOM, you must have something comparable to measure it against. However, it’s not likely that, without closely cooperating with the supplier, you’ll ever be able to produce anything that’s comparable. But if verification requires cooperating closely with the supplier, that’s not exactly verification, is it?

This brings up an idea: Rather than taking an adversarial position vs. the supplier and pretending it’s possible for you to conduct an “independent” verification, you could utilize a binary analysis tool to build your own SBOM, then discuss the differences between the two SBOMs with the supplier. You both might learn something interesting from doing that.
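If you want a starting point for that conversation, here is a rough sketch of how the two SBOMs might be compared. It assumes both have already been reduced to simple lists of components with "name" and "version" fields; the function names and the lowercase name-matching rule are hypothetical choices of mine. Given the naming problem, matching on names alone will be noisy, so treat the output as a list of questions for the supplier rather than a list of findings.

```python
def component_index(components):
    """Map each lowercased component name to the set of versions listed for it."""
    index = {}
    for component in components:
        name = component.get("name", "").strip().lower()
        if name:
            index.setdefault(name, set()).add(component.get("version") or "unknown")
    return index


def diff_sboms(supplier_components, analyzed_components):
    """Summarize where the supplier's SBOM and your Analyzed SBOM disagree."""
    supplier = component_index(supplier_components)
    analyzed = component_index(analyzed_components)
    shared = supplier.keys() & analyzed.keys()
    return {
        "only_in_supplier": sorted(supplier.keys() - analyzed.keys()),
        "only_in_analyzed": sorted(analyzed.keys() - supplier.keys()),
        "version_mismatch": {
            name: (sorted(supplier[name]), sorted(analyzed[name]))
            for name in sorted(shared)
            if supplier[name] != analyzed[name]
        },
    }


# Made-up data, echoing the version 4.5 vs. 1.1 example earlier in this post.
supplier_sbom = [
    {"name": "libfoo", "version": "4.5"},
    {"name": "libbar", "version": "2.0"},
]
analyzed_sbom = [
    {"name": "LibFoo", "version": "1.1"},
    {"name": "libbaz", "version": "3.2"},
]
report = diff_sboms(supplier_sbom, analyzed_sbom)
```

Each entry in the report is a candidate discrepancy. Some will be real, and many will simply be the same component under two different names.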

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] Doing this may violate the supplier’s license agreement.
