Wednesday, June 29, 2022

Home on the (version) range

 

One nice thing about working in a new field of endeavor – SBOMs and VEXes – is that I get to worry about new problems that I never even considered to be problems before. Such is the case with version ranges.

Like probably 90% of people in the cybersecurity field, it never even occurred to me that there would be a problem with specifying a version range. For example, if I say, “Vulnerability XYZ is found in versions 2.2 through (and including) 3.0 of product ABC” and someone asks me if version “2.2a” or “2.5 patch 2” or “3.0 build 4” falls in that range, I would probably answer in the affirmative in all three cases. Moreover, I wouldn’t worry that some people would think differently. This is because I assume that any letter appended to a number can be ignored when determining whether or not the version falls within the range, and I also assume that any version string[i] that includes a patch or build number would also fall into the range, assuming the version number by itself would. Furthermore, I assume that everyone else thinks the same way.

However, computers are very literal-minded individuals. A computer can’t be expected to reason in the way I just described unless it’s been explicitly instructed to do so. In fact, there is probably an endless supply of other rules that humans apply all the time without thinking about them, and that a computer will never apply unless it’s told to do so.
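To make this concrete, here is a minimal Python sketch of what it looks like to spell out the unspoken rules from the “versions 2.2 through 3.0” example. The function names (numeric_core, in_range_like_a_human) and the rule “keep only the leading dotted number and ignore everything after it” are illustrative assumptions, not part of any VEX format.

```python
import re


def numeric_core(version: str) -> tuple[int, ...]:
    """Keep only the leading dotted number, e.g. '2.5 patch 2' -> (2, 5)."""
    match = re.match(r"\s*v?(\d+(?:\.\d+)*)", version, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no leading version number in {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def in_range_like_a_human(version: str, low: str, high: str) -> bool:
    """Inclusive range check that ignores suffix letters, patches and builds."""
    return numeric_core(low) <= numeric_core(version) <= numeric_core(high)


for candidate in ("2.2a", "2.5 patch 2", "3.0 build 4", "3.1"):
    print(candidate, in_range_like_a_human(candidate, "2.2", "3.0"))
```

Even this tiny sketch has to take positions that a human never thinks about. For instance, it treats “3.0.1” as later than “3.0”, and therefore outside the range, which not everyone would agree with.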

The problem came up when the CISA VEX committee (which was under the NTIA until the end of last year) was working on this document (which is the best document on VEX, by the way). The document includes (via links to GitHub) examples of about ten use cases for VEX, in both the CSAF and CycloneDX VEX formats. When we considered a use case that included a version range, we realized we couldn’t naively assume that all ranges would be correctly interpreted by the tool on the user end.

Why was the committee concerned with version ranges? Because a VEX document is a machine-readable vulnerability notification. Often, vulnerabilities apply to a range of versions of a product, since the vulnerable code didn’t change over a number of versions. In cases like this, being able to specify a version range can be a huge time saver compared with enumerating every version in the range.

For example, you might want to say, “No versions of product ABC before version 4.0 are affected by vulnerability CVE-2022-12345.” There might be 150 or more versions in that range (both patch and build levels need to be included as separate versions, since a new SBOM should be issued whenever there has been any change in the software); enumerating every one of them wouldn’t be a lot of fun, to say the least.

What’s even worse, it’s common that nobody at the current supplier of the product can enumerate all the early versions, since the product was originally sold by another supplier. Thus, enumerating all versions before v4.0 may be somewhere between hard and impossible. It seems like a gift from God to be able to encompass all those versions with a simple statement like “all versions before v4.0”.

Of course, that’s simple for humans, but not for computers. Unless a computer sees an enumeration of all the versions within a range, it’s not at all certain that it will be able to determine whether every version string presented to it falls within the range or not, even if a human would have no problem deciding that question. Even worse, the computer might make the wrong decision and identify a version string (e.g., “v3.23 build 3”) as being outside of the range, even though almost any human would identify it as being within the range.
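Here is a small Python sketch of how a wrong decision like that can happen. It assumes a hypothetical tool that simply compares the raw version strings character by character; the function name naive_is_before is made up for illustration.

```python
def naive_is_before(version: str, bound: str) -> bool:
    """Naive rule: compare the raw strings character by character."""
    return version < bound


# A human would say this version is before 4.0, but 'v' sorts after '4'.
print(naive_is_before("v3.23 build 3", "4.0"))

# A human would say 10.0 is after 4.0, but '1' sorts before '4'.
print(naive_is_before("10.0", "4.0"))
```

The first call prints False and the second prints True, which is the opposite of what almost any human would answer in both cases.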

Here's how the CycloneDX VEX format treats version ranges: It supports any range expressed in one of the versioning schemes listed in the Version Range Specifier. This document is widely followed by various projects that require version ranges, including the new CVE 5.0 specification. Rather than attempt the impossible task of stating a universal algorithm for interpreting all version ranges no matter how the version strings themselves are constructed, this document provides a generalized format for specifying a range in one of a number of versioning schemes.

These schemes are all relatively simple to make calculations in. For example, one of the most popular of the versioning schemes is “semantic versioning”. This scheme requires version numbers in the form “X.Y.Z”, where X = major version number, Y = minor version number and Z = patch number. X, Y, and Z must all be non-negative integers, and each is compared numerically, so that 1.10.0 is later than 1.9.0. Thus, it is relatively easy to determine whether any version string in the “X.Y.Z” form falls inside or outside of a range with endpoints that follow semantic versioning.
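As a rough illustration of why this is easy, here is a minimal Python sketch that turns an “X.Y.Z” string into three integers and compares them in order. The function names are assumptions made for this example, and the sketch deliberately ignores the pre-release and build-metadata suffixes that the full semantic versioning specification also allows.

```python
import re


def parse_semver(version: str) -> tuple[int, int, int]:
    """Turn 'X.Y.Z' into (X, Y, Z); reject anything that is not three integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version, flags=re.ASCII)
    if match is None:
        raise ValueError(f"not an X.Y.Z version: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def in_semver_range(version: str, low: str, high: str) -> bool:
    """Inclusive check: low <= version <= high, compared number by number."""
    return parse_semver(low) <= parse_semver(version) <= parse_semver(high)


print(in_semver_range("3.10.0", "2.9.0", "4.1.0"))
print(parse_semver("2.10.0") > parse_semver("2.9.0"))
```

Because the parts are compared as numbers rather than as text, 2.10.0 correctly comes out later than 2.9.0.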

A version range in the CycloneDX VEX format reads (for example) "vers:semver/>=2.9.0|<=4.1.0". “semver” refers to semantic versioning, of course. If the range is specified in a different versioning scheme listed in the Version Range Specifier, that scheme will be substituted for “semver”. This example can be read to mean “the range between and including versions 2.9 and 4.1, interpreted using semantic versioning”.
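To show how a tool might read such a string, here is a minimal Python sketch. It is an assumption-laden simplification: it accepts only the “semver” scheme, only the four comparators >=, <=, > and <, and at most one lower and one upper bound, as in the example above. The full Version Range Specifier allows more than this, and the function names here are hypothetical.

```python
import operator
import re

COMPARATORS = {">=": operator.ge, "<=": operator.le,
               ">": operator.gt, "<": operator.lt}


def semver_key(version: str) -> tuple[int, ...]:
    """Turn 'X.Y.Z' into a tuple of three integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version, flags=re.ASCII)
    if match is None:
        raise ValueError(f"not an X.Y.Z version: {version!r}")
    return tuple(int(group) for group in match.groups())


def parse_simple_range(spec: str) -> list:
    """Split 'vers:semver/>=2.9.0|<=4.1.0' into (comparator, version) pairs."""
    prefix, _, rest = spec.partition(":")
    scheme, _, constraints = rest.partition("/")
    if prefix != "vers" or scheme != "semver" or not constraints:
        raise ValueError(f"unsupported range: {spec!r}")
    pairs = []
    for constraint in constraints.split("|"):
        match = re.fullmatch(r"(>=|<=|>|<)(.+)", constraint.strip())
        if match is None:
            raise ValueError(f"unsupported constraint: {constraint!r}")
        pairs.append((match.group(1), semver_key(match.group(2))))
    lower = [pair for pair in pairs if pair[0].startswith(">")]
    upper = [pair for pair in pairs if pair[0].startswith("<")]
    if len(lower) > 1 or len(upper) > 1:
        raise ValueError("this sketch handles one lower and one upper bound only")
    return pairs


def in_simple_range(version: str, spec: str) -> bool:
    """True if the version satisfies every bound in the range string."""
    key = semver_key(version)
    return all(COMPARATORS[op](key, bound)
               for op, bound in parse_simple_range(spec))


print(in_simple_range("3.0.0", "vers:semver/>=2.9.0|<=4.1.0"))
print(in_simple_range("4.1.1", "vers:semver/>=2.9.0|<=4.1.0"))
```

The point of the scheme name in the string is that the tool knows which comparison rules to apply. If the scheme were something this sketch does not recognize, the honest response is to raise an error rather than guess.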

A supplier (which can be an open source project) that is using one of the versioning schemes supported by the Version Range Specifier will have no problem specifying version ranges in the CycloneDX VEX format. That’s the good news. The bad news is that, when it comes to commercial software, a large percentage (and perhaps the majority) of suppliers don’t follow one of the schemes in the Version Range Specifier.

For example, here’s a version string for Cisco IOS™: “12.2(33)SXI9”. Suppose someone asked you to develop an algorithm that would determine whether “12.2(35) SX17” is more or less recent than that one[ii]. How would you do this? One algorithm might decide that the second version string is more recent, because 35 is greater than 33. But another algorithm might decide that the first string is more recent, because it reads the end of the first string as 19, which is greater than 17. Obviously, this question can’t be answered in any way other than to say, “Ask someone from Cisco to explain this to you.” That doesn’t work too well if you’re designing a computer algorithm.
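The disagreement is easy to reproduce. The Python sketch below applies two made-up ordering rules to the two strings above: one ranks by the number in parentheses first, the other ranks by the trailing letters-and-digits suffix first, compared as text. Neither rule is Cisco's; both, and all the names in the code, are assumptions chosen only to show that two plausible-looking algorithms can give opposite answers.

```python
import re

# Shape assumed for this example only: digits.digits(digits)suffix
IOS_PATTERN = re.compile(r"(\d+\.\d+)\((\d+)\)\s*(\w+)")


def split_ios(version: str) -> tuple[str, int, str]:
    """Split '12.2(33)SXI9' into ('12.2', 33, 'SXI9')."""
    match = IOS_PATTERN.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unexpected version string: {version!r}")
    base, number_in_parens, suffix = match.groups()
    return base, int(number_in_parens), suffix


def key_by_parenthesized_number(version: str) -> tuple:
    """Hypothetical rule A: the number in parentheses decides first."""
    base, number_in_parens, suffix = split_ios(version)
    return (base, number_in_parens, suffix)


def key_by_suffix(version: str) -> tuple:
    """Hypothetical rule B: the suffix, compared as text, decides first."""
    base, number_in_parens, suffix = split_ios(version)
    return (base, suffix, number_in_parens)


first, second = "12.2(33)SXI9", "12.2(35) SX17"
print(max(first, second, key=key_by_parenthesized_number))
print(max(first, second, key=key_by_suffix))
```

Rule A picks the second string as the more recent one and rule B picks the first. Nothing in the strings themselves says which rule, if either, is right.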

In other words, Cisco’s versioning scheme, at least for IOS, may not be amenable to algorithmic interpretation at all. And even if it is, there would need to be special code – probably provided by Cisco – in the software that interprets the IOS range. But Cisco is not unique at all. I’m writing this post on a laptop running Windows 11 Home version 21H2 build 22000.739, running Experience Pack 1000.22000.739.0. How would you figure out a version range, given all of these numbers? I wouldn’t even try to do that.

So, the CycloneDX VEX format supports version ranges only for certain products, which are much more likely to be open source than commercial. If a product uses one of the supported versioning schemes, the party creating a VEX can be certain that any version string that falls within a specified range will be interpreted by the user’s tool to fall within that range. However, suppliers like Cisco and Microsoft are out of luck, unless they want to provide some custom code to include in tools that interpret CycloneDX VEX documents.

How does the CSAF VEX format support version ranges? It also supports the versioning schemes identified in the Version Range Specifier. But for all other ranges, the word is caveat emptor. The VEX creator can’t be certain that a range in any other versioning scheme will always be correctly interpreted by the tool on the user end. For those schemes, it is better to enumerate every version in a range than to specify the range itself. In other words, the CSAF-based VEX format doesn’t differ much at all from the CycloneDX-based VEX format regarding versioning. Sorry, Cisco.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] This is the more general term for “version number”, since it includes the possibility of versions that are referred to by more than numbers. 

[ii] Of course, an algorithm to interpret version ranges would need to include this as one of its calculations.
