Monday, February 24, 2025

Why I don’t like security risk scores very much


In some disciplines, risk is an ancillary consideration, meaning that mitigation of risk is more of a necessary distraction than the primary goal. For example, the main objective of a contractor building an office building is to follow best construction practices, basing their work on the architect’s design. However, there are always imperfections in a design, in the materials used, in the people charged with performing the work, etc.

These imperfections introduce risk into the construction process. The contractor needs to constantly identify these risks and take steps to mitigate them, e.g. by closely supervising less experienced employees. However, mitigating these risks isn’t the contractor’s main job; their main job is getting the building up on time and within budget.

However, security (whether physical or cybersecurity) is all about risk mitigation. The world is full of security risks. A security professional’s main – in fact, their only – job is to mitigate those risks to the greatest extent possible. Of course, the professional is doing this to make other processes run more efficiently and with less danger to participants in the process.

For example, a cybersecurity professional at a bank is charged with ensuring that the daily activities of the bank are carried out without funds being lost in cyberattacks. If there were no cybersecurity threats to these activities – as would have been the case 100 years ago – there would be no need for this person to be employed by the bank.

Since there is an almost infinite number of cybersecurity risks to be mitigated, it turns out that one of the primary jobs of a cybersecurity professional is prioritizing those risks based on their magnitude. It stands to reason that big risks should be mitigated before small risks – and some small risks aren’t worth mitigating at all. This means it is hard for this person to do their job if they can’t easily measure the risks they face.

If cybersecurity were a physical science, measuring risk would not be a problem. For example, if a bank cyber professional wanted to determine the degree of risk that a particular type of account would be hacked, they would “simply” a) identify all the possible vectors by which a hacker could penetrate the account, b) determine exactly the degree of damage that is possible from exploitation of each of those vectors (measured in dollars lost), and c) add all those damage amounts together.

The sum would be the maximum possible loss from hacking attacks on that type of account. The cybersecurity professional would then design controls that would block each possible attack vector, keeping in mind that the cost of the controls can’t exceed the maximum possible loss (if it did, the bank would be better off buying insurance against cyberattacks, while implementing minimal controls).
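
To make that arithmetic concrete, here’s a minimal sketch in Python. The attack vectors and dollar figures are invented purely for illustration; no real account type has just three vectors with known maximum losses – which is exactly the problem discussed next.

```python
# Hypothetical illustration of the "sum of maximum losses" calculation
# described above. The vectors and dollar amounts are invented.

attack_vectors = {
    "credential phishing":    250_000,  # max possible loss in dollars
    "session hijacking":      100_000,
    "insider transfer fraud": 400_000,
}

max_possible_loss = sum(attack_vectors.values())
cost_of_controls = 300_000  # hypothetical cost to block every vector

print(f"Maximum possible loss: ${max_possible_loss:,}")
if cost_of_controls > max_possible_loss:
    print("Controls cost more than the worst case: buy insurance instead.")
else:
    print("Controls are cheaper than the worst case: implement them.")
```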

However, it is almost never possible to identify all possible attack vectors, especially because hackers are creative people and discover new vectors every day. And even if you could identify all vectors, you could never determine the maximum loss from exploitation of each of the vectors.

Because of this uncertainty, security professionals need shortcuts to measure degree of risk. Perhaps the most popular shortcut is a security score. The score incorporates components that can contribute to the risk. For example, the CVSS Base Score for a software vulnerability includes attack vector, attack complexity, privileges required, user interaction, confidentiality impact, integrity impact, availability impact, and scope. Each component is assigned a weight, which presumably can be adjusted (by the members of FIRST.org, which maintains CVSS) based on experience.
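
For what it’s worth, the formula itself is public. Here is a sketch of the CVSS v3.1 base score calculation in Python, simplified to the Scope: Unchanged case. The constants (6.42, 8.22, and the per-metric values) come straight from the published specification, which never explains why they take those exact values – which is rather my point.

```python
import math

# CVSS v3.1 base score, simplified to the Scope: Unchanged case.
# Per-metric values and constants are from the published specification.

AV  = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}  # attack vector
AC  = {"L": 0.77, "H": 0.44}                       # attack complexity
PR  = {"N": 0.85, "L": 0.62, "H": 0.27}            # privileges required (Scope: Unchanged)
UI  = {"N": 0.85, "R": 0.62}                       # user interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}             # confidentiality/integrity/availability

def base_score(av, ac, pr, ui, c, i, a):
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    # Simplified round-up to one decimal; the spec uses a
    # floating-point-safe version of this same operation.
    return math.ceil(min(impact + exploitability, 10) * 10) / 10

# Network attack, low complexity, no privileges, no interaction,
# high impact on C, I, and A: yields 9.8 (Critical).
print(base_score("N", "L", "N", "N", "H", "H", "H"))
```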

Other security scores are similar. For example, security scores for product vendors are based on previous attack history, beaconing activity from bots that are “phoning home” from the vendor’s network, controls that are in place or should be in place, etc.
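
To illustrate the shape of these vendor scores, here is a purely hypothetical weighted sum in Python. None of the components or weights below comes from any real scoring product; they are invented to show the pattern.

```python
# A purely hypothetical vendor risk score: a weighted sum of
# normalized components. All inputs and weights are invented.

components = {
    "prior_breach_history": 0.7,  # normalized 0-1 inputs (hypothetical)
    "beaconing_activity":   0.2,
    "missing_controls":     0.5,
}
weights = {
    "prior_breach_history": 0.5,  # why 0.5 rather than 0.4? No one can say.
    "beaconing_activity":   0.3,
    "missing_controls":     0.2,
}

score = sum(components[k] * weights[k] for k in components)
print(f"Vendor risk score: {score:.2f} out of 1.00")
```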

All these components contribute to the vendor’s cybersecurity risk, but not in the deterministic way that, for example, the number of cars crossing a bridge every day contributes to the risk that the bridge will fail. It’s at least possible in principle to calculate exactly how much stress each car puts on the bridge and determine how strong the bridge’s supports need to be. Obviously, a bridge that has a small amount of traffic does not need to be built to the same specifications as one that carries thousands of cars and trucks a day; that’s just physics.

However, there’s no way, even in principle, to say that the values of the CVSS components listed above, and the weights assigned to each value by the CVSS formula, lead to a certain value on a 0-10 scale, which of course is what CVSS provides. This is because the CVSS formula is based just on prior statistics and a lot of guesswork. There is no way to precisely model how the degree of risk of a vulnerability is determined.

In other words, risk scores are inherently black boxes that ingest a bunch of inputs and output a score. Even if you know the weight assigned to every component in determining the score, there is no way you can determine that one weight should be X, while another should be Y. Risk scores are inherently not understandable. That’s what bothers me most about them. The fact that CVSS scores have two significant digits is, as far as I can see, just a way to show the creators had a sense of humor. And in fact, most users of CVSS just group the scores into critical, high, medium and low categories.
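
Those bands are published by FIRST alongside the scores, and the mapping is trivial – which is perhaps a hint about how much precision that decimal place really carries:

```python
def cvss_severity(score: float) -> str:
    # Qualitative severity ratings published with CVSS v3.x.
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

print(cvss_severity(9.8))  # Critical
```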

Of course, I’m certainly not advocating that risk scores be done away with, nor am I saying they shouldn’t be used in most cases. But I do want to point out that it’s possible to develop a simple and understandable model that will track actual security (or lack thereof), if you’re OK with not having a numerical score.

I recently came across an example of this. It’s from a series of YouTube videos by Ralph Langner, who is best known for his analysis of Stuxnet. He was talking about how he finds the components of the CVSS score to be more helpful than the score itself (I’ve heard this from other people as well). He pointed out that tracking all three of the items below provides “a pretty good filter” for the likelihood that a CVE will be exploited in the wild (a minimal sketch of this filter follows the list):

1. Presence of the CVE in the CISA KEV Catalog, meaning the CVE has already been actively exploited;

2. Whether exploitation can be accomplished through a routed connection, or whether it requires local access. This is the “attack vector” metric in the CVSS vector string.

3. Whether exploitation requires a complex attack or can be carried out simply. This is the “attack complexity” metric.
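
Here is a minimal sketch of that three-item filter in Python. The KEV lookup and the vector-string parsing are simplified: kev_cve_ids is assumed to be a set of CVE IDs loaded from CISA’s published KEV catalog, and the example CVE and vector string are just for illustration.

```python
# A minimal sketch of the three-item filter described above.
# Assumption: kev_cve_ids is a set of CVE IDs loaded from CISA's
# published KEV catalog JSON feed.

def parse_vector(cvss_vector: str) -> dict:
    """Parse a CVSS vector string like 'CVSS:3.1/AV:N/AC:L/...' into a dict."""
    parts = cvss_vector.split("/")
    return dict(p.split(":") for p in parts
                if ":" in p and not p.startswith("CVSS"))

def exploitation_filter(cve_id: str, cvss_vector: str, kev_cve_ids: set) -> dict:
    metrics = parse_vector(cvss_vector)
    return {
        "in_kev": cve_id in kev_cve_ids,                # 1. already actively exploited
        "network_reachable": metrics.get("AV") == "N",  # 2. exploitable over a routed connection
        "low_complexity": metrics.get("AC") == "L",     # 3. no complex attack required
    }

# Hypothetical usage (Log4Shell, which is in the KEV catalog):
kev = {"CVE-2021-44228"}
print(exploitation_filter("CVE-2021-44228",
                          "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H",
                          kev))
```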

The difference between these three items and a CVSS score (or an EPSS score, which estimates the likelihood of exploitation) is that they’re all readily understandable. They’re not combined into a score and they’re not intended to be. I think what Ralph was saying is that it’s helpful just to look up all three items for every vulnerability that is of concern and consider them together – without worrying about normalizing them, determining weights, etc. Leave that to the scoremakers.

If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

My book "Introduction to SBOM and VEX" is available in paperback and Kindle versions! For background on the book and the link to order it, see this post.

 
