Wednesday, November 22, 2023

NERC CIP: The new SAR for cloud services

There is widespread agreement in the NERC CIP community that it’s time – in fact, way past time – to fix the biggest problem with the CIP standards: because they’re based on the template established when CIP version 1 was drafted in 2006 and 2007, they unintentionally prevent medium and high impact BES Cyber Systems (BCS) and EACMS from being installed in the cloud (note that low impact BCS – there are no low impact EACMS – have always been completely “legal” in the cloud).[i]

It is also important to remember that the current “prohibition” of medium and high impact BCS and EACMS has nothing to do with concerns about the security of cloud service providers, although probably nobody thinks that CSPs should get a “free ride” regarding cybersecurity, even if they have every certification known to man. The prohibition is in place simply because in 2006 when drafting of CIP version 1 began, use of the cloud was still nascent and a “computing system” always consisted of one or more physical devices. There’s no NERC definition of a physical computing device, but my operational definition is that it’s a computing system that, if you drop it on your foot, will hurt. In contrast, if you could even figure out how to pick up and drop a cloud-based system on your foot, it certainly wouldn’t hurt (physicists still don’t think that bits and bytes have mass, although bits are becoming more important in physics all the time).  

Addressing this problem (or more specifically, three problems: BCS in the cloud, EACMS in the cloud, and SaaS in the cloud) requires some major changes to the CIP standards (or perhaps a new standard). Changing the standards requires that a Standards Drafting Team (SDT) be constituted from subject matter experts among the NERC entities (preferably ones who are still breathing and are willing to make the substantial time commitment required). They draft the new or revised standards and submit them to the NERC ballot body for multiple ballots, each one followed by a comment period in which the SDT members must respond to the questions submitted; the process ends with approval by the NERC Board of Trustees and ultimately FERC.

However, before the SDT can be constituted, a Standards Authorization Request (SAR) needs to be drafted by organizations or individuals with an interest in the changes and submitted to the NERC Standards Committee for approval. Recently, a SAR for BCS and EACMS in the cloud, titled “Cyber Security - Risk Management for Third-Party Cloud Services”, was submitted to the SC; it will be considered (although not necessarily voted on) at the SC meeting.

The main purpose of a SAR is to give the SDT “marching orders”: sufficient guidance on what the SDT should do. That is, the SAR can’t read, “Hey, please figure out how we’re going to solve the cloud problem and then write a standard to implement that.” It needs to lay out a specific direction for the SDT to follow, without prescribing how they’re going to proceed in that direction or what the result needs to be when they get there.

The new SAR (which I believe is the first one regarding the cloud and CIP that has been considered by the Standards Committee) was drafted by three parties, although two joined forces for this effort:

1.      ISO New England and the ISO-RTO Council IT Committee, and

2.      EDF Renewables

While I don’t think the SAR is perfect, I believe it provides enough of a roadmap that the SDT won’t wander in the wilderness when they set out to put pen to paper to develop the cloud standard(s). Moreover, I think the authors have described the most important elements of a solution to the problem, which I tried to describe in this post. These elements include:

1.      A requirement for some sort of certification like FedRAMP or SOC 2. The point of having this isn’t that certification is a cure for all problems, but that it almost certainly covers the main IT risks. What certifications like FedRAMP do not address well are the risks that apply only to CSPs.

2.      A new scope for CIP. In my opinion (and I think that of the authors of the SAR), the big problem is what I said above: the CIP requirements are applicable to devices, not systems, and a cloud provider can’t track what data are on what devices; their whole model is based on systems. This is not my idea, nor that of the SAR drafters. I got it from the CIP Modifications drafting team, which in 2018 developed a proposal to do away with the concepts of Cyber Asset and BES Cyber Asset (which are currently the basis of BES Cyber Systems), and instead base the standards on BCS from the start. Making BCS the basis of compliance means it suddenly doesn’t matter whether the BCS is implemented in physical hardware on premises, in virtualized systems, or in the cloud – a single BCS remains the same if it’s implemented in any of the three environments, or even in all three at once. Unfortunately, that proposal got shot down because a number of large utilities didn’t want to throw away most of the software, training, and procedures they had developed for CIP compliance and move to something completely different. This leads to the next element of the solution:

3.      Two “tracks” for CIP compliance, one for on-premises systems and one for cloud-based systems. In writing the blog post that I linked earlier, I realized it would be very easy to implement this bifurcation in CIP-002 R1 and Attachment 1. This is because the terms Cyber Asset and BES Cyber Asset are not used anywhere in CIP-002 and haven’t been used there since CIP version 5 (which introduced the concept of BCS in the first place) was implemented in 2017. Instead, CIP-002 starts and ends with BES Cyber Systems; very little of substance needs to change (the big change would be to the BCS definition, which would go from its current minimalist form to one that incorporates most of what is today the BES Cyber Asset definition).

4.      A new CIP standard applicable to BCS and EACMS implemented in the cloud. While the SAR doesn’t say this, I believe the new standard (which might be called CIP-015) needs to focus on the risks that apply specifically to CSPs, since “general IT” risks are already well covered by certifications like FedRAMP.

5.      The standard should include a requirement for the NERC entity to develop a plan to identify, assess and mitigate risks attendant on implementing BCS (and other asset types like EACMS) in the cloud, including risks due to the cloud provider(s) it uses. This would be modeled on CIP-013 R1, although unlike that requirement, it would provide an itemization of areas of cloud risk (e.g., how does the CSP vet the security of third parties that serve as “access brokers” for services in their cloud?) that need to be discussed in the NERC entity’s plan. The plan would not have to mitigate any particular risk, if the entity can show that the risk is so low that it doesn’t apply to them.

6.      The new “Cloud CIP” compliance regime will need to solve the problems with both medium and high impact BES Cyber Systems (BCS) and Electronic Access Control or Monitoring Systems (EACMS), both of which are currently not “allowed” in the cloud at all. However, the EACMS problem is easier to solve, since not as many CIP requirements apply to EACMS as apply to BCS, yet the negative impacts of the EACMS problem (especially on grid security, which of course is exactly what CIP is supposed to be protecting, not undermining) are almost as important as the negative impact of the BCS problem. The SAR suggests that the EACMS problem could be tackled first.

7.      “(The) implementation plan is to allow the possibility for early adoption ahead of any proposed enforceability date.” This is very important. The SAR suggests that the SDT target completion of the new or revised standard(s) and submission to FERC within 12 to 18 months from the “start of SDT deliberations.” This is lightning fast by NERC standards, but after FERC approval (which itself can take up to a year) there is usually a 1-2 year implementation period (that period was two years in the case of the two revised standards that will allow BCSI in the cloud, which take effect on January 1, 2024). Given how long NERC entities have waited to be allowed to implement BCS and EACMS in the cloud, it seems especially cruel to make them wait another two years after FERC has approved all the required changes. “Early adoption” will allow NERC entities that are ready to move some of their assets to the cloud to do so, while those who are in no hurry, or have no plans to move any assets at all, don’t have to change anything they’re doing now.

To conclude, I think the SAR now before the Standards Committee should be approved. It’s past time, and frankly, the costs of not being able to use the cloud are growing all the time. No time like the present!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.


[i] Software-as-a-Service (SaaS) use in the cloud has also been “prohibited”, at least with medium and/or high impact BCS. However, the only thing “illegal” about SaaS currently has to do with the fact that SaaS use with BCS will almost always require BES Cyber System Information (BCSI) to be transmitted to and stored in the cloud. For completely different reasons than BCS, BCSI has also officially been “prohibited” in the cloud – although, truth be told, there is at least one SaaS provider that has had customers with medium and/or high impact BCS for at least six years.

My expectation was that the two revised CIP standards that will come into effect on January 1, 2024 – CIP-004-7 and CIP-011-3 – would fix the SaaS problem once and for all. However, at the moment it appears that may not have happened, due to what seems to be an inadvertent omission in the wording of CIP-004-7 R6. I will write more about that soon.

Wednesday, November 15, 2023

How will we know when SBOMs take off?


Toward the end of last week’s SBOM Forum meeting, we somehow got onto the topic of whether SBOMs are succeeding in the software world – that is, whether they’re being used in volume. Whenever this topic comes up, I always point out that there are two SBOM “markets”. One is the market for SBOM use by software developers; developers use SBOMs to learn about vulnerabilities in their products as they’re developing them. The second market is the rest of us: organizations whose primary business isn’t software development. It’s safe to say that the second market is potentially hundreds or thousands of times larger worldwide than the first.

The first market is doing very well, in no small part because the NTIA Software Component Transparency Initiative was composed almost entirely of developers or consultants to developers. The best indication of this fact is that Dependency Track is used over ten million times a day to look up vulnerabilities for components in an SBOM.

However, Steve Springett, who developed Dependency Track more than ten years ago and who leads the OWASP Dependency Track, CycloneDX and Software Component Verification Standard (SCVS) projects, has admitted to me that the great majority of those ten million daily lookups are due to developers trying to learn about vulnerabilities in their products (he bases this statement on the inquiries he’s seen on the CycloneDX Slack channel, which has over 1,000 subscribers). And even though I admit that one factoid doesn’t constitute proof, I will assert (and I did on Friday) that there is very little distribution and use of SBOMs by organizations worldwide whose primary business is not software development.

In the meeting, my opinion on this matter was universally shared by…me (with one or two others seemingly on the fence, unless they’d just fallen asleep). I’m used to this reaction, since even today, most of the people involved in discussions about SBOMs are developers. They’re not lying when they assert that there’s already substantial use of SBOMs for cyber risk management purposes: They hear discussions of SBOMs and see them being used all the time. Moreover, they’ve all seen a huge increase in SBOM usage in the last 4-5 years. But this is in the software developer community, not in the software user community, which is…just about every public and private organization on the planet today (even one-person companies in poor countries use cell phones, although software risk management is quite different in those cases).

It doesn’t bother me that I’m often out on my own limb in the SBOM Forum meetings. This is because I know I have the last word, since I write up the meeting notes after the meeting and since I can always write a blog post about the meeting (as I’m doing now).

So here’s my last word. Since there are no data available on worldwide use of SBOMs, here are some observations that I consider good indicators supporting my position. If anybody knows of evidence to the contrary, please let me know:

·        I know of only one industry in which SBOMs are being heavily used by “non-developer” organizations (a huge percentage of private and public sector organizations either develop some software or have it developed for them, but if their primary business isn’t developing software, I don’t call them developers). That is the major German auto makers (referred to as OEMs), who regularly receive SBOMs from their major suppliers like Bosch, for the intelligent devices produced by those suppliers. The OEMs need these to comply with tough German regulations regarding open source software component licensing. I believe the OEMs are not currently using SBOMs for cyber risk management purposes.[i]

·        I know of no software developer of any size that is regularly distributing SBOMs for more than a handful of products. By “regularly”, I mean the developer produces a new SBOM with every major and minor release of a product; this is important, since it’s almost certain that an SBOM for any version of a software product will no longer provide a reliable description of even the next minor version of the product (if the user has upgraded to the new version, of course). Many developers have distributed (usually on a customer web portal) a small number of SBOMs for some of their products, usually because of pressure due to Executive Order 14028 – although there is still no “requirement” to produce SBOMs under that EO. However, as soon as a customer has upgraded to another version of the product, they might as well discard any previous SBOMs for the product.

·        I have asked several major developers (or intelligent device manufacturers) who use SBOMs heavily on the development side of their organization whether the business side of the organization is making regular use of SBOMs to manage risk posed by components in the software they use to conduct their daily business (this requires receiving SBOMs regularly from the suppliers of the software and devices they utilize, of course). I have only received one answer of “Yes” to this question, and even that was qualified by a statement that very few software suppliers to the business side of the organization were providing SBOMs with any regularity.

·        I know of no low- or no-cost, commercially supported software tools that handle the whole consumption process: ingest SBOMs; look up component vulnerabilities in the NVD or other vulnerability database(s); ingest VEX documents to identify the over 90% of component vulnerabilities that aren’t exploitable in the product itself; and make the output available in machine-readable form to vulnerability or asset management tools. And a tool must do this throughout the organization’s use of a product, meaning tracking vulnerabilities separately for different versions of the product and updating each version’s vulnerability list whenever a VEX indicates a change in the exploitability status of a vulnerability (see this draft use case from the OWASP SBOM Forum, and the sketch following this list). I know there are open source tools like Dependency Track and Daggerboard that perform parts of this process, but I know of no single tool, especially a commercially supported one, that performs the whole process.

·        However, the lack of consumer tools doesn’t mean there will be no large-scale – or even medium-scale – use of SBOMs until such tools are available. This is because third party services can (and will) appear that develop their own tooling (including stringing together various open source products) and spread the cost of designing, developing and operating that tooling (especially the cost of finding people knowledgeable enough to facilitate all of this) across a potentially large customer base. I believe these services could be developed very quickly once the next problem is addressed.

·        Speaking of VEX, I know of no VEX documents that are being produced and used today in the use case just referenced – although the SBOM Forum is working now on specifications for that use case (which we consider to be the most important one) on both the CSAF and CycloneDX platforms. Once we have done that, it will be possible to fully automate production and consumption of VEX documents (with allowance for the fact that the naming problem is still an issue, and may require manual adjustments to the output of the automatic process). Until then, no supplier is going to distribute an SBOM to users of a product without an accompanying VEX, knowing that over 90% of the vulnerabilities applicable to components in the SBOM will not be exploitable – since that would lead to large numbers of users tying up their help desk with calls and emails about non-exploitable component vulnerabilities.
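Since the full consumption process is easier to see in code than in prose, here is a minimal sketch of it in Python. Everything here is illustrative: the function names are mine, the SBOM and VEX inputs are reduced to toy values, and a real tool would need to parse CycloneDX or SPDX documents, deal with the naming problem, and track each product version separately. The OSV endpoint is a real public API, though error handling and paging are omitted.

import requests

def component_vulns(purl: str) -> set[str]:
    # Look up vulnerabilities for one component, identified by purl.
    resp = requests.post("https://api.osv.dev/v1/query",
                         json={"package": {"purl": purl}})
    return {v["id"] for v in resp.json().get("vulns", [])}

def exploitable_vulns(sbom_purls: list[str], vex_not_affected: set[str]) -> dict:
    # For each component in the SBOM, keep only the vulnerabilities that a
    # VEX has not marked "not affected" in this product.
    result = {}
    for purl in sbom_purls:
        remaining = component_vulns(purl) - vex_not_affected
        if remaining:
            result[purl] = remaining
    return result

# Toy inputs: in reality these come from parsed SBOM and VEX documents.
sbom = ["pkg:pypi/django@1.11.1", "pkg:npm/lodash@4.17.20"]
vex = {"CVE-2021-23337"}  # vulnerabilities the supplier says aren't exploitable here
print(exploitable_vulns(sbom, vex))

What the sketch shows is how little of the process is the lookup itself; the hard parts are everything the toy inputs hide, especially keeping the VEX set current for every version of every product in use.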

To summarize, SBOMs are still a long way from where they should be. But this can’t be shrugged off by developers who say, “Well, we’re making progress…” Sure, “we” – meaning developers – are making progress in using SBOMs to manage vulnerabilities in their products. And this fact alone is benefiting their customers.

However, just about everything that’s been said in the public realm (including in EO 14028) about SBOMs is based on the assumption that non-developer organizations will start benefiting very soon from being able to use SBOMs to learn about vulnerabilities and other risks in the software they utilize every day. That isn’t happening yet, but it’s achievable through focused effort like what the OWASP SBOM Forum is doing. It’s not achievable through congratulating everybody about moving to the next square on the 1,000-square chessboard. It’s like we’re on a driving journey from New York City to LA. We’ve crossed the Hudson River and we’re now in Hoboken, NJ. That’s incremental progress, but we’re not going to reach LA in our lifetimes that way.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.


[i] There is also a unique arrangement in place that facilitates the development, exchange and use of SBOMs between the OEMs and the suppliers, which, based on the little information I have about the arrangement, would probably never pass antitrust muster in the US.

Friday, November 10, 2023

The Global Vulnerability Database won’t be a “database” at all


I have written about a Global Vulnerability Database before, by which I meant a database that would be funded and run internationally. However, I’ve come to realize more recently that “global”, in the context of vulnerability databases, means a lot more than simply “international”. It means a database that can relate multiple types of vulnerability identifiers (there are a number of them, although CVE is by far the dominant type) to multiple types of software identifiers.

As far as I know at the moment, the only two software identifiers used in vulnerability databases are CPE, which is used in the NVD and other databases derived from the NVD, and purl, which is just about the only good option in the open source world. But there is room for more. Specifically, a new identifier is needed for proprietary software, since I (and others) regard CPE as a dead end, even though it was pioneering in its time[i]. The OWASP SBOM Forum’s paper from September 2022 describes the problems with CPE and the advantages of purl in great detail. However, I will be the first to admit that our idea for including proprietary software products in purl using SWID tags is still rudimentary and requires a lot more thought.
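To make the contrast concrete, here is how (roughly) the same component is identified in each scheme, using the log4j-core library as an example. The purl is exact; the CPE is the general form the NVD uses, quoted from memory, so treat it as illustrative:

cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*
pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1

The CPE depends on a vendor/product pair that someone on the NVD team assigned; the purl is derived entirely from facts the user already has: the ecosystem the component came from (Maven Central), its namespace and name there, and its version.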

In any case, I have expanded on my GVD idea in the document below. I’d welcome any comments. I think the next step is to start an open source project to design the “database”. Now that I’ve done it once (with a lot of help from my friends), I think it would be easy to get that project going in OWASP. And it’s easy for organizations to make restricted donations to this project through OWASP (a 501(c)(3) non-profit corporation). If you think your company might want to help out with this effort (obviously, designing the database will not be an expensive effort at all), please let me know.

 

Toward a Global Vulnerability Database

Tom Alrich, November 2023

Currently, there is no easy way to identify vulnerabilities of all types (CVE, OSV, etc.) that apply to a single software product or component of a software product. Also, because of the naming problem, there is no easy way to identify all products affected by a particular vulnerability. Achieving either of these goals requires multiple database searches and manual correlation of the results; even after doing that, there is no guarantee that the user will be able to achieve either goal.

The solution to these problems is usually described as some sort of “harmonization” of vulnerability and/or product identifiers. In other words, “All we need is a single means of identifying products and a single means of identifying vulnerabilities. Then we can simply correlate the vulnerabilities with the products and create a database that’s searchable using both fields. What could be simpler?”

Unfortunately, an effort to harmonize either the different types of vulnerability identifiers or the different types of product identifiers, let alone both, is very likely to fail. This is because, in many if not most cases, vulnerability or product identifiers of different types simply can’t be harmonized. For example, since there can be only one CPE name for an open source project but there can be a separate purl for each repository in which the project’s code is found, directly mapping a CPE to each unique purl would make no sense (plus, there is no assurance that the code in each repository is exactly the same as in the other repositories, even though the repositories may all have the same project name).

There needs to be a globally accessible vulnerability database that incorporates all major vulnerability sources (including CVE, OSV, Python security advisories, etc.), as well as all major product identifiers (all product identifiers that are referenced by a major vulnerability source – an elite club that, as far as I know, now includes just CPE and purl). The database should not even try to provide harmonized vulnerability and product identifiers, because this simply can’t be done now.

In fact, the data don’t even need to reside in a single database. The various constituent databases (NVD, OSV, OSS Index, etc.) can simply be referenced through a single smart query engine, which I’m calling the “Global Vulnerability Database” (GVD). A query could refer to any supported vulnerability or product identifier; for example, “What are all the vulnerabilities that apply to purl pkg:pypi/django@1.11.1?” or “To which products does CVE-2023-12345 apply?”. The query engine would decide which queries to make to which vulnerability databases (in some cases, performance considerations may dictate that at least parts of the databases – e.g., the CPE dictionary from the NVD – be downloaded regularly to a central location).
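Purely as an illustration of the “smart query engine” idea, here is a minimal sketch in Python. The OSV and NVD endpoints are the real public APIs, but the routing logic is only my guess at how a GVD front end might begin; error handling, API keys, caching and fan-out to multiple databases are all omitted.

import requests

def query_gvd(identifier: str) -> list[str]:
    # Route a query to the client database(s) that understand its identifier.
    if identifier.startswith("pkg:"):
        # purl -> OSV, which indexes open source ecosystems by purl
        resp = requests.post("https://api.osv.dev/v1/query",
                             json={"package": {"purl": identifier}})
        return [v["id"] for v in resp.json().get("vulns", [])]
    if identifier.startswith("cpe:"):
        # CPE -> the NVD CVE API
        resp = requests.get("https://services.nvd.nist.gov/rest/json/cves/2.0",
                            params={"cpeName": identifier})
        return [item["cve"]["id"] for item in resp.json().get("vulnerabilities", [])]
    if identifier.startswith("CVE-"):
        # The reverse lookup ("which products does this CVE affect?") would
        # fan out to several databases; extracting product names is omitted.
        resp = requests.get("https://services.nvd.nist.gov/rest/json/cves/2.0",
                            params={"cveId": identifier})
        return [identifier]
    raise ValueError("unsupported identifier type")

# The example query from the text:
# query_gvd("pkg:pypi/django@1.11.1")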

Of course, it would be more satisfying if every vulnerability type could reference every product identifier and vice versa, but trying to do that would require such a massive effort that it is effectively impossible. What is possible is to undertake particular improvement projects, like adding purls to existing CVE reports; however, these may be expensive, and there will probably always be significant issues with the GVD data. The consolation is that the GVD will improve the current situation by providing a central location from which to query multiple vulnerability databases, without removing or degrading any currently available capability. Meanwhile, improvements like the CVE JSON 5.1 spec can be introduced that will bring the GVD much closer to being a universal vulnerability database.

For example, currently no CVE report identifies a purl. When a user looks in the NVD for vulnerabilities applicable to a particular purl, they won’t see any CVEs at all. However, they will see them when CVE reports start including purls after the CVE JSON 5.1 spec is implemented and the NVD adopts that spec, but that is not likely to happen for at least the next couple of years. Perhaps the GVD might support the JSON 5.1 spec before the NVD does.

The best way to achieve the goal of a GVD is through a global effort, funded by private industry, nonprofit organizations and government. It is likely that, as long as one or two well-known organizations lead the initial effort, there will be substantial interest worldwide. Therefore, obtaining adequate funding may not pose a big problem.

The first step should be the high-level database design. When that is finished, a group will develop the detailed design, as well as a roadmap for implementing the GVD (implementation can be done in stages, with validation of each stage before moving on to the next one. While it would certainly be advantageous to obtain funding for the entire project from the beginning, that is probably unrealistic. Instead, the project team should assume that each stage will need to be funded separately).

Below are likely goals to be achieved by this project:

1.      Access to the database needs to be free, although high-volume commercial uses may be tariffed in some way.

2.      The database should be easily accessible worldwide (allowing for practical limits like poor connectivity in remote areas). In general, no country should have its access to the database restricted, although there might be reasons to do so in some cases, like active support of terrorism.

3.      The database needs to be able to scale easily, meaning it can be built out in stages.

4.      Because there are errors in the current databases (e.g., off-spec CPE names), there should be an ongoing effort to clean up errors. There should also be an effort to make strategic enhancements to the database, such as adding purl identifiers to existing and new CVE reports. However, these efforts need to be undertaken as funds and time permit. It is possible that volunteers can be found to assist in these efforts, such as college cybersecurity majors.

The most important aspect of the GVD is that it needs to be truly global. While individual governments will be welcome to contribute both funds and human resources to the project, no government will exercise control over the GVD; governance will be by an independent board. Ultimately – once the GVD is operating smoothly and is being used heavily - the project might be turned over to an international organization like IANA, as the NTIA (part of US Dept. of Commerce) did with DNS in the 1990s (I believe the NTIA took over DNS from the universities in California where it was developed. NTIA was effectively the first domain registrar).

To sum this up, there needs to be a single searchable database of vulnerabilities worldwide. This will probably not be a single physical database implemented in a single facility. Instead, it might in effect be an AI-based “switching center”, through which searches would be coordinated among different vulnerability databases, using diverse identifiers for software and vulnerabilities. 20 years ago, the technology required for this probably wasn’t available; today, there are likely no significant technical obstacles to constructing this database. In fact, this “switching center” approach, rather than a massive uber-database combining all the others, seems to be the one that makes sense and is doable. We’ll leave the uber-database for another day, if ever.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.


[i] CPE will be around for a long time, since there’s so much information about vulnerabilities and the software products they apply to – all identified with CPEs – in the CVE reports. Open source software should now always be identified using purls, but that still leaves the question of how we identify proprietary software products, if not with CPEs.

Steve Springett (leader of the OWASP CycloneDX and Dependency Track projects) has suggested that purl could be applied very easily to software in online “stores” like Apple’s App Store and Google Play, since purl is based on the idea of a download location. Given the huge amount of proprietary software available in those stores (and many other online software stores, of course), creating new purl types to incorporate them into the purl world would go a long way toward addressing the problem of proprietary software. Perhaps the idea in the SBOM Forum paper about SWID tags (which was also Steve’s idea) could be fleshed out, to accommodate proprietary software that isn’t found in online stores.
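Purely as an illustration – no such purl type exists today, and both the type name and the app are my inventions – an entry in an online store might someday be identified with something like:

pkg:appstore/com.example.flashlight@2.1.0

where the type names the store, the name is the store’s unique identifier for the app, and the version is the released app version.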

Tuesday, November 7, 2023

When will there be VEX tools?

Today, a very refreshing email was sent to the CISA VEX Workgroup mailing list:

We're a small startup from Germany trying to establish a vulnerability management procedure for our own product and we wrote down our "dream setup" in a document: <https://docs.google.com/document/d/1QB3EaimrS0KlL6wIpfY5-SlEEYV_Y-hfM8_SGKjGNz0>

……

Basically, we're struggling with the practical implementation bit of actually implementing a workflow to publish VEX statements. Everyone we talked to is building their own custom in-house solution (and we talked to a lot of companies by now).

I call this email “refreshing”, because, even though it didn’t point to any magic solutions, it at least pointed to the big problem with VEX: there are no standardized VEX production and consumption tools, because there is no standardized VEX specification. No vendor is going to produce a tool for VEX until they can build it to a specification (whether or not it’s an internationally recognized standard is irrelevant at the moment) that they’re sure will be followed by multiple VEX producers and consumers (it doesn’t have to gain universal acceptance, but at least be in use by multiple producer and consumer organizations).

The document (which I haven’t had time to go through in full yet) has already garnered a lot of good comments. I added one of my own, in reference to this entry in a list of problems with VEX: “Very little tooling around automatic generation and publishing of CSAF files exists.”

I commented:

It’s not just that there’s very little generation and publishing of CSAF files – at least, files that will be read as part of an automated process, as intended – it’s almost certain that there’s zero generation or publishing of CSAF VEX files that are consumed that way. This is because there’s never been a rigorous specification of what a CSAF VEX needs to contain (and specifically, there’s never been a specification of the minimum that’s required in the Product Tree and Branches fields, which are mandatory in any CSAF document and offer an absolutely huge number of options). This means that software producers are free to produce VEX files using any spec they want (I’ve counted at least four separate VEX specs in use, each addressing a completely different use case, and this doesn’t count the wealth of “private” VEX specs that must be in existence).


Until there is such a spec, there will never be automated production and consumption of VEX files in CSAF. While there are companies like Red Hat, Cisco and Oracle that publish CSAF VEX files now, none of them can point to any open source or commercial tool that ingests their VEX files and passes the results on to vulnerability management tools. And there never will be any such tool until there is a rigorous VEX spec for CSAF (the same is needed for CycloneDX VEX, although the problem is much less severe in that case) that is followed by both production and consumption tools.

The OWASP SBOM Forum is now developing such a spec – first for CSAF VEX and then for CDX VEX. We may also develop prototype production and consumption tools, but in any case, tools should be easy to develop once the spec is done. If you would like to join this effort, email tom@tomalrich.com.

The author of the email, Lars Francke of Stackable, emailed me within minutes about joining this effort. The invitation is also open to you, Dear Reader!
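For readers who haven’t looked inside a CSAF file, here is a heavily trimmed sketch of a CSAF VEX document, written as a Python dict so I can annotate it (a real file is JSON). The field names follow the published CSAF 2.0 schema, but I’ve omitted mandatory document metadata, and the product tree shown is only one of many legal ways to structure one – which is exactly the ambiguity a rigorous spec needs to eliminate. The vendor, product and IDs are hypothetical:

csaf_vex = {
    "document": {
        "category": "csaf_vex",
        "csaf_version": "2.0",
        "title": "Example VEX for Product X 4.2",
        # publisher and tracking are mandatory in real files; omitted here
    },
    "product_tree": {
        # Branches can nest vendor/product_name/product_version in many
        # ways; this is just one legal arrangement.
        "branches": [
            {"category": "vendor", "name": "Example Corp", "branches": [
                {"category": "product_name", "name": "Product X", "branches": [
                    {"category": "product_version", "name": "4.2",
                     "product": {"product_id": "PROD-X-42",
                                 "name": "Example Corp Product X 4.2"}}]}]}
        ]
    },
    "vulnerabilities": [
        {"cve": "CVE-2021-44228",
         "product_status": {"known_not_affected": ["PROD-X-42"]},
         # flags carry the machine-readable justification for the status
         "flags": [{"label": "vulnerable_code_not_in_execute_path",
                    "product_ids": ["PROD-X-42"]}]}
    ]
}

The point is that a consumption tool has to know, in advance, how the producer chose to arrange the product tree and which of the many optional fields to expect – and today nothing guarantees any of that.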

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.

 

Saturday, November 4, 2023

CISA provides thoughtful answers to useless questions

 

In college, I learned three important principles (as well as an important corollary) regarding writing papers:

1.      A paper must always ask and answer a question. Otherwise, it’s at best an interesting narrative. While it might get a B, it won’t get an A.

2.      The paper will be considered a failure if it doesn’t answer the question it asks, no matter how well written it is.

3.      If you get most of the way through a paper and you realize to your dismay that you aren’t answering the question you asked at the beginning, you have two options. The first is to throw out what you’ve written so far and answer the original question, even though that will inevitably require a lot more work than you had planned. The second option is to throw out your original question and ask one that you know you can answer easily; preferably, it will be the question that you in fact were answering in what you’ve written so far. That way, you can still finish the paper in the time you allotted for it. Doesn’t that sound more appealing?

But the most important lesson I learned was a direct corollary of the third truth: You should always make sure that the question you ask in the beginning of your paper is one that you’ll be able to answer easily. That way, nobody can accuse you of not achieving the goal you set out to achieve. Even though you will still have to spend a lot of time gathering citations, etc., you won’t have to spend much time…you know…thinking about what you’re going to write. If you ask the right question, the paper will write itself.

Which brings me to CISA’s recently published white paper on “Software Identification Ecosystem Option Analysis”. This paper is almost a textbook example of the above three principles, and especially of the corollary. You may know that CISA has been promising a white paper to address the software naming problem for at least a year. But the thing about the naming problem is that it’s a really…hard…problem. It’s not something that can be solved with a paper or two. Had the paper been titled something like “How can we solve the naming problem?”, it would have violated principle number 2 as well as the corollary: it would have set the team that developed the paper up for failure, since they could never have provided even a partial answer to that question.[i]

But the people at CISA are nothing if not savvy. By stating that the paper was an “option analysis”, they almost guaranteed it would be considered a success, since an analysis of options can never be wrong; moreover, it can never be considered a failure. If you say you produced an “option analysis” about naming, and somebody points out five years from now that the naming problem is still around in some form (which it inevitably will be, although hopefully in diminished form), you can just say to them, “Of course, the naming problem is still here. We just provided an analysis of the options, but we never said we were going to solve the naming problem. Other people need to look at our analysis and decide which options to pursue.” Or something like that.

Indeed, the last page of CISA’s document (page 22) includes the stirring statement, “…the options can serve as starting points to refine the merits of various operational models…” In other words, the white paper is already successful, because it will help researchers refine their models. Who dares to say this isn’t success?

However, I happen to think that CISA’s paper could have been much more successful if the writers had taken time up front to ask themselves, “Why does the naming problem need to be solved?” Obviously, the fact that there isn’t a consistent, universal naming scheme for software products by itself shouldn’t keep anybody awake at night. What is the real problem this causes?

While software naming issues show up in many areas – e.g., cataloguing software products of different types – there is one area where the naming problem is causing significant and ongoing harm: software vulnerability management. Specifically, the naming problem makes it difficult – and often impossible – for a user organization to learn about vulnerabilities that are present in a software product it uses (whether in the product itself or in one of its components).

This is best illustrated in the case of CPE names in the NVD, discussed on pages 4-6 of the SBOM Forum’s (now the OWASP SBOM Forum’s) white paper on solving the naming problems in the NVD. If a software product can’t be accurately identified in a vulnerability database, the user will never be able to learn about vulnerabilities they need to remediate (most likely by regularly contacting the supplier’s help desk until they release a patch for the vulnerability).

Thus, if I had been asked, I would have suggested that the CISA paper ask and answer the question, “How can we make it more likely that users trying to learn about vulnerabilities in the software they use will be successful?” The answer to this question would certainly involve questions regarding the different identifiers available and how they can be properly utilized in vulnerability management, but also other problems like the structure and governance of vulnerability databases.

Unfortunately, this wasn’t the question that the CISA team asked – and answered – in their paper. What was the question they actually answered? While it was never stated directly, I would summarize it as the following:

“Any solution to the naming problem requires a single global uber-identifier, into which all other software identifiers can be mapped. What is that identifier?”

On the last page (page 22), they give their answer: There are three options that “can serve as starting points to refine the merits of various operational models.” They are:

1.      OmniBOR, which used to be known as GitBOM. Ed Warnicke, co-founder of GitBOM, provided a really interesting presentation to one of the NTIA working groups in (I believe) 2021, and I got quite excited after seeing it. The idea behind GitBOM was really intriguing, although it was clearly focused almost entirely on open source software. I’m sure there was some way that proprietary software could be handled by GitBOM, but it’s hard to call an identifier “universal” if it treats the software that runs probably 99% of organizations worldwide as kind of a second-class citizen. And if OmniBOR/GitBOM is restricted to just open source software, it immediately runs into the problem that one identifier, purl, has already conquered the open source world.

2.      CPE, the identifier on which the National Vulnerability Database (NVD) is based – as well as a small number of other databases that are based on the NVD but purport to make up for some of the NVD’s problems. To be fair, the CISA team doesn’t give CPE a whole-hearted endorsement. This is a good thing, since, far from being a solution to the naming problem, CPE is probably the biggest contributor to it.

3.      purl, which is now undoubtedly the most widely used software identifier worldwide and is very unlikely to be dislodged from that post. This is evidenced by the fact that I don’t know of any vulnerability database, other than the NVD and its derivatives, that is not based on purl. On the other hand, the vulnerability databases that use purl are all 100% focused on open source software. Since probably at least 90% of software products worldwide are open source (including at least 90% of components in proprietary software), this shows that purl is already close to being a universal identifier. But there’s no denying that it doesn’t now address proprietary software[ii] and that it doesn’t even fit all open source software perfectly.

However, CISA’s paper doesn’t even ask the real questions: a) would it ever be possible to have a truly universal software identifier (which I doubt, at least in most of our lifetimes)? And b) is it even necessary to have a universal identifier to address the naming problem?

Of course, b) is the really interesting question. I used to think it would be impossible to have multiple software identifiers in a single database. Thus, the NVD and its imitators are based on CPE, while the databases that focus on open source are based on purl. Yea verily, never the twain shall meet – or at least that’s what I used to think.

However, I now realize that a single vulnerability database can easily utilize multiple software identifiers. For example, the OWASP SBOM Forum’s 2022 paper on the naming problem advocated incorporating purl identifiers into the NVD, but it also acknowledged that CPE identifiers will need to remain in the database for years, since there is such a wealth of information embedded with the CPEs now (more specifically, embedded in the CVE reports that call out those CPEs). While it’s nice to fantasize about transferring information now in CPEs to whatever will replace CPEs later on, the resources necessary to do this on the large scale that would be required are simply not available. For the foreseeable future, both CPE and purl will remain in active use, often in the same database, each including whatever data is now included with them.

There’s another kind of identifier that also comes in different flavors: vulnerability identifiers (e.g., CVE, Google OSV, GitHub security advisories or GHSA, etc.). As with software product identifiers, the different vulnerability identifiers will need to continue to be available, often in the same database.

Why do I say that both software and vulnerability identifiers need to continue to be used as they are today? After all, the CISA paper repeatedly discusses the need to “harmonize” the different software identifiers, meaning (of course) that they should be consolidated into one of the three identifier options listed at the end of the paper.

I used to agree with this idea, since it seemed out of the question that combining multiple identifiers for the “same” thing (e.g., software products or vulnerabilities) in one database could be advantageous. Why not choose one uber-identifier and map each name in the other identifiers to that one?

This would make sense if the items identified by the different identifiers were truly interchangeable. For example, it would make no sense to have different identifiers for different types of animals; they can all have a name that fits into a single taxonomy, which was initially developed by Linnaeus.

However, there are reasons why the different software identifiers can’t be easily consolidated into one. For example, take the case of CPE and purl. They’re both software identifiers, but what do they identify? CPE is a centrally administered identifier: CPE names are created by members of the NIST NVD team when a CVE report is submitted that refers to a software product for which the organization submitting the report (usually a proprietary software supplier that is also a CVE Numbering Authority or CNA) does not know of an existing CPE name. CPEs were designed with proprietary software suppliers in mind, since most CVE reports are submitted by such a supplier.

On the other hand, purl isn’t centrally administered at all, and it would make no sense to change it to be centrally administered (as the CISA paper suggests should happen). The whole point of purl is that the person who wants to learn about vulnerabilities in an open source software (OSS) product that they utilize (or an OSS component of a product they utilize) just needs to know three things about the product: the package manager (or similar ecosystem) from which they downloaded the product, the name of the product in that package manager, and the version that they downloaded (other information may be included, but is optional).

If they have these three pieces of information, the user can create a purl that should always match the purl for that same product (from the same package manager) in a vulnerability database. The fact that no centralized name database is required makes purl the ideal identifier in the open source world, which changes very rapidly and doesn’t rely on paid maintainers. Obviously, if a centralized database were required, someone would have to come up with a huge chunk of change to finance that effort.
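A short sketch makes this concrete. The packageurl-python library is the reference implementation of the purl spec; I’m quoting its API from memory, so treat the details as illustrative:

from packageurl import PackageURL

# The three things the user knows: the ecosystem, the product's name
# there, and the version they downloaded.
purl = PackageURL(type="pypi", name="django", version="1.11.1")
print(purl.to_string())   # pkg:pypi/django@1.11.1

# No central registry was consulted. Anyone who knows the same three
# facts constructs exactly the same string, which is why matching
# against a purl-based vulnerability database works.

The absence of any registration step is the design choice that matters here: the identifier is computed from facts the user already has, rather than assigned by a third party.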

Since purl requires knowledge of the package manager from which the software was downloaded, and since one open source project can be available in multiple package managers with slightly different code, this means that the single project can have multiple purls. And if the project consists of multiple modules (e.g., a library), each of those modules can have its own purl as well. Yet there can be only one CPE for the project (product). This means there’s no good way to map a single CPE to a single purl, unless some arbitrary decision is made about which purl maps to the CPE[iii].

Let’s go back to the question I would have asked, “How can we make it more likely that users trying to learn about vulnerabilities in the software they use will be successful?” The answer to this question now seems simple to me: We need to develop a vulnerability database that can accept queries made with any major software identifier (e.g., CPE or purl) or any major vulnerability identifier (e.g., CVE or OSV), and return whatever results the user would receive today if they were to make a query to a database that was designed around that identifier (for example, a user that queries the database for CVEs that correspond to a particular CPE name would receive the same response they would have received if they had queried the NVD using that same CPE name).

In fact, the new central database might not, strictly speaking, be a database at all but more of a “switchboard” that would relay each query to an appropriate “client” database (or even multiple client databases). It would then return to the user whatever response it received from the other databases (with an AI-based front-end module that would determine how best to reformulate and re-route each query). While this approach would probably not initially yield any more information than the user would have received had they queried the client database individually, it would at least centralize (and perhaps standardize) vulnerability queries. As time went on and additional funding became available, more efforts to harmonize and clean up the data (including the CVE reports in CVE.org) could be made.

In past months, I’ve advocated the idea of a Global Vulnerability Database, meaning one that’s sourced and supported globally. However, I’m now expanding my understanding of “global” to include the ability to accept queries for multiple software and vulnerability identifiers. I’m also giving up my idea that the GVD could be built on top of an existing database like the NVD; it will have to be built from scratch, but it can well incorporate data and features from the existing vulnerability databases – and, of course, the existing databases would continue to do what they do now, since they would, at least for many queries, become clients of the GVD.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

I lead the OWASP SBOM Forum. If you would like to learn more about what that group does or contribute to our group, please go here.


[i] The OWASP SBOM Forum – at the time just the SBOM Forum – produced a paper in September 2022 that unabashedly aimed directly at the naming problem. We obviously weren’t following the lesson I’d learned in college, because we called the document “A Proposal to operationalize component identification for vulnerability management”. This paper was a direct assault on the naming problem, or at least the most prominent manifestation of this problem: CPE (Common Platform Enumeration) names found in the National Vulnerability Database (NVD). Not surprisingly, the paper didn’t lead to the CPE problem being solved, but it has proven to be very useful in discussions with various groups like the NVD team at NIST and the team at ENISA that is building a vulnerability database from scratch - in compliance with Section 12 of the EU NIS 2 cybersecurity regulation, which came into effect in 2022. 

[ii] The SBOM Forum’s paper includes a short description, on pages 12 and 13, of our idea for how to identify proprietary software using purl; there are certainly many other ways to do that. But it’s also true that purl identifiers for proprietary (or “closed source”) software will never be as robust as those for open source. 

[iii]  If the person mapping CPEs to purls knows which package manager the software behind a CPE was downloaded from, they could in theory map the CPE to the purl. But having that knowledge will always be the exception, never the rule.