Sunday, August 29, 2021

What’s the state of IoT cyber regulation? Part 2

In my most recent post, I described the first of two pending mandatory US cybersecurity regulations for IoT devices, the IoT Cybersecurity Improvement Act of 2020. Now I'll describe the second of these, and the more important one: the IoT "device labeling" requirement in the May 12 Executive Order on securing software sold to federal government agencies.

Paragraph (s) of section 4, on page 18 of the EO, reads: “The Secretary of Commerce, acting through the Director of NIST, in coordination with representatives of other agencies as the Director of NIST deems appropriate, shall initiate pilot programs informed by existing consumer product labeling programs to educate the public on the security capabilities of Internet-of-Things (IoT) devices and software development practices, and shall consider ways to incentivize manufacturers and developers to participate in these programs.”

Paragraph (t) on page 19 reads: “Within 270 days of the date of this order, the Secretary of Commerce acting through the Director of NIST, in coordination with the Chair of the Federal Trade Commission (FTC) and representatives of other agencies as the Director of NIST deems appropriate, shall identify IoT cybersecurity criteria for a consumer labeling program, and shall consider whether such a consumer labeling program may be operated in conjunction with or modeled after any similar existing government programs consistent with applicable law. The criteria shall reflect increasingly comprehensive levels of testing and assessment that a product may have undergone and shall use or be compatible with existing labeling schemes that manufacturers use to inform consumers about the security of their products. The Director of NIST shall examine all relevant information, labeling, and incentive programs and employ best practices. This review shall focus on ease of use for consumers and a determination of what measures can be taken to maximize manufacturer participation.”

Note that paragraph (u) mandates a labeling program for consumer software. This is similar to the IoT device labeling program and indeed, NIST is addressing these two programs together (e.g. they’ve scheduled a two-day conference in September that will address both programs).

What first struck me when I read these two paragraphs was that putting labels on devices seems to be an odd way to address cybersecurity. I normally think of a label as an indication that an organization has tested a product and finds it meets certain safety standards. In other words, the product is safe to use.

Cybersecurity (or security in general) can’t be reduced to a simple “safe/unsafe” decision. In fact, let’s be honest: Seldom (and perhaps never) in our personal or business lives do most of us decide to buy a software product or an intelligent device, based solely on a judgment of the product’s level of cybersecurity.

Instead, cybersecurity is at most a final gating factor: We make the decision to buy a particular product, and before we place the order (or put it in our shopping cart, whether physical or virtual), we may ask ourselves whether there are any cybersecurity skeletons in the closet of the supplier (a device manufacturer, a software developer, the online community that developed an open source software product, etc.) that are serious enough to warrant not going ahead with the purchase or procurement. But recent history tells us that even a cybersecurity breach that had a serious impact on customers (think Target or SolarWinds) ends up having very little impact on demand for what the breached company sells – and in fact the impact might be net positive if the company responded well to the attack (as did both Target and SolarWinds).

So what we're really looking for regarding the security of an IoT device (or any other product containing or consisting of software or firmware) is an indication of the risks we take on if we buy the product. If we know what those risks are, we can take steps to mitigate at least some of them and accept the rest. And if we buy the product without even bothering to learn about the risks, we're accepting them all – although we all presumably take some reasonable precautions with any product we buy, like at least setting a somewhat secure password when we can. I'll admit that not bothering is my approach when purchasing products for personal use. My general reasoning is, "Given that well-known manufacturer XYZ develops this product and/or well-known retailer ABC sells it, it must be secure enough for my purposes." Whether that's good reasoning or not is left as an exercise for the reader.

So the main purpose of supply chain cybersecurity in general (and not just for IoT devices) is to help us understand the risks we took on when we decided to purchase the product, not to decide whether or not to purchase it in the first place; we may or may not take steps to mitigate those risks (or even learn about them), but if we want to, we can. And that's why the idea of a device label for cybersecurity seemed strange when I read about it in the EO: since a buy/no-buy decision very rarely hinges on cybersecurity considerations, what a label really needs to convey is the specific risks that apply to the device, along with the steps you could take to mitigate them. That's far too much information to fit on a label.

It turns out, although I didn’t know this until recently, that the idea of device labeling for IoT device cybersecurity has been around for a while in Europe. In fact, perhaps the best such program (albeit one that’s almost infinitesimally small compared with the program that will be required in the US) is one introduced by the Finnish government in 2019 (and no bad puns about whether or not the program is Finnished or whether or not it only applies to Finnished products. Bad puns are forbidden in this blog. I can assure you that I’ll never sink to using this low form of humor, such as asking whether the army fights to the (last) Finnish. After all, what kind of blogger do you think I am?).

There are three main components to the Finnish IoT device labeling program:

First, the manufacturer has to certify that the device complies with 18 of the 70-odd requirements, which are referred to as "Provisions", in the ETSI EN 303 645 standard for IoT devices. This standard may become mandatory in Europe at some point (in the sense that the European Parliament will enact legislation requiring it). But more importantly, it will almost certainly become a de facto standard for commercial, industrial and home-based IoT devices in Europe.

It may seem odd that the manufacturer has to certify compliance with just 18 of the requirements of ETSI 303 645. Why not all of them? For one thing, since Finland is part of the European Union, their IoT manufacturers will ultimately have to be in compliance with all of the standard anyway. But the real reason is that whoever drafted this law understood that the term “IoT device” covers a huge range of products, from baby monitors and doorbell cameras to protective relays and RTUs (remote terminal units).

This means that, for any particular IoT device, a large portion of the provisions in ETSI EN 303 645 won't apply. The 18 requirements chosen by the Finnish program are general enough that they can reasonably be expected to apply to the great majority of IoT products. Examples of the 18 provisions are "a manufacturer should have a public vulnerability disclosure policy", "installation of updates should be secure", "all unused interfaces must be disabled", and "there should be no hard-coded credentials in the system".

You might well ask, "How much good does it do for the user to know that the device they bought meets just 18 very general – and fairly easy to comply with – requirements?" My answer is "Probably not much." However, the second step in the program directly addresses this gap. Instead of simply requiring that the manufacturer comply with all of the ETSI EN 303 645 provisions or something like that, it requires that a third-party "inspecting body", approved by a government agency called Traficom, conduct a threat-modeling exercise for the device.

The term “threat modeling” is one of those terms that’s thrown around a lot and is used with a variety of meanings – so it’s almost meaningless (no pun intended here. Remember: This blog doesn’t do puns!) to discuss the meaning of threat modeling in general. However, in the context of the Finnish device labeling requirement, the term means that the inspecting body needs to consider both what the device will be used for (e.g. a baby monitor isn’t usually used for purposes that require high security, but a protective relay usually is. On the other hand, protecting personal information is a big concern for baby monitors, but it’s an almost nonexistent concern – except for login information, of course – for protective relays), as well as the setting in which it will be used (e.g. a room in a private home in the case of a baby monitor, vs. a Transmission substation where a cyberattack might affect thousands or even millions of people, in the case of the relay).

Using these two considerations, the inspecting body needs to identify all important cybersecurity threats that might apply to that device. Some of them will apply to many devices (like password strength), while others may be specific to the particular device. For example, one significant threat that applies to smart speaker devices like Amazon Echo™ is that, because they are continually listening to conversations for the magic word “Alexa”, if compromised they might be used to eavesdrop on conversations that occur in the home. In fact, DoD recently put out an advisory to employees working from home not to have smart speakers in the same room where they are working.

Third, when the inspecting body has identified the important cyber threats that apply to the product, they need to develop a test plan to determine whether each threat poses an actual risk. In my way of speaking, if a threat poses a risk, this means either the likelihood or impact of the threat being realized is high.

Since the device can have many uses, it’s very hard – nay, in my opinion it’s impossible – to determine the degree of impact of a threat being realized in all cases. Therefore, what’s important is likelihood, which can be high or low. If likelihood is high, the risk of the threat being realized is high, and mitigation measures should be taken to reduce the likelihood to low. And if likelihood is low, this means the risk is already mitigated (which might be by Mother Nature or other completely external forces, e.g. the risk that a forest fire will damage a region of the Sahara Desert is about as close to zero as it could be), and no further mitigation measures are warranted.

For example, in the case of the Amazon Echo threat discussed earlier, the inspecting body might decide that the likelihood of this threat is low, due to controls that Amazon has implemented to protect the audio stream the device "hears". In that case, the body will deem the threat to pose minimal risk in the Echo device, meaning users need not take any further mitigation measures for it – and it won't be included in the test plan.
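To make the logic of the second and third steps a little more concrete, here's a minimal sketch in Python of how an inspecting body might record the threats it identified and build a test plan from the ones whose likelihood is still high. The Finnish program doesn't prescribe any particular format or tooling, so the threat names, ratings and rationales below are purely hypothetical.

```python
# Purely illustrative sketch: record threats identified during threat modeling
# and build a test plan from those whose likelihood hasn't already been judged low.
# The threats, ratings and rationales are hypothetical examples, not the actual
# output of any Traficom-approved inspecting body.

from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    likelihood: str   # "high" or "low", as judged by the inspecting body
    rationale: str    # why the likelihood was rated that way

threats = [
    Threat("Eavesdropping via always-on microphone", "low",
           "Vendor controls protect the audio stream"),
    Threat("Credential theft via weak default password", "high",
           "Device ships with a default password the user isn't forced to change"),
    Threat("Unsigned firmware update accepted by device", "high",
           "No evidence yet that updates are cryptographically verified"),
]

# Only threats whose likelihood is still high pose a risk that needs testing;
# low-likelihood threats are treated as already mitigated and left out of the plan.
test_plan = [t for t in threats if t.likelihood == "high"]

for t in test_plan:
    print(f"Test needed: {t.name} ({t.rationale})")
```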

In the Finnish program, when the inspecting body has developed their test plan, they must submit it to Traficom (the government agency that operates the device labeling program). Traficom can disapprove the plan, in which case the inspecting body needs to revise it and resubmit it; otherwise, the plan is approved, and the inspecting body can go ahead and conduct the testing.

Finally, the inspecting body conducts the tests and submits the test plan and the results to Traficom. Traficom reviews these and decides whether or not to grant the label – in other words, whether the residual risk that remains, once the test results are taken into account, is low enough for the device to be sold in Finland.

I think the Finnish model would be an excellent one for the US program, with two important modifications:

First, the US program has to operate with minimal government involvement, both for budgetary reasons and for reasons of general regulatory philosophy. As long as human life isn't directly at stake, the feds usually don't want to be directly involved in auditing or testing. The Consumer Product Safety Commission and the National Highway Traffic Safety Administration investigate the cases that come before them precisely because those cases can have an impact on life safety; but it can't be said that the cyber compromise of an IoT device would directly impact human life, except in rare cases (e.g. if a baby died because its parents couldn't hear its cries of distress, due to a cyberattack having disrupted communications with the baby monitor).

This means that the Office of Management and Budget, which is charged with implementing all of the EO’s provisions[i], won’t want to be in the business of certifying “inspecting bodies” – which may be called “laboratories”, as they are in Europe. Rather, they would certify the organizations that certify the labs, and might even go one step further: certify the organizations that certify the organizations that certify the labs.[ii]

The second difference between the Finnish IoT device labeling program and the likely US program is that the latter won't be required for a device to be sold in the US, or even to the federal government. A manufacturer will be able to decide for themselves whether or not they want to participate in the program. If they decide not to, they won't be shut out of the market, but they will have a harder time selling their product – especially to federal agencies, since the agencies will be required to ask their IoT device suppliers to participate in the labeling program, and they'll have to justify to auditors any case in which they bought from a supplier that didn't participate (i.e. a supplier that has no label at all, regardless of what a label might have said).

So what will the "label" say, if it isn't "This device is safe/unsafe to buy"? The label (which will probably be a document accessible by scanning a QR code or entering a URL found on the product or in its accompanying documentation) will list the threats identified through threat modeling and the degree of residual risk for each threat, after any mitigations applied by the manufacturer. Additionally, the document will list mitigations that a user organization concerned about one of the risks can take to at least partially address it. For example, if an organization (a federal agency, a private company, etc.) wants to partially mitigate the risk posed by a device that lets users authenticate with weak passwords, it could locate those devices only in access-controlled rooms.
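To give a sense of what such a document might contain, here's a sketch of one possible structure. NIST hasn't defined any format yet, so every product name, field and value here is my own invention.

```python
# Hypothetical structure for the document a consumer or agency would reach by
# scanning the label's QR code. Nothing here is an official NIST or Traficom format.
label_document = {
    "product": "Acme IP Camera model X100",   # hypothetical product
    "threats": [
        {
            "threat": "Authentication with weak passwords is allowed",
            "residual_risk": "medium",
            "manufacturer_mitigations": ["Password complexity can be enforced via policy"],
            "user_mitigations": ["Locate devices in access-controlled rooms",
                                 "Enable the password-complexity policy"],
        },
        {
            "threat": "Unencrypted video stream on the local network",
            "residual_risk": "low",
            "manufacturer_mitigations": ["TLS enabled by default in firmware 2.1+"],
            "user_mitigations": [],
        },
    ],
}

# A user organization concerned about a particular risk can look up the suggested
# mitigations for it:
for entry in label_document["threats"]:
    if entry["residual_risk"] != "low":
        print(entry["threat"], "->", entry["user_mitigations"])
```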

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.


[i] Although the EO allows OMB to decide whether to bring in another agency to run any particular program. You’ll notice that paragraph 4 (t) quoted above mentions the Federal Trade Commission (FTC) as being involved with the device labeling program. This makes a lot of sense, and comports with my idea that, sooner or later, all government agencies will be involved with cybersecurity, and not just for securing their own systems. 

[ii] This is essentially how the EPA’s Energy Star program works: the EPA approves Certification Bodies. The CBs then approve Accreditation Bodies, who accredit the laboratories that test products for energy consumption. The Accreditation Bodies accredit based on ISO/IEC 17025, an international standard for testing laboratories (which is independent of the subject being tested). This is the basis for cybersecurity laboratory testing in Europe, and most likely will be the basis here as well, for the IoT device labeling program.

Wednesday, August 25, 2021

What’s the state of IoT cyber regulation?


The first time I ever wrote about IoT security was last December, when the IoT Cybersecurity Improvement Act of 2020 was signed into law by the former president. I realized it was something that would impact the energy industry, so I thought I’d let my readers know about it - not that I expected too many of them to get excited by it, of course. As it turns out, it was one of my most popular posts of recent years.

Then in May, the Executive Order (EO) included a very interesting provision on consumer IoT device security. This hasn’t received much attention, but it will next month, when NIST convenes a two-day virtual conference to discuss this and a related provision for consumer software.

Finally, in June I was engaged by a European company that deals with IoT security regulation to help them understand what’s going on in the US in that regard. This gave me the opportunity to learn about what’s going on in both the US and Europe (spoiler alert: Europe is ahead of the US in terms of having widely-followed guidelines for IoT, although the US is definitely now ahead in terms of mandatory requirements, at least ones that are on the books for future enforcement).

Since what’s going on in IoT cyber regulation will end up affecting all IoT users – home, commercial, industrial and government – very soon (including the energy industry), I’m going to summarize what’s most important (IMHO) to know about each set of regulations. This post is about the IoT Act, and my next post will be about the IoT provisions in the EO.

If you read my December post linked above, you’ll see I really like the way the IoT Act is structured; to be honest, I wouldn’t mind seeing all of the NERC CIP standards rewritten in this way (in fact, I think that if Medium or High impact BES Cyber Systems are ever going to be allowed to be deployed in the cloud – e.g. outsourced SCADA – rewriting the standards in this way will be required. Hopefully, I’ll have time to write a post on that topic in the near future, although I did touch on this topic in a recent post).

Here’s my summary of the Act, partly plagiarized from my own post:

1.      The Act starts with a great definition of IoT devices. They must “have at least one transducer (sensor or actuator) for interacting directly with the physical world, have at least one network interface, and are not conventional Information Technology devices, such as smartphones and laptops, for which the identification and implementation of cybersecurity features is already well understood.” Note that there’s no distinction here between “home” and “industrial” IoT. As long as there’s one transducer and one network interface and it’s not a smartphone or laptop, it’s an IoT device, period (see the sketch after this list). It’s safe to say that the great majority of organizations in the US, and a large percentage of households, are IoT users.

2.      But the Act doesn’t apply to device manufacturers, only consumers. And the only consumers it applies to are federal government agencies. Why is that the case? I’m sure this restriction made the bill much easier to write. Since there aren’t currently any mandatory federal cybersecurity regulations on any IT/OT/IoT suppliers, if the bill had been intended to apply to suppliers, it would have run into a firestorm of opposition and might have required years to get approved (as it is, the Act required two years for approval).

3.      But what about the fact that it’s restricted to federal agencies? Does it really do the general public a lot of good if nothing they use (unless they work for a federal agency) is directly covered by the Act? I think it definitely will, for the same reason that the Executive Order will do the general public a lot of good, even though it’s also restricted to federal agencies: Suppliers of any product, but especially an IT-type product, aren’t going to take the trouble to have two separate product lines – one for the Feds and one for the rest of us schmoes. It just doesn’t make for a great advertising tag line: “Sure, this IP camera is more likely to spy on you than the one the Feds use, but it costs $10 less!”

4.      So what are the federal agencies supposed to require of their IoT vendors? The Act wisely doesn’t prescribe anything (although see the next paragraph). It simply requires NIST to, within 90 days, develop “standards and guidelines for the Federal Government on the appropriate use and management by agencies of Internet of Things devices…including minimum information security requirements for managing cybersecurity risks associated with such devices.”

5.      But NIST doesn’t have a completely free hand in developing these guidelines. They have to address “secure development, identity management, patching and configuration management.” Of course, all four of these topics are ones that I would expect would be in any good cybersecurity framework.

6.      NIST, ever the eager beavers, met the 90-day deadline (give or take a month or two) with not one document but five: NISTIRs 8259B, 8259C and 8259D, along with NISTIR 8322 and draft SP 800-213. All of these documents are good. SP 800-213 is aimed at policies and practices of end user organizations (in this case federal agencies), while 8259D provides detailed guidelines for IoT device manufacturers who sell to the federal government. So I think these will be the two main documents governing the IoT Act.

7.      The Act requires the Office of Management and Budget to develop implementation regulations within 180 days. That would probably have been sometime in July, but I haven’t seen anything yet.
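Going back to point 1, the Act's definition boils down to a simple three-part test. Here's a minimal sketch of it as a decision rule; the parameter names are my own and obviously don't come from the Act itself.

```python
# Minimal sketch of the IoT Act's device definition as a decision rule.
# Parameter names are my own invention; the Act doesn't define a data model.
def is_iot_device(transducer_count: int,
                  network_interface_count: int,
                  is_conventional_it_device: bool) -> bool:
    """True if the device meets the Act's three-part definition."""
    return (transducer_count >= 1
            and network_interface_count >= 1
            and not is_conventional_it_device)

print(is_iot_device(1, 1, False))   # doorbell camera -> True
print(is_iot_device(0, 2, True))    # laptop -> False
```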

While I think the IoT Act will have a positive impact on IoT security, I think the Executive Order will have a much greater one. In fact, it may work out that what the EO requires will end up superseding the IoT Act altogether. Is Tom right about this? Or is he full of it? You’ll be able to decide for yourself when the exciting conclusion to this series arrives in your inbox (or browser) in the very near future. Please try to contain your excitement.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Thursday, August 19, 2021

This might be what it takes to make me finally give up my BlackBerry. But dang, I love that keyboard…

Two days ago, Politico published a really chilling article about how BlackBerry - which is now a software company, after earlier being a pioneer (in fact, the pioneer, in my opinion) in internet-connected personal devices - gravely mishandled reports of serious flaws in its QNX operating system. QNX is found in all sorts of devices like cars (200 million of them, according to the article) and the International Space Station. This was called to my attention by Eric Byres of aDolus, who wrote an excellent blog post that elaborated on Politico’s point about how having SBOMs could have made this a much easier problem to deal with, whether or not BlackBerry had cooperated.

But the fact is that they didn’t cooperate. Microsoft first discovered these vulnerabilities in April in a whole host of device operating systems, not just QNX. The article continues, “In May, many of those companies worked with the Department of Homeland Security's Cybersecurity and Infrastructure Security Agency to publicly reveal the flaws and urge users to patch their devices. BlackBerry wasn’t among them.”

At first, BlackBerry denied that QNX even had the problem, even though CISA showed them it did (in some versions). Finally, they admitted the problem but said they weren’t going to publicly announce the vulnerability. Rather, they were going to work with their customers privately to fix the problem.

Of course, on the surface, BlackBerry’s desire to limit disclosure to their customers was reasonable. Software companies do that all the time, especially if they haven’t yet distributed a patch for a vulnerability. After all, if they announce an important vulnerability to the whole world before their customers have been able to patch it, they’re inviting ruin on those customers.

But it turns out that BlackBerry has no idea who all the customers of QNX are. BlackBerry sells it to large organizations that incorporate it into products that they sell to other organizations, who embed those products into their own products, etc. Once you get down below at least the second tier of this, users of devices that run QNX almost certainly have no idea that they have it – since it doesn’t appear on any invoice they’ve ever received.

Just as in the case of the Ripple20 vulnerabilities, the right course of action in this case was to first prepare the patch, then let the whole world know about it. That way, in principle all users would be able to figure out if they have a product that contains QNX, and if they do, get a patch from whoever sold them that product. The good news is BlackBerry finally took this course of action. The bad news is they took it two days ago, after Microsoft had revealed the vulnerability in April.

Of course, Microsoft deliberately didn’t name names in their April announcement, in order to give the O/S vendors a chance to patch their systems and then announce the vulnerability and the patch at the same time. It seems that BlackBerry missed that memo.

You may have noticed that I italicized “in principle” two paragraphs above. When you hear that X is true in principle, what do you normally think? I normally think, “There isn’t a snowball’s chance in hell that X is actually true in real life.” Indeed that’s the case here. In principle, every company could take an inventory of all the devices they operate, check the current software bill of materials (SBOM) for each device type, and instantly know which ones are running QNX.

And even if the device they own doesn’t run QNX, the O/S might be running inside a component within that device (take your car. Even if it’s one of the 200 million cars running QNX, in most cases QNX is probably running inside some device that’s a component of another device, etc). But in principle that’s not a problem either, since you’ll not only have an SBOM for your car, but you’ll have an SBOM for the entertainment system, the transmission system, the passive safety system, etc. So if one of those systems is running QNX, you’ll learn that.
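If you actually had SBOMs at every level, the lookup itself would be trivial. Here's a rough sketch of walking a multi-level SBOM looking for QNX; the nested structure and component names are made up for illustration, since a real SBOM would be an SPDX or CycloneDX document.

```python
# Rough sketch: walk a nested (multi-level) SBOM looking for a component by name.
# The data structure and component names are invented for illustration; real SBOMs
# would be SPDX or CycloneDX documents.
def find_component(sbom: dict, target: str, path=()) -> list:
    """Return the paths at which `target` appears anywhere in the SBOM tree."""
    hits = []
    here = path + (sbom["name"],)
    if target.lower() in sbom["name"].lower():
        hits.append(" -> ".join(here))
    for child in sbom.get("components", []):
        hits.extend(find_component(child, target, here))
    return hits

car_sbom = {
    "name": "Example Sedan 2021",
    "components": [
        {"name": "Infotainment head unit",
         "components": [{"name": "QNX Neutrino RTOS", "components": []}]},
        {"name": "Telematics module", "components": []},
    ],
}

print(find_component(car_sbom, "QNX"))
# ['Example Sedan 2021 -> Infotainment head unit -> QNX Neutrino RTOS']
```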

Now, I won’t say that principles are worthless. But I will say that all these “in principles” aren’t going to get you very far, because without a doubt you don’t now have SBOMs for all of these devices – in fact, it’s very unlikely you have a single SBOM for anything in your car, let alone the car itself. Or just about any other software or device that you operate.

But this is changing, of course. Federal agencies are going to be required to get SBOMs from their software (and device) suppliers as of around August 12, 2021 – according to an OMB memo that came out last week, based of course on the May 12 Executive Order on software security. And it’s highly unlikely that this will end with the Feds – it will spread to the private sector, even though it’s not mandatory for them (and besides, the federal government buys lots of cars!).

If you’d like to learn more about SBOMs, I believe I’ve mentioned them once or twice in previous posts (he says while smirking). You can find some of them by searching on SBOM in the search bar that you see when you go to the main page of my blog, https://tomalrichblog.blogspot.com/. And you might want to join the Energy SBOM Proof of Concept meetings that I help coordinate. We’ll meet next Wednesday from noon to 1 ET. To get the URL, drop an email to SBOMenergyPOC@inl.gov.

I do wish to point out that we have strict criteria for attending these meetings. You must be a user of electric energy. If you’re an off-the-grid kind of guy and you’re reading this in a dark cabin in the mountains of East Nowhere, you probably won’t find the meeting appropriate. In fact, I don’t think you’ll be able to attend it anyway (plus I don’t know how you’re reading this post in the first place).

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Monday, August 16, 2021

Sam Chanoski is a Certified Tim Roxey Interpreter (CTRI)


On August 1, I put up a post describing an email conversation I’d had with Tim Roxey, former VP and CISO of NERC, and someone whose words are always very interesting, but sometimes (sometimes!?) hard to understand. Fortunately, a number of people in the NERC security world agree that whatever words he does utter are worth pondering, since Tim doesn’t spout off just because he’s trying to fill up a page, as some unscrupulous bloggers have been known to do.

I’ll let you read that post, but one of its most interesting features was when Tim described a trip to Whole Foods:

I was in Whole Foods couple of weeks ago. Heavy storms moving in but I was in underground parking. 

 

I’m pushing about my cart when an announcement comes over the speakers. Please all shoppers stop shopping. We have lost our cash registers due to lightening in the areas. 

 

Me thinks. I have cash. I’m good. 

 

Me thinks wrongly. Somehow the Point Of Sale device can’t process the sales in cash cuz the credit side is down. 

 

Harumph. No, it was the people and a branch  point in their processing that broke. 

 

We are so dependent on our “usual processes” that we fail to see the alternatives. 

 

Colonial failed as well. 

 

If you are CIKR then this is Wrong. Be CIKR AND operate as such.

Last week, I received an email from Sam Chanoski of Idaho National Laboratory (INL), someone I’ve known a long time and have a lot of respect for. He worked for NERC for eight years (including working with Tim, of course), the last 2½ of which were with the E-ISAC. During his time there, he was in the middle of almost everything the E-ISAC was doing. After a stint at ABB, he joined INL last year.

Sam’s email to me focused on the above passage from my post (which was quoted pretty much verbatim from Tim’s email. Yes, Tim really writes that way!).

I may be able to help a little with some Tim-terpretation from Tim Roxey’s earlier response. He’s saying the same thing allegorically with his supermarket that can’t take his cash, that I’ve posited elsewhere: in any organization with a consequential mission, there are likely to be dependencies built into “normal” accomplishment of their critical functions – and while the people who implement these processes on technologies every day largely understand many of them, the organization as a whole is often blind to most of these critical dependencies.

Rooting out these dependencies and forcing the organization to appreciate them for the risks they present is the start of how we become more resilient to whatever life and the bad people throw our way, in whatever failure-of-our-imagination ways we experience it next, with the people, processes and technologies we have today. 

For the PPT we need for tomorrow, that’s where something like Cyber-Informed Engineering (CIE) might come in, to help us imagine, design, procure, build, operate, and maintain the energy systems of tomorrow with cybersecurity inextricably part of the DNA as much as safety is today. Even though CIE slightly predates the similarly named Consequence-driven Cyber-informed Engineering (CCE), it’s definitely less well known and less mature, but ultimately a lot more broadly applicable I think – https://inl.gov/cie/ is where we are now, with the next major parts coming out likely next summer when the national strategy on CIE is (hopefully publicly) published according to FY20 NDAA Section 5726. 

None of this is easy or fast or pleasant, but it is necessary – as Gloria Steinem said, “the truth will set you free, but first it will p___ you off.” (Note: Since this is a family blog, I can’t quote Gloria exactly)

Sam makes a great point, and I must admit I didn’t see it when I wrote the previous post on what Tim said (now I do, though): People who work within a system day by day are probably the least able to tell you exactly how it works. They especially can’t tell you what’s needed in the way of “exogenous inputs” (as we used to say when I was working for an econometric modeling company, back in the days when people believed that computers were wonderful devices, rather than the instruments of the devil himself, as we all know to be the case nowadays). So the everyday workers need to have someone come in on occasion and tell them how their system really operates. That way, they can be prepared when one of those dependencies is lost.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Thursday, August 12, 2021

Just to be clear, you’re “allowed” to put BCS in the cloud now

In my last post, I made the case (although this wasn’t the main reason for the post) that the CIP standards, in order to solve the problems found in CIP today, must be rewritten as risk-based; this is a point I’ve made in a number of posts in the past year or so. One of the problems that I pointed out, that can be solved if CIP is rewritten, is the problem of BES Cyber Systems in the cloud, as in outsourced SCADA (note this is different from the problem of BCS Information in the cloud. A lot of NERC entities are storing BCSI in the cloud now, and the changes to the requirements that will make that completely “legal” have been approved).

Putting BCS themselves in the cloud is a very tricky question, mainly because doing so potentially puts the entity in violation of a whole host of CIP requirements (assuming the BCS is High or Medium impact. If it’s Low impact, there’s no restriction at all on putting BCS in the cloud today). The main reason why I say this is that there’s no feasible way to prove in a CIP audit that the cloud provider is actually performing all the acts required by the requirements in CIP-004, CIP-005, CIP-007, and CIP-010 (as well as some of the other CIP requirements).

Of course, this statement always seems strange to people who don’t understand how CIP is audited, since they will usually point out – with good reason – that if anything, the cloud providers have better security than any electric utility, as evidenced by the rigorous FedRAMP and SOC 2 audits that the major cloud providers have to pass. FedRAMP requires everything that CIP does and a lot more. You’ll get no argument from me on that point.

But what FedRAMP doesn’t require is that the cloud provider store evidence in a way that would enable them to pass a CIP audit; this is because NERC requires – for audits of all standards, not just the CIP standards – that the entity show evidence that it complied with a requirement in every instance. For example, CIP-004-6 R5.1 requires that the “individual’s ability for unescorted physical access and Interactive Remote Access upon a termination action” be removed within 24 hours of their termination (which includes termination for cause or just quitting). The cloud provider would have to retain evidence (e.g. screenshots) that this was done for every termination during the audit period, for every individual who had access to any system that contains “part” of the BCS (since the whole idea of the cloud is that particular functions are broken down into a lot of small functions that can be executed almost simultaneously on many systems located in data centers that are probably all over the world – or at least all over the US).
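To see what "evidence in every instance" means in practice, here's a small sketch of the check an auditor is effectively performing for this one requirement: comparing each termination timestamp with the timestamp when access was actually removed. The records below are invented for illustration.

```python
# Sketch of the check behind CIP-004-6 R5.1 evidence: was access removed within
# 24 hours of every termination during the audit period? Records are invented.
from datetime import datetime, timedelta

terminations = {
    "jsmith": datetime(2021, 3, 1, 9, 0),
    "apatel": datetime(2021, 6, 15, 17, 30),
}
access_removals = {
    "jsmith": datetime(2021, 3, 1, 16, 45),   # compliant
    "apatel": datetime(2021, 6, 17, 8, 0),    # more than 24 hours later
}

for person, terminated_at in terminations.items():
    removed_at = access_removals.get(person)
    if removed_at is None or removed_at - terminated_at > timedelta(hours=24):
        print(f"Potential violation: {person}")
    else:
        print(f"Compliant: {person}")
```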

But even this might not seem too hard – the cloud provider would just need to track everyone who is authorized to access the servers (physically or logically) that your BCS are stored on, and record the evidence that their access was removed within 24 hours when they left the provider. Is that so hard?

It is, when you consider that this applies to every data center where some part of your BCS might have run or been stored for any period of time (even just a few minutes) during the audit period. Moreover, this applies to every employee who worked in one of those data centers during the audit period (which is usually three years) – since any employee who could have walked by a server containing some part of your BCS by definition has physical access to that server (unless it’s in a rack with a locked door, protected by a card reader that only provides access to a small number of data center employees).

In order to comply with this one requirement, the cloud provider would most likely have to provide evidence that they had removed access within 24 hours for any employee in the US who could have even walked by a system that housed some part of your BCS, for even the briefest amount of time during the three-year audit period. In other words, they would need to do this for a significant percentage of all of their US employees. And this is a walk in the park, compared to the evidence that would be required for compliance with CIP-007 R2 (patch management) and CIP-010 R1 (configuration management)!

So even though the cloud provider’s practices almost certainly far exceed what the NERC CIP standards require, it’s literally impossible that they could ever document that compliance. Could that problem be fixed by amending the CIP standards, so that cloud providers would just be able to point to their FedRAMP certification as the only evidence they need?

Absolutely it could (although it would very likely take at least 3-4 years before this change would actually come into effect, from the day that a Standards Authorization Request was written for it). And what would happen then? Literally every NERC entity with Medium and/or High impact BCS would immediately outsource everything they could to the cloud, since their compliance documentation from then on would just consist of one sentence: “See XYZ Cloud Provider’s FedRAMP certification.” Of course, I would hope those entities would find a way to employ the large number of people who were previously doing nothing but document CIP compliance – I’d hate to see so many people out on the street at once.

I agree that allowing cloud vendors to just point to FedRAMP as evidence of compliance with the various CIP requirements would be an extremely popular move, but I also believe some people might object that this could very well put the BES at much more serious risk than it’s ever been in before. In fact, I’d have to withdraw what I’ve said previously, that there’s no way a cyberattack (or even a coordinated set of cyberattacks) could bring down the “whole” US power grid or even a substantial portion of it. If almost all electric utilities put most of their BCS in the cloud, it would just take attacks on the two or three major cloud providers to literally shut down a lot of the US power grid. But other than that, I don’t see anything wrong with this idea…

So what does this mean for the idea of having BCS in the cloud? Should you try it? Like a lot of things, it depends on your risk appetite. You might find that the auditors are very sympathetic to this idea – especially if you’re say a renewables producer who’s just starting out, and you have a heart attack when you see the cost of running your Medium impact Control Center on outsourced SCADA vs. the cost of building everything yourself in a facility that you own. But it’s probably more likely that they’ll tell you to drop the outsourced SCADA and create your own Control Center in say one year – and if you don’t do that, they’ll throw the book at you.

But the choice is yours. Nothing in the CIP standards says you can’t outsource BCS to the cloud.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Sunday, August 8, 2021

What does Colonial Pipeline mean for NERC CIP?


Sometimes it seems that all questions regarding OT security ultimately come down to questions about NERC CIP. Of course, this is because NERC CIP is the oldest set of cyber regulations – outside of the nuclear and military domains – that directly addresses OT, and is still one of only a handful of cyber regulations that focuses solely on OT.

So far, I’ve tried to focus on the general OT security implications of the Colonial Pipeline ransomware incident, but the fact is that there are some very important implications for CIP in the incident. This week, I received an email from someone who’s been working in CIP for quite a while, but who wishes to remain unidentified. This person asked what CIP changes would be needed, based on the lessons learned from the Colonial attack. I think it’s time that we discussed this.

The most important part of the Colonial Pipeline attack, from a CIP perspective, is that the ransomware was – according to Colonial – confined to the IT network. But even then, the attack ended up shutting down Colonial’s entire pipeline system. And according to news reports, it seems that the vehicle for this happening was that Colonial’s customer billing system (which was on their IT network, of course) had to shut down; this somehow entailed the shutdown of the OT network.

Two questions inevitably arise in the mind of anyone (such as me) whose mind has been forever warped by spending too many hours pondering recondite NERC CIP issues:

1.      If the pipeline industry had been subject to compliance with the NERC CIP standards, would this incident even have happened?

2.      If something like this did happen in an electric utility – namely, that the utility had to shut down some or all of their transmission and/or distribution systems due to loss of a system on their IT network – would it have been because of a CIP violation, or could this have happened even in a utility that was completely CIP compliant? If the answer to the latter question is Yes, this implies that the CIP standards can’t prevent a successful cyberattack on the IT network from ever shutting down power operations. In turn, this would imply the CIP standards need to be expanded in some way to cover IT systems, as well as OT ones.

I brought this issue up (really for the second or third time, but for the first time as the subject of a whole post) in July, when I reported on a webinar I’d attended the day before. In that webinar, my longtime friend Jodi Jensen of WAPA had wondered how the loss of Colonial’s billing system could possibly have shut down pipeline operations. Why couldn’t operations have continued, on the understanding that Colonial would generate all outstanding bills once the billing system was back online?

Why was Jodi asking this question? I think it’s because NERC CIP (as well as good SCADA security practice, which Jodi knows a lot about) requires that every system whose loss or misuse could “affect” the Bulk Electric System in some way be included in the OT network, which is protected by an Electronic Security Perimeter. And at least in the power industry, the ESP protections (for assets classified as Medium or High impact) are quite strong: I have never heard of an OT compromise that began with a frontal assault on the ESP (which isn’t to say it’s never happened, of course), although there have definitely been OT compromises that didn’t start with a breach of the “wall” between IT and OT. In fact, I cited two of those cases in my most recent post on the Colonial issue.

And how does an electric utility decide whether or not a system’s loss or misuse might affect the BES? It’s by running the systems through a bizarre methodology for classifying systems in operation at a NERC entity, which – if I didn’t know better, because I attended a number of the drafting team meetings when it was designed, and I engaged in not a few arguments with the then-chairman of the drafting team about this issue – I would guess was designed by a team consisting of Jack the Ripper, Suleiman the Magnificent and Rube Goldberg; this “system” is known commonly as CIP-002 R1 and Attachment 1.

The keystone of this classification system is the definition of BES Cyber System (well, really BES Cyber Asset, but don’t pay attention to details if you want to understand CIP-002 R1. You’ll go mad). I won’t bore you with the full definition of BCS, but its two main points are:

1.      If the system were “rendered unavailable, degraded, or misused”, it would “affect the reliable operation of the Bulk Electric System”.

2.      That effect needs to occur within 15 minutes. That is, a system isn’t a BCS if its loss or misuse won’t usually result in a BES impact within 15 minutes. Of course, when you’re dealing with electric power, if there’s any impact at all it’s usually going to occur within a second or two. The massive power outage in Florida in 2008 was detected literally within a couple seconds in Alberta.

Let’s say the Colonial Pipeline billing system were magically transformed to do electric power billing (although the two types of billing are very different) and installed at a large electric utility. Would it be classified as a BES Cyber System and thus be required to be installed within the ESP (i.e. on the OT network)? Or would it be just another IT system like payroll? Of course, if it were actually a BCS (again, at a Medium or High impact facility), the utility would be in a lot of trouble if a subsequent audit discovered it was on the IT network.

But how could a utility’s billing system (not the metering system needed to produce the data for bills. Metering is always part of the OT network, and of course metering is found everywhere power is distributed, including at your home) have a 15-minute impact on the BES? Ultimately, if an electric utility’s billing system is unavailable for an extended period of time, there might be some BES impact, but it certainly wouldn’t be within 15 minutes.

So the billing system wouldn’t be a BCS in a NERC CIP environment. This means it would be totally exempt from all CIP requirements. But does this mean the billing system doesn’t pose any risk at all to the BES? Does it really deserve to get off scot-free from CIP, even if it shouldn’t be subject to the same set of requirements as for example the utility’s Energy Management System, which has an undeniable 15-minute BES impact?

Rather than deal with a hypothetical system, let’s look at a real one that is found in literally every electric utility (in fact, it’s found in just about every industrial facility worldwide): the historian. This is a system that records what goes on inside the utility’s power network, so if there’s some sort of adverse event, the utility can go back through the data to trace what actually happened. Since the BES can run quite nicely, thank you, when the historian isn’t fulfilling this purpose, it’s normally not classified as a BCS.

But in some cases – as Kevin Perry pointed out in this post – the historian is used to provide a real-time view of what’s going on in the utility’s operations, meaning that the operators in the Control Center might make a split-second decision based on something the historian is telling them. Ever the auditor, Kevin correctly pointed out that, if the historian is used for that purpose, it is a BCS and needs to be installed inside the ESP, not on the IT network.

In the same post, Kevin described one case where he decided during an audit that a historian really was a BES Cyber System (actually a Critical Cyber Asset, the somewhat equivalent term used in CIP versions 1-4). He didn’t say whether that utility was fined or not (or whether they even received a violation), but at the least they had a very bad day when they found out they had to do all the re-engineering required to move their historian from the IT network into the ESP.

But what if the historian doesn’t perform any real-time monitoring function, so it’s correctly classified as not a BCS? Does that mean that its loss or compromise doesn’t affect the BES at all? Of course not. For example, if the historian malfunctioned (due to a cyberattack) at the same time a serious event occurred on the utility’s power network, the utility might not be able to determine the cause of the event, say a relay misconfiguration in a substation. Since the cause wasn’t detected, the misconfiguration might remain in place and cause another serious event. That’s a BES impact, although it’s not within 15 minutes. 

So at least with the historian, we have an example of a system whose loss or misuse will usually impact the BES, but which doesn’t have a 15-minute impact (Kevin mentions ICCP servers as another example). This means the historian isn’t a BCS and isn’t in scope for NERC CIP at all. But is that right? Shouldn’t systems whose loss or misuse can impact the BES – even though the impact won’t always or even usually be within 15 minutes – be in some way in scope for CIP?
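Here's a rough sketch of the classification I'm describing, with the 15-minute test separating BES Cyber Systems from what I'm calling intermediate systems. The example systems and the judgments about them are mine, not anything found in CIP-002.

```python
# Rough sketch of the classification being discussed: the 15-minute criterion
# separates BES Cyber Systems from "intermediate systems" (my term). The example
# systems and the judgments about them are mine, not anything in CIP-002.
def classify(affects_bes: bool, impact_within_15_min: bool) -> str:
    if affects_bes and impact_within_15_min:
        return "BES Cyber System (in scope for CIP)"
    if affects_bes:
        return "Intermediate system (no CIP scope today)"
    return "Ordinary IT system (out of scope)"

print(classify(True, True))    # EMS in a Control Center
print(classify(True, False))   # historian with no real-time monitoring role
print(classify(False, False))  # payroll system
```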

In the same post referred to a couple times above, I quoted from an email Kevin sent me, in which he said, “I would argue that any ‘IT’ system or system component that is essential to (sustaining OT operations) needs to be considered OT and kept isolated from the rest of the IT world.” In other words, Kevin is suggesting that systems whose loss or misuse can affect the BES, but not within 15 minutes, should be included in the OT network, not the IT network.

Does this mean that such a system should be subject to the same requirements as an actual BCS? One might initially be inclined to argue “No, such systems should be required to be on the OT network – that is, within the ESP – but they shouldn’t have to comply with all of the requirements that BCS have to comply with.”

But this ignores an important security fact and an important CIP fact. The security fact is that, if a system is connected to an IP network and it gets compromised, it can be used as a launching point for attacks on other systems connected to the same network, which might have higher intrinsic value than the system that was first compromised. The CIP fact is that every system that is on the same network as a BCS needs to be declared to be a Protected Cyber Asset; and PCAs are subject to almost exactly the same set of requirements as BCS are.

So if we want to have a smaller set of CIP requirements that applies to these “intermediate systems” (i.e. systems whose loss will affect the BES, but not in 15 minutes. Of course, this is my name for them, and note that if Intermediate System is capitalized, it has a different meaning in CIP – so my name would never be usable in practice), we would probably need to put them on another network, separate from the IT and ESP networks and protected from both. At that point, we could apply a different set of CIP requirements just to these systems.

But which of the current CIP requirements should apply to the “intermediate systems”? I don’t know about your answer, but here’s mine: None of them. That is, I don’t want the current CIP standards extended any further than they already are. I totally agree that these intermediate systems do need to be included in a general OT cybersecurity compliance regime, and I’m fine if that regime is still called NERC CIP, but it needs to be a completely risk-based system.

I outlined the “new CIP” system I’d like to see in an article for a British publication in 2019 (this isn’t available online, but if you email me, I can send you a PDF of it). I also did a webinar on this topic for that same publication in 2018. Note my views have changed somewhat since then, but the general framework I laid out is still what I believe is needed.

And I’m not saying that what I’m proposing is the only way to do this. In fact, the current CIP Modifications drafting team proposed another idea in 2018, which I really liked. I wrote three posts on it: no. 1, no. 2, and no. 3. What happened to that idea? That’s a sad story: Briefly, the SDT was proposing some radical changes to CIP. These would have required NERC entities to revise a lot of their CIP compliance documents, as well as follow some new procedures. And it seems too many NERC entities simply ruled this out as impossible.

The result of this? The current CIP compliance regime (i.e. the standards and how they’re interpreted and enforced) is preventing CIP from being extended to intermediate systems, as well as to the cloud. The latter is especially unfortunate, since, even though I’m sure there are a few hardy souls who have actually outsourced Medium impact, and maybe even High impact, BCS to the cloud (e.g. in outsourced SCADA), they’re only doing this because they’re willing to live with the constant fear that an auditor will throw the book at them and make them shut down all BCS in the cloud, forcing these entities to reproduce their BCS in the nice, safe confines of a Control Center or substation that you can actually point to and walk into.

Folks, if you want to be able to put Medium or High impact BCS in the cloud and feel safe doing so (as well as other improvements, like much faster response to new threats and a much more efficient allocation of compliance resources, as well as of course incorporation of the intermediate systems we’ve just discussed), you’re going to have to accept that the CIP standards have to change. This means you’ll have to change your current procedures and documentation, period. Which do you want? BCS in the cloud or your current CIP procedures? You can’t have both.

Is it time to review your CIP-013 R1 plan? Remember, you can change it at any time, as long as you document why you did that. If you would like me to give you suggestions on how the plan could be improved, please email me.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Thursday, August 5, 2021

Have I mentioned that SBOMs are a good thing to have?


We keep getting more reminders of why SBOMs are needed. One of the big reasons for having SBOMs is that, when a serious vulnerability is found in a software component that’s widely used, the first task of most organizations is figuring out where this component is actually found in their environment – remember Ripple20? This can be a very time-consuming exercise, plus you’re not likely to have 100% - or even 25 or 50% - success. Having SBOMs for at least some of the software on your network could significantly reduce the amount of time you have to spend talking with your suppliers, scanning all your systems since you have no better way to find the vulnerability, etc.
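Here's a sketch of what that search looks like once you do have SBOMs: match the component named in an advisory against each product's component list. The product names and SBOM contents are invented, and a real SBOM would be an SPDX or CycloneDX document rather than a simple list.

```python
# Sketch: given an advisory naming a vulnerable component, check which products
# in your inventory include it, based on one-level SBOMs. All names are invented;
# real SBOMs would be SPDX or CycloneDX documents.
advisory_component = "treck tcp/ip stack"

sbom_inventory = {
    "PLC firmware 4.2": ["Treck TCP/IP Stack", "libfoo 1.0"],
    "HMI software 7.1": ["OpenSSL 1.1.1", "zlib 1.2.11"],
    "Protection relay 3.0": ["Treck TCP/IP Stack", "RTOS kernel"],
}

affected = [product for product, components in sbom_inventory.items()
            if any(advisory_component in c.lower() for c in components)]

print("Products to patch or mitigate:", affected)
# Products to patch or mitigate: ['PLC firmware 4.2', 'Protection relay 3.0']
```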

Coincidentally, the Ripple20 vulnerabilities were announced a little more than a year ago. Then in December, Forescout Technologies announced the Amnesia:33 vulnerabilities. One big similarity between Ripple and Amnesia is that they’re both sets of vulnerabilities in IP stacks that were developed a long time ago (the Ripple20 vulnerabilities were in an IP stack developed by Treck, a company still operating in Cincinnati, in the 1990s). So the vulnerable components have had plenty of time to be embodied in software or embedded devices that were later incorporated into other products, which were later incorporated into still other products, etc. The result is that, in many cases, the maker of a device that contains the vulnerability has no idea that the component is even included in their device – this knowledge has passed out of the collective memory, just like the identity of the person who wrote the book of Genesis.

This week, Forescout (together with JFrog) announced yet more serious vulnerabilities in an IP stack released in the mid-1990s, although this time the stack is used in a lot of OT systems (thanks to Kevin Perry for forwarding me the story); Forescout has named these vulnerabilities “Infra:Halt”. The original developer of the vulnerable software component, called NicheStack, was a company called InterNiche; that company is now part of HCC Embedded, although the product is no longer sold as a standalone component.

Of course, to truly learn about every instance of any software in your network, no matter how deeply embedded it is in other products, you would need a multi-level SBOM for every intelligent device or “standalone” software product on your network. When will you have this? Probably never, to be honest. In fact, if in three years you have just a one-level SBOM (i.e. a list of the components directly included in the software or device) for the great majority of the software you operate, I’ll consider that to be a big achievement (I also think it’s achievable).
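To illustrate the difference (with made-up names and versions, not any real product), here’s a small sketch of why depth matters: a one-level SBOM only shows the components the supplier included directly, while a nested, multi-level SBOM lets you keep walking down into components of components.

# Sketch of a nested (multi-level) SBOM, using hypothetical data.
# A one-level SBOM would stop at "vendor-firmware" and "web-ui"; the recursive
# walk below is what a multi-level SBOM makes possible.
def walk(components, depth=0):
    for comp in components:
        print("  " * depth + f"{comp['name']} {comp.get('version', '')}")
        walk(comp.get("components", []), depth + 1)   # descend into sub-components

device_sbom = {
    "components": [
        {"name": "vendor-firmware", "version": "4.2",
         "components": [
             {"name": "third-party-ip-stack", "version": "1.9"}  # invisible at one level
         ]},
        {"name": "web-ui", "version": "2.0"},
    ]
}
walk(device_sbom["components"])

Even so, the flat one-level list is where nearly everyone will have to start; the deeper levels only become visible as suppliers pass SBOMs down the chain.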

However, here’s the thing about SBOMs: their benefits are entirely incremental. Having an SBOM for just one product is better than having none; having them for ten products is a lot better than having just one; etc. The way things are moving today (partly as a result of the May Executive Order), you’ll definitely be able to have a lot more SBOMs for the products you use a year from now than you do today.

How can you take the first step into the world of SBOMs? You can join the Energy SBOM Proof of Concept, jointly sponsored by the National Telecommunications and Information Administration (NTIA) and Idaho National Laboratory. You can also join our next biweekly meeting, on Wednesday August 11 at noon Eastern Time. To get an invitation and receive our mailings, drop an email to sbomenergypoc@inl.gov. See you then!

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.

 

Sunday, August 1, 2021

Tim Roxey tells us what the real problem is. Now we have to interpret what he says.


Last Monday, I wrote a post about the comments that Kevin Perry, former Chief CIP Auditor of SPP Regional Entity, made on this post, which discussed why it actually made sense for Colonial Pipeline to shut down their operations after losing their billing system in a ransomware attack.

The whole point of that post, as well as the previous posts I’d written on Colonial – starting with this one – was that a “purely IT” incident can affect the OT network, even if there’s no direct connection. (BTW, you can find the earlier posts by searching on the main page of my blog. Until last summer there was no way to search the blog, which made it hard for me to find previous references to a subject, and close to impossible for readers to do so. I was quite glad when search was added.)

In the case of Colonial, the loss of their billing system meant that they couldn’t track who put how much gasoline into their pipeline and when, and who withdrew how much and when. For an electric utility, the loss of this capability wouldn’t require shutting down power transmission and distribution, since the utility can always bill for power used later (i.e. the meters will keep operating); and if the utility can’t bill later for some reason, they still need to provide power, because they’re…well, a utility.

But Colonial doesn’t own the gasoline in their pipeline; they’re transporting it, just as a mover transports your household goods to a new city. If the mover loses your goods on the way, they’re on the hook for the entire value of those goods. By the same token, if Colonial keeps shipping gasoline while their billing system is down, they’ll literally lose track of what any one shipper has put into the pipeline, and will end up owing every shipper the entire value of their gasoline.

In last Monday’s post, I started by saying there were three questions that needed to be answered:

1. How can we identify systems that don’t directly control operations, yet can have a huge impact on operations just the same (i.e., IT systems that perform functions required for operations)? And when we’ve identified them, what measures can we take to protect them better than other systems on the IT network that clearly have no direct operational impact, like say the systems that run the utility’s retirement plan?

2. Should those systems be regulated by OT-focused cybersecurity compliance regimes, such as the dreaded…(and here I cross myself, despite not being Catholic)…NERC CIP?

3. Or maybe we need to go beyond all this talk about regulation and protecting systems, and think about what the real problem might be?

To summarize what I think Kevin said in that post, he answered the first question by in effect saying, “Any system on the IT network whose loss or misuse can impact operations, like Colonial’s billing system, should be protected like OT systems are, including being isolated from other IT systems.”

Kevin answered the second question by in effect saying, “Any system whose loss or misuse can affect Bulk Electric System operations within 15 minutes (essentially, the BES Cyber Asset definition) should be classified as a BES Cyber System (BCS) and placed within the Electronic Security Perimeter (if the asset at which it’s installed is classified as Medium or High impact).”

An example he gave of this is a mistake he saw more than once in his ten-year NERC CIP auditing career: a NERC entity didn’t classify their historian as a BCS and installed it on the IT network, not in the ESP. But in the cases Kevin described, the historian was used for real-time monitoring and therefore should have been classified as a BCS – meaning it should have been installed inside the ESP to begin with.

This is stretching what Kevin said a little, but one might draw the implication that, if a system’s loss or misuse doesn’t directly impact the process being controlled (which, for an electric utility subject to the NERC CIP standards, is the smooth and uninterrupted operation of the BES; for Colonial Pipeline, it’s the smooth and uninterrupted transport of gasoline in their pipeline system), then a) it’s OK to install it on the IT network, and b) it doesn’t need to be subject to special regulation, beyond a general obligation to follow good cybersecurity practices.

However, there are two cases I can identify in which the shutdown of the IT network directly required shutting down OT, even though there were no systems on the IT network that directly impacted the process being controlled by OT. One case is from 2018, when a serious ransomware attack on a very large electric utility’s IT network required shutting down the control centers as well – even though the ransomware never spread there.

The other case was cited by Tim Conway of SANS in a webinar earlier this year (which was quoted in Utility Dive). In 2017, the NotPetya malware (which was based on the Petya ransomware, except that NotPetya didn’t even bother to save the encryption key after encrypting the victim’s systems – it simply threw the key away. The purpose of NotPetya was to cause havoc, pure and simple. And it did: about $10 billion worth of havoc – for which, naturally, Russia has never been held accountable. Do you notice a pattern here?) brought down the entire global operations of the Danish shipping giant Maersk.

Tim pointed out in the webinar (reported in this post) that no operational systems like cranes were affected by the attack on Maersk’s IT network. However, because of the loss of its IT systems, Maersk no longer knew what was in the containers it was shipping – meaning it really couldn’t guarantee that a container shipped to Company A was actually picked up by the correct recipient, rather than somebody else. This is very close to the situation that Colonial Pipeline faced when they lost their billing system. In both cases, the company shut down operations (although in the case of Maersk, operations were down for two weeks, vs less than a week for Colonial. On the other hand, given the devastation that Maersk suffered, the fact that it only took them two weeks to get up and running again isn’t much short of a miracle).

In other words, these two cases show us that the security of the IT network can be essential to the correct operation of the OT network – and, at least in the case of a complete loss of the IT network, as happened with Maersk and with the utility in the 2018 incident, some IT incidents can require shutting OT down even when there’s no particular system on the IT network whose loss forces the shutdown (as there was in Colonial’s case).

So we’re fooling ourselves if we think that our OT network is protected from all disturbances on the IT network just because we may have made it impossible for an attacker to penetrate the OT network from IT – just like the French were fooling themselves when they built the Maginot Line after World War I to prevent another German invasion. There was no way the Germans could have crossed the line itself to enter France, so in 1940 they simply went around it. And this is just as true with Electronic Security Perimeters. True, CIP-005 R1 and R2 provide formidable protections against an “invasion” that comes through the IT network. But they don’t protect against all compromises, especially ones that magically bypass the ESP, like in the 2018 ransomware case.

So is the solution to apply the full NERC CIP requirements to IT systems, as well as OT systems? God forbid! I wouldn’t wish the current NERC CIP requirements – in all their prescriptive glory – on my worst enemy. However, if and when the NERC CIP standards are rewritten as risk-based, and when there are important changes made to NERC’s CIP compliance regime (as I discussed in this webinar in 2019), then it will be possible to regulate both IT and OT systems, but in different ways, commensurate with the risks posed by both types of systems.

To go back to my three original questions, Kevin and I answered the first two. But what about the third? That is, instead of just talking about regulating and protecting IT vs OT systems, maybe we need to think beyond that silo? What’s the real problem we need to address?

Fortunately, there’s someone who thinks about what the real problems are: Tim Roxey, who has appeared in this blog before. He replied to the same post that Kevin did, saying (in the inimitable English dialect known as Roxey-speak):

I was in Whole Foods couple of weeks ago. Heavy storms moving in but I was in underground parking.

I’m pushing about my cart when an announcement comes over the speakers. Please all shoppers stop shopping. We have lost our cash registers due to lightening in the areas.

Me thinks. I have cash. I’m good.

Me thinks wrongly. Somehow the Point Of Sale device can’t process the sales in cash cuz the credit side is down.

Harumph. No, it was the people and a branch point in their processing that broke.

We are so dependent on our “usual processes” that we fail to see the alternatives.

Colonial failed as well.

If you are CIKR then this is Wrong. Be CIKR AND operate as such.

This was of course quite interesting, but it wasn’t…how can I say this?…definitive. So I wrote back to Tim and asked him two questions: “Do you think some sort of regulation of these systems is necessary? Or are you saying that changing the utility’s (or pipeline company’s) whole modus operandi is required to fix these problems?”

Tim replied:

Actually if we look at this differently, we see opportunity. 

Apply regulations that address People, Processes, and technology. Stop concerning ourselves with IT/OT as the technology of applicability.  If you can have the People pull the plug because their Processes (Recovery) or Technology (IT bleeding into OT) has led to a condition of uncertainty (The function of CEO is RISK) then the regulations were not so much fantastic. 

The regs in Colonial Pipeline simply do not exist. Their Issue was IT not OT and hence most NERC Regs would not apply even if they existed in TSA world. 

Requiring Baseline Regulations that hit all three factors:

  • the People that operate inside
  • Processes that control CI Functions that employ
  • Technology to perform the Critical Infrastructure functions (National Security Functions)

Good Regulations address all three.  

Bottom line – Regulations tend towards baselines. Centers of excellence (Think INPO) tend towards Ceilings of excellent performance (best practices). Ceilings tend to include a better, more mature understanding of Risk. Not just the usual Vulnerabilities, Threats and Consequences stuff but also internal risks of how the People and Processes Parts and Technology parts interact. The People being unduly influenced by their knowledge of the processes (or lack thereof ) and the misunderstandings of the technology (IT really can touch OT) leads to enough uncertainty that conservative calls to pay Ransom are made.

As with all oracular statements (i.e. statements that a true oracle makes. And no, that’s not Larry Ellison), these are subject to many interpretations. I’ve reproduced Tim’s exact words (with a couple minor grammar corrections), so that each of us can draw our own interpretation from them. Here’s mine:

• You’re missing the boat if you focus all of your attention on the question of IT vs. OT. That’s not the issue.

• The real issue – for both cyber regulations and best practices – is people, processes, and technologies. Get those right, and you won’t have to worry about IT vs. OT.

• Don’t just pay attention to PPT in three silos, but look at how people, processes and technologies actually interact – as in the case of Whole Foods, where a needless dependence of cash payment systems on credit card payment systems made it impossible for this Whole Foods store to sell anything at all.

• And just as important, make sure that people understand how the processes and technologies actually work, since for example a belief that OT exists safe behind its Maginot Line defenses can lead to a pretty rude awakening, just like in France in 1940.

Any opinions expressed in this blog post are strictly mine and are not necessarily shared by any of the clients of Tom Alrich LLC. Nor are they shared by the National Telecommunications and Information Administration’s Software Component Transparency Initiative, for which I volunteer as co-leader of the Energy SBOM Proof of Concept. If you would like to comment on what you have read here, I would love to hear from you. Please email me at tom@tomalrich.com.