Tom Alrich's Blog: Getting to the Root of the Problem

Harvard graduate Henry David Thoreau, when told “They teach all the branches of knowledge at Harvard”, famously replied “Yes, but none of the roots.” In the same way, I feel I have been dealing with the branches of the problems with NERC CIP; I now want to get to the root.

Until now, when I have referred to the “fundamental issues” in CIP v5, I have mainly meant the ambiguities and contradictions in the asset identification process, namely CIP-002-5.1 R1 (and Attachment 1), and in the definition of External Routable Connectivity (ERC), which is tied to CIP-005-5 R1. I wrote fairly recently that these fundamental issues could be fixed by rewriting CIP-002 R1 and Attachment 1 (including the definitions associated with them, primarily the Cyber Asset and BES Cyber Asset definitions), along with the ERC definition.

However, I no longer believe the real fundamental problems with NERC CIP can be fixed through the above changes; in fact, I don’t believe any rewrite of CIP-002-5.1 R1 and the ERC definition could fix these problems. Even if NERC followed my previous suggestions and constituted a new drafting team to rewrite these items, and even if the team were composed of the greatest minds of the century, I no longer believe this would address the root of v5’s problems, or for that matter of NERC CIP’s problems in general. I believe that only a complete rewrite, in a completely different format, will fix those problems.

What are the problems in CIP v5? I have written close to 100 posts (maybe even more) on this issue; the first was this one and the most recent was this one. To briefly summarize, the problems include:

There are ambiguities and outright contradictions in CIP-002-5.1 R1, the “fundamental requirement” of CIP v5. Until fairly recently, I thought that the solution to this problem was to rewrite the requirement, as well as Attachment 1 of CIP-002. I no longer believe this will help. I have come to believe that any attempt to have some sort of purely objective process for determining what needs to be protected by NERC CIP is bound to fail. There is far too much variability in the electric power industry for NERC or any other organization ever to develop such a process.
The concept of External Routable Connectivity is essentially part of the CIP v5 asset identification process, but I think it’s been well proven[i] by now that this is a black hole. As with CIP-002 R1, I don’t think any definition of ERC will magically solve the problem: namely, how to treat cyber assets (primarily relays in substations) that are serially connected to a device, such as an RTU, that is itself routably connected to a control center. Since about ten requirements in v5 apply only to devices with ERC, this leaves a big hole in the standards. And by the way, I’ve also concluded that “programmable” (the key term in the Cyber Asset definition) is a black hole. No number of meetings could ever produce a “definition” of this term that would satisfy most of the NERC community.
Because the standards development process is so long, the threats that scare everybody at the beginning of the process are often no longer the most important ones by the end, while newer threats that were not considered important originally have become huge. (The Standards Drafting Team that developed CIP version 5 first met in 2008, so eight years will have passed by the time their effort reaches fruition this April. They were originally aiming at what became CIP v5, although they ended up having to develop CIP v2, v3 and v4 before they could get there.) One example is phishing, which the SDT never considered a big threat (and I would have agreed at the time, if asked). But the very recent cyber attack on the Ukrainian power grid, which seems to have been facilitated by malware spread through phishing, shows that phishing now needs to be treated as one of the top threats to the grid, in my ever-so-humble opinion.
Another big problem is that the Distribution grid is completely out of scope for NERC CIP, even though it clearly accounts for a large share of the power sector assets in the US.[ii] This brings to mind the Maginot Line, the seemingly impregnable line of fortresses that France built along its border with Germany after World War I. The problem wasn’t that the line itself was vulnerable; the Germans didn’t need to break through it. Rather, the problem was that it stopped at the Belgian border. So guess how Germany invaded France? And if you were a cyber attacker, how would you “invade” the US grid? Would you attempt a frontal assault on the Transmission grid, knowing that it has been subject to NERC CIP for some time? Or would you attack the Distribution grid, which currently is subject to few if any cyber security regulations?[iii]
The biggest problem I now see with CIP (one that existed in the previous CIP versions, although it has been greatly compounded in CIP v5) is that a significant percentage of the large amount most NERC entities spend on CIP compliance goes to compliance paperwork, not cyber security.[iv] Admittedly, a lot of documentation is needed for cyber security purposes, such as documentation of security procedures. But there’s also a lot of documentation that exists simply so an entity can prove it did something. If CIP had been proven to be the ideal set of cyber security standards for all BES asset owners, there might be some justification for this. However, don’t be shocked if I tell you that I don’t think CIP constitutes that ideal; nor do I think an ideal set of standards could ever be developed, even if we gathered all of the Founding Fathers, along with Abraham Lincoln, Mahatma Gandhi and Mother Teresa, to write it.

These are the problems with CIP v5[v], i.e. the branches. What is their root? I think it’s fairly simple: the NERC standards format is very prescriptive, and I don’t think cyber security standards can ever succeed when written in such a format. This is because cyber security protection is a statistical process. For example, if you don’t update antivirus signatures on one system for one day, it’s not very likely that anything bad will happen. If you never update antivirus on that system, or worse, on any of your systems, it is almost inevitable that something bad will happen.

Why are NERC standards prescriptive? Because that is how NERC standards work. They are written to require or prohibit particular actions, because a failure to take (or to refrain from) any one of them may lead to a catastrophic outage of the Bulk Electric System. If there is a misunderstanding between control centers due to a failure to follow the COM standards, there could well be an immediate grid event. But as I’ve just said, cyber security protection is statistical, not deterministic like most of the other NERC standards. It is highly improbable that a grid event will be caused by an electric utility failing to apply a particular patch to one system within 35 days of its release. However, it is certainly possible that the same utility’s failure to apply a patch to any of its systems for, say, three months after its release could lead to a grid event.
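A toy calculation can make the statistical point concrete. This sketch assumes, purely hypothetically, that each unpatched system has a fixed, independent 0.1% chance of being compromised on any given day. The function name and the probability are illustrative assumptions, not figures from any study or from CIP itself.

```python
# Hypothetical illustration only: DAILY_PROB is an assumed value, not a
# measured compromise rate for any real system.

def prob_at_least_one_incident(daily_prob: float, systems: int, days: int) -> float:
    """Chance of at least one compromise, treating every system-day as an
    independent trial with the same (assumed) probability."""
    trials = systems * days
    return 1 - (1 - daily_prob) ** trials

DAILY_PROB = 0.001  # assumed: 0.1% chance per unpatched system per day

one_system_one_day = prob_at_least_one_incident(DAILY_PROB, systems=1, days=1)
many_systems_quarter = prob_at_least_one_incident(DAILY_PROB, systems=50, days=90)

print(f"1 system, 1 day:     {one_system_one_day:.1%}")
print(f"50 systems, 90 days: {many_systems_quarter:.1%}")
```

Under these made-up numbers, one system left unpatched for one day carries about a 0.1% chance of compromise, while 50 systems left unpatched for 90 days carry roughly a 99% chance of at least one. The exact figures mean nothing; the point is how quickly risk accumulates across systems and time, which a requirement written around each individual instance doesn't capture.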

Over the next few months, I will be discussing how I think NERC CIP should be rewritten to address these problems. Before I close, I want to make the point that rewriting CIP the right way isn’t likely to happen anytime soon; there is currently no constituency for it that I know of. Even so, although I’ve been a big advocate of having NERC rewrite at least CIP-002, I no longer think that is worth the effort. I honestly think the best approach is for NERC to write a SAR (Standard Authorization Request) to put the CIP v5 standards on a risk-based basis, as CIP-014 already is. Anything else is a waste of time.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte Advisory.

[i] My most recent post on ERC, of the ten or so I’ve written on it in various forms, was this one. I’d like to put together a post summarizing where I think things now stand on this topic, but I now feel the discussion has gone well beyond usefulness. If you look at the very fine-tuned “definition” of ERC in the post just linked, you may agree with me that there’s no way it could actually be enforced on a wide scale. Bottom line: in cases where there is a mixture of routable and serial connections in one communications stream, it is up to the entity to decide whether this constitutes ERC or not, period. I’m fine if NERC wants to try to rewrite the definition to address these cases, but I predict that whatever they come up with will never be more enforceable than the current definition.

[ii] A paper by the California Public Utilities Commission in 2012 estimated that 90% of the grid assets in California at the time were Distribution assets, not Transmission ones. This is a very good paper (primarily discussing how a set of risk-based cyber security standards could be applied to the distribution grid) and is still quite relevant. But it seems to have been removed from the CPUC’s website. If you would like me to send you a copy, email me at talrich@deloitte.com.

[iii] I will readily admit that this particular problem can’t be addressed by NERC at all. It would literally take an act of Congress to bring all of the Distribution grid under cyber security regulations, which might still be developed by NERC, or perhaps by DHS or DoE, or by some other entity entirely. But in the end, it won’t do any good simply to throw up our hands and say, “NERC doesn’t have any authority over the Distribution grid”, just as, in hindsight, one wishes the builders of the Maginot Line had tried to figure out a way to protect Belgium despite the fact that it was a different country.

[iv] I haven’t done any sort of scientific study, but when I’ve asked some CIP compliance professionals what percentage of their organization’s CIP v5 expenditure actually goes to cyber security, the highest number I’ve received is 70%; the lowest is 30%.

[v] I’m sure there are a few others that aren’t coming to mind at the moment. In case you haven’t realized it yet, this will be the first of many posts discussing how to (once and for all) solve the problems of NERC CIP.


Monday, January 11, 2016
