Thursday, June 26, 2014

The CIP-002-5 RSAW: Waiting for Godot


I have a confession to make: I often start writing a post without knowing clearly what I’m going to say.  This is because I don’t start really analyzing a topic until I write a post on it – like most people, I’m fine with using my initial impression as a guide going forward.  You can’t analyze everything in depth, or you’d never be able to get out of bed in the morning.  This has led many times to a post that started out by saying one thing but ended up saying something quite different.

And so it is with this post.  Last week, NERC released drafts of the RSAWs (Reliability Standard Audit Worksheets) for CIP Version 5.  I of course immediately downloaded the CIP-002-5 RSAW and quickly reviewed it.  I was eager to see whether and how it might shed some light on the ambiguities and inconsistencies in CIP-002-5 R1 that I’ve written so much about.  My initial impression was this: I didn’t expect much from the RSAW, and sure enough it didn’t disappoint me.  If you’re looking for guidance with the problems in the language of CIP-002-5 R1, you won’t find it here.  Meanwhile, it doesn’t even do a good job of accomplishing its limited purpose: translating the words of R1 and Attachment 1 into a format that auditors can use for audits.[i]

The good news is that, as I started working on this post in earnest, I moved away from this initial impression of mild disappointment.  The bad news is that I moved to something much more like deep despair.  Not only does the RSAW not do a good job of accomplishing its very limited purpose, but it creates huge holes which – if NERC does nothing to fill them – will result in chaos when it comes time to audit compliance with CIP-002-5 R1.  And as I’ve said many times before, R1 is the foundation for all of CIP Version 5 (and v6, for that matter).  If the foundation is rotten, the house will fall.  I believe the whole v5 house will ultimately fall unless something is done about this.

Before I continue, I need to repeat the warning I’ve used a couple times before.  It was never more needed than for this post:

Warning: Reading this post may cause unintended side effects in NERC compliance professionals, including loss of sleep, depression, excessive consumption of alcoholic beverages, and thoughts of suicide.

Now to the argument.  In a post in March, I tried to look at the different possibilities for help with the problems with CIP-002-5 R1, and concluded that the RSAWs weren’t likely to provide much assistance (in spite of the fact that NERC staff members had hinted at the March CIPC meeting that they might).  I said this because I know the RSAWs are intended to provide guidance to auditors based on the literal wording of the standard, and the NERC lawyers will never let them do anything more than that.  Anyone who believes (or implies) that the RSAWs can provide interpretation of the requirements is deluding themselves, others, or both.

But there’s a bigger problem, even if you accept the idea that the best an RSAW can do is simply tell an auditor how to audit based on the literal wording of the standard:  Because of the many problems with the wording of CIP-002-5 R1 and the fact that – as I’ve said multiple times before – there can be no consistent interpretation of R1 that doesn’t contradict at least some of the wording, just telling auditors how to audit based on the literal wording (if this is even possible) doesn’t in fact give them a consistent, clear methodology for conducting the audit.  And it also doesn’t give the NERC entities a consistent, clear methodology for complying with R1 in a manner that will keep them free of violations.  The inevitable result of this, unhappily, will be a HUGE increase in the use of auditor discretion in audits – and supposedly one of the goals of v5 was to decrease or even eliminate this problem.  The auditors will have to use some methodology, and this RSAW doesn’t provide it.  More on this below.

Also, I’m not at all sure this RSAW even does a good job of reproducing the wording of the standard. As I said, there are lots of ambiguities and contradictions in that wording, and you have to do some sort of interpretation even in order to just translate the wording in a consistent manner (since it isn’t consistent in the first place).  I’m sure the people writing the RSAW have tried their best, but some of what they’ve come up with is as contradictory as the requirement itself. 

I will first discuss three specific issues, in the order in which I encountered them in the RSAW.  Note that these were all discussed in mind-numbing detail in my recent 7,000-word post on the problems in CIP-002-5 R1.  Finally I will conclude with a discussion of a huge overall issue that pretty much trivializes everything else.

Issue 1: Assets/Facilities
First, there’s the issue of what the criteria in Attachment 1 apply to.  I had originally thought that the six asset types listed in R1 were what you applied the criteria to, in order to identify High, Medium and Low impacts; in fact, I know there are to this day many in NERC and the Regional Entities that believe this.  However, an Interested Party convinced me earlier this year that the six asset types are simply the locations at which BES Cyber Assets/Systems can be found, and the Attachment 1 criteria really refer to other things.  Indeed, when you go through the criteria, you will see that they apply to a lot of things that aren’t included in the list of six asset types. 

Most importantly, criteria 2.3 – 2.8 apply to something called “Facilities”, which in a substation means each line, each transformer, each bus, etc.  If a Transmission entity assumes that Facilities is just another word for the six types of assets in R1, they may well end up over-identifying BES Cyber Systems.  This is because there could be both Medium and Low impact Facilities at a substation, but if you assume that “Facility” means the substation itself, this would indicate that all of the BCS are Medium as well[ii] – and this could well not be the case. 

However, I know at least some of the Regional Entities are saying that “Facility” means the substation  itself in criteria 2.4 – 2.8.  And since the substation is clearly Medium impact in those criteria, all of the BCS at the substation have to be classified Medium as well.[iii]
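
To make the difference between the two readings concrete, here is a minimal Python sketch.  Everything in it is hypothetical: the class, the function names, the two lines and their ratings are my own illustration, not anything found in CIP-002-5 or the RSAW, and it ignores the SPS exception entirely.

```python
from dataclasses import dataclass

# Hypothetical data model for illustration only; none of these names
# come from CIP-002-5, Attachment 1 or the RSAW.
@dataclass(frozen=True)
class BesCyberSystem:
    name: str
    facility: str  # the line, transformer or bus this BCS serves


def classify_by_facility(systems, facility_impact):
    """Reading A: each BCS takes the impact rating of the Facility it serves."""
    return {s.name: facility_impact[s.facility] for s in systems}


def classify_by_substation(systems, substation_impact):
    """Reading B: "Facility" means the substation, so every BCS inherits its rating."""
    return {s.name: substation_impact for s in systems}


# Assumed example: one substation with a Medium line and a Low line.
facility_impact = {"Line 1": "Medium", "Line 2": "Low"}
systems = [
    BesCyberSystem("Relay A", "Line 1"),
    BesCyberSystem("Relay B", "Line 2"),
]

by_facility = classify_by_facility(systems, facility_impact)
by_substation = classify_by_substation(systems, "Medium")

print(by_facility)    # Relay B stays Low under Reading A
print(by_substation)  # Relay B becomes Medium under Reading B
```

Under the first reading the relay on the Low line stays Low; under the second it is swept up as Medium along with everything else in the substation.  That is the over-identification I'm talking about.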

How does the RSAW deal with this issue?  It doesn’t.  Consider this wording under Evidence Set 1 on page 5:

2. A list of all high impact BES Cyber Systems identified, and the asset(s) with which the BES Cyber System is associated.
3. A list of all medium impact BES Cyber Systems identified, and the asset(s) with which the BES Cyber System is associated.

These clearly imply that BES Cyber Systems are only classified by their association with assets.  Of course, “assets” isn’t a NERC defined term, but in this context I believe it means a member of one of the six types of assets listed in R1.  And there is definitely no mention of Facilities.  The upshot of this is that a Transmission entity won’t be able to have both Medium and Low BCS at a Medium substation; all BCS will be Medium – unless the RSAW is changed, of course.[iv]

Issue 2: “At” / “Associated with” / Whatever
I think I could write a small book by now on the question of the use of “at” vs. “associated with” in Attachment 1.  I did write a whole post on this in February.  Unfortunately, there is a lot of confusion on this issue in the RSAW.

The first example is item 2 under Evidence Set 1, which I quoted above.   Here, “associated” is used to refer to High BCS, when the wording in Attachment 1 Section 1 is “used by and located at”.  A BCS is classified as High because it is used by and located at a High control center, not “associated with” it.  If the SDT had used “associated”, this would have meant that every RTU in a substation controlled by the control center could be a High impact BCS – and entities would have to spend a fortune protecting substations as High impact assets.   

Fortunately, whoever wrote the RSAW wasn’t consistent in this regard, since line 1a under Evidence Set 2 (page 5) says

a. A list of high impact BES Cyber Systems, for which this entity has full or partial compliance responsibility, used by and located at the asset

Here, they got it right.  But don’t celebrate yet, since near the top of page 7 we find this line:

Verify that the high and medium impact BES Cyber Assets associated with the sampled asset have been correctly identified.

Now they’re back to using “associated with” for both High and Medium BES Cyber Assets (and this should read “BES Cyber Systems”, of course, not “BES Cyber Assets”.  The BCAs themselves aren’t rated, which I would hope the NERC staff understands).

There’s now a new issue related to the “at / associated with” question.  That is what I am now officially dubbing the “Whitney Doctrine”.  This doctrine states that “Physical location IS a determinant factor for impact classification.”  It has the effect of solving the “transfer-trip” relay question in a way that made a lot of Transmission entities quite happy, as described in my previous post.  It has no basis in the wording of CIP-002-5 that I can see, but as I said in that post, I’m perfectly fine with this – somebody has to push the words of R1 around so that there is a consistent interpretation.[v]

However, it does mean that “associated with” now has no meaning, since under the Whitney Doctrine both High and Medium BES Cyber Systems need to be located at the asset (vs. just High BCS in the dark days before the Doctrine was promulgated, as described in my last post).  And it also probably means that you won’t be able to rate a BCS at a criterion 2.5 substation according to the line (i.e. Facility) it’s associated with.  You can’t say that a relay is “at” a line, but you can say it’s “at” a substation.  So the relay will have to take the classification of the substation (Medium in this case).

However, maybe Tobias will figure out a way to make the Whitney Doctrine compatible with the Facilities principle, thus saving his “transfer-trip” ruling (and possibly his life, since the TO/TOP’s wouldn’t be pleased to hear he was going back on what he said at the CIPC meeting, just for the sake of consistency).  He’s a smart guy.[vi]  But this does show the whack-a-mole effort that is required when trying to fix problems in the Attachment 1 criteria; you solve one problem (to actual applause, in the case of Tobias) and another pops up somewhere else because of your “solution”.

Issue 3: “The Phantom Low Impact BES Cyber System” and other Tales of Mystery and Horror
I have several times discussed the fact that there is an inherent contradiction in CIP-002-5 between Attachment 1 and Requirement 1.  The former assumes you start out with an inventory of all your BES Cyber Systems; you subtract out the Highs and Mediums, and voila the rest are Lows.  Meanwhile, R1 makes clear you don’t have to inventory cyber assets at Lows.  So how would you ever get the comprehensive list that Attachment 1 clearly implies you need?

One result of this contradiction is that there is a big asymmetry between R1.1 and 1.2 on the one hand, and R1.3 on the other:

1.1. Identify each of the high impact BES Cyber Systems according to
Attachment 1, Section 1, if any, at each asset;
1.2. Identify each of the medium impact BES Cyber Systems according to
Attachment 1, Section 2, if any, at each asset; and
1.3. Identify each asset that contains a low impact BES Cyber System
according to Attachment 1, Section 3, if any (a discrete list of low impact
BES Cyber Systems is not required).

1.1 and 1.2 require you to identify BCS, but 1.3 just requires you to identify an asset that “contains” a low BCS.  It immediately goes on to tell you that you don’t need a list of those low impact BCS.  So how on earth can you know whether an asset contains one of these mythical beasts?  Of course, what would make much more sense would be if 1.3 read something like “Identify each Low impact asset”; in fact, every NERC entity I’ve talked to about CIP v5 asset identification says this is exactly how they’re interpreting 1.3.
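
Here is a rough Python sketch of the asymmetry.  The function names and the sample assets and systems are all made up for illustration; this is just my own way of picturing the two approaches, not a methodology stated anywhere in the standard.

```python
def low_bcs_by_subtraction(all_bcs, high_bcs, medium_bcs):
    """What Attachment 1 implies: start from a full BCS inventory and subtract."""
    return set(all_bcs) - set(high_bcs) - set(medium_bcs)


def low_assets_by_subtraction(all_assets, high_assets, medium_assets):
    """What entities say they are doing: rate the assets and subtract."""
    return set(all_assets) - set(high_assets) - set(medium_assets)


# Hypothetical entity with four assets.
all_assets = {"Control Center", "Substation X", "Substation Y", "Plant Z"}
high_assets = {"Control Center"}
medium_assets = {"Substation X"}

# The BCS approach only works if "RTU Y" and "DCS Z" have been inventoried,
# which is exactly the list R1.3 says is not required.
all_bcs = {"EMS", "Relay A", "RTU Y", "DCS Z"}
high_bcs = {"EMS"}
medium_bcs = {"Relay A"}

low_bcs = low_bcs_by_subtraction(all_bcs, high_bcs, medium_bcs)
low_assets = low_assets_by_subtraction(all_assets, high_assets, medium_assets)

print(sorted(low_bcs))
print(sorted(low_assets))
```

The first function needs the very inventory that R1.3 excuses you from producing; the second needs only the asset list.  Note that the asset version would also label as Low an asset that has no control cyber assets at all, so even it isn't a complete answer.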

So why did the SDT twist their words into knots in R1.3?  It’s because the entire premise of R1 and Attachment 1 is that assets (i.e. the “big iron”) are never classified; just the BES Cyber Systems (the “little iron”) are.  The only way they could say something that means “Low impact asset”, without using those dreaded words, was to use the circumlocution in R1.3.   This in spite of the fact that nobody is interpreting the words this way – in fact, nobody could interpret them that way, given the contradiction between the first part of R1.3 and the parenthetical expression at the end of it.

All of this might just be amusing, but the fact that this wording is used in the RSAW (which it is) could potentially cause trouble.  I refer now to two lines on page 7:

Verify that the asset has been correctly identified as containing a low impact BES Cyber System.
and
Verify that the sampled asset does not contain a BES Cyber System.

In the first line, the auditor is being asked in effect to make sure the entity has properly identified Low impact assets; in the second line, he/she is being asked to make sure the entity isn’t lying when they say an asset isn’t even a Low[vii].  But because “Low asset” is a verboten term in CIP v5, the RSAW uses the language of R1.3.

But how is the auditor going to verify this?  If I were an auditor new to CIP v5 and I saw these two lines, the first thing I would ask the entity is to show me their list of all Low BCS; of course, that doesn’t exist.  So how does the auditor verify that the asset contains or doesn’t contain a Low BCS?  The answer is that the auditor needs to know that saying an asset contains a Low BCS is the same thing as saying it is a Low asset – wink wink, nudge nudge.  So he verifies whether or not these are Low assets by doing what every entity is now doing anyway: assuming the criteria in Attachment 1 refer to assets (or Facilities) and identifying Low assets as those that aren’t High or Medium.[viii]  But shh…don’t tell anybody they did this.[ix]

I hope I don’t have to tell you that requiring knowledge that is nowhere stated in the requirements isn’t a wonderful way to have auditors audit a standard with million-dollar-a-day penalties.

Issue 4: The BIG One
I really shouldn’t call this just one of four issues.  That’s like saying there were a number of issues on the Titanic the night of April 14, 1912, including a shortage of party hats and also an iceberg they’d just hit.  And like that iceberg, this issue could literally sink CIP Version 5, and probably will if nothing is done to address it.

Unfortunately, this isn’t just a problem found in the wording in one or two places in the RSAW; it’s found just about everywhere.  Let’s start with Evidence Set 1 on pages 5 and 6.  There are three phrases that contain the words “the process required by R1”, for example

Evidence that the process required by R1 was implemented to determine the list of high and medium impact BES Cyber Systems.

What’s wrong with these words?  Tell me, what is “the process required by R1”?  You might say, “That’s simple, it’s the methodology you use to comply with Requirement 1”.  I say, “Great, now tell me what that methodology is.”  Then what do you say?  I know that I certainly couldn’t tell you that methodology.  I recently wrote a post where I started out thinking I was going to finally nail down this methodology.  I ended up breaking it off unfinished, and finally came back to admit that I couldn’t finish it.  This is because I have come to the conclusion that there is no consistent, complete methodology that can be stated for compliance with R1, which will at the same time not do some violence to the wording of either R1 or Attachment 1.[x]

So why does the RSAW blithely imply that “the process required by R1” is something that any child could recite?  The whole point of the RSAW is to tell the auditors how to audit the requirements, not to leave the hardest part as an exercise for the reader.  It’s as if you read a cake recipe for the first time.  It first carefully lists all the ingredients – that’s good.  Then it says, “Now bake the cake according to the required process.”  Wouldn’t you expect the recipe to tell you what that process was?

Let’s go on to page 6.  There we read several phrases with the words “can be reasonably expected”, as in

Verify that the process can be reasonably expected to identify all high and medium impact BES Cyber Systems at each asset.

This is really great.  The most important parts of R1 (classifying BCS in this case) are being judged not on the basis of whether they were properly followed by the entity, but on whether they’re “reasonable”.  And whose reason is going to decide this?  Why, the auditor’s!  In the end, the way you will be judged on how well you classified BCS (as well as Low impact assets) is by whether your auditor thinks what you did was reasonable.

Does anyone else think this just might be a tiny little problem?  Like perhaps the sound of that iceberg hitting in 1912?  I have nothing against the auditors – they’re probably much more “reasonable” than the rest of us.  But I thought one of the cardinal principles for CIP v5 was that it was going to eliminate the need for auditors to use their discretion – and now we’re not only not eliminating that, we’re requiring them to do so!  I know a number of auditors, and the last thing they want to do is have to use their own reason to bridge a wording gap in the requirements.  They sometimes do have to do this, but for a fairly well-defined issue like what “routable” means.  Here, they’re being asked to each read the great works of Western philosophy and regulatory compliance, so that they can each come to their own understanding of what CIP-002-5 R1 means.

Let’s go on.  In this same section, we now find multiple uses of the words “correct” or “correctly”, for example

1. Verify that the high and medium impact BES Cyber Assets associated with the sampled asset have been correctly identified.
2. Verify that the impact rating of each identified BES Cyber System is correct.

And what exactly is the “correct” way to identify BCAs, or the “correct” way to assign impact ratings to BCS?  Of course, there’s no clue provided.  The auditors are clearly required to use their “reason”.  Wonderful.

Finally, we come to the “Notes to Auditor” section at the end and read this blockbuster:

Results-based Requirement: The auditor should note that this is a results-based Requirement. As such, the entity has great latitude in determining how the result is achieved. The auditor should focus on verifying that the result is complete and correct.

This makes sense on the surface.  After all, wasn’t that one of the big problems with the previous CIP versions?  The requirements dictated how you should do something rather than just saying what you should do and letting you figure out how to do it? 

This would make sense here, too, if there were some clear definition of the results that need to be achieved.  But what are those results?  They're things like the “correct” classification of BCS, a “reasonable” methodology for complying with R1, etc. – and these are left totally undefined. 

Here’s an analogy.  Let’s say you were being audited on your navigation abilities.  The auditor might say, “Drive to City Hall.”   It would make sense that she wouldn’t care how you got to City Hall, just that you found the place.  However, now suppose she is told to audit your navigation abilities, but not tell you where to go.  She is just to tell you to go “somewhere reasonable”, and make sure you did a reasonably good job of getting “somewhere reasonable”.  So you drive wherever you want, and as long as she’s sure you got somewhere, she has to pass you.  Obviously, this would make the entire auditing process meaningless, since there would be no objective standard for deciding you had gotten “somewhere reasonable”.

(June 29: I have written a long footnote to this discussion that I decided to make a new post. You can find it here).

And friends, that is exactly the problem here.  The RSAW spends most of its time addressing the simple wording problems of CIP-002-5 R1 (and doesn’t even do a good job of that, as shown in Issues 1-3 above), and hides the really thorny problems behind words like “reasonable” and “correct”.  The result is a requirement that can’t be objectively audited at all.  The only two ways an auditor can audit R1 are a) Simply give everybody a pass, or b) Decide on their own what R1 means, and apply that meaning mercilessly in their audits. 

To be honest, I don’t think there’s too much danger of b); I doubt there’s a single auditor who wouldn’t commit suicide or change careers if he had to be continually telling people that they were getting PV’s because his “reason” told him they should get them.  Instead, what will happen is that nobody will ever be judged wrong on how they identify and classify their BES Cyber Systems. This is exactly like the RBAM situation in CIP v1-3, where there was virtually no way an entity could have been found to have identified its Critical Assets incorrectly – except perhaps by turning in a game of Hangman and saying that was the RBAM.[xi]

And as I’ve said previously, if CIP-002-5 R1 is rendered unauditable by a lack of objective criteria to guide an audit, then the rest of the v5 requirements become unauditable as well.  I’m more convinced than ever that, if nothing is done about this and an entity receives a PV for R1 and challenges it in court (which they can do – CIP is regulatory law), the judge will take 15 minutes to read the requirement, exclaim “What is this ___ (stuff)?”, and throw CIP-002-5 out.  That will of course also invalidate CIP-003-5 through CIP-011-1.  This might actually be a good thing were it to happen say in the next six months.  Unfortunately, there’s no way it can happen before about five years from now, at which point there will be a huge investment in complying with CIP v5.  For that to all be put into question…well, it won’t be pretty, that’s for sure.

And now, Our Conclusion
I have been writing about this fundamental problem with CIP Version 5 for over a year; others have written and spoken about this as well.  But since human beings are averse to contemplating great upheavals, people have been waiting for Godot to come and fix everything.  I myself hoped FERC would do that in Order 791, and later hoped (against hope) that the new SDT would decide to address the problem.  More recently, people have been looking to the RSAWs as their hope; and I’m sure there are many who think the upcoming Transition Study Lessons Learned cases will be the answer.

Folks, I’m telling you: Godot ain’t coming.  This problem has to be solved by the NERC community.  The best solution is for NERC to step in and do something.  The second best is for FERC to demand NERC do something (or do it themselves, although that would be a very radical step that I don’t think anyone wants to see, since it would set a very bad precedent).  I also used to think that the regions could get together and fix this, but I no longer see that as possible or likely.  Who knows, maybe even an industry group like the EEI or the Transmission Forum could pull this off.

But I do know this is a disaster waiting to happen.  We’re heading full speed toward the cliff, smiling all the way over.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.



[i] I will point out that this failure to even accomplish the limited purpose of the RSAW isn’t ultimately the fault of the people who wrote it, but of the wording of R1 and Attachment 1.  The many inconsistencies and contradictions in their wording make it literally impossible to craft an RSAW that accomplishes even its minimal purpose.

[ii] An exception to this would be BCS associated with an SPS that might be located in the substation.  If the SPS is Low and the BCS is only a BCS because of its association with the SPS (not with the substation itself), then the BCS would also be Low impact.

[iii] This post discusses this point in more detail.

[iv] I should point out that this is actually an interpretation of CIP-002-5 R1.  It seems the decision has been made by NERC that, no matter how the Attachment 1 criteria read, the only things they can apply to are the six asset types.  So “Facilities” means “assets”, and this means the different lines in a substation (and their BCS) meeting criterion 2.5 are all Medium impact.  And it means that criterion 2.3 now applies to an entire plant, not just to a single generating unit (a “Facility”).  So if a single unit at a plant has been designated “Reliability Must Run”, the entire plant will have to be Medium impact, not just the unit.  There are other consequences of this as well.  The upshot: I can’t fault NERC for making an interpretation in the RSAW, since I’ve been urging them to do this.  But I think a number of entities will be unhappy with what this interpretation means for their compliance costs. 

And I really don’t think this was a deliberate interpretation, either.  It is interesting that Tobias Whitney’s presentation at the NERC CIPC meeting two weeks ago quite clearly states that BCS at substations meeting criterion 2.5 are classified according to the Facility (i.e. line) they’re associated with (see slides 29 and 30).  I think this is the correct interpretation, but somehow this insight wasn’t passed on to the person writing the RSAW, even though they probably work for Tobias. 

[v] One of my favorite quotes from Lewis Carroll’s Through the Looking Glass is this one (I know, I’ve already used it in a previous post.  Hey, I’m all about sustainability):

"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean- neither more nor less."
"The question is," said Alice, "whether you can make words mean so many different things."
"The question is," said Humpty Dumpty, "which is to be master, you or the words.  That's all."

[vi] I wish to point out another anomaly in the use of “associated with” in the RSAW.  This seems to be just the result of someone’s laziness, but it needs to be corrected.  In the discussion of Evidence Set 2 on page 5, item 1c says

c. A list of all BES Cyber Assets associated with each high or medium impact BES Cyber System identified in a. or b. above.

Can you see what’s wrong with this?  BES Cyber Assets are what make up a BES Cyber System; they certainly aren’t “associated with” the BCS in the sense that Medium BCS are associated with assets or Facilities that meet one of the criteria in Section 2 of Attachment 1.  Fortunately, the writer found the right term on page 7, where we find this:

Verify that the BES Cyber Assets (and Cyber Assets, if any) comprising the BES Cyber System are identified.

So “associated with” needs to be replaced with “comprising” in the previous quote, and I’ll be happy – about this one point, anyway.

[vii] An example of such an asset is a generating station or Transmission substation that doesn’t contain any cyber assets (in a control capacity) at all.  It clearly won’t contain any BCS, either.

[viii] Unfortunately, even this procedure doesn’t work to identify assets that aren’t High, Medium or Low, such as those that contain no control cyber assets at all.  

[ix] I was told that at least two auditors say they will require a list of Low BCS, precisely so they can make sure the entity correctly identified assets “that contain a Low impact BES Cyber System.”  At this point, the problem has moved beyond being simply amusing.

[x] Actually, I knew that when I started to write the post in question.  But I did think I could write a consistent methodology that would conform to what the wording of R1 would be, if the SDT had taken more time to get it right.  I don’t even believe that anymore.

[xi] I realize there were some entities that did get assessed for having an incorrect RBAM in CIP v1-3 (I believe these mostly happened in the later years).  And I also realize that most NERC entities are going to do the right thing for BCS identification, even if they are given a free pass on CIP-002-5.  But the big problem is that, if everybody gets a pass on CIP-002-5 R1, it undermines the legitimacy of the rest of CIP v5.  How can you really be penalized for violations of let's say CIP-007-5, when everybody knows that you could have simply said you didn't have BES Cyber Systems?  If CIP-002-5 R1 is unauditable, then that spreads to the rest of the standards as well.  As I said, R1 is the foundation of the CIP v5 "house".  If the foundation is rotten, the house falls.

2 comments:

  1. In the meantime the v5 clock is ticking away. Oh my!!!!

    http://www.timeanddate.com/countdown/to?p0=0&year=2016&month=4&day=1&hour=0&min=0&sec=0

  2. That's very true, Wally. Not only do I think NERC and FERC need to address this now, but I think they need to consider pushing the compliance date back. There are a number of entities - some quite large - that still aren't sure what will be in scope for v5.
