Monday, March 23, 2015

Night of the Living RBAM

Parental Warning: Young children may find this post very disturbing.  For that matter, older children will as well – especially those who have been involved in NERC CIP compliance for a few years.

There was very little lamentation or gnashing of teeth when NERC CIP Version 5 did away with the RBAM – Risk-Based Assessment Methodology – and replaced it with the bright-line criteria.  Of course, FERC hated the RBAM because it allowed each entity to develop its own criteria for identifying assets that were critical to the BES (which is why FERC ordered NERC to develop bright-line criteria in Order 706).  And while I think some entities liked the – ahem – freedom that the RBAM gave them in identifying their Critical Assets, I think the majority were happy that there were now definite (or at least that was the idea) criteria for identifying “big iron” in scope for CIP.  There can be no more complaints about utilities that shirked their duty to secure their important assets; from now on, the entity just has to say, “We are just following the criteria in Attachment 1”.

However, I have some disturbing news: It seems we should have not only buried the RBAM, but driven a stake through its heart and sealed it in a lead-lined coffin.  For it appears the RBAM may not be really dead after all and may have returned from its grave, ready to wreak havoc on the living as they try to comply with CIP Version 5.  Below is the text of a frantic call I received late one night from a compliance person at a major electric utility.  The call ended very abruptly, and I have not been able to contact that person since then.  I am quite worried for him, and have been double-locking my door in the fear that whatever happened to him might happen to me as well.

“I have been reading your posts religiously, and I have taken to heart your constant repetition of the idea that, while there are two main approaches to BES Cyber System identification – top-down and bottom-up – the only method that is actually mandated[i] in CIP-002-5.1 R1 is the bottom-up one.  In that method, the entity needs to start by identifying which Cyber Assets meet the definition of BES Cyber Asset; it then needs to identify BES Cyber Systems that include one or more BCAs – and every BCA must be included in at least one BCS.  Note that nothing is mentioned about the BES Reliability Operating Services (BROS) in this approach.

“The other approach is the top-down one, which starts with identifying the BROS that are fulfilled by the asset or Facility in question, then identifying the systems that fulfill one or more BROS for the asset or Facility.  The systems on this list that have a 15-minute impact on the BES are BES Cyber Systems. The only problem with this approach is that it is in no way required by CIP-002-5.1 R1, while the bottom-up approach is.
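
The two approaches described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the asset names, the system names, the `impact_15_min` flag (a stand-in for meeting the BES Cyber Asset definition) and the BROS assigned to each system are all hypothetical, and nothing here comes from CIP-002-5.1 itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CyberAsset:
    name: str
    system: str          # hypothetical: the system this asset is part of
    impact_15_min: bool  # stand-in for "meets the BES Cyber Asset definition"

def bottom_up(assets):
    """Identify the BCAs first, then group every BCA into a BCS."""
    bcs = {}
    for asset in assets:
        if asset.impact_15_min:
            bcs.setdefault(asset.system, []).append(asset.name)
    return bcs

def top_down(bros_by_system, assets):
    """Start from systems that fulfill a BROS; keep those with a 15-minute impact."""
    impacting = {a.system for a in assets if a.impact_15_min}
    return sorted(
        name for name, bros in bros_by_system.items()
        if bros and name in impacting
    )

# Hypothetical inventory for one generating asset.
assets = [
    CyberAsset("controller-1", "unit_control", True),
    CyberAsset("hmi-1", "unit_control", True),
    CyberAsset("historian-1", "plant_historian", False),
    CyberAsset("sensor-1", "support_system", True),
]

# Hypothetical mapping of systems to the BROS they fulfill (empty set = none).
bros_by_system = {
    "unit_control": {"Controlling Frequency"},
    "plant_historian": {"Situational Awareness"},
    "support_system": set(),
}

bottom_up_result = bottom_up(assets)
top_down_result = top_down(bros_by_system, assets)

print(sorted(bottom_up_result))  # systems found by starting from the BCAs
print(top_down_result)           # systems found by starting from the BROS
```

In this made-up inventory the bottom-up pass picks up `support_system` because it contains an asset with a 15-minute impact, while the top-down pass never looks at it because it fulfills no BROS.  The two approaches need not produce the same list.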

“Unfortunately, it seems that many in the industry – and probably the majority of auditors, at least at the moment – believe that the top-down approach is the only way to identify BCS.  I have tried to challenge these people by asking them where in R1 it says to use that approach; until recently, nobody has been able to answer (although in the first draft of CIP v5 in 2011, the BROS were included in the BCA definition, so the top-down approach was the only one allowed by R1; the BROS were moved to the Guidance section with the second draft and were no longer part of R1).  But at the same time, nobody wants to stop believing that use of the BROS is required by R1.

“Until recently, I thought these people’s refusal to agree with me was caused simply by stubbornness.  However, I recently had a discussion with someone who is quite knowledgeable about R1, yet still holds this view.  He referred me to the definition of BES Cyber System: ‘One or more BES Cyber Assets logically grouped by a responsible entity to perform one or more reliability tasks for a functional entity.’  He pointed out that the second part of this definition seems to indicate that performing a ‘reliability task’ is what defines a BCS; and, of course, the BROS are definitely ‘reliability tasks’.

“I must confess that I have been following your lead on reading this definition, Tom.   You have always focused on the first part: that a BCS consists of one or more BCAs ‘logically grouped’ by the entity.  Neither you nor NERC has considered the second part to be important.  For example, in NERC’s Lesson Learned on ‘Grouping  BES Cyber Assets’, nothing is mentioned about the second part of the BCS definition; that is, nowhere does the LL say that a system needs to perform a reliability task in order to be a BCS.  Yet that is what my friend is saying is the case.

“And I must admit, I can’t counter this argument.  After all, his interpretation is there in the definition of BCS.  You can see that more clearly if you reword the definition: ‘Every BES Cyber Asset must be included in one or more BES Cyber Systems that perform one or more reliability tasks.’  A further implication of this is: ‘A BCS that includes BCAs must perform a reliability task.’  Otherwise, the system isn’t a BCS and the BCA can’t be included in it – rather, the entity needs to find another system that includes the BCA in question, that does perform a reliability task.  This system is a BCS.

“I see no way to refute what my friend says.  The definition of BES Cyber System includes the requirement that the BCS fulfill a ‘reliability task’ (which could be one of the BROS, but could be other tasks as well).  But the first part of the definition includes a different requirement: that all BCAs must be included in one or more BCSs.  This leads to the question: Is there a contradiction between these two ‘requirements’?  In other words, will a Cyber Asset that meets the BCA definition always fulfill a reliability task, so it can be included in a BCS?  Given the wording of the BCS definition, there doesn’t seem to be a way that a BCA that doesn’t fulfill a reliability task could ever be part of a BCS – that is, assuming there are Cyber Assets that meet the BCA definition but don't perform a reliability task.

“At this point, I remembered that you have identified two types of systems that would be identified as BCS if the entity performed the full bottom-up approach, but that don’t themselves fulfill any reliability tasks.  The first is an example used in an SPP workshop last year:  A Stack Emissions Monitoring System (SEMS) in a 1500MW+ coal plant will immediately alert operators when there has been an environmental excursion – that is, when emissions of certain gases have exceeded the levels permitted by the EPA for that plant.  Suppose the plant’s management has laid down a hard-and-fast rule that, if an excursion can’t be addressed within 10 minutes, the plant needs to shut down (this may be dictated by the plant’s EPA permit).

“Were a cyber attacker to penetrate this system and falsely make it appear to the operator that an excursion has occurred and that it has lasted ten minutes, the operator would shut the plant down.  This means that one or more components of this system would meet the definition of BES Cyber Asset – i.e. their misuse would adversely impact one or more BES Facilities (here meaning the units monitored by the SEMS, which will presumably be the ones shut down), which if rendered unavailable would affect the reliable operation of the BES within 15 minutes.  Using the bottom-up approach, the entity would need to create a BCS called SEMS that included the components of this system, in order to fulfill the first part of the BCS definition – that all BCAs be included in a BCS.

“However, would the top-down approach identify the SEMS as a BCS?  In other words, does the SEMS fulfill a BROS?  The purpose of this system is emissions monitoring.  While this is of course an important function, it is not a reliability function (at least as I understand it – there is of course no NERC definition of ‘reliability function’ or even ‘reliability’).  The BES would be just as reliable if there were no emissions controls at all; of course, we might all be dead from breathing the gases emitted, but the lights would stay on nevertheless.  The SEMS doesn’t fulfill a BROS[ii], although its components definitely meet the BCA definition.

“Your other example of a system that would meet the BCA definition but doesn’t perform a reliability function is the fire suppression system in a substation.  Obviously, if that system fails to operate when needed (i.e. in the event of a fire), there will be an immediate impact on the BES.  But fire suppression isn’t a reliability function (again, as I understand the phrase), and it certainly isn’t one of the BROS – no matter how important it may be to suppress a fire when it occurs.

“I’m sorry to be long-winded on this (and I’m taking my cue from you, Tom), but what I’m saying with these two examples is there is a contradiction between the two parts of the BCS definition.  The first part says that every BCA must be in a BCS, but the second part says that a BCS must fulfill a reliability function (which my friend is interpreting to mean ‘BROS’).  Since I’ve just shown that there can be BCAs that don’t fulfill a BROS, this means they wouldn’t be part of a BCS due to the second part of the definition - even though they are required to be included in a BCS by the first part.

“But another friend pointed out that the ‘reliability tasks’ in the BCS definition (what I have been calling ‘reliability functions’) don’t have to be limited to the BROS; there could well be other reliability functions than just the ones that the SDT identified as BROS in the Guidance to CIP-002-5.1.  In fact, since ‘reliability function’ isn’t a defined term, you could simply say it means anything that can have a BES impact if misused, etc. – in other words, you could say that the definition of BCA defines what it means to be a system that performs a reliability function.  This solves the contradiction in the BCS definition, since there will no longer be any possibility that a Cyber Asset could meet the BCA definition but not also perform a reliability function – they’re the same definition!  So the SEMS and the fire suppression system are now BCS.  I must admit this is a pretty neat trick, and not something that can be refuted purely by logic.
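
The contradiction, and the way the expanded reading dissolves it, can be shown with a small Python sketch.  Everything in it is an assumption made for illustration: the system names, the `has_bca` flags, the BROS assigned to each system, and the two predicate functions, which simply encode the two readings of the second part of the BCS definition discussed above.

```python
# Hypothetical inventory: does the system contain a BCA, and which BROS
# (if any) does it fulfill?  The BROS name is illustrative only.
systems = {
    "unit_control":     {"has_bca": True,  "bros": {"Controlling Frequency"}},
    "sems":             {"has_bca": True,  "bros": set()},
    "fire_suppression": {"has_bca": True,  "bros": set()},
    "office_printer":   {"has_bca": False, "bros": set()},
}

def strict_task(system):
    """Strict reading: a reliability task means fulfilling at least one BROS."""
    return bool(system["bros"])

def expanded_task(system):
    """Expanded reading: a reliability task means meeting the BCA definition."""
    return system["has_bca"]

def orphaned_bcas(systems, performs_reliability_task):
    """Systems whose BCAs must be in a BCS (first part of the definition)
    but which cannot be a BCS under the given reading of the second part."""
    return sorted(
        name for name, s in systems.items()
        if s["has_bca"] and not performs_reliability_task(s)
    )

strict_orphans = orphaned_bcas(systems, strict_task)
expanded_orphans = orphaned_bcas(systems, expanded_task)

print(strict_orphans)    # BCAs left without a home under the strict reading
print(expanded_orphans)  # empty under the expanded reading
```

Under the strict reading, the SEMS and the fire suppression system hold BCAs that have no BCS to belong to.  Under the expanded reading the list of orphans is empty by construction, because the test for a reliability function is the same test that made the asset a BCA in the first place.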

“However, can my friend’s assertion be refuted in the context of CIP v5?  First, one can assume that, if the SDT had really wanted the definition of reliability function to be the same as the BCA definition, they would have said that it was; of course, they didn’t.  But let’s say NERC were to put out a Lesson Learned saying this is how ‘reliability function’ should be defined.  In that case, I would have to agree that entities would be well advised to follow that LL.

“So what does this mean for how a NERC entity should identify their BES Cyber Systems, and how they should be audited on that task?  It’s clear that, if NERC does put out this LL, every system that performs a ‘reliability function’ (i.e. meets the BCA definition) should be identified as a BCS.  And the auditors will determine whether the entity has properly identified its BCS by making sure there are no systems performing reliability functions (in the expanded definition) that haven’t been designated as BCS.

“But the problem with going beyond the BROS is that there are no longer any objective criteria for identifying a reliability function.  So let’s suppose that, if the A/C failed in a generating station on a very hot day, the temperature in the plant would rise to a level where it was dangerous for the workers to continue working; they would therefore have to shut the plant down immediately and leave (this is just one example, of course).  Do you need to declare the HVAC system a BCS?  More importantly, if you decide it’s not a BCS but your auditor thinks differently, who’s to say who is right?  No other body is going to come in and make a final decision on this matter – you basically have to fight this out with your region and NERC. 

“And the even bigger issue is: How is the auditor going to know whether or not you have done a complete assessment of the risks to the BES posed by your facility – so that he/she can determine whether or not you have missed some potential BCS?  I don’t think NERC is going to come up with a comprehensive ‘expanded BROS’ list that includes every possible function that could be performed at an asset/Facility.  The only good way to address this problem is to do what has been done in similar situations with both the previous CIP versions and with CIP v5: The entity needs to develop a methodology for assessing these risks and identifying the systems that are critical for mitigating those risks, then implement that methodology.  The result of this process then constitutes the ‘final’ list of BCS, since once this is done there is no more need for further steps for BCS identification (i.e. this methodology will combine the top-down and bottom-up approaches, and every BCS that would be identified by either approach will be identified in this one).

“So what should we call this methodology for assessing risk to the BES?  Let me see, how about…I know, how about Risk-Based Assessment Methodology!

“To summarize what I’ve just said, there is a fundamental contradiction between the two parts of the BES Cyber System definition.  The best way to fix this problem (and many others with CIP-002-5.1) is to rewrite CIP-002-5.1 R1.  However, even if this is done, it will take two to three years, meaning there needs to be some intermediate option.  The intermediate option would be for NERC to a) issue a Lesson Learned saying that a Cyber Asset that performs a ‘reliability function’ is the same thing as a Cyber Asset that meets the BCA definition; and b) require every entity with Medium or High impact assets or Facilities to develop an RBAM and apply it to each of their assets/Facilities.  

“However, my guess is that nothing official will be done about this problem at all; it will be one of a myriad of problems with CIP-002-5.1 R1 (and Attachment 1) that will be left up to the ‘free market’ to resolve – in other words, it will be up to each region and really each auditor to determine how they will address the fundamental contradiction in the definition of BES Cyber System.  Some auditors will simply ignore the contradiction (and I have never seen it mentioned in any document before – from NERC or a region), of course. 

“But I’m also sure that some auditors will require that the entity demonstrate to them that they have considered the different risks that a facility poses for the BES, and identified all the systems that mitigate those risks as BCS.  And this, folks, is an RBAM.  The RBAM has indeed risen from the dead, and is now stalking the land, waiting to attack unsuspecting NERC CIP compliance professionals.  I urge all who read this to be wary, for at any minute…..NO!  How did you get in here!?  What do you want?...Please don’t hurt me – I’ve always liked RBAMs.  Some of my best friends….”

And here the call broke off.  I immediately tried to call back, but just got voicemail.  In fact, I have gotten nothing but voicemail since then.   I tried to call the police in the city where this person lives, but I just got a message saying that, due to a sudden and severe power outage, there could be no further communication with that city – and this was a week ago.

So let me say what I want, and I’ll be brief (there’s a first for everything, I guess).  Mark my words well: The RBAM has come back and has already produced lots of offspring.  They are lumbering into NERC compliance departments and NERC Regional Entities across North America.  There is no way to stop them, except for NERC to develop a definitive clarification of how the BCS identification and classification process should work in R1 (developing a SAR for a new version is also required longer term, although that won’t stop the immediate RBAM invasion).

And since I see about a zero chance of this happening, the only thing I can say is…They’re coming!  Warn your friends that….OMG, something is breaking in...where’s the Upload button?...They’re coming!

Editor’s Note: It seems Mr. Alrich did find the upload button just in time.  We have repeatedly tried to contact him since this post was uploaded, but have had no success.  Our efforts are hindered by the fact that a complete electrical blackout seems to have coincidentally occurred in his home town of Evanston, Illinois within a minute after he uploaded this post; the blackout persists today, four days after he initially uploaded it.  We can make no contact with anybody in that town.  We are deeply concerned about him, as he was about his friend who dictated this post. Meanwhile, we have heard of several other strange cases that seem to match these two.  We have no idea when or if posts on this blog will be resumed.

We regret any inconvenience this may cause you.  We suggest you peruse this list of other Blogspot blogs that you may find take the place of Mr. Alrich’s interesting, yet decidedly long-winded, posts.  Have a nice day.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] Of course, to say this approach is ‘mandated’ by R1 is something of a stretch.  The fact is that nowhere in CIP-002-5.1 R1 or Attachment 1 is the entity told to identify BES Cyber Systems.  The word ‘identify’ is used in R1.1 and R1.2, but its real meaning there is ‘classify’ – i.e. the entity is told to use Attachment 1 to classify those BCS that are High or Medium impact.  It is simply assumed that the entity will figure out beforehand which systems are in fact BCS.  The entity needs to use the BCS and BCA definitions to do this – which is the bottom-up approach.
Note from Tom: I think it’s quite remarkable that this caller was able to include footnotes in his call.  To be honest, I don’t know how he did it.

[ii] Another party indicated that the SEMS could really be said to be fulfilling the Situational Awareness BROS, since it is monitoring a condition of the plant.  However, providing awareness of the plant’s emissions, which the SEMS is doing, doesn’t have anything to do with reliability, but rather environmental compliance.  It’s true that, given the rule put in place by the plant manager, the plant will be shut down if there is an excursion for ten minutes.  However, if you removed this rule the SEMS would operate the same as it always has.  Its function doesn’t depend at all on whether or not the rule is in place; yet the only way in which the SEMS can impact the BES is through this rule.
