Parental Warning: Young children may find this post very disturbing. For that
matter, older children will as well – especially those who have been involved
in NERC CIP compliance for a few years.
There was
very little lamentation or gnashing of teeth when NERC CIP Version 5 did away
with the RBAM (Risk-Based Assessment Methodology) and replaced it with the
bright-line criteria. Of course, FERC
hated the RBAM because it allowed the entity to develop their own criteria for
identifying assets that were critical to the BES (which is why they ordered
NERC to develop bright-line criteria in Order 706). And while I think some entities liked the –
ahem – freedom that the RBAM gave
them in identifying their Critical Assets, I think the majority were happy that
there were now definite (or at least that was the idea) criteria for
identifying “big iron” in scope for CIP.
There can be no more complaints about utilities that shirked their duty
to secure their important assets; from now on, the entity just has to say, “We
are just following the criteria in Attachment 1”.
However, I
have some disturbing news: It seems we should have not only buried the RBAM,
but driven a stake through its heart and sealed it in a lead-lined coffin. For it appears the RBAM may not really be
dead after all and may have returned from its grave, ready to wreak havoc on
the living as they try to comply with CIP Version 5. Below is the text of a frantic call I
received late one night from a compliance person at a major electric
utility. The call ended very abruptly,
and I have not been able to contact that person since then. I am quite worried for him, and have been
double-locking my door in the fear that whatever happened to him might happen
to me as well.
“I have been
reading your posts religiously, and I have taken to heart your constant repetition
of the idea that, while there are two main approaches to BES Cyber System
Identification – top-down and bottom-up – the only method that is actually
mandated[i] in
CIP-002-5.1 R1 is the bottom-up one. In
that method, the entity needs to start by identifying which Cyber Assets meet
the definition of BES Cyber Asset; they then need to identify BES Cyber Systems
that include one or more BCAs – and every BCA must be included in at least one
BCS. Note there is nothing mentioned
about the BES Reliability Operating Services (BROS) in this approach.
“The other
approach is the top-down one, which starts with identifying the BROS that are
fulfilled by the asset or Facility in question, then identifying the systems
that fulfill one or more BROS for the asset or Facility. The systems on this list that have a
15-minute impact on the BES are BES Cyber Systems. The only problem with this
approach is that it is in no way required by CIP-002-5.1 R1, while the
bottom-up approach is.
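(For readers who think in code, here is a minimal Python sketch of the two approaches just described. Everything in it is an assumption made for illustration: the `CyberAsset` class, the `impact_15_min` flag standing in for the full BES Cyber Asset test, the made-up inventory and the BROS assignments. It is not an official procedure from NERC or anyone else.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyberAsset:
    name: str
    system: str          # hypothetical: the system the entity groups this asset into
    impact_15_min: bool  # simplified stand-in for the BES Cyber Asset test


def bottom_up(assets):
    """Find every BCA first, then put each BCA into a BCS. BROS are never consulted."""
    bcs = {}
    for asset in assets:
        if asset.impact_15_min:
            bcs.setdefault(asset.system, []).append(asset.name)
    return bcs


def top_down(assets, bros_by_system):
    """Start from systems that fulfill a BROS, then keep those with a 15-minute impact."""
    candidates = {system for system, bros in bros_by_system.items() if bros}
    return {
        system
        for system in candidates
        if any(a.system == system and a.impact_15_min for a in assets)
    }


# Made-up inventory for illustration only.
inventory = [
    CyberAsset("control-server", "plant_control", True),
    CyberAsset("operator-hmi", "plant_control", True),
    CyberAsset("print-server", "office_it", False),
    CyberAsset("aux-controller", "aux_system", True),
]
bros = {
    "plant_control": ["Controlling Frequency"],
    "office_it": [],
    "aux_system": [],  # 15-minute impact, but no BROS fulfilled
}

bottom_up_bcs = bottom_up(inventory)
top_down_bcs = top_down(inventory, bros)
print(sorted(bottom_up_bcs))  # systems identified bottom-up
print(sorted(top_down_bcs))   # systems identified top-down
```

(In this toy inventory the two approaches agree on `plant_control` and disagree on `aux_system`, which has a 15-minute impact but fulfills no BROS. Only the bottom-up pass picks it up.)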
“Unfortunately,
it seems that many in the industry – and probably the majority of auditors, at
least at the moment – believe that the top-down approach is the only way to
identify BCS. I have tried to challenge
these people by asking them where in R1 it says to use that approach; until
recently, nobody had been able to point to such language (although in the first draft of CIP
v5 in 2011, the BROS were included in the BCA definition, so using them was the
only approach allowed by R1; they were moved to the Guidance section with the
second draft and were no longer part of R1).
But at the same time, nobody wants to stop believing that use of the
BROS is required by R1.
“Until
recently, I thought these people’s refusal to agree with me was caused simply
by stubbornness. However, I recently had
a discussion with someone who is quite knowledgeable about R1, yet still holds
this view. He referred me to the
definition of BES Cyber System: ‘One or more BES Cyber Assets logically grouped
by a responsible entity to perform one or more reliability tasks for a
functional entity.’ He pointed out that
the second part of this definition seems to indicate that performing a ‘reliability
task’ is what defines a BCS; and, of course, the BROS are definitely ‘reliability
tasks’.
“I must
confess that I have been following your lead on reading this definition, Tom. You have always focused on the first part:
that a BCS consists of one or more BCAs ‘logically grouped’ by the entity. Neither you nor NERC has considered the
second part to be important. For
example, in NERC’s Lesson Learned on ‘Grouping
BES Cyber Assets’, nothing is mentioned about the second part of the BCS
definition; that is, nowhere does the LL say that a system needs to perform a
reliability task in order to be a BCS.
Yet that is what my friend is saying is the case.
“And I must
admit, I can’t counter this argument.
After all, his interpretation is
there in the definition of BCS. You can
see that more clearly if you reword the definition: ‘Every BES Cyber Asset must
be included in one or more BES Cyber Systems that perform one or more
reliability tasks.’ A further
implication of this is: ‘A BCS that includes BCAs must perform a reliability task.’
Otherwise, the system isn’t a BCS and the BCA can’t be included in it –
rather, the entity needs to find another system that includes the BCA in
question, that does perform a
reliability task. This system is a BCS.
“I see no way
to refute what my friend says. The
definition of BES Cyber System includes the requirement that the BCS fulfill a
‘reliability task’ (which could be one of the BROS, but could be other tasks as
well). But the first part of the
definition includes a different requirement: that all BCAs must be included in
one or more BCSs. This leads to the
question: Is there a contradiction between these two ‘requirements’? In other words, will a Cyber Asset that meets
the BCA definition always fulfill a reliability task, so it can be included in
a BCS? If there are Cyber Assets that meet the BCA definition but don’t perform
a reliability task, then given the wording of the BCS definition, there doesn’t
seem to be any way they could ever be part of a BCS.
“At this
point, I remembered that you have identified two types of systems that would be
identified as BCS if the entity performed the full bottom-up approach, but that
don’t themselves fulfill any reliability tasks.
The first is an example used in an SPP workshop last year: A Stack Emissions Monitoring System (SEMS) in
a 1,500 MW+ coal plant will immediately alert operators when there has been an
environmental excursion – that is, when emissions of certain gases have
exceeded the levels permitted by the EPA for that plant. Suppose the plant’s management has laid down
a hard-and-fast rule that, if an excursion can’t be addressed within ten
minutes, the plant needs to shut down (this may be dictated by the plant’s EPA permit).
“Were a
cyber attacker to penetrate this system and falsely make it appear to the
operator that an excursion has occurred and that it has lasted ten minutes, the
operator would shut the plant down. This
means that one or more components of this system would meet the definition of
BES Cyber Asset – i.e. their misuse would, within 15 minutes, adversely impact one or more BES
Facilities (here meaning the units monitored by the SEMS, which will presumably
be the ones shut down), which if rendered unavailable would affect the reliable
operation of the BES. Using
the bottom-up approach, the entity would need to create a BCS called SEMS that
included the components of this system, in order to fulfill the first part of
the BCS definition – that all BCAs be included in a BCS.
“However,
would the top-down approach identify the SEMS as a BCS? In other words, does the SEMS fulfill a
BROS? The purpose of this system is
emissions monitoring. While this is of
course an important function, it is not a reliability
function (at least as I understand it – there is of course no NERC definition
of ‘reliability function’ or even ‘reliability’). The BES would be just as reliable if there
were no emissions controls at all; of course, we might all be dead from
breathing the gases emitted, but the lights would stay on nevertheless. The SEMS doesn’t fulfill a BROS[ii],
although its components definitely meet the BCA definition.
“Your other
example of a system that would meet the BCA definition but doesn’t perform a
reliability function is the fire suppression system in a substation. Obviously, if that system fails to operate
when needed (i.e. in the event of a fire), there will be an immediate impact on
the BES. But fire suppression isn’t a
reliability function (again, as I understand the phrase), and it certainly
isn’t one of the BROS – no matter how important it may be to suppress a fire
when it occurs.
“I’m sorry
to be long-winded on this (and I’m taking my cue from you, Tom), but what I’m
saying with these two examples is there is a contradiction between the two
parts of the BCS definition. The first
part says that every BCA must be in a BCS, but the second part says that a BCS
must perform a reliability task (which my friend is interpreting to mean ‘BROS’). Since I’ve just shown that there can be BCAs
that don’t fulfill a BROS, this means they wouldn’t be part of a BCS due to the
second part of the definition – even though they are required to be included in
a BCS by the first part.
“But another
friend pointed out that the phrase ‘reliability tasks’ in the BCS definition (what I’ve been calling ‘reliability functions’)
doesn’t have to be limited to the BROS; there could well be reliability
functions other than the ones that the SDT identified as BROS in the Guidance to
CIP-002-5.1. In fact, since ‘reliability
function’ isn’t a defined term, you could simply say it means anything that can
have a BES impact if misused, etc. – in other words, you could say that the
definition of BCA defines what it means to be a system that performs a
reliability function. This solves the
contradiction in the BCS definition, since there will no longer be any
possibility that a Cyber Asset could meet the BCA definition but not also perform a
reliability function – they’re the same definition! So the SEMS and the fire suppression system
are now BCS. I must admit this is a
pretty neat trick, and not something that can be refuted purely by logic.
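(The contradiction, and the trick that dissolves it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SYSTEMS` table, its flags and the function names are invented here, `plant_control` is a hypothetical system added for contrast, and the BROS name is just an example.)

```python
# Hypothetical systems based on the examples in the text; the flags are assumptions.
SYSTEMS = {
    "SEMS": {"meets_bca_definition": True, "bros": []},
    "substation_fire_suppression": {"meets_bca_definition": True, "bros": []},
    "plant_control": {"meets_bca_definition": True, "bros": ["Controlling Frequency"]},
}


def task_means_bros(system):
    """Reading 1: a 'reliability task' is one of the BROS."""
    return bool(system["bros"])


def task_means_bca_impact(system):
    """Reading 2: performing a 'reliability task' is the same as meeting the BCA definition."""
    return system["meets_bca_definition"]


def orphaned_bcas(systems, performs_reliability_task):
    """Systems with BCAs (so they must be in a BCS) that fail the reliability-task test."""
    return sorted(
        name
        for name, system in systems.items()
        if system["meets_bca_definition"] and not performs_reliability_task(system)
    )


orphans_reading_1 = orphaned_bcas(SYSTEMS, task_means_bros)
orphans_reading_2 = orphaned_bcas(SYSTEMS, task_means_bca_impact)
print(orphans_reading_1)  # BCAs with no BCS to live in under reading 1
print(orphans_reading_2)  # the same check under reading 2
```

(Under the first reading the SEMS and the fire suppression system are stranded: they hold BCAs but can’t be a BCS. Under the second reading nothing can ever be stranded, because both tests are the same predicate.)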
“However, can
my friend’s assertion be refuted in the context of CIP v5? First, one can assume that, if the SDT had
really wanted the definition of reliability function to be the same as the BCA
definition, they would have said that it was; of course, they didn’t. But let’s say NERC were to put out a Lesson
Learned saying this is how ‘reliability function’ should be defined. In that case, I would have to agree that entities
would be well advised to follow that LL.
“So what
does this mean for how a NERC entity should identify their BES Cyber Systems,
and how they should be audited on that task?
It’s clear that, if NERC does put out this LL, every system that
performs a ‘reliability function’ (i.e. meets the BCA definition) should be
identified as a BCS. And the auditors
will determine whether the entity has properly identified its BCS by making
sure there are no systems performing reliability functions (in the expanded
definition) that haven’t been designated as BCS.
“But the
problem with going beyond the BROS is that there are no longer any objective
criteria for identifying a reliability function. So let’s suppose that, if the A/C failed in a
generating station on a very hot day, the temperature in the plant would rise
to a level where it was dangerous for the workers to continue working; they
would therefore have to shut the plant down immediately and leave (this is just
one example, of course). Do you need to
declare the HVAC system a BCS? More importantly,
if you decide it’s not a BCS but your auditor thinks differently, who’s to say
who is right? No other body is going to
come in and make a final decision on this matter – you basically have to fight
this out with your region and NERC.
“And the
even bigger issue is: How is the auditor going to know whether or not you have
done a complete assessment of the risks to the BES posed by your facility – so
that he/she can determine whether or not you have missed some potential
BCS? I don’t think NERC is going to come
up with a comprehensive ‘expanded BROS’ list that includes every possible
function that could be performed at an asset/Facility. The only good way to address this problem is
to do what has been done in similar situations with both the previous CIP
versions and with CIP v5: The entity needs to develop a methodology for
assessing these risks and identifying the systems that are critical for
mitigating those risks, then implement that methodology. The result of this process then constitutes
the ‘final’ list of BCS, since once this is done there is no need for
further BCS identification steps (i.e. this methodology will combine the
top-down and bottom-up approaches, and every BCS that would be identified by
either approach will be identified in this one).
“So what
should we call this methodology for assessing risk to the BES? Let me see, how about… I know, how about
Risk-Based Assessment Methodology!
“To
summarize what I’ve just said, there is a fundamental contradiction between the
two parts of the BES Cyber System definition.
The best way to fix this problem (and many others with CIP-002-5.1) is
to rewrite CIP-002-5.1 R1. However, even
if this is done, it will take two to three years, meaning there needs to be
some intermediate option. The
intermediate option would be for NERC to a) issue a Lesson Learned saying that
a Cyber Asset that performs a ‘reliability function’ is the same thing as a
Cyber Asset that meets the BCA definition; and b) require every entity with
Medium or High impact assets or Facilities to develop an RBAM and apply it to
each of their assets/Facilities.
“However, my
guess is that nothing official will be done about this problem at all; it will
be one of a myriad of problems with CIP-002-5.1 R1 (and Attachment 1) that will
be left up to the ‘free market’ to resolve – in other words, it will be up to
each region and really each auditor to determine how they will address the
fundamental contradiction in the definition of BES Cyber System. Some auditors will simply ignore the contradiction
(and I have never seen it mentioned in any document before – from NERC or a
region), of course.
“But I’m
also sure that some auditors will require that the entity demonstrate to them
that they have considered the different risks that a facility poses for the
BES, and identified all the systems that mitigate those risks as BCS. And this, folks, is an RBAM. The RBAM has indeed risen from the dead, and
is now stalking the land, waiting to attack unsuspecting NERC CIP compliance
professionals. I urge all who read this
to be wary, for at any minute… NO! How
did you get in here!? What do you
want?… Please don’t hurt me – I’ve always liked RBAMs. Some of my best friends…”
And here the
call broke off. I immediately tried to
call back, but just got voicemail. In
fact, I have gotten nothing but voicemail since then. I tried to call the police in the city where
this person lives, but I just got a message saying that, due to a sudden and
severe power outage, there could be no further communication with that city –
and this was a week ago.
So let me
say what I want, and I’ll be brief (there’s a first for everything, I
guess). Mark my words well: The RBAM has
come back and has already produced lots of offspring. They are lumbering into NERC compliance
departments and NERC Regional Entities across North America. There is no way to stop them, except for NERC
to develop a definitive clarification of how the BCS identification and
classification process should work in R1 (developing a SAR for a new version is
also required longer term, although that won’t stop the immediate RBAM invasion).
And since I
see about zero chance of this happening, the only thing I can say is… They’re coming! Warn your friends that… OMG, something is
breaking in...where’s the Upload button?...They’re
coming!
Editor’s Note: It seems Mr. Alrich did find
the upload button just in time. We have
repeatedly tried to contact him since this post was uploaded, but have had no success. Our efforts are hindered by the fact that a
complete electrical blackout seems to have coincidentally occurred in his home town of Evanston, Illinois
within a minute after he uploaded this post; the blackout persists today, four
days after he initially uploaded it. We
can make no contact with anybody in that town.
We are deeply concerned about him, as he was about his friend who
dictated this post. Meanwhile, we have heard of several other strange cases
that seem to match these two. We have no
idea when or if posts on this blog will be resumed.
We regret any inconvenience this may cause
you. We suggest you peruse this list
of other Blogspot blogs, which you may find can take the place of Mr. Alrich’s
interesting, yet decidedly long-winded, posts.
Have a nice day.
The views and opinions expressed here are my
own and don’t necessarily represent the views or opinions of Honeywell.
[i]
Of course, to say this approach is ‘mandated’ by R1 is something of a
stretch. The fact is that nowhere in
CIP-002-5.1 R1 or Attachment 1 is the entity told to identify BES Cyber Systems.
The word ‘identify’ is used in R1.1 and R1.2, but its real meaning there
is ‘classify’ – i.e. the entity is told to use Attachment 1 to classify those
BCS that are High or Medium impact. It
is simply assumed that the entity will figure out beforehand which systems are
in fact BCS. The entity needs to use the
BCS and BCA definitions to do this – which is the bottom-up approach.
Note from Tom: I
think it’s quite remarkable that this caller was able to include footnotes
in his call. To be honest, I don’t know
how he did it.
[ii]
Another party indicated that the SEMS could really be said to be fulfilling the
Situational Awareness BROS, since it is monitoring a condition of the
plant. However, providing awareness of
the plant’s emissions, which the SEMS is doing, doesn’t have anything to do
with reliability, but rather environmental compliance.
It’s true that, given the rule put in place by the plant manager, the plant will be shut down if there is an excursion for
ten minutes. However, if you removed this rule the SEMS would operate the same as it
always has. Its function doesn’t depend
at all on whether or not the rule is in place; yet the only way in which the
SEMS can impact the BES is through this rule.