Tom Alrich's Blog: April 2015

Thursday, April 30, 2015

Just When You Thought You Had BCS Figured Out….

As I mentioned in a recent post, I was able to listen to almost the entirety of RFC’s day-and-a-half CIP v5 workshop two weeks ago, and found it really worthwhile (I hope the recording will be posted on NERC’s v5 Curriculum site, but it’s not there yet). Scott Mix of NERC made four – count ‘em – presentations, all good. But a subject he brought up in his last presentation was particularly striking, and made me realize there could be a whole new dimension of CIP v5 compliance that I hadn’t realized.[i]

Scott’s presentation (it wasn’t on the original workshop list, and the links to his slides weren’t provided with the other slide links) was about the draft Lesson Learned on “Grouping BES Cyber Assets”. The first paragraph of the Guidance reads:

“Registered entities may choose to create different groupings of BES Cyber Assets to comply with individual CIP Version 5 standards. For example, all the Energy Management System (EMS) servers at a Control Center and the associated backup Control Center could be grouped together as they are categorized at the same impact level. Alternatively, it may be best to group Microsoft Cyber Assets, Linux Cyber Assets, and other Cyber Assets (e.g., network or disk servers) according to the software patching requirements (as the patch sources may be different and released on different release cycles) for compliance purposes.”

In other words, there’s no reason why the entity needs to just create one set of BES Cyber Systems while complying with CIP-002-5.1 R1, and stick with that same set through all of the remaining standards; you can create BCS in one way for the purposes of one standard and in another way for another standard. In fact, even though the LL doesn’t say this, I don’t see that v5 even prohibits you from using different BCS groupings for different requirements within a standard.

The idea that you could do this has been around for a while, since before the LL; it was clear from the start that there was no wording in the v5 requirements that prohibits this. An auditor and I had discussed it last year, but at the time it seemed obvious to both of us that trying to track this would be a nightmare. If you wanted to have let’s say five different BCS groupings, you would have to document which grouping you were using for each requirement, and make sure you didn’t mix BCS between groupings. For example, say you have one grouping you use for CIP-005 and another for CIP-007. You need to make absolutely sure you don’t forget and use a few of the 005 BCS while complying with 007, or vice versa. And you would need to make it impossible for people in the field to make this mistake as well. Of course, you also have to manage this as long as CIP v5 is in effect, through personnel turnovers, etc. It will be quite a job.

Going through an audit would be even more of a challenge. You would have to show the auditor that you were consistent in your use of different BCS groupings for different requirements. Just as important, you would have to prove you had never left any BES Cyber Assets outside of a BCS, since the definition of BES Cyber System makes it quite clear that every BCA must be part of a BCS. And you would need to show the auditor a “map” linking each BCA to the BCS it was part of in each BCS grouping.
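The “map” and the coverage check described above could be automated. What follows is only a minimal Python sketch under my own assumptions, not anything NERC or the Lesson Learned prescribes: the grouping names, BCS names and asset names are all hypothetical, and the helper functions are invented for illustration. It builds the BCA-to-BCS map for every grouping and lists any BCA that a grouping has left outside of a BCS.

```python
# Hypothetical inventory: every identified BES Cyber Asset (names are made up).
ALL_BCAS = {"ems-win-01", "ems-lnx-01", "hist-lnx-02"}

# Hypothetical groupings: grouping name -> BCS name -> member BCAs.
GROUPINGS = {
    "by_function": {
        "EMS": {"ems-win-01", "ems-lnx-01"},
        "Historian": {"hist-lnx-02"},
    },
    "by_os": {
        "Windows": {"ems-win-01"},
        "Linux": {"ems-lnx-01"},  # hist-lnx-02 was forgotten here
    },
}


def bca_to_bcs_map(groupings):
    """Invert the groupings into {bca: {grouping: [bcs, ...]}}."""
    result = {}
    for grouping_name, bcs_sets in groupings.items():
        for bcs_name, members in bcs_sets.items():
            for bca in members:
                result.setdefault(bca, {}).setdefault(grouping_name, []).append(bcs_name)
    return result


def uncovered_bcas(all_bcas, groupings):
    """For each grouping, list the BCAs that are not in any BCS of that grouping."""
    gaps = {}
    for grouping_name, bcs_sets in groupings.items():
        covered = set().union(*bcs_sets.values())
        gaps[grouping_name] = sorted(set(all_bcas) - covered)
    return gaps


if __name__ == "__main__":
    print(uncovered_bcas(ALL_BCAS, GROUPINGS))
    print(bca_to_bcs_map(GROUPINGS))
```

In this made-up data the OS-based grouping has dropped one Linux asset, which is exactly the kind of gap an entity would want to catch before an auditor does.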

Because of these complexities, the auditor and I agreed it wasn’t likely that any entity would want to change BCS groupings like this. Moreover, it seemed clear that no NERC region would allow this to be done, because of the complexity it would introduce into the audit process.

So I was surprised to hear Scott Mix not only say it was possible to group BCS differently for different standards, but that it might offer entities some real advantages, since different standards/requirements lend themselves better to different groupings. For example, as the Lesson Learned suggests, complying with CIP-007-6 R2 (Patch Management) would obviously be much more efficiently done if BCAs were grouped into BCS by operating system – Linux systems in one, Windows 7 in another, Windows 95 in a third (OK, this one is a joke. I hope there aren’t any Windows 95 boxes anywhere on the BES!). Other requirements that would be much more efficiently addressed with an OS-based grouping would be CIP-009-6 R1-R3, Backup and Restore.

On the other hand, CIP-005-5 R1 (ESP) will be much easier to comply with if all of the BES Cyber Systems are grouped as I suspect most entities are grouping them now: by function. Every BCS that is connected to a network must be within an ESP. Yet suppose the entity wanted to comply with this requirement by grouping BCAs by OS as I just described; and say the asset in question was a large generating station with multiple buildings. There might be Linux BCAs scattered among different buildings, yet if they were all to be in one ESP, this would require having the ESP span multiple buildings. While this isn’t forbidden, it might cause some logistical challenges that the entity would rather avoid.[ii] So a functional grouping – where all the BCAs within a BCS would be on the same network – might be a better approach for this requirement.

Am I saying that every entity subject to CIP v5 (with High or Medium impact assets) should go back and consider whether it might be better for them to group BCAs into BCS differently for different v5 standards or requirements? Yes, I am. I can see some potentially huge savings in ongoing compliance costs resulting from this.

But I’m not saying that all entities actually have to use more than one BCS grouping in their CIP v5 compliance program. For one thing, just because of the number of BCS that are involved, this is a strategy that will benefit large entities much more than smaller ones. Yet for the large entities, which hopefully started their v5 compliance effort in earnest sometime last year, it may well be too late to implement this strategy. However, given the possible payoff, I think every entity should at least give this idea serious thought.

BTW, this is another reason why I’m advocating that the CIP v5 compliance date be pushed back by a year. It would have been wonderful if NERC had made it clear – say – last summer that having different BCS groupings was a possibility. Unfortunately, at this point I’m not sure anyone will be able to take advantage of what Scott so eloquently suggested in Cleveland. It will take a lot of work to implement.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte & Touche LLP.

[i] Unfortunately, there was an outage in the webcast audio for about 20 minutes, but that came after Scott had convinced me of his point – the subject of this post. Since the outage seemed to be communications-related, it hopefully won’t appear in the recording.

[ii] This discussion wasn’t included in Scott’s presentation, as far as I can remember. This is my own reasoning, and Scott can’t be blamed for it.

Wednesday, April 22, 2015

I've Moved!

I am pleased to announce that I’ve moved my corporate home to Deloitte & Touche LLP. I will be part of the Energy & Resources industry group in the Cyber Risk Services market offering. And of course – as you can probably guess – my primary focus will be NERC CIP compliance. Deloitte has a large and active practice helping organizations prepare for compliance with CIP Version 5; I'm sure I'll have more to say about that in the near future.

What will my title be? Well, I've requested “All-High Grand CIP Wizard”. We’ll see if I get that.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Deloitte & Touche LLP.

Sunday, April 19, 2015

Flailing Away

I had the pleasure to attend – by webcast – almost the entire RFC CIP v5 Workshop, which was held in Cleveland last Thursday and Friday morning. I strongly recommend that anyone working on CIP v5 compliance view as much as they can of the presentations.[i] They were all good, although my personal favorites were the ones by Scott Mix of NERC (he did three or four, all worth listening to[ii]), Felek Abbas of NERC (who did two, both excellent), and Lew Folkerth of RFC (whose presentation on the “CIP Version 5 Core Requirements” included, for each requirement discussed, a list of “implicit requirements” – ones that aren't specifically stated but which become apparent when you carefully consider what needs to be done to actually comply with the requirement as written. This is of course a big problem with CIP v5 – the fact that so much of what you need to do to comply with it isn't actually explicitly stated in the requirements. One implicit requirement I’ve often pointed out, which Lew didn't mention, is a result of the fact that CIP-002-5.1 R1 never tells you to identify BES Cyber Systems in the first place, only to classify them, although R1.1-1.3 use the word “identify” when “classify” should have been used. This is why I have done several posts on BCS identification).

However, there was one aspect of one of Scott Mix’s presentations that I found rather depressing. In one of his presentations (I believe the one on “CIP Standards Modifications”), he discussed NERC’s ongoing efforts to address all of the questions about interpretation of the requirements of CIP v5, and mentioned that NERC is unveiling next week (at a webinar on Wednesday - you can register here) a brand spanking new approach to providing guidance, above and beyond the Lessons Learned and FAQs (which are continuing as well).

The basis of this new approach seems to be a fairly recent realization on NERC’s part (I had heard about it and mentioned it in this recent post) that there is a more authoritative trove of guidance already available; this is a 1000-plus-page section of NERC’s original filing of CIP v5 with FERC on January 31, 2013 (I won’t include the link to this, because it’s a huge file). This section contains the primary comments that were provided to the SDT by NERC entities as v5 was developed, as well as the SDT’s responses to those comments.

Scott implied that, because the responses to the comments were written by the SDT, and because at least one person at FERC presumably read through these as the staff was deciding whether or not to recommend to the Commissioners that they approve v5, they therefore have a higher “authoritative” status than do the Lessons Learned and FAQs. Essentially, it seems NERC has decided to “mine” this document for whatever pieces of interpretive wisdom can be gleaned, and publish these as separate documents (not yet named, although see below for more on that subject).

This might sound wonderful – here’s a whole treasure trove of guidance from the SDT that might address lots of v5 problems that have been brought up, both by unscrupulous bloggers only out for personal gain, as well as by NERC entities who have been uncovering them as they struggle unsuccessfully to understand what the v5 requirements mean. However, while I don’t think this is necessarily a bad thing, I also don’t see that it will provide much benefit – especially given that the effort put into this task would be better spent trying to accelerate the rate of production of Lessons Learned (which, given that only two have been finalized in the time since the LLs were announced last September, can’t be said to be super-fast). Here are my reasons for saying this:

1. It is a stretch to say that the SDT’s responses to comments were something official from the SDT, at the same level as the Guidance and Technical Basis in each of the v5 standards. The latter were debated by the SDT before being finalized with the requirements themselves. While I’m sure the responses to comments were ultimately voted on by the SDT, they were prepared by individuals. The SDT really had no other choice but to do this. I remember one of the v5 drafts drew about 2,000 pages of comments – and that was only one of the four official drafts. The SDT had to respond to every comment, and the only way to do that was to parcel them out among the different members to respond. I didn't attend a lot of the SDT meetings, but I don’t think the members spent a lot of time debating responses to comments. How could they possibly do that, given their otherwise huge workload? So these responses need to be taken as primarily the work of individual SDT members, not the SDT itself.

2. Since FERC Order 791 (which approved v5) didn't specifically refer to these comments or the SDT’s responses, I think it’s a stretch to imply that FERC in some way “approved” the responses – just because they didn't take issue with any of them. I know Scott didn't state that FERC had approved them, but by even bringing up FERC he was implying something like, “FERC didn't have objections to the responses”. As it is, the fact that FERC didn't refer to any of the responses in 791 could just as well be taken to mean they didn't think they had any real relevance.

3. The whole idea that, in trying to understand what the v5 requirements mean, it would be beneficial to learn the “intent of the SDT” is fallacious. I wrote a post on this question last year, so you may want to read that. The conclusion of the post is that there is no way to definitively discern the “intent of the SDT” on any particular issue having to do with v5; in fact, it’s really a meaningless concept.

4. I haven’t read the section of the NERC v5 filing that's in question (it’s on my reading list, but it’s behind Finnegans Wake. Since I first tried to tackle that in college and I've never gotten even to the end of the first chapter, it’s likely to be a while before I get to the SDT’s document), so in general I can’t say anything about the SDT responses in that document. However, I recently wrote a post on the meaning of “adversely impact” in the BCA definition; the post took as its starting point one of the sections in NERC’s April 1 FAQ document. That section repeated the SDT response to the same question, which was included in the v5 filing. The SDT response was basically that the meaning of “adverse impact” should be obvious and nothing more needs to be said about it. If this is exemplary of the nuggets of wisdom to be mined from the filing, I recommend those nuggets be left unmined.

During Scott Mix’s presentation, one person raised the question whether these new NERC “guidance” documents were really just another try at the CANs and CARs, previous unsuccessful NERC efforts to provide some sort of mandatory guidance to the auditors on the meaning of particular requirements. Scott said no, and I agree that isn't the issue I’m concerned about. As I've said many times over the past year, there is no longer a way NERC can provide any definitive clarification of v5, other than to rewrite the standards or go through the formal RFI process. Both of these will take years to bear fruit, so they don’t do any good for the run-up to v5 compliance next year. NERC has already tried to imply that the Lessons Learned will provide mandatory guidance (in some way) for the auditors, but that has run into opposition from a lot of NERC entities and at least one region (at NPCC’s CIP v5 workshop that I attended in Albany in March, it was stated unequivocally that the LLs aren't mandatory, for the auditors or the entities).

So I think it’s a waste of time, although perhaps not pernicious, for NERC to pursue this new type of document, rather than doing what they should be doing – thinking about the different questions on v5 and coming up with well-reasoned Lessons Learned and FAQs, which can provide good non-mandatory guidance to entities. It's as if NERC has decided that basing their new documents on the SDT filing relieves them of the burden of having to think about what's reasonable and what's not; I'm afraid that's not the case.

In the meantime, I’ll keep writing my Lessons Learned; I've done three so far (in just over one week), although I won’t declare them final for another month, to give people a chance to comment on them. And I’m more convinced than ever that both I and NERC (as well as anyone else who wants to try their hand at writing Lessons Learned) have our hands full in writing these things. In the RFC workshop, they kept a running log of all of the v5 questions that were raised, that couldn't be answered on the spot; these will be turned over to NERC to address. Can you guess how many questions they logged in a day and a half? 64.

To give some perspective on this number, I estimated in February that there are over 500 questions that need to be answered before the 4/1/16 compliance date for v5 (and they need to be answered not just on March 31, 2016, but anywhere from three months to two years before the compliance date. Of course, there are now only 11 and a half months ‘til that date, so the two year part will be pretty hard to meet without a time machine). But with 64 questions coming up in a day-and-a-half workshop in just one of the regions – a workshop whose purpose wasn’t even to come up with questions but to try to explain the standards – I’d say my estimate is definitely on the low side. I’m sure that a full list gathered today would include probably 1,000 questions, and that – since the questions are growing metastatically as entities try in earnest to understand CIP v5 – by next April there will be well over 1,000 questions left unanswered, no matter how many are answered between now and then.

NERC, with efforts like the one just described (and the SGAS, discussed at the end of this post), it seems you’re flailing away, desperately trying to do something – something – to answer all of these questions in time. However, I said in January that it was already too late. My opinion hasn't changed since then: The ship has sailed. There is no longer any chance that CIP version 5 can be made fully enforceable on April 1, 2016. The only thing you can do now is to admit this, try to pick up the pieces, and figure out a course that will get you on a path to having a truly enforceable version in a year or two. To reiterate (and update) the steps I said you need to take in the January post:

1. You need to push back the compliance dates for v5 by a year. So April 1, 2017 will be the date for the Highs and Mediums, and all the other dates will be a year later. Note this doesn't mean you need to leave v3 in effect until 2017; you can still say v5 will be the law of the land on 4/1/16. However, 4/1/16 to 4/1/17 should be a "free" period during which no PVs will be assessed for any of the v5/v6 standards, provided the entity is making a good faith effort to comply.

2. You need to really get cracking on the Lessons Learned, etc. – with the goal of having all important questions about CIP v5 answered by April 1, 2016. This will give entities a year to put their compliance programs in place, with some assurance that they understand what is required of them.

3. You need to declare CIP-002-5.1 R1 an “open” requirement, meaning there will be no PVs issued (even after 4/1/17) for entities that make a good faith effort to comply with it – reading everything available about it, “rolling their own” definitions where needed, etc. There are simply too many contradictions and inconsistencies in this requirement (and in Attachment 1) for it to be fixable with Interpretations, Lessons Learned, etc. It needs to be rewritten from scratch (while trying to preserve what is good about the current version, which is actually a lot).

4. You (or one of the entities) need to issue a SAR to rewrite R1 to make it consistent and unambiguous. When that is done – say in three years – this can then become an enforceable requirement.

And what happens if you don’t take my advice (and I don’t think you will)? Every month you delay taking these steps only increases the embarrassment you will suffer when you finally have to admit that v5 can’t be enforceable on 4/1/16. And the closer we come to the compliance date, the more severe the fallout will be.

I’d like to make another suggestion based on the RFC meeting, NERC. When Scott Mix was discussing the new guidance documents that you’ll be putting out, he said their name hadn’t been decided on (indeed, that it was changing hourly), but that “Compliance Application Memo” was the leading candidate - at least at the moment he spoke.

Let me suggest that you not use this term. It seems to me that, if you’re trying to erase the memory of the Compliance Application Notices (CANs) and the Compliance Analysis Reports (CARs) from people’s minds, the last thing you want to do is come out with a new document that has a similar name, and whose acronym (CAM) sounds almost identical to CAN. But maybe I’m over-thinking this. What could possibly go wrong?

Postscript: The SGAS

There is another recommendation I made to NERC recently – that they make public the compliance advice they give in the Small Group Advisory Sessions (SGAS), currently being held in Atlanta. I still stand by every word in that post, but I realize my analysis was too narrow. Steve Parker of EnergySec did a much better analysis of the problems raised by the SGAS in one of their NERC CIP newsletters in March. He raised three main issues, which I’d like to elaborate on here.

1. “The possibility (or perhaps likelihood) that NERC will be providing specific, non-public advice to individual entities jeopardizes the independence of the ERO with respect to future audits.” This means that NERC is essentially tying their own hands on particular issues. If they tell one entity that the method they've chosen to comply with a particular requirement is correct, how could they later issue any guidance that said anything else?

2. “The non-public nature of the meetings creates doubt that determinations made during such a meeting will be properly vetted and published for other entities to reference. This essentially creates a two-class system in which entities with the ability to attend an SGAS potentially receive compliance information (or determinations) on a preferential basis.” This is part of my argument in the post referenced above.

3. “It creates a likely scenario in which Regional auditors will be pressured, or at least unduly influenced, to rule one way or another based on the advice given to an entity in such a session.” This is really important. The entities are audited by the regions, and an auditor from the entity’s region will usually be in the room for the SGAS. If NERC says that what the entity is doing to comply with a particular requirement is correct, how could the region possibly find any differently when they go to conduct the audit? Remember, the regions are part of NERC. If your boss tells you that something is correct and that it’s a settled matter, how can you possibly go against this? Of course, this basically destroys auditor independence, one of the principles of GAGAS, the rules that supposedly govern NERC auditors.

What will be the likely effect of the SGAS? I don’t think it will be immediate, since it will only be felt when an entity gets a PV they don’t agree with and takes it to court – this is likely to be four or five years from now. But at that point, I believe all of CIP v5 will be deemed unenforceable (I describe my reasoning for this conclusion in the post referenced above). And I frankly don’t know what will happen after that.

But until then, don’t worry – the SGAS will be deemed a great success. It’s like the guy who jumps off the top floor of the Sears Tower. As he passes the 50th floor he yells out, “So far, so good!”

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] I would include a link if it were available. I know the presentations were recorded, so I imagine they will be posted on the NERC v5 Curriculum site.

[ii] His very last presentation on Friday really opened my eyes to an aspect of CIP v5 I’d never realized before. I hope to have a post on this soon.

Saturday, April 18, 2015

Tom’s Lessons Learned No. 3: Phone Systems

I have now done two posts (both of which I’m calling Tom’s Lessons Learned) about the meaning of “adversely impact” in the definition of BES Cyber Asset. The second post went further to show how my “definition” (really a procedure for determining whether or not a Cyber Asset can adversely impact the BES) could clear up the question whether HVAC and UPS systems need to be considered as BCA/BCS. I also gave two other examples of systems where my “definition” provides a way to answer this question: the SEMS system in a power plant and the fire suppression system in a substation.

The day after this post, an auditor emailed me to say that phone systems should be included in this analysis as well (we’re talking about electronic phone systems here, since others wouldn’t be Cyber Assets in the first place. If your current phone system requires you to ring up the operator and ask for “Ravenswood 4229”, you’re already off the hook – so to speak[i]).

Of course, the reason that phone systems would even be an issue in the first place is that they are sometimes a backup for system-to-system communications, e.g. when a control center dispatches a generating station. And some have wondered to me whether, in cases where the communication needs to happen within 15 minutes and the SCADA system could fail, the phone system might have to be declared a BCA/BCS (since as we well know, redundancy isn’t in itself an argument against declaring it such).

So let’s apply the analysis from the previous post, which at its heart consists of two questions. Both of these questions need to be answered affirmatively in order for the Cyber Asset to be considered to have adverse impact on the BES, if lost or misused.

1. Does the loss or misuse of the Cyber Asset adversely impact the asset/Facility?

2. Does this adverse impact on the asset/Facility necessarily[ii] translate into an adverse impact on the BES within 15 minutes?

To answer the first question, I think it can be said there would be some sort of adverse impact on the control center if the phone system were down. But what about the second question?

Let’s say the SCADA system in a control center is down (and the backup SCADA has failed to kick in for whatever reason); meanwhile, the ICCP system (which isn’t down) shows that the ISO needs a peaker plant dispatched immediately. If the control center’s phone system happens to be down as well, are they simply SOL? Will there be an inevitable BES impact? That’s hard for me to believe, since probably everybody in the control room has a cell phone in their pocket or purse. My guess is the message will get through to the peaker plant, even if it requires smoke signals or carrier pigeon.[iii]

So the answer to the second question is no, there won’t inevitably be a BES impact. Ergo, phone systems don’t need to be considered as BES Cyber Assets/Systems.
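The two-question test applied above can be written down as a tiny decision function. This is just a sketch of my own procedure, not an official NERC test; the class, field and function names are hypothetical, and the yes/no answers still have to come from the entity's own analysis.

```python
from dataclasses import dataclass


@dataclass
class CyberAsset:
    """Hypothetical record of the answers to the two questions for one Cyber Asset."""
    name: str
    impacts_facility: bool             # question 1: adverse impact on the asset/Facility?
    bes_impact_inevitable_15min: bool  # question 2: necessarily a BES impact within 15 minutes?


def is_bes_cyber_asset(asset: CyberAsset) -> bool:
    """Both questions must be answered 'yes' for the Cyber Asset to be a BCA."""
    return asset.impacts_facility and asset.bes_impact_inevitable_15min


# Answers from the discussion above: yes to question 1, no to question 2.
phone_system = CyberAsset("control center phone system", True, False)

# A made-up asset with two 'yes' answers, for contrast only.
example_yes_yes = CyberAsset("hypothetical control system", True, True)

if __name__ == "__main__":
    for asset in (phone_system, example_yes_yes):
        print(asset.name, "->", "BCA" if is_bes_cyber_asset(asset) else "not a BCA")
```

A single “no” is enough to keep the Cyber Asset out of the BCA list, which is why the phone system drops out even though question 1 was answered “yes”.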

The auditor did make another good point about the previous post. He pointed to the place where I’d essentially restated the two questions. In discussing what an entity would need to prove in order to show that a Cyber Asset wouldn't have an adverse impact on the BES if lost or misused, I had said they would need to show that this loss or misuse

Won’t impact the asset/Facility (i.e. question 1 above)
in a way that would cause the asset/Facility to fail to fulfill one or more of the BROS that it normally fulfills (question 2).

He noted that making total failure to fulfill one or more BROS the criterion determining whether or not the second condition had been met would eliminate cases where misuse of a Cyber Asset had caused the asset/Facility to only partially fulfill its BROS. He gave the hypothetical example of an entity that argued (using the SEMS example from the previous post) that while the plant may have had to reduce its generation output below a certain threshold in the event of the SEMS failure, as opposed to tripping the plant offline, it was still producing energy, doing voltage control, etc. - all of the BROS functions it normally performs; it just wasn't completely fulfilling all of those BROS to the same degree as previously. His point was that even a partial failure to fulfill BROS constitutes adverse impact on the BES. I have changed the second item to read “in a way that would cause the asset/Facility to fail to fully fulfill one or more of the BROS that it normally fulfills.”
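The difference between the two wordings is easy to see in a small sketch. The Python below is only an illustration under my own assumptions: the function names are hypothetical, and the fractions are invented numbers standing in for how much of each normally performed BROS the plant still delivers after the SEMS failure.

```python
def adverse_impact_total_failure(bros_fulfillment):
    """Original wording: impact only if some BROS is not fulfilled at all."""
    return any(level == 0.0 for level in bros_fulfillment.values())


def adverse_impact_partial_failure(bros_fulfillment):
    """Revised wording: impact if any BROS is less than fully fulfilled."""
    return any(level < 1.0 for level in bros_fulfillment.values())


# Hypothetical plant after the SEMS failure: output reduced, not tripped offline.
# 1.0 means the service is fully delivered; the 0.6 is a made-up figure.
plant_after_sems_failure = {
    "producing energy": 0.6,
    "voltage control": 1.0,
}

if __name__ == "__main__":
    print("total-failure criterion:", adverse_impact_total_failure(plant_after_sems_failure))
    print("partial-failure criterion:", adverse_impact_partial_failure(plant_after_sems_failure))
```

Under the original wording the entity in the auditor's example could argue there was no adverse impact; under the revised wording the reduced output counts.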

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] I won’t say there aren’t any of these systems out there. There may still be a few utilities that haven’t gotten permission from the PUC to update the phone system they bought in the 1930’s. Of course, I can’t imagine there are too many operators out there nowadays, ready to plug the long black thingy into the proper hole on their switchboard.

[ii] The word “necessarily” wasn’t in the previous post, but I think it is really crucial (I’ve updated that post now). As I said in the previous post, it seems to me axiomatic that a control system could have an adverse impact on the asset or Facility it’s associated with or located at (question 1); it wouldn’t be a control system if that weren’t the case. But it isn’t axiomatic that the impact on the asset/Facility will translate into an impact on the BES (question 2). In the case of the fire suppression system in the previous post, even though that system had been disabled by a hacker, someone might be at the substation and pick up a fire extinguisher to put out the fire; or the wind might be blowing in a direction where there was no harm to a BES Facility. It is only if the BES impact is inevitable (question 2) that the Cyber Asset can be said to have an adverse impact if lost, misused, etc. – and therefore be a BES Cyber Asset.

[iii] The fact that I’m even considering this question may seem to violate the statement in the BCA definition that redundancy “shall not be considered when determining adverse impact.” Remember, since I’m breaking the determination of “adverse impact” into two parts, this statement only needs to be true for one of the two parts (questions). For the second question, I agree that redundancy doesn’t make any difference – if an asset has a BES impact, it has it regardless of whether or not there is redundancy. But for the first question, I think redundancy is sufficient mitigation to make the answer to the first question “no”, and therefore to make the phone system not a BES Cyber Asset/System. Think what you might have to do if redundancy weren’t a mitigation for the first question, in the control center case: every cell phone used in the control center (and actually maybe every cell phone that could be borrowed by an operator, no matter who owned it) would have to be considered as a BES Cyber Asset!

Wednesday, April 15, 2015

Tom’s Lesson Learned No. 2: “Adversely Impact” and “Support Systems”

I posted the first of Tom’s Lessons Learned last week, and received a few serious comments the next day. And guess what – the comments, and the email discussions I had with the people who sent them, revealed that a new Lesson Learned is needed! As I have said in the past, interpretation questions on v5 – especially on BCS identification and classification – are like the Greek myth of Hydra, the multi-headed serpent that grows two new heads for every one that is cut off. They’re never-ending.

My first Lesson Learned was about the meaning of the words “adversely impact” in the definition of BES Cyber Asset. In that post, I stated that the best way to think about the impact of a particular Cyber Asset on the BES is as a two-step process: the Cyber Asset adversely impacts the asset or Facility it’s associated with, then the asset/Facility adversely impacts the BES itself.

Why did I say this? I must admit I didn’t make my reason clear in the previous post, but I’ll say it now: For the vast majority of Cyber Assets, it is meaningless to talk about their having a direct impact on the BES. If they have an impact, it is only through the asset/Facility they’re associated with. For example, the DCS in a generating station doesn’t directly impact the BES, other than through the station itself. Another example: the relay controlling a circuit breaker for a 500kV line doesn’t in itself impact the BES. If it opens the breaker, it’s the loss of that line (the Facility) that has an impact, not the relay itself.

So the meaning of “adversely impact” in the BCA definition comes down to two questions:

Does the loss or misuse of the Cyber Asset necessarily adversely impact the asset/Facility?
Does this adverse impact on the asset/Facility translate into an adverse impact on the BES?

If the answer to either of these questions is No, the Cyber Asset isn’t a BCA.
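
For readers who think better in code, here is a minimal sketch of the two questions as a single check. The class and function names are my own invention for illustration; nothing like them appears in the standard or in NERC's guidance. The sketch covers only the "adversely impact" part of the BCA definition, and the "within 15 minutes" test remains a separate step.

```python
from dataclasses import dataclass


@dataclass
class CyberAssetAssessment:
    """Answers to the two questions for one Cyber Asset (hypothetical structure)."""
    name: str
    impacts_asset_or_facility: bool  # question 1
    impact_reaches_bes: bool         # question 2


def meets_adverse_impact_test(a: CyberAssetAssessment) -> bool:
    # A "No" to either question means the Cyber Asset isn't a BCA.
    # The separate "within 15 minutes" test is NOT modeled here.
    return a.impacts_asset_or_facility and a.impact_reaches_bes


dcs = CyberAssetAssessment("plant DCS", True, True)
print(dcs.name, meets_adverse_impact_test(dcs))
```

The point of writing it this way is that the two answers are recorded separately, so the documentation you show the auditor can state which question produced the "No".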

However, just stating these two questions doesn’t lead us much further down the path of understanding what “adversely impact” means. What really matters is what the auditor will think it means when he/she pays you a visit say three years from now. Let’s say you didn’t identify a particular Cyber Asset as a BES Cyber Asset because you didn’t think its loss or misuse would lead to an adverse impact on the BES, and the auditor disagrees. How do you justify your decision?

The short answer to this question is that you pull out the extensive documentation you created in April 2015, which justifies why you did this. The document will say something like “We decided that Cyber Asset X wasn’t a BES Cyber Asset because we asked these two questions (here you list the two questions above, perhaps not with my exact wording).” If the auditor asks why you used that approach, you can answer, “NERC said in their April 1, 2015 FAQ release that ‘adversely impact’ meant ‘negatively impact’. Since we already assumed that was the case, this didn’t help us understand what the phrase actually meant. Therefore, we rolled our own interpretation.”

Of course, your April 2015 document described this approach in detail (it might include elements of both this current post and the earlier one, although I don’t recommend you state that you were following Tom Alrich’s advice. If you do, the region may triple the VSL for whatever violation you’re assessed).

But how will you defend your answers to the two questions? For both questions, it’s safe to assume you won’t need to defend a “Yes” answer. It’s only if you are saying there is no impact, either of the Cyber Asset on the asset/Facility or of the asset/Facility on the BES, that you may get challenged by your auditor.

Let’s deal with the first question first. How would you defend a decision that the loss or misuse of a particular Cyber Asset won’t adversely impact the asset/Facility it’s associated with? For a control system, I really don’t see a way to do that. A control system has to have an impact; otherwise it isn’t a control system (remember, the “within 15 minutes” part of the BCA definition is separate from what we’re discussing here. Even though a system impacts the asset/Facility and the latter impacts the BES, if it doesn’t do that within 15 minutes, it still won’t be a BCA. But that is a later step in the BCA identification process).

Moving to the second question, how would you defend a decision that the adverse impact on the asset/Facility (caused by the loss or misuse of the Cyber Asset in question) wouldn’t adversely impact the BES? The problem is that there are a whole host of ways that an asset or Facility could impact the BES. It seems to me that you would have to show you had considered all of those ways, in coming to the conclusion that there wouldn’t be an adverse impact.

Where does this list of “ways” (i.e. modes of impact) come from? Fortunately, the SDT has already addressed that question, although not in the CIP-002 R1 standard itself. The BES Reliability Operating Services – discussed in the Guidance for CIP-002-5.1 – constitute a list of ways that an asset/Facility can impact the BES. I think it should be sufficient if you simply showed the auditor that the loss or misuse of the Cyber Asset

a) won’t impact the asset/Facility (i.e. question 1 above), or

b) will impact it, but not in a way that would impede the ability of the asset/Facility to fully fulfill one or more of the BROS that it normally fulfills (question 2).
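
One way to show the auditor that you considered every mode of impact is to record a finding for each BROS the asset/Facility normally fulfills. The sketch below is only an illustration under my own assumptions: the function name is hypothetical, and the BROS names passed in are examples that you would replace with the list from the CIP-002-5.1 Guidance that applies to your asset.

```python
from typing import Dict, Iterable, Tuple


def question_two(asset_bros: Iterable[str],
                 impeded_bros: Iterable[str]) -> Tuple[Dict[str, bool], bool]:
    """Return a per-BROS finding plus the overall answer to question 2.

    asset_bros:   BROS the asset/Facility normally fulfills
    impeded_bros: BROS that loss or misuse of the Cyber Asset would impede
    """
    asset_set, impeded_set = set(asset_bros), set(impeded_bros)
    stray = impeded_set - asset_set
    if stray:
        # Guard against claiming impact on a BROS the asset never fulfilled.
        raise ValueError(f"not a BROS this asset fulfills: {sorted(stray)}")
    findings = {bros: bros in impeded_set for bros in sorted(asset_set)}
    return findings, any(findings.values())


# Example BROS names are placeholders chosen for illustration.
findings, bes_impact = question_two(
    asset_bros=["Controlling Voltage", "Balancing Load and Generation"],
    impeded_bros=[],
)
print(findings, bes_impact)
```

A record like `findings` is the sort of thing that belongs in the documentation you pull out three years later, since it shows each BROS was looked at rather than just the conclusion.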

So does this solve all of our problems? Do we now know for sure what “adversely impact” means in the BCA definition? I’d like to say we do, but a problem remains: What I’m advocating contradicts the actual wording of the BCA definition and CIP-002-5.1 R1. This isn’t necessarily a problem, since I have pointed out in many posts – starting with this one – that the NERC entity needs to “roll their own” interpretation or definition, absent any fairly authoritative guidance from NERC on the matter.[i] As I mentioned above and in the previous post, NERC’s only attempt to address this issue, in the April 1 FAQ release, didn’t provide any useful new information.

I say that my interpretation of “adversely impact” contradicts the wording of the requirement. By that, I’m referring to the fact that the requirement is nominally for identification and classification of BES Cyber Systems, with the assets/Facilities only entering into the process by being what the bright-line criteria refer to (for example, see Section 3 of CIP-002-5.1, “Purpose”).

However, I contend that the only way an entity can really comply with the spirit of R1 is to think in terms of CIP v1-3, where you first identified Critical Assets and then Critical Cyber Assets that are “essential to the operation of” those Critical Assets. In fact, I literally know of no entity – and only one region – that adheres strictly to the wording of R1 in this regard. They are all first identifying assets or Facilities that meet the High or Medium criteria, then identifying BES Cyber Systems associated with them. The “interpretation” of “adversely impact” that I’m describing in this post reflects that fact. In other words, this is yet another area where the entity needs to roll their own interpretation - and in fact, all entities have already done so, but most didn't realize it.

I’m going to illustrate this with an example I’ve discussed before. I have stated on several occasions that there are cyber assets whose loss or misuse can affect the BES, but which still don’t fulfill a BROS. One example is from an SPP workshop on BCS identification in 2014. In it, there was a fictional 1500+MW plant with a Stack Emissions Monitoring System (SEMS), which provides information on what chemicals are being emitted in real time (some say that the proper acronym is CEMS, referring to Continuous Emissions Monitoring System. Since I don't address religious questions in this blog, I won't weigh in on this issue).

Let’s suppose the plant has a very stringent EPA permit that requires it to shut down within ten minutes of an environmental excursion (if the problem can’t be fixed in that amount of time). Therefore, the plant manager has made it clear to the operators that, if the SEMS shows an environmental excursion for ten minutes, they must shut the plant down. This means a hacker could take over the SEMS and make it provide false data showing an excursion, resulting in a shutdown.

Does this mean the SEMS can impact the BES? Absolutely. But does it also mean the SEMS fulfills a BROS? No, it doesn’t. Environmental monitoring isn’t a reliability function. If there were a huge excursion and everyone outside the plant got sick, this would be a big problem but it wouldn’t affect reliability. The lights would stay on.

After my post on this topic last week, I engaged in an email discussion with an auditor with whom I often exchange ideas. I pointed out to him that SEMS doesn’t perform a reliability function; he disagreed, and said that, if its misuse can result in the plant being shut down (as I’ve just described), this means it does affect reliability. Obviously, if the plant is shut down it can’t perform the BROS it normally performs, such as supporting voltage.

At the time, I didn’t know how to answer this argument, but I knew there was something wrong with it. What was wrong was that I wasn’t applying to this question the two-step process for determining whether there is adverse impact – which I’d just described the day before! Once you apply that process, the mystery clears up: The SEMS can have a severe adverse impact on the plant (question 1 above), and the plant’s being down will have an adverse impact on one or more BROS and therefore the BES (question 2). In this way, a system that doesn’t directly fulfill a BROS can still be said to “adversely impact” the BES.

Even though I recommend that entities roll their own v5 interpretations, in cases like this where there is nothing more official from NERC, I can’t say that I think NERC is off the hook. I would very much like them to acknowledge that the best way to determine whether a Cyber Asset can “adversely impact” the BES is to use this two-stage process. Of course, I’d also very much like there to be world peace and for the Cubs to win the World Series this year….enuf said (Note on Dec. 4: While the Cubs didn't win the World Series this year, they got a lot farther than anyone expected. Just goes to show that anything can happen).

You probably noticed that, besides “adverse impact”, the phrase “support systems” was in the title of this post. How does this come in? I didn’t set out to write a post – let alone a Tom’s Lesson Learned – on this topic, but Brandon Workentin from EnergySec emailed me the day after the first post to ask what seemed to be an unrelated question on support systems; I now realize this topic is very much related to adverse impact, and is in fact addressed by what I have just said above.

Brandon expressed confusion (as have many others) about systems like HVAC and UPS. These are systems that could in some cases impact the BES within 15 minutes (let’s say the heat fails in a power plant in northern Ontario in January, and it literally becomes impossible for the staff to stay at their posts; or a UPS doesn’t kick in in the event of a power failure and a control center goes dark) – should they be considered as possible BES Cyber Assets/Systems?

NERC addressed this question in the November 25 FAQ document. Unlike their response on adverse impact in the April 1 FAQ document, they did actually answer the question - they said these systems should not be considered as possible BCAs. They said that “support systems” (like HVAC and UPS) aren’t in scope for v5 (unless they’re within an ESP, in which case they’re Protected Cyber Assets). I don't disagree with this answer, but I do disagree with NERC's reasoning behind it.

The problem with this answer is, what is the definition of “support system”? A definition would allow entities not to waste time and money treating support systems as BCS. On the other hand, if there is no definition, what is to prevent entities from declaring systems like DCS and EMS as “support systems” and therefore exempting them from being BCS? I'm not saying that we now need a definition of "support systems". What NERC needs to do is stop bringing in ad hoc arguments to justify their opinions; when they do this, it opens up a potential can of worms that they clearly hadn't anticipated. This is something like the folk religions that dream up a deity for every natural phenomenon; it yields wonderful explanations, since you can always say that it's raining because the rain god was in a good mood. But what do you then do with all these deities you've invented?

But you know what, NERC? I’m going to help you out on this one, just because I’m that kind of guy. What I’ve just discussed in this Tom’s Lesson Learned explains why HVAC and UPS shouldn’t be considered as BCAs; you don’t need to introduce mythical beasts like support systems or Bigfoot. There is no denying the HVAC and UPS will have an adverse impact on the asset/Facility that they support, if they are lost or misused (i.e. my first question above). However, and unlike with the SEMS described above, it isn’t certain that a loss of HVAC or UPS will result in the asset/Facility not being able to fulfill one or more BROS (my second question). Even if the heat or A/C is lost, there might be some sort of mitigating actions that could be performed – like putting on overcoats or bathing suits, respectively – that would prevent a BES impact. With the SEMS situation, if the plant shuts down it’s down – the impact is immediate and can't be mitigated.

There’s one more system I want to discuss, since I’ve used it as an example several times previously, and since I now need to change what I’ve said about it; that is the fire suppression system in a substation. I’ve been saying all along that, even though the system doesn’t directly fulfill a BROS, it needs to be protected as a BCS since its non-availability when needed (i.e. in the event of a fire) could result in the loss of an asset/Facility with which it’s associated (in the case of a substation, it will usually be one or more high-voltage lines that are Facilities meeting one of the criteria 2.4 – 2.8).

The auditor with whom I discussed SEMS last week also pointed out that he didn’t think the fire suppression system should be a BCS, simply because there is no assurance that the loss of the system when needed will result in an impact on the BES (that is, even though there may be an impact on the asset/Facility, there’s no assurance that will translate into a BES impact). Maybe someone is working in the substation and grabs a fire extinguisher to put out the fire. Or maybe the wind is blowing in a different direction, such that the line in question is never endangered.

So I have to agree that the schema I’ve described above would remove the fire suppression system from having to be considered as a BCS.
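
To pull the three examples in this post together, here is a rough sketch of how the schema sorts them. It rests on my reading above that question 2 is answered "yes" only when the BES impact can't be mitigated; the type, field names and the answers I've filled in are illustrative assumptions, not anything NERC has published.

```python
from typing import NamedTuple


class Example(NamedTuple):
    system: str
    impacts_asset: bool       # question 1
    bes_impact_certain: bool  # question 2: does the impact reach the BES unavoidably?


# Answers reflect the reasoning in this post; treat them as assumptions.
EXAMPLES = [
    Example("SEMS (forced plant shutdown)", True, True),
    Example("HVAC / UPS", True, False),                   # overcoats, bathing suits
    Example("substation fire suppression", True, False),  # extinguisher, wind direction
]


def passes_adverse_impact_test(e: Example) -> bool:
    # Both answers must be "yes"; the 15-minute test is a separate, later step.
    return e.impacts_asset and e.bes_impact_certain


for e in EXAMPLES:
    verdict = "passes" if passes_adverse_impact_test(e) else "does not pass"
    print(f"{e.system}: {verdict} the adverse-impact test")
```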

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] Of course, this isn’t the first time I’ve pointed out that coming up with a coherent interpretation of something in R1 requires doing some violence to the wording. All I can say is “ya gotta do what ya gotta do”, and refer to the noted compliance expert Lewis Carroll, from his Through the Looking Glass:

"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean- neither more nor less."

"The question is," said Alice, "whether you can make words mean so many different things."

"The question is," said Humpty Dumpty, "which is to be master, you or the words. That's all."

Folks, with CIP-002-5.1 R1 and Attachment 1, the question isn’t what the best interpretation of the existing wording is. Rather, it’s what wording will yield a consistent and logical requirement in place of the – in many places – inconsistent and illogical wording currently in place. It’s just a question of who will be master, you or the current R1 wording. You have to make this requirement work for you, even though that requires ignoring some parts of the wording and reinterpreting other parts.

Tuesday, April 7, 2015

EnergySec’s Survey on NERC’s April FAQs

Steve Parker of EnergySec just let me know that they have put up a survey on NERC’s April CIP v5 FAQs document (available here) that was published yesterday (I wrote about one of NERC’s answers in my post yesterday, being ever-vigilant to bring my readers the latest breaking news).

You may think it’s odd that EnergySec is surveying entities on their reactions to the different answers, when the FAQ document is itself available for public comment. Steve pointed out that public comments by NERC entities are always vetted by management and legal counsel, so they probably don’t reflect what the people in the trenches actually feel. That’s why this is a completely anonymous survey.

Of course, since the survey is anonymous it doesn’t have scientific validity, and NERC can’t be expected to automatically take action based on its results. But I know I’ll be very interested in hearing what people say in it, since I kind of doubt that anyone who doesn’t follow CIP pretty closely will bother to reply. In fact, I hope they’ll do a lot more of these surveys – on the Lessons Learned, on where entities are in their v5 compliance program, etc. The only hope of getting useful information on these topics is to gather it anonymously.

I urge you to take the survey, although you need to first read the FAQ document and ponder what it means.

Monday, April 6, 2015

Tom’s Lessons Learned No. 1: “Adverse Impact”

NERC released a draft FAQ document on April 1. Question 33 reads “When identifying BES Cyber Systems, what is the definition of adverse impact?” This is a serious question, and one I wrote a long post on in December (not that I ever write a short post, of course). I was therefore quite interested in NERC’s response.

And what was their response? It is three sentences, but the substance is that “adverse impact” means “..a negative effect on the reliable operation of the BES..”; in other words, “an adverse impact is a negative impact”. I’m really glad to hear that, since I wasn’t sure whether “adverse” meant negative or positive. I can’t count the number of times I’ve heard a speaker say, “I want to salute Mr. X for the tremendously adverse impact he has had on our community.”

The FAQ response also refers to comments by the SDT on “adverse impact” (last full paragraph on page 60). The substance of the SDT’s comment (which is two sentences) is “..the term adequately conveys the meaning of an impact that has a negative effect on the reliable operation of the BES and therefore does not need to be added to the NERC Glossary of Terms.”

To summarize, NERC gave a non-answer that referred to a non-answer the SDT had previously given; essentially, they both said the meaning of “adverse impact” should be obvious and nothing more needs to be said on this question. Now, I realize this is just a draft and therefore may change before it’s finalized. But the fact that somebody at NERC (or perhaps one of the regions – I know they’re farming these things out now) thought they were providing some sort of service with this “answer” is pretty disheartening. The point of FAQs is not just to list frequently-asked questions, but to answer them.

Here is why the meaning of “adverse impact” is an important question that needs to be addressed: The definition of BES Cyber Asset is “A Cyber Asset that if rendered unavailable…..would, within 15 minutes….adversely impact…Facilities, systems or equipment, which, if destroyed….would affect the reliable operation of the BES.” NERC clearly thinks there can’t be any real question about the meaning of “adverse impact” (and by implication, that the person who asked the question was an idiot who doesn’t understand plain English). I beg to disagree, and since NERC doesn’t want to answer this, I will.

Before I start, I want to point out one place where I and the SDT agree: that “adverse impact” doesn’t need to be defined in the NERC Glossary. This isn’t because I agree with NERC’s assertion that the meaning is obvious – it isn’t at all. Rather, it is because answering the question of what this term means can’t be done in a one- or two-sentence definition. Rather, it requires a description of a procedure that an entity could follow to determine the meaning of “adverse impact”, in the case of a particular Cyber Asset under consideration as a BCA.

In other words, the question is much better answered in a Lesson Learned (or in the CIP-002-5 Guidance and Technical Basis, had the SDT chosen to do that) than in a FAQ, which is geared for much pithier, black-and-white answers. So here is my Lesson Learned on the meaning of “adverse impact”.

I believe the reason both the SDT and NERC punted on answering this question is that there is a myriad of “adverse impacts” that a Cyber Asset could have. These could include impact on frequency or voltage, impact on the ability of controllers to be aware of the status of the grid, impact on the ability of the grid to recover from an event, etc. It’s clearly impossible to enumerate all the possible ways that adverse impact could happen.

So the question should really be “How do I determine whether a particular Cyber Asset’s being rendered unavailable or misused will adversely impact the BES?” As I concluded in my December post on this question (which I updated in January), the best way to think about the impact a Cyber Asset has on the grid is to go back to CIP v1-3.

There, Critical Cyber Assets were defined as those “essential to the operation of” Critical Assets. While “essential to the operation of” is too limiting a definition of “adversely impact” (since I can certainly conceive of Cyber Assets whose misuse would adversely impact a Critical Asset, yet which wouldn’t be considered essential to its operation), the principle is a good one – and one that would have saved much confusion if it had been followed in the drafting of v5. In v5, the equivalent procedure is to first look at the impact of the Cyber Asset on the Medium or High impact asset or Facility it’s located at or associated with. If the Cyber Asset’s unavailability or misuse would have an adverse impact on the operation of the asset or Facility (and thus on the BES itself), then the Cyber Asset should be considered a BES Cyber Asset.[i]

A pretty obvious example is the DCS in a generating plant. If this is unavailable or misused, the plant will probably shut down; the DCS definitely meets the test of potentially having an adverse impact on the BES, because the plant impacts the BES. Now let’s look at a less obvious example, the fire suppression system in the control room of a 500kV substation. The Facility with which this system is associated is presumably the 500kV line[ii]. Most of the time, that line will clearly operate whether or not the fire suppression system is working. However, when there’s a fire, the fact that the system has been disabled will probably mean the line will be opened or even incinerated. This will obviously impact the BES.

It may occur to you to ask, “What if the Cyber Asset in question has only a small impact on the Medium or High impact asset/Facility?” While it may be true that there is a small impact on the asset/Facility, you should remember that what’s important is the impact on the BES if the Cyber Asset is unavailable or misused. If you can prove to the auditor that there is no way the unavailability or misuse of the Cyber Asset – even if it does impact the asset/Facility itself – can result in any impact to the BES, then I think you would be justified in not declaring it a BES Cyber Asset.

But - in case you were thinking it - you can’t go beyond this and say, “The asset/Facility that this Cyber Asset is associated with actually only has a small impact on the BES, and therefore this Cyber Asset isn’t a BCA.” You might have gotten away with that in CIP v3, by making sure your RBAM showed that a particular asset (say a generating station) had minimal impact on the BES; therefore, it wasn’t a Critical Asset and didn’t have CCAs. But in v5, if an asset/Facility meets one of the bright-line criteria, it is inherently considered to have an impact on the BES; you don’t get to second guess that judgment.
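
The procedure described so far can be sketched as a short function. This is my own illustration with hypothetical names, not NERC's wording. Note what is deliberately missing from the inputs: there is no parameter for how big the asset/Facility's own impact on the BES is, because the bright-line criteria have already settled that.

```python
def is_bca_first_pass(meets_bright_line: bool,
                      cyber_asset_impacts_asset: bool,
                      proven_no_bes_impact: bool = False) -> bool:
    """First-pass check; the 15-minute test is not modeled here."""
    if not meets_bright_line:
        # Not a Medium or High impact asset/Facility; outside this sketch.
        return False
    if not cyber_asset_impacts_asset:
        return False
    # The only escape is proving that the Cyber Asset's loss or misuse
    # cannot result in any impact to the BES.
    return not proven_no_bes_impact


print(is_bca_first_pass(True, True))
print(is_bca_first_pass(True, True, proven_no_bes_impact=True))
```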

So this is the first draft of the first Tom’s Lessons Learned document (I realize the language would need to be made more formal were this to be an actual NERC Lessons Learned document. Fortunately, Tom’s Lessons Learned aren’t as formal as NERC’s). I will leave it out for a 30-day comment period; at that point, I’ll consider any comments and then declare it finalized. Now that I think of it, I have already written a number of posts that could well be considered Lessons Learned, including this, this, and this. A little rewriting will turn these into LLs as well (although whether I get to that task remains to be seen; maybe I’ll at least publish a list of LLs I’ve written that have been “disguised” as regular posts). But I’m sure there will be more topics I’ll identify in the coming months that need to be considered as LLs. From now on, I’ll write these as LLs.

I do want to point out that I’m not trying to compete with NERC on LLs. If they are working on an LL, I’m certainly not going to develop my own. However, as I’ve said repeatedly, there are probably hundreds of topics that require LLs that NERC doesn’t even have plans to address; these are what I will focus on.

Maybe I’ll even have a little contest with NERC, to see who can have more finalized Lessons Learned by say the end of this year. What do you say, NERC?

April 17: I just reread this post for the first time since I posted it, and was surprised to see how much my thinking had changed by the time I wrote the second post (on more or less the same topic, except I expanded it). The main difference seems to me - and I don't pretend to be an expert on Tom Alrich's posts, which I find quite long and over-written - that I moved from considering the process of determining adverse impact to be a one-step process, to considering it truly a two-step one.

I say this because, in this post, I considered that it was a foregone conclusion that a BES asset would impact the BES (if lost or degraded), so the only thing that really made a difference was whether or not the Cyber Asset could adversely impact the asset in the first place. In the second post, I opened up to the possibility that, even though the Cyber Asset does impact the asset, it doesn't impair the asset's performance of one or more BROS; therefore, the impact on the asset doesn't translate into an impact on the BES itself. This is why I did a complete 180 degree change on the question of whether the fire suppression system would be a BCA, considering it as such in this post but changing my mind (with prompting from an auditor) in the second post.

My motto has always been "Often wrong, but never in doubt."

[i] And I assert this is in fact what 99% of NERC entities are doing anyway, as they identify their BES Cyber Systems. Instead of following the strict language of Attachment 1 and looking at the impact of a system on the BES itself, they’re looking at its impact on the asset/Facility it’s located at (if High impact) or associated with (if Medium). If you think about it, there’s really no other approach that makes sense.

[ii] Note it is not the substation itself. Substations – and multi-unit generating plants and control centers – don’t meet the NERC definition of Facility (or the definition of Element, which is at the heart of the Facility definition). If you don’t understand that, I refer you back to this post.