Tuesday, July 9, 2013

The Real Cost of CIP Version 4


All opinions expressed herein are mine, not necessarily those of Honeywell International, Inc.

August 12: I just posted my analysis of what FERC's order today - extending the compliance date for CIP Version 4 - means.

I recently received an email from the person in charge of CIP compliance for the generation arm of a large IOU.  This person was very frustrated by the huge costs that had been put on him, his colleagues, and his organization by the fact that CIP Version 4 was approved and looked like it would be implemented – only to be relegated to the history archives when FERC issued their NOPR in April saying they intended to approve CIP Version 5.  I must admit, I had no idea how great these costs were until my correspondence and phone discussion with him.
I will first provide – almost verbatim – this person’s eloquent email describing the problem.  I will then weigh in with what I think are the lessons to be learned from the Version 4 episode (perhaps ‘debacle’ is a better word).

The Email
“For conversation’s sake, let’s go back in time and assume that version 4 is still alive and no decision has been made for v5 to replace v4. Below is a detailed real world challenge that a utility faces. 
Last year out of nowhere, FERC approved v4; we had two years to comply. Let’s start that counter on May 1, 2012. 
First we have to perform an RBAM to determine our scope of work across Fossil. Of course there is activity in Transmission, Gen Dispatch and Distribution. Version 4 estimates alone for the company (including Nuclear NEI stuff) are over $50 million and the number can grow when assessments are complete. 
I have no control over the RBAM.[i] Transmission planning takes care of that. They take about six weeks to complete and the CIP VP signs off. The company will not let me start anything until the RBAM is fully approved in order to prevent regrettable spending. Now we are in mid-June 2012.
I have 10 plants (Critical Assets) that have to come into compliance for v4. Four of them are >1500MW, 4 are black starts, 2 are less than 1500MW.
The next step is to immediately start the assessments (walkdown, inventory and full ESP diagrams). Since this cost is un-budgeted I need emergency funding (I call it “Storm Money”). It is now late August 2012 that I have charge codes to start the assessments.
I then find out that our corporate level sponsorship mandates that we have to be complete with CIP compliance implementation 5 months before April 1, 2014, to allow 5 months of self-auditing at any site that gets full CIP-003 thru CIP-009 (lesson learned from v3 in Transmission). 2 months for data collection and a three month audit phase on all sites. My deadline is no longer April 1, 2014; it is now November 1, 2013. Once we get the assessment money in August 2012, we start an extremely aggressive assessment plan for 10 plants to be done by EOY. I had to use 4 different vendors running parallel assessments and then receive/approve all database and ESP drawings by December 10th. It was very challenging, but we pulled it off. 
Once that is done you have to develop a compliance strategy for each site. This is what we came up with: Air-Gap the black starts (they are small simple cycle CTs).[ii] Air gapping is OK here since they very seldom ever run and the need for PI data is minimal.
Plants >1500MW: We used what we call a “control system isolation” compliance strategy (our RE gave us a verbal approval on this).[iii] Even though this is not full CIP, it requires multiple projects at each plant. New secure physical networks, splitting of control system networks, installing new fuel gas control system in gas plants and air gapping them, creating new Bently Nevada networks etc. Not as big as full CIP, but a lot of work and money. As you know, you don’t just throw this stuff in. A lot of design and approval stages have to be exercised.
The <1500 MW plants get full CIP-003 thru CIP-009 since our CIP department has ruled that data diodes can't be used as a "get out of jail free" card for CIP compliance (remember CAN-0024?) and full air gap is out of the question. We need the PI data on the corporate network. That’s another long story.
Now it’s late December 2012, and we have pulled off what some thought was impossible – All assessments and compliance strategies finished in less than 4 months.
Since our company has grouped all cybersecurity funding into a single budget request (including Nuclear), this cost will by far exceed the corporate $15MM threshold that requires Risk Committee approval. I now spend about 8 to 10 weeks generating extremely detailed and accurate estimates and work scopes for every single piece of hardware, software and services, procedures, etc. that will be procured to implement CIP in Fossil.
Now we are into early March 2013 and we start the un-budgeted funding request, which takes 60 days at minimum. Remember that our 2013 budget season closed before FERC issued their NOPR in April.
By the time budget is approved, we are into May and spinning off about 30 projects at 10 different plants.[iv] They are all doable, but the full CIP-003 thru -009 implementations have to be finished in 5 months, which is almost impossible since the plants have multiple brands of control systems and we need to go platform agnostic – meaning we have to test all these agnostic tools in test beds before we install the tools on the control systems. Not to mention the procedures, work instructions, physical security systems, central monitoring, alerting etc.
So now we already have an ESP specification created to bid the ESP systems out (bid required by Supply Chain). We put it on the street to viable vendors.
While all of this is going on in May and the entire team is working 7 days per week, we find out that FERC remanded the NERC interpretation regarding six wall protection on extended ESP networks. Now this means that all of the control system network fiber that is not in conduit has to be replaced.[v] This fiber has been in the plants for years (whenever the DCS was installed), and when the fiber makes horizontal traverses within the unit it’s in cable trays most of the time and then it goes into conduit once it leaves the tray. Just in one plant alone, this is going to cost $250K and require multiple unit outages. This money wasn’t budgeted, since our budgets were made up before FERC issued their NOPR.
Now we are into late May early June and running into vendor support issues, since everyone else (other utilities) is hammering the vendors also. It looks like there is really no way we can be ready for a full self audit on November 1. We could probably make it, but quality of work would be sacrificed.  Time to re-group.
New plan – Data Diodes in the plants that need CIP-003 thru CIP-009 for version 4 compliance and then start building a world class version 5 program behind the diodes. We would then have plants that are the most secure they can possibly be. No inbound communication and we are doing user account logging, AV/Malware, backup and restore etc. behind diodes. It doesn’t get any better than that. I am a very strong supporter of eliminating inbound connections to devices that generate megawatts or protect equipment. You wanna support me – buy a plane ticket. That’s what jets are for.
This is a very non-granular rough overview. I will spare you the numerous other challenges.
Think about all of this frantic activity and then FERC issues their NOPR in April. It looks like we won’t have to be compliant (with Version 5) until 2015 and we have just spent millions of dollars to stay on schedule. If we had known up front, the cost would have been much less and we could have spread the expenditure across multiple fiscal years, which helps the bottom line in an industry that is suffering financially more than ever before. If you don’t know those details, then you will in the near future. Multiple utilities will be laying off thousands of employees this year and next year to be able to pay for new compliance and for replacing the aging fleet.
(He added this postscript later)
My example of the extended ESP networks that traverse all over a plant is somewhat representative of many other complicated factors that I left out of my example. It would take 10 pages to convey all of the facts.  Using the $250K for the ESP fiber is a good example, though. Even though $250K is a considerable sum of money, it is irrelevant compared to this:
We did not know this cost would be required when providing the accountants with our estimates and scheduling the work. We were in mid-stream creating new networks for CIP when we got the word that this expenditure didn’t make the budget for this year. Now the $250K is un-budgeted and greatly affects our schedule. We can pull and terminate the fiber with units running, but we need outages to switch over to the new fiber on control systems and have to ask for un-budgeted funding, which makes the value much more than $250K since schedule and budget are both affected. 
It's like this – the more proactive and eager we are to comply, the higher the price we pay every time NERC or FERC changes course. Our management starts to become apprehensive about the regrettable spending and then they start waiting until the last minute to release any additional funding (and I don't blame them) - which then leaves people like me with unattainable goals for the next fictitious version.
You don't even want to hear about the procedures issues because I would have to stab myself in the neck to tell that story. We hired a full team of procedures writers for Fossil Plant v4 procedures and then laid them all off 6 weeks later. There are a lot of unnecessary hidden costs in that situation also.”

Tom’s Opinionated Comments
The upshot of the above email is that this entity spent many millions of dollars in an accelerated effort to become compliant with CIP Version 4 on April 1, 2014.  While most of that will still be applicable to Version 5, there was a lot of money wasted because of the hurry-up nature of the project.  There was also a huge human cost, both on the existing compliance team and on others who were hired for V4 compliance and then laid off after the NOPR showed they weren’t needed.
A longer-term effect is that management now is very wary of spending anything for compliance going forward until there is absolute certainty it will be required.  And it is hard to blame them: there has been no absolute certainty with regard to the direction of NERC CIP for about four years, and there won’t be any until sometime in 2014, if then.
Why did this person contact me about this?  And why do I find this such an important topic?  Because this story could probably be repeated across many NERC entities.  Nobody will ever know the total cost of the CIP Version 4 debacle, but it was obviously huge.  I think it’s important to try to identify the mistakes that were made that caused this to happen, so that the entities responsible for those mistakes – primarily FERC and NERC – will be careful to avoid them in the future.
What follows is my highly impressionistic take on how I think all of this unfolded.  Since most of the key decisions were made behind closed doors and never documented, there won’t be any good way to verify some of what I say here.  I was a fairly close observer on things CIP during this entire time, but I would welcome any comments or corrections from others, especially those who were actual participants.  As always, you can email me at tom.alrich@honeywell.com if you want me to publish your comments without attribution.
  1. After CIP Version 3 was approved in 2009, the CSO706 Standards Drafting Team turned their attention to Version 4.  V4 was intended to be the version that would address all of the remaining issues raised by FERC in Order 706 (which approved CIP Version 1 in January 2008).  The team settled on a radically different approach for V4.  There would be just two standards: CIP-010-1[vi] would be for identification of assets and cyber assets in scope (i.e. it would be the equivalent of CIP-002 in Versions 1-3), and CIP-011-1 would encompass everything that had to be done to those assets (i.e. the equivalent of CIP-003 through CIP-009). 
  2. The SDT held a very well-attended workshop in Dallas in May 2010 to discuss the new draft standards.  Their hope was that questions could be quickly addressed, so that Version 4 could be balloted and approved by July 2010.  At the workshop, there was lots of opposition to many areas in Version 4, a lot of it simply due to the novelty of many of the concepts.  It was clear that getting Version 4 approved would be a long slog, with probably multiple drafts and ballots required.
  3. At this point, the idea somehow came up within NERC that a new CIP version needed to come out in 2010, no matter what.[vii]  This was due to the perception that there was strong sentiment at FERC and in Congress that a new version was needed right away; so NERC couldn’t afford to wait the year or two that it would take to develop the radically new version that everyone knew was really required (in hindsight, this perception on NERC's part was probably wrong).
  4. The main reason that Congress was so upset about CIP was the fact that (in Congress’ opinion) very few assets – other than control centers – had been designated as Critical Assets, due to the fact that Versions 1-3 allowed the entity to develop its own Risk-Based Assessment Methodology (RBAM) for identifying them.  CIP-010 was going to address this issue with a set of “bright line” criteria (BLC) that would force NERC entities to designate certain assets as critical if they met the criteria (e.g. power plants over a certain MW threshold, although it was more than 1500MW at the time).  The thinking went: Why don’t we just change CIP-002-3 to replace the RBAM with the BLC, and leave the other standards (CIP-003 through -009) exactly the same as they were in Version 3?  This would be much easier to get approved, and might well result in a new CIP version sent to FERC in 2010.  Once this happened, the SDT could then turn their attention to the “real” new CIP version, Version 5.
  5. The rest of 2010 was thus spent drafting and balloting Version 4.  It was approved by the NERC Board of Trustees at the end of December and submitted to FERC for their approval in February 2011.  The SDT turned their attention to Version 5.
  6. In 2011, the SDT started making good progress on Version 5.  In fact, they became optimistic that they would get it right on the first draft – it would be balloted by the end of the year, and hopefully approved and sent to FERC in early 2012.  The question then arose: Why should we even bother with V4?  Let’s just go to Version 5!
  7. However, FERC surprised the industry by issuing a NOPR in September 2011, saying they intended to approve Version 4.  Since the optimism on CIP Version 5 was fairly high at that point, I (and others) thought that FERC was just bluffing.  I thought[viii] they were essentially using Version 4 as a club to hold over NERC’s head: “Either you approve Version 5 quickly, or we will make you comply with Version 4 and then Version 5.”
  8. But it turns out FERC wasn’t bluffing.  They actually approved Version 4 in Order 761 in April 2012; the compliance date would be April 1, 2014.  I (and others) quickly changed my tune: Since CIP Version 4 was now the law of the land, and since Version 5 was struggling a lot on the road to approval,[ix] NERC entities now needed to concentrate on getting ready for Version 4 compliance.  And entities like the one above started to do that, leading to the debacle so eloquently described.
Up until FERC approved V4 in April 2012, what mistakes were made?  I would say the biggest mistake up to that point was NERC’s, in panicking in 2010 with the idea that a new CIP version just had to come out that year.  But because of FERC's approval of Version 4, I now think FERC made the bigger mistake.  

I say this because I don’t think they were really serious about Version 4 when they issued Order 761 in April 2012.  They could see Version 5 was struggling for NERC acceptance, and they were afraid that NERC had gone back to thinking V4 would never be approved (as irresponsible bloggers like myself were saying in late 2011 and early 2012).  In FERC’s view, NERC entities were squabbling and nit-picking Version 5, now that they thought they didn’t have to worry about V4.  So Order 761 (approving V4) was a real blow to NERC’s noggin, saying “We’re very serious about this.  Get to work now on approving V5 so you can avoid V4”.  In fact, in Order 761 FERC set a deadline of March 31, 2013 to receive the NERC-approved Version 5.
The reason this could still be called a warning blow was because, if NERC got their act together and approved Version 5 on time, then Version 4 would never come into effect (since the V5 implementation plan just required that V5 be approved by FERC by April 1, 2014 for this to happen).  I think FERC felt safe in believing there would be little if any cost incurred by entities in preparing for V4 compliance, as long as the V5 approval process moved along in 2012.
The V5 approval process did move along, but – as you can see above – many entities felt they simply couldn’t take the chance that V4 would not come into effect on 4/1/2014.  And they spent a lot of money and effort rushing for compliance.[x]  Meanwhile, a huge pressure built up on FERC to make clear their intentions regarding Version 5; they did that in the NOPR issued this past April.  But there had been an entire year during which the only official word from FERC was that Version 4 was coming into effect in 2014; NERC entities had to take the most prudent course and prepare for Version 4.
To summarize, the costliest mistake regarding CIP Version 4 (meaning costliest for NERC entities) was FERC’s approval of V4 in April 2012.  I believe they issued Order 761 mainly to prod along NERC’s approval of CIP Version 5, not because they really intended for it to come into effect.  But they didn’t consider that the industry wouldn’t necessarily recognize this bluff, and in any case couldn’t afford to take the chance that it was a bluff.  We’ve just seen one example of the damage that caused.

P.S. Be sure to sign up for Honeywell’s upcoming webinar with EnergySec, “Covering your Assets in CIP Version 5”.  You can sign up for it here.  The webinar is on August 21st at 10:30 CDT.  If you can’t make the webinar but want to see the video, sign up anyway.  You’ll get the link to the video as soon as it is posted after the webinar.

P.P.S. (July 11) After I published this post, an Interested Party emailed me with two sets of questions for the entity who wrote most of this post.  I facilitated a correspondence between them (without revealing either's identity to the other).  The result was quite interesting, so I decided to make this a separate post for everyone to see.  You can find that post here.

P.P.P.S. (July 22) Another exchange between my email correspondent and an interested party (different one) took place on LinkedIn recently.  I have posted that exchange here.









[i] (Tom speaking here) The use of “RBAM” here isn’t completely correct, since CIP Version 4 didn't require developing a Risk-Based Assessment Methodology, as did Versions 1-3.  In theory, the bright-line criteria in Version 4 are supposed to be so easy to apply that a third-grader could sit down for half an hour and identify all of the entity’s Critical Assets.  In practice, that was not the case, which is why the Transmission Planning department of this entity spent six weeks applying those criteria.  I believe the same will be the case for Version 5, and there will need to be guidance provided on actually applying the V5 criteria (CIP-002-5 Attachment 1).

[ii] Of course, air-gapping plants allowed the entity to claim no Critical Cyber Assets under CIP Versions 1-4.

[iii] The author of the email is referring to the provision in Version 4 that exempts cyber assets at >1500MW plants from being CCAs, if they don’t individually affect more than 1500MW.  This provision lives on in Version 5 as Criterion 2.1 in CIP-002-5 Attachment 1.

[iv] I asked this person why they continued full speed on their V4 compliance projects, when FERC had made it clear in their NOPR that they didn’t intend to let V4 come into effect.  He pointed out that this statement couldn’t be relied on 100%, and the consequences would be disastrous if they were caught completely unprepared and V4 did come into effect next April.  It’s hard to argue with that, although see my post on the V5 transition for more perspective on this question.

[v] Since the requirement for a six-wall border goes away in Version 5, one could say that this work should have been halted when the NOPR was issued in April.  However, there are two considerations: 1) FERC may still require some specific cabling protection in the next version, and 2) this entity had already decided they couldn’t afford to take the chance that Version 4 would not become enforceable on April 1, 2014.

[vi] The suffix was “-1” since this was the first CIP-010 standard, but this was still called CIP Version 4 since it was the fourth version of CIP.  Of course, this early CIP-010 (and CIP-011) never was approved by NERC, and now the CIP-010 and CIP-011 in Version 5 also have the “-1” extension.

[vii] I was told that a group of Midwestern IOUs reached this conclusion and drove the decision to push for an immediate Version 4, but it doesn’t really matter.

[viii] I wasn’t blogging at the time, but I put out an open letter that you can still download here.  If you don’t want to authenticate, you can email me at tom.alrich@honeywell.com and I’ll send it to you.

[ix] Version 5 was overwhelmingly rejected on the first ballot in December 2011, and a little less overwhelmingly rejected on the second ballot in May 2012.  The third draft was approved on the third ballot in October 2012.

[x] Some still may be preparing for V4 compliance on 4/1/2014, including the entity discussed above.  However, I think such entities should focus on compliance tasks that will be equally applicable to Version 5 as to Version 4, to avoid stranded costs.  I have suggested such tasks in this post.

2 comments:

  1. Some of you may have noticed some ugly HTML code in place of the footnote numbers; I'm guessing you may be IE8 users. Blogspot made some sort of software changes over the July 4 weekend, and IE8 no longer worked in various ways.

    In fact, I couldn't do this post in IE8 (and it's no longer supported), so I switched to Google Chrome (Google owns Blogspot, by the way). I think the fact that I made the post in Chrome led to the HTML code being visible in IE8 (you'll notice it isn't on previous posts).

    The upshot is that you should go to Chrome (or Firefox, or IE9, although if anyone has problems with those browsers I'd like to hear about it). Face it, Google is going to own all of us and everything we do sooner or later. You just have to accept that.

  2. An interested party emailed me with this comment (actually, a series of rhetorical questions):

    "Are entities having to implement hurry-up band-aid solutions because of the time constraints? Are we substituting compliance for operability (as in air gapping) because we do not have time to do something more carefully thought out and appropriate? Anyone communicating this concern to FERC, who wants to shorten the V5 compliance window? What impact on reliability will we suffer as everyone takes unplanned outages to cut over to the CIP-compliant infrastructure?"

    As for communicating this concern to FERC, consider it done (since I know some FERC staff read this blog). I agree they should be very careful about shortening the Version 5 compliance time frame, as they hinted they might do in the V5 NOPR.
