Thursday, December 11, 2014

Wrapping Up Serial


I’ve now done four posts (starting with this one) on the question of whether substation devices like relays, connected serially to an intermediate device that is itself routably connected to a control center or some other location, can be said to participate in External Routable Connectivity (ERC).  They have all been interesting posts to write, and have attracted a good amount of attention. 

Of course, I don’t have a definitive answer to this question, nor did I ever say I did.  In fact, the first post was really number five in my series of posts on “Roll Your Own”, in which I discuss how NERC entities, faced with the many missing definitions and interpretations in CIP v5, need to develop their own definitions and interpretations – faute de mieux as the French say.  But I did articulate what I thought might be the best approach; or at least I thought I did.

However, two events today led me to believe that I need to do another post to clarify – for ages to come – what my position on this question is.  One was that I had an email conversation in which it was apparent that I hadn’t made my opinion clear – and that’s not surprising, since I’d inexplicably spread my opinion over two posts, and had myself gone back and forth on one issue.

The other event was the webinar today by the CIP V5 Revisions SDT (which should be called the CIP v6/v7 SDT, of course – in fact, they implicitly admitted today that it would have been better just to bite the bullet and say they were drafting CIP v6 to begin with, rather than pretend they weren’t, which has now led to there being three versions to comply with simultaneously).  This webinar ended up shedding some light on this issue, even though it technically has nothing to do with the new CIP versions.

So what is my opinion?  In the first post, I brought up the Critical Cyber Asset Identification guidelines published in 2010.  Until very recently, I interpreted the discussion of this issue in that document to mean that the relays in question would definitely have ERC. However, I said a customer had recently convinced me that this view was too simplistic, that what was important was whether the relay actually communicated over a routable connection with devices outside the substation.  And sure enough, the CCA guidelines actually say that ERC is only present “if a routable connection is used to communicate outside the preliminary ESP.” 

But I didn’t really explain why I thought this was the right approach until the second post, where I said “One distinction that seems to be important is whether the intermediate device simply translates the incoming routable messages into a serial protocol for the end device, or whether the intermediate device does something like polling of the end device, meaning that incoming routable messages aren’t passed on in any way.  A terminal server might be an example of the first device, while an RTU might be an example of the second.”
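To make that distinction concrete, here is a minimal sketch (my own illustration in Python; the class, field and device names are hypothetical, not anything defined in CIP v5) of the criterion just quoted: the serial end device participates in ERC only if the intermediate device actually forwards routable messages to it.

```python
# Hypothetical model of the pass-through vs. polling distinction discussed above.

from dataclasses import dataclass

@dataclass
class IntermediateDevice:
    name: str
    # True if incoming routable messages are translated and forwarded to the
    # serial end device (e.g. a terminal server); False if the intermediate
    # device polls the end device itself, so routable messages are never
    # passed on (e.g. many RTU configurations).
    forwards_routable_messages: bool

def end_device_has_erc(intermediate: IntermediateDevice) -> bool:
    """Under the approach described above, the serial end device participates
    in ERC only when the intermediate device forwards routable traffic to it."""
    return intermediate.forwards_routable_messages

terminal_server = IntermediateDevice("terminal server", forwards_routable_messages=True)
rtu = IntermediateDevice("RTU (independent polling)", forwards_routable_messages=False)

print(end_device_has_erc(terminal_server))  # True: the relay behind it has ERC
print(end_device_has_erc(rtu))              # False: the relay behind it does not
```

Of course, real configurations are messier than a single boolean, but this captures the question an entity would document: does the intermediate device pass routable messages through, or not?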

In that post, I then got into a diversion where I thought the fact that it could be possible to hack the RTU, then attack the relay through that, meant that really the relay did participate in ERC after all.  But that opinion was demolished this week by Steve Parker, who said in the fourth post that this was really a nonsensical idea.  And he was right.[i]

However, this diversion has nothing to do with my original opinion: While I’m not a CIP Doctor and don’t play one on TV, I do believe that the approach I described originally is good.  And that was confirmed by the second event today, the SDT webinar.

A good part of that webinar was devoted to the new requirement for Low impact assets, CIP-003-7 R2, and the related new terms LERC and LEAP (and if you haven’t been introduced to these two gentlemen and don’t really know much about the new requirement, I suggest you go to the SDT’s web site).  The best part was when Jay Cribb of Southern Company, a key member of the CIP v5 SDT (where he was in charge of CIP-005 and perhaps more) as well as the current SDT, spent some time going over the excellent illustrative diagrams found in the Guidelines and Technical Basis section of the latest draft of CIP-003-7.

It was his discussion of the diagram on page 36 that I found particularly interesting.  The diagram shows a device that connects serially to a protocol converter that converts the routable external communications to serial; it makes the point that, because of this, the device is “directly addressable from outside the asset” and therefore participates in ERC[ii].  So this criterion – whether the device is addressable from outside – might be as important a criterion as the one that I cite: whether the device communicates outside of the asset (substation).  In practice, they may well amount to the same thing; answering that question is above my pay grade.

The moral of this story?  Given that you may have decided you can’t wait any longer for NERC to address this question and you need to roll your own answer, you could do a lot worse than to adopt one or both of these two criteria as determinants of ERC – but whatever you end up doing, be sure to document it!


The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] Steve evidently has legions of fans, since they’ve been beating down the doors of my blog to get to this post, despite the fact that I hadn’t publicized it at all.  I’m now planning a new series of Steve Parker posts: The Steve Parker Diet, Steve Parker’s Favorite Christmas Recipes, Steve Parker Discusses Parenting, Steve Parker Decries Congressional Dysfunction….and more.  I know a good thing when I see it.  Stay tuned. 

[ii] Actually LERC, the Low impact version of ERC.

Tuesday, December 9, 2014

Steve Parker Corrects Me


I’m fortunate to have friends who read my posts and immediately point out where I’ve gone wrong.  I certainly welcome this, since I’ve had my share of being wrong and I don’t like these things to stand uncorrected (although I do always leave the original post as is, and simply post the correction.  The alternative approach – simply wiping away any traces of ever being wrong – reminds me of Winston Smith’s job as the novel 1984 opens: excising all references in the newspaper archives to political figures who have been liquidated for running afoul of Big Brother.  They have become “unpersons”, one of the many chilling words the regime has invented to give a feeling of legitimacy and normalcy to its dirty work.  Of course, we all know the truth is sacrosanct to our governments nowadays, so this doesn’t happen in real life - oh wait, there's something coming on the TV about a new Senate report...).

Steve Parker of EnergySec pointed out that the central assertion of this recent post on serial is simply wrong.  The post was discussing devices (like relays) that are connected using a serial protocol like Modbus to an intermediate device like an RTU in a substation; this intermediate device is then connected routably to the outside world.  The question was whether the relays have External Routable Connectivity (ERC), or just the intermediate device does.

I had pointed out in the previous post that an entity asserting no ERC in a case like this might have to demonstrate that there weren’t documented ways that an attack could be mounted on the relay, running through the RTU.  That had resulted in someone writing in to tell me of vulnerabilities that allowed exactly that sort of attack to come through a particular manufacturer’s RTU.[i]  I stated that the availability of this attack vector probably meant that the relay should be considered as participating in ERC.

However, Steve set me straight.  He said, “That is not a relevant argument. If we go down the road of device A being hacked exposing device B, the whole model blows up since anything and everything can potentially be hacked to provide access to something else. The evaluation (Tom’s note: He means the evaluation of whether the relay has ERC or not) is based on the potential impact of individual devices/systems being misused. Such impact cannot depend on the misuse of additional devices. In other words, the impact cannot be transitive or cumulative.”

And now that I think about it, he’s definitely right.  I should never have suggested that the fact that one device can be attacked through another makes the former subject to the same vulnerabilities as the latter.  If this were the case, then every computer in the world would have to be considered just as vulnerable as the least-protected computer sitting on the Internet.  And that’s not right, IS IT?
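Steve’s point is essentially a reductio ad absurdum, and a toy model makes it vivid.  In the sketch below (my own illustration; the topology and device names are entirely hypothetical), if exposure were transitive across connections, then every device reachable from an internet-facing node would inherit that node’s exposure:

```python
# Toy illustration of why transitive impact "blows up the whole model":
# under a transitive reading, exposure propagates to everything reachable.

from collections import deque

# Hypothetical connectivity: internet -> firewall -> RTU -> relays, plus an
# engineering laptop with no path from the internet.
links = {
    "internet": ["firewall"],
    "firewall": ["RTU"],
    "RTU": ["relay_A", "relay_B"],
    "relay_A": [],
    "relay_B": [],
    "laptop": [],
}

def transitively_exposed(start: str) -> set:
    """Everything reachable from `start`, i.e. the set that would count as
    'exposed' if misuse of one device implied misuse of everything it touches."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Both relays end up "exposed" to the internet; taken to its conclusion, any
# reachable device is as vulnerable as the least-protected device upstream.
print(sorted(transitively_exposed("internet")))
```

This is exactly why the evaluation has to be based on the potential impact of the individual device being misused, not on what it happens to be connected to.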


The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] I immediately notified the manufacturer of the vulnerability, through another organization that I knew had a close relationship with them.

Sunday, December 7, 2014

Roll Your Own Part VI: “Affect the BES”


This is yet another in a series of posts on questions of the meaning of particular wording in CIP version 6.3940 – questions for which there is not now and perhaps never will be any definitive guidance from NERC or the NERC regions.  In other words, this is just another case (number 6 of what I project to be about 5,246 cases) in which entities need to “roll their own” interpretation.  For a full description of the implications of what it means to roll your own, I refer you to the first three posts in this series, starting with this one[i].

I recently prepared an up-to-date summary of all the steps required to identify and classify BES Cyber Systems for a customer, and then called him to review it.  As I’m sure you know by now, one of the first required steps is to identify Cyber Assets[ii] that meet the definition of BES Cyber Asset (although the need to do this is never explicitly stated in CIP-002-5.1 R1 or in Attachment 1.  This is because the requirement was evidently written by a haiku master for whom compression of meaning into very few words was the primary criterion for a good requirement.  CIP-002-5.1 R1 may win a Pulitzer Prize for Poetry, but it won’t win a prize for regulatory clarity, if there is such a thing):

 A Cyber Asset that if rendered unavailable, degraded, or misused would, within 15 minutes of its required operation, misoperation, or non-operation, adversely impact one or more Facilities, systems, or equipment, which, if destroyed, degraded, or otherwise rendered unavailable when needed, would affect the reliable operation of the Bulk Electric System.[iii]

As we got to this step, my customer asked, what does “affect the reliable operation of the BES” mean?  And to be honest, I didn’t have an answer – except to say there’s no official guidance that I know of on this question, at least for BES Cyber Assets.

However, there is guidance on this point for BES Cyber Systems.  It is found in the long discussion of the BES Reliability Operating Services (BROS) on pages 17 - 22 of the Guidelines and Technical Basis section of CIP-002-5.1.  It is a good discussion, and while there are still many questions to be answered about how the BROS are applied in the real world, using them can certainly be helpful in identifying BES Cyber Systems.

Because the BROS analysis really is meant to apply on the system level, it should be applied as part of the "top-down" approach to BCS identification.  You can read about that approach in this post on my overall "methodology" for CIP-002-5.1 R1 compliance, or in this post which specifically compares it to the "bottom-up" approach.

When I originally wrote this post (the one you're reading.  I'm updating it on Jan. 27), I thought that all entities should apply both the top-down and bottom-up approaches to identify BES Cyber Systems.  Because I thought that, and because the BROS are much more suited to the top-down approach, I said there was no point in entities re-doing the BROS analysis as part of the process where they identify Cyber Assets that meet the BCA definition - i.e. the bottom-up approach; they would presumably have already used the BROS in their top-down analysis (which should always come first).  So in my original version of this post, I didn't recommend the BROS be used to help determine what "affect the BES" means for particular Cyber Assets.

However, I have now changed my opinion that all entities should perform both the top-down and bottom-up analyses to identify BCS.  I do still recommend this for control centers and generating stations (except plants that meet Criterion 2.1, which need a different methodology altogether - one I hope to write about one of these days).  But for substations, I no longer think the top-down approach provides any value; the bottom-up by itself is sufficient.

Given this change, my previous argument that entities shouldn't use the BROS in identifying Cyber Assets that meet the BES Cyber Asset definition is no longer valid; I do think it is helpful to do so, although I don't think you should only consider the BROS when identifying Cyber Assets that affect the BES.  Specifically, I think the entity should a) read through the discussion on pages 17 - 22 in the Guidelines and Technical Basis section of CIP-002-5.1; b) identify the BROS that apply to the substation being examined, or specifically to the Facility within that substation that the Cyber Asset (potential BCA) is associated with; and c) determine whether the Cyber Asset actually contributes to fulfilling one or more BROS.  If it does, and if its loss, etc. would result in a BES impact in less than 15 minutes, then the Cyber Asset is a BCA.

However, the BROS don't tell the whole story when it comes to identifying Cyber Assets that have a sub-15-minute impact on the BES.  Here are two examples of Cyber Assets whose loss, misuse, etc. can definitely affect the BES within 15 minutes, yet which would not be identified as BCAs at all if only the BROS are used in the bottom-up approach.  In SPP’s excellent BES Cyber System Identification workshop (which I attended in March in Dallas, and which they re-ran in June), they discussed an environmental monitoring computer (CEMS) in a large plant subject to criterion 2.1.  They hypothesized there was a rule at the plant saying that the plant had to be tripped offline if there were an environmental excursion that lasted more than ten minutes.  So someone wanting to bring the plant down would only have to hack into the CEMS computer and display false data indicating there had been an excursion; the plant would immediately be tripped.  This computer clearly has an impact on the BES within 15 minutes, but it doesn’t fulfill a BROS at all; environmental compliance isn’t a reliability function (essential though it may be for the owner of the plant).

Here’s another example in a substation.  Many substations have fire suppression systems protecting their control room; there is usually a computer controlling this system.  If say a disgruntled employee (or an outside hacker) uses the computer to disable the fire suppression system, it won’t be able to control a fire when it breaks out – meaning the lines, transformers, etc. controlled by the relays in the control room may be brought down in the event of a fire there (of course, here is where the words “when needed” in the BCA definition come into play.  The hack of the system doesn’t produce its effect until a fire breaks out and it’s not suppressed.  But the interval of time between when the fire breaks out and isn’t suppressed and the time when the BES is impacted will most likely be less than 15 minutes, so the computer is a BCA).  As with the CEMS system above, fire suppression isn’t a reliability service and this computer wouldn’t be identified as a BCA if only the BROS were used to define “affect the reliable operation of the BES”.

I’ve just said you shouldn’t base your BCA identification solely on the BROS.  Here’s another thing you shouldn’t base it on: network connectivity (and I want to thank Steve Parker of EnergySec for pointing this out to me).  I have heard it said that some people – perhaps even in NERC and the regions – are pointing to a Cyber Asset’s connectivity as a measure of its impact on the BES (for some Cyber Assets).  The reasoning they use is probably something like, “This Cyber Asset doesn’t itself have a direct impact on the BES, but it is routably connected to a number of other Cyber Assets.  Collectively, loss of most or all of these Cyber Assets, through say a DoS attack launched from any one of them, would affect the BES within 15 minutes.  Therefore the Cyber Asset should be a BCA and part of a BCS, and the rest of the Cyber Assets on the network should be Protected Cyber Assets.”

This is of course a perfectly good argument for securing this network.  The network should be deemed a “critical network”, and the devices on the network should be protected to a higher degree than devices on a different network that isn’t “critical”.  Moreover, I’ll stipulate that the SDT should have included some provision in CIP v5 so that an entire network would have to be declared a BCS, if it could have this collective impact.  However, they didn’t do this; CIP-002-5.1 R1 says nothing about “critical networks”, only BES Cyber Assets and BES Cyber Systems.  So don’t let anyone tell you that network connectivity has anything to do with whether a Cyber Asset is a BCA or not – it doesn’t.[vi]

There is another way you can define "affect the BES".  To describe this, I'd like to go back to the days of yesteryear – 2008, when dinosaurs still roamed the earth and CIP v1 was being implemented.  My readers may have heard their tribal elders talk about how Critical Cyber Assets were identified in v1-v4.  It was much simpler then.  You first identified “big iron” – your Critical Assets (substations, generating plants, etc).  Then you identified Critical Cyber Assets that were “essential to the operation of” these Critical Assets. End of story.[vii]

I suggest that a similar methodology be used to identify BES Cyber Assets – i.e. to implement the bottom-up approach.  You need to look at the asset/Facility with which the Cyber Asset you’re evaluating is associated; this asset or Facility will be the “subject” of the bright-line criterion that made you consider this Cyber Asset as a potential BCA in the first place (since you only have to identify BCAs/BCS for assets/Facilities that meet one or more of the High or Medium criteria).[viii]  Then you need to identify Cyber Assets that are “essential to the operation of” the asset/Facility in question (and you also need to consider the effects of misuse, availability “when needed”, etc. – since the BCA definition includes more than the CCA “definition” did).[ix]
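One way to structure this bottom-up test is as a simple two-part check (a sketch of my own; the field names are illustrative, and the only number taken from the standard is the 15-minute window in the BCA definition):

```python
# Sketch of the bottom-up BCA test described above. Only the 15-minute
# threshold comes from the BCA definition; everything else is illustrative.

from dataclasses import dataclass

@dataclass
class CyberAsset:
    name: str
    essential_to_facility: bool   # "essential to the operation of" the asset/Facility
    impact_minutes: float         # time from loss/misuse to impact on the asset/Facility

def is_bca(asset: CyberAsset) -> bool:
    """A Cyber Asset is a candidate BCA if it is essential to the operation of
    the in-scope asset/Facility AND its loss, degradation, or misuse would
    have an impact within 15 minutes of its required operation."""
    return asset.essential_to_facility and asset.impact_minutes <= 15

protective_relay = CyberAsset("protective relay", True, 0)   # immediate impact
plant_historian  = CyberAsset("historian", False, 60)        # not essential, slow impact

print(is_bca(protective_relay))  # True
print(is_bca(plant_historian))   # False
```

The hard part, of course, is not the check itself but filling in those two inputs honestly for each device, which is exactly what your documented methodology has to cover.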

Of course, there is still the question of what “essential” means. There isn’t one definition that applies to all types of assets/Facilities; rather it has to be defined specifically for each type.  Fortunately, many NERC entities already have a lot of experience with this question, since it was the foundation of their v1-v3 compliance efforts.  And if you haven’t had to comply with CIP before v5, there are organizations like mine that can help you do this.

To summarize this post:

  1. Every entity needs to develop and document a methodology for identifying BES Cyber Systems, then apply it in their environment.  There are two main approaches in this methodology: "top-down" and "bottom-up".
  2. The bottom-up approach to identifying BES Cyber Systems requires the entity to develop its own "definition" of what the words "affect the BES" mean in the NERC definition of BES Cyber Asset.  (I think bottom-up should be used by all entities, except for a generating station meeting Criterion 2.1.  For substations, bottom-up is the only approach needed; for control centers and non-2.1 generating plants, a combination of both approaches is needed.)
  3. I haven't provided a definition for you, but I have discussed a methodology that I think is more than adequate.  It involves two steps: a) Using the BROS and b) considering whether a Cyber Asset is "essential to the operation of" the asset or Facility it's associated with.
  4. Step 3a) only makes sense for assets that haven't already applied the top-down approach - that is, for substations.  Step 3b) makes sense for all assets.
  5. None of what I say here is actually mandated by the requirements.  What is mandated is that you have some methodology for identifying and classifying BES Cyber Systems.  I'm suggesting the material in this post can form part of that overall methodology.  To see my suggestions for the full methodology, see this post.


The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] I’m saying you should read the first three posts because they all provide some general guidance on how to roll your own definitions and interpretations.  Of course, the others are worth reading as well, but mainly because of the specific issues they address, not as a discussion of rolling your own in general.  It would be nice to include all of these in a big, comprehensive book on CIP v5 (I’ve certainly written enough blog pages on v5 to fill a book or two), but I see no way to do that for probably a year or more.  I’m discovering all of the stuff I’m writing about in the blog in real time, and I don’t expect that situation to change very soon; there’s no way I could write a book now without having to update it every month or so. 

[ii] Of course, identifying Cyber Assets – i.e. “programmable electronic devices” – is really the first step in the BCS identification and classification process.  But a previous step to that is rolling your own definition of “programmable”, as discussed in the first post in this series.

[iii] Of course, there’s more to the definition than this.  But this is the operational part.

[iv] I’ll give you notice here that this session with the customer led me to a much broader realization: It is simply not possible to come up with any single methodology for complying with CIP-002-5.1 R1, even if you leave spaces for the various “roll your own” issues.  Therefore, I believe this requirement is simply not auditable, in the sense that PV’s can be given for failure to follow the “proper” methodology.  The best an auditor can do is determine whether the entity has made a good faith effort to gather all the available information and come up with a methodology that takes as much of that into account as possible.  If they have, there can’t be any PVs, since no penalty for violating this requirement would ever be upheld in a court of law if it were challenged; there are simply way too many holes and inconsistencies.   As you can imagine, I’ll have more to say on this in the near future.

[v] In fact, the bottom-up approach is the one that is “required” in R1 although, as in the case of BCA identification, the need to do this is never explicitly stated in the requirement.  It’s another example of a point I plan to make in the near future – that R1 should be read more as a poem than a regulatory requirement.  I’m not kidding about this, either.

[vi] I did also hear that someone had been told by an auditor that the fact that a Cyber Asset had certain known vulnerabilities meant that it was a bigger threat to the BES and should be considered a BES Cyber Asset.  Again, the threat posed by a device has nothing to do with whether it’s a BCA.  It should of course be protected (and if it’s networked with at least one other BCS, it will be a PCA and will therefore have to be protected in the same way as BCS).

[vii] I have to admit that there was a great oversimplification for substations in the v1-v3 approach, since it considered the entire substation to be the Critical Asset, not the individual Facilities in it – lines, transformers, etc.  This was based on an improper analogy to generating stations and control centers.  The latter two assets both have a defined function: generating power and controlling the grid, respectively.  But a substation is simply a collection of equipment with a fence around it.  It’s the Facilities (500kV lines, etc) at the substation that should be judged critical or not, not the entire substation.  In other words, when it comes to substations in v1-v3, CCA’s should have been defined as “essential to the operation of” the Facilities located at the substation, not the substation itself.  This seems to have been done when the bright line criteria were introduced in v4, although since v4 was never implemented these questions were never really brought out – at least I certainly never realized there was a difference between a Transmission Facility and a substation when v4 was being discussed.

[viii] Since some of the Medium criteria refer to Facilities, not assets, you can’t just say the criteria refer to assets – as I used to do (and as it seems 95% of people in the NERC community - including most of the NERC and Regional Entity people who should know better - say).  So when asked what the criteria refer to, I have been saying they refer to their subjects, which can all be described as either assets or Facilities - with the further complication that, in some cases, the word “Facilities” isn’t used in a criterion even though the subject is a Facility (e.g. in criteria 2.2, 2.9 and 2.10).  However, a customer pointed out to me that the criteria aren’t written as complete sentences, so technically they don’t have a subject. He was right, of course, but I’m simply going to pretend I didn’t hear that.  I’ll still refer to “the subjects of the criteria”.

[ix] You’ll note I’m substituting “impact on the asset/Facility” for “impact on the BES” here.  I have always believed that the idea that Cyber Assets have a direct impact on the BES itself is a real stretch (I argued that in this post in 2011, as well as in person and by email with SDT members that year) – and whether or not this is possible, it is certainly nothing the entity can determine, except in a few cases.  What you can determine is the impact of the Cyber Asset (potentially a BCA) on the asset/Facility it’s part of, just like you did in v1-v3.  Once you know whether the Cyber Asset impacts the asset or Facility, you’ll know whether or not it impacts the BES.  This is so because the bright-line criteria determine whether an asset/Facility has a High, Medium or Low impact for you; you don’t have to determine this on your own, as you did in v1-3 with the RBAM (I again thank Steve Parker for pointing this out).  That’s why I’m saying here that you should look at the impact of the Cyber Asset on the asset/Facility it’s associated with, not on the BES directly.

Wednesday, December 3, 2014

Sean McBride Weighs in on Serial


I was pleased to receive an email from Sean McBride, co-Founder of Critical Intelligence, about my most recent post on serially-connected BES Cyber Systems in Medium impact substations (this post was a follow-up to the first post, and his comments apply equally to that one).  I’ve always had tremendous respect for Sean, who has really done great service to the electric power industry through his company.  I recommend you spend a couple minutes on his website to see if his services would benefit your company, by keeping you abreast of new vulnerabilities affecting the particular control systems in your environment.

There was an added “bonus” in his email: He mentioned that he is a blogger, too, and provided this link.  I went to his blog, and was really impressed by the very interesting (and so far unknown to me) information he presented, as well as his very fluid writing style.  I also really liked the brevity of his posts - which are very different from those of a certain CIP v5 blogger, who writes seemingly interminable pieces that require the better part of a day to get through - and read about 6 or 7 of them in my first shot.  I’m definitely putting Sean’s blog on my bookmarks bar and will go to it regularly.

We sent about four emails back and forth, but here is the principal point Sean made: Probably the biggest “vulnerability” for serially-connected devices in substations is not that an attacker can hack the device coming in through an RTU that communicates routably with the outside world.  Rather, it is just the reverse: that someone knowledgeable will physically break into the substation, hack into the serial device (e.g. a relay) and use the DNP3 vulnerabilities[i] demonstrated in the last couple of years by Adam Crain and Chris Sistrunk to attack the EMS system itself.

I agreed with Sean that this is a real concern from a cyber security point of view.  But I also pointed out that I don’t see this as something that CIP directly addresses, other than by the fact that the substation needs to be physically secured under CIP-006.  There is nothing in CIP (any version) that says that communications protocols need to be secured, especially for communications between the substation and the EMS.[ii]

While Sean understood my point, he did add "It is super important that electric system operators understand that if an adversary gets access to an insecure-by-design level 1 protocol (DNP3, Modbus, etc -- serial or TCP), then it is ‘game over’ -- the attacker can do what he wants.”  And there you have it: Whether or not CIP requires it, all Transmission Owners/Operators should look into securing the serial protocols they use to communicate with substation devices, both from the EMS and within the substation itself.  And this applies to more than just DNP3 – also to other serial protocols like Modbus.
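Sean’s “insecure by design” point is easy to see by looking at what a complete, valid Modbus RTU request actually contains: a device address, a function code, data, and a CRC, and nothing else.  The CRC is an integrity check, not a secret, so anyone who can put bytes on the wire can issue commands.  Here is a sketch (the specific slave address and coil number are hypothetical; the CRC-16/Modbus algorithm is standard):

```python
# A complete "Write Single Coil" Modbus RTU request. Note what is absent:
# there is no authentication field anywhere in the frame.

def crc16_modbus(data: bytes) -> int:
    """CRC-16/Modbus: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

frame = bytes([0x01,        # slave address (not authenticated)
               0x05,        # function code: Write Single Coil
               0x00, 0x01,  # coil address
               0xFF, 0x00]) # 0xFF00 means ON
crc = crc16_modbus(frame)
frame += bytes([crc & 0xFF, crc >> 8])  # CRC appended low byte first

print(frame.hex())  # the entire request; an integrity check only, no secret anywhere
```

The CRC protects against line noise, not against an adversary, which is precisely why access to the wire (or to the TCP variant of the protocol) is “game over.”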

Before I finish, I do want to bring up another point that Sean made.  This had to do with my third footnote in the second post on serial.  That note referred to the final sentence of that post, which read “The moral of this story is that, if you’re going to claim that a serial device – set up with an intermediate device as discussed above – doesn’t participate in ERC, you need to convince the auditor that it’s very unlikely the device could actually be attacked.”  My footnote said “You might do this by demonstrating that there haven’t been any such vulnerabilities identified by ICS-CERT or similar organizations.”

Sean’s comment on this was similar to his comment on the other issue: “…any device speaking an insecure by design protocol (e.g. DNP3, Modbus) is essentially open for abuse.”  I take this to mean that simply demonstrating to the auditor that there aren’t any publicly exposed attacks against a particular type of device at this time doesn’t also demonstrate that there never could be an attack – especially when a protocol has been shown to be insecure already. 

So now I’ve contradicted myself in just a few paragraphs (I believe this is a new record for me – usually I wait until at least the next post to contradict myself).  In the first part of this post, I told you I didn’t think you had to address the DNP3 vulnerabilities in order to comply with CIP; in the second part, I implied you might need to demonstrate to the CIP auditor that you have taken steps to address DNP3 vulnerabilities.  To quote Emerson, “…consistency is the hobgoblin of little minds.”   I certainly don’t want to be accused of having a little mind, especially by Emerson.


The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] Technically, the vulnerabilities aren’t in the DNP3 protocol itself, but in its implementation by the majority of vendors.  However, IMHO this is a distinction without a difference.  If a protocol is insecurely implemented in the great majority of cases, it is an insecure protocol.

[ii] Since communications between ESPs is explicitly ruled out as a subject of CIP, the external communications aspect of the DNP3 problem is definitely out of scope for CIP.  But I’m not prepared to say that internal communications using DNP3 (e.g. between an RTU and a relay in a substation) are definitely out of scope; this is because of FERC’s directive to NERC in Order 791, saying that “communications networks” within the ESP also need to be secured.  Since that directive was mainly concerned with the issue of cabling (and switches/hubs) between devices within the ESP, which (cabling) goes outside the PSP, I’m inclined to doubt that Order 791 really requires the DNP3 problem to be addressed for internal substation communications.  But it’s an interesting question.

Tuesday, December 2, 2014

TCIPG


No, I didn’t just choose some random letters for the title of this post.  IMNSHO[i], TCIPG is one of the best-kept secrets in cyber security - and without a doubt the best-kept secret in Smart Grid security.  Here is my short[ii] summary of why I think this organization is so great:

  • TCIPG is a research organization that focuses on making the Smart Grid more secure.
  • It is based at the University of Illinois[iii], although it’s a collaboration between U of I, University of California at Davis, Washington State University, and Dartmouth College. 
  • TCIPG is funded primarily by an $18MM grant from DoE and DHS, which I believe became effective in 2010.  This grant will run out next year.
  • There are a number of industry partners (full disclosure: One of those is Honeywell, and Dr. Himanshu Khurana of Honeywell sits on TCIPG’s External Advisory Board), plus some of the national labs. 
  • There are two primary events that are open to industry participants.  One is the annual two-day Industry Workshop, held at the University of Illinois in Champaign, Illinois.  I just attended this for the second time, and was amazed (also for the second time) at how valuable it was (you can see the presentations[iv] at the link provided – I especially recommend the one by Tom Siebel, a big supporter of TCIPG and the person for whom the U of I Center for Computer Science is named).  This is a free event.
  • The other is their biennial “Summer School”.  This is a weeklong event held on a beautiful campus in rural St. Charles, Illinois (outside Chicago). I attended the most recent one, in June 2013.  The next one – which could be the last they have – is scheduled for June 15-19, 2015.  I highly recommend that anyone interested or involved in the cyber security of the Smart Grid attend this.  The 2013 school was a great experience for me.  For one thing, it is pitched toward audiences who want to hear about cutting-edge research, but who can also always use some good grounding in the basics – the basics of electricity transmission and distribution, as well as cyber security (I certainly can always use this).  There are “101” lectures by professors, as well as mind-boggling presentations by professors and industry people like Jason Larsen of Idaho National Labs, who very convincingly showed how he can compromise any electronic device that gets put on his desk (doing exactly that is a big part of his job).  I understood maybe a tenth of what he said, but it was amazing to hear him describe the many different routes he can take to compromise a device.
  • For another thing, the Summer School has great hands-on labs.  My favorite in 2013 was when we broke up into small groups and gathered around tables.  Each table had a small generator, a solar panel, a windmill driven by a fan, a lamp and a few other sources of load; our objective was to put these together into a self-sustaining microgrid. Again, this was over my head personally, but it was great to watch the others put it together (we didn’t ultimately get it to work, but a couple other groups did).  So I recommend you watch this link for signup information.

There is another thing you should know about TCIPG: In an era when a lot of people think the Federal government can’t spend a dime without wasting it, I think TCIPG is one of the best examples of a program that leverages a fairly modest amount of funding to achieve a huge amount of benefit.  Since 2010, they have already developed valuable technologies for securing the Smart Grid – a few of which have already been commercialized.  Even more importantly, they have conducted a huge amount of research, which you can read about here (along with their publications here).  They have achieved all of this on $18 million by leveraging faculty members and students (graduate and undergrad) at the four schools, as well as national labs.  And you can read about – and download – some of the different Smart Grid educational tools and programs they’ve developed, both for the general public and for industry.

I mentioned their funding runs out next year.  I compliment those Departments (and especially Carol Hawk of DoE) for their vision in funding this effort in the first place, and highly recommend they figure out a way to continue and even broaden the effort.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] In My Not So Humble Opinion

[ii] Full disclosure: “Short” means something entirely different in this blog from most other publications.  In fact, a good translation for my use of “short” is “long”.  A good translation for my use of “long” is “forget about ever getting through it”.

[iii] I confess this fact alone endears them to me, since I live in Chicago and can attend their events without having to set foot in an airport.

[iv] I will do a post soon on one of the presentations, which I think will be particularly interesting for readers.  No, it has nothing to do with CIP Version 5.

Tuesday, November 25, 2014

CIP v7 and the Final (?) Compliance Schedule for CIP v6.3940


Today, NERC posted five revised CIP standards (and the related Implementation Plan and Definitions documents) for comment.  These will constitute CIP v7, and will be balloted Dec. 30 – Jan. 8. 

My initial birth announcement for CIP v7 only referred to twins – that is, at the time it looked like only two CIP v6 standards would be revised to v7, CIP-003-6 and CIP-010-2; they will now be CIP-003-7 and CIP-010-3.  However, in their never-ending quest for perfection, the SDT decided that three other standards also needed to be revised: CIP-004-6, CIP-007-6 and CIP-011-2; these will now become CIP-004-7, CIP-007-7 and CIP-011-3.[i]  In addition, the Implementation Plan and two Definitions documents (for the Low impact requirement changes) are also changing.  This means there are now eight documents that need to be approved for CIP v7; instead of giving birth to twins, NERC is the Octomom.  We’ll all have to pray for a successful delivery.

First Things First
The first thing I want to do in this post is to update my recent post in which I designated the new compliance version of CIP – that is, the version you’ll actually have to comply with – to be v5.7879 (there was actually an infinitely repeating decimal, 5.78787878….  I decided this wouldn’t work too well in compliance documentation, so I rounded it off).  Little did I know that less than three weeks later I would have to change that number.

I can’t use the same algorithm to compute the new number, since that assumed there were only going to be two versions of the CIP standards to comply with at one time.  Silly me, I once again underestimated NERC’s ability to make everything as complicated as possible – as you can see, the industry now has three versions to implement simultaneously.[ii] 

So I’ve come up with a new algorithm: I multiply the number of requirements in each version (7 in v5, 6 in v6 and 20 in v7) by the version number (5, 6 or 7) and divide their sum by the total number of requirements (33).  This yields 6.39393939…, which I’ll round off to 6.3940 just because I’m that kind of guy.  So this is the new compliance version: 6.3940!  I won’t be so bold as to say this time that this isn’t likely to change, since I thought that before.  I wouldn’t be at all surprised if some new glitch causes the SDT to have to revise one or more of the v7 standards; that will yield – are you sitting down? – CIP v8!  Speaking of which, maybe I’ll have a V8™ now.
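For anyone who wants to check my arithmetic, the weighted average described above works out in a few lines (a sketch only; the dictionary below just encodes the requirement counts from this paragraph):

```python
# Requirement counts per CIP version, as given above:
# 7 requirements at v5, 6 at v6, 20 at v7 (33 in total).
requirements = {5: 7, 6: 6, 7: 20}

weighted_sum = sum(v * n for v, n in requirements.items())  # 35 + 36 + 140 = 211
total = sum(requirements.values())                          # 33
compliance_version = weighted_sum / total                   # 6.393939...

# Conventional rounding gives 6.3939; I round up to 6.3940 in the post.
print(f"{compliance_version:.4f}")  # -> 6.3939
```

Note that conventional rounding of 6.3939… yields 6.3939; rounding it up to 6.3940 is, as I said, just the kind of guy I am.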

What Has Changed?
The second thing I want to do is discuss the changes that are in the new v7 standards and the other three documents.  Briefly (or as brief as I can be, which isn’t saying much), here are the substantive changes (you can find the documents here):

Implementation Plan:  The substantive change here is that the compliance date for CIP-003-7 Attachment 1 Section 2 (physical access controls for Low impact assets) has been pushed back from April 1, 2018 to Sept. 1, 2018.

CIP-003-7:  The changes from v6 are some wording changes in Attachments 1 and 2, and a lot of changes in the Guidance; of course, there are more substantial changes from v5, since this standard now includes the Low impact requirement changes ordered by FERC.

CIP-004-7:  There has been a change in one requirement part, minor VSL changes, and a few new sentences in the Guidelines and Technical Basis section.  All of these changes are related to the new requirement for Transient Electronic Devices and Removable Media.

CIP-007-7:  There are small VSL changes and two sentences in the Guidance (again, all related to Transients).

CIP-010-3:  From v6, there are changes in the VSLs, Attachments 1 and 2, and the Guidance.  The big change from v5 is the requirement for Transient Electronic Devices and Removable Media, CIP-010-3 R4.

CIP-011-3:  The only change in this standard is in the Guidance section.

Definitions:  CIP v6 had two documents with new Definitions, related to the new Low impact requirement.  These definitions have been tweaked some.

The New Implementation Schedule
As I mentioned above, there has been one change to the implementation schedule.  Therefore I revised my recent post on the schedule for CIP v5.7879, which I'm now calling 6.3940 of course.  So please go there to get the Final (???) implementation schedule for CIP v6.3940.


The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.


[i] This leaves only two standards proudly bearing the CIP v6 designation: CIP-006-6 and CIP-009-6; this is down from eight v6 standards a week ago.  My, how the mighty have fallen!

[ii] I’ve come to believe that some NERC managers’ bonuses are based on their ability to make things as complicated as possible.  This would perhaps explain why we’re seeing this sudden flurry of complicating activity toward the end of the year – the managers are panicking as they suddenly realize NERC CIP compliance isn’t quite as complicated as it could possibly be.  I must say, if my suspicion is true, these managers have richly deserved their bonuses this year!  I never thought it could be this complicated.

Monday, November 24, 2014

Follow-Up to Last Week’s Serial Post (not Post Cereal)


In my post on serially-connected devices last week, I discussed the question whether devices (usually in substations) that are serially connected to an RTU or similar intermediate device[i] that participates in external routable connectivity (ERC) do themselves participate in ERC for the purpose of CIP v5.  My personal opinion[ii] is that, if the end device communicates externally through that intermediate device (as described in the CIP v1 CCA Identification guidance document I referenced), then it does have ERC.  If it doesn’t so communicate, it doesn’t have ERC.

Of course, then the discussion becomes what “communicates externally” means.  There’s a lot of room for debate on that topic – as I found out from a few other parties in emails last week. One distinction that seems to be important is whether the intermediate device simply translates the incoming routable messages into a serial protocol for the end device, or whether the intermediate device does something like polling of the end device, meaning that incoming routable messages aren’t passed on in any way.  A terminal server might be an example of the first device, while an RTU might be an example of the second.
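To make that distinction concrete, here is a minimal sketch of the two data flows (the class and method names are my own invention, not any vendor’s firmware or API): a terminal-server-style device forwards inbound routable payloads to the serial line, while a polling RTU answers external requests from values it gathered on its own schedule and never forwards inbound data to the end device.

```python
class PassthroughServer:
    """Terminal-server style: inbound routable payloads are converted to
    serial frames and forwarded to the end device essentially unmodified."""
    def __init__(self, serial_port):
        self.serial_port = serial_port  # stand-in for a real serial handle

    def handle_inbound(self, payload):
        # External data reaches the relay directly.
        self.serial_port.write(payload)


class PollingRtu:
    """RTU style: the RTU polls the relay on its own schedule and answers
    external requests from its cache; inbound requests are never forwarded."""
    def __init__(self):
        self.point_cache = {}  # point id -> last value gathered by polling

    def poll(self, serial_port, point_id):
        self.point_cache[point_id] = serial_port.read_point(point_id)

    def handle_inbound(self, point_id):
        # Answered locally; nothing is written to the serial side.
        return self.point_cache.get(point_id)
```

In the first case an attacker’s bytes can reach the relay itself, which is why the pass-through arrangement looks much more like ERC; in the second, the relay only ever sees the RTU’s own polls – though, as discussed below, a misconfigured RTU may not actually enforce this separation.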

However, this distinction would go away if it were possible to hack through the RTU to the end device – because in that case the end device would be able to communicate externally (not by intention of course, but that’s not the issue in cyber security).  In the post, I state that I’m assuming “If someone were to hack the RTU through the external routable connection, they couldn’t access the connected device directly.”  A footnote to this sentence says that the entity that decides to use my “guideline” for ERC, and therefore excludes serial devices connected to an RTU that communicates routably outside the facility, probably needs to be prepared to convince the auditor that it isn’t likely this hack can occur.

So guess what happened?  You’re right…The next day I got an email from a cyber security professional at a large IOU, providing an example in which it would be possible to do exactly that – namely, to hack a routable connection to a remote RTU that is itself connected serially to one or more local devices, and attack one of those serial devices.  Here is what he said, although I have removed the manufacturer and model name to protect the guilty (and to save my a__ from getting sued):  “You can remotely communicate directly with any relays connected serially to a (particular manufacturer/model), log in to the serial device and reprogram or (do) anything you could do to it if (physically) connected directly through the serial connection (port), if it is not properly configured.”

The moral of this story is that, if you’re going to claim that a serial device – set up with an intermediate device as discussed above – doesn’t participate in ERC, you need to convince the auditor that it’s very unlikely the device could actually be attacked.[iii] 


The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[i] This isn’t to be confused with the Intermediate System – a new NERC defined term – that forms the basis of the new requirement for Interactive Remote Access, CIP-005-5.1 R2.

[ii] There is supposed to be a NERC Lessons Learned document coming out soon (in draft form) on this issue, but you can’t count on it to a) provide a definitive answer or b) be finalized in time for you to start your substation compliance work (and you definitely can’t count on it having the same status as an Interpretation arrived at through the normal 2-3 year RFI process).  If you need an answer now, you should consider the “Roll Your Own” approach that I have been discussing lately, starting with this post.

[iii] You might do this by demonstrating that there haven’t been any such vulnerabilities identified by ICS-CERT or similar organizations.