A CIP
compliance analyst with a large electric utility wrote in recently with the
following question:
'I am curious as to your experience
with solutions for CIP-007-6 R4 Part 4.2.2, “Detected failure of Part 4.1 event logging.” We have heard from program managers that other
companies in the industry use either an ICS vendor solution or rig up a
“heartbeat” or “polling” proprietary solution. In this particular situation,
the platforms are Intel and Linux. I’m curious as to what is accepted as a
solution to this interesting requirement.'
I passed
this question on to an auditor who usually has something interesting to say on
anything having to do with NERC CIP. He didn’t disappoint this time – in fact,
he obviously devoted about an hour of a gorgeous weekend day (at least it was
here in Chicago – I don’t know about the city where the auditor lives) to
putting together the following answer:
“The answer
is long and complicated. It all depends
on the capabilities of both the monitored and monitoring systems. First of all, the entity needs to fully
understand the expectation of the requirement.
The requirement is not to determine that the Cyber Asset generating the
logs is up and running, or even solely that it is generating logs locally. The expectation is to detect a failure of the
logging process from start to finish.
There are numerous potential points of failure. Something could happen on the Cyber Asset
generating the logs that causes it to stop logging (perhaps the log file is
full). If the device cannot natively
send its logs to the log server/SIEM, it will need an agent to perform this
function, and that agent could itself fail. Perhaps the IP address of the log server is
incorrectly configured and the logs are being sent to the bit bucket. Perhaps there is a networking issue and the
log server is not reachable from the Cyber Asset generating the logs. And then there are the issues that crop up on
the log server/SIEM to contend with, especially when the log service and SIEM
are different applications on the same or different servers.
“Here is
what I have seen that does not work:
“- Some entities have simply monitored the
Cyber Asset generating the logs using a simplistic method such as pinging the system. That approach fails because it can only
detect when the system is either completely down or unable to be reached over
the network. The problem with the ping
approach is that it cannot detect when the device is up but the logging service
has failed. As a side note, a Cyber
Asset that is down is not generating logs.
That is not a failure of event logging as envisioned by CIP-007-6 R4
Part 4.2.2. When the system is down,
there is nothing to log. That does not
mean that monitoring system availability is not important; it just does not
accomplish what is expected in this instance.
“- A variation of the above is to monitor
the logging agent on the Cyber Asset that cannot natively send its logs to a
log server/SIEM. Quite often this is
accomplished by checking that the service is “running.” This approach fails for several
reasons. The service could be hung;
while it is “running,” it is not doing anything. The destination IP address of the log
server/SIEM could be incorrect. There
could be a networking issue making the log server/SIEM unreachable. And, if the only monitoring is of the source
and not also of the log server/SIEM itself, the log server/SIEM could be
down. The problem with monitoring the
logging service on the source Cyber Asset is that this cannot detect a failure
in the path between the source log and the destination log server/SIEM.
“OK, so what
can work? Here is what I have seen:
“- Some systems are normally “chatty,”
meaning that they generate a lot of log traffic in the normal course of
operation. If the SIEM is capable, an
event trigger could be configured that would generate an alert if the source
system has not been heard from in a reasonable period of time. A Windows or Unix/Linux system, for instance,
normally generates many logs per minute.
The entity could determine how long it typically takes for the source
system to reboot, add a buffer, and set the event trigger to alert if nothing
has been received from the source system within the timeout window. For example, let’s say the source Windows
system normally generates an average of ten event log messages per minute when
idle and takes five-to-ten minutes to reboot after applying patches. If the entity defined a trigger event that
would alert if no log messages have been received from the Windows system in
fifteen minutes, that would meet the Part 4.2.2 requirement while
minimizing false alerts. If the system
generates only one message an hour and takes five-to-ten minutes to reboot, a
two- or three-hour timeout might be appropriate.
“- Some entities cause their Windows and
Unix/Linux Cyber Assets to issue a specifically crafted “heartbeat” event log
message on a defined periodicity rather than simply monitoring for any log
traffic. In this case, the SIEM is
configured to generate an alert if the heartbeat message is not received as
expected. Again, allowing for normal
outages, such as the reboot timing, the failure to receive the heartbeat
message indicates a failure somewhere along the path that needs to be
investigated. This is relatively easy to
implement, using a cron job in Unix/Linux or an AT scheduled task in
Windows. The periodically scheduled task
uses the appropriate operating system features to generate an event log message
that is then picked up and sent to the log server/SIEM. In Windows, this can be done from a .bat file
that uses the command line interface to execute the “eventcreate” command. Again, the timeout is based on the
heartbeat interval, with an allowance for normal outages such as reboots.
“- Some Cyber Assets are very quiet,
especially network switches. These
devices usually have no native capability to generate an event log message on
demand. There are several options
here. If the switch is a managed switch
with external IP accessibility, the entity might be able to use a remote
management system to periodically connect to and log into the switch. This could be as simple as relying on a
third-party solution that is already being used to periodically back up the
configuration (e.g., CiscoWorks or Industrial Defender). The switch is expected to log the access
event per CIP-007-6 R4 Parts 4.1.1 and 4.1.2 anyway. The login attempt message can be used in lieu
of a specially crafted heartbeat. If the
switch is not externally reachable for management purposes, the entity might be
able to trigger the login event from another Cyber Asset within the ESP and
accomplish the same thing.
“- As a last resort, entity staff need
to check the device manually, perhaps as part of the daily system checks, to
see whether its buffer holds recent log messages that were not sent out.
“If the
entity is using multiple log servers and/or redundant SIEMs, the monitoring
should include all of them. That way,
the entity does not find itself unexpectedly in a single point of failure
situation.
“I am sure
there are other options, but these are the typical ones I have seen, and none of
them requires extensive programming effort or expensive vendor support.”
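The first option the auditor describes, alerting when a source has been silent longer than its timeout, can be sketched in Python. This is an illustration under stated assumptions rather than a SIEM feature: it assumes something (the SIEM, or a script beside it) can report when each source was last heard from, and the `SourcePolicy` class, the `silent_sources` function, and the host names are all hypothetical. The timeouts follow the auditor's examples: fifteen minutes for the chatty Windows system and three hours for the one that logs about once an hour.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SourcePolicy:
    """Per-source alert policy (hypothetical structure, not a SIEM API)."""
    name: str
    timeout: timedelta  # longest silence tolerated: normal gap + reboot time + buffer


def silent_sources(last_seen, policies, now):
    """Return the names of sources that have been silent longer than their timeout.

    last_seen maps source name -> datetime of the most recent log received.
    A source that has never been heard from is treated as silent.
    """
    alerts = []
    for policy in policies:
        seen = last_seen.get(policy.name)
        if seen is None or now - seen > policy.timeout:
            alerts.append(policy.name)
    return alerts


if __name__ == "__main__":
    policies = [
        SourcePolicy("win-hmi-01", timedelta(minutes=15)),  # ~10 msgs/min, 5-10 min reboot
        SourcePolicy("quiet-host-02", timedelta(hours=3)),  # ~1 msg/hour
    ]
    now = datetime(2020, 6, 1, 12, 0)
    last_seen = {
        "win-hmi-01": now - timedelta(minutes=20),  # silent too long -> alert
        "quiet-host-02": now - timedelta(hours=1),  # still within its window
    }
    print(silent_sources(last_seen, policies, now))  # ['win-hmi-01']
```

Treating a source that has never been heard from as silent means a wrong log server address, one of the failure points listed above, is flagged from the start instead of passing unnoticed.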
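For the Linux platforms in the original question, the heartbeat the auditor describes could be a few lines of Python run from cron. This is a sketch, not a vetted tool: it assumes the host's syslog daemon (and any forwarding agent) already ships the chosen facility to the log server/SIEM, so the heartbeat travels the same path as the Part 4.1 security events. The `CIP-HEARTBEAT` tag, the `cip-heartbeat` ident, and the choice of `LOG_USER` are hypothetical.

```python
import socket
import syslog
from datetime import datetime, timezone

# Hypothetical marker that the SIEM rule would match on.
HEARTBEAT_TAG = "CIP-HEARTBEAT"


def build_heartbeat(now=None, host=None):
    """Format the heartbeat text; the timestamp lets the SIEM spot delayed delivery."""
    if now is None:
        now = datetime.now(timezone.utc)
    if host is None:
        host = socket.gethostname()
    return f"{HEARTBEAT_TAG} host={host} sent={now.isoformat(timespec='seconds')}"


def emit_heartbeat():
    """Write one heartbeat through the local syslog service.

    Assumes LOG_USER is among the facilities forwarded to the log server/SIEM;
    substitute whatever facility the real forwarding configuration covers.
    """
    syslog.openlog(ident="cip-heartbeat", facility=syslog.LOG_USER)
    syslog.syslog(syslog.LOG_NOTICE, build_heartbeat())
    syslog.closelog()


if __name__ == "__main__":
    emit_heartbeat()
```

A crontab entry such as `*/5 * * * * /usr/bin/python3 /opt/cip/heartbeat.py` (hypothetical path) would send a heartbeat every five minutes, and the SIEM timeout would then be five minutes plus the reboot allowance. The standard `logger` command can do the same job from a shell script, much as `eventcreate` does on Windows.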
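For the quiet switches, the auditor suggests using the periodic management login as a stand-in heartbeat. On the SIEM side that might look like the sketch below, which assumes a dedicated monitoring account (the hypothetical `logprobe`) and a login-success message carrying the user name in a `[user: ...]` field. That message shape is an assumption, so the pattern must be checked against real log samples from the switches. The resulting timestamps could feed the same kind of silence check used for chatty systems.

```python
import re
from datetime import datetime

# Hypothetical monitoring account used only for the periodic probe login.
PROBE_USER = "logprobe"
# Assumed message shape; adjust to what the switches actually send.
LOGIN_RE = re.compile(r"LOGIN_SUCCESS.*\[user: (?P<user>[^\]]+)\]")


def latest_probe_logins(events, user=PROBE_USER):
    """Return {host: time of latest probe login} from (received_at, host, message) tuples.

    Only logins by the probe account are counted, which ties the alert to the
    probe schedule, so a stalled probe job is noticed even when people log in by hand.
    """
    latest = {}
    for received_at, host, message in events:
        match = LOGIN_RE.search(message)
        if match and match.group("user") == user:
            if host not in latest or received_at > latest[host]:
                latest[host] = received_at
    return latest


if __name__ == "__main__":
    t = datetime(2020, 6, 1, 12, 0)
    events = [
        (t, "sw-01", "LOGIN_SUCCESS: Login Success [user: logprobe] [Source: 10.0.0.5]"),
        (t, "sw-02", "LOGIN_SUCCESS: Login Success [user: jsmith] [Source: 10.0.0.9]"),
    ]
    print(latest_probe_logins(events))
```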
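The auditor's point about multiple log servers and redundant SIEMs can be sketched the same way: each collector that should receive a source's heartbeat is checked on its own, so losing one path is flagged even while the other still looks healthy. This assumes each collector can be queried separately for when the heartbeat last arrived; the collector names and the interval and grace values are hypothetical.

```python
from datetime import datetime, timedelta


def collectors_missing_heartbeat(arrivals, collectors, interval, grace, now):
    """Return, sorted, the collectors that have not received the heartbeat in time.

    arrivals:   {collector: datetime the heartbeat last arrived there}
    collectors: every log server/SIEM expected to receive it
    interval:   heartbeat period (e.g. the cron schedule)
    grace:      allowance for normal outages such as reboots
    """
    deadline = interval + grace
    return sorted(
        c for c in collectors
        if c not in arrivals or now - arrivals[c] > deadline
    )


if __name__ == "__main__":
    now = datetime(2020, 6, 1, 12, 0)
    arrivals = {"siem-a": now - timedelta(minutes=3)}  # siem-b has heard nothing
    print(collectors_missing_heartbeat(
        arrivals, ["siem-a", "siem-b"],
        interval=timedelta(minutes=5), grace=timedelta(minutes=10), now=now,
    ))  # ['siem-b']
```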
Any opinions expressed in this blog post are strictly mine
and are not necessarily shared by any of the clients of Tom Alrich LLC.
If you would like to comment on what you have read here, I
would love to hear from you. Please email me at tom@tomalrich.com. Please keep in mind that
if you’re a NERC entity, Tom Alrich LLC can help you with NERC CIP issues or
challenges like what is discussed in this post – especially on compliance with
CIP-013. And if you’re a security vendor to the power industry, TALLC can help
you by developing marketing materials, delivering webinars, etc. To discuss any
of this, you can email me at the same address.