A power system is a complex interconnected system that can be divided into power generation, transmission, substation, and distribution areas, each containing a myriad of devices that must be controlled. Fig. 1 shows the control systems corresponding to these areas.
Fig. 1. The control systems and their connections between generation and substation
The Energy Management System (EMS) controls and monitors bulk power plants and high-voltage substations. The regional control centers (RCC or SCADA) control and monitor medium-voltage substations. The small control centers (SCC) connect with unmanned medium-voltage substations. The control systems are connected with each other: the EMS sends control data to the RCCs, and the RCCs send status data to the EMS so that it can estimate the state of the power system. The Distribution Automation System (DAS) independently operates automated switches in distribution areas. However, in the Smart Grid environment the DAS must connect with the EMS because of Distributed Energy Resources (DER) such as Distributed Generators (DG).
Because electricity cannot be stored, the present power system must adjust to increasing demand and complexity in a changing pattern of electric consumption. Control systems are therefore needed to generate and deliver electricity dynamically in real time.
In this paper, an independent and hierarchical communication network architecture for the Smart Grid is designed using information hiding and suppression of data exchange. Self-healing and attack-resistance functions can be readily implemented on the proposed network. Communication bandwidth and topology are considered in light of the advent of Distributed Energy Resources. Availability is also considered in the design of the communication network, because power systems should provide seamless service. A new optical network architecture is also proposed for the distribution area of power systems to increase reliability. Additionally, a mitigation operation algorithm using device-level intelligence is suggested for the case in which servers are compromised by cyber-attacks.
2. Cyber Threats in Power Systems
2.1 Cyber Attack matrix in power systems
The electrical power system is a critical infrastructure that provides basic needs of life, so it has always been a high-priority target for militaries and insurgents. Cyber-attacks are now a common threat, and power systems can be attacked with or without accompanying physical actions. The Smart Grid has introduced Distributed Energy Resources (DER), which create a myriad of network connections to vulnerable environments. In the Smart Grid, risk may increase because of the increased number of connections and entry points. Possible attackers of power systems are terrorists, insiders, and spies. Possible goals are disrupting the system and seizing control of it.
A sniffing program could be installed on a targeted system to provide attackers with information on the configuration, connections, vulnerabilities, and operating status of the system. Hackers have recently intruded into utility networks and installed software that could disrupt the system. Detecting a sniffing attack is almost impossible unless the attacker begins injecting data to probe the configuration.
State estimation is used in the power system to evaluate the current state and predict future states of the power grid from data measured by many sensors scattered across the grid. The accuracy of the measured data is critical because the estimate depends only on those data. If the data are forged, state estimation produces wrong results, which could cause unpredictable calamity. From this point of view, tampering with state estimation is one of the most effective ways to disrupt the power system, and a false data injection attack could be the most effective attack for achieving this goal [6-8].
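To make the risk concrete, the following minimal sketch uses a linear least-squares estimator, which is an assumption here since the text does not specify the estimator. It illustrates a known property of such estimators: if the injected vector has the form a = Hc, the estimate shifts by c while the measurement residual, the quantity usually checked for bad data, stays the same. The measurement matrix `H`, the readings `z`, and the bias `c` are hypothetical values chosen for illustration.

```python
# Sketch only: a 2-state, 3-measurement linear model z = H x + noise.
# H, z and c are hypothetical values, not data from a real grid.

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def estimate(H, z):
    """Least-squares estimate x_hat = (H^T H)^-1 H^T z for a two-state model."""
    Ht = transpose(H)
    gain = matmul(inverse_2x2(matmul(Ht, H)), Ht)
    return [sum(g * zi for g, zi in zip(row, z)) for row in gain]

def residual_norm(H, z, x_hat):
    """Euclidean norm of z - H x_hat, the usual bad-data test statistic."""
    predicted = [sum(h * x for h, x in zip(row, x_hat)) for row in H]
    return sum((zi - pi) ** 2 for zi, pi in zip(z, predicted)) ** 0.5

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical measurement matrix
z = [1.02, 0.97, 2.01]                      # hypothetical sensor readings
c = [0.5, -0.2]                             # bias the attacker wants in the state

# Injected vector a = H c, added to the genuine measurements.
a = [sum(h * ci for h, ci in zip(row, c)) for row in H]
z_attacked = [zi + ai for zi, ai in zip(z, a)]

x_clean = estimate(H, z)
x_attacked = estimate(H, z_attacked)
shift = [xa - xc for xa, xc in zip(x_attacked, x_clean)]
```

In this sketch `shift` matches `c` up to rounding and the two residual norms agree, which is why a residual test alone would not flag the forged measurements.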
Control systems are evolving toward open standard technologies to make power systems intelligent. Control systems were long thought to be safe from malware, but the Stuxnet malware has been discovered in control systems. Stuxnet is the first known malware that targets the controls of a specific industrial control system such as a power plant. The ultimate goal of Stuxnet is to sabotage the facility by reprogramming Programmable Logic Controllers (PLCs) to operate as the attackers intend, most likely outside their specified boundaries. Fig. 2 illustrates the targeted system architecture.
In Fig. 2, frequency converters are used to control the speed of another device, such as a motor, at the field level. For example, if the frequency is increased, the speed of the motor increases. Stuxnet communicates with at least 31 frequency converters to sabotage the target system by slowing down or speeding up the motors to different rates at different times [10, 11]. Stuxnet infects PLCs with different code depending on the characteristics of the target system.
Fig. 2. The targeted system architecture by Stuxnet
Attack trees are used to analyze the attack pattern of Stuxnet. The attack tree method is a way of making decisions about how to improve security. The main attack goal of Stuxnet is to sabotage control facilities such as power plants. To achieve this goal, Stuxnet modified I/O in the target control process. For almost 17 months, Stuxnet targeted a specific Siemens centrifuge control component to manipulate the speed at which the nuclear facilities’ centrifuges rotated in order to damage, but not destroy, them.
To achieve this goal, Stuxnet had to gain access and create logic. The protocol used in the plant is Profibus, which is an open protocol, so logic injection is achieved by analyzing the protocol. To gain access, Stuxnet used three sub-attacks: compromising the system, obtaining the ICS schematics, and infecting computers.
Stuxnet is designed to spread aggressively. It used both known and previously unknown vulnerabilities to spread, and was powerful enough to evade state-of-the-practice security technologies and procedures.
Whenever forged software is installed on a device, the part of the power system within the range of that device can be affected. In a Trojan device attack, unauthorized outside forces gain access to the system and modify it, or give false data to the system operators so that they make wrong decisions. Stuxnet is able to perform the following actions to modify a PLC:
- Monitor PLC blocks being written to and read from the PLC.
- Infect a PLC by inserting its own blocks and replacing or infecting existing blocks.
- Mask the fact that a PLC is infected.
Fig. 3 shows how the control PC changes a code block of the PLC. Stuxnet is a very powerful and complicated malware, so no single solution can prevent an attack like it, but a mitigation method that includes process and policy can significantly reduce the negative consequences of such an attack. A proposed mitigation method is presented in the next section.
Fig. 3. Modifying PLC
3. Proposed Mitigation Method
Perfect protection of the system from cyber-attacks is not possible. Operational approaches which could mitigate the possible cyber-attacks mentioned in the previous section are proposed.
3.1 PLC and IED
In control systems, the Intelligent Electronic Device (IED) or Programmable Logic Controller (PLC) acquires remote data, such as meter readings, pressure, voltage, or other equipment status, then performs local control and transfers the data to the server.
PLCs scan their I/O by electrically reading each I/O point. This is done quickly, but in a system with many I/O points a complete scan can take some time. The recorded data therefore depend on the scan period: if the scan period is long, some important data will not be recorded. IEDs, however, have a report-by-exception function, so whenever exceptional data occur in the field, the IED stores them with a time stamp and sends them to the server in real time.
PLCs are closely dependent on servers, which program and control them. A PLC is programmed by a server in program mode and executes the program in run mode. An IED, however, has its own program (firmware) and can communicate with both servers and other IEDs. A cyber-attack like Stuxnet can therefore be mitigated by an operational method. When a server is compromised, encryption cannot protect remote devices from malicious commands issued by that server. Unlike a PLC, an IED cannot be reprogrammed by servers; a compromised server can only send control commands that would make the system operate incorrectly.
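The difference between periodic scanning and exception reporting can be illustrated with a small sketch. The sample data, the 50 ms scan period, and the limit of 150 are all hypothetical values chosen for illustration; real devices differ in how they sample and report.

```python
# Hypothetical signal sampled every 10 ms as (time_ms, value);
# a short spike occurs at t = 70 ms.
samples = [(t, 100.0) for t in range(0, 200, 10)]
samples[7] = (70, 180.0)

SCAN_PERIOD_MS = 50   # assumed PLC scan period
HIGH_LIMIT = 150.0    # assumed exception limit configured on the IED

def plc_scan(samples, period_ms):
    """Keep only the values that fall on a scan instant."""
    return [(t, v) for t, v in samples if t % period_ms == 0]

def ied_exception_report(samples, high_limit):
    """Keep every time-stamped value that exceeds the limit."""
    return [(t, v) for t, v in samples if v > high_limit]

scanned = plc_scan(samples, SCAN_PERIOD_MS)
reported = ied_exception_report(samples, HIGH_LIMIT)
```

With these assumed values the spike at 70 ms falls between two scan instants, so it is absent from `scanned` but present, with its time stamp, in `reported`.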
3.2 Device-level intelligence
The main difference between a PLC and an IED is intelligence. PLCs have no intelligence; they are only programmed by the server and execute the program with the given inputs. An IED, however, has its own program and communicates with the server; the server only sends messages to the IED, and the IED executes them.
Substituting IEDs for PLCs is one mitigation method for the case in which a server is compromised by cyber-attacks. Whenever the server sends a control message, the IED checks the value against thresholds (maximum or minimum values) that protect the system from failure. For example, Stuxnet changes the input value of a frequency converter to 1410 although the maximum value is 1210, so the centrifuge motor spins too fast and is damaged.
Fig. 4 shows that substituting an IED for a PLC adds intelligence at the device level. Adding device-level intelligence mitigates the effects of cyber-attacks like Stuxnet.
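A threshold check of this kind can be sketched in a few lines. The names `Limits` and `validate_setpoint` are hypothetical, and the minimum of 0.0 is an assumption; only the maximum of 1210 and the forged value of 1410 come from the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    minimum: float
    maximum: float

def validate_setpoint(value, limits):
    """Return True only if the commanded value lies inside the device limits."""
    return limits.minimum <= value <= limits.maximum

# Maximum taken from the frequency converter example; minimum is assumed.
converter_limits = Limits(minimum=0.0, maximum=1210.0)

accepted = validate_setpoint(1000.0, converter_limits)   # within limits
forged = validate_setpoint(1410.0, converter_limits)     # exceeds the maximum
```

Here the forged value of 1410 fails the check, so an IED applying this rule would not pass it on to the frequency converter.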
Fig. 4. Substituting IED for PLC
Fig. 5 shows the proposed system architecture. An IED can substitute for a PLC and connect to and control each field device. An IED also has its own communication devices, so separate communication processors are not needed.
Fig. 5. Proposed system architecture in power plants
Stuxnet compromised an operator server and made it program a PLC to increase the speed of motors; this damages the motors, and the damage can spread through the system. If an IED replaces the PLC, the IED checks the requested speed increase before execution; if the increase would damage the motor or other systems, the IED rejects the command and reports a warning to the other IEDs in the network.
In normal operation, a server checks the status of each IED and sends commands to keep the system optimal. When a server is not able to coordinate the IEDs, each IED exchanges status with the other IEDs and makes the best decision to keep the system operating correctly.
If the IED decides that the command from the server is normal, it executes the command and updates the threshold based on the status of the devices. If the control value exceeds the threshold, the IED rejects the command and reports a warning message to the other IEDs. The IED then enters autonomous operation, which is explained in more detail in Section 3.4. Fig. 6 shows the mitigation algorithm against cyber-attacks.
Fig. 6. Proposed defense flow chart in control systems
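The decision flow described for Fig. 6 can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' implementation: the class `Ied`, its method names, and the return strings are hypothetical, and the rule for updating thresholds after a valid command is left as a hook because the text does not specify it.

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "normal"
    AUTONOMOUS = "autonomous"

class Ied:
    """Hypothetical IED that validates server commands against thresholds."""

    def __init__(self, name, low, high):
        self.name = name
        self.low, self.high = low, high
        self.mode = Mode.NORMAL
        self.setpoint = None
        self.warnings_received = []

    def update_thresholds(self, low, high):
        # Hook for adjusting thresholds from device status (rule not specified).
        self.low, self.high = low, high

    def receive_warning(self, sender):
        self.warnings_received.append(sender)

    def handle_command(self, value, peers):
        """Execute a valid command; otherwise reject, warn peers, go autonomous."""
        if self.mode is Mode.AUTONOMOUS:
            return "ignored"
        if self.low <= value <= self.high:
            self.setpoint = value
            return "executed"
        for peer in peers:
            peer.receive_warning(self.name)
        self.mode = Mode.AUTONOMOUS
        return "rejected"
```

A command within the thresholds is executed, while an out-of-range command leaves the setpoint unchanged, notifies the peer IEDs, and switches the device to autonomous operation.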
3.3 Availability Analysis
Availability is a useful measure of system requirements because seamless service is an important factor for power systems. Availability in IEC 60870-4 is defined as

$$A = \frac{MTBF}{MTBF + MTTR} \qquad (1)$$

where MTBF is the Mean Time Between Failures and MTTR is the Mean Time To detect and Repair a failure; in this example, MTTR is assumed to be 1 hour for devices and 24 hours for cable disconnection.
Fault tree analysis is used to measure the availability of the system. Fault trees are useful for predicting overall system unavailability. From (1), unavailability can be defined as

$$U = 1 - A = \frac{MTTR}{MTBF + MTTR} \qquad (2)$$
Fig. 7 shows an example of a legacy SCADA communication system. The EMS connects with the distribution server by a T1 or E1 line, and the distribution server connects with IEDs by various communication modems. Table 1 shows the unavailability corresponding to the MTBF of each component, calculated from the given MTBF data by (2). For example, the MTBF of an Ethernet switch is 20.5 years, so with an MTTR of 1 hour and 8760 hours per year the unavailability is:

$$U = \frac{1}{20.5 \times 8760 + 1} \approx 5.57 \times 10^{-6}$$
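The Ethernet switch figure can be checked numerically with a short sketch, assuming the standard ratio form of unavailability, MTTR / (MTBF + MTTR), a year of 8760 hours, and the 1-hour device MTTR stated above. The function name `unavailability` is a hypothetical helper.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year

def unavailability(mtbf_hours, mttr_hours):
    """Fraction of time a component is down: MTTR / (MTBF + MTTR)."""
    return mttr_hours / (mtbf_hours + mttr_hours)

# Ethernet switch: MTBF of 20.5 years, MTTR of 1 hour for devices.
switch_u = unavailability(20.5 * HOURS_PER_YEAR, 1.0)
switch_availability_percent = (1.0 - switch_u) * 100.0
```

Under these assumptions `switch_u` is roughly 5.57e-6, that is, about three minutes of expected downtime per year for a single switch.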
Fig. 7. Example of a legacy SCADA communication system
Table 1. Approximate component unavailability
The MTBF data in Table 1 are averages taken from the fact sheets of various manufacturers, so actual MTBF values should be used to evaluate components in present operation.
Fig. 8 shows the unavailability of the legacy system based on Table 1; availability is 99.9919% in normal operation without cyber-attacks. If one of the servers (engineering server or operator server) is compromised by cyber-attacks, function availability goes to zero because PLCs are dependent on servers.
Fig. 8. Unavailability of legacy system in power plants
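As a rough illustration of how a fault tree combines component unavailabilities, the sketch below uses the usual gate formulas under the assumption that component failures are independent. The component values are hypothetical placeholders, not the entries of Table 1, and the tree structure shown is not the one in Fig. 8.

```python
from math import prod

def or_gate(unavailabilities):
    """Top event occurs if any input fails (components in series)."""
    return 1.0 - prod(1.0 - u for u in unavailabilities)

def and_gate(unavailabilities):
    """Top event occurs only if all inputs fail (redundant components)."""
    return prod(unavailabilities)

# Hypothetical values for illustration only.
server_u = 2.0e-5
switch_u = 5.6e-6
modem_u = 1.0e-5

# A chain of server, switch and modem fails if any one of them fails.
chain_u = or_gate([server_u, switch_u, modem_u])

# Two redundant chains fail only if both chains fail.
redundant_u = and_gate([chain_u, chain_u])
```

For small values the OR gate is close to the sum of its inputs, which is why a long chain of series components dominates the unavailability of a system, while redundancy reduces it by orders of magnitude.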
Isolation and autonomous operation are applied to the legacy system to mitigate cyber-attacks. Cyber-attacks mostly affect systems rather than the network, so the analysis focuses on the unavailability of the system rather than that of the network. The PLC is replaced by an IED, and whenever the IED detects a cyber intrusion, it isolates itself and operates autonomously.
Fig. 9 shows the increase in availability from the standpoint of field-level function operation. Even if the servers fail, all field devices that are monitored and controlled by the IED work correctly; the IED keeps the systems within its boundary working properly. With isolation of the IED, unavailability is 0.00067% when servers fail for any reason.
Fig. 9. Function unavailability with isolation of IED
When the servers can detect cyber intrusion, function availability may increase depending on the intrusion detection rate. However, the availability of the legacy method reaches only 99.9977% even with a 100% detection rate, which is lower than that of the proposed method. Fig. 10 shows the simulation result.
Fig. 10. Simulation result of the proposed method
Table 2 compares function availability at the field level. Device-level intelligence improves the availability of the legacy system even without considering cyber-attacks. When cyber-attacks occur, the IED must detect the intrusion, and availability varies with the intrusion detection rate. However, most field-level devices operate within limits (minimum and maximum thresholds), so using thresholds to detect intrusion is an appropriate approach.
Table 2. Comparative function availability in field level
Cyber-attacks on control systems are confined to specific facilities, but more factors must be considered for wide-area networked systems. In the next section, cyber-attacks are considered in the proposed network architecture.
3.4 Cyber defense of the proposed network
In the previous sections, a mitigation method was proposed for single-site cyber-attacks. However, power control systems are connected with each other by communication networks. In this section, a more detailed defense mechanism is explained based on the proposed network architecture.
In normal operation, servers (DMS: Distribution Management System) control IEDs and coordinate with other servers to keep the power system optimal. Servers normally send IEDs commands to increase or decrease electricity generation. However, servers in a power system may be compromised by malware like Stuxnet. Fig. 11 shows the proposed network to mitigate cyber-attacks such as Stuxnet.
Fig. 11. Proposed network to mitigate cyber-attacks such as Stuxnet
In the given network, all IEDs share their status with other IEDs in real time; real-time information is the key to profitability in the Smart Grid.
If a server is compromised by a cyber-attack, it can send IEDs commands that may disrupt the power system, such as commands to increase or decrease generation. If IEDs have intelligence, they can check the validity of a command. For example, an IED knows the standard frequency of electricity, 60 Hz, and can estimate the frequency after generation is increased or decreased. If the frequency rises 4% above the standard frequency, generators start to trip, and if it falls 4% below, areas start blacking out.
To keep the frequency steady and within tolerance, the EMS dynamically controls generators based on generation and demand information. Generators will be scattered throughout the Smart Grid, so autonomous and distributed monitoring and control of generators is indispensable for both reliability and security. The EMS makes control decisions based on the status of the power system, so each IED has to be connected with the EMS to send accurate data and receive precise control signals.
The defense flow chart in Fig. 6 may be modified for multi-ring and multi-server systems. When an IED receives a command from a server, the control value is checked against thresholds. If the result of executing the control value would exceed the thresholds, the IED rejects the command and reports a warning message to the network. If there are available servers in the network, the IED changes its network configuration and connects with an available server. If there are no available servers, the IED isolates its network and operates autonomously.
Fig. 12 presents this extended algorithm. When a server is compromised by cyber-attacks, it can control all devices connected to it through the network. However, if IEDs can detect that a server is abnormal, the effect of the cyber-attack is alleviated. If an IED detects an abnormal server through the device-level intelligence described in the previous section, it sends a warning message to the network so that other IEDs recognize that the server is not in normal operation mode.
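The frequency check can be sketched as a simple bounds test. How an IED estimates the post-command frequency is not described in the text, so the estimate is taken as an input here; the function name and parameters are hypothetical, while the 60 Hz nominal value and the 4% tolerance come from the paragraph above.

```python
NOMINAL_HZ = 60.0   # standard frequency from the text
TOLERANCE = 0.04    # 4% deviation at which tripping or blackouts begin

def frequency_within_bounds(estimated_hz, nominal_hz=NOMINAL_HZ, tolerance=TOLERANCE):
    """Return True if the estimated frequency stays strictly inside the band."""
    low = nominal_hz * (1.0 - tolerance)    # about 57.6 Hz
    high = nominal_hz * (1.0 + tolerance)   # about 62.4 Hz
    return low < estimated_hz < high

# An IED would reject a generation command whose estimated outcome is unsafe.
safe = frequency_within_bounds(60.5)
too_high = frequency_within_bounds(63.0)
too_low = frequency_within_bounds(57.0)
```

With a 60 Hz nominal frequency the acceptable band is roughly 57.6 Hz to 62.4 Hz, so the first estimate passes and the other two would cause the command to be rejected.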
Fig. 12. Extended defense flow chart in multi-ring systems
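The extended flow for multi-ring, multi-server systems can be outlined as below. This is a sketch of the steps described in the text, not the authors' implementation: the function name, the action labels, and the `server_available` mapping (server name to a flag saying whether it is reachable and considered healthy) are all hypothetical.

```python
def respond_to_command(value, low, high, current_server, server_available):
    """Return (actions, server) for one command received by an IED.

    server_available maps each known server name to True if that server
    is reachable and believed healthy (an assumed data structure).
    """
    if low <= value <= high:
        return ["execute"], current_server

    # Out-of-range command: reject it and warn the rest of the network.
    actions = ["reject", "warn_network"]

    # Prefer reconnecting to another available server.
    for name, available in server_available.items():
        if name != current_server and available:
            return actions + ["reconnect"], name

    # No server left: isolate the local network and run autonomously.
    return actions + ["isolate", "autonomous"], None

servers = {"dms-1": True, "dms-2": True}
normal = respond_to_command(50.0, 0.0, 100.0, "dms-1", servers)
failover = respond_to_command(500.0, 0.0, 100.0, "dms-1", servers)
isolated = respond_to_command(500.0, 0.0, 100.0, "dms-1", {"dms-1": True, "dms-2": False})
```

The three calls correspond to the three branches of the flow: normal execution, reconnection to an available server, and isolation with autonomous operation.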
When an IED receives the warning from other IEDs, it changes its mode to ‘Autonomous Operation Mode’ and rejects all data and commands from the server unless the server sends a ‘Recover Command’ with a validation method such as a one-time password (OTP).
Fig. 13 shows the autonomous operation procedure and how to return to normal operation. In autonomous operation mode, each IED updates its status in real time and controls DGs to keep the power system stable. From shared data such as total generation and availability, each IED decides which IED has the higher priority to control its generator.
Fig. 13. Autonomous operation mode
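The mode switching described for autonomous operation can be sketched as a small state holder. The text does not say how one-time passwords are generated or distributed, so this sketch assumes a pre-shared list of single-use tokens; the class `AutonomousGuard` and its method names are hypothetical. The generator priority decision among IEDs is not modeled here.

```python
import hmac

class AutonomousGuard:
    """Hypothetical gatekeeper for server messages on an IED."""

    def __init__(self, one_time_passwords):
        # Assumed provisioning: a pre-shared set of single-use byte tokens.
        self._unused = set(one_time_passwords)
        self.autonomous = False

    def on_warning(self):
        """A warning from another IED switches this IED to autonomous mode."""
        self.autonomous = True

    def on_server_message(self, kind, otp=None):
        """Accept messages in normal mode; in autonomous mode accept only
        a recover command that carries an unused one-time password."""
        if not self.autonomous:
            return "accepted"
        if kind == "recover" and otp is not None:
            for candidate in list(self._unused):
                # Constant-time comparison avoids leaking token contents.
                if hmac.compare_digest(candidate, otp):
                    self._unused.discard(candidate)
                    self.autonomous = False
                    return "recovered"
        return "rejected"
```

Each token is discarded after use, so replaying a captured recover command does not return the IED to normal operation a second time.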
Other possible attacks launched from compromised servers should also be considered. A compromised server may generate forged data and send them to the higher-level server, the EMS, to inject false data. If the EMS receives false data, state estimation would be wrong and, consequently, wrong commands would be sent to the generators. Thus, IEDs should send a warning message to higher-level servers to protect the EMS from false data injection attacks.
Since IEDs connect to the higher network through a server, a compromised server can hijack a warning message that an IED tries to send to other servers. Also, IEDs can communicate only with adjacent servers, so IEDs in the network of a compromised server cannot communicate directly with the EMS.
The proposed network in Fig. 11 can be used to solve this problem. A shared IED can send a warning message to the EMS through a detour path that is not connected with the compromised server. Fig. 14 shows the detour path as a dashed line.
Fig. 14. Shared IED sends a warning message to the EMS by a detour path to prevent a false data injection attack
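Finding a detour path amounts to a graph search that skips compromised nodes. The sketch below uses a breadth-first search over a hypothetical topology in which a shared IED is attached to two DMS servers; the node names and links are illustrative and do not reproduce the network of Fig. 11.

```python
from collections import deque

def find_detour(links, source, target, compromised):
    """Breadth-first search for a path that avoids compromised nodes.

    links maps each node name to the list of its neighbors.
    Returns the path as a list of node names, or None if none exists.
    """
    if source in compromised or target in compromised:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in visited and neighbor not in compromised:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical topology: one shared IED sits on the rings of two DMS servers.
links = {
    "ied-shared": ["dms-1", "dms-2"],
    "dms-1": ["ied-shared", "ems"],
    "dms-2": ["ied-shared", "ems"],
    "ems": ["dms-1", "dms-2"],
}

detour = find_detour(links, "ied-shared", "ems", compromised={"dms-1"})
no_route = find_detour(links, "ied-shared", "ems", compromised={"dms-1", "dms-2"})
```

With `dms-1` compromised the warning travels through `dms-2`; if both servers are compromised no path exists, which corresponds to the case where IEDs fall back to autonomous operation.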
When a healthy DMS receives a warning message from IEDs, the message is forwarded to the EMS and other DMSs. Thus, all devices in the network can know which devices are compromised. A healthy DMS takes over the IEDs in compromised rings and forms a new ring network. Fig. 15 shows that a compromised server may be isolated while the other servers form a new ring network to provide seamless service.
Fig. 15. Isolation of a compromised server
If multiple servers are compromised and there are no available higher-level servers, IEDs operate in autonomous mode and all compromised servers are isolated. Fig. 16 shows isolation and autonomous operation in the proposed network.
Fig. 16. Isolation and autonomous operation
Stuxnet is analyzed and a device-level intelligence operation is proposed. To minimize system impact from a cyber-attack, independent and autonomous operation is suggested. Fast isolation and self-healing functions are easily implemented by this operation mechanism.
Evolving technologies will bring many additional possible cyber-attacks on the power system. Thus, further study is needed on mitigation methods for new cyber-attacks.
Finding optimal thresholds can also be studied further based on field applications. Detection of cyber-attacks depends on thresholds, so threshold updates affect system reliability. More factors should be considered when calculating thresholds for more complicated applications, such as load shedding, because this kind of application is affected by the status of other systems.