1. Introduction
With the development of smart grid technology, the smart grid has become increasingly important. Compared with the centralized one-way transmission of the traditional grid, the smart grid features decentralized two-way transmission, as shown in Fig. 1, and aims to provide improved reliability, efficiency, sustainability, consumer involvement and security [1]. In a smart grid, every user is equipped with a smart meter. On one hand, the smart grid needs to collect real-time information from smart meters in order to adjust the power supply, change the electricity price, and so on. On the other hand, the aggregator is not supposed to learn the detailed electricity consumption of each user, because such detailed data may leak information about the user’s behavior [2]. Therefore, there are two important security tasks in the smart grid [3]. One is to protect the privacy of individual smart meter readings. The other is to allow the aggregator to obtain the overall electricity consumption of all users.
Fig. 1. The architecture of smart grid
To accomplish these two tasks, the homomorphic Paillier encryption technique [4] can be applied, so that the aggregator can perform the aggregation operation without decrypting the users’ encrypted data. Such homomorphic encryption has been applied in many existing data aggregation schemes [1, 5-12]. Most of these schemes focus on one-dimensional data [5-12], while Lu et al.’s scheme [1] handles multidimensional user data, which is rarely addressed in existing approaches. However, the scheme of [1] cannot resist internal attackers: if the operation authority (OA) is compromised, users’ data will be revealed. Moreover, although it adopts batch verification to reduce the authentication cost, the time spent on pairing operations is still proportional to the number of users. Fan et al. proposed a privacy-enhanced data aggregation scheme [13], in which each user embeds a blinding factor into the ciphertext to protect the data from internal attackers, and the aggregator holds one more blinding factor so that the sum of all blinding factors is 0. This blinding factor technique has been applied in several works [6, 14-17], but they all share one major drawback: they do not tolerate smart meter failures. If some users fail to report their data, the aggregator obtains a wrong aggregation result because the sum of the blinding factors is no longer 0. Chen et al. proposed a privacy-preserving data aggregation scheme [9] in which TA is in charge of distributing the blinding factors to the aggregator and users. If some users’ smart meters do not work, the aggregation can still be completed with the help of TA. However, the communication overhead of [9] is significant and it does not provide message authentication. Based on the above observations, this paper presents a privacy-preserving data aggregation scheme that can handle multidimensional data.
The system model includes a trusted authority (TA), an untrusted aggregator and users. We use the Paillier Cryptosystem and the blinding factor technique to encrypt the multidimensional data as a whole, and exploit the homomorphic property of the Paillier Cryptosystem to achieve data aggregation. Signatures and efficient batch verification are also applied in our scheme for data integrity and quick verification; the batch verification requires only 2 pairing operations. Our scheme also supports fault tolerance: if some users’ smart meters do not work, the scheme still works with the help of TA. Furthermore, we consider the situation in which the batch verification fails, in which case the technique proposed in [18] can be used to quickly find the invalid signatures. In addition, we give two extensions of our scheme. One is that our scheme can be used to compute a fixed user’s time-of-use electricity bill. The other is that our scheme can efficiently handle dynamic user sets. In the security analysis, we prove unforgeability and batch verification security in detail and explain why our scheme resists both external and internal attackers. The performance analysis indicates that our scheme has lower computational complexity and communication overhead than existing schemes.
The remainder of this paper is organized as follows. The system model and preliminaries are introduced in Section 2. Our detailed scheme and the extensions of our scheme are given in Section 3 and Section 4, respectively. In Section 5, the security and performance analysis of our scheme are discussed. Lastly, we draw conclusions in Section 6.
2. Simplified System Model and Preliminaries
2.1 Simplified System Model
We consider a simplified smart grid architecture in our system model, which includes a number of users, an aggregator and a trusted authority (TA), as shown in Fig. 2. This paper mainly deals with how to report users’ data to the aggregator in a privacy-preserving way with signature aggregation and batch verification.
Fig. 2. System model
• Trusted Authority (TA): TA is a trusted entity in our system model that belongs to an independent organization such as a Regional Transmission Organization (RTO) or an Independent System Operator (ISO). In the initialization phase, TA selects blinding factors and sends them to the aggregator and each user. Besides that, if some smart meters cannot report their real-time data to the aggregator, TA can help the aggregation process continue normally. We assume that TA cannot be compromised, even by strong adversaries.
• Aggregator: The aggregator is a powerful entity that is curious about users’ electricity consumption data. In our system model, the aggregator might be compromised by outside adversaries and is therefore untrusted. During the initialization phase, the aggregator generates secret values and publishes public information. During the aggregation phase, the aggregator uses batch verification to verify users’ signatures and obtains the users’ total electricity consumption without learning any individual user’s consumption.
• Users U={U1, U2,…,Un}: U denotes the set of n users. Each user Ui∈U is equipped with a smart meter that records real-time electricity consumption information. These real-time data are reported to the aggregator periodically, e.g. every 15 minutes. Smart meters may occasionally malfunction, i.e. they may stop reporting for a while or be reset later.
2.2 Bilinear Pairing Setting
We define two cyclic multiplicative groups G1 and G2 with the same prime order q. Let g be a generator of G1, and let (G1, G2) possess a nondegenerate and efficiently computable bilinear map e: G1×G1→G2. The bilinear pairing has the following properties [13].
• Bilinearity: \(e\left(P^{a}, Q^{b}\right)=e(P, Q)^{a b} \in \mathrm{G}_{2}\) for all P,Q∈G1 and \(a, b \in Z_{q}^{*}\).
• Nondegeneracy: There exists P,Q∈G1 such that \(e(P, Q) \neq 1_{\mathrm{G}_{2}}\).
• Computability: There exists an efficient algorithm to compute e(P,Q) for all P,Q∈G1.
2.3 Paillier Cryptosystem
The Paillier Cryptosystem [4] is additively homomorphic. It consists of key generation, encryption and decryption [1].
• Key Generation: Given the security parameter k, two large prime numbers p,q are first selected, where |p|=|q|=k. Then the RSA modulus N=pq and λ=lcm(p-1,q-1) are computed. Define the function \(L(u)=\frac{u-1}{N}\). After choosing a generator \(g \in Z_{N^{2}}^{*}\), the value \(\mu=\left(L\left(g^{\lambda} \bmod N^{2}\right)\right)^{-1} \bmod N\) is calculated. The public key is pk=(N,g), and the corresponding private key is sk=(λ,μ).
• Encryption: Given a message m∈ZN, select a random number \(r \in Z_{N}^{*}\). The ciphertext of m is \(c=E(m)=g^{m} \cdot r^{N} \bmod N^{2}\).
• Decryption: Given the ciphertext \(c \in Z_{N^{2}}^{*}\), the corresponding message can be recovered as \(m=D(c)=L\left(c^{\lambda} \bmod N^{2}\right) \cdot \mu \bmod N\).
In addition, the Paillier encryption is provably secure against chosen-plaintext attacks; its correctness and security proofs can be found in [4].
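The three algorithms above can be illustrated with a minimal sketch. The primes are toy values chosen only for readability, g=N+1 is one common valid choice of generator (an assumption, since the text leaves g open), and the function names are illustrative rather than part of the scheme.

```python
import math
import secrets


def L(u, N):
    """L(u) = (u - 1) / N, applied to values u that are 1 modulo N."""
    return (u - 1) // N


def keygen(p, q):
    """Return (pk, sk) = ((N, g), (lam, mu)) for primes p, q."""
    N = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = N + 1                                            # assumed generator choice
    mu = pow(L(pow(g, lam, N * N), N), -1, N)
    return (N, g), (lam, mu)


def encrypt(pk, m):
    """c = g^m * r^N mod N^2 with a fresh random r in Z_N^*."""
    N, g = pk
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(g, m, N * N) * pow(r, N, N * N) % (N * N)


def decrypt(pk, sk, c):
    """m = L(c^lam mod N^2) * mu mod N."""
    N, _ = pk
    lam, mu = sk
    return L(pow(c, lam, N * N), N) * mu % N


pk, sk = keygen(1009, 1013)            # toy primes, far too small for real use
c1, c2 = encrypt(pk, 120), encrypt(pk, 345)
# Multiplying ciphertexts adds the underlying plaintexts.
total = decrypt(pk, sk, c1 * c2 % (pk[0] ** 2))
```

Multiplying the two ciphertexts modulo N2 and decrypting the product yields the sum of the two plaintexts, which is the homomorphic property that the aggregation phase relies on.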
2.4 Small Exponent Test
Suppose one needs to verify the set of equations \(y_{i}=g^{x_{i}}\) (i=1,2,…,n), where g is a generator of a group of prime order q. One can compute and check whether \(\prod_{i=1}^{n} y_{i}=g^{\sum_{i=1}^{n} x_{i}}\) holds. However, this check can be fooled. For example, if (x1,y1) and (x2,y2) are valid pairs, then the submitted pairs (x1-α,y1) and (x2+α,y2) pass the product check for any α although neither of them is individually valid [19]. For this reason, the small exponent test was proposed in [20]; it will be applied to pairings in the following sections.
Small exponent test [20]: choose small exponents δi of ℓb bits and check whether \(\prod_{i=1}^{n} y_{i}^{\delta_{i}}=g^{\sum_{i=1}^{n} x_{i} \delta_{i}}\). The probability of accepting a bad pair is \(2^{-\ell_{b}}\). The size of ℓb is chosen according to the trade-off between efficiency and security.
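The difference between the plain product check and the small exponent test can be seen in a small sketch. The group (the order-11 subgroup of Z23* generated by 4), the values of x1, x2, α and the fixed exponents δ are all illustrative assumptions; in practice the δi are drawn at random.

```python
# Toy group: the subgroup of order Q = 11 in Z_23^*, generated by G = 4.
P, Q, G = 23, 11, 4


def naive_batch(pairs):
    """Check prod(y_i) == g^(sum x_i)."""
    lhs = 1
    for _, y in pairs:
        lhs = lhs * y % P
    return lhs == pow(G, sum(x for x, _ in pairs) % Q, P)


def small_exponent_batch(pairs, deltas):
    """Check prod(y_i^delta_i) == g^(sum x_i * delta_i)."""
    lhs, exponent = 1, 0
    for (x, y), d in zip(pairs, deltas):
        lhs = lhs * pow(y, d, P) % P
        exponent = (exponent + x * d) % Q
    return lhs == pow(G, exponent, P)


x1, x2, alpha = 3, 7, 2
y1, y2 = pow(G, x1, P), pow(G, x2, P)
honest = [(x1, y1), (x2, y2)]
forged = [(x1 - alpha, y1), (x2 + alpha, y2)]   # neither pair is valid on its own

deltas = [3, 5]   # fixed here for reproducibility; drawn at random in practice

naive_accepts_forgery = naive_batch(forged)
test_accepts_forgery = small_exponent_batch(forged, deltas)
test_accepts_honest = small_exponent_batch(honest, deltas)
```

With these values the plain product check accepts the forged pairs because the shifts by α cancel, while the weighted check rejects them because the shifts are multiplied by different δi.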
2.5 Notation
Some notations are defined as follows.
t: the time when the aggregator needs to aggregate the power usage data.
U: the user set.
Ui: the i-th user in the neighborhood, where i=1,2,…,n.
IDi: the identifier of Ui.
πi: the Ui’s blinding factor.
π0: the blinding factor of aggregator.
N, gAG: the public key of aggregator.
λ,μ: the private key of aggregator.
Xi: the Ui's private key.
Yi: the Ui's public key.
3. Proposed Scheme
3.1 Initialization Phase
Aggregator: First the aggregator generates the Paillier Cryptosystem’s public key (N=p1q1, gAG) and the corresponding private key (λ,μ), where p1, q1 are two large primes, gAG is a generator of \(\mathrm{Z}_{N^{2}}^{*}\), λ=lcm(p1-1,q1-1) and \(\mu=\left(L\left(g_{A G}^{\lambda} \bmod N^{2}\right)\right)^{-1} \bmod N\). Assume that l types of electricity usage data need to be reported in total. The aggregator then chooses a sequence \(\vec{a}=\left(a_{1}, a_{2}, \ldots, a_{l}\right)\) where \(a_{i} \in Z^{+}\) for i=1,2,…,l, and computes \(g_{i}=g_{A G}^{a_{i}} \bmod N^{2}\), which is the relation used in the correctness derivation (2). Lastly, the aggregator publishes the public information {N, (g1,…,gl)} and keeps {λ,μ} secret.
TA: Given the security parameter k, TA runs Gen(k) to generate (q,g,G1,G2,e). TA also selects (n+1) blinding factors {π0,π1,…,πn} at random such that \(\pi_{0}+\pi_{1}+\cdots+\pi_{n} \equiv 0 \bmod N\). To do so, n random numbers {π1,…,πn} (πi∈ZN, i=1,…,n) are first generated by a pseudorandom generator, and then π0=-(π1+…+πn) mod N is computed. In practice, each blinding factor should be at least 1024 bits long. TA then sends π0 to the aggregator and πi to Ui for each i=1,2,…,n. In addition, TA selects three secure hash functions H1, H2, H3, where H1:{0,1}*→G1, \(H_{2}:\{0,1\}^{*} \rightarrow Z_{N}^{*}\) and \(H_{3}:\{0,1\}^{*} \rightarrow Z_{q}^{*}\).
Ui: Ui selects a random number \(X_{i} \in Z_{q}^{*}\) and computes the corresponding value \(Y_{i}=g^{X_{i}}\). Then Xi is Ui's private key and Yi is its public key.
3.2 Report Phase
Each user Ui∈U collects l types of data (di1,di2,…,dil) from the smart meter in each reporting period, so that the real-time electricity consumption data are available. Then Ui performs the following steps.
(1) After getting the l types of data (di1,di2,…,dil) from the smart meter, Ui computes the ciphertext \(C T_{i}=g_{1}^{d_{i 1}} \cdot g_{2}^{d_{i 2}} \cdots g_{l}^{d_{i l}} \cdot\left(H_{2}(t)\right)^{\pi_{i}} \bmod N^{2}\) according to the Paillier Cryptosystem.
(2) Ui first calculates two values hi and W, where hi=H3(CTi) and W=H1(t), and then computes \(V_{i}=W^{X_{i} h_{i}}\). The signature is σi=Vi.
(3) Ui reports the ciphertext and signature {σi,CTi} to the aggregator.
3.3 Aggregation Phase
After receiving all n reports {σi,CTi} from the users, where i=1,2,…,n, the aggregator calculates hi=H3(CTi) for i=1,2,…,n and W=H1(t), and performs the following steps.
(1) First, the aggregator verifies all signatures by checking whether \(e\left(g, \sigma_{i}\right)=e\left(W, Y_{i}^{h_{i}}\right)\) for each i. To improve the efficiency of verification, the aggregator can instead perform batch verification by checking whether \(e\left(g, \prod_{i=1}^{n} \sigma_{i}^{\delta_{i}}\right)=e\left(W, \prod_{i=1}^{n}\left(Y_{i}^{h_{i}}\right)^{\delta_{i}}\right)\), where each δi is a random element of \(\mathrm{Z}_{q}^{*}\). The number of time-consuming pairing operations e(·,·) is thereby reduced from 2n to 2.
(2) If all signatures are valid in step (1), the aggregator computes \(V=H_{2}(t)^{\pi_{0}} \prod_{i=1}^{n} C T_{i} \bmod N^{2}\), where V satisfies \(V=g_{A G}^{a_{1} \sum_{i=1}^{n} d_{i 1}+a_{2} \sum_{i=1}^{n} d_{i 2}+\cdots+a_{l} \sum_{i=1}^{n} d_{i l}} \cdot H_{2}(t)^{\beta N} \bmod N^{2}\). This equation is explained in Section 3.4. Here we take \(M=a_{1} \sum_{i=1}^{n} d_{i 1}+a_{2} \sum_{i=1}^{n} d_{i 2}+\cdots+a_{l} \sum_{i=1}^{n} d_{i l}\) and \(R=\left(H_{2}(t)\right)^{\beta}\). The value \(V=g_{A G}^{M} R^{N} \bmod N^{2}\) is then a ciphertext of the Paillier Cryptosystem, so the aggregator can obtain M by running the Paillier decryption algorithm. By applying Algorithm 1 [1], the aggregator can recover the sums of the data of each type (D1,D2,…,Dl), where \(D_{j}=\sum_{i=1}^{n} d_{i j}(j=1,2, \ldots, l)\) is the aggregated value for type j.
Algorithm 1 [1]. Recover the aggregated report
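The body of Algorithm 1 is not reproduced here. As a rough sketch of how such a recovery can work, and not as a transcription of the algorithm in [1], the code below assumes that the sequence (a1,…,al) is superincreasing in the sense that each aj exceeds the largest possible value of the lower-order part a1D1+…+aj-1Dj-1. The function name and the parameters n and d_max are hypothetical.

```python
def recover_aggregates(M, a):
    """Split M = sum(a[j] * D[j]) into the list D.

    Assumes each a[j] is larger than the maximum possible value of
    sum(a[k] * D[k] for k < j), so the terms cannot overlap.
    """
    D = [0] * len(a)
    for j in range(len(a) - 1, -1, -1):   # peel off the highest type first
        D[j], M = divmod(M, a[j])
    return D


# Hypothetical parameters: n users, every reading below d_max, l = 3 types.
n, d_max = 100, 1000
bound = n * d_max                  # every aggregate D_j is below this bound
a = [1, bound, bound ** 2]         # superincreasing by construction
D_true = [4821, 77310, 129]
M = sum(aj * Dj for aj, Dj in zip(a, D_true))
D_recovered = recover_aggregates(M, a)
```

Under this assumption the decrypted value M behaves like a number written in a mixed radix, and each Dj is read off as one digit.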
The batch verification will fail when any of the signatures is invalid. In this case, we can use the technique proposed by Law and Matt [18] in 2007 for tracing the users who provide invalid signatures in a batch.
3.4 Correctness
Signature Verification: According to the bilinearity feature of the bilinear pairing mentioned in Section 2.2, we give the correctness verification as (1).
\(\begin{aligned} e\left(W, \prod_{i=1}^{n}\left(Y_{i}^{h_{i}}\right)^{\delta_{i}}\right) &=e\left(W, \prod_{i=1}^{n}\left(g^{X_{i} h_{i}}\right)^{\delta_{i}}\right) \\ &=\prod_{i=1}^{n} e\left(W, g^{X_{i} h_{i} \delta_{i}}\right) \\ &=\prod_{i=1}^{n} e\left(W^{X_{i} h_{i} \delta_{i}}, g\right) \\ &=\prod_{i=1}^{n} e\left(V_{i}^{\delta_{i}}, g\right) \\ &=e\left(\prod_{i=1}^{n} \sigma_{i}^{\delta_{i}}, g\right) \end{aligned}\) (1)
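The algebra of (1) can be checked mechanically in a deliberately insecure toy model. In the sketch below every element of G1 is represented by its discrete logarithm to the base g, so that multiplying group elements becomes adding exponents and the pairing becomes multiplying exponents modulo q. This exposes all secrets and serves only to follow the bookkeeping; the modulus, the hash construction and all function names are assumptions made for illustration.

```python
import hashlib
import secrets

q = 2**61 - 1  # a prime standing in for the group order


def hash_to_zq(tag, data):
    """Stand-in for H1 and H3: hash into Z_q^* (H1's output is kept as an exponent)."""
    digest = hashlib.sha256(tag + b"|" + data).digest()
    return int.from_bytes(digest, "big") % (q - 1) + 1


def pairing(u, v):
    """Toy e(g^u, g^v) = e(g, g)^(u*v), tracked by its exponent."""
    return u * v % q


def sign(X, ct, t):
    """sigma_i = W^(X_i * h_i) with W = H1(t) and h_i = H3(CT_i)."""
    return hash_to_zq(b"H1", t) * X * hash_to_zq(b"H3", ct) % q


def batch_verify(reports, pubkeys, t, deltas):
    """Check e(g, prod sigma_i^delta_i) == e(W, prod (Y_i^h_i)^delta_i)."""
    W = hash_to_zq(b"H1", t)
    left = right = 0
    for (sigma, ct), Y, d in zip(reports, pubkeys, deltas):
        left = (left + sigma * d) % q                         # prod sigma_i^delta_i
        right = (right + Y * hash_to_zq(b"H3", ct) * d) % q   # prod (Y_i^h_i)^delta_i
    return pairing(1, left) == pairing(W, right)              # two pairings in total


t = b"slot-42"
keys = [secrets.randbelow(q - 1) + 1 for _ in range(4)]   # X_i; Y_i = g^X_i has exponent X_i
cts = [("ciphertext-%d" % i).encode() for i in range(4)]
reports = [(sign(X, ct, t), ct) for X, ct in zip(keys, cts)]
deltas = [secrets.randbelow(q - 1) + 1 for _ in reports]

ok = batch_verify(reports, keys, t, deltas)
tampered = [(reports[0][0], b"altered")] + reports[1:]
bad = batch_verify(tampered, keys, t, deltas)
```

The loop performs only exponent arithmetic, and the pairing is evaluated twice regardless of the number of reports, which mirrors the reduction from 2n pairings to 2.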
Ciphertext Decryption: In the aggregation phase, if all the signatures are valid, the aggregator computes the value V for decryption. We now verify the correctness of the ciphertext decryption in our scheme. Using the property of the blinding factors that \(\pi_{0}+\pi_{1}+\cdots+\pi_{n} \equiv 0 \bmod N\), the correctness is verified in (2).
\(\begin{aligned} V &=H_{2}(t)^{\pi_{0}} \prod_{i=1}^{n} C T_{i} \bmod N^{2} \\ &=H_{2}(t)^{\pi_{0}} \prod_{i=1}^{n}\left(g_{1}^{d_{i 1}} \cdot g_{2}^{d_{i 2}} \cdots g_{l}^{d_{i l}} \cdot H_{2}(t)^{\pi_{i}}\right) \bmod N^{2} \\ &=\left(\prod_{i=1}^{n} g_{1}^{d_{i 1}} \cdot g_{2}^{d_{i 2}} \cdots g_{l}^{d_{i l}}\right) \cdot H_{2}(t)^{\pi_{0}+\pi_{1}+\cdots+\pi_{n}} \bmod N^{2} \\ &=g_{1}^{\sum_{i=1}^{n} d_{i 1}} \cdot g_{2}^{\sum_{i=1}^{n} d_{i 2}} \cdots g_{l}^{\sum_{i=1}^{n} d_{i l}} \cdot H_{2}(t)^{\sum_{j=0}^{n} \pi_{j}} \bmod N^{2} \\ &\qquad\left(\sum_{j=0}^{n} \pi_{j} \equiv 0 \bmod N \Rightarrow \sum_{j=0}^{n} \pi_{j}=\beta N \text { for some } \beta\right) \\ &=g_{1}^{\sum_{i=1}^{n} d_{i 1}} \cdot g_{2}^{\sum_{i=1}^{n} d_{i 2}} \cdots g_{l}^{\sum_{i=1}^{n} d_{i l}} \cdot H_{2}(t)^{\beta N} \bmod N^{2} \\ &=g_{A G}^{a_{1} \sum_{i=1}^{n} d_{i 1}} \cdot g_{A G}^{a_{2} \sum_{i=1}^{n} d_{i 2}} \cdots g_{A G}^{a_{l} \sum_{i=1}^{n} d_{i l}} \cdot H_{2}(t)^{\beta N} \bmod N^{2} \\ &=g_{A G}^{a_{1} \sum_{i=1}^{n} d_{i 1}+a_{2} \sum_{i=1}^{n} d_{i 2}+\cdots+a_{l} \sum_{i=1}^{n} d_{i l}} \cdot H_{2}(t)^{\beta N} \bmod N^{2} \end{aligned}\) (2)
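A small numeric run can make (2) concrete. The sketch below covers only the encryption and aggregation path (signatures are omitted) and rests on several assumptions that are not fixed by the text: toy primes, gAG=N+1, SHA-256 with a retry counter as a stand-in for H2, l=2 data types with (a1,a2)=(1,1000), and three users with made-up readings.

```python
import hashlib
import math
import secrets

# Aggregator: toy Paillier key (primes far too small for real use).
p1, q1 = 1009, 1013
N = p1 * q1
N2 = N * N
lam = (p1 - 1) * (q1 - 1) // math.gcd(p1 - 1, q1 - 1)
g_ag = N + 1                                   # assumed generator choice
mu = pow((pow(g_ag, lam, N2) - 1) // N, -1, N)

a = [1, 1000]                                  # a_2 exceeds any possible a_1 * D_1 here
g = [pow(g_ag, aj, N2) for aj in a]            # published g_j = g_AG^{a_j}


def H2(t):
    """Stand-in for H2: hash a time stamp into Z_N^* (retry on a non-unit)."""
    counter = 0
    while True:
        digest = hashlib.sha256(t + counter.to_bytes(4, "big")).digest()
        h = int.from_bytes(digest, "big") % N
        if math.gcd(h, N) == 1:
            return h
        counter += 1


# TA: blinding factors with pi_0 + pi_1 + ... + pi_n = 0 (mod N).
n = 3
pi = [secrets.randbelow(N) for _ in range(n)]
pi0 = -sum(pi) % N

# Users: CT_i = g_1^{d_i1} * g_2^{d_i2} * H2(t)^{pi_i} mod N^2.
t = b"reporting-period-1"
readings = [(12, 40), (7, 55), (30, 21)]       # made-up (d_i1, d_i2) values
h2 = H2(t)
CT = [pow(g[0], d1, N2) * pow(g[1], d2, N2) * pow(h2, pi_i, N2) % N2
      for (d1, d2), pi_i in zip(readings, pi)]

# Aggregator: V = H2(t)^{pi_0} * prod CT_i, then Paillier decryption.
V = pow(h2, pi0, N2)
for c in CT:
    V = V * c % N2
M = (pow(V, lam, N2) - 1) // N * mu % N
D2, D1 = divmod(M, a[1])                       # split M = a_1*D_1 + a_2*D_2
```

The aggregator recovers only the per-type sums D1 and D2; an individual CTi on its own still carries the factor H2(t)^πi, which does not vanish under decryption.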
3.5 Fault Tolerance Handling
As mentioned in Section 2.1, smart meters that malfunction may stop reporting for a while or be reset later. Here we use \(\widehat{U} \subset U\) to denote the users with broken smart meters and \(U / \widehat{U}\) to denote the remaining users with working smart meters. In this situation the blinding factors no longer cancel, i.e. in general \(\pi_{0}+\sum_{i \in U / \widehat{U}} \pi_{i} \not\equiv 0 \bmod N\), which directly causes the aggregator to obtain a wrong aggregation result. To handle such cases, we modify some parts of our scheme so that the aggregation still succeeds.
The initialization phase and the report phase do not change. At the beginning of the aggregation phase, once the aggregator discovers that the total number of received reports is not n, it sends the set \(\widehat{U}\) to TA. After receiving the set \(\widehat{U}\), TA calculates \(\hat{H}=H_{2}(t)^{\sum_{i \in \widehat{U}} \pi_{i}}\) and sends \(\hat{H}\) back to the aggregator. The aggregator can then calculate \(V^{\prime}=\hat{H} \cdot H_{2}(t)^{\pi_{0}} \prod_{i \in U / \widehat{U}} C T_{i} \bmod N^{2}\), which satisfies (3).
\(V^{\prime}=\left(g_{A G}\right)^{a_{1} \sum_{i \in U / \widehat{U}} d_{i 1}+a_{2} \sum_{i \in U / \widehat{U}} d_{i 2}+\cdots+a_{l} \sum_{i \in U / \widehat{U}} d_{i l}}\left(H_{2}(t)\right)^{\beta N} \bmod N^{2}\) (3)
Here we take \(M^{\prime}=a_{1} \sum_{i \in U / \widehat{U}} d_{i 1}+a_{2} \sum_{i \in U / \widehat{U}} d_{i 2}+\cdots+a_{l} \sum_{i \in U / \widehat{U}} d_{i l}\) and \(R=\left(H_{2}(t)\right)^{\beta}\). The value \(V^{\prime}=g_{A G}^{M^{\prime}} R^{N} \bmod N^{2}\) is still a ciphertext of the Paillier Cryptosystem, so the aggregator can obtain M' by running the Paillier decryption algorithm. By applying Algorithm 1, the aggregator can recover the sums of the data of each type (\(D_{1}^{\prime}, D_{2}^{\prime}, \ldots, D_{l}^{\prime}\)), where each \(D_{j}^{\prime}=\sum_{i \in U / \widehat{U}} d_{i j}\).
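For completeness, the step from the definition of V' to (3) follows the same pattern as (2). Using the relation between gj and the j-th coefficient aj that appears in (2), and writing out only what the definitions of this section imply, one obtains:

\(\begin{aligned} V^{\prime} &=\hat{H} \cdot H_{2}(t)^{\pi_{0}} \prod_{i \in U / \widehat{U}} C T_{i} \bmod N^{2} \\ &=\prod_{j=1}^{l} g_{j}^{\sum_{i \in U / \widehat{U}} d_{i j}} \cdot H_{2}(t)^{\pi_{0}+\sum_{i \in U / \widehat{U}} \pi_{i}+\sum_{i \in \widehat{U}} \pi_{i}} \bmod N^{2} \\ &=\prod_{j=1}^{l} g_{j}^{\sum_{i \in U / \widehat{U}} d_{i j}} \cdot H_{2}(t)^{\sum_{i=0}^{n} \pi_{i}} \bmod N^{2} \\ &=g_{A G}^{\sum_{j=1}^{l} a_{j} \sum_{i \in U / \widehat{U}} d_{i j}} \cdot H_{2}(t)^{\beta N} \bmod N^{2} \end{aligned}\)

The blinding factors of the silent users re-enter the product through \(\hat{H}\), so the exponent of H2(t) is again a multiple of N and the Paillier decryption removes it.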
4. Extensions
4.1 Extension to Support Time-of-Use Electricity Pricing Mode
The time-of-use electricity pricing mode divides one day into several time periods and sets a corresponding electricity price for each of them. In each time period, the smart meter collects the total power consumption and multiplies it by the corresponding price to obtain the electricity fee of the user. Our scheme can support the case in which the aggregator needs to obtain a fixed user’s electricity fees over a certain range of time periods.
Assume one day is divided into the time periods T1,T2,…,Tm with corresponding prices β1,β2,…,βm. The collection cycle of the aggregator covers ω time periods, i.e. from T1 to Tω, from Tω+1 to T2ω, and so on. We use \(\Sigma_{i 1}, \Sigma_{i 2}, \ldots, \Sigma_{i m}\) to denote the total electricity usage of a fixed user Ui in each time period. We take the first cycle as an example, that is, we show how the aggregator computes the electricity fees of user Ui during T1 to Tω. In the report phase, for j=1,2,…,ω-1, Ui computes the ciphertext \(C T_{i, j}=g^{\beta_{j} \Sigma_{i j}} H_{2}\left(T_{j}\right)^{\pi_{i}} \bmod N^{2}\), and for the last period Tω the ciphertext is modified to \(C T_{i, \omega}=g^{\beta_{\omega} \Sigma_{i \omega}} \hat{H}_{\omega}^{\pi_{i}} \cdot r_{i}^{N} \bmod N^{2}\), where \(r_{i} \in Z_{N}^{*}\) is a random number and \(\hat{H}_{\omega}\) is constructed as in (4).
\(\hat{H}_{\omega}=\left(\prod_{j=1}^{\omega-1} H_{2}\left(T_{j}\right)\right)^{-1} \bmod N^{2}\) (4)
In the aggregation phase, the aggregator collects the ω ciphertexts and aggregates them as in (5).
\(\begin{aligned} C T_{i} &=C T_{i, \omega} \cdot \prod_{j=1}^{\omega-1} C T_{i, j} \bmod N^{2} \\ &=g^{\beta_{1} \Sigma_{i 1}+\beta_{2} \Sigma_{i 2}+\cdots+\beta_{\omega} \Sigma_{i \omega}} \cdot\left(\prod_{j=1}^{\omega-1} H_{2}\left(T_{j}\right)\right)^{\pi_{i}} \cdot\left(\hat{H}_{\omega}\right)^{\pi_{i}} \cdot r_{i}^{N} \bmod N^{2} \\ &=g^{\sum_{j=1}^{\omega} \beta_{j} \Sigma_{i j}} \cdot r_{i}^{N} \bmod N^{2} \end{aligned}\) (5)
Therefore the aggregator can obtain \(\sum_{j=1}^{\omega} \beta_{j} \Sigma_{i j}\) by running the decryption algorithm of the Paillier Cryptosystem; this value is the electricity fee of user Ui during T1 to Tω.
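The cancellation in (5) can be traced with toy numbers. The sketch below assumes ω=3, made-up prices and usages, toy primes, N+1 as the Paillier generator written g in this subsection, and SHA-256 with a retry counter as a stand-in for H2; none of these choices are prescribed by the text.

```python
import hashlib
import math
import secrets

p1, q1 = 1009, 1013                        # toy primes, far too small for real use
N = p1 * q1
N2 = N * N
lam = (p1 - 1) * (q1 - 1) // math.gcd(p1 - 1, q1 - 1)
g = N + 1                                  # assumed generator choice
mu = pow((pow(g, lam, N2) - 1) // N, -1, N)


def H2(label):
    """Stand-in for H2: hash a period label into Z_N^* (retry on a non-unit)."""
    counter = 0
    while True:
        digest = hashlib.sha256(label + counter.to_bytes(4, "big")).digest()
        h = int.from_bytes(digest, "big") % N
        if math.gcd(h, N) == 1:
            return h
        counter += 1


def random_unit():
    """Random r_i in Z_N^*."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r


prices = [3, 5, 2]                         # beta_1..beta_w (made up)
usage = [10, 4, 7]                         # Sigma_i1..Sigma_iw (made up)
periods = [b"T1", b"T2", b"T3"]
pi_i = secrets.randbelow(N)                # the user's blinding factor

# Periods 1..w-1: CT_ij = g^{beta_j * Sigma_ij} * H2(T_j)^{pi_i} mod N^2.
cts = [pow(g, b * s, N2) * pow(H2(T), pi_i, N2) % N2
       for b, s, T in zip(prices[:-1], usage[:-1], periods[:-1])]

# Last period: H_hat is the inverse of the product of the earlier H2 values, as in (4).
H_hat = pow(math.prod(H2(T) for T in periods[:-1]), -1, N2)
ct_last = (pow(g, prices[-1] * usage[-1], N2) * pow(H_hat, pi_i, N2)
           * pow(random_unit(), N, N2)) % N2

# Aggregator: multiply the w ciphertexts as in (5), then decrypt.
CT = ct_last
for c in cts:
    CT = CT * c % N2
bill = (pow(CT, lam, N2) - 1) // N * mu % N
```

The earlier ciphertexts stay blinded individually; only their product with the last ciphertext collapses to an ordinary Paillier ciphertext of the bill.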
4.2 Extension to Support Dynamic Users
In this section, we discuss the case in which some users join or leave the system in some time period. We modify some parts of the proposed scheme so that the aggregation still succeeds regardless of users being added or removed.
Assume that in some time period the set of added users is Ua and the set of removed users is Ub. In the initialization phase, TA first generates blinding factors \(\left\{\pi_{i}\right\}_{i \in \mathrm{U}_{a}}\) for the new users. Then TA gives the aggregator \(\pi_{0}^{\prime}=\pi_{0}-\sum_{a \in \mathrm{U}_{a}} \pi_{a}+\sum_{b \in \mathrm{U}_{b}} \pi_{b}\) as its new blinding factor. We denote by \(\tilde{\mathrm{U}}=\mathrm{U}+\mathrm{U}_{a}-\mathrm{U}_{b}\) the updated user set. In the report phase, the aggregator receives the set of reports \(\left\{\sigma_{i}, C T_{i}\right\}_{i \in \tilde{U}}\). The batch verification is not affected, so if all the signatures are valid, the aggregator computes the value \(V=H_{2}(t)^{\pi_{0}^{\prime}} \prod_{i \in \tilde{U}} C T_{i} \bmod N^{2}\) as in (6).
\(\begin{aligned} V &=g_{A G}^{a_{1} \sum_{i \in \tilde{U}} d_{i 1}+a_{2} \sum_{i \in \tilde{U}} d_{i 2}+\cdots+a_{l} \sum_{i \in \tilde{U}} d_{i l}} H_{2}(t)^{\pi_{0}^{\prime}+\sum_{j \in \tilde{U}} \pi_{j}} \bmod N^{2} \\ &=g_{A G}^{a_{1} \sum_{i \in \tilde{U}} d_{i 1}+a_{2} \sum_{i \in \tilde{U}} d_{i 2}+\cdots+a_{l} \sum_{i \in \tilde{U}} d_{i l}} H_{2}(t)^{\pi_{0}^{\prime}+\sum_{i=1}^{n} \pi_{i}+\sum_{a \in U_{a}} \pi_{a}-\sum_{b \in U_{b}} \pi_{b}} \bmod N^{2} \\ &=g_{A G}^{a_{1} \sum_{i \in \tilde{U}} d_{i 1}+a_{2} \sum_{i \in \tilde{U}} d_{i 2}+\cdots+a_{l} \sum_{i \in \tilde{U}} d_{i l}} H_{2}(t)^{\sum_{i=0}^{n} \pi_{i}} \bmod N^{2} \\ &=g_{A G}^{a_{1} \sum_{i \in \tilde{U}} d_{i 1}+a_{2} \sum_{i \in \tilde{U}} d_{i 2}+\cdots+a_{l} \sum_{i \in \tilde{U}} d_{i l}} H_{2}(t)^{\beta N} \bmod N^{2} \end{aligned}\) (6)
Then the aggregator runs the decryption algorithm of the Paillier Cryptosystem and Algorithm 1 to obtain (D1, D2, …, Dl), where \(D_{j}=\sum_{i \in \tilde{U}} d_{i j}\). When users join the system, TA only needs to generate blinding factors for the new users and update the blinding factor of the aggregator. When users leave the system, TA only needs to generate a new blinding factor for the aggregator. All the other users do not need to change anything for aggregation. Hence the scheme handles dynamic user scenarios efficiently.
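The bookkeeping behind (6) reduces to modular arithmetic on the blinding factors. The following sketch uses a toy modulus and hypothetical user labels to check that, after the aggregator's factor is updated, the exponent of H2(t) is again a multiple of N.

```python
import secrets

N = 1009 * 1013   # toy modulus, for illustration only

# Initial state: five users and pi_0 + sum(pi_i) = 0 (mod N).
pi = {"U%d" % i: secrets.randbelow(N) for i in range(1, 6)}
pi0 = -sum(pi.values()) % N

# One period later: U6 and U7 join (fresh factors from TA), U2 and U4 leave.
joined = {"U6": secrets.randbelow(N), "U7": secrets.randbelow(N)}
left = {"U2", "U4"}

# TA's update for the aggregator: pi_0' = pi_0 - sum over U_a + sum over U_b.
pi0_new = (pi0 - sum(joined.values()) + sum(pi[u] for u in left)) % N

# Updated user set: the old users plus the joined ones, minus those who left.
current = {u: v for u, v in {**pi, **joined}.items() if u not in left}
residue = (pi0_new + sum(current.values())) % N
```

The remaining users' factors are untouched, which is why only TA and the aggregator need to act when the user set changes.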
5. Security and Performance Analysis
5.1 Security Analysis
Against External Attackers: External attackers can eavesdrop on the communication flows from users to the aggregator, but they cannot obtain users’ electricity consumption information from the encrypted data. That is, although an external attacker can capture \(C T_{i}=g_{1}^{d_{i 1}} \cdot g_{2}^{d_{i 2}} \cdots g_{l}^{d_{i l}} \cdot\left(H_{2}(t)\right)^{\pi_{i}} \bmod N^{2}\) from user Ui, πi is unknown to the attacker, so the attacker cannot compute the value \(\left(H_{2}(t)\right)^{\pi_{i}}\) and hence cannot learn (di1, di2, …, dil). Therefore, the electricity consumption information of users is secure against external attackers.
Against Internal Attackers: As mentioned in the system model, the aggregator is untrusted, so here we regard the aggregator as the internal attacker. The aggregator obtains all the users’ encrypted data \(\left\{C T_{i}\right\}_{i=1}^{n}\) and the values (D1, D2, …, Dl). First, the aggregator cannot obtain each user’s electricity usage data because the value πi is unknown to the aggregator: it cannot compute \(\left(H_{2}(t)\right)^{\pi_{i}}\), which means it cannot learn (di1, di2, …, dil). Second, although the aggregator obtains the aggregated results \(D_{j}=\sum_{i=1}^{n} d_{i j}\), it is still unable to derive the individual electricity consumption data (di1, di2, …, dil) from them. So the electricity consumption information of users is secure against internal attackers.
Traceability: If the batch verification fails, our scheme can identify which users’ signatures are invalid by using the invalid-signature identification algorithm of [18].
Unforgeability: The signature part of our scheme is unforgeable under the assumption of the standard CDH problem. We give the proof of unforgeability as follows.
Proof of Unforgeability
In the random oracle model, the signature part of our scheme is existentially unforgeable under the assumption of the standard CDH problem in multiplicative cyclic groups. Following Theorem 3.2 in [21], we prove that our signature is unforgeable as follows.
Definition 1. Order-q group G1 is a (t',ε')-bilinear group if G1 satisfies the following properties:
• A group G2 of order q and a bilinear map e:G1×G1→G2 exist, and e is computable in time at most t'.
• No algorithm (t',ε')-breaks CDH on G1.
Definition 2. Existential unforgeability of a signature scheme (KeyGen,Sign,Verify) under a chosen-message attack [22] is defined by the following game between an adversary A and a challenger:
Setup. The challenger obtains a public key PK and a private key SK by running algorithm KeyGen. The adversary A is given PK.
Queries. A adaptively requests signatures under PK on at most qs messages \(M_{1}, \ldots, M_{q_{s}} \in\{0,1\}^{*}\) of its choice. The challenger responds to each query with the signature σi=Sign(SK,Mi).
Output. A outputs a pair (M,σ) and wins the game if M is not in (\(M_{1}, \ldots, M_{q_{s}}\)) and Verify(PK,M,σ) = valid.
AdvSigA is defined as the probability that A wins the above game, taken over the coin tosses of KeyGen and of A.
Definition 3. A forger A (\(t, q_{H_{1}}, q_{H_{3}}, q_{s}, \varepsilon\))-breaks a signature scheme if A runs in time at most t, makes at most qs signature queries, at most \(q_{H_{1}}\) queries to the hash function H1 and at most \(q_{H_{3}}\) queries to the hash function H3, and AdvSigA is at least ε. If no forger (\(t, q_{H_{1}}, q_{H_{3}}, q_{s}, \varepsilon\))-breaks a signature scheme, the signature scheme is (\(t, q_{H_{1}}, q_{H_{3}}, q_{s}, \varepsilon\))-existentially unforgeable under an adaptive chosen-message attack.
Theorem 1. Let G1 be a (t',ε')-multiplicative cyclic group of order q. Under an adaptive chosen-message attack, the signature scheme proposed on G1 is (\(t, q_{H_{1}}, q_{H_{3}}, q_{s}, \varepsilon\))-secure against existential forgery.
Proof. We show that if a forger A (\(t, q_{H_{1}}, q_{H_{3}}, q_{s}, \varepsilon\))-breaks the signature scheme, then there is a t'-time algorithm B that solves the standard CDH problem on G1 with probability at least ε'.
Algorithm B is given an instance \(\left(q, g, g^{a}, g^{b}\right)\) of the CDH problem, where g is a generator of G1, and its goal is to output \(g^{a b}\). Algorithm B simulates the challenger and interacts with the forger A as follows:
Setup. Algorithm B starts by giving A the generator g and the public key \(Y_{i}=g^{a}\).
H-queries. Algorithm A can query the random oracles H1 and H3 at any time.
For H1-query on tj:
1. If the query tj already appears on the H1-list in a tuple (tj, H1-coinj, kj, lj) then algorithm B retrieves (kj, lj) from H1-list.
2. Otherwise, a random coin H1-coinj∈{0,1} will be generated by B. If H1-coinj=1, B generates \(k_{j} \in Z_{q}^{*}\) and lj=0; else B generates \(k_{j}, l_{j} \in Z_{q}^{*}\). B logs (tj, H1-coinj, kj, lj) in the H1-list.
3. B responds with \(W=H_{1}\left(t_{j}\right)=g^{k_{j}} \cdot\left(g^{b}\right)^{l_{j}}\).
For H3-query on CTi:
1. If the CTi query has already appeared on the H3-list in a tuple (CTi, ni), algorithm B retrieves ni from H3-list.
2. Otherwise, B generates \(n_{i} \in Z_{q}^{*}\) and logs (CTi, ni) in the H3-list.
3. B responds hi=H3(CTi)=ni.
Signature Queries. When A requests a signature on (CTi, tj), B responds to the query as follows.
1. Algorithm B obtains H1(tj) by running the above algorithm for responding to H1-queries. Let (tj, H1-coinj, kj, lj) be the corresponding tuple on the H1-list. If H1-coinj=0, B aborts.
2. Otherwise, H1-coinj=1 and hence \(W=H_{1}\left(t_{j}\right)=g^{k_{j}}\) and hi=H3(CTi)=ni. Define \(\sigma_{i}=V_{i}=\left(g^{a}\right)^{k_{j} n_{i}}\). Observe that \(e\left(g, \sigma_{i}\right)=e\left(W, Y_{i}^{h_{i}}\right)\), so σi is a valid signature on (CTi, tj). Algorithm B returns σi to algorithm A.
Output. Finally, algorithm A produces a signature σf on (CTf, ts) such that no signature query was issued for CTf. If there is no tuple containing ts on the H1-list or no tuple containing CTf on the H3-list, B issues the queries H1(ts) and H3(CTf) itself to ensure that such tuples exist. Assume σf is a valid signature on (CTf, ts); if not, B aborts. Then B finds the tuple (ts, H1-coin, k, l) on the H1-list and the tuple (CTf, n) on the H3-list. If H1-coin=1, B aborts. Otherwise, B can derive the answer to its CDH problem from (7).
\(\begin{aligned} \sigma_{f} &=W^{a h} \\ &=H_{1}\left(t_{s}\right)^{a H_{3}\left(C T_{f}\right)} \\ &=\left(g^{k} \cdot\left(g^{b}\right)^{l}\right)^{a n} \\ &=\left(g^{k+b l}\right)^{a n} \\ &=g^{a n k} \cdot g^{a b n l} \end{aligned}\) (7)
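Since l and n are invertible modulo q, B outputs \(g^{a b}=\left(\sigma_{f} \cdot\left(g^{a}\right)^{-n k}\right)^{(n l)^{-1} \bmod q}\). This completes the description of algorithm B.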
Data Integrity: Since the signature part of our scheme is proved secure under the CDH problem in the random oracle model, data integrity can be guaranteed.
Batch Verification Security: Our scheme satisfies batch verification security, as proved below.
Proof of Batch Verification
Following the security proof of [19], first note that Verify(M1,SK1,σ1)=…=Verify(Mn,SKn,σn)=1 implies Batch((M1,SK1,σ1),…,(Mn,SKn,σn))=1. Let the vector △=(δ1,δ2,…,δn), where each δi is a random ℓb-bit element of \(\mathrm{Z}_{q}^{*}\). The verification equation is given in (8).
\(\begin{aligned} e\left(\prod_{i=1}^{n} \sigma_{i}^{\delta_{i}}, g\right) &=\prod_{i=1}^{n} e\left(\sigma_{i}^{\delta_{i}}, g\right) \\ &=\prod_{i=1}^{n} e\left(H_{1}(t)^{S K_{i} H_{3}\left(M_{i}\right) \delta_{i}}, g\right) \\ &=\prod_{i=1}^{n} e\left(H_{1}(t)^{H_{3}\left(M_{i}\right) \delta_{i}}, g^{S K_{i}}\right) \\ &=\prod_{i=1}^{n} e\left(H_{1}(t)^{H_{3}\left(M_{i}\right) \delta_{i}}, P K_{i}\right) \end{aligned}\) (8)
Since σi, \(H_{1}(t)^{H_{3}\left(M_{i}\right)}\) and PKi are all in G1 for all i, we can write \(\sigma_{i}=g^{\alpha_{i}}\), \(H_{1}(t)^{H_{3}\left(M_{i}\right)}=g^{r_{i}}\) and \(P K_{i}=g^{X_{i}}\) for some \(\alpha_{i}, r_{i}, X_{i} \in Z_{q}^{*}\). The verification equation can then be rewritten as (9).
\(\begin{aligned} e\left(\prod_{i=1}^{n} g^{\alpha_{i} \delta_{i}}, g\right) &=\prod_{i=1}^{n} e\left(g^{r_{i} \delta_{i}}, g^{X_{i}}\right) \\ & \Rightarrow e(g, g)^{\sum_{i=1}^{n} \delta_{i} \alpha_{i}}=e(g, g)^{\sum_{i=1}^{n} \delta_{i} r_{i} X_{i}} \end{aligned}\) (9)
If we set βi=αi-riXi, then we can obtain (10).
\(\begin{aligned} e(g, g)^{\sum_{i=1}^{n} \delta_{i} \alpha_{i}-\sum_{i=1}^{n} \delta_{i} r_{i} X_{i}} &=1 \\ & \Rightarrow \sum_{i=1}^{n} \delta_{i} \alpha_{i}-\sum_{i=1}^{n} \delta_{i} r_{i} X_{i} \equiv 0 \bmod q \\ & \Rightarrow \sum_{i=1}^{n} \delta_{i} \beta_{i} \equiv 0 \bmod q \end{aligned}\) (10)
Now suppose that Batch((M1,SK1,σ1),…,(Mn,SKn,σn))=1 but Verify(Mj,SKj,σj)=0 for some j. Without loss of generality, let j=1. Then β1 is nonzero modulo q, and because q is prime, β1 has an inverse γ1 such that \(\beta_{1} \gamma_{1} \equiv 1 \bmod q\). From (10) it follows that \(\delta_{1} \equiv-\gamma_{1} \sum_{i=2}^{n} \delta_{i} \beta_{i} \bmod q\). We define E to be the event that Verify(M1,SK1,σ1)=0 holds but Batch((M1,SK1,σ1),…,(Mn,SKn,σn))=1, which breaks batch verification. Note that we make no assumptions about the remaining values.
Let the last n-1 values of △ be △'=(δ2,…,δn), and let |△'| denote the number of possible values of this vector. From the above, for a fixed vector △' there is exactly one value of δ1 that makes event E happen. Since δ1 is chosen at random, the probability of E given △' is \(\operatorname{Pr}\left[E \mid \Delta^{\prime}\right]=2^{-\ell_{b}}\). Summing over all possible choices of △', we obtain \(\operatorname{Pr}[E] \leq \sum_{i=1}^{\left|\Delta^{\prime}\right|}\left(\operatorname{Pr}\left[E \mid \Delta^{\prime}\right] \cdot \operatorname{Pr}\left[\Delta^{\prime}\right]\right)\). Plugging in the values gives \(\operatorname{Pr}[E] \leq \sum_{i=1}^{2^{\ell_{b}(n-1)}}\left(2^{-\ell_{b}} \cdot 2^{-\ell_{b}(n-1)}\right)=2^{-\ell_{b}}\). Hence the probability that batch verification accepts a batch containing an invalid signature is negligible.
5.2 Performance Analysis
Comparison. In this part, our scheme is compared with other schemes [1, 5, 9, 13] with respect to several security features, as shown in Table 1.
Table 1. Comparison of Features
Computational Cost. In this part, we evaluate the computational complexity of our scheme in two respects. The first is the computational cost of each user and of the aggregator over the whole process. The second is the computational cost of data aggregation and batch verification. A detailed description follows.
First, our scheme is compared with Lu et al.’s scheme [1] in terms of the computational costs of each user and the aggregator over the whole process. In our scheme, when a user generates a ciphertext CTi of his electricity usage data (di1, di2, …, dil), it requires (l+1) exponentiation operations and l multiplication operations. Generating the signature σi requires a further 1 exponentiation operation and 1 multiplication operation. Therefore, each user requires a total of l+2 exponentiation operations and l+1 multiplication operations. When the aggregator receives the n ciphertexts from the users, it requires 2 pairing operations, 2n exponentiation operations and 2(n-1) multiplication operations for batch verification. The aggregator then requires 1 exponentiation operation and n multiplication operations to generate the value V. Decryption requires 1 Paillier cryptosystem decryption operation [1]. Therefore, the aggregator requires a total of 2 pairing operations, 2n+1 exponentiation operations, 3n-2 multiplication operations and 1 Paillier cryptosystem decryption operation. In [1], the local GW verifies the signatures and aggregates the encrypted data, while the OA recovers the aggregated report and obtains the total aggregated data. The computational complexity of the aggregator in our scheme is therefore compared with that of the GW and OA together in Lu et al.’s scheme [1]. Following [1], we make the comparison in Table 2. We denote the computational costs of a pairing operation, an exponentiation operation, a multiplication operation and a Paillier cryptosystem decryption by Cp, Ce, Cm and Cpai respectively.
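The operation counts stated above can be tallied mechanically. The Python sketch below is only a bookkeeping aid: the function names and dictionary keys are hypothetical labels introduced here, and the counts are taken directly from the paragraph above.

```python
from collections import Counter


def user_ops(l):
    """Operation counts for one user reporting l-dimensional data."""
    encrypt = Counter(exp=l + 1, mul=l)   # generate ciphertext CT_i
    sign = Counter(exp=1, mul=1)          # generate signature sigma_i
    return encrypt + sign


def aggregator_ops(n):
    """Operation counts for the aggregator handling n users."""
    batch_verify = Counter(pairing=2, exp=2 * n, mul=2 * (n - 1))
    aggregate = Counter(exp=1, mul=n)     # generate the value V
    decrypt = Counter(paillier_dec=1)     # one Paillier decryption
    return batch_verify + aggregate + decrypt


print(dict(user_ops(3)))          # l + 2 exponentiations, l + 1 multiplications
print(dict(aggregator_ops(100)))  # 2 pairings, 2n + 1 exp, 3n - 2 mul, 1 decryption
```

Only the exponentiation and multiplication counts of the aggregator grow with n; the number of pairings stays at 2.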
Table 2. Comparison of Computational Complexity
Furthermore, Table 3 shows the time costs of all operations according to [13].
Table 3. Time Costs of Operation
According to these operation costs, Fig. 3(a) shows how each user’s computational cost varies with l, and Fig. 3(b) shows how the aggregator’s computational cost varies with l. The costs in Table 2 also depend on n, the total number of users; in Fig. 3(b), we simply assume n=100. In practice, the total number of users is larger, so the advantage of our scheme will be more obvious. Our scheme thus clearly reduces the computational complexity.
Fig. 3. Computational cost of each user and the aggregator
Second, we compare our scheme with the schemes in [1,13] in terms of the computational costs of data aggregation and batch verification. For a like-for-like comparison, we set l=1 both in our scheme and in [1]. Our scheme requires 2 exponentiation operations and 1 multiplication operation for each user to calculate CTi, and 1 exponentiation operation and n multiplication operations for the aggregator to aggregate the ciphertexts of all users. Therefore, our scheme requires 2n+1 exponentiation operations and 2n multiplication operations in all. According to the schemes in [1,13], the computational costs of aggregation are (3n+2)Ce+3nCm and (4n+2)Cp+(2n+2)Ce+(3n+3)Cm respectively. We make the comparison in Table 4 and Fig. 4.
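As a quick check, the total for our scheme with l=1 follows from adding the cost of the n users to the cost of the aggregator, using the symbols Ce and Cm defined earlier:
\(\begin{aligned} n\left(2 C_{e}+C_{m}\right)+\left(C_{e}+n C_{m}\right) &=(2 n+1) C_{e}+2 n C_{m} \end{aligned}\)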
Table 4. Comparison of Aggregation
Fig. 4. Performance Comparison of Aggregation
As for batch verification, our scheme requires 2 pairing operations, 2n exponentiation operations and 2(n-1) multiplication operations for the aggregator to batch verify all signatures. According to the schemes in [1,13], the computational costs of batch verification are (n+1)Cp+(2n+1)Ce+(n+1)Cm and (n+1)Cp+(n+1)Cm respectively. We make the comparison in Table 5 and Fig. 5.
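The three batch verification formulas can be evaluated side by side once unit costs are fixed. In the Python sketch below, the function name is hypothetical and the unit costs passed in are placeholder values in arbitrary units, not the measured timings of Table 3; substitute real timings to reproduce a comparison.

```python
def batch_verification_cost(n, cp, ce, cm):
    """Batch verification cost of each scheme for n users.

    cp, ce, cm are the unit costs of a pairing, an exponentiation and a
    multiplication; the caller supplies them (e.g. measured timings).
    """
    return {
        "ours": 2 * cp + 2 * n * ce + 2 * (n - 1) * cm,
        "[1]": (n + 1) * cp + (2 * n + 1) * ce + (n + 1) * cm,
        "[13]": (n + 1) * cp + (n + 1) * cm,
    }


# Placeholder unit costs in arbitrary units (assumptions, not measured values).
costs = batch_verification_cost(n=100, cp=1000, ce=100, cm=1)
print(costs)
```

The pairing term of our scheme is independent of n, whereas it grows as n+1 in the other two formulas; which scheme is cheaper overall depends on the actual ratio between the unit costs.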
Table 5. Comparison of Batch Verification
Fig. 5. Performance Comparison of Batch Verification
Communication Overhead. In this part, we evaluate the user-to-aggregator communication overhead of our scheme. Each user generates the encrypted data CTi and the signature σi, then sends them to the aggregator, so the size of one report is Sz=|CTi|+|σi|. If N is 1024-bit and Gi is 160-bit, then Sz=2048+160 bits. The total communication overhead from users to the aggregator is therefore S=n·Sz for n users and l types of electricity usage data. In [13], each user generates a 2048-bit ciphertext for each dimension of data, so transmitting l types of data gives a total communication overhead of S'=(2048·l+160)·n. We plot the communication overhead of Fan et al.’s scheme [13] in Fig. 6(a) and that of our scheme in Fig. 6(b) as functions of the number of users n and the number of data types l.
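The two overhead formulas are simple enough to evaluate directly. The Python sketch below uses hypothetical function and parameter names; the default sizes (1024-bit N, 160-bit signature) are the ones assumed in the paragraph above.

```python
def ours_overhead_bits(n, n_bits=1024, sig_bits=160):
    """Total user-to-aggregator overhead S = n * (|CT_i| + |sigma_i|) in bits.

    A Paillier ciphertext is an element modulo N^2, so it takes 2 * n_bits bits.
    """
    return n * (2 * n_bits + sig_bits)


def fan_overhead_bits(n, l, n_bits=1024, sig_bits=160):
    """Overhead S' = (2048 * l + 160) * n of [13]: one ciphertext per dimension."""
    return n * (2 * n_bits * l + sig_bits)


print(ours_overhead_bits(100))     # 100 users, independent of l
print(fan_overhead_bits(100, 3))   # 100 users, l = 3
```

For l=1 the two formulas coincide; the gap grows linearly with l because S does not depend on the number of data types.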
Fig. 6. Communication Overhead between User and Aggregator
From the above performance analysis, our scheme meets more security features while having lower computational complexity and lower communication overhead. Our scheme is therefore suitable for smart grid communications.
6. Conclusion
In this paper, a data aggregation scheme is proposed. Our scheme has the following properties. First, we take into account multidimensional data, which is rarely considered in research on smart grid. Second, although each user has multidimensional data, we use the Paillier cryptosystem to encrypt the multidimensional data as a whole and take advantage of its homomorphic property to meet the data aggregation requirement. Third, we apply the blinding factor technique so that our scheme can resist internal attackers. Fourth, our scheme supports fault tolerance, so that even if some smart meters do not work, the aggregation process still works well. Fifth, we construct an efficient batch verification that reduces the number of pairing operations from 2n to 2. Sixth, our batch verification can use the technique in [18] to find invalid signatures if the batch verification fails. Seventh, our scheme can be extended to support the time-of-use electricity pricing mode and dynamic users. Eighth, we provide a security analysis showing that our scheme can resist external and internal attackers, and give detailed proofs of unforgeability and batch verification security. Ninth, the performance analysis shows that our scheme significantly reduces the computational costs and communication overhead. In the future, we will study the human-factor-aware differential aggregation attack and extend our scheme to resist it.
References
- R. Lu, X. Liang, X. Li, X. Lin, and X. Shen, "EPPA: An efficient and privacy-preserving aggregation scheme for secure smart grid communications," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 9, pp. 1621-1631, Sep. 2012.
- R. Anderson and S. Fuloria, "Who controls the off switch?" in Proc. of IEEE International Conference on Smart Grid Communications, pp. 96-101, 2010.
- W. Jia, H. Zhu, Z. Cao, X. Dong, and C. Xiao, "Human-factor-aware privacy-preserving aggregation in smart grid," IEEE Systems Journal, vol. 8, no. 2, pp. 598-607, June 2014. https://doi.org/10.1109/JSYST.2013.2260937
- P. Paillier, "Public-Key Cryptosystems Based on Composite Degree Residuosity Classes," in Proc. of Advances in Cryptology-EUROCRYPT, vol. 1592, pp. 223-238, 1999.
- F. Li, B. Luo, and P. Liu, "Secure information aggregation for smart grids using homomorphic encryption," in Proc. of the 1st IEEE International Conference on Smart Grid Communications, pp. 327-332, 2010.
- F. D. Garcia and B. Jacobs, "Privacy-friendly energy-metering via homomorphic encryption," in Proc. of Security and Trust Management, pp. 226-238, 2010.
- H. Li, X. Lin, H. Yang, X. Liang, R. Lu, and X. Shen, "EPPDR: An efficient privacy-preserving demand response scheme with adaptive key evolution in smart grid," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 8, pp. 2053-2064, Aug. 2014. https://doi.org/10.1109/TPDS.2013.124
- F. Borges and M. Muhlhauser, "EPPP4SMS: Efficient privacy-preserving protocol for smart metering systems and its simulation using real-world data," IEEE Transactions on Smart Grid, vol. 5, no. 6, pp. 2701-2708, Nov. 2014. https://doi.org/10.1109/TSG.2014.2336265
- L. Chen, R. Lu, and Z. Cao, "PDAFT: A privacy-preserving data aggregation scheme with fault tolerance for smart grid communications," Peer-to-Peer Networking and Applications, vol. 8, no. 6, pp. 1122-1132, Nov. 2015.
- K. Shim and C. Park, "A secure data aggregation scheme based on appropriate cryptographic primitives in heterogeneous wireless sensor networks," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 8, pp. 2128-2139, Aug. 2015. https://doi.org/10.1109/TPDS.2014.2346764
- S. B. Othman, A. A. Bahattab, A. Trad, and H. Youssef, "Confidentiality and integrity for data aggregation in WSN using homomorphic encryption," Wireless Personal Communications, vol. 80, no. 2, pp. 867-889, Jan. 2015. https://doi.org/10.1007/s11277-014-2061-z
- S. Verma, P. Pillai, and Y. F. Hu, "Energy-efficient privacy homomorphic encryption scheme for multi-sensor data in WSNs," in Proc. of the 7th IEEE International Conference on Communication Systems and Networks (COMSNETS), pp. 1-6, 2015.
- C. Fan, S. Huang, and Y. Lai, "Privacy-enhanced data aggregation scheme against internal attackers in smart grid," IEEE Transactions on Industrial Informatics, vol. 10, no. 1, pp. 666-675, Feb. 2014. https://doi.org/10.1109/TII.2013.2277938
- V. Rastogi and S. Nath, "Differentially private aggregation of distributed time-series with transformation and encryption," in Proc. of ACM SIGMOD International Conference on Management of data, pp. 735-746, 2010.
- E. Shi, T. H. H. Chan, E. Rieffel, R. Chow, and D. Song, "Privacy-preserving aggregation of time-series data," in Proc. of NDSS Symposium, 2011.
- H. Bao and R. Lu, "A lightweight data aggregation scheme achieving privacy preservation and data integrity with differential privacy and fault tolerance," Peer-to-Peer Networking and Applications, vol. 10, no. 1, pp. 106-121, Sep. 2015.
- H. Bao and R. Lu, "A new differentially private data aggregation with fault tolerance for smart grid communications," IEEE Internet of Things Journal, vol. 2, no. 3, pp. 248-258, June 2015. https://doi.org/10.1109/JIOT.2015.2412552
- L. Law and B. J. Matt, "Finding invalid signatures in pairing-based batches," in Proc. of IMA International Conference on Cryptography and Coding, pp. 34-53, 2017.
- J. Camenisch, S. Hohenberger, and M. Pedersen, "Batch verification of short signatures," Journal of Cryptology, vol. 25, no. 4, pp. 723-747, Oct. 2011. https://doi.org/10.1007/s00145-011-9108-z
- M. Bellare, J. A. Garay, and T. Rabin, "Fast batch verification for modular exponentiation and digital signatures," in Proc. of International Conference on the Theory and Applications of Cryptology-EUROCRYPT, vol. 1403, pp. 236-250, 1998.
- D. Boneh, B. Lynn, and H. Shacham, "Short signatures from the Weil pairing," Journal of Cryptology, vol. 17, no. 4, pp. 297-319, July 2004. https://doi.org/10.1007/s00145-004-0314-9
- S. Goldwasser, S. Micali, and R. Rivest, "A digital signature scheme secure against adaptive chosen-message attacks," SIAM Journal on Computing, vol. 17, no. 2, pp. 281-308, 1988. https://doi.org/10.1137/0217017