1. Introduction
Cloud storage lets clients access their outsourced data anytime and anywhere for a small fee, so more and more people upload their data to it. Cryptography supports the confidentiality and privacy of this outsourced data. However, a plain encryption algorithm hinders search over the encrypted data. To address this issue, Song et al. were the first to propose the concept of searchable encryption (SE) [1]. Because their scheme uses symmetric encryption, it can be seen as a searchable symmetric encryption (SSE) scheme.
The data owner and the server are the two main participants in SSE. The data owner encrypts the data with a symmetric encryption algorithm and uploads the ciphertexts to the cloud. To search for data containing a keyword w, he (or she) uses the secret key to generate a search token t(w) from w and sends it to the cloud server. The cloud server computes the search results from the ciphertexts and t(w) and returns them to the data owner, who finally decrypts them locally.
Many SSE schemes have been proposed in recent years, including schemes supporting single-keyword search [2,3], multi-keyword search [4,5,6,7], fuzzy matching [8,9], ranked search [10,11,12], dynamic SSE [13,14,15,16,17,18,19,20,21], parallel SSE [22], and multi-level access policies [23]. In addition, Bösch et al. published a comprehensive survey of SSE [24].
Some researchers have also considered the security level of SSE. For example, to defend against a malicious adversary, Kurosawa et al. used message authentication codes [25], while Cheng et al. used indistinguishability obfuscation [26]. Dai et al. used physically unclonable functions to prevent memory leakage [27], and Li et al. introduced the coercer into SSE [28].
In the above schemes, the cloud server, which directly controls the users' data, is usually assumed to be trustworthy. Even so, it may sometimes damage the users' data for its own benefit. For example, it may tamper with users' data to save space, in which case the users cannot obtain correct search results. Worse, if this third party deletes the data used to verify the results, the users can never judge whether the returned results are correct. A simple solution is for the user to store his (or her) data on multiple cloud storage platforms, search each of them, and merge the search results. However, this method wastes a lot of network traffic and bandwidth.
Blockchain technology offers a potential solution to the above issue. The blockchain is an emerging technology that stems from the Bitcoin system [29] but can be regarded as an independent technology. It is composed of blocks linked one after another. The data is collected and verified by the nodes on the blockchain, and only after it is accepted by most of the nodes can it be stored in a block. Users can access this data freely, but they cannot tamper with it because the blockchain relies on tools such as cryptographic hash functions.
The data on the blockchain is maintained by everyone, and modified data will not be accepted as long as the majority of nodes are honest. Therefore, we can use this technology to build a cloud storage system that ensures data integrity. That is to say, users can store their data on a blockchain in the form of transactions. Consequently, besides accessing the data flexibly, they do not have to worry about their data being tampered with by illegal users.
Because the size of each block on the blockchain is fixed, the amount of data stored in it is limited. As more and more data is generated, the blockchain keeps growing, and searching data on it becomes difficult. Take the Bitcoin system as an example: the data on this blockchain are transactions, each of which is small. If Alice wants to find the transactions she made in a certain period of time, she has to scan the chain from back to front. If there are |T| transactions on the blockchain, the search cost is O(|T|), i.e., linear in the number of transactions.
It is therefore worthwhile to protect the privacy of data on the blockchain while reducing the search complexity: doing so not only protects the privacy of the data but also guarantees the correctness of the search results and saves users' time. Take electronic medical systems as an example. At present, each hospital keeps the electronic medical records (EMRs) of its patients privately and can be seen as a private cloud server. These hospitals do not share EMRs with each other. When a patient visits a new hospital, he or she cannot obtain all previous EMRs in time, so treatment may be delayed. This dilemma can be avoided by using a blockchain: each hospital uploads its patients' EMRs onto the blockchain in time, so the patient can find his (or her) EMRs at any time without interacting with each previous hospital. This scenario was mentioned by Swan [30], but no effective solution was given.
Our contribution. In this paper, we combine blockchain with SSE and give a solution that protects the privacy of data while supporting search. Our contributions are summarized as follows:
- We propose an SSE framework on the blockchain, named SSE-on-BC, which better guarantees the integrity of the data and resists a malicious adversary.
- We construct two schemes according to the size of the data. Because the smart contract can verify data automatically on the blockchain, the data owner in our schemes can be confident that the returned search results are correct.
- We analyze the security and performance of our schemes, showing that they are adaptively secure and feasible.
Organization. The remainder of this paper is organized as follows. Section 2 reviews some tools and notations. Section 3 presents the SSE-on-BC model and its security definition. Section 4 gives two concrete constructions. Section 5 analyzes the performance and security of our schemes. The conclusion is presented in the last section.
2. Preliminaries
This section reviews some tools and notations, mainly negligible functions, the model of SSE, and the Bitcoin system.
Definition 1. A function f(·) is negligible if for every polynomial p(·) there exists an integer N such that for all integers n > N it holds that \(f(n)<\frac{1}{p(n)}\).
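As a short worked illustration of Definition 1 (the two functions below are chosen here only as examples), consider \(f(n)=2^{-n}\) and \(g(n)=n^{-2}\). Since an exponential eventually exceeds any polynomial, for every polynomial p(·) there exists N such that

\[ 2^{n} > p(n) \quad \text{for all } n > N, \qquad \text{hence} \qquad f(n)=2^{-n}<\frac{1}{p(n)}, \]

so f is negligible. In contrast, g is not negligible: taking \(p(n)=n^{3}\) gives

\[ g(n)=\frac{1}{n^{2}}>\frac{1}{n^{3}}=\frac{1}{p(n)} \quad \text{for all } n>1, \]

so no threshold N can satisfy the definition for this p.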
2.1 The model of SSE
Fig. 1 shows the two players: the data owner and the cloud server. In the first stage, the data owner uses his secret key k to encrypt the data D into C and builds an inverted index I; both C and I are sent to the cloud server. To search for data containing the keyword w, the data owner combines the secret key k with w to obtain a search token \(t_w\), which is sent to the cloud server. The cloud server returns the search results \(C_{ij}\), and the data owner decrypts them locally.
Fig. 1. Traditional SSE Model.
2.2 Bitcoin system
To help readers understand the blockchain, this section reviews the Bitcoin system.
Addresses and transactions are two important elements of the Bitcoin system. To create a transaction, each client must first generate a key pair (i.e., a private key and a public key). The private key is used to sign a transaction, producing a signature σ. The public key is used to generate an address and to verify whether σ is valid [31]. Compared with traditional electronic currencies [32, 33], Bitcoin supports change. For clarity, we use A = (A.pk, A.sk) to denote the key pair of user A. Let \(\sigma = sig_A(m)\) denote the signature on transaction m computed with A's private key A.sk, and let \(ver_A(m, \sigma)\) denote the result of verifying σ with A's public key A.pk.
A transaction T may have multiple inputs and outputs. The inputs show where the coins come from. The outputs indicate how much money is given to each recipient, who is represented by an address. Each transaction has an in-script and an out-script, both written in the Bitcoin scripting language, a stack-based language [34]. Generally, if transaction T wants to redeem transaction \(T_x\), its in-script must match the out-script of \(T_x\).
Fig. 2. The construction process of transaction Tx.
Let BTC denote the bitcoin currency symbol. We use Fig. 2 to explain how a transaction works. Suppose Alice wants to pay Bob v BTC = \(v_1\) BTC + \(v_2\) BTC (ignoring the transaction fee for now). She needs to create a transaction \(T_x\). She finds two unredeemed transactions \(T_{y_1}\) and \(T_{y_2}\) in her wallet such that \(v = v_1 + v_2\). To show that she can spend this money, she puts her signatures \(\sigma_1\) and \(\sigma_2\) in the in-script of \(T_x\). To indicate that she transfers v BTC to Bob, Alice adds a function \(\pi_x\)(body, σ), whose output is a Boolean, to the out-script of \(T_x\).
Generally, we write \(T_x=(y_1, y_2, \pi_x, v, \sigma_1, \sigma_2)\), where \(y_1\) is the hash of \(T_{y_1}\) and \(y_2\) is the hash of \(T_{y_2}\). In addition, a client can specify a time t in a transaction, which means that the transaction will be collected by miners only after time t. In the Bitcoin system, a transaction that wants to be accepted earlier needs to pay a transaction fee. That is, \(v_1 + v_2 > v\) usually holds, and the difference is the transaction fee.
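The tuple notation and the fee rule above can be made concrete with a small sketch. The class below is a hypothetical data model for illustration only, not Bitcoin's actual transaction format: the field names are assumptions, amounts are expressed as integers in satoshi (1 BTC = 10^8 satoshi) to avoid floating-point error, and the out-script function \(\pi_x\) is omitted.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transaction:
    """Simplified view of T_x = (y1, y2, pi_x, v, sigma1, sigma2)."""
    inputs: Tuple[str, ...]        # y1, y2: hashes of the redeemed transactions
    input_values: Tuple[int, ...]  # v1, v2: value of each input (hypothetical field)
    out_value: int                 # v: amount transferred to the recipient
    signatures: Tuple[str, ...]    # sigma1, sigma2: placed in the in-script
    lock_time: int = 0             # optional time t before which miners ignore it

    def fee(self) -> int:
        """Transaction fee: (v1 + v2) - v."""
        return sum(self.input_values) - self.out_value

    def is_balanced(self) -> bool:
        """The inputs must cover the output; any surplus is the fee."""
        return self.fee() >= 0
```

With inputs of 60,000 and 50,000 satoshi and an output of 100,000 satoshi, `fee()` returns 10,000, which is the amount left to the miner.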
In addition, Table 1 lists the functions and symbols that we use later.
Table 1. Notations used in our SSE-on-BC scheme.
3. Our System Model
In this section, we first present the SSE-on-BC model and then give its security definition.
3.1 The model of SSE-on-BC
Fig. 3 shows the two participants: the data owner U and the server S (i.e., a receiver of transactions). The data owner U has n data items \(D_1, \cdots, D_n\). To protect their privacy, he encrypts them into \(C_1, \cdots, C_n\) with a symmetric encryption algorithm and uploads them to the blockchain in the form of transactions \(T_1, \cdots, T_n\), respectively. He then creates a transaction Inx based on these transactions. To find the data containing the keyword w, he puts the search token t(w) and the identifier \(TX_{Inx}\) of transaction Inx into a function Φ, embeds Φ into a transaction t, and broadcasts t on the blockchain. If the server S can provide correct search results, it can redeem transaction t by using a transaction s. Otherwise, the data owner U uses a transaction p to redeem transaction t.
Fig. 3. The Model of SSE-on-BC.
Our SSE-on-BC model, SSE-on-BC = (Gen, Enc, Trpdr, Search, Dec), consists of the following five polynomial-time algorithms:
(a) (K, U, S) ← Gen(\(1^k\)): a probabilistic algorithm run by the data owner U and the server S. The input is the security parameter k, and the outputs are a secret key K, a key pair U = (U.pk, U.sk), and a key pair S = (S.pk, S.sk).
(b) (T, Inx, \(TX_{Inx}\)) ← Enc(K, U, D, \(\left\{T_{d_i}\right\}_{i=0}^{n}\)): a probabilistic algorithm run by the data owner U. It takes as input the secret key K, the key pair U, the document set D = (\(D_1, \cdots, D_n\)), and n+1 unredeemed transactions \(T_{d_0}, \cdots, T_{d_n}\), and outputs n+1 transactions: T = {\(T_1, \cdots, T_n\)} and Inx. In addition, the data owner stores the identifier \(TX_{Inx}\) of transaction Inx locally.
(c) t ← Trpdr(K, w, U, \(TX_{Inx}\), \(T_w\)): a deterministic algorithm run by the data owner U. The inputs are the secret key K, the keyword w, the key pair U, the identifier \(TX_{Inx}\), and an unredeemed transaction \(T_w\). The output is a transaction t, whose receiver is either U or the server S.
(d) s/p ← Search(T, Inx, \(TX_{Inx}\), t, S/U): run by either the server S or the data owner U. If the server S can provide correct search results, it takes T, Inx, \(TX_{Inx}\), t, and S as input and outputs a transaction s. Otherwise, the data owner takes U and t as input and outputs a transaction p, which can be used to redeem transaction t.
(e) {\(D_{ij}\)} ← Dec(K, s): a decryption algorithm run by the data owner U. The inputs are the secret key K and the transaction s, and the output is the set of plaintexts {\(D_{ij}\)}, computed locally.
An SSE-on-BC scheme is correct if for all \(k \in \mathbb{N}\), for all K, U, S output by Gen(\(1^k\)), for all data \(\boldsymbol{D} \subseteq 2^{\Delta}\), for all (T, Inx, \(TX_{Inx}\)) output by Enc(K, U, D, \(\left\{T_{d_i}\right\}_{i=0}^{n}\)), and for all keywords \(w \in \Delta\), it holds that
\(\mathrm{Search}\left(\mathbf{T}, \mathrm{Inx}, TX_{\mathrm{Inx}}, \mathrm{Trpdr}\left(K, w, \mathbf{U}, TX_{\mathrm{Inx}}, T_{w}\right), \mathbf{S}\right)=s \wedge \mathrm{Dec}(K, s)=\left\{D_{ij}\right\}, \text{ for } 1 \leq i \leq n.\) (1)
3.2 Security Definition
A secure SSE-on-BC scheme should satisfy the following conditions.
• The server S cannot derive any useful information about the plaintext data when it accesses the blockchain for the first time.
• After a search, the server S cannot learn any useful information about the plaintexts and keywords beyond the search results.
• If the server S cannot return the correct search results to the data owner U, it cannot redeem the transaction t created by the data owner U.
Fig. 4. Game \(\mathrm{Real}_{A}^{\Pi}(k)\).
An adversary is either adaptive or non-adaptive. An adaptive adversary can select each keyword based on the previous keywords and search results, whereas a non-adaptive adversary must choose all the keywords at once. In this paper, we only consider the former.
Definition 2. Let Π = (Gen, Enc, Trpdr, Search, Dec) denote an SSE-on-BC scheme, let L be a leakage function parameterized by the access pattern, search pattern, and size pattern defined in [3], and let k be the security parameter. Consider the games \(\mathrm{Real}_{A}^{\Pi}(k)\) and \(\mathrm{Ideal}_{A,S}^{\Pi}(k)\) shown in Figs. 4 and 5.
Fig. 5. Game \(\mathrm{Ideal}_{A,S}^{\Pi}(k)\).
We say an SSE-on-BC scheme is adaptively semantically secure if for all polynomial-size adversaries A = (\(A_0, A_1, \cdots, A_q\)), where q = poly(k), there exists a non-uniform polynomial-size simulator S = (\(S_0, S_1, \cdots, S_q\)) such that for all polynomial-size distinguishers D,
\(\left|\Pr\left[D\left(V, st_{A}\right)=1:\left(V, st_{A}\right) \leftarrow \mathrm{Real}_{A}^{\Pi}(k)\right]-\Pr\left[D\left(V, st_{A}\right)=1:\left(V, st_{A}\right) \leftarrow \mathrm{Ideal}_{A,S}^{\Pi}(k)\right]\right| \leq \mathrm{neg}(k),\) (2)
where the probabilities are taken over the coins of Gen and Enc.
4. The detailed scheme
Since the size of each block on the blockchain is limited, we should consider the size of the data before uploading. To solve this problem, we present two concrete constructions in this section.
4.1 An SSE-on-BC scheme supporting lightweight data
Suppose the size of data array D=(D1,···, Dn) is small. In order to upload them on the blockchain, the data owner U will do the following steps:
(a) Gen: Given a security parameter k, the data owner U generates a secret key array K = (\(K_1, K_2\)), where \(K_i \leftarrow \{0,1\}^{k}\) (i = 1, 2). In addition, the data owner U and the server S generate key pairs U = (U.sk, U.pk) = (\(u_1, g^{u_1}\)) and S = (S.sk, S.pk) = (\(s_1, g^{s_1}\)), respectively, where \(u_1, s_1 \in Z_p\) and \(g^{u_1}, g^{s_1} \in G\).
(b) Enc: For each document \(D_i\) \((1 \leq i \leq n)\), the data owner computes:
\(C_{i}=\varepsilon.\mathrm{Enc}\left(K_{1}, D_{i}\right)\ (i=1, \cdots, n).\) (3)
He then creates an empty set DB(\(w_i\)) for each keyword \(w_{i} \in W\) \((i=1, \cdots, m)\). If document \(D_j\) \((1 \leq j \leq n)\) contains keyword \(w_i\), he puts \(C_j\) into DB(\(w_i\)). For clarity, let \(\Delta_i=|DB(w_i)|\) and DB(\(w_i\)) = \(\left\{C_{i1}, \cdots, C_{i\Delta_i}\right\}\). He then computes:
\(t_{w_{i}}=F_{1}\left(K_{2}, w_{i}\right),\) (4)
\(l_{w_{i}}=F_{2}\left(K_{2}, w_{i}\right),\) (5)
\(k_{w_{i}}=F_{3}\left(K_{2}, w_{i}\right),\) (6)
\(h_{w_{i}}=H\left(k_{w_{i}}, C_{i1}\|\cdots\| C_{i\Delta_{i}}\right).\) (7)
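Equations (4) to (7) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: F1, F2, F3, and H are all taken to be HMAC-SHA256 (the instantiation reported in Section 5.1), and because the three pseudorandom functions share the key \(K_2\), the sketch separates them with the labels `b"F1"`, `b"F2"`, `b"F3"`, which is a choice made here and not stated in the paper. The function name `keyword_material` is hypothetical.

```python
import hashlib
import hmac
from typing import List, Tuple


def prf(key: bytes, label: bytes, w: bytes) -> bytes:
    """HMAC-SHA256 with a label prefix so F1, F2, F3 differ (an assumption)."""
    return hmac.new(key, label + w, hashlib.sha256).digest()


def keyword_material(k2: bytes, w: bytes,
                     ciphertexts: List[bytes]) -> Tuple[bytes, bytes, bytes, bytes]:
    """Return (t_w, l_w, k_w, h_w) for keyword w and its matching ciphertexts."""
    t_w = prf(k2, b"F1", w)   # eq. (4): label used to locate the index entry
    l_w = prf(k2, b"F2", w)   # eq. (5): key that encrypts DB(w)
    k_w = prf(k2, b"F3", w)   # eq. (6): key for the verification hash
    # eq. (7): keyed hash over the concatenation C_1 || ... || C_Delta
    h_w = hmac.new(k_w, b"".join(ciphertexts), hashlib.sha256).digest()
    return t_w, l_w, k_w, h_w
```

Because every value is derived deterministically from \(K_2\) and w, the data owner can recompute \(t_w\), \(l_w\), and \(k_w\) at search time without storing them.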
To store the ciphertexts \(C_i\) \((i=1, \cdots, n)\) on the blockchain, he finds n unredeemed transactions \(TX_{D0_1}, \cdots, TX_{D0_n}\) in his wallet, which contain \(d_1, \cdots, d_n\) coins, respectively. He then builds transactions \(TX_{D_i}\) \((i=1, \cdots, n)\) as follows:
1) For transaction \(TX_{D_i}\), he embeds \(C_i\) into its out-script and then uses transaction \(TX_{D0_i}\) to compute the body of \(TX_{D_i}\).
2) He signs transaction \(TX_{D_i}\) with his private key U.sk and broadcasts it to the blockchain.
3) Once the transactions \(TX_{D_1}, \cdots, TX_{D_n}\) appear on the blockchain, the data owner U computes \(TXID_{D_i}=H_1\left(TX_{D_i}\right)\) \((i=1, \cdots, n)\), which serve as the identifiers of transactions \(TX_{D_1}, \cdots, TX_{D_n}\), respectively.
For each keyword \(w_i\) \((i=1, \cdots, m)\), if \(C_j \in DB(w_i)\), he replaces \(C_j\) with \(TXID_{D_j}\) \((1 \leq j \leq n)\).
Let \(\Delta=\max_{1 \leq i \leq m}\left\{\Delta_{i}\right\}\). If \(\Delta_i<\Delta\), he pads DB(\(w_i\)) with \(\Delta-\Delta_i\) elements \(0^p\) such that \(|DB(w_i)|=\Delta\), where \(i=1, \cdots, m\). We still write DB(\(w_i\)) for the padded set.
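The padding step can be sketched as below. The function name `pad_posting_lists` is hypothetical, and the sketch assumes that identifiers are byte strings of length p, so that \(0^p\) is represented by p zero bytes. Padding every DB(\(w_i\)) to the common length Δ makes all encrypted entries the same size, so their lengths do not reveal how many documents match each keyword.

```python
from typing import Dict, List


def pad_posting_lists(db: Dict[str, List[bytes]], p: int) -> Dict[str, List[bytes]]:
    """Pad each DB(w_i) with 0^p elements up to Delta = max_i |DB(w_i)|."""
    delta = max((len(ids) for ids in db.values()), default=0)
    filler = bytes(p)  # 0^p, represented here as p zero bytes (an assumption)
    return {w: ids + [filler] * (delta - len(ids)) for w, ids in db.items()}
```

The function returns a new dictionary and leaves its input unchanged, so the unpadded lists remain available for computing \(h_{w_i}\).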
He creates an empty array I. For each keyword \(w_i \in W\), he computes:
\(e_{w_{i}}=\delta.\mathrm{Enc}\left(l_{w_{i}}, DB\left(w_{i}\right)\right).\) (8)
He stores \(\left(t_{w_{i}}, e_{w_{i}}, h_{w_{i}}\right)\) in array I in lexicographic order.
To generate a transaction Inx for documents D, the data owner U does:
1) Find an unredeemed transaction TX0 from his wallet, which contains d0 coins.
2) For transaction Inx, he embeds I into its out-script.
3) Take transaction TX0 as input, and compute the body of transaction Inx.
4) Sign transaction Inx by using U.sk, and broadcast it on the blockchain.
Once it appears on the blockchain, he computes its identifier \(TX_{Inx}=H_1(Inx)\) and stores it locally; otherwise, he needs to recreate transaction Inx.
Let Φ(·,·) be a function that consists of a decryption algorithm and a verification algorithm. It takes two strings x and y as input and executes the following steps:
1) Use y to find the transaction q.
2) Decrypt the information embedded in transaction q by using x. Let the decryption result be (α, β).
3) Verify whether β = H(x, α) holds. If it does, output (α, 1), where 1 is a Boolean value. Otherwise, output the termination symbol ⊥.
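The verification step of Φ can be sketched as follows. This is a minimal illustration, not the authors' smart-contract code: it covers only step 3, assumes H is HMAC-SHA256 keyed with \(k_w\) (the instantiation reported in Section 5.1), takes α to be the list of retrieved ciphertexts and β to be the stored hash, and represents the termination symbol by `None`. The function name `phi_verify` is hypothetical.

```python
import hashlib
import hmac
from typing import List, Optional, Tuple


def phi_verify(k_w: bytes, alpha: List[bytes],
               beta: bytes) -> Optional[Tuple[List[bytes], bool]]:
    """Return (alpha, True) if beta = H(k_w, alpha); otherwise None."""
    expected = hmac.new(k_w, b"".join(alpha), hashlib.sha256).digest()
    # compare_digest avoids leaking the position of the first mismatch
    if hmac.compare_digest(expected, beta):
        return alpha, True
    return None
```

A result set that has been altered, truncated, or reordered produces a different hash, so the check fails and the transaction cannot be redeemed with it.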
(c) Trpdr: When U wants to find the data containing the keyword w, he creates a transaction ask, as shown in Fig. 6. The construction is as follows:
1) Find an unredeemed transaction \(T_q\) in his wallet, which contains \(d_t\) coins.
2) Compute \(t_w=F_1(K_2, w)\), \(l_w=F_2(K_2, w)\), and \(k_w=F_3(K_2, w)\), and put Φ((\(t_w, l_w, k_w\)), \(TX_{Inx}\)) into the out-script of ask.
3) Use \(T_q\) to compute the body of ask.
4) Taking transaction ask as input, the data owner U and the server S each compute the body of transaction Fuse, which carries a time lock t.
5) The server S signs the transaction Fuse with S.sk and sends it to U, who adds his own signature.
6) After signing the transaction ask with U.sk, the data owner U broadcasts it.
7) If transaction ask has not appeared on the blockchain by time \(t - max_U\), the data owner U redeems transaction \(T_q\) with his private key and quits the protocol immediately. Here, \(max_U\) denotes the maximal possible delay before a transaction appears on the blockchain.
Fig. 6. How to get the lightweight data containing keyword w.
(d) Search: To redeem the transaction ask, the server S needs to build a transaction return, shown in Fig. 6, which contains the search results. The process is as follows:
1) Input transaction ask, and compute the body of transaction return.
2) Run function Φ((\(t_w, l_w, k_w\)), \(TX_{Inx}\)):
i. Use \(TX_{Inx}\) to get the array I embedded in the transaction Inx.
ii. Use \(t_w\) to find \((e_w, h_w)\), which is stored in I.
iii. Decrypt \(e_w\) by using \(l_w\): \(DB(w)=\delta.\mathrm{Dec}(l_w, e_w)\). For brevity, let \(DB(w)=\left\{TXID_{D_{l_1}}, TXID_{D_{l_2}}, \cdots, TXID_{D_{l_n}}\right\}\) denote the decryption result, where \(TXID_{D_{l_j}}=H_1\left(TX_{D_{l_j}}\right)\) \((j=1, \cdots, n)\) is the identifier of transaction \(TX_{D_{l_j}}\).
iv. Read the document ciphertext \(C_{l_j}\) from transaction \(TX_{D_{l_j}}\) by using \(TXID_{D_{l_j}}\) \((j=1, \cdots, n)\).
3) Verify whether \(H\left(k_{w}, C_{l_1}\|\cdots\| C_{l_n}\right)=h_{w}\) holds. If it does, put \(\left\{C_{l_j}\right\}\) into the in-script of transaction return.
4) Sign transaction return, and broadcast it onto the blockchain.
(e) Dec: After transaction return appears on the blockchain, the data owner U reads \(\left\{C_{l_j}\right\}\) from it and computes \(D_{l_j}=\varepsilon.\mathrm{Dec}\left(K_1, C_{l_j}\right)\) \((1 \leq j \leq n)\). If the transaction return has not appeared on the blockchain by time t, he broadcasts transaction Fuse and gets his money back.
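The payment logic of the ask, return, and Fuse transactions can be summarized in a short sketch. This is only a model of the outcome, not Bitcoin script: the function name `settle` and its parameters are hypothetical, times are abstract integers, and the sketch assumes that the server's transaction return counts only if it is on the blockchain no later than the time lock t of Fuse.

```python
from typing import Optional


def settle(return_time: Optional[int], results_valid: bool, lock_time: int) -> str:
    """Decide who receives the coins locked in transaction ask.

    return_time   -- time at which transaction return appeared, or None if never
    results_valid -- whether the check inside function Phi succeeded
    lock_time     -- the time lock t carried by transaction Fuse
    """
    on_time = return_time is not None and return_time <= lock_time
    if on_time and results_valid:
        return "server"  # return redeems ask: the server is paid for correct results
    return "owner"       # the owner broadcasts Fuse after time t and is refunded
```

Under this model the server is paid only when it is both timely and correct, which matches the third security requirement in Section 3.2.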
4.2 An SSE-on-BC scheme supporting large data
If the data is larger, we must process it before uploading. Suppose the data owner U' has n large documents \(D_1, \cdots, D_n\). To store them on the blockchain, he does the following:
a) Gen: It takes the security parameter k as input and outputs a secret key array K = (\(K_1, K_2\)), where \(K_i \leftarrow \{0,1\}^{k}\) (i = 1, 2). In addition, the data owner U' and the server S generate key pairs U' = (U'.sk, U'.pk) = (\(u_1, g^{u_1}\)) and S = (S.sk, S.pk) = (\(s_1, g^{s_1}\)), respectively, where \(u_1, s_1 \in Z_p\) and \(g^{u_1}, g^{s_1} \in G\).
b) Enc: The data owner U' encrypts documents D=(D1,···, Dn) by using the secret key K1:
\(C_{i}=\varepsilon.\mathrm{Enc}\left(K_{1}, D_{i}\right)\ (i=1, \cdots, n).\) (9)
1) If \(\left|C_i\right|>\iota-p\):
He divides \(C_i\) into s blocks \(C_{i1}^{\prime}, C_{i2}^{\prime}, \cdots, C_{is}^{\prime}\) such that \(\left|C_{ij}^{\prime}\right|+p \leq \iota\) for \(j=1, \cdots, s\), where \(s=\left\lceil\frac{\left|C_{i}\right|}{\iota-p}\right\rceil\).
For each keyword \(w_i \in W\) \((i=1, \cdots, m)\), he creates an empty set d(\(w_i\)) and fills it as follows: if document \(D_j\) \((1 \leq j \leq n)\) contains keyword \(w_i\), he puts \(C_j\) into d(\(w_i\)). Suppose d(\(w_i\)) = \(\left\{C_{i1}, \cdots, C_{i\Delta_i}\right\}\). He computes:
\(t_{w_{i}}=F_{1}\left(K_{2}, w_{i}\right),\) (10)
\(l_{w_{i}}=F_{2}\left(K_{2}, w_{i}\right),\) (11)
\(k_{w_{i}}=F_{3}\left(K_{2}, w_{i}\right),\) (12)
\(h_{w_{i}}=H\left(k_{w_{i}}, C_{i1}\|\cdots\| C_{i\Delta_{i}}\right).\) (13)
He finds s unredeemed transactions \(TX_{D^{\prime}0_{i1}}, \cdots, TX_{D^{\prime}0_{is}}\) in his wallet, which contain \(d_{i1}, \cdots, d_{is}\) coins, respectively, and builds transactions \(TX_{D_{ik}^{\prime}}\) \((k=1, \cdots, s)\) as follows:
For k = 1:
i. Embed \(C_{i1}^{\prime} \| 0^{p}\) into the out-script of transaction \(TX_{D_{i1}^{\prime}}\).
ii. Take transaction \(TX_{D^{\prime}0_{i1}}\) as input, and compute the body of transaction \(TX_{D_{i1}^{\prime}}\).
iii. Sign transaction \(TX_{D_{i1}^{\prime}}\) by using U'.sk, and broadcast it onto the blockchain.
iv. After transaction \(TX_{D_{i1}^{\prime}}\) appears on the blockchain, compute its identifier \(TXID_{D_{i1}^{\prime}}=H_1\left(TX_{D_{i1}^{\prime}}\right)\).
For \(2 \leq k \leq s\):
i. Embed \(C_{ik}^{\prime} \| TXID_{D_{i(k-1)}^{\prime}}\) into the out-script of transaction \(TX_{D_{ik}^{\prime}}\).
ii. Take \(TX_{D^{\prime}0_{ik}}\) as input, and compute the body of transaction \(TX_{D_{ik}^{\prime}}\).
iii. Sign it by using U'.sk, and broadcast it to the blockchain.
iv. Once transaction \(TX_{D_{ik}^{\prime}}\) appears on the ledger, compute its identifier \(TXID_{D_{ik}^{\prime}}=H_1\left(TX_{D_{ik}^{\prime}}\right)\).
2) If \(\left|C_i\right| \leq \iota-p\) \((1 \leq i \leq n)\), he finds an unredeemed transaction \(TX_{D0_i}\) in his wallet, which contains \(d_i\) coins. He then builds a transaction \(TX_{D_i}\) as follows:
i. Embed \(C_i\) into the out-script of transaction \(TX_{D_i}\).
ii. Take transaction \(TX_{D0_i}\) as input, and compute the body of transaction \(TX_{D_i}\).
iii. Sign it by using U'.sk, and broadcast it on the blockchain.
iv. After it appears on the blockchain, compute its identifier \(TXID_{D_i}=H_1\left(TX_{D_i}\right)\).
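The block-splitting and chaining procedure can be sketched as below. This is a simplified model under stated assumptions, not the authors' implementation: the ledger is a Python dictionary, the identifier function \(H_1\) is replaced by SHA-256 over the embedded payload rather than over a full transaction, p is fixed at 32 bytes to match that digest, and signing and coin inputs are omitted. The names `store_chunked` and `load_chunked` are hypothetical. Unlike the two-case description above, the sketch also stores short ciphertexts as a one-block chain, purely to keep the code short.

```python
import hashlib
import math

P = 32              # identifier length in bytes (SHA-256 digest; an assumption)
NULL_ID = bytes(P)  # plays the role of 0^p in the first block


def tx_id(payload: bytes) -> bytes:
    """Stand-in for H1(TX): here simply the hash of the embedded payload."""
    return hashlib.sha256(payload).digest()


def store_chunked(ledger: dict, ciphertext: bytes, limit: int) -> bytes:
    """Store ciphertext as a backward-linked chain; return the last identifier."""
    size = limit - P  # room left for data once the pointer is embedded
    if size <= 0:
        raise ValueError("limit must exceed the identifier length")
    s = max(1, math.ceil(len(ciphertext) / size))  # s = ceil(|C_i| / (limit - p))
    prev = NULL_ID
    for k in range(s):
        # block k carries its data followed by the identifier of block k-1
        payload = ciphertext[k * size:(k + 1) * size] + prev
        prev = tx_id(payload)
        ledger[prev] = payload
    return prev


def load_chunked(ledger: dict, last_id: bytes) -> bytes:
    """Walk from the last block back to the first and reassemble the data."""
    blocks = []
    current = last_id
    while current != NULL_ID:
        payload = ledger[current]
        blocks.append(payload[:-P])
        current = payload[-P:]
    return b"".join(reversed(blocks))
```

Only the identifier of the last block needs to be recorded for each document, since every earlier block is reachable through the embedded pointers.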
For each keyword \(w_i\) \((1 \leq i \leq m)\), he creates an empty set DB(\(w_i\)) and fills it as follows:
i. If \(w_i \in D_{ij}\) and \(\left|C_{ij}\right|>\iota-p\), he puts \(TXID_{D_{ij,s}^{\prime}}\), the identifier of the transaction holding the last block of \(C_{ij}\), into the set DB(\(w_i\)).
ii. If \(w_i \in D_{ij}\) and \(\left|C_{ij}\right| \leq \iota-p\), he puts \(TXID_{D_{ij}}\) into the set DB(\(w_i\)).
Let \(\Delta_i=|DB(w_i)|\) and \(\Delta=\max_{1 \leq i \leq m}\left\{\Delta_{i}\right\}\). If \(\Delta_i<\Delta\), he pads the set DB(\(w_i\)) with \(\Delta-\Delta_i\) elements \(0^p\) such that \(|DB(w_i)|=\Delta\), where \(i=1, \cdots, m\).
He then computes:
\(e_{w_{i}}=\delta.\mathrm{Enc}\left(l_{w_{i}}, DB\left(w_{i}\right)\right).\) (14)
For \(w_1\), he generates a transaction \(TX_{Iw_1}\) as follows:
i. Find an unredeemed transaction \(TX_{Iw_{1}0}\) in his wallet, which contains \(d_{w_{1}0}\) coins.
ii. Compute \(K_{11}=F_2(K_2, 0^p)\) and \(r_{1}=\delta.\mathrm{Enc}\left(K_{11}, t_{w_{1}}\left\|e_{w_{1}}\right\| h_{w_{1}} \| 0^{p}\right)\).
iii. Embed \(r_1\) in the out-script of \(TX_{Iw_1}\).
iv. Take transaction \(TX_{Iw_{1}0}\) as input, and compute the body of transaction \(TX_{Iw_1}\).
v. Sign the transaction \(TX_{Iw_1}\), and broadcast it on the blockchain.
vi. After it appears on the blockchain, compute its identifier \(TI_{w_{1}}=H_1\left(TX_{Iw_1}\right)\).
vii. If transaction \(TX_{Iw_1}\) does not appear on the blockchain, the data owner redeems transaction \(TX_{Iw_{1}0}\) promptly and quits the protocol.
For \(w_j \in W\) \((2 \leq j \leq m)\), the data owner builds transaction \(TX_{Iw_j}\) as follows:
i. Find an unredeemed transaction \(TX_{Ij0}\) in his wallet, which contains \(d_{j0}\) coins.
ii. Compute \(K_{11}=F_2(K_2, 0^p)\) and \(r_j=\delta.\mathrm{Enc}\left(K_{11}, t_{w_{j}}\left\|e_{w_{j}}\right\| h_{w_{j}} \| TI_{w_{j-1}}\right)\).
iii. Embed \(r_j\) in the out-script of \(TX_{Iw_j}\).
iv. Take transaction \(TX_{Ij0}\) as input, and compute the body of transaction \(TX_{Iw_j}\).
v. Sign transaction \(TX_{Iw_j}\) by using U'.sk, and broadcast it on the blockchain.
vi. Once transaction \(TX_{Iw_j}\) appears on the blockchain, record its identifier \(TI_{w_j}=H_1\left(TX_{Iw_j}\right)\).
vii. If transaction \(TX_{Iw_j}\) does not appear on the blockchain, the data owner redeems transaction \(TX_{Ij0}\) promptly and quits the protocol.
The data owner stores \(TI_{w_m}\) locally.
Let Φ(·,·) be the function defined in section 4.1.
c) Trpdr: To find the data containing the keyword w, he creates a transaction ask, as shown in Fig. 7:
i. Find an unredeemed transaction \(T_q\) in his wallet, which contains \(d_t\) coins.
ii. Compute \(t_w=F_1(K_2, w)\), \(l_w=F_2(K_2, w)\), \(K_{11}=F_2(K_2, 0^p)\), and \(k_w=F_3(K_2, w)\).
iii. Embed Φ((\(t_w, l_w, k_w\)), \(K_{11}\), \(TI_{w_m}\)) into the out-script of ask.
iv. Take transaction \(T_q\) as input, and compute the body of transaction ask.
v. Taking the transaction ask as input, the data owner U' and the server S each compute the body of transaction Fuse, which carries a time lock t. The server S signs transaction Fuse and sends it to U'.
vi. After signing the transaction ask, the data owner U' broadcasts it.
vii. If the transaction ask has still not appeared on the blockchain after time \(t - max_{U'}\), the data owner U' redeems transaction \(T_q\) with his private key and quits the protocol immediately, where \(max_{U'}\) is the maximal possible delay of including a transaction in the blockchain.
Fig. 7. How to return the documents that contain keyword w.
d) Search: To redeem the transaction ask, as shown in Fig. 7, the server S does the following:
1) Take transaction ask as input, and compute the body of transaction return.
2) Run the function Φ((\(t_w, l_w, k_w\)), \(K_{11}\), \(TI_{w_m}\)): First, it uses \(TI_{w_m}\) to get \(r_m\) from transaction \(TX_{Iw_m}\). It then computes \(t_{w_m}\left\|e_{w_m}\right\| h_{w_m} \| TI_{w_{m-1}}=\delta.\mathrm{Dec}\left(K_{11}, r_m\right)\). Next, it proceeds as follows:
- If \(t_{w_m}=t_w\), it computes \(DB_{w_m}=\delta.\mathrm{Dec}\left(l_{w_m}, e_{w_m}\right)\). For brevity, let \(DB_{w_m}=\left\{TXID_{D_{m_1}}, \cdots, TXID_{D_{m_\Delta}}\right\}\) denote the decryption result. It then finds the ciphertext \(C_{m_i}\) by using \(TXID_{D_{m_i}}\) \((1 \leq i \leq \Delta)\):
i. If transaction \(TX_{D_{m_i}}\) contains \(C_{m_i}\), it outputs \(C_{m_i}\).
ii. If transaction \(TX_{D_{m_i}}\) contains \(C_{m_i s}^{\prime} \| TXID_{D_{m_i(s-1)}^{\prime}}\), it first outputs \(C_{m_i s}^{\prime}\), and then uses the identifiers \(TXID_{D_{m_i j}^{\prime}}\) to get \(C_{m_i j}^{\prime}\) from transactions \(TX_{D_{m_i j}^{\prime}}\) \((j=s-1, \cdots, 1)\). Lastly, it sets \(C_{m_i}=C_{m_i 1}^{\prime}\|\cdots\| C_{m_i s}^{\prime}\).
- If \(t_{w_m} \neq t_w\), it uses the transaction identifier \(TI_{w_{m-j}}\) to read \(r_{m-j}\) from transaction \(TX_{Iw_{m-j}}\) for \(j=1, \cdots, m-1\) in turn, and stops as soon as \(t_{w_{m-j}}=t_w\) holds. That is, for each j it does:
i. Decrypt \(t_{w_{m-j}}\left\|e_{w_{m-j}}\right\| h_{w_{m-j}} \| TI_{w_{m-j-1}}=\delta.\mathrm{Dec}\left(K_{11}, r_{m-j}\right)\).
ii. Verify \(t_{w_{m-j}}=t_w\). If this equation holds, it uses the above method to decrypt \(DB_{w_{m-j}}\) and obtain \(\left\{C_{l_1}, \cdots, C_{l_n}\right\}\). Otherwise, it continues to read \(r_{m-j-1}\) embedded in the transaction \(TX_{Iw_{m-j-1}}\), until \(t_{w_{m-j}}=t_w\) holds.
3) Embed \(\left(\left\{C_{l_1}, \cdots, C_{l_n}\right\}, h_w\right)\) into the out-script of transaction return.
4) After signing the transaction return, it broadcasts it.
e) Dec: After the transaction return appears on the blockchain, the data owner U' recovers \(\left\{C_{l_j}\right\}\) from it and computes \(D_{l_j}=\varepsilon.\mathrm{Dec}\left(K_1, C_{l_j}\right)\) \((1 \leq j \leq n)\). If the transaction return has still not appeared on the blockchain after time t, he broadcasts transaction Fuse to get his money back.
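The way the server locates the index entry for a keyword in this scheme can be sketched as a walk over backward-linked records. The sketch is a simplified model, not the authors' contract code: the ledger is a dictionary, identifiers are SHA-256 digests standing in for \(H_1\), and the encryption of each record under \(K_{11}\) is omitted so that only the chaining logic is shown. The names `publish_index` and `find_entry` are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

NULL_ID = bytes(32)  # plays the role of 0^p in the record for w_1


def publish_index(ledger: dict, entries: List[Tuple[bytes, bytes, bytes]]) -> bytes:
    """Store (t_w, e_w, h_w) records, each linked to the previous one.

    Returns the identifier of the last record (TI_{w_m}), which is the only
    value the data owner has to keep locally.
    """
    prev = NULL_ID
    for t_w, e_w, h_w in entries:
        record = (t_w, e_w, h_w, prev)
        prev = hashlib.sha256(b"".join(record)).digest()
        ledger[prev] = record
    return prev


def find_entry(ledger: dict, last_id: bytes,
               t_w: bytes) -> Optional[Tuple[bytes, bytes, int]]:
    """Walk back from the last record until the token matches.

    Returns (e_w, h_w, number_of_records_read), or None if no record matches.
    """
    current = last_id
    reads = 0
    while current != NULL_ID:
        t, e, h, prev = ledger[current]
        reads += 1
        if t == t_w:
            return e, h, reads
        current = prev
    return None
```

In this model a search reads between 1 and m index records, depending on where the keyword's record sits in the chain.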
5. Security and Performance Analysis
The scheme in Section 4.1 follows the same idea as that in Section 4.2. The difference is that the latter divides documents into blocks before uploading them to the blockchain, so during a search the server must find all the relevant blocks and merge them. Here, we only present the performance and security analyses for the first scheme; those for the second scheme can be derived similarly.
5.1 Performance
Our test machine has an Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz and 32GB of memory. We simulate our scheme on Fabric version 1.4, a stable release. We create one orderer server and three organizations, each with two peer nodes, i.e., 6 peer nodes in the blockchain network. The block size is set to 99MB, and it takes about 2s to generate a block. We instantiate the pseudorandom functions F1, F2, F3 with HMAC-SHA256, the hash functions H and H1 with HMAC-SHA256, and the SE schemes with AES in CBC mode with a 256-bit key. We sample 9411 RFC files (400MB) from the IETF website (https://www.ietf.org/rfc/) and randomly extract 600 keywords. We then transform them into an array of (keyword, file) pairs. The number of test data items ranges from 1000 to \(10^5\).
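The array of (keyword, file) pairs can be grouped into per-keyword file lists with a few lines of code. This is a minimal sketch of that preprocessing step, assuming the pairs are plain strings; the function name `build_inverted_index` and the sample pairs in the usage note are hypothetical and are not taken from the experimental data set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_inverted_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (keyword, file) pairs into keyword -> sorted list of distinct files."""
    index = defaultdict(set)
    for keyword, file_id in pairs:
        index[keyword].add(file_id)  # a set ignores duplicate pairs
    return {w: sorted(files) for w, files in index.items()}
```

For instance, the pairs `("tcp", "a.txt")`, `("tcp", "b.txt")`, and `("ip", "c.txt")` yield one list with two files for `tcp` and one list with a single file for `ip`; these lists correspond to the sets DB(w) before identifiers are substituted and padding is applied.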
To show the efficiency of our scheme, we consider the following aspects:
Setup time. This is mainly the time used to generate the inverted index. It is measured from the moment the documents have been uploaded to the blockchain until the index I appears on the blockchain. Fig. 8 shows the time needed to create an index for file sets of different scales. As expected, the time to create the index increases as the size of the data grows.
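The plaintext structure behind the index can be sketched as follows. This only shows how an array of (keyword, file) pairs is grouped into an inverted index; the encryption of each entry and its embedding in a transaction are omitted, and the function name and sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(pairs):
    """Group (keyword, file_id) pairs into keyword -> ordered list of file ids."""
    index = defaultdict(dict)
    for keyword, file_id in pairs:
        # A dict keeps insertion order and silently drops duplicate pairs.
        index[keyword][file_id] = None
    return {keyword: list(files) for keyword, files in index.items()}


# Hypothetical sample data in the (keyword, file) form used in the experiments.
pairs = [
    ("tcp", "rfc793"),
    ("udp", "rfc768"),
    ("tcp", "rfc1122"),
    ("tcp", "rfc793"),
]
index = build_inverted_index(pairs)
print(index)
```

The grouping is a single pass over the pairs, which is consistent with the setup time growing with the size of the data.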
Search time. This includes the time to generate the search token for the keyword w, the time to create a smart contract, and the time to find the files containing the keyword w. Because the transaction ask contains a function, we can use a smart contract to simulate it. This smart contract contains a decryption algorithm, a for loop, and hash verification. Fig. 9 shows the result after it is created on Fabric.
When searching for the data containing the keyword w, the server needs to provide the transaction return to complete the search. We simulate this by invoking the smart contract built above. Fig. 10 gives the search time under different scales of data.
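The hash verification step of the smart contract can be sketched in Python as below. It assumes that the tag has the form h_w = H(k_w, C_1 || ... || C_n) with H instantiated as HMAC-SHA256, as in the experiment setup; the function names are hypothetical, and the actual contract runs as Fabric chaincode rather than as this script.

```python
import hashlib
import hmac
from typing import Sequence


def result_tag(k_w: bytes, ciphertexts: Sequence[bytes]) -> bytes:
    """Compute H(k_w, C_1 || ... || C_n) with HMAC-SHA256."""
    return hmac.new(k_w, b"".join(ciphertexts), hashlib.sha256).digest()


def verify_result(k_w: bytes, ciphertexts: Sequence[bytes], h_w: bytes) -> bool:
    """Accept the returned ciphertexts only if their tag matches h_w."""
    return hmac.compare_digest(result_tag(k_w, ciphertexts), h_w)


# Hypothetical values for illustration only.
k_w = b"\x01" * 32
returned = [b"ciphertext-one", b"ciphertext-two"]
h_w = result_tag(k_w, returned)

print(verify_result(k_w, returned, h_w))      # complete result set
print(verify_result(k_w, returned[:1], h_w))  # one ciphertext withheld
```

Plain concatenation does not record where one ciphertext ends and the next begins; a deployment might length-prefix each ciphertext before hashing, which is a choice the text does not specify.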
Fig. 9. The information about the smart contract
Fig. 8. The time it takes to create an index for data with different scales. The symbol w represents a keyword, D denotes a file.
Fig. 10. The time it takes to finish a search on the data with different sizes. The symbol w represents a keyword, D denotes a file.
Table 1. Comparison between verifiable SSE schemes. Here n denotes the total number of files, m the number of transactions, and d(w) the number of files containing the keyword w.
Table 1 compares our SSE-on-BC scheme with other works. Let n denote the number of files to be uploaded to the cloud server, m the number of transactions used to store the n files on the blockchain, r the size of the indistinguishability obfuscation, and d(w) the number of files containing the keyword w. As the table shows, scheme [25] is optimal. However, neither it nor scheme [26] can resist a fully malicious adversary. Although our scheme stores files in the form of transactions, the size of each transaction is nearly equal to the size of the ciphertext stored in it. That is to say, our scheme is also optimal.
5.2 Security Analysis
In this section, we give the security proof of our first scheme.
Theorem 1. If F1, F2, F3 are pseudorandom functions, H and H1 are collision-resistant hash functions, and ε = (ε.Enc, ε.Dec) is a PCPA-secure symmetric encryption scheme, then the scheme presented in Section 4.1 is adaptively IND-CKA2 secure.
Proof. We construct a PPT simulator \(S=\left\{S_{0}, S_{1}, \ldots, S_{q}\right\}\) such that, for an adversary \(A=\left\{A_{0}, A_{1}, \ldots, A_{q}\right\}\), the outputs of \(\operatorname{Ideal}_{A, S}^{\Pi}(k)\) and \(\operatorname{Real}_{A}^{\Pi}(k)\) are computationally indistinguishable.
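In the usual formulation of such simulation-based proofs, computational indistinguishability of the two experiments means that every PPT distinguisher has only negligible advantage. The distinguisher D and the function negl are notation assumed here for illustration; the scheme's own definition section remains the authority on the exact statement:

\[
\left|\Pr\left[D\left(\operatorname{Real}_{A}^{\Pi}(k)\right)=1\right]-\Pr\left[D\left(\operatorname{Ideal}_{A, S}^{\Pi}(k)\right)=1\right]\right| \leq \operatorname{negl}(k)
\]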
Suppose the simulator S has access to the trace of a history \(L=\left(\left|T_{1}\right|, \cdots,\left|T_{n}\right|,|\operatorname{Inx}|, \tau\left(T X_{w}\right)\right)\), where \(\tau\left(T X_{w}\right)\) denotes the search pattern and the access pattern for keyword w. It then generates \(\left(\operatorname{Inx}^{*}, T_{1}^{*}, \cdots, T_{n}^{*}, Tr^{*}, TS^{*}\right)\) and creates the transaction ask* as follows:
i. Simulating \(T_{1}^{*}, \cdots, T_{n}^{*}\).
If q=0, it sets \(C_{1}^{*} \leftarrow\{0,1\}^{\left|C_{1}\right|}, \cdots, C_{n}^{*} \leftarrow\{0,1\}^{\left|C_{n}\right|}\).
Because the encryption algorithm \(\varepsilon=(\varepsilon.\operatorname{Enc}, \varepsilon.\operatorname{Dec})\) is PCPA-secure, \(C_{1}^{*}, \cdots, C_{n}^{*}\) are computationally indistinguishable from the \(C_{1}, \cdots, C_{n}\) produced in the \(\operatorname{Real}_{A}^{\Pi}(k)\) game. Moreover, the adversary does not have the private key and therefore cannot create valid transactions \(T_{1}^{*}, \cdots, T_{n}^{*}\) that embed \(C_{1}^{*}, \cdots, C_{n}^{*}\) respectively. If it asks the simulator S to sign these transactions, the resulting transactions \(T_{1}^{*}, \cdots, T_{n}^{*}\) are computationally indistinguishable from the transactions \(T_{1}, \cdots, T_{n}\) generated in the \(\operatorname{Real}_{A}^{\Pi}(k)\) game.
ii. Simulating Inx*.
If q=0, S sets \(t_{w}^{*} \leftarrow\{0,1\}^{k}, e_{w}^{*} \leftarrow\{0,1\}^{k}, h_{w}^{*} \leftarrow\{0,1\}^{k}\). Therefore, the \(t_{w}, e_{w}, h_{w}\) output by Enc are computationally indistinguishable from \(t_{w}^{*}, e_{w}^{*}, h_{w}^{*}\).
If q≥1, S selects \(l_{w_{q}}^{*} \leftarrow\{0,1\}^{k}, k_{w_{q}}^{*} \leftarrow\{0,1\}^{k}\), and computes \(e_{w_{q}}^{*}=\varepsilon.\operatorname{Enc}\left(l_{w_{q}}^{*}, D B^{*}\left(w_{q}\right)\right)\), \(h_{w_{q}}^{*}=H\left(k_{w_{q}}^{*}, C_{w_{q} 1}^{*}\|\cdots\| C_{w_{q} n}^{*}\right)\). Because F2 and F3 are pseudorandom functions, \(\left(e_{w_{q}}^{*}, h_{w_{q}}^{*}\right)\) is computationally indistinguishable from the \(\left(e_{w_{q}}, h_{w_{q}}\right)\) generated in the step Enc.
Because the function F1 is pseudorandom, the \(t_{w_{q}}\) output by Enc is computationally indistinguishable from \(t_{w_{q}}^{*}\), which is chosen at random from \(\{0,1\}^{k}\).
Therefore, Inx* is computationally indistinguishable from Inx.
iii. Simulating Tr*.
The transaction Tr* embeds \(t_{w}^{*}\) and \(TX_{inx}\). Because \(TX_{inx}\) is broadcast to all nodes, A can obtain it easily. Hence we only need to consider whether \(t_{w}^{*}\) is indistinguishable from \(t_{w}\). The step Trpdr in Section 4.1 uses the pseudorandom function F1 to generate \(t_{w}\) for keyword w, so \(t_{w}\) is indistinguishable from the \(t_{w}^{*} \leftarrow\{0,1\}^{k}\) that S chooses at random. Therefore, Tr* is computationally indistinguishable from Tr.
iv. Claiming the transaction ask by using transaction s.
When q=0, if A wants to get the money from the transaction s*, S returns \(\left(\left\{C_{i 1}, \cdots, C_{i n}\right\}, h_{w}\right)\) to A, where \(C_{i j} \leftarrow\{0,1\}^{k}\) \((j=1, \cdots, n)\) and \(h_{w} \leftarrow\{0,1\}^{k}\).
When \(q \geq 1\), S first returns \(\left\{C_{w_{q} 1}, \cdots, C_{w_{q} n}\right\}\) to A, where \(C_{w_{q} j}\) \((j=1, \cdots, n)\) is the history of the access pattern for keyword \(w_{q}\). S then sets \(k_{w_{q}}^{*} \leftarrow\{0,1\}^{k}\) and computes \(h_{w_{q}}^{*}=H\left(k_{w_{q}}^{*}, C_{w_{q} 1}\|\cdots\| C_{w_{q} n}\right)\), which will be sent to A. Because F3 is a pseudorandom function, the transaction s that A creates cannot claim the money from the transaction ask.
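To make the q=0 case of steps i and ii concrete, the following sketch shows a simulator that sees only ciphertext lengths and outputs uniformly random strings of matching size. It assumes k = 256 bits and measures lengths in bytes rather than bits; the function name and dictionary layout are hypothetical and serve only as an illustration of the sampling, not as part of the proof.

```python
import secrets

K_BYTES = 32  # assumed security parameter: k = 256 bits


def simulate_without_queries(ciphertext_lengths):
    """Sample fake ciphertexts and one fake index entry from lengths alone."""
    # C_i* is uniform over strings of the same length as the real C_i.
    fake_ciphertexts = [secrets.token_bytes(n) for n in ciphertext_lengths]
    # t_w*, e_w*, h_w* are each uniform k-bit strings.
    fake_entry = {
        "t_w": secrets.token_bytes(K_BYTES),
        "e_w": secrets.token_bytes(K_BYTES),
        "h_w": secrets.token_bytes(K_BYTES),
    }
    return fake_ciphertexts, fake_entry


# Hypothetical trace: two documents whose ciphertexts are 16 and 48 bytes long.
fake_cts, fake_entry = simulate_without_queries([16, 48])
print([len(c) for c in fake_cts])
```

The point of the sketch is that nothing beyond the lengths in the trace is needed to produce the simulated view before any query is made.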
6. Conclusion
This paper provides a search method for encrypted data on the blockchain and constructs two concrete search algorithms, chosen according to the size of the data. We also give the security and performance analyses. Compared with existing SSE schemes, our scheme can automatically resist a malicious adversary. In addition, the server only needs to find the document transactions that are related to the keyword w; therefore, our search complexity is sub-linear in the total number of documents. Since our scheme can better protect the privacy and integrity of data, it can be applied in many industries, such as medical healthcare, insurance and finance.
At present, the blockchain is still in its infancy, and it only supports static data. Therefore, designing a scheme that supports both data update and search on the blockchain is an interesting problem, and this is our next work.
References
- D. X. Song, D. Wagner, and A. Perrig, "Practical techniques for searches on encrypted data," in Proc. of 2000 IEEE Symposium on Security and Privacy, IEEE, pages 44-55, 2000.
- E. J. Goh, "Secure indexes," IACR Cryptology ePrint Archive, 2003. http://eprint.iacr.org/2003/216
- R. Curtmola, J. A. Garay, S. Kamara, et al., "Searchable symmetric encryption: improved definitions and efficient constructions," in Proc. of the 13th ACM conference on Computer and communications security, ACM, pages 79-88, 2006.
- P. Golle, J. Staddon, and B. Waters, "Secure conjunctive keyword search over encrypted data," International Conference on Applied Cryptography and Network Security, Springer, pages 31-45, 2004.
- T. Moataz and A. Shikfa, "Boolean symmetric searchable encryption," in Proc. of the 8th ACM SIGSAC symposium on Information, computer and communications security, ACM, pages 265-276, 2013.
- D. Cash, S. Jarecki, C. S. Jutla, et al., "Highly-scalable searchable symmetric encryption with support for boolean queries," Advances in Cryptology-CRYPTO 2013, Springer, pages 353-373, 2013.
- S. Kamara, T. Moataz, "Boolean searchable symmetric encryption with worst-case sub-linear complexity," in Proc. of Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, pages 94-124, 2017.
- J. Li, Q. Wang, C. Wang, et al., "Fuzzy keyword search over encrypted data in cloud computing," in Proc. of INFOCOM, 2010 Proceedings IEEE, pages 1-5, 2010.
- A. Boldyreva, N. Chenette, "Efficient fuzzy search on encrypted data," Fast Software Encryption, Springer, pages 613-633, 2014.
- W. K. Wong, D. W. Cheung, B. Kao, et al., "Secure knn computation on encrypted databases," in Proc. of the 2009 ACM SIGMOD International Conference on Management of data, ACM, pages 139-152, 2009.
- N. Cao, C. Wang, M. Li, et al., "Privacy-preserving multi-keyword ranked search over encrypted cloud data," IEEE Trans. Parallel Distrib. Syst., 25(1), 222-233, 2014. https://doi.org/10.1109/TPDS.2013.45
- Z. J. Fu, F. X. Huang, K. Ren, et al., "Privacy-preserving smart semantic search based on conceptual graphs over encrypted outsourced data," IEEE Trans. Information Forensics and Security, 12(8):1874-1884, 2017. https://doi.org/10.1109/TIFS.2017.2692728
- P. V. Liesdonk, S. Sedghi, J. Doumen, et al., "Computationally efficient searchable symmetric encryption," in Proc. of Workshop on Secure Data Management, Springer, pages 87-100, 2010.
- S. Kamara, C. Papamanthou, and T. Roeder, "Dynamic searchable symmetric encryption," in Proc. of the 2012 ACM conference on Computer and communications security, ACM, pages 965-976, 2012.
- K. Kurosawa and Y. Ohtaki, "How to update documents verifiably in searchable symmetric encryption," in Proc. of Cryptology and Network Security-12th International Conference, CANS 2013, pages 309-328, Paraty, Brazil, November 20-22, 2013.
- D. Cash, J. Jaeger, S. Jarecki, et al., "Dynamic searchable encryption in very-large databases: Data structures and implementation," NDSS, volume 14, pages 23-26, Citeseer, 2014.
- M. Naveed, M. Prabhakaran, and C.A. Gunter, "Dynamic searchable encryption via blind storage," Security and Privacy (SP), 2014 IEEE Symposium on, IEEE, pages 639-654, 2014.
- C. Guo, X. Chen, Y. M. Jie, et al., "Dynamic multi-phrase ranked search over encrypted data with symmetric searchable encryption," IEEE Trans. Services Computing, 2017.
- E. Stefanov, C. Papamanthou, and E. Shi, "Practical dynamic searchable encryption with small leakage," NDSS, volume 14, pages 23-26, 2014.
- R. Bost, P. A. Fouque, and D. Pointcheval, "Verifiable dynamic symmetric searchable encryption: Optimality and forward security," IACR Cryptology ePrint Archive, 2016:62, 2016. http://eprint.iacr.org/2016/062
- R. Bost, "Σοφος: Forward Secure Searchable Encryption," in Proc. of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, pages 1143-1154, 2016.
- S. Kamara, C. Papamanthou, "Parallel and dynamic searchable symmetric encryption," in Proc. of International Conference on Financial Cryptography and Data Security, Springer, pages 258-274, 2013.
- J. Alderman, K. Martin, and S. Louise Renwick, "Multi-level access in searchable symmetric encryption," in Proc. of International Conference on Financial Cryptography and Data Security, Springer, pages 35-52, 2017.
- C. Bösch, P. Hartel, W. Jonker, et al., "A survey of provably secure searchable encryption," ACM Computing Surveys (CSUR), 47(2), 18, 2015.
- K. Kurosawa and Y. Ohtaki, "UC-secure searchable symmetric encryption," Financial Cryptography and Data Security, Springer, pages 285-298, 2012.
- R. Cheng, J. Yan, C. Guan, et al., "Verifiable searchable symmetric encryption from indistinguishability obfuscation," in Proc. of the 10th ACM Symposium on Information, Computer and Communications Security, ASIA CCS '15, pages 621-626, Singapore, April 14-17, 2015.
- S. G. Dai, H. G. Li, and F. G. Zhang, "Memory leakage-resilient searchable symmetric encryption," Future Generation Comp. Syst., 62, 76-84, 2016. https://doi.org/10.1016/j.future.2015.11.003
- H. G. Li, F. G. Zhang, and C. I. Fan, "Deniable searchable symmetric encryption," Information Sciences, 402:233-243, 2017. https://doi.org/10.1016/j.ins.2017.03.032
- S. Nakamoto, Bitcoin: A peer-to-peer electronic cash system, 2008.
- M. Swan, "Blockchain: Blueprint for a new economy [M]," O'Reilly Media, Inc., 2015.
- M. Andrychowicz, S. Dziembowski, D. Malinowski, et al., "Fair two-party computations via bitcoin deposits," in Proc. of International Conference on Financial Cryptography and Data Security, Springer, pages 105-121, 2014.
- D. Chaum, "Blind signatures for untraceable payments," Advances in cryptology, pages 199-203, Springer, 1983.
- D. Chaum, "Blind signature system," Advances in cryptology, Springer, pages 153-153, 1984.
- M. Andrychowicz, S. Dziembowski, D. Malinowski, et al., "Secure multiparty computations on bitcoin," in Proc. of 2014 IEEE Symposium on Security and Privacy, IEEE, pages 443-458, 2014.