1. Introduction
Private set intersection (PSI) is a classic problem in secure multi-party computation [1]. It enables two or more participants to compute the intersection of their private sets without revealing any other information about those sets [2]. For example, two mutually distrustful participants P1 and P2 hold private sets S1 and S2, where |S1| = m and |S2| = n, and they want to compute S1∩S2. Private set intersection cardinality (PSI-CA) and private set union cardinality (PSU-CA) are important variants of PSI that have attracted the attention of some researchers. In the PSI-CA problem, participants P1 and P2 hold sets S1 and S2 and want to compute only |S1∩S2|. PSI-CA is significant wherever data privacy must be protected and has been applied in many practical scenarios, such as advertisement conversion rate calculation [3], social network contact discovery [4], gene sequence matching [5], and infectious disease patient tracing [6].
Take the advertisement conversion rate as an example. The conversion rate is the proportion of users who, after being influenced by an advertisement, make a purchase or register, relative to the total number of clicks on that advertisement, and computing it is an important application of PSI-CA. In practice, merchants hold the records of people who purchased their products, while advertisers hold the records of people who clicked the advertisement. To calculate the conversion rate, the advertiser needs the size of the intersection between its own set and the merchant's set, that is, the number of people who both saw the advertisement and completed a transaction. The advertiser can then divide that number by the total number of clicks. PSI-CA suits this scenario better than PSI because it yields this count directly. PSI, in contrast, reveals the identities of the users who saw the advertisement and completed a transaction, which leaks their private data.
Most current research concentrates on PSI protocols, and PSI-CA and PSU-CA have received comparatively little attention. Extending the study of private set operations to these variants is therefore necessary to meet society's current privacy protection needs. Moreover, most available PSI protocols are secure only in the semi-honest model and give no guarantee in the malicious model, which makes them hard to apply in practice.
In addition, most current PSI-CA protocols are built from a Bloom filter and a homomorphic encryption algorithm. The false positives of the Bloom filter, however, introduce errors into the final result. In applications with high precision requirements, such as national defense and the military, such errors are unacceptable, so these protocols are difficult to apply there.
To address these issues, we first present a local two-party PSI-CA protocol based on Elgamal cryptography and the Bloom filter. A Bloom filter is a data structure that maps set elements to positions of an array by means of hash functions. To enhance data security, we propose two new operations on the Bloom filter, called IBF and BIBF. We use a variant of Elgamal cryptography to protect the data exchanged during the interaction. In this variant, decryption does not return the message m itself; the message appears in the exponent of the decryption result g^m. This property makes it easy for a participant to determine whether an element belongs to the set intersection. By using zero-knowledge proofs, our protocol is secure not only in the semi-honest model but also in the malicious model. After the protocol has been executed, only one party obtains the intersection cardinality, and the other party obtains nothing. To further improve the accuracy of the result, we present an improved PSI-CA protocol that combines key-value pair packing with a Garbled Bloom Filter, which significantly reduces the mean relative error of the computed intersection cardinality.
1.1 Our Contributions
In this paper, we present two efficient two-party PSI-CA protocols with high security. Our main contributions are:
(1) We present two new operations on the Bloom filter, called IBF and BIBF, which improve data privacy;
(2) We design two efficient two-party PSI-CA protocols. The first protocol is secure under the malicious adversary model and therefore resists malicious behavior. The improved protocol achieves higher result accuracy than the first one;
(3) Using the ideal/real simulation paradigm, we prove that our first protocol is secure under the malicious adversary model and provide a complete security proof.
This paper is organized as follows. Section 2 reviews related work on PSI and PSI-CA. Section 3 presents the main techniques and the security model used in our scheme, and Section 4 presents our two PSI-CA protocols. Section 5 proves the security of the first protocol under the malicious model. Section 6 analyzes efficiency and result accuracy and gives a functional comparison with other schemes. Finally, Section 7 summarizes the paper and discusses future research directions.
2. Related Work
The first PSI protocol was proposed by Freedman et al. [7] in 2004, using homomorphic encryption and oblivious polynomial evaluation. Much research on PSI has followed. In 2019, Pinkas et al. [8] introduced the notion of a multi-point OPRF based on the construction of high-degree polynomials, which reduces the communication cost as well as the number of times the sender encrypts elements. Song et al. [9] presented a series of protocols for set operations that use oblivious transfer to dramatically reduce the computational cost associated with traditional public-key operations. However, the above protocols are all traditional local two-party PSI protocols. To accommodate multiple parties, scholars have presented a series of multi-party PSI protocols. Vos et al. [10] realized the “union” operation on private set elements based on elliptic curves and implemented multi-party PSI protocols for large and small sets respectively. Zhang et al. [11] presented a three-party PSI protocol secure in the semi-honest model, based on bilinear maps and a three-party key agreement protocol.
To better resist malicious participants, Ben-Efraim et al. [12] used a garbled Bloom filter to construct a maliciously secure multi-party protocol that tolerates any number of corrupt parties.
In the era of big data, the surge in data processing has brought the advantages of cloud computing to the fore, and many solutions have emerged as scholars investigate outsourcing heavy computing tasks to cloud servers. Abadi et al. [13] presented an outsourced PSI protocol called O-PSI, which lets the cloud server perform most of the complex computation. Abadi et al. [14] later presented a verifiable delegated PSI protocol named VD-PSI, which adds a verification step to the outsourced PSI protocol so that participants can verify the correctness of the results they receive. Building on O-PSI, Yang et al. [15] presented a delegated PSI protocol with better computational efficiency than O-PSI. In 2022, Wei et al. [16] presented a PSI protocol based on a semi-trusted cloud server and oblivious pseudo-random functions.
PSI-CA, as a branch of PSI, has received far less research attention than PSI. Egert et al. [17] presented a local two-party PSI-CA protocol based on the Bloom filter and Elgamal cryptography, but it is secure only in the semi-honest model. Mihaela et al. [18] proposed a two-party PSI-CA scheme based on the Paillier homomorphic encryption algorithm. This protocol is secure in the honest-but-curious model, but its computation cost is high because of the Paillier algorithm. Davidson et al. [19] proposed a toolkit for set operations that covers set union, set intersection, and the cardinalities of the intersection and the union. It enriches the functionality of set operations but cannot resist malicious adversaries. To resist malicious adversaries, Debnath et al. [20] combined zero-knowledge proofs with homomorphic encryption in a two-party PSI-CA protocol, which however performs poorly in terms of efficiency. Using zero-knowledge proofs and the GM cryptosystem, Debnath et al. [21] also proposed another PSI-CA scheme in the semi-honest model. In summary, each of these protocols either cannot resist malicious participants or has a high computation cost.
3. Preliminaries
Our protocol relies mainly on Elgamal cryptography and the Bloom filter, and its security proof uses the ideal/real simulation paradigm. This section therefore introduces Elgamal cryptography, the Bloom filter, the ideal/real simulation paradigm, the malicious model, and zero-knowledge proofs.
3.1 Elgamal cryptography
Elgamal cryptography is a public-key algorithm whose security rests on the DDH assumption; it was proposed by Taher Elgamal [22] in 1985. We use a variant of Elgamal cryptography in our protocol. The algorithm proceeds as follows:
(1) Key Generation: Let G be a multiplicative cyclic group of order q with generator g. Choose x ← Zq uniformly at random and compute \(\begin{align}y = g^{x} \bmod q\end{align}\). Then (pk, sk) = (y, x).
(2) Encryption: For a plaintext m, choose r ← Zq and compute \(\begin{align}c = Enc_{pk}(m) = (c_{1}, c_{2})\end{align}\), where \(\begin{align}c_{1} = g^{r} \bmod q\end{align}\) and \(\begin{align}c_{2} = g^{m} y^{r} \bmod q\end{align}\).
(3) Decryption: A ciphertext c = (c1, c2) is decrypted as \(\begin{align}g^{m} = Dec_{sk}(c) = (c_{1})^{-x} \cdot c_{2} \bmod q\end{align}\). Note that the result of decryption is g^m, not the plaintext m.
This Elgamal variant is additively homomorphic. Let E be the encryption algorithm. For ciphertexts C1 = E(m1) and C2 = E(m2), multiplying the ciphertexts component-wise yields an encryption of the sum: E(m1 + m2) = E(m1)⋅E(m2) = C1⋅C2.
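The three algorithms and the homomorphic property can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy parameters P = 2039, Q = 1019, G = 4 (the order-Q subgroup of the integers modulo the safe prime P, with arithmetic reduced modulo P) and the function names are not from the paper, and the group is far too small for real use.

```python
import secrets

# Toy parameters (assumption, insecure): P = 2Q + 1 is a safe prime and
# G generates the subgroup of prime order Q modulo P.
P, Q, G = 2039, 1019, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1          # secret key x in [1, Q-1]
    return pow(G, x, P), x                    # (pk, sk) = (y, x)

def encrypt(pk, m):
    r = secrets.randbelow(Q - 1) + 1          # fresh randomness r
    return pow(G, r, P), (pow(G, m, P) * pow(pk, r, P)) % P

def decrypt(sk, c):
    c1, c2 = c
    # c1^(-x) * c2 = g^m; the output is g^m, not m itself
    return (pow(c1, Q - sk, P) * c2) % P

def add(ca, cb):
    # Component-wise product of two ciphertexts encrypts m_a + m_b
    return (ca[0] * cb[0]) % P, (ca[1] * cb[1]) % P
```

For example, decrypting `add(encrypt(pk, 3), encrypt(pk, 4))` gives `pow(G, 7, P)`, and decrypting an encryption of 0 gives 1, which is the property the protocol later relies on when it checks whether decrypted values equal 1.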
3.2 Bloom Filter
A Bloom filter is a data structure proposed by Burton Bloom [23] in 1970. It represents a set compactly and supports efficient membership queries. A Bloom filter consists of an array of m bits and k hash functions {h1(),...,hk()}. Initially all bits are set to zero. To insert an element x ∈ S, the bits at positions hi(x) are set to one for 1 ≤ i ≤ k. Fig. 1 presents the insertion algorithm. An element is judged to belong to the set only if all k positions it maps to are set to one [24]; otherwise it does not belong to the set. Fig. 2 presents the membership test algorithm.
Fig. 1. Algorithm for Bloom filter insertion
Fig. 2. Algorithm for Bloom filter member test
However, the Bloom filter suffers from false positives: an element that does not belong to the set S may still pass the membership test. The false positive probability depends on the number of hash functions, the number of bits in the Bloom filter, and the number of inserted elements.
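The insertion and membership test described above, together with the usual estimate of the false positive probability, can be sketched as follows. The hash construction (SHA-256 over an index and the element) and all names are illustrative assumptions, not the authors' algorithms from Fig. 1 and Fig. 2; the estimate (1 - e^(-kn/m))^k is the standard textbook approximation for n inserted elements.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m                   # all bits start at zero

    def positions(self, item):
        # h_i(x): SHA-256 of the index i and the item, reduced modulo m
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def insert(self, item):
        for pos in self.positions(item):
            self.bits[pos] = 1

    def contains(self, item):
        # Member only if all k positions are set to one
        return all(self.bits[pos] == 1 for pos in self.positions(item))

def false_positive_rate(m, k, n):
    # Standard approximation for n inserted elements
    return (1 - math.exp(-k * n / m)) ** k
```

Inserted elements always pass `contains` (there are no false negatives), while a non-member passes only if all of its k positions happen to have been set by other elements.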
We present two operations on the Bloom filter, named IBF and BIBF.
Let BF be a Bloom filter. IBF is the bitwise inversion of BF: every bit that is 1 in BF is set to 0, and every bit that is 0 is set to 1. Fig. 3 presents the algorithm for inverting the Bloom filter.
Fig. 3. Algorithm for inverting the Bloom filter.
BIBF is obtained from IBF by blinding each bit, that is, by multiplying each bit of the IBF by a random number. Fig. 4 shows the algorithm for blinding the inverted Bloom filter.
Fig. 4. Algorithm for blinding the inverted Bloom filter.
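A minimal sketch of the two operations is given below. The modulus Q = 1019 and the function names are assumptions made for illustration; the blinding factors are drawn as nonzero values modulo Q, so zero entries of the IBF stay zero and one entries become random nonzero values.

```python
import secrets

Q = 1019  # toy prime modulus for the blinding factors (assumption)

def invert_bloom_filter(bits):
    # IBF: flip every bit of the Bloom filter
    return [1 - b for b in bits]

def blind_inverted_filter(ibf, q=Q):
    # BIBF: multiply each IBF entry by a fresh nonzero random r_i modulo q
    blinded = []
    for bit in ibf:
        r = secrets.randbelow(q - 1) + 1      # r_i in [1, q-1]
        blinded.append((r * bit) % q)
    return blinded
```

With this encoding, a position that was 1 in the original Bloom filter is 0 in the BIBF, and a position that was 0 carries a random nonzero value.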
The Garbled Bloom Filter (GBF) is a variant of the traditional Bloom filter. Unlike the traditional Bloom filter, whose array holds single bits, each position of a Garbled Bloom Filter holds a random string. This feature not only reduces the error caused by false positives but also improves the security of data storage. In our improved protocol, before the client C constructs its Garbled Bloom Filter GBFC, it builds a set of key-value pairs and then packs them into GBFC. A key-value pair (x, y) is packed so that \(\begin{align}y=\sum_{i=1}^{t} GBF_{C}\left(h_{i}(x)\right)\end{align}\). Positions of GBFC that are not used by any key-value pair are filled with random strings.
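One possible way to pack key-value pairs so that the shares at the hashed positions sum to the value is sketched below. It is not the authors' construction: the share modulus MOD, the SHA-256 based hashes, the use of distinct positions only, and the names `build_gbf` and `lookup` are all assumptions chosen to make the idea concrete.

```python
import hashlib
import secrets

MOD = 2**61 - 1   # share modulus (assumption; stands in for the string space)

def positions(key, size, t):
    # Distinct slot indices among h_1(x), ..., h_t(x)
    return sorted({int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest(),
                                  "big") % size for i in range(t)})

def build_gbf(pairs, size, t):
    table = [None] * size
    for key, value in pairs.items():
        slots = positions(key, size, t)
        free = [s for s in slots if table[s] is None]
        if not free:
            raise ValueError("no free slot; use a larger table")
        for s in free[1:]:
            table[s] = secrets.randbelow(MOD)     # random shares
        rest = sum(table[s] for s in slots if s != free[0])
        table[free[0]] = (value - rest) % MOD     # last share fixes the sum
    # Slots not used by any pair are filled with random values
    return [secrets.randbelow(MOD) if v is None else v for v in table]

def lookup(table, key, t):
    # Sum of the shares stored at the key's positions
    return sum(table[s] for s in positions(key, len(table), t)) % MOD
```

Because filled slots are never overwritten, `lookup` returns the packed value for every inserted key, while for a key that was never inserted the sum is an unrelated value.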
3.3 Ideal/Real Simulation Paradigm
The ideal/real simulation paradigm is the main method for security proofs in secure multi-party computation. It compares the execution of the PSI-CA protocol in the real world with an ideal model of the same computation, and thereby proves the security of the protocol indirectly [25].
In the ideal model, a trusted third party computes the function and sends the result to the participants. In the real model, no such party exists; the computation is split into multiple message functions, and the participants communicate with each other to complete it. The security of the PSI-CA protocol is demonstrated by proving that the view in the ideal world is indistinguishable from the view in the real world.
3.4 The Malicious Model
The malicious model is another typical adversary model in secure multi-party computation. Unlike a participant in the honest or semi-honest model, a malicious participant does not follow the protocol honestly. It may deviate arbitrarily during execution, for example by tampering with its input, terminating the protocol early, or refusing to participate [25].
Our protocol involves two parties, the client and the server, where the client holds the set X and the server holds the set Y, with |X| = m and |Y| = n. We therefore define a two-party protocol π that computes the function \(\begin{align}f:\left(\{0,1\}^{*}\right)^{m} \times\left(\{0,1\}^{*}\right)^{n} \rightarrow f_{|\cap|} \times \perp\end{align}\), where {0,1}* denotes the domain of input elements, m and n denote the cardinalities of the two sets, and f|∩| denotes the cardinality of their intersection. That is, the client obtains the intersection cardinality |X∩Y| and the server obtains nothing.
Specifically, a malicious party may mount the following types of attacks:
(1) A malicious party can tamper with its initial input. A malicious client C could forge the Bloom filter BFC that represents its set in the first phase of the protocol in order to learn more about the set held by the server S. For example, the malicious client may insert all elements of the universe U into the Bloom filter, so that every bit of the resulting filter is set to 1. In the subsequent steps, this gives |X∩Y| = |U∩Y| = |Y|; that is, the computed intersection size equals the size of the server’s set, which leaks the cardinality of the set held by S.
(2) A malicious party can tamper with intermediate results or terminate the protocol early. Both S and C are able to do this.
To resist both attacks, we use zero-knowledge proofs: before each interaction, the sender constructs a proof that guarantees the correctness of the transmitted message. When the receiver obtains the message, it first verifies the proof. If the verification succeeds, the receiver accepts the message; otherwise, it terminates the protocol. This method effectively resists the behavior of malicious parties.
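Attack (1) can be illustrated with a few lines of Python. The sketch ignores encryption and proofs and only shows the effect of a saturated filter on the membership tests; the hash construction, the filter size 512, and the sample elements are hypothetical.

```python
import hashlib

def positions(item, m, k):
    # h_1(item), ..., h_k(item) derived from SHA-256 (illustrative choice)
    return [int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest(), "big") % m
            for i in range(k)]

def build_filter(items, m, k):
    bits = [0] * m
    for item in items:
        for pos in positions(item, m, k):
            bits[pos] = 1
    return bits

def matches(bits, server_set, k):
    # Number of server elements whose k positions are all set to 1
    return sum(all(bits[p] for p in positions(y, len(bits), k)) for y in server_set)

server_set = {"u1", "u2", "u3", "u4"}
honest = build_filter({"u1", "x9"}, 512, 3)   # honest client set {u1, x9}
forged = [1] * 512                            # filter saturated by a malicious client
```

With the forged filter, every server element passes the test, so `matches(forged, server_set, 3)` equals `len(server_set)` regardless of the client's real set.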
3.5 Zero-knowledge Proof
Our protocol uses zero-knowledge proof techniques to ensure security under the malicious model. The general construction of such a proof is described below [26].
In our scheme, a proof has the following form:
\(\begin{align}\pi=PoK\left\{\left(a_{1}, \ldots, a_{l}\right) \mid \wedge_{i=1}^{m} K_{i}=f_{i}\left(a_{1}, \ldots, a_{l}\right)\right\}\end{align}\)
The interaction between the prover and the verifier proceeds as follows.
(1) Commitment:
The prover picks t1,...,tl uniformly at random and computes the commitments \(\begin{align}\overline{K_i}\end{align}\):
\(\begin{align}\overline{K_{i}}=f_{i}\left(t_{1}, \ldots, t_{l}\right), i=1, \ldots, m\end{align}\)
The prover then sends the commitments \(\begin{align}\overline {K_i}\end{align}\) to the verifier.
(2) Challenge:
The verifier picks a challenge h uniformly at random from the challenge space C and sends h to the prover. The prover computes nj = tj + h⋅𝑎j for j = 1,...,l and sends {n1,...,nl} to the verifier.
(3) Verify:
After receiving {n1,...,nl}, the verifier checks whether \(\begin{align}f_{i}\left(n_{1}, \ldots, n_{l}\right)=\overline{K_{i}} \cdot K_{i}^{h}\end{align}\) holds for i = 1,...,m. If every equation holds, the verifier accepts; otherwise it rejects.
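The simplest instance of this three-move structure is a proof of knowledge of a single discrete logarithm (one relation, one witness, f(a) = g^a), often called a Schnorr proof. The sketch below uses the same toy group as an assumption (P = 2039, Q = 1019, G = 4), and the function names are hypothetical; it is not the exact proof system used for π1 or π2.

```python
import secrets

P, Q, G = 2039, 1019, 4    # toy group of prime order Q modulo P (assumption)

def commit():
    t = secrets.randbelow(Q)
    return t, pow(G, t, P)             # (t, K_bar) with K_bar = g^t

def challenge():
    return secrets.randbelow(Q)        # verifier's random challenge h

def respond(t, h, a):
    return (t + h * a) % Q             # n = t + h * a

def verify(K, K_bar, h, n):
    # Accept iff g^n == K_bar * K^h
    return pow(G, n, P) == (K_bar * pow(K, h, P)) % P
```

An honest prover who knows a with K = g^a always passes `verify`, because g^(t + h·a) = g^t · (g^a)^h. A response computed from a different witness fails whenever h is nonzero modulo Q.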
4. Our PSI-CA Protocol
In our PSI-CA protocol, the parties are the client C and the server S, who hold the sets X = {x1,...,xm} and Y = {y1,...,yn} respectively, where |X| = m and |Y| = n. C wants to learn the cardinality of its set intersection with S. After the protocol is executed, the client C outputs the intersection cardinality.
Table 1 lists the symbols used in the protocol and their descriptions.
Table 1. Description of symbols
The PSI-CA protocol under the malicious model.
Input: The client C inputs its private set X = {x1,...,xm}, and the server S inputs its private set Y = {y1,...,yn}. The public parameters P = (G,q,g) and the security parameters κ and λ are common inputs.
Output: The client C outputs |X∩Y|; the server S outputs ⊥.
The general structure of our protocol is shown in Fig. 5:
Fig. 5. Structure of PSI-CA protocol
Step 1. Setup phase
(1) Let G be a multiplicative cyclic group of order q with generator g. The client C picks a uniformly random value \(\begin{align}x \stackrel{R}{\longleftarrow} Z_{q}\end{align}\) and computes \(\begin{align}y = g^{x} \bmod q\end{align}\). The client C’s key pair is (pkC, skC) = (y, x).
(2) The client C inserts every element xi ∈ X (1 ≤ i ≤ m) into the Bloom filter by executing the insertion algorithm shown in Fig. 1, and obtains BFC.
(3) The client C applies the inversion algorithm shown in Fig. 3 to each bit of BFC and obtains IBFC.
(4) The client C picks uniformly random values \(\begin{align}r_{1}, r_{2}, \ldots, r_{m} \stackrel{R}{\longleftarrow} Z_{q}^{*}\end{align}\) and blinds every bit of IBFC as \(\begin{align}\overline{x_{i}}=r_{i} \cdot I B F_{C}\left(x_{i}\right)\end{align}\), 1 ≤ i ≤ m. The client thereby obtains BIBFC.
(5) Using a zero-knowledge proof, the client C constructs the proof \(\begin{align}\pi_{1}=PoK\left\{\left(r_{1}, \ldots, r_{m}\right) \mid \wedge_{i=1}^{m}\left(BIBF_{C}[i]=r_{i} \cdot IBF_{C}[i]\right)\right\}\end{align}\); the construction and verification procedures are described in Section 3.5. The client then sends the proof π1, the set \(\begin{align}\overline {X} = \{\overline {x_1}, \overline {x_2}, \ldots, \overline {x_m}\}\end{align}\), and the k hash functions to the server S.
Step 2. Computation phase
(1) After receiving the message from the client C, the server S first verifies the proof π1. If the verification passes, S accepts the message and continues with the subsequent steps; otherwise S aborts the protocol.
(2) The server S hashes every element of Y with the k hash functions; that is, for every yj ∈ Y, 1 ≤ j ≤ n, it computes h1(yj),...,hk(yj).
(3) The server S looks up the elements \(\begin{align}a_{h_{1}\left(y_{j}\right)}, \ldots, a_{h_{k}\left(y_{j}\right)}\end{align}\) in \(\begin{align}\bar{X}=\left\{\overline{x_{1}}, \overline{x_{2}}, \ldots, \overline{x_{m}}\right\}\end{align}\).
(4) The server S encrypts these elements with the client’s public key pkC, that is, \(\begin{align}E\left(\overline{y_{j}}\right)=\left\{\left(g^{z_{j}}, g^{a_{h_{1}\left(y_{j}\right)}} \cdot y^{z_{j}}\right), \ldots,\left(g^{z_{j}}, g^{a_{h_{k}\left(y_{j}\right)}} \cdot y^{z_{j}}\right)\right\}, 1 \leq j \leq n\end{align}\), where \(\begin{align}z_{j} \stackrel{R}{\longleftarrow} Z_{q}^{*}\end{align}\). The results constitute the set \(\begin{align}E(\bar{Y})=\left\{E\left(\overline{y_{1}}\right), \ldots, E\left(\overline{y_{n}}\right)\right\}\end{align}\).
(5) Using a zero-knowledge proof, the server S constructs the proof \(\begin{align}\pi_{2}=PoK\left\{\left(z_{1}, \ldots, z_{n}\right) \mid \wedge_{j=1}^{n}\left(c_{j}=g^{z_{j}}\right)\right\}\end{align}\) and sends π2 together with \(\begin{align}E(\bar{Y})=\left\{E\left(\overline{y_{1}}\right), \ldots, E\left(\overline{y_{n}}\right)\right\}\end{align}\) to the client C.
Step 3. Intersection computation phase
(1) After receiving the message from the server S, the client C first verifies the proof π2. If the verification passes, C accepts the message and continues with the subsequent steps; otherwise C aborts the protocol.
(2) The client C decrypts each element of the set \(\begin{align}E(\bar {Y})\end{align}\) as \(\begin{align}\left(g^{z_{i}}\right)^{-x} \cdot\left(g^{a_{h_{j}\left(y_{i}\right)}} \cdot y^{z_{i}}\right)=g^{a_{h_{j}\left(y_{i}\right)}}, 1 \leq i \leq n, 1 \leq j \leq k\end{align}\), where x is the private key of C.
If all k decryption results for an item of the set \(\begin{align}E(\bar {Y})\end{align}\) equal 1, the client increments the counter by 1. The final value of the counter is the intersection cardinality.
The PSI-CA scheme can be extended to a PSU-CA protocol. Using the relationship |X∪Y| = |X| + |Y| - |X∩Y|, the client can easily compute the private set union cardinality.
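The data flow of Steps 1 to 3 can be sketched end to end as follows. This is a simplified illustration, not the authors' implementation: the zero-knowledge proofs π1 and π2 are omitted (so the sketch offers no protection against malicious parties), the toy group P = 2039, Q = 1019, G = 4, the filter length M = 1024, K = 3 hash functions, and all function names are assumptions.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4      # toy group of prime order Q modulo P (assumption)
M, K = 1024, 3               # filter length and number of hash functions

def positions(item):
    return [int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest(), "big") % M
            for i in range(K)]

def rand_exp():
    return secrets.randbelow(Q - 1) + 1       # uniform in [1, Q-1]

def client_setup(X):
    x = rand_exp()                            # sk_C
    y = pow(G, x, P)                          # pk_C
    bf = [0] * M
    for item in X:
        for pos in positions(item):
            bf[pos] = 1
    # IBF flips each bit; BIBF blinds it with a fresh nonzero r_i
    bibf = [(rand_exp() * (1 - b)) % Q for b in bf]
    return x, y, bibf

def server_compute(Y, y, bibf):
    out = []
    for item in Y:
        z = rand_exp()                        # one z_j per element, as in Step 2
        out.append([(pow(G, z, P), pow(G, bibf[pos], P) * pow(y, z, P) % P)
                    for pos in positions(item)])
    return out

def client_count(x, encrypted):
    count = 0
    for cts in encrypted:
        # Each pair decrypts to g^a; the element counts only if every a is 0
        if all(pow(c1, Q - x, P) * c2 % P == 1 for c1, c2 in cts):
            count += 1
    return count

def union_cardinality(size_x, size_y, inter):
    return size_x + size_y - inter            # |X ∪ Y| = |X| + |Y| - |X ∩ Y|
```

A typical run is `x, y, bibf = client_setup(X)`, then `enc = server_compute(Y, y, bibf)` on the server side, then `client_count(x, enc)` on the client side. The count equals the true intersection size except when a Bloom filter false positive occurs, which is the error source addressed by the improved protocol.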
Fig. 6 displays the specific interaction process between the two participants in our protocol:
Fig. 6. Our first PSI-CA protocol
The above protocol uses a Bloom filter to represent the set. Because the Bloom filter suffers from false positives, the computed result may deviate from the true result. We use the Garbled Bloom Filter (GBF) to address this problem. Each position of the GBF array holds a string rather than a single bit, which also further enhances security. We next propose an improved PSI-CA protocol that uses the Garbled Bloom Filter and the additive homomorphic property of the Elgamal algorithm.
Fig. 7 presents the improved PSI-CA protocol:
Fig. 7. The improved PSI-CA protocol
5. Security analysis
We assume that AC and AS are the real-world adversaries, and that SIMC and SIMS are the corresponding adversaries (simulators) in the ideal world. AC and SIMC corrupt C, and AS and SIMS corrupt S. Let \(\begin{align}\bar {C}\end{align}\) and \(\begin{align}\bar {S}\end{align}\) be the honest parties in the ideal world. In the real world, a trusted third party generates the public parameter P = (G, q, g); in the ideal world, this is done by SIMC and SIMS. We denote the combined output of C, S, and AC (AS) in the real world by \(\begin{align}REAL_{\Theta, A_{C}(Z)\left(A_{S}(Z)\right)}(X, Y)\end{align}\), and the combined output of \(\begin{align}\bar {C}\end{align}\), \(\begin{align}\bar {S}\end{align}\), and SIMC (SIMS) in the ideal world by \(\begin{align}IDEAL_{f, SIM_{C}(Z)\left(SIM_{S}(Z)\right)}(X, Y)\end{align}\).
Theorem. If the Elgamal cryptosystem is semantically secure and the proof protocols used in our scheme are zero-knowledge proofs, then our PSI-CA protocol securely computes the function f : (X,Y) → (|X∩Y|, ⊥).
Proof. To prove the security of the PSI-CA protocol, we consider two cases: the client C is corrupted by AC, and the server S is corrupted by AS.
Case 1. The client C is corrupted by AC
Let Z be a distinguisher that controls the adversary AC. Z supplies the input of the server S and sees the output of S. In the real world, Z’s view consists of AC’s view and S’s output. In the ideal world, Z’s view consists of AC’s view and the output of \(\begin{align}\bar {S}\end{align}\). We need to prove that Z’s view in the real world is indistinguishable from its view in the ideal world. We consider a sequence of games Game0, Game1, Game2, where Gamei+1 is a slight modification of Gamei (i = 0,1). Let Pr[Gamei] denote the probability that Z distinguishes its view in Gamei from its view in the real protocol, and let Si be the simulator in Gamei.
Game0 : This game corresponds to the execution of the protocol in the real world. The simulator S0 has all the information of the server S and can interact with AC. Therefore, \(\begin{align}Pr\left[REAL_{\Theta, A_{C}(Z)}(X, Y)\right]=Pr\left[Game_{0}\right]\end{align}\).
Game1 : Game1 proceeds as Game0, except that if the proof π1 is valid, the simulator S1 runs the extraction algorithm of π1 with the client C to obtain the blinding factors {r1,...,rm}. From BIBFC = {r1⋅IBFC[1],...,rm⋅IBFC[m]}, the simulator S1 then computes \(\begin{align}I B F_{C}=\left\{I B F_{C}[1]=\frac{B I B F_{C}[1]}{r_{1}}, \ldots, I B F_{C}[m]=\frac{B I B F_{C}[m]}{r_{m}}\right\}\end{align}\) and from it BFC. After that, the simulator S1 recovers the set X = {x1,...,xm}. By the simulation soundness of the proof π1, Z’s views in Game0 and Game1 are indistinguishable. Thus, |Pr[Game1]-Pr[Game0]| ≤ θ1(k), where θ1(k) is a negligible function.
Game2 : Game2 proceeds as Game1, except that after constructing the set X = {x1,...,xm}, the simulator S2 performs the following steps:
(1) calculates |X∩Y| ;
(2) constructs the set Y' = {y'1,...,y'n}, which contains |X∩Y| random elements from the set X and n − |X∩Y| random elements from G ;
(3) constructs BIBFC from the set X ;
(4) calculates \(\begin{align}E\left(\overline{Y^{\prime}}\right)=\left\{E\left(\overline{y_{1}^{\prime}}\right), \ldots, E\left(\overline{y_{n}^{\prime}}\right)\right\}\end{align}\), where \(\begin{align}E\left(\overline{y_{j}^{\prime}}\right)=\left\{\left(g^{z_{j}}, g^{a_{h_{1}\left(y_{j}^{\prime}\right)}} \cdot y^{z_{j}}\right), \ldots,\left(g^{z_{j}}, g^{a_{h_{k}\left(y_{j}^{\prime}\right)}} \cdot y^{z_{j}}\right)\right\}\end{align}\);
(5) sends \(\begin{align}E\left(\overline{Y^{\prime}}\right)=\left\{E\left(\overline{y_{1}^{\prime}}\right), \ldots, E\left(\overline{y_{n}^{\prime}}\right)\right\}\end{align}\) as \(\begin{align}E(\bar{Y})=\left\{E\left(\overline{y_{1}}\right), \ldots, E\left(\overline{y_{n}}\right)\right\}\end{align}\) and simulates the proof π2.
Because the Elgamal encryption scheme is semantically secure, the pairs <\(\begin{align}E\left(\overline{Y^{\prime}}\right)=\left\{E\left(\overline{y_{1}^{\prime}}\right), \ldots, E\left(\overline{y_{n}^{\prime}}\right)\right\},\left\{y_{1}^{\prime}, \ldots, y_{n}^{\prime}\right\}\end{align}\)> in Game1 and Game2 are computationally indistinguishable. Together with the zero-knowledge simulatability of the proof π2, this implies that Z’s views in Game1 and Game2 are indistinguishable. Therefore, |Pr[Game2]-Pr[Game1]| ≤ θ2(k), where θ2(k) is a negligible function.
In the ideal world, the adversary SIMC simulates the honest party S and incorporates all steps of Game2. The execution in the ideal world is as follows:
(1) SIMC first generates the public parameter P = (G,q,g). Next, SIMC invokes AC with input X = {x1,...,xm} ;
(2) After receiving π1 and \(\begin{align}\overline{X} = \{\overline{x_1}, \overline{x_2}, \ldots, \overline{x_m}\}\end{align}\), where \(\begin{align}\overline{x_i} = r_i{\cdot}IBF_C{(x_i)}\end{align}\), SIMC verifies the validity of the proof π1. If the verification succeeds, SIMC hashes every element of the set Y and looks up \(\begin{align}a_{h_{1}\left(y_{j}\right)}, \ldots, a_{h_{k}\left(y_{j}\right)}\end{align}\) in the set \(\begin{align}\overline{X} = \{\overline{x_1}, \overline {x_2}, \ldots, \overline{x_m}\}\end{align}\);
(3) SIMC sends X and \(\begin{align}\bar {S}\end{align}\) sends Y to the trusted third party T. T takes X and Y as input, computes the functionality f, and returns |X∩Y| to SIMC.
(4) After receiving |X∩Y|, SIMC performs the following operations:
(i) builds \(\begin{align}Y^{\prime \prime}=\left\{y_{1}^{\prime \prime}, \ldots, y_{n}^{\prime \prime}\right\}\end{align}\), which contains |X∩Y| random elements from the set X and n − |X∩Y| random elements from the group G ;
(ii) constructs BIBFC from the set X ;
(iii) computes \(\begin{align}E\left(\overline{Y^{\prime \prime}}\right)=\left\{E\left(\overline{y_{1}^{\prime \prime}}\right), \ldots, E\left(\overline{y_{n}^{\prime \prime}}\right)\right\}\end{align}\), where
\(\begin{align}E\left(\overline{y_{j}^{\prime \prime}}\right)=\left\{\left(g^{z_{j}}, g^{a_{h_{1}\left(y_{j}^{\prime \prime}\right)}} \cdot y^{z_{j}}\right), \ldots,\left(g^{z_{j}}, g^{a_{h_{k}\left(y_{j}^{\prime \prime}\right)}} \cdot y^{z_{j}}\right)\right\}\end{align}\);
(iv) sends \(\begin{align}E\left(\overline{Y^{\prime \prime}}\right)=\left\{E\left(\overline{y_{1}^{\prime \prime}}\right), \ldots, E\left(\overline{y_{n}^{\prime \prime}}\right)\right\}\end{align}\) as \(\begin{align}E\left(\overline{Y^{\prime}}\right)=\left\{E\left(\overline{y_{1}^{\prime }}\right), \ldots, E\left(\overline{y_{n}^{\prime}}\right)\right\}\end{align}\) and simulates the proof π2.
Hence, the simulator SIMC provides AC with the same simulation as the simulator S2 in Game2. Therefore, \(\begin{align}Pr\left[IDEAL_{f, SIM_{C}(Z)}(X, Y)\right]=Pr\left[Game_{2}\right]\end{align}\) and
\(\begin{align}\left|Pr\left[IDEAL_{f, SIM_{C}(Z)}(X, Y)\right]-Pr\left[REAL_{\Theta, A_{C}(Z)}(X, Y)\right]\right| &=\left|Pr\left[Game_{2}\right]-Pr\left[Game_{0}\right]\right| \\ & \leq \sum_{i=0}^{1}\left|Pr\left[Game_{i+1}\right]-Pr\left[Game_{i}\right]\right| \\ & \leq \theta_{2}(k)+\theta_{1}(k)=\theta(k),\end{align}\)
where θ(k) is a negligible function. It follows that
\(\begin{align}I D E A L_{f, S I M_{C}(Z)}(X, Y) \stackrel{c}{\equiv} R E A L_{\Theta, A_{C}(Z)}(X, Y)\end{align}\) (5)
where \(\begin{align}\stackrel{c}{\equiv}\end{align}\) means computationally indistinguishable.
The proof for the case in which AS corrupts S is similar to the first case and is therefore omitted.
6. Efficiency
6.1 Implementation details
We ran our experiments on a laptop with an Intel i5-8300H CPU at 2.30 GHz and 8 GB of RAM, running Ubuntu 18.04.4. We implemented the PSI-CA protocols in C++. We set the element size to 128 bits, the statistical security parameter to λ = 40, and the computational security parameter to κ = 40.
6.2 Performance analysis
We compare the computation cost of our PSI-CA protocol with the protocols proposed by Cheng et al. [27] and Li et al. [28]. Five set sizes are tested: 2^10, 2^11, 2^12, 2^13, and 2^14. We separately measure the time spent by each participant in executing the protocol. Table 2 compares the overall computation cost of our PSI-CA protocol with these representative protocols:
Table 2. Computational cost comparison
Fig. 8 and Fig. 9 show how the time cost of the first protocol changes as the cardinality of the set held by each participant increases.
Fig. 8. Time cost of P1
Fig. 9. Time cost of P2
As the set cardinality increases, the time cost of each participant increases linearly in all three schemes. Our scheme requires slightly more time than the other two schemes, but it resists malicious participants; the interactive zero-knowledge proof protocol it uses inevitably incurs additional time cost. Users therefore need to make a reasonable tradeoff between security and time cost.
In addition, we conducted comparative experiments on the accuracy of the results computed by the two protocols proposed in this paper under different conditions. Each participant holds a set of size 2^10. The experimental results are shown in Table 3.
Table 3. Comparison of the accuracy of the results of the two protocols
Through two comparative experiments on the two protocols, it is not difficult to find that the mean relative error of the improved protocol's calculation results is smaller than the former, resulting in more accurate calculation results.
We also select the schemes of Lv S et al. [3] and Nan CHENG et al. [27] for a functional comparison with our first protocol. To ensure fairness, all selected protocols are recent local two-party PSI-CA protocols. The properties compared include the security model, the adversary model, and the security assumption. The functional comparison results are presented in Table 4. In recent years there has been little research on PSI-CA protocols under the malicious model, and most PSI-CA protocols are based on the semi-honest model. Table 4 shows that our protocol is secure in the standard model against malicious adversaries, is based on the DDH assumption, and hides the set size. Therefore, our protocol has higher practical value than the other protocols.
Table 4. Function comparison
Note: Abbreviations used in the table are as follows. STD: Standard Model. ROM: Random Oracle Model. DDH: Decisional Diffie-Hellman assumption. ECDLP: Elliptic Curve Discrete Logarithm Problem.
7. Conclusion
In this paper, we propose two local two-party PSI-CA protocols and prove the security of the first protocol under the malicious model. Our schemes address a main problem in the PSI-CA research field, namely that most protocols cannot resist malicious behavior. We also use the Garbled Bloom Filter to improve the accuracy of the computed intersection cardinality. In addition, we analyze the performance of our protocol and compare its functionality with that of other protocols to demonstrate its practicality. In future research, we will consider how to refine our improved protocol to make it more practical and secure under the malicious model.
References
- A. C. Yao, "Protocols for secure computations," in Proc. of 23rd Annual Symposium on Foundations of Computer Science, pp. 160-164, 1982.
- GAO Ying, WANG Wei, "A Survey of Multi-party Private Set Intersection," Journal of Electronics & Information Technology, vol. 45, no. 5, pp. 1859-1872, 2023.
- Lv S, Ye J, Yin S, "Unbalanced private set intersection cardinality protocol with low communication cost," Future Generation Computer Systems, vol. 102, pp. 1054-1061, 2020.
- MEZZOUR G, PERRIG A, GLIGOR V D, "Privacy-Preserving Relationship Path Discovery in Social Networks," in Proc. of International Conference on Cryptology and Network Security, pp. 189-208, 2009.
- SHEN L, CHEN X, WANG D, "Efficient and Private Set Intersection of Human Genomes," in Proc. of the 2018 IEEE International Conference on Bioinformatics and Biomedicine, pp. 761-764, 2018.
- OU R, HAO M, "Efficient Private Set Intersection Using Point-Value Polynomial Representation," Security and Communication Networks, vol. 2020, no. 1, pp. 1-12, 2020.
- FREEDMAN M J, NISSIM K, PINKAS B, "Efficient Private Matching and Set Intersection," in Proc. of International Conference on the Theory and Applications of Cryptographic Techniques, pp. 1-19, 2004.
- PINKAS B, ROSULEK M, TRIEU N, "SpOT-Light: Lightweight Private Set Intersection from Sparse OT Extension," in Proc. of 39th Annual International Cryptology Conference, vol. 11694, pp. 401-431, 2019.
- SONG X, GAI M, ZHAO S, "Privacy-Preserving Statistics Protocol for Set-Based Computation," Journal of Computer Research and Development, vol. 57, no. 10, pp. 2221-2231, 2020.
- VOS J, CONTI M, ERKIN Z, "Fast multi-party private set operations in the star topology from secure ANDs and ORs," Cryptology ePrint Archive, 2022.
- Zhang Lei, He Chongde, Wei Lifei, "Efficient and Malicious Secure Three-Party Private Set Intersection Computation Protocols for Small Sets," Journal of Computer Research and Development, vol. 59, no. 10, pp. 2286-2298, 2022.
- Ben-Efraim A, Nissenbaum O, Omri E, "PSImple: Practical multiparty maliciously-secure private set intersection," in Proc. of the 2022 ACM on Asia Conference on Computer and Communications Security, pp. 1098-1112, 2022.
- Abadi A, Terzis S, Dong C, "O-PSI: delegated private set intersection on outsourced datasets," in Proc. of ICT Systems Security and Privacy Protection: 30th IFIP TC 11 International Conference, pp. 3-17, 2015.
- Abadi A, Terzis S, Dong C, "VD-PSI: verifiable delegated private set intersection on outsourced private datasets," in Proc. of Financial Cryptography and Data Security: 20th International Conference, pp. 149-168, 2017.
- YANG X, LUO X, WAN X A, "Improved outsourced private set intersection protocol based on polynomial interpolation," Concurrency and Computation: Practice and Experience, vol. 30, no. 1, 2017.
- Wei LF, Wang Q, Zhang L, Chen CC, Chen YJ, Ning JT, "Efficient Private Set Intersection Protocols with Semi-trusted Cloud Server Aided," Journal of Software, vol. 34, no. 2, pp. 932-944, 2023.
- Egert R, Fischlin M, Gens D, "Privately computing set-union and set-intersection cardinality via bloom filters," in Proc. of Information Security and Privacy: 20th Australasian Conference, pp. 413-430, 2015.
- Ion M, Kreuter B, Nergiz E, "Private intersection-sum protocol with applications to attributing aggregate ad conversions," Cryptology ePrint Archive, 2017.
- Davidson A, Cid C, "An efficient toolkit for computing private set operations," in Proc. of Information Security and Privacy: 22nd Australasian Conference, pp. 261-278, 2017.
- DEBNATH S K, DUTTA R, "Efficient Private Set Intersection Cardinality in the Presence of Malicious Adversaries," in Proc. of the International Conference on Provable Security, pp. 326-329, 2015.
- DEBNATH S K, DUTTA R, "Secure and Efficient Private Set Intersection Cardinality Using Bloom Filter," in Proc. of the International Conference on Information Security, pp. 209-226, 2015.
- ElGamal T, "A public key cryptosystem and a signature scheme based on discrete logarithms," IEEE transactions on information theory, vol. 31, no. 4, pp. 469-472, 1985. https://doi.org/10.1109/TIT.1985.1057074
- BLOOM B H, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422-426, 1970. https://doi.org/10.1145/362686.362692
- TARKOMA S, ROTHENBERG C E, LAGERSPETZ E, "Theory and Practice of Bloom Filters for Distributed Systems," IEEE Communications Surveys & Tutorials, vol. 14, no. 1, pp. 131-155, 2012.
- WEI L, LIU J, ZHANG L, "Survey of Privacy Preserving Oriented Set Intersection Computation," Journal of Computer Research and Development, vol. 59, no. 8, pp. 1782-1799, 2022.
- DWIVEDI A D, SINGH R, GHOSH U, "Privacy preserving authentication system based on non-interactive zero knowledge proof suitable for Internet of Things," Journal of Ambient Intelligence and Humanized Computing, vol. 13, no. 10, pp. 4639-4649, 2022.
- Nan CHENG, Yun-Lei ZHAO, "Efficient Approach Regarding Two-Party Privacy-Preserving Set Union/Intersection Cardinality," Journal of Cryptologic Research, vol. 8, no. 2, pp. 352-364, 2021.
- Li H, Gao Y, "Efficient Private Set Intersection Cardinality Protocol in the Reverse Unbalanced Setting," in Proc. of Information Security: 25th International Conference, pp. 20-39, 2022.
- Chandran N, Dasgupta N, Gupta D, "Efficient Linear Multiparty PSI and Extensions to Circuit/Quorum PSI," in Proc. of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 1182-1204, 2021.
- Davi Resende A C, de Freitas Aranha D, "Faster unbalanced Private Set Intersection in the semihonest setting," Journal of Cryptographic Engineering, vol. 11, pp. 21-38, 2021. https://doi.org/10.1007/s13389-020-00242-7