1. Introduction
Division property is a technique proposed by Todo [1] at EUROCRYPT 2015 for finding integral distinguishers, and it has since been applied to many ciphers. The division property precisely captures implicit properties that lie between the traditional ALL and BALANCED properties [2]. Using the division property, Todo achieved the first theoretical key-recovery attack on full MISTY1 [3]. The division property can also find integral distinguishers of generalized Feistel structures, and it yields integral distinguishers covering more rounds for LBlock and TWINE [4].
Although the division property finds integral distinguishers more effectively than other methods, it works at the word level. At FSE 2016, Todo and Morii [5] decomposed the word-based division property into bits, which they call the bit-based division property. It has two variants: the conventional bit-based division property (CBDP) and the bit-based division property using three subsets (BDPT). Under CBDP, the parity of \(\bigoplus_{\boldsymbol{x} \in \mathbb{X}} \pi_{\boldsymbol{u}}(\boldsymbol{x})\) is either 0 or unknown, whereas under BDPT it is 0, 1 or unknown. In other words, BDPT splits the unknown set of CBDP into an odd-parity set and an unknown set, so it can trace more information. Hence BDPT is more powerful than the two-subset CBDP for finding integral distinguishers. For SIMON-32 [6], CBDP finds a 13-round integral distinguisher, whereas BDPT finds one that covers one more round.
The results of Todo et al. show that the bit-based division property is quite effective for finding integral distinguishers. Unfortunately, as pointed out in [5], the complexity of evaluating the bit-based division property is upper-bounded by \(2^{n}\) for an n-bit primitive, so it cannot be applied to ciphers with large block sizes. At ASIACRYPT 2016, Xiang et al. [7] solved this problem with mixed-integer linear programming (MILP), which makes it possible to analyze primitives with block sizes larger than 32 bits. Later, Sun et al. [8] converted the distinguisher search into a SAT/SMT problem [9] and used automatic tools to solve it for ARX-based block ciphers.
However, no method had been given to convert BDPT into an automatic search model. In addition, when a cipher contains a Key-XOR operation, no prior work showed how an automatic search tool can add new vectors to \(\mathbb{K}\) and remove redundant vectors from \(\mathbb{L}\). Several subsequent works address this problem. Hu et al. [10] proposed the variant three-subset division property (VTDP), which does not remove vectors that appear an even number of times; as a result, VTDP is weaker than BDPT. At ASIACRYPT 2019, Wang et al. [11] introduced pruning techniques that can remove vectors appearing an even number of times, and they modelled BDPT with MILP.
Our contributions. In this paper, we search for integral distinguishers of SIMON-32/-48/-64/-96, SIMON(102)-32/-48/-64, SIMECK-32/-48/-64, LBlock, GIFT and Khudra, and obtain some results that are more accurate than previous ones. We translate the distinguisher search into a mathematical problem and solve it with an automatic tool. Our contributions are as follows:
Model for \(\mathbb{L}\) and S-box. To search for integral distinguishers with an automatic tool, we convert the propagation of BDPT into mathematical models. For \(\mathbb{K}\), the models are the same as those in [8]. For \(\mathbb{L}\), we build models for COPY, XOR and AND. We use the same method as [7,11] to obtain all division trails of an S-box, and then describe these trails in conjunctive normal form (CNF), the input format of the CryptoMiniSat solver.
We find the following new results.
1. Let SIMON-2n/SIMECK-2n denote the SIMON/SIMECK block cipher with block size 2n, where \(n \in \{16, 24, 32, 48\}\) for SIMON and \(n \in \{16, 24, 32\}\) for SIMECK. SIMON(102)-2n [12] is a variant of the SIMON-2n family that changes the rotation amounts from (1, 8, 2) to (1, 0, 2). For SIMON(102)-32/-48/-64, [10] points out that some bits are constant; in this paper, we determine whether these bits are odd or balanced.
2. For GIFT, with a data complexity of \(2^{61}\) chosen plaintexts, we find 4 more balanced bits than [13]; with a data complexity of \(2^{63}\) chosen plaintexts, we find 2 more balanced bits than before.
3. For LBlock, we obtain more 17-round integral distinguishers, i.e., for the same number of rounds as previous work.
5. For SIMON-2n and SIMECK-2n, the distinguishers that we find are the same as before. Our results and comparisons are shown in Table 1.
Table 1. Summary of integral distinguishers for several block ciphers
1 For SIMON(102), [10] points out that some bits are constant. In this paper, we determine whether these bits are odd or balanced.
2 For LBlock, we find some integral distinguishers that differ from previous ones.
3 The experiments were run on the following platforms:
[10]:Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz and 96 GB memory.
[11]: Intel Celeron CPU 1007U I5 4590 @1.5GHz, 6.00 RAM, 64-bit Windows system.
This paper: Intel(R) Core(TM) i7-8700 CPU @3.19 GHz, 64-bit Windows system.
Organization of the paper. The rest of the paper is organized as follows. Section 2 introduces notation and the division property. Section 3 presents CNF models of some basic primitives and a search algorithm. Section 4 shows applications of our models. Section 5 summarizes the results.
2. PRELIMINARIES
2.1 Notation
All notation used in this paper is introduced in this subsection. \(\mathbb{F}_{2}\) denotes the smallest finite field, whose two elements are 0 and 1. \(\mathbb{F}_{2}^{n}\) denotes the set of n-bit strings, and for \(a \in \mathbb{F}_{2}^{n}\), \(a[i]\) denotes the i-th bit of a. The Hamming weight \(w(a)\) of a is \(\sum_{i=0}^{n-1} a[i]\). For any \(\boldsymbol{a}=\left(a_{0}, a_{1}, \cdots, a_{m-1}\right) \in \mathbb{F}_{2}^{n_{0}} \times \cdots \times \mathbb{F}_{2}^{n_{m-1}}\), \(\boldsymbol{W}(\boldsymbol{a})=\left(w\left(a_{0}\right), \cdots, w\left(a_{m-1}\right)\right)\) denotes the vectorial Hamming weight of \(\boldsymbol{a}\). Let \(\boldsymbol{k}=\left(k_{0}, k_{1}, \cdots, k_{m-1}\right)\) and \(\boldsymbol{k}^{*}=\left(k_{0}^{*}, k_{1}^{*}, \cdots, k_{m-1}^{*}\right)\). If \(k_{i} \geq k_{i}^{*}\) holds for all \(i \in\{0,1, \cdots, m-1\}\), we write \(\boldsymbol{k} \succcurlyeq \boldsymbol{k}^{*}\); otherwise we write \(\boldsymbol{k} \not\succcurlyeq \boldsymbol{k}^{*}\). \(\mathbb{K}\) and \(\mathbb{L}\) denote sets of vectors \(\boldsymbol{k}\) and \(\boldsymbol{l}=\left(l_{0}, l_{1}, \cdots, l_{m-1}\right)\), respectively.
Bit product function [1]. \(\pi_{u}(x)\) denotes the bit product function from \(\mathbb{F}_{2}^{n}\) to \(\mathbb{F}_{2}\). For any \(u \in \mathbb{F}_{2}^{n}\) and \(x \in \mathbb{F}_{2}^{n}\), \(\pi_{u}(x)\) is defined as:
\(\pi_{u}(x)=\prod_{i=0}^{n-1} x[i]^{u[i]}\) (1)
When the value of \(u[i]\) equals zero, \(x[i]^{u[i]}\) equals one. Otherwise the result of \(x[i]^{u[i]}\) equals \(x[i]\).
For any \(\boldsymbol{u}=\left(u_{0}, \cdots, u_{m-1}\right) \in \mathbb{F}_{2}^{n_{0}} \times \cdots \times \mathbb{F}_{2}^{n_{m-1}}\) and \(\boldsymbol{x}=\left(x_{0}, \cdots, x_{m-1}\right) \in \mathbb{F}_{2}^{n_{0}} \times \cdots \times \mathbb{F}_{2}^{n_{m-1}}\), the bit product function \(\pi_{\boldsymbol{u}}(\boldsymbol{x})\) maps \(\mathbb{F}_{2}^{n_{0}} \times \cdots \times \mathbb{F}_{2}^{n_{m-1}}\) to \(\mathbb{F}_{2}\) and is defined as:
\(\pi_{\boldsymbol{u}}(\boldsymbol{x})=\prod_{i=0}^{m-1} \pi_{u_{i}}\left(x_{i}\right)\) (2)
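A minimal Python sketch of Eqs. (1) and (2), assuming bit vectors are represented as tuples of 0/1 integers indexed as in the text; the function names `bit_product` and `bit_product_multi` are illustrative only:

```python
def bit_product(u, x):
    """pi_u(x) from Eq. (1): product of x[i] over the positions where u[i] = 1.

    Positions with u[i] = 0 contribute x[i]^0 = 1, so they are skipped.
    """
    if len(u) != len(x):
        raise ValueError("u and x must have the same length")
    return int(all(x_i for u_i, x_i in zip(u, x) if u_i))


def bit_product_multi(us, xs):
    """pi_u(x) from Eq. (2) for tuples of words u = (u_0, ...), x = (x_0, ...)."""
    result = 1
    for u_i, x_i in zip(us, xs):
        result &= bit_product(u_i, x_i)
    return result


# u = (1, 0, 1) selects the monomial x[0] * x[2].
print(bit_product((1, 0, 1), (1, 0, 1)))  # 1
print(bit_product((1, 0, 1), (1, 1, 0)))  # 0
```

Note that \(\pi_{\boldsymbol{0}}(\boldsymbol{x}) = 1\) for every input, which is why the all-zero vector \(\boldsymbol{u}\) tests the parity of the multiset size.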
Subset \(\mathbb{S}_{k}^{n}\). For any integer \(k \in\{0,1,2, \cdots, n\}\), \(\mathbb{S}_{k}^{n}\) is the subset of \(\mathbb{F}_{2}^{n}\) consisting of all \(u \in \mathbb{F}_{2}^{n}\) with \(w(u) \geq k\):
\(\mathbb{S}_{k}^{n}=\left\{u \in \mathbb{F}_{2}^{n} \mid w(u) \geq k\right\}.\) (3)
2.2 Division property
Definition 1 (Division property [1]). Let \(\mathbb{X}\) be a multiset with values in \(\left(\mathbb{F}_{2}^{n}\right)^{m}\), and let \(\mathbb{K}\) be a set of m-dimensional vectors whose i-th elements take values between 0 and n. The multiset \(\mathbb{X}\) has the division property \(\mathcal{D}_{\mathbb{K}}^{n, m}\) if it fulfills the following conditions:
\(\bigoplus_{\boldsymbol{x} \in \mathbb{X}} \pi_{\boldsymbol{u}}(\boldsymbol{x})=\left\{\begin{array}{ll} \text { unknown } & \text { if there exists } \boldsymbol{k} \in \mathbb{K} \text { s.t. } \boldsymbol{u} \succcurlyeq \boldsymbol{k}, \\ 0 & \text { otherwise. } \end{array}\right.\) (4)
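If \(\boldsymbol{k}, \boldsymbol{k}^{\prime} \in \mathbb{K}\) with \(\boldsymbol{k} \neq \boldsymbol{k}^{\prime}\) satisfy \(\boldsymbol{k} \succcurlyeq \boldsymbol{k}^{\prime}\), then \(\boldsymbol{k}\) is redundant and should be removed from \(\mathbb{K}\): every \(\boldsymbol{u}\) with \(\boldsymbol{u} \succcurlyeq \boldsymbol{k}\) also satisfies \(\boldsymbol{u} \succcurlyeq \boldsymbol{k}^{\prime}\), so removing \(\boldsymbol{k}\) does not change (4).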
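The following Python sketch evaluates Eq. (4) and performs the redundancy removal, assuming vectors are tuples of integers; the function names are hypothetical:

```python
def succeq(u, k):
    """u ≽ k: every coordinate of u is at least the matching coordinate of k."""
    return all(u_i >= k_i for u_i, k_i in zip(u, k))


def cbdp_parity(K, u):
    """Predicted value of the XOR-sum of pi_u(x) over X under D_K (Eq. (4))."""
    return "unknown" if any(succeq(u, k) for k in K) else 0


def remove_redundant(K):
    """Drop each k that dominates a different vector k' of K (k ≽ k', k != k')."""
    K = set(K)
    return {k for k in K if not any(k != k2 and succeq(k, k2) for k2 in K)}


K = remove_redundant({(1, 0, 0), (1, 1, 0), (0, 1, 1)})
print(sorted(K))                  # [(0, 1, 1), (1, 0, 0)]
print(cbdp_parity(K, (1, 1, 0)))  # unknown
print(cbdp_parity(K, (0, 1, 0)))  # 0
```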
Note that when n = 1, the division property is called the bit-based division property.
Propagation rules
Rule 1 (COPY in CBDP). The COPY function is defined as \(\left(y_{1}, y_{2}\right)=(x, x)\), where \(x \in \mathbb{F}_{2}\). Let \(\mathbb{X}\) and \(\mathbb{Y}\) denote the input and output multisets. If \(\mathbb{X}\) has CBDP \(\mathcal{D}_{\{k\}}^{1}\), then \(\mathbb{Y}\) has CBDP \(\mathcal{D}_{\mathbb{K}^{\prime}}^{1,2}\), where
\(\mathbb{K}^{\prime}=\left\{\begin{array}{ll} \{(0,0)\} & \text { if } k=0, \\ \{(1,0),(0,1)\} & \text { if } k=1. \end{array}\right.\) (5)
Rule 2 (XOR in CBDP). The XOR function is defined as \(y=x_{1} \oplus x_{2}\), where \(\left(x_{1}, x_{2}\right) \in \mathbb{F}_{2} \times \mathbb{F}_{2}\). Let \(\mathbb{X}\) and \(\mathbb{Y}\) denote the input and output multisets. If \(\mathbb{X}\) has CBDP \(\mathcal{D}_{\{\boldsymbol{k}\}}^{1,2}\), then \(\mathbb{Y}\) has CBDP \(\mathcal{D}_{\mathbb{K}^{\prime}}^{1}\), where
\(\mathbb{K}^{\prime}=\left\{\begin{array}{ll} \{(0)\} & \text { if } \boldsymbol{k}=(0,0), \\ \{(1)\} & \text { if } \boldsymbol{k}=(0,1) \text { or } \boldsymbol{k}=(1,0), \\ \emptyset & \text { if } \boldsymbol{k}=(1,1). \end{array}\right.\) (6)
Rule 3 (AND in CBDP). The AND function is defined as \(y=x_{1} \wedge x_{2}\), where \(\left(x_{1}, x_{2}\right) \in \mathbb{F}_{2} \times \mathbb{F}_{2}\). Let \(\mathbb{X}\) and \(\mathbb{Y}\) denote the input and output multisets. If \(\mathbb{X}\) has CBDP \(\mathcal{D}_{\{\boldsymbol{k}\}}^{1,2}\), then \(\mathbb{Y}\) has CBDP \(\mathcal{D}_{\mathbb{K}^{\prime}}^{1}\), where
\(\mathbb{K}^{\prime}=\left\{\begin{array}{ll} \{(0)\} & \text { if } \boldsymbol{k}=(0,0), \\ \{(1)\} & \text { otherwise }. \end{array}\right.\) (7)
For more details of those rules, please refer to [1,5].
Definition 2 (Division trail [7]). Let f denote the round function of a block cipher. Assume that the input multiset \(\mathbb{X}\) has CBDP \(\mathcal{D}_{\{\boldsymbol{k}\}}^{1, n}\), and let \(\mathcal{D}_{\mathbb{K}_{i}}^{1, n}\) denote the CBDP of the output multiset after i rounds. We then have the propagation \(\{\boldsymbol{k}\} \stackrel{\text { def }}{=} \mathbb{K}_{0} \stackrel{f}{\rightarrow} \mathbb{K}_{1} \stackrel{f}{\rightarrow} \mathbb{K}_{2} \stackrel{f}{\rightarrow} \cdots \stackrel{f}{\rightarrow} \mathbb{K}_{r}\). Moreover, for any vector \(\boldsymbol{k}_{i} \in \mathbb{K}_{i}\) with \(i \geq 1\), there exists a vector \(\boldsymbol{k}_{i-1} \in \mathbb{K}_{i-1}\) that propagates to \(\boldsymbol{k}_{i}\). For \(\left(\boldsymbol{k}_{0}, \boldsymbol{k}_{1}, \cdots, \boldsymbol{k}_{r}\right) \in \mathbb{K}_{0} \times \mathbb{K}_{1} \times \cdots \times \mathbb{K}_{r}\), if \(\boldsymbol{k}_{i-1}\) propagates to \(\boldsymbol{k}_{i}\) for every \(i \in\{1,2, \cdots, r\}\), we call \(\boldsymbol{k}_{0} \rightarrow \boldsymbol{k}_{1} \rightarrow \cdots \rightarrow \boldsymbol{k}_{r}\) an r-round division trail.
2.3 SAT-aided CBDP
Evaluating the division property requires finding all division trails, which leads to high time and memory complexity. Automatic search is one of the most effective ways to address this problem. Sun et al. [8] converted the CBDP-based distinguisher search into a SAT/SMT problem and solved it with automatic tools. In this subsection, we review how to model the basic operations COPY, XOR and AND.
SAT Model 1 (COPY). The COPY function copies a bit x to \(y_1\) and \(y_2\). By Rule 1, the valid transitions \((x, y_1, y_2)\) are (0,0,0), (1,0,1) and (1,1,0). The following CNF clauses describe the division trails of COPY:
\(\left\{\begin{array}{l} \neg y_{1} \vee \neg y_{2}=1 \\ x \vee y_{1} \vee \neg y_{2}=1 \\ x \vee \neg y_{1} \vee y_{2}=1 \\ \neg x \vee y_{1} \vee y_{2}=1 \end{array}\right.\) (8)
SAT Model 2 (XOR). The XOR function computes \(y=x_{1} \oplus x_{2}\). By Rule 2, the valid transitions \((x_1, x_2, y)\) are (0,0,0), (0,1,1) and (1,0,1). The following CNF clauses describe the division trails of XOR:
\(\left\{\begin{array}{l} \neg x_{1} \vee \neg x_{2}=1 \\ x_{1} \vee x_{2} \vee \neg y=1 \\ x_{1} \vee \neg x_{2} \vee y=1 \\ \neg x_{1} \vee x_{2} \vee y=1 \end{array}\right.\) (9)
SAT Model 3 (AND). The AND function computes \(y=x_{1} \wedge x_{2}\). By Rule 3, the valid transitions \((x_1, x_2, y)\) are (0,0,0), (0,1,1), (1,0,1) and (1,1,1). The following CNF clauses describe the division trails of AND:
\(\left\{\begin{array}{l} \neg x_{2} \vee y=1 \\ x_{1} \vee x_{2} \vee \neg y=1 \\ \neg x_{1} \vee y=1 \end{array}\right.\) (10)
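The clause sets of Eqs. (8) to (10) can be checked mechanically by enumerating all eight assignments. The Python sketch below does this by brute force, with clauses written as DIMACS-style integer lists (a convention assumed here, not taken from the text). The middle clause of the AND model is taken as \(x_1 \vee x_2 \vee \neg y\), the form the transition list requires; the variant \(x_1 \vee x_2 \vee y\) would exclude the valid transition (0,0,0).

```python
from itertools import product


def solutions(clauses, nvars):
    """All 0/1 assignments satisfying every clause (brute force over 2^nvars points).

    A clause is a list of DIMACS-style literals: +i means variable i (1-based)
    equals 1, -i means it equals 0; a clause holds if any of its literals holds.
    """
    return {bits for bits in product((0, 1), repeat=nvars)
            if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)}


COPY_CBDP = [[-2, -3], [1, 2, -3], [1, -2, 3], [-1, 2, 3]]  # (x, y1, y2), Eq. (8)
XOR_CBDP = [[-1, -2], [1, 2, -3], [1, -2, 3], [-1, 2, 3]]   # (x1, x2, y), Eq. (9)
AND_CBDP = [[-2, 3], [1, 2, -3], [-1, 3]]                   # (x1, x2, y), Eq. (10)

print(sorted(solutions(COPY_CBDP, 3)))  # [(0, 0, 0), (1, 0, 1), (1, 1, 0)]
print(sorted(solutions(AND_CBDP, 3)))   # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
```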
Initial division property and stopping rules of CBDP. To search for integral distinguishers with an automatic tool, we need to set the initial division property and proper stopping rules.
Initial division property. Assume that \(\left(a_{0}^{0}, a_{1}^{0}, a_{2}^{0}, \cdots, a_{n-1}^{0}\right) \rightarrow \cdots \rightarrow\left(a_{0}^{r}, a_{1}^{r}, a_{2}^{r}, \cdots, a_{n-1}^{r}\right)\) is an r-round division trail, where n is the block size of the cipher. Let the initial division property be \(\mathcal{D}_{\boldsymbol{k}}^{1, n}\) with \(\boldsymbol{k}=\left(k_{0}, k_{1}, \cdots, k_{n-1}\right)\). Then we set
\(a_{i}^{0}=k_{i}, i=0,1,2,3, \cdots, n-1\) (11)
Stopping rules. To check the division property of the m-th \((0 \leq m \leq n-1)\) output bit, we add the following constraints on \(a_{i}^{r}\ (i=0,1,2, \cdots, n-1)\):
\(a_{i}^{r}=\left\{\begin{array}{ll} 1 & \text { if } i=m, \\ 0 & \text { otherwise }. \end{array}\right.\) (12)
After setting the initial division property and the stopping rule, if the automatic tool finds a solution, the division property of the m-th output bit is unknown; otherwise, the m-th output bit is balanced.
2.4 SAT-aided method of CBDP
To search for distinguishers efficiently, we convert the search into a mathematical model. First, we build a CNF set that describes the r-round division trails. Then, we set the initial division property and the stopping division property \(\mathcal{D}_{\boldsymbol{k}_{r}}^{1, n}\), where \(\boldsymbol{k}_{r}=\left(a_{0}^{r}=0, a_{1}^{r}=0, \cdots, a_{i}^{r}=1, \cdots, a_{n-1}^{r}=0\right)\). Let C denote the resulting CNF set, including the initial and stopping constraints. If C has a solution, the integral property of the i-th output bit of the r-round cipher is unknown; otherwise, the i-th bit is balanced. The following Algorithm shows the detailed process.
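As a rough, runnable illustration of this procedure (not the paper's Algorithm), the Python sketch below uses a brute-force satisfiability check in place of CryptoMiniSat and applies it to a hypothetical two-bit toy function \(y_0 = x_0\), \(y_1 = x_0 \wedge x_1\). The trail clauses are the COPY and AND models of this section, with the AND model written with the clause \(x_1 \vee x_2 \vee \neg y\); all helper names and the variable numbering are assumptions.

```python
from itertools import product


def satisfiable(clauses, nvars):
    """Brute-force SAT check over all 2^nvars assignments (toy sizes only).

    Clauses use DIMACS-style literals: +i means variable i (1-based) is 1,
    -i means it is 0.
    """
    for bits in product((0, 1), repeat=nvars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


def copy_cnf(x, y1, y2):
    """SAT Model 1 (COPY) clauses, Eq. (8)."""
    return [[-y1, -y2], [x, y1, -y2], [x, -y1, y2], [-x, y1, y2]]


def and_cnf(x1, x2, y):
    """SAT Model 3 (AND) clauses: not x2 or y, x1 or x2 or not y, not x1 or y."""
    return [[-x2, y], [x1, x2, -y], [-x1, y]]


# Hypothetical toy function: (y0, y1) = (x0, x0 AND x1).
# Variables: 1 = a0, 2 = a1 (inputs), 3 = b, 4 = c (copies of a0), 5 = d (AND output).
TRAIL = copy_cnf(1, 3, 4) + and_cnf(4, 2, 5)
INPUTS, OUTPUTS, NVARS = [1, 2], [3, 5], 5


def search(k):
    """Classify each output bit as 'balanced' or 'unknown' for initial vector k."""
    result = {}
    for m in range(len(OUTPUTS)):
        init = [[v] if k_i else [-v] for v, k_i in zip(INPUTS, k)]       # Eq. (11)
        stop = [[v] if i == m else [-v] for i, v in enumerate(OUTPUTS)]  # Eq. (12)
        sat = satisfiable(TRAIL + init + stop, NVARS)
        result[m] = "unknown" if sat else "balanced"
    return result


print(search((1, 1)))  # {0: 'balanced', 1: 'unknown'}
```

With both input bits active, the sketch reports \(y_0\) as balanced, which agrees with a direct check: \(x_0\) summed over all four inputs is 0. The sum of \(x_0 \wedge x_1\) is 1, which CBDP can only report as unknown.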
2.5 BDPT
Definition 3 (BDPT [5]). Let \(\mathbb{X}\) be a multiset with values in \(\left(\mathbb{F}_{2}\right)^{m}\), and let \(\boldsymbol{k}\) and \(\boldsymbol{l}\) denote m-dimensional vectors whose elements take 0 or 1. The multiset \(\mathbb{X}\) has the division property \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, m}\) if it fulfils the following conditions:
\(\bigoplus_{\boldsymbol{x} \in \mathbb{X}} \pi_{\boldsymbol{u}}(\boldsymbol{x})=\left\{\begin{array}{ll} \text { unknown } & \text { if there exists } \boldsymbol{k} \in \mathbb{K} \text { s.t. } \boldsymbol{u} \succcurlyeq \boldsymbol{k}, \\ 1 & \text { else if there is } \boldsymbol{l} \in \mathbb{L} \text { s.t. } \boldsymbol{u}=\boldsymbol{l}, \\ 0 & \text { otherwise. } \end{array}\right.\) (13)
Note that if there is a \(\boldsymbol{k} \in \mathbb{K}\) with \(\boldsymbol{u} \succcurlyeq \boldsymbol{k}\), then \(\bigoplus_{\boldsymbol{x} \in \mathbb{X}} \pi_{\boldsymbol{u}}(\boldsymbol{x})\) is unknown even if some \(\boldsymbol{l} \in \mathbb{L}\) satisfies \(\boldsymbol{l}=\boldsymbol{u}\).
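A minimal Python sketch of Eq. (13), assuming vectors are tuples of 0/1 integers and \(\mathbb{K}\), \(\mathbb{L}\) are Python sets; the function names are hypothetical. The order of the checks encodes the precedence of \(\mathbb{K}\) over \(\mathbb{L}\):

```python
def succeq(u, k):
    """u ≽ k coordinate-wise."""
    return all(u_i >= k_i for u_i, k_i in zip(u, k))


def bdpt_parity(K, L, u):
    """Predicted XOR-sum of pi_u(x) over X under D_{K,L} (Eq. (13))."""
    if any(succeq(u, k) for k in K):
        return "unknown"  # checked first: K overrides L
    if u in L:
        return 1
    return 0


K, L = {(1, 1)}, {(1, 0), (1, 1)}
print(bdpt_parity(K, L, (1, 1)))  # unknown, although (1, 1) is in L
print(bdpt_parity(K, L, (1, 0)))  # 1
print(bdpt_parity(K, L, (0, 1)))  # 0
```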
Propagation rules
Rule 4 (COPY in BDPT [5]). The COPY function F is defined as \(\boldsymbol{y}=F(\boldsymbol{x})\), where \(\boldsymbol{x}=\left(x_{1}, x_{2}, \cdots, x_{n}\right)\) and \(\boldsymbol{y}=\left(x_{1}, x_{1}, x_{2}, x_{3}, \cdots, x_{n}\right)\). Assume that the input multiset \(\mathbb{X}\) has BDPT \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) and the output multiset \(\mathbb{Y}\) has BDPT \(\mathcal{D}_{\mathbb{K}^{\prime}, \mathbb{L}^{\prime}}^{1, n+1}\). Then \(\mathbb{K}^{\prime}\) and \(\mathbb{L}^{\prime}\) are computed as:
\(\begin{array}{l} \mathbb{K}^{\prime} \leftarrow\left\{\begin{array}{ll} \left(0,0, k_{2}, \cdots, k_{n}\right) & \text { if } k_{1}=0, \\ \left(1,0, k_{2}, \cdots, k_{n}\right),\left(0,1, k_{2}, \cdots, k_{n}\right) & \text { if } k_{1}=1, \end{array}\right. \\ \mathbb{L}^{\prime} \leftarrow\left\{\begin{array}{ll} \left(0,0, l_{2}, \cdots, l_{n}\right) & \text { if } l_{1}=0, \\ \left(1,0, l_{2}, \cdots, l_{n}\right),\left(0,1, l_{2}, \cdots, l_{n}\right),\left(1,1, l_{2}, \cdots, l_{n}\right) & \text { if } l_{1}=1, \end{array}\right. \end{array}\) (14)
from all \(k \in \mathbb{K}\) and all \(l \in \mathbb{L}\), respectively.
Rule 5 (AND in BDPT [5]). The AND function F is defined as \(\boldsymbol{y}=F(\boldsymbol{x})\), where \(\boldsymbol{x}=\left(x_{1}, x_{2}, \cdots, x_{n}\right)\) and \(\boldsymbol{y}=\left(x_{1} \wedge x_{2}, x_{3}, \cdots, x_{n}\right)\). Assume that the input multiset \(\mathbb{X}\) has BDPT \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) and the output multiset \(\mathbb{Y}\) has BDPT \(\mathcal{D}_{\mathbb{K}^{\prime}, \mathbb{L}^{\prime}}^{1, n-1}\). Then \(\mathbb{K}^{\prime}\) is computed from all \(\boldsymbol{k} \in \mathbb{K}\) as:
\(\mathbb{K}^{\prime} \leftarrow\left(\left\lceil\frac{k_{1}+k_{2}}{2}\right\rceil, k_{3}, k_{4}, \cdots, k_{n}\right)\) (15)
Moreover, \(\mathbb{L}^{\prime}\) is computed from all \(l \in \mathbb{L}\) satisfying \(\left(l_{1}, l_{2}\right)=(0,0)\) or \((1,1)\) as:
\(\mathbb{L}^{\prime} \leftarrow\left(\left\lceil\frac{l_{1}+l_{2}}{2}\right\rceil, l_{3}, l_{4}, \cdots, l_{n}\right)\) (16)
Rule 6 (XOR in BDPT [5]). The XOR function F is defined as \(\boldsymbol{y}=F(\boldsymbol{x})\), where \(\boldsymbol{x}=\left(x_{1}, x_{2}, \cdots, x_{n}\right)\) and \(\boldsymbol{y}=\left(x_{1} \oplus x_{2}, x_{3}, \cdots, x_{n}\right)\). Assume that the input multiset \(\mathbb{X}\) has BDPT \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) and the output multiset \(\mathbb{Y}\) has BDPT \(\mathcal{D}_{\mathbb{K}^{\prime}, \mathbb{L}^{\prime}}^{1, n-1}\). Then \(\mathbb{K}^{\prime}\) is computed from all \(\boldsymbol{k} \in \mathbb{K}\) satisfying \(\left(k_{1}, k_{2}\right)=(0,0),(1,0)\) or \((0,1)\) as:
\(\mathbb{K}^{\prime} \leftarrow\left(k_{1}+k_{2}, k_{3}, k_{4}, \cdots, k_{n}\right)\) (17)
Moreover, \(\mathbb{L}^{\prime}\) is computed from all \(l \in \mathbb{L}\) satisfying \(\left(l_{1}, l_{2}\right)=(0,0),(1,0)\) or \((0,1)\) as:
\(\mathbb{L}^{\prime} \stackrel{x}{\leftarrow}\left(l_{1}+l_{2}, l_{3}, l_{4}, \cdots, l_{n}\right)\) (18)
where \(\mathbb{L} \stackrel{x}{\leftarrow} \boldsymbol{l}\) means
\(\mathbb{L} \leftarrow\left\{\begin{array}{ll} \mathbb{L} \cup\{\boldsymbol{l}\} & \text { if the original } \mathbb{L} \text { does not include } \boldsymbol{l}, \\ \mathbb{L} \backslash\{\boldsymbol{l}\} & \text { if the original } \mathbb{L} \text { includes } \boldsymbol{l}. \end{array}\right.\)
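A Python sketch of Rules 4 to 6 applied to \(\mathbb{L}\), assuming vectors are tuples and each operation acts on the first two coordinates as in the rules; the function names are hypothetical. Only the XOR rule uses the toggling update: for COPY and AND, distinct input vectors always give distinct outputs, so plain insertion and toggling coincide, whereas under XOR both \((1,0,\cdots)\) and \((0,1,\cdots)\) map to \((1,\cdots)\) and cancel.

```python
def copy_L(L):
    """Rule 4 for L: y = (x1, x1, x2, ..., xn), Eq. (14)."""
    out = set()
    for l in L:
        rest = l[1:]
        if l[0] == 0:
            out.add((0, 0) + rest)
        else:
            out.update({(1, 0) + rest, (0, 1) + rest, (1, 1) + rest})
    return out


def and_L(L):
    """Rule 5 for L: y = (x1 AND x2, x3, ...); only (l1, l2) = (0, 0) or (1, 1) survive.

    For these pairs ceil((l1 + l2) / 2) equals l1, Eq. (16).
    """
    return {(l[0],) + l[2:] for l in L if l[0] == l[1]}


def xor_L(L):
    """Rule 6 for L: y = (x1 XOR x2, x3, ...) with the toggling update of Eq. (18)."""
    out = set()
    for l in L:
        if (l[0], l[1]) != (1, 1):
            out ^= {(l[0] + l[1],) + l[2:]}  # add if absent, remove if present
    return out


print(sorted(xor_L({(1, 0, 1), (0, 1, 1)})))  # [] : both give (1, 1) and cancel
print(sorted(xor_L({(1, 0, 1), (0, 0, 1)})))  # [(0, 1), (1, 1)]
```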
Rule 7 (XOR with secret round key [5]). Let the input and output multisets \(\mathbb{X}\) and \(\mathbb{Y}\) have BDPT \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) and \(\mathcal{D}_{\mathbb{K}^{\prime}, \mathbb{L}^{\prime}}^{1, n}\), respectively. Assume that a round-key bit is XORed with the i-th bit \((0 \leq i \leq n-1)\). Then \(\mathbb{L}^{\prime}=\mathbb{L}\), \(\mathbb{K}^{\prime}\) contains every \(\boldsymbol{k} \in \mathbb{K}\), and in addition
\(\mathbb{K}^{\prime} \leftarrow\left(l_{0}, l_{1}, \cdots, l_{i} \vee 1, \cdots, l_{n-1}\right)\) (19)
for all \(\boldsymbol{l} \in \mathbb{L}\) satisfying \(l_{i}=0\).
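A Python sketch of Rule 7, assuming (since key addition cannot make an unknown XOR-sum known) that \(\mathbb{K}'\) retains every vector of \(\mathbb{K}\); the function name is hypothetical and redundant vectors are not removed here:

```python
def key_xor(K, L, i):
    """Rule 7: a secret key bit is XORed into state bit i.

    Assumes K' keeps every k in K; each l in L with l[i] = 0 additionally
    contributes l with bit i set to 1 (Eq. (19)); L' = L.
    """
    K_new = set(K)
    for l in L:
        if l[i] == 0:
            K_new.add(l[:i] + (1,) + l[i + 1:])
    return K_new, set(L)


K2, L2 = key_xor({(1, 1, 0)}, {(0, 1, 0), (1, 0, 0), (0, 0, 1)}, 0)
print(sorted(K2))  # [(1, 0, 1), (1, 1, 0)]
```

Here (0,1,0) contributes (1,1,0), which is already in \(\mathbb{K}\), while (1,0,0) contributes nothing because its bit 0 is already 1.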
Rule 8 (S-box for \(\mathbb{K}\) and \(\mathbb{L}\)). Let \(S: \mathbb{F}_{2}^{m} \rightarrow \mathbb{F}_{2}^{n}\) be a substitution function with input \(\boldsymbol{x}=\left(x_{1}, x_{2}, x_{3}, \cdots, x_{m}\right)\) and output \(\boldsymbol{y}=\left(y_{1}, y_{2}, y_{3}, \cdots, y_{n}\right)\). Then \(\boldsymbol{y}\) is calculated as:
\(\begin{array}{cc} y_{1}= & f_{1}\left(x_{1}, x_{2}, x_{3}, \cdots, x_{m}\right), \\ \vdots \\ y_{n}= & f_{n}\left(x_{1}, x_{2}, x_{3}, \cdots, x_{m}\right). \end{array}\)
Assume that the input multiset \(\mathbb{X}\) has BDPT \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, m}\) and the output multiset \(\mathbb{Y}\) has BDPT \(\mathcal{D}_{\mathbb{K}^{\prime}, \mathbb{L}^{\prime}}^{1, n}\). For each vector \(\boldsymbol{k} \in \mathbb{K}\) and each vector \(\boldsymbol{v} \in \mathbb{F}_{2}^{n}\), decide whether the polynomial \(\pi_{\boldsymbol{v}}(\boldsymbol{y})\) contains any monomial \(\pi_{\boldsymbol{k}^{\prime}}(\boldsymbol{x})\) with \(\boldsymbol{k}^{\prime} \succcurlyeq \boldsymbol{k}\). If so, \((\boldsymbol{k}, \boldsymbol{v})\) is a division trail of the S-box and \(\boldsymbol{v}\) is appended to \(\mathbb{K}^{\prime}\). For each vector \(\boldsymbol{l} \in \mathbb{L}\) and each vector \(\boldsymbol{v} \in \mathbb{F}_{2}^{n}\), decide whether \(\pi_{\boldsymbol{v}}(\boldsymbol{y})\) contains the monomial \(\pi_{\boldsymbol{l}}(\boldsymbol{x})\). If so, \((\boldsymbol{l}, \boldsymbol{v})\) is a division trail of the S-box and \(\boldsymbol{v}\) is appended to \(\mathbb{L}^{\prime}\). For more details on S-boxes, please refer to [7,11].
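A direct, unoptimized Python reading of Rule 8: the algebraic normal form (ANF) of each output coordinate is obtained with the Möbius transform, \(\pi_{\boldsymbol{v}}(\boldsymbol{y})\) is expanded as a polynomial in \(\boldsymbol{x}\), and its monomials are inspected. Masks are integers whose bit i stands for the (i+1)-th variable; the lookup-table convention, the toy S-box and the function names are assumptions for illustration.

```python
def anf(truth):
    """Moebius transform: truth table (index x, bit i = x_i) -> set of monomial masks."""
    coeffs = list(truth)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        for x in range(len(coeffs)):
            if x >> i & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return {u for u, c in enumerate(coeffs) if c}


def mul(p, q):
    """Product of two polynomials over F_2 with x_i^2 = x_i (monomials are OR-ed masks)."""
    out = set()
    for a in p:
        for b in q:
            out ^= {a | b}  # coefficients are taken mod 2
    return out


def pi_v_of_y(sbox, m, n, v):
    """Monomials (in x) of pi_v(y) for an m-bit to n-bit S-box given as a lookup table."""
    poly = {0}  # the constant polynomial 1
    for j in range(n):
        if v >> j & 1:
            coord = anf([sbox[x] >> j & 1 for x in range(1 << m)])
            poly = mul(poly, coord)
    return poly


def k_trails(sbox, m, n, k):
    """Rule 8 for K: all v such that pi_v(y) has a monomial u with u ≽ k."""
    return {v for v in range(1 << n)
            if any((u & k) == k for u in pi_v_of_y(sbox, m, n, v))}


def l_trails(sbox, m, n, l):
    """Rule 8 for L: all v such that pi_v(y) contains the monomial pi_l(x)."""
    return {v for v in range(1 << n) if l in pi_v_of_y(sbox, m, n, v)}


# Hypothetical 2-bit S-box with y_1 = x_1 AND x_2, y_2 = x_1 (bit 0 = first variable).
TOY = [0, 2, 0, 3]
print(sorted(k_trails(TOY, 2, 2, 0b10)))  # [1, 3]
print(sorted(l_trails(TOY, 2, 2, 0b01)))  # [2]
```

For an identity S-box the sketch reduces to the expected behaviour: \(\boldsymbol{k}\) reaches every \(\boldsymbol{v} \succcurlyeq \boldsymbol{k}\), and \(\boldsymbol{l}\) reaches only \(\boldsymbol{v}=\boldsymbol{l}\).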
2.6 Pruning techniques
A round function with many operations generates many redundant division properties. As the number of rounds or the block size grows, it becomes infeasible to obtain all division trails. To overcome this problem, we adopt the pruning techniques proposed by Wang et al. [11].
Theorem 1 (Pruning technique for \(\mathbb{K}\) [11]). For an r-round cipher \(E\), let \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) be the input division property of \(E\). If \(\boldsymbol{k} \in \mathbb{K}\) cannot eventually reach the unit vector \(\boldsymbol{e}\) at the output of \(E\) based on CBDP, then \(\mathcal{D}_{\mathbb{K} \rightarrow \boldsymbol{k}, \mathbb{L}}^{1, n}\) will not generate the vector \(\boldsymbol{e} \in \mathbb{K}_{r+1,0}\) based on BDPT, where \(\mathbb{K} \rightarrow \boldsymbol{k}\) denotes removing \(\boldsymbol{k}\) from \(\mathbb{K}\).
Theorem 2 (Pruning technique for \(\mathbb{L}\) [11]). For an r-round cipher \(E\), let \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) be the input division property of \(E\). If \(\boldsymbol{l} \in \mathbb{L}\) cannot eventually reach the unit vector \(\boldsymbol{e}\) at the output of \(E\) based on CBDP, then \(\mathcal{D}_{\mathbb{K}, \mathbb{L} \rightarrow \boldsymbol{l}}^{1, n}\) will not generate the vector \(\boldsymbol{e} \in \mathbb{K}_{r+1,0}\) based on BDPT, where \(\mathbb{L} \rightarrow \boldsymbol{l}\) denotes removing \(\boldsymbol{l}\) from \(\mathbb{L}\).
BDPT is more powerful than CBDP for finding integral properties, but searching for integral distinguishers with BDPT is infeasible for some ciphers with large block sizes. Wang et al. [11] resolved this problem with “fast propagation”.
Definition 4 (Fast propagation [11]). Let \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) be the input division property of cipher E. We translate the BDPT into the CBDP \(\mathcal{D}_{\overline{\mathbb{K}}}^{1, n}\), where \(\overline{\mathbb{K}}=\mathbb{K} \cup \mathbb{L}\), and compute the output division property of E from all vectors of \(\overline{\mathbb{K}}\) based on CBDP.
3. Modeling for the BDPT
Because of the high time complexity, it is infeasible to search for integral properties of some ciphers with large block sizes. To overcome this limitation, researchers translate the distinguisher search into a mathematical problem and solve it with automatic tools such as SAT, MILP and CP solvers.
3.1 SAT-aided BDPT
To use automatic tools efficiently, we model the primitives of the cipher with CNF clauses.
Assume that f is the round function of the block cipher, and let \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) and \(\mathcal{D}_{\mathbb{K}^{\prime}, \mathbb{L}^{\prime}}^{1, n}\) be the BDPT of the input and output multisets, respectively. Based on the propagation rules of BDPT, we use CNF clauses to constrain the division trails \(\mathbb{K} \stackrel{f}{\rightarrow} \mathbb{K}^{\prime}\) and \(\mathbb{L} \stackrel{f}{\rightarrow} \mathbb{L}^{\prime}\). The models for \(\mathbb{K}\) are the same as in [8], so in this subsection we only give the models for \(\mathbb{L}\).
SAT Model 4 (COPY for \(\mathbb{L}\)). Let f be a COPY function, and let \(x \stackrel{f}{\rightarrow}\left(y_{1}, y_{2}\right)\) denote a COPY operation with \(y_{1}=x\) and \(y_{2}=x\). Assume that the input multiset has \(\mathcal{D}_{l}^{1}\) with \(l \in \{0, 1\}\). By Rule 4, when \(l=1\) the output multiset has \(\mathcal{D}_{(0,1),(1,0),(1,1)}^{1,2}\), and when \(l=0\) it has \(\mathcal{D}_{(0,0)}^{1,2}\). Therefore, the following CNF clauses describe COPY for \(\mathbb{L}\):
\(\left\{\begin{array}{l} x \vee \neg y_{1}=1 \\ x \vee y_{1} \vee \neg y_{2}=1 \\ \neg x \vee y_{1} \vee y_{2}=1 \end{array}\right.\) (20)
The solutions of the above CNF clauses in \(\left(x, y_{1}, y_{2}\right)\) are exactly \((0,0,0)\), \((1,0,1)\), \((1,1,0)\) and \((1,1,1)\).
SAT Model 5 (AND for \(\mathbb{L}\)). Let f be an AND function, and let \(\left(x_{1}, x_{2}\right) \stackrel{f}{\rightarrow} y\) denote the AND operation \(y=x_{1} \wedge x_{2}\). Assume that the input multiset has \(\mathcal{D}_{\boldsymbol{l}}^{1,2}\) with \(\boldsymbol{l}=\left(l_{1}, l_{2}\right)\). By Rule 5, only \(\left(l_{1}, l_{2}\right)=(0,0)\) and \((1,1)\) propagate, and the output multiset has \(\mathcal{D}_{\left\lceil\frac{l_{1}+l_{2}}{2}\right\rceil}^{1}\): it has \(\mathcal{D}_{1}^{1}\) when \(l_{1}=l_{2}=1\) and \(\mathcal{D}_{0}^{1}\) when \(l_{1}=l_{2}=0\). As a result, the following CNF clauses describe AND for \(\mathbb{L}\):
\(\left\{\begin{array}{l} \neg x_{2} \vee y=1 \\ \neg x_{1} \vee x_{2}=1 \\ x_{1} \vee \neg y=1 \end{array}\right.\) (21)
The solutions of the above CNF clauses in \(\left(x_{1}, x_{2}, y\right)\) are exactly \((0,0,0)\) and \((1,1,1)\).
SAT Model 6 (XOR for \(\mathbb{L}\)). Let f be an XOR function, and let \(\left(x_{1}, x_{2}\right) \stackrel{f}{\rightarrow} y\) denote the XOR operation \(y=x_{1} \oplus x_{2}\). Assume that the input multiset has \(\mathcal{D}_{\boldsymbol{l}}^{1,2}\) with \(\boldsymbol{l}=\left(l_{1}, l_{2}\right)\). By Rule 6, only \(\left(l_{1}, l_{2}\right) \in \{(0,0),(0,1),(1,0)\}\) propagate, and the output multiset has \(\mathcal{D}_{l_{1}+l_{2}}^{1}\): it has \(\mathcal{D}_{0}^{1}\) when \(l_{1}=l_{2}=0\) and \(\mathcal{D}_{1}^{1}\) when exactly one of \(l_{1}, l_{2}\) is 1. As a result, the following CNF clauses describe XOR for \(\mathbb{L}\):
\(\left\{\begin{array}{l} \neg x_{1} \vee \neg x_{2}=1 \\ x_{1} \vee x_{2} \vee \neg y=1 \\ x_{1} \vee \neg x_{2} \vee y=1 \\ \neg x_{1} \vee x_{2} \vee y=1 \end{array}\right.\) (22)
The solutions of the above CNF clauses in \(\left(x_{1}, x_{2}, y\right)\) are exactly \((0,0,0)\), \((0,1,1)\) and \((1,0,1)\). Although Model 6 captures all division trails of XOR, it does not cancel vectors that are reached an even number of times, as the toggling update \(\stackrel{x}{\leftarrow}\) of Rule 6 requires, so these vectors must be eliminated after all division trails have been obtained.
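A small Python check, assuming brute-force enumeration in place of a SAT solver and DIMACS-style clause lists (all names are hypothetical). It confirms that the clause sets of Eqs. (20) to (22) have exactly the solutions listed above, and shows one way to perform the elimination afterwards: count how many enumerated trails reach each output vector and keep only those with an odd count.

```python
from collections import Counter
from itertools import product


def solutions(clauses, nvars):
    """All 0/1 assignments satisfying every DIMACS-style clause (brute force)."""
    return {bits for bits in product((0, 1), repeat=nvars)
            if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)}


COPY_L = [[1, -2], [1, 2, -3], [-1, 2, 3]]              # (x, y1, y2), Eq. (20)
AND_L = [[-2, 3], [-1, 2], [1, -3]]                     # (x1, x2, y), Eq. (21)
XOR_L = [[-1, -2], [1, 2, -3], [1, -2, 3], [-1, 2, 3]]  # (x1, x2, y), Eq. (22)


def xor_trails(L_in):
    """Every (input, output) division trail of XOR for L, one per SAT solution."""
    return [((x1, x2), (y,)) for x1, x2, y in solutions(XOR_L, 3) if (x1, x2) in L_in]


def odd_outputs(trails):
    """Keep only output vectors reached by an odd number of trails."""
    counts = Counter(out for _, out in trails)
    return {out for out, c in counts.items() if c % 2 == 1}


print(odd_outputs(xor_trails({(1, 0), (0, 1)})))  # set(): (1,) is reached twice
```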
Representing the division trails of an S-box as CNF. By Rule 8, it is easy to obtain all division trails through an S-box. Let s be a substitution function consisting of an S-box with n-bit input and m-bit output, and let its input and output multisets satisfy \(\mathcal{D}_{\mathbb{K}, \mathbb{L}}^{1, n}\) and \(\mathcal{D}_{\mathbb{K}^{\prime}, \mathbb{L}^{\prime}}^{1, m}\), respectively. We allocate variables for the input and output bits of s and build a truth table over these variables, which has \(2^{n+m}\) entries. An entry is set to true if it matches one of the division trails \(\boldsymbol{k} \longrightarrow \boldsymbol{k}^{\prime}\) with \(\boldsymbol{k} \in \mathbb{K}\) and \(\boldsymbol{k}^{\prime} \in \mathbb{K}^{\prime}\). From this truth table, CNF clauses describing the division trails \(\mathbb{K} \longrightarrow \mathbb{K}^{\prime}\) are easy to derive; the division trails \(\mathbb{L} \longrightarrow \mathbb{L}^{\prime}\) are handled in the same way.
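The simplest way to turn such a truth table into CNF is to emit one clause per false entry, namely the clause that is falsified only by that entry. The Python sketch below does this for a hypothetical trail table of a 2-bit S-box (bit i of a mask is the i-th variable) and checks by enumeration that the clauses admit exactly the valid trails. This naive encoding can have up to \(2^{n+m}\) clauses; a Boolean minimization tool can usually make it much shorter.

```python
from itertools import product


def cnf_from_points(valid, nbits):
    """Naive CNF: one clause for every assignment that is NOT a valid trail.

    The clause for point p is the OR over i of (z_i != p_i), so it is false
    exactly at p and true everywhere else.
    """
    clauses = []
    for p in product((0, 1), repeat=nbits):
        if p not in valid:
            clauses.append([i + 1 if b == 0 else -(i + 1) for i, b in enumerate(p)])
    return clauses


def solutions(clauses, nvars):
    """Brute-force model enumeration, used here only to check the encoding."""
    return {p for p in product((0, 1), repeat=nvars)
            if all(any(p[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)}


# Hypothetical trail table of a 2-bit S-box: input mask k -> reachable output masks v.
TRAILS = {0: {0, 1, 2, 3}, 1: {1, 2, 3}, 2: {1, 3}, 3: {1, 3}}
points = {tuple(k >> i & 1 for i in range(2)) + tuple(v >> j & 1 for j in range(2))
          for k, vs in TRAILS.items() for v in vs}

cnf = cnf_from_points(points, 4)
print(len(points), len(cnf))  # 11 5
```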
So far, we have modelled the propagation rules of \(\mathbb{L}\) and the division trails of S-boxes. Thus, for block ciphers built from these simple primitives, we can construct CNF clauses that simulate one round of division property propagation.
3.2 Initial division property
After converting the distinguisher search into a SAT problem, we solve it with an automatic tool.
Initial division property. To find the integral distinguisher covering the maximum number of rounds, the initial division property should contain as many active bits and as few constant bits as possible. For a cipher with block size n, we set the initial division property as follows:
\(\left\{\begin{array}{l} \mathbb{K}=\{1\} \text { for } i \in\{0,1,2, \cdots, n-1\} \\ \mathbb{L}=\left\{l_{i}=1 \text { and } l_{j}=0(j \neq i)\right\} \text { for each } j \in\{0, \cdots, n-1\} \end{array}\right..\) (23)
3.3 Stopping rules
Theorem 3 (Set without integral property [7]). Let \(\mathcal{D}_{\mathbb{K}}^{n_{0}, n_{1}, \cdots, n_{m-1}}\) denote the CBDP of multiset \(\mathbb{X}\). If \(\mathbb{K}\) contains all n unit vectors, then the multiset \(\mathbb{X}\) doesn’t have integral property.
According to Theorem 3, if \(\mathbb{K}\) contains all n unit vectors, the multiset \(\mathbb{X}\) has no useful integral property. Stopping Rule 1 is applied to \(\mathbb{K}\).
Stopping Rule 1 [11]. For the cipher E, if \(\boldsymbol{k} \in \mathbb{K}\) can generate the output unit vector \(e_{m}\) based on CBDP, then, by the definition of CBDP, the integral property of the m-th output bit of E is unknown.
Stopping Rule 2 [11]. For the cipher E, if \(\boldsymbol{l} \in \mathbb{L}\) can generate the output unit vector \(e_{m}\) based on CBDP, then \(\boldsymbol{l}\) is used as the input division property of the next part. If no vector of \(\mathbb{L}\) can generate \(e_{m}\), we conclude that the integral property of the m-th bit of E is balanced.
Stopping Rule 3 [11]. If \(\mathbb{K}_{r+1,0}=\emptyset\) and \(\mathbb{L}_{r+1,0} \neq \emptyset\), there is a new integral distinguisher whose XOR-sum is odd.
Stopping Rule 1 identifies bits whose integral property is unknown, Stopping Rule 2 identifies balanced bits, and Stopping Rule 3 finds new integral distinguishers whose XOR-sum is odd.
3.4 Optimized cipher structure
For many ciphers, the round function is wider than 32 bits. Obtaining all division trails of one round based on BDPT may then cost more than \(2^{32}\) operations, so we divide the cipher into small parts to reduce the computational complexity. Let \(Q_{i}\) denote the i-th round function of a cipher \(E=Q_{r} \cdot Q_{r-1} \cdots Q_{1}\). We divide \(Q_{i}\) into \(l_{i}\) parts \(Q_{i}=Q_{i, l_{i}-1} \cdot Q_{i, l_{i}-2} \cdots Q_{i, 0}\). Therefore, we have
\(E=\prod_{i=1}^{r} \prod_{j=0}^{l_{i-1}} Q_{i, j}\) (24)
Let \(E_{i, j}=\left(Q_{i, j-1} \cdot Q_{i, j-2} \cdots \cdots Q_{i, 0}\right) \cdot\left(Q_{i-1} \cdot Q_{i-2} \cdots \cdots Q_{1}\right)\) and \(\overline{E_{i, j}}=\left(Q_{r} \cdot Q_{r-1} \cdots \cdots Q_{i+1}\right) \cdot\left(Q_{i, l_{i}-1} \cdot Q_{i, l_{i}-2} \cdots \cdots Q_{i, j}\right)\). Then, the cipher \(E\) is denoted as \(E=\overline{E_{i, j}} \cdot E_{i, j}\). For a cipher \(E\), we only search all the division trails of \(Q_{i, j}\). In [11], the authors give a more detailed explanation.
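To make the index convention concrete, the following Python sketch (our own illustration; the helper names are hypothetical and toy integer functions stand in for the parts \(Q_{i,j}\)) flattens the parts round by round with \(Q_{1,0}\) applied first, and splits the sequence at \(Q_{i,j}\): \(E_{i,j}\) consists of all parts strictly before \(Q_{i,j}\), and \(\overline{E_{i,j}}\) consists of \(Q_{i,j}\) and everything after it.

```python
from functools import reduce


def compose(parts):
    """Return x -> parts[-1](... parts[1](parts[0](x)) ...); parts[0] is applied first."""
    return lambda x: reduce(lambda acc, q: q(acc), parts, x)


def flat_index(i, j, parts_per_round):
    """Position of Q_{i,j} in the flattened list (rounds 1-indexed, parts 0-indexed)."""
    return sum(parts_per_round[: i - 1]) + j


def split_at(parts, i, j, parts_per_round):
    """Return (E_{i,j}, Ebar_{i,j}) such that E = Ebar_{i,j} o E_{i,j}."""
    p = flat_index(i, j, parts_per_round)
    return compose(parts[:p]), compose(parts[p:])


# Toy example (hypothetical): round 1 has l_1 = 2 parts, round 2 has l_2 = 3 parts.
parts_per_round = [2, 3]
parts = [lambda x, c=c: (3 * x + c) % 101 for c in range(sum(parts_per_round))]
E = compose(parts)
for i, j in [(1, 0), (1, 1), (2, 0), (2, 2)]:
    head, tail = split_at(parts, i, j, parts_per_round)
    # Splitting anywhere must not change the overall cipher.
    assert all(tail(head(x)) == E(x) for x in range(50))
```

With \((i, j) = (1, 0)\), \(E_{i,j}\) is the empty composition (the identity) and \(\overline{E_{i,j}}\) is the whole cipher.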
Example: For SIMON-32, the round function is:
\(\left(L_{i}, R_{i}\right)=\left(\left(L_{i-1}^{\lll 1} \wedge L_{i-1}^{\lll 8}\right) \oplus L_{i-1}^{\lll 2} \oplus R_{i-1} \oplus k_{i}, L_{i-1}\right)\) (25)
Assume that the input is \(L_{1}=\left(x_{0}, x_{1}, \cdots, x_{15}\right)\) and \(R_{1}=\left(x_{16}, x_{17}, \cdots, x_{31}\right)\). Applying the optimized cipher structure to SIMON-32 gives the following parts:
\(\begin{array}{c} Q_{1,0}=\left(x_{1} \wedge x_{8} \oplus x_{2} \oplus x_{16}\right), \\ Q_{1,1}=\left(x_{2} \wedge x_{9} \oplus x_{3} \oplus x_{17}\right), \\ \vdots \\ Q_{1,15}=\left(x_{0} \wedge x_{7} \oplus x_{1} \oplus x_{31}\right), \\ Q_{1,16}=\left(R_{1} \oplus k_{1}, L_{1}\right) . \end{array}\)
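Each part \(Q_{1,j}\) with \(0 \leq j \leq 15\) involves only 4 bits, so the computational complexity of enumerating the division trails of one step drops from \(2^{32}\) to \(2^{4}\).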
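As a sanity check of this decomposition, the Python sketch below evaluates one SIMON-32 round both directly from Eq. (25) and as the sequence \(Q_{1,0}, \ldots, Q_{1,15}, Q_{1,16}\). It assumes that \(x_{0}\) is the most significant bit of \(L_{1}\), which matches the indices in the expressions for \(Q_{1,0}\) and \(Q_{1,15}\) above; the function names are ours.

```python
import random

MASK16 = 0xFFFF


def rotl16(v, s):
    """Rotate a 16-bit word left by s positions (0 < s < 16)."""
    return ((v << s) | (v >> (16 - s))) & MASK16


def simon32_round(left, right, key):
    """One SIMON-32 round as in Eq. (25)."""
    f = (rotl16(left, 1) & rotl16(left, 8)) ^ rotl16(left, 2)
    return f ^ right ^ key, left


def to_bits(left, right):
    """State as [x_0, ..., x_31], with x_0 the most significant bit of L (assumption)."""
    word = (left << 16) | right
    return [(word >> (31 - t)) & 1 for t in range(32)]


def from_bits(x):
    """Inverse of to_bits: return (L, R)."""
    word = 0
    for bit in x:
        word = (word << 1) | bit
    return word >> 16, word & MASK16


def simon32_round_by_parts(left, right, key):
    """The same round evaluated as Q_{1,0}, ..., Q_{1,15}, then Q_{1,16}."""
    x = to_bits(left, right)
    for j in range(16):
        # Q_{1,j} reads x_{j+1}, x_{j+8}, x_{j+2} and updates x_{16+j}: only 4 bits.
        x[16 + j] ^= (x[(j + 1) % 16] & x[(j + 8) % 16]) ^ x[(j + 2) % 16]
    left_out, right_out = from_bits(x)
    # Q_{1,16}: XOR the round key into the right half, then swap the halves.
    return right_out ^ key, left_out


rng = random.Random(2024)
for _ in range(1000):
    l, r, k = rng.getrandbits(16), rng.getrandbits(16), rng.getrandbits(16)
    assert simon32_round(l, r, k) == simon32_round_by_parts(l, r, k)
```

Because each \(Q_{1,j}\) reads three bits of \(L_{1}\) and writes one bit of \(R_{1}\), enumerating its division trails only involves \(2^{4}\) input patterns.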
3.5 The method of automatic search
We model the cipher according to the “fast propagation” strategy. For an r-round cipher \(E=\overline{E_{i, j}} \cdot E_{i, j}\), the first round of \(\overline{E_{i, j}}\) may be a partial round \(\left(Q_{i, l_{i}-1} \cdot Q_{i, l_{i}-2} \cdots Q_{i, j}\right)\). If the input multiset \(\mathbb{X}\) of \(\overline{E_{i, j}}\) has the BDPT \(\mathcal{D}_{\mathbb{K}_{i, j}, \mathbb{L}_{i, j}}^{1^{n}}\), then \(\mathbb{X}\) also has the CBDP \(\mathcal{D}_{\mathbb{K}_{i, j} \cup \mathbb{L}_{i, j}}^{1^{n}}\). Hence we can use the propagation rules of CBDP to check whether the vectors of \(\mathbb{L}_{i, j}\) can generate a unit vector. Vectors of \(\mathbb{L}_{i, j}\) that cannot generate a unit vector are eliminated by the pruning techniques. For the vectors of \(\mathbb{L}_{i, j}\) that can generate a unit vector, we compute all division properties \(\mathbb{L}_{i, j+1}\) at the output of \(Q_{i, j}\) using the propagation rules of BDPT. By the rule for the Key-XOR operation, some vectors of \(\mathbb{L}_{i, j+1}\) give rise to new vectors in \(\mathbb{K}_{i, j+1}\). Then \(\mathcal{D}_{\mathbb{K}_{i, j+1}, \mathbb{L}_{i, j+1}}^{1^{n}}\) is used as the input division property of \(\overline{E_{i, j+1}}\). These steps are repeated until the results are obtained. Fig. 1 illustrates the structure of “fast propagation”.
Fig. 1. The structure of “fast propagation”
3.6 Automatic search algorithm
We use the C++ interface of CryptoMiniSat to solve the SAT problems. CryptoMiniSat is a SAT solver that accepts "assumptions" (literals fixed for a single solver call), so the initial division property and the output division property can be set for each query. Algorithm 2 gives the search procedure in pseudocode.
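The role of assumptions can be illustrated with a toy example. The CNF clauses below are one common encoding of the CBDP rules for COPY, XOR and AND on single bits; the exhaustive `solutions` function is only a brute-force stand-in for a SAT call with assumptions (it is not the CryptoMiniSat API), which is feasible here because each model has just three variables.

```python
from itertools import product

# Variables are numbered from 1; literal +v means v = 1 and -v means v = 0 (DIMACS style).
# CBDP XOR (a=1, b=2 -> c=3): c = a + b, so a = b = 1 is forbidden.
XOR_CNF = [[-1, -2], [-1, 3], [-2, 3], [1, 2, -3]]
# CBDP COPY (a=1 -> b=2, c=3): a = b + c.
COPY_CNF = [[-2, -3], [-2, 1], [-3, 1], [-1, 2, 3]]
# CBDP AND (a=1, b=2 -> c=3): c = a OR b.
AND_CNF = [[-1, 3], [-2, 3], [1, 2, -3]]


def holds(lit, bits):
    """True if literal `lit` is satisfied by the 0/1 assignment `bits`."""
    value = bits[abs(lit) - 1]
    return value == 1 if lit > 0 else value == 0


def solutions(clauses, nvars, assumptions=()):
    """Brute-force stand-in for a SAT call with assumptions: every assignment
    (as a 0/1 tuple) satisfying all clauses and all assumed literals."""
    return [
        bits
        for bits in product((0, 1), repeat=nvars)
        if all(holds(lit, bits) for lit in assumptions)
        and all(any(holds(lit, bits) for lit in clause) for clause in clauses)
    ]


# Same model, different queries: only the assumptions change between calls.
print(solutions(XOR_CNF, 3))          # [(0, 0, 0), (0, 1, 1), (1, 0, 1)]
print(solutions(XOR_CNF, 3, [1, 2]))  # []: two active inputs cannot meet in a CBDP XOR
```

In a real search, the clauses for all parts of \(\overline{E_{i,j}}\) are built once, and each query fixes the input vector and the target unit vector through assumptions instead of rebuilding the model.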
Algorithm 2 is explained as follows:
Lines 3-5. Using the optimized cipher structure, we divide the cipher \(E\) into smaller parts so that the division property does not propagate into too many division trails.
Lines 6-10. For each \(\boldsymbol{k} \in \mathbb{K}_{i, j}\), we set \(\boldsymbol{k}\) as the initial division property and check whether the bit-th bit is balanced. If STwoDP\(\left(\overline{E_{i, j}}, \boldsymbol{k}, \text { bit }\right)\) reports the bit as balanced, we remove \(\boldsymbol{k}\) from \(\mathbb{K}_{i, j}\) according to the pruning technique for \(\mathbb{K}\).
Line 11. We initialize \(\mathbb{L}_{i, j}^{\prime}\) to an empty set.
Lines 12-16. For each \(\boldsymbol{l} \in \mathbb{L}_{i, j}\), we set \(\boldsymbol{l}\) as the initial division property and check whether the bit-th bit is balanced. If STwoDP\(\left(\overline{E_{i, j}}, \boldsymbol{l}, b i t\right)\) does not report the bit as balanced, i.e. \(\boldsymbol{l}\) can still generate the unit vector \(e_{bit}\), we keep \(\boldsymbol{l}\) by adding it to \(\mathbb{L}_{i, j}^{\prime}\); otherwise \(\boldsymbol{l}\) is discarded according to the pruning technique for \(\mathbb{L}\).
Lines 17-19. If \(\mathbb{L}_{i, j}^{\prime}\) is empty, then by Stopping Rule 2 the bit-th bit is balanced.
Lines 20-22. If the XOR-sum of the bit-th bit is still undetermined, we enumerate all division trails of \(Q_{i, j}\) according to the propagation rules of BDPT, and we set the resulting \(\mathcal{D}_{\mathbb{K}_{i^{\prime}, j^{\prime}}, \mathbb{L}_{i^{\prime}, j^{\prime}}}\) as the input division property of \(\overline{E_{i, j+1}}\).
Lines 23-26. After r rounds, if \(\mathbb{L}_{i, j}^{\prime}\) is not empty, then by Stopping Rule 3 we obtain a new distinguisher whose XOR-sum is odd.
Line 27. Return all balanced bits.
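The control flow described above can be sketched in Python for a single output bit. This is not the authors' implementation: `reaches` stands in for the CBDP query STwoDP (a CryptoMiniSat call with assumptions in the paper) and `bdpt_step` for the exact BDPT propagation through one part, including the Key-XOR rule; both are hypothetical callables supplied by the caller. In this sketch a vector \(\boldsymbol{l}\) is kept when it can still generate \(e_{bit}\), in line with Stopping Rule 2 and the pruning description in Section 3.5, and the final check reads the parity from the BDPT definition, which is our reading of Stopping Rule 3.

```python
def unit(n, m):
    """Unit vector e_m of length n."""
    return tuple(1 if t == m else 0 for t in range(n))


def precedes(k, u):
    """Bitwise partial order k <= u."""
    return all(a <= b for a, b in zip(k, u))


def search_bit(K, L, positions, bit, n, reaches, bdpt_step):
    """Classify output bit `bit` as 'unknown', 'balanced' or 'odd'.

    reaches(pos, vec, bit): hypothetical CBDP oracle, True if vec can still
        generate e_bit through Ebar starting at part `pos`.
    bdpt_step(pos, L): hypothetical exact BDPT propagation of L through the
        part at `pos` (including Key-XOR); returns the next (K, L).
    """
    K, L = set(K), set(L)
    for pos in positions:
        # Lines 6-10 / Stopping Rule 1: one k reaching e_bit makes the bit unknown.
        if any(reaches(pos, k, bit) for k in K):
            return "unknown"
        # Every k is pruned; lines 11-16 keep only the l that can still reach e_bit.
        kept = {l for l in L if reaches(pos, l, bit)}
        # Lines 17-19 / Stopping Rule 2: nothing left in L means the bit is balanced.
        if not kept:
            return "balanced"
        # Lines 20-22: exact BDPT propagation through this single part only.
        K, L = bdpt_step(pos, kept)
    # After the last part, read the parity of e_bit from the final BDPT.
    e = unit(n, bit)
    if any(precedes(k, e) for k in K):
        return "unknown"
    return "odd" if e in L else "balanced"
```

Running `search_bit` for every output bit and collecting those classified as balanced corresponds to Line 27.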
4. Applications
We apply our method to the lightweight block ciphers SIMON, SIMECK, LBlock, GIFT and Khudra. All experiments are run on a server with an Intel(R) Core(TM) i7-8700 CPU @ 3.19 GHz and a 64-bit Windows system. Searches with different initial division properties are independent, so they can be run in parallel. In the results, '?' denotes an unknown bit, 'b' a balanced bit, and 'o' an odd bit. The tool developed for this paper is available at: https://github.com/zsq123/IntegralDistinguishersSolver
4.1 Applications to SIMON and SIMECK
SIMON [6] is a lightweight block cipher with a Feistel structure; its round function uses only simple operations (bitwise AND, XOR and cyclic rotation). Let SIMON-2n denote the SIMON variant with a 2n-bit block, where \(n \in\{16,24,32,48,64\}\). Fig. 2 shows the structure of SIMON-2n, where \(\left(X_{l}, X_{r}\right)\) denotes the input of the round function and \(\left(X_{l+1}, X_{r+1}\right)\) denotes its output. The core operation of the round function is shown in Fig. 3.
Fig. 2. The structure of SIMON-2n
Fig. 3. The core operation of the round function
For SIMON, we divide the round function into \(n+1\) parts:
\(Q_{i}=Q_{i, n} \cdot Q_{i, n-1} \cdots Q_{i, j} \cdots Q_{i, 0}\) (26)
When \(0 \leq j \leq n-1\) , we have
\(Q_{i, j}=\left(x_{(j-1) \bmod n}^{i, j} \,\&\, x_{(j-8) \bmod n}^{i, j} \oplus x_{(j-2) \bmod n}^{i, j} \oplus x_{j}^{i, j}\right)\) (27)
Moreover,
\(Q_{i, n}=\left(X_{r}^{i, n} \oplus k_{i}, X_{l}^{i, n}\right)\) (28)
where \(k_{i}\) is the round key. When computing the output division property of a part \(Q_{i, j}\) (\(0 \leq j \leq n-1\)) under BDPT, we only need to enumerate 4 bits, because the other \(2n-4\) bits remain unchanged; this greatly reduces the computational complexity.
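Because each part touches only 4 bits, its division trails can be tabulated once. The Python sketch below does this for CBDP, i.e. the check used in the fast propagation, with the usual ANF criterion for a small S-box: an input vector \(\boldsymbol{u}\) can reach \(\boldsymbol{v}\) if \(\pi_{\boldsymbol{v}}(S(\boldsymbol{x}))\) contains a monomial \(\pi_{\boldsymbol{w}}(\boldsymbol{x})\) with \(\boldsymbol{w} \succeq \boldsymbol{u}\), and only minimal \(\boldsymbol{v}\) are kept. The bit order \((a, b, c, d)\), with \(a, b, c\) the three inputs of Eq. (27) and \(d\) the updated bit, is our own choice; the exact BDPT step in the following example uses different rules and is not reproduced here.

```python
from itertools import product

N = 4  # bits handled by one SIMON part Q_{i,j}
VECTORS = list(product((0, 1), repeat=N))


def core(x):
    """SIMON core operation on (a, b, c, d): d <- (a AND b) XOR c XOR d."""
    a, b, c, d = x
    return (a, b, c, (a & b) ^ c ^ d)


def leq(u, v):
    """Bitwise partial order u <= v."""
    return all(ui <= vi for ui, vi in zip(u, v))


def pi(v, y):
    """pi_v(y): product of the bits y_t with v_t = 1 (the empty product is 1)."""
    return int(all(yt for yt, vt in zip(y, v) if vt))


def anf_monomials(f):
    """Monomials x^w with coefficient 1 in the ANF of f (Moebius transform)."""
    monomials = set()
    for w in VECTORS:
        coef = 0
        for x in VECTORS:
            if leq(x, w):
                coef ^= f(x)
        if coef:
            monomials.add(w)
    return monomials


def cbdp_table(sbox):
    """Map each input vector u to its minimal set of output vectors under CBDP."""
    monomials = {v: anf_monomials(lambda x, v=v: pi(v, sbox(x))) for v in VECTORS}
    table = {}
    for u in VECTORS:
        # v is reachable if pi_v(S(x)) contains some monomial x^w with w >= u.
        reach = [v for v in VECTORS if any(leq(u, w) for w in monomials[v])]
        # Drop redundant vectors that dominate another reachable vector.
        table[u] = {v for v in reach if not any(o != v and leq(o, v) for o in reach)}
    return table


TABLE = cbdp_table(core)
```

For instance, `TABLE[(1, 1, 0, 0)]` is `{(1, 1, 0, 0), (0, 0, 0, 1)}`: either both active inputs pass through unchanged, or they meet in the AND and only the updated bit stays active.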
Example: For SIMON-32, suppose the input division property of \(Q_{1,15}\) is \(\mathcal{D}_{\mathbb{K}_{1,15}, \mathbb{L}_{1,15}}\) with \(\mathbb{K}_{1,15}=\emptyset\) and \(\mathbb{L}_{1,15}=\left\{l_{1}\right\}\), where \(l_{1}\)=(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1). The 15-th core operation of SIMON-32 involves 4 bits, whose division property is (1,1,0,0). According to the propagation rules of BDPT, the division property of the output multiset of this core operation is (1,1,0,0), (0,0,0,1), (1,0,0,1), (0,1,0,1), (1,1,0,1), so the propagation from \(l_{1}\) generates the following five vectors:
\(l_{2}\)=(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \(l_{3}\)=(1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \(l_{4}\)=(1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \(l_{5}\)=(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \(l_{6}\)=(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1). The output division property of \(Q_{1,15}\) is \(\mathcal{D}_{\mathbb{K}_{1,16}, \mathbb{L}_{1,16}}\) with \(\mathbb{K}_{1,16}=\emptyset\) and \(\mathbb{L}_{1,16}=\left\{l_{2}, l_{3}, l_{4}, l_{5}, l_{6}\right\}\).
Since \(Q_{1,16}\) contains the Key-XOR operation, new vectors of \(\mathbb{K}_{1,16}\) are generated from \(\mathbb{L}_{1,16}\) according to Rule 4, and some vectors in \(\mathbb{L}_{1,16}\) become redundant because of these new vectors. Hence \(\mathbb{K}_{1,16}=\left\{\boldsymbol{k}_{1}\right\}\) and \(\mathbb{L}_{1,16}=\left\{\boldsymbol{l}_{7}, \boldsymbol{l}_{8}, l_{9}, l_{10}\right\}\), where \(\boldsymbol{k}_{1}\) =(1,1,1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \(l_{7}\)=(0, 1,1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \(l_{8}\)=(1, 1,1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), \(l_{9}\)=(1, 1,1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1), \(l_{10}\)=(1, 1,1,1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1).
We use \(\mathbb{K}_{1,16}=\left\{\boldsymbol{k}_{1}\right\}\) and \(\mathbb{L}_{1,16}=\left\{\boldsymbol{l}_{7}, \boldsymbol{l}_{8}, l_{9}, l_{10}\right\}\) as the input division property of \(\overline{E_{2,0}}\) and evaluate it under CBDP. For \(\mathbb{K}_{1,16}\), if some \(\boldsymbol{k} \in \mathbb{K}_{1,16}\) generates the unit vector \(e_{m}\), then the integral property of the m-th bit is unknown; otherwise, all elements of \(\mathbb{K}_{1,16}\) are removed according to the pruning technique for \(\mathbb{K}\). For \(\mathbb{L}_{1,16}\), if no element can generate the unit vector \(e_{m}\) under CBDP, then the m-th bit is balanced and the program terminates. Otherwise, only the vectors of \(\mathbb{L}_{1,16}\) that can generate \(e_{m}\) are used as the input division property of \(Q_{2,0}\), and all their division trails are enumerated under BDPT. These steps are repeated until the program terminates.
For SIMON-32, with a data complexity of \(2^{31}\) chosen plaintexts, we obtain the following 14-round integral distinguisher.
SIMON-32: \((7 f f f, f f f f) \stackrel{14 r}{\longrightarrow}(? ? ? ?, ? ? ?, ? ? ? ?, ? ? ?, ? b ? ?, ? ? ? ?, b ? ? ?, ? ? ? b)\).
Although our search directly yields only a 14-round integral distinguisher, it can be extended to a 15-round integral distinguisher with the technique in [14]. When we apply the same method to SIMON-48/-64/-96, the BDPT-based results are the same as previous ones.
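In these results, the input pattern is a hexadecimal mask in which a 1 bit is active (takes all values) and a 0 bit is constant, so the data complexity is 2 raised to the number of active bits; the output pattern lists one symbol per bit, leftmost bit first. The small Python helpers below (names are ours) read both kinds of pattern.

```python
def active_bits(*hex_words):
    """Number of active (1) bits in an input mask given as hex words, e.g. ('7fff', 'ffff')."""
    return sum(bin(int(word, 16)).count("1") for word in hex_words)


def positions(pattern, symbol):
    """0-based positions (leftmost bit = 0) of `symbol` ('b', 'o' or '?') in an output pattern."""
    cells = [ch for ch in pattern if ch in "?bo"]
    return [t for t, ch in enumerate(cells) if ch == symbol]


# The input mask (7fff, ffff) has 31 active bits, i.e. 2^31 chosen plaintexts.
print(active_bits("7fff", "ffff"))  # 31
```

The same count gives \(2^{63}\) for the masks (ffffffff, fffffffd) and (efffffff, ffffffff) used later, and \(2^{61}\) for (ffffffff, fffffff4).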
SIMECK [15] is another lightweight block cipher whose round function differs from that of SIMON only in the rotation constants, which are (0, 5, 1) for SIMECK. The round function of SIMECK is:
\(\left(L_{i}, R_{i}\right)=\left(L_{i-1} \wedge L_{i-1}^{\lll 5} \oplus L_{i-1}^{\lll 1} \oplus R_{i-1} \oplus k_{i}, L_{i-1}\right)\) (29)
Let SIMECK-2n denote the SIMECK variant with a 2n-bit block, where \(n \in \{16, 24, 32\}\). Applying our automatic search algorithm to SIMECK-2n, we obtain 15-/18-/21-round distinguishers for SIMECK-32/-48/-64, respectively. These results are the same as those found by Xiang et al. [7].
In [12], the authors propose a variant of the SIMON family named SIMON(102), which changes the rotation constants from (1, 8, 2) to (1, 0, 2). For SIMON(102)-32, we find that the XOR-sums of some bits are odd, which is more precise than [10], where these bits were only found to be constant. In our algorithm, the \(\mathbb{L}\) set at the output of the 19-th round is not empty; its vectors are (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) and (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0). This shows that the parities of the 1-st bit and the 15-th bit are odd.
\(\operatorname{SIMON}(102)-32:(7 f f f, f f f f) \stackrel{19 r}{\rightarrow}(? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b o ? ?, ? ? ? ?, ? ? ? ?, ? ? ? o)\).
Similarly, for SIMON(102)-48 and SIMON(102)-64, we also find more accurate integral distinguishers. We list them as follows:
\(\mathrm{SIMON}(102)-48:(7 f f f f f, f f f f f f) \stackrel{27 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b o ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? o)\).
\(\operatorname{SIMON}(102)-64:(7 f f f f f f f, f f f f f f f f) \stackrel{35 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b o ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? o)\).
To check the validity of our model, we ran experiments on SIMON(102)-32. We randomly generated 20 secret keys and, for each key, exhaustively determined which output bits are odd or balanced. All bits that our model reports as odd or balanced were indeed odd or balanced in these experiments, which supports the validity of the model.
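A minimal sketch of this kind of experiment is shown below, assuming a caller-supplied `encrypt`; the helper names, the independent round key, and the one-round example are ours, not the authors' test harness. A bit is consistent with "balanced" if its XOR-sum over the structure is 0 for every tested key, and with "odd" if it is 1 for every key. The actual 19-round experiment over \(2^{31}\) plaintexts would be very slow in pure Python, so a compiled implementation is more practical.

```python
from itertools import product

MASK16 = 0xFFFF


def rotl16(v, s):
    """Rotate a 16-bit word left by s positions (0 < s < 16)."""
    return ((v << s) | (v >> (16 - s))) & MASK16


def simon102_32_round(state, key):
    """One SIMON(102)-32 round, rotations (1, 0, 2); state = L || R as a 32-bit int."""
    left, right = state >> 16, state & MASK16
    f = (rotl16(left, 1) & left) ^ rotl16(left, 2)
    return ((f ^ right ^ key) << 16) | left


def xor_sum(encrypt, active, n, base=0):
    """XOR of encrypt(p) over all p that take every value on the `active` bit
    positions (0 = most significant of n bits) and agree with `base` elsewhere."""
    total = 0
    for values in product((0, 1), repeat=len(active)):
        p = base
        for pos, v in zip(active, values):
            shift = n - 1 - pos
            p = (p & ~(1 << shift)) | (v << shift)
        total ^= encrypt(p)
    return total


# One round with the 4 low bits of R active: every output bit has XOR-sum 0.
print(hex(xor_sum(lambda p: simon102_32_round(p, 0x1234), [28, 29, 30, 31], 32, base=0xDEADBEEF)))  # 0x0
```

Bit t of the returned value (counted from the most significant bit) is the XOR-sum of the t-th output bit, so it can be compared directly against the '?', 'b' and 'o' pattern reported by the model.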
4.2 Application to GIFT
GIFT [16] is an SPN block cipher, closely related to PRESENT, whose linear layer is a bit permutation. It has two versions: GIFT-64, with a 64-bit block and 28 rounds, and GIFT-128, with a 128-bit block and 40 rounds. One round of GIFT-64 is shown in Fig. 4.
Fig. 4. One round function of GIFT-64
For GIFT-64, we divide the round function into 17 parts:
\(Q_{i}=Q_{i, 16} \cdot Q_{i, 15} \cdots \cdots Q_{i, j} \cdots \cdots Q_{i, 0}\) (30)
When \(0 \leq j \leq 15\), we have
\(Q_{i, j}=S\left(X_{j}\right)\) (31)
where \(X\) is the input of the round function and \(X=X_{15}\|\cdots\| X_{0}\).
When \(j=16\), we have
\(Q_{i, 16}=P(S b o x(X)) \oplus k_{i}\) (32)
where P is the linear permutation function of GIFT, Sbox(X) is the output of all S-boxes and \(k_{i}\) is the i-th round key.
We apply our algorithm to GIFT. The number of rounds of the integral distinguishers we find is the same as before, but our distinguishers have more balanced bits. With \(2^{63}\) chosen plaintexts, we find a new integral distinguisher that has 2 more balanced bits than before:
\(\text { GIFT-64: }(f f f f f f f f, f f f f f f f d) \stackrel{9 r}{\longrightarrow} (b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?, b b ? ?)\).
With \(2^{61}\) chosen plaintexts, we find a new integral distinguisher that has 4 more balanced bits than before:
\(\text { GIFT-64: }(f f f f f f f f, f f f f f f f 4) \stackrel{9 r}{\longrightarrow} (b ? ? ?, b ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b ? ? ?, b ? ? ?, b ? ? ?, b ? ? ?, b ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b ? ? ?, b ? ? ?)\).
4.3 Application to LBlock
LBlock [17] is a Feistel block cipher with a 64-bit block and an 80-bit key. Its round function uses 8 different S-boxes and is illustrated in Fig. 5.
Fig. 5. One round function of LBlock
We divide the round function \(Q_{i}\) into 9 parts
\(Q_{i}=Q_{i, 8} \cdot Q_{i, 7} \cdots Q_{i, 0}\) (33)
When \(0 \leq j \leq 7\), we have
\(Q_{i, j}=S_{j}\left(x_{j}^{i} \oplus k^{i}\right) \oplus y_{j}^{i}\) (34)
where \(S_{j}\left(x_{j}^{i} \oplus k^{i}\right)\) is the output of the j-th S-box.
When \(j=8\) , we have
\(Q_{i, 8}=P\left(X_{i}\right)\) (35)
where P is the linear permutation function of the LBlock and \(X_{i}= \left(S_{0}\left(x_{0}^{i} \oplus k^{i}\right) \oplus y_{0}^{i}\right)\left\|\left(S_{1}\left(x_{1}^{i} \oplus k^{i}\right) \oplus y_{1}^{i}\right)\right\| \cdots \|\left(S_{7}\left(x_{7}^{i} \oplus k^{i}\right) \oplus y_{7}^{i}\right)\).
We apply our search algorithm to LBlock. With \(2^{63}\) chosen plaintexts, we find four integral distinguishers, listed as follows:
\(\text { LBlock-64 : }(f f f f f f f b, f f f f f f f f) \stackrel{17 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b b ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b b ? ?)\).
\(\text { LBlock-64 : }(f f f f f f b f, f f f f f f f f) \stackrel{17 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b b ? ?, b ? b ?, ? ? ? ?)\).
\(\text { LBlock-64 : }(f f f f b f f f, f f f f f f f f) \stackrel{17 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b ? b ?, ? b ? b, ? ? ? ?, ? ? ? ?, ? ? ? ?)\).
\(\text { LBlock-64 : }(\text { fbffffff }, f f f f f f f f) \stackrel{17 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b ? ? b, b b ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?)\).
The first one is the same as the 17-round distinguisher in [13], which is obtained under CBDP.
4.4 Application to Khudra
Khudra [18] is a lightweight block cipher with a 64-bit block that uses the “generalized type-2 transformation” of the Feistel structure (GFS) [19]. It has 18 rounds and an 80-bit key. For Khudra, the input of the first round is computed as \(X_{0}=P_{3}\left\|\left(P_{2} \oplus w k_{2}\right)\right\| P_{1} \|\left(P_{0} \oplus w k_{1}\right)\), where the plaintext is \(P=P_{3}\|P_{2}\|P_{1}\|P_{0}\) and \(wk\) denotes the whitening keys. Next, \(X_{0}\) is encrypted by the round function; the input of the i-th round is denoted by \(X_{i-1}\). Finally, the ciphertext C is computed as \(C=X_{18,0}\left\|\left(X_{18,1} \oplus w k_{3}\right)\right\| X_{18,2}\|\left(X_{18,3} \oplus w k_{4}\right)\), where \(X_{18}=X_{18,3}\left\|X_{18,2}\right\| X_{18,1}\| X_{18,0}\) is the output of the 18-th round function. The structure of Khudra is shown in Fig. 6.
Fig. 6. The structure of Khudra
We divide the round function \(Q_{i}\) into 3 parts
\(Q_{i}=Q_{i, 2} \cdot Q_{i, 1} \cdot Q_{i, 0}\) (36)
Here
\(\begin{array}{c} Q_{i, 0}=F\left(X_{i-1,1}\right) \oplus R K_{i}^{\prime} \oplus X_{i-1,0}, \\ Q_{i, 1}=F\left(X_{i-1,3}\right) \oplus R K_{i} \oplus X_{i-1,2}, \\ Q_{i, 2}=X_{i}=\operatorname{Per}\left(Q_{i, 1}\left\|X_{i-1,1}\right\| Q_{i, 0} \| X_{i-1,3}\right), \end{array}\)
where F is the F-function, which has an iterated structure based on a 6-round GFS, and Per is a linear permutation. If some \(\boldsymbol{k} \in \mathbb{K}\) can generate the unit vector \(e_{m}\), then the m-th bit is unknown by the definition of the division property. Hence, as soon as one such \(\boldsymbol{k}\) is found under CBDP, the remaining vectors of \(\mathbb{K}\) need not be checked.
Khudra includes initial and final transformations that contain the whitening keys. Note that the distinguishers for Khudra in this paper do not take these two transformations into account. With a data complexity of \(2^{63}\) chosen plaintexts, we find the following two integral distinguishers:
\(\text { Khudra-64: (efffffff, ffffffff) } \stackrel{9 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b b b b, b b b b, b b b b, b b b b)\).
\(\text { Khudra-64: }(\text { ffffffff, } e f f f f f f f) \stackrel{9 r}{\longrightarrow} (? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, b b b b, b b b b, b b b b, b b b b, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?, ? ? ? ?)\).
5. Conclusions
In this paper, we construct an automatic model to search for integral characteristics, and we address the complexity of searching integral distinguishers by converting the search into a SAT problem. First, we model the propagation rules of BDPT as CNF formulas. Then we construct SAT models of S-boxes. Furthermore, we construct an algorithm that detects integral distinguishers efficiently and finds all balanced bits under BDPT.
We apply our automatic model to several block ciphers: SIMON-32/-48/-64/-96, SIMECK-32/-48/-64, SIMON(102)-32/-48/-64, GIFT, LBlock and Khudra. For SIMON-2n and SIMECK-2n, our results are the same as previous ones. For SIMON(102)-32/-48/-64, we find 20-/28-/36-round integral distinguishers. Although the numbers of rounds are the same as before for SIMON(102)-32/-48/-64, we can determine that some bits are odd, whereas previous results could only determine them to be constant. For GIFT, our 9-round distinguishers have more balanced bits than the previous longest integral distinguisher. For LBlock, we obtain more 17-round integral distinguishers than before. For Khudra, we obtain the first 9-round integral distinguisher. These results show that our model is effective in finding integral distinguishers for block ciphers.
Acknowledgements
This research is supported by the National Cryptography Development Fund (No. MMJJ20180201), the National Natural Science Foundation of China (No. 62072181), International Science and Technology Cooperation Projects (No. 61961146004) and the Fundamental Research Funds for the Central Universities.
References
- Y. Todo, "Structural evaluation by generalized integral property," in Proc. of the 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 9056, pp. 287-314, April, 2015.
- L. R. Knudsen and D. Wagner, "Integral cryptanalysis," in Proc. of International Workshop on Fast Software Encryption 2002, vol. 2365, pp. 112-127, July, 2002.
- Y. Todo, "Integral cryptanalysis on full MISTY1," Journal of Cryptology, vol.30, pp.920-959, 2017. https://doi.org/10.1007/s00145-016-9240-x
- H. Zhang and W. Wu, "Structural evaluation for generalized Feistel structures and applications to LBlock and TWINE," in Proc. of Cryptology -- INDOCRYPT 2015, vol. 9462, pp. 218-237, November, 2015.
- Y. Todo and M. Morii, "Bit-based division property and application to Simon family," in Proc. of the 23rd International Conference, FSE 2016, vol. 9783, pp. 357-377, July, 2016.
- R. Beaulieu, D. Shors, J. Smith, S. Treatman-Clark, B. Weeks, and L. Wingers, "The SIMON and SPECK lightweight block ciphers," in Proc. of the 52nd Annual Design Automation Conference, pp. 1-6, June, 2015.
- Z. Xiang, W. Zhang, Z. Bao, and D. Lin, "Applying MILP method to searching integral distinguishers based on division property for 6 lightweight block ciphers," in Proc. of ASIACRYPT 2016, vol. 10031, pp. 648-678, November, 2016.
- L. Sun, W. Wang, and M. Wang, "Automatic search of bit-based division property for ARX ciphers and word-based division property," in Proc. of ASIACRYPT 2017, vol. 10624, pp.128-157, November, 2017.
- S. A. Cook, "The complexity of theorem-proving procedures," in Proc. of the third annual ACM symposium on Theory of computing, pp. 151-158, May, 1971.
- K. Hu and M. Wang, "Automatic search for a variant of division property using three subsets," in Proc. of CT-RSA 2019, vol. 11405, pp. 412-432, February, 2019.
- S. Wang, B. Hu, J. Guan, K. Zhang, and T. Shi, "MILP-aided method of searching division property using three subsets and applications," in Proc. of ASIACRYPT 2019, vol. 11923, pp. 398-427, November, 2019.
- S. Kölbl, G. Leander, and T. Tiessen, "Observations on the SIMON block cipher family," in Proc. of CRYPTO 2015, vol. 9215, pp. 161-185, August, 2015.
- Z. Eskandari, A. B. Kidmose, S. Kölbl, and T. Tiessen, "Finding integral distinguishers with ease," in Proc. of Selected Areas in Cryptography - SAC 2018, vol. 11349, pp. 115-138, January, 2019.
- Q. Wang, Z. Liu, K. Varici, Y. Sasaki, V. Rijmen, and Y. Todo, "Cryptanalysis of reduced-round SIMON32 and SIMON48," in Proc. of INDOCRYPT 2014, vol. 8885, pp. 143-160, October, 2014.
- G. Yang, B. Zhu, V. Suder, M. Aagaard, and G. Gong, "The Simeck family of lightweight block ciphers," in Proc. of Cryptographic Hardware and Embedded Systems 2015, vol. 9293, pp. 307-329, September, 2015.
- S. Banik, S. K. Pandey, T. Peyrin, Y. Sasaki, S. M. Sim, and Y. Todo, "GIFT: a small PRESENT - towards reaching the limit of lightweight encryption," in Proc. of Cryptographic Hardware and Embedded Systems 2017, vol. 10529, pp. 321-345, January, 2017.
- W. Wu and L. Zhang, "LBlock: a lightweight block cipher," in Proc. of Applied Cryptography and Network Security 2011, vol. 6715, pp. 327-344, June, 2011.
- S. Kolay and D. Mukhopadhyay, "Khudra: A new lightweight block cipher for FPGAs," in Proc. of Security, Privacy, and Applied Cryptography Engineering, vol. 8804, pp. 126-145, October, 2014.
- V. T. Hoang and P. Rogaway, "On generalized Feistel Networks," in Proc. of CRYPTO 2010, vol. 6223, pp. 613-630, August, 2010.