Approach 2:
Hard SAT Instance Generation Based on the Factorization Problem

Index of this page:
  1. Introduction to Approach 2
  2. Algorithms
  3. Sample Instances
Go back to previous page:
  1. SAT Instance Generation: Main Page
  2. Approach 1: Random Generation of Unique Solution 3SAT Instances

This page is created by T.Uchida and O.Watanabe. For question and/or comment of this page, please contact uchida@is.titech.ac.jp.


1. Introduction to Approach 2

The factorization problem is a problem to compute a sequence of prime factors of a given integer. In particular, we are interested in the factorization problem instances x that are products of two big prime numbers p and q. (Below we assume that a given 2l- bit number x is a product of two l bit prime numbers, and the factorization problem (FACT in short) is to compute these two prime numbers from x.)

The FACT has been studied by many researchers, and it has been believed to be hard even on average. Thus, one can expect to generate hard SAT instances by transforming from FACT instances, and here we take this approach.

For a given FACT instance x, we generate SAT instance (i.e., a Boolean formula F) as follows:

  1. Construct a circuit C, which we call a test circuit, that tests, for a given pair of numbers y and z, whehter y*z = x, and outputs 1 if and only if the input pair satisfies the test.
  2. Transform this circuit to a Boolean formla G so that G is satisfiable if and only if C yields 1 for some input pair.
  3. Transform this Boolean formula to a CNF formula F. (The obtained formula consists of clauses of size 2, 3, and 4.)
From our transformation from a circuit C to a Boolean formula F, it is easy to see that there is a set of Boolean variables of F that correspond to input gates of C, and from a satisfying assingment of F, we can easily compute an input to C that has C yield 1. On the other hand, C yields 1 if and only if it is given correct prime factors as an input pair. Thus, from a satisfying assingment of F, we can compute the prime factors of x. That is, finding satisfying assingment of F is at least as hard as factoring x.

It is also easy to see that the number n of variables of F is the same as the number of gates in C. Thus, for obtaining a small formula, we need to construct a small test circuit. Here we propose two algorithms to construct test circuits:

  1. GenFact: A straightforward algorithm. With this algorithm, we can construct a circuit of size O(l**2), where l is the bit length of each prime factor of x.
  2. GenFactC: A more efficient algorithm using the Chinese Remainder Theorem. With this algorithm, we can construct a circuit of size O(l**1.5).
The original generation algorithm (e.g., GenFact) generates size 1 and 2 clauses (clauses with one/two variables), which give a quite good clue to to some of heuristics. Thus, we modify the obtained formula so that it also produces formula consisting of only size 3 and 4 clauses.

The former algorithm gives us smaller instances for l < 40, which corresponds to formulas with 4,000 variables, and the latter algorithm is better for l > 40.

For further explanation, see our paper in References and Links.

A Variation

Through our experiments, we started thinking that the factorization problem may not be essential for generating reasonably hard SAT instances, and that some other arithmetic problems may work as well. Thus, for one variation of our approach, we considered the following problem. (The algorithm we provide here is called GenCSAdd.)

k-CyclicShift Addision Problem (CSAdd)
Input: z, y.
Queston: Compute x such that x+y1+y2+y3+...+yk=z,
where each yi is obtained from y by shifting i bits left in cycle.

Clearly, this problem is easy to solve. But it has a similar structure as the ordinary multiplication, and thus one can expect that it yields hard instances as well (at least for some type of heuristics). It turned out that the problem is as strong as those obtained from FACT. Furthermore, since the size of formula obtained from l bit number is linear in l, we can generate relatively small instances, or we can use larger l (which is harder) compared with FACT.

2. Algorithms

Four algorithms GenFact, GenFact3, GenFactC, and GenCSAdd3 can be obtained by downloading the following files and compling them.

The program (e.g., GenFact) is called as

GenFact [p] [q] ([L])

where [p] and [q] are two prime numbers of the same bit length l, and [L] (optional) is the number of input gates of the generated test circuit. (Thus, the default value of [L] is 2l, and L must be larger than 2l.)

Then the program creates a directory GenFact.p.q that contains the following files:

The number of variables and that of clauses (of each size) are determined by L. Some examples are given below. (l = bit size of the original problem, n = num. of variables, m = num. of clauses, ci = num. of size i clauses. GenFact(*) (resp., GenCSAdd(*)) means a modified version.)

algorithmlnm c1, c2, c3, c4
GenFact 5 75 335 c1, c2, c3, c4
6 108 504 c1, c2, c3, c4
7 147 706 c1, c2, c3, c4
8 192 944 c1, c2, c3, c4
l 3*l*l l(17l-18) 2l, 2l(l+1), l(7l-6), 8l(l-2)
GenFact(*) 5 81 361 0, 0, c3, c4
6 114 534 0, 0, c3, c4
7 153 736 0, 0, c3, c4
8 198 978 0, 0, c3, c4
l 3*l*l+d l(17l-18)+6d+x 0, 0, l(9l-4)+6d+y, 8l(l-2)
GenCSAdd 6 92 477 0, 0, c3, c4
7 104 549 0, 0, c3, c4
8 116 621 0, 0, c3, c4
l 2(k+1)l+k(k-1) 2(7k+1)l+k(7k-17)/2 2l+k, k(k+3), 6kl+k(5k-9)/2, 8l+k-9
GenCSAdd(*) 6 98 511 0, 0, c3, c4
7 110 587 0, 0, c3, c4
8 122 663 0, 0, c3, c4
Where d is # of dummy var. and k is # of additions (which is set 5 in our examples).

3. Sample Instances

In what follows we give some set of instances generated by using our generators GenFact and GenCSAdd. (All files are at GenFact: F.tar.gz, and GenCSAdd: A.tar.gz.)

algorithmln, msample file
GenFact 5 75, 355 F-5-19-29.txt, F-5-23-29.txt
F-5-23-31.txt, F-5-29-31.txt
81, 361 F34-5-19-29.txt, F34-5-23-29.txt
F34-5-23-31.txt, F34-5-29-31.txt
6 108, 504 F-6-37-59.txt, F-6-41-53.txt
F-6-53-61.txt, F-6-59-61.txt
114, 534 F34-6-37-59.txt, F34-6-41-53.txt
F34-6-53-61.txt, F34-6-59-61.txt
7 147, 706 F-7-101-127.txt, F-7-113-127.txt
F-7-79-103.txt, F-7-97-113.txt
153, 736 F34-7-101-127.txt, F34-7-113-127.txt
F34-7-79-103.txt, F34-7-97-113.txt
8 192, 944 F-8-137-241.txt, F-8-181-241.txt
F-8-227-241.txt, F-8-227-251.txt
198, 978 F34-8-137-241.txt, F34-8-181-241.txt
F34-8-227-241.txt, F34-8-227-251.txt
GenCSAdd 6 92, 477 A-6-40-51.txt, A-6-40-59.txt
A-6-55-47.txt, A-6-55-57.txt
110, 587 A34-6-40-51.txt, A34-6-40-59.txt
A34-6-55-47.txt, A34-6-55-57.txt
7 104, 549 A-7-105-119.txt, A-7-105-89.txt
A-7-70-119.txt, A-7-70-89.txt
98, 511 A34-7-105-119.txt, A34-7-105-89.txt
A34-7-70-119.txt, A34-7-70-89.txt
8 116, 621 A-8-140-153.txt, A-8-140-231.txt
A-8-205-153.txt, A-8-205-231.txt
122, 663 A34-8-140-153.txt, A34-8-140-231.txt
A34-8-205-153.txt, A34-8-205-231.txt


uchida@is.titech.ac.jp