Al Zimmermann: Primorial Soup

as73251 · 22.11.2017, 18:32

dimkadimon в сообщении #1267944 писал(а):

as73251 found 14 optimal solutions for N=11.

Может быть я Вас не понял: я имел в виду решения, которые укладываются в интервал, который указан 12d3'ом.

Herbert Kociemba · 22.11.2017, 19:36

dimkadimon в сообщении #1267944 писал(а):

as73251 found 14 optimal solutions for N=11.

Surely not. He found 14 solutions using the vertices he admitted. Only one of them was the optimal solution.

as73251 · 22.11.2017, 21:42

Herbert Kociemba в сообщении #1268085 писал(а):

dimkadimon в сообщении #1267944 писал(а):

as73251 found 14 optimal solutions for N=11.

Surely not. He found 14 solutions using the vertices he admitted. Only one of them was the optimal solution.

Я это понимаю: я неправильно понял dimkadimon'а, о чем и написал постом выше.

jcmeyrignac · 22.11.2017, 22:49

If you want to check what solution is missing, here are the last 4 digits of the 15 solutions for N=11:

8339
7387
0497
4831
6717
9647
4917
1169
7965
6541
3333
5279
9469
1521
4303

as73251 · 23.11.2017, 00:23

jcmeyrignac

Я использовал long double и у меня, например, для оптимального решения 4 последние цифры: 1900.

Herbert Kociemba · 02.12.2017, 13:50

The following considerations does not help to find better solutions. But I think it is still quite interesting.
Each node for problem size N is represented by list of N-1 different primes, where the primes are chosen among the first N(N-1)/2:= M primes. If we have two random nodes we can ask how big the probability is, that these are "compatible nodes" in the sense that they could appear in a solution for N.
This means, that the two nodes have exactly one prime in common.
The total number of node pairs is Binomial(M,N-1)^2. The number of node pairs which have exactly one prime in common is
Binomial(M,1)Binomial(M-1,N-2)Binomial(M-N+1,N-2). The quotient gives the probability p. Here are some {N, p}-pairs:

{4, 0.450},{5, 0.381},{6, 0.350},{7, 0.332},{8, 0.321},{9, 0.313},{10, 0.307},{11, 0.303},{15, 0.293},{20, 0.286},{25, 0.283},{30, 0.281}

The probabilies seem to converge and indeed I can show that the limit for N-> infinity exists and is 2/e^2 = 0.270671

To find a solution for problem size N we have a subset of "good nodes" from which we have to choose N nodes and it is necessary that each node is compatible to each other node within these N nodes. I computed the fraction of all node pairs which are "compatible" in the subsets (for N=11 the subset contains for example 11589 nodes)
and got the following:
{4, 0.444},{5, 0.406},{6, 0.372},{7, 0.331},{8, 0.319},{9, 0.311},{10, 0.306},{11, 0.302}

The results are very close to the theoretical distribution for all nodes. It seems that statistically the subset of "good nodes" does not show a different behaviour than the set of all nodes.

mertz · 02.12.2017, 23:31

That is very interesting.

Obviously an optimal solution is not going to start with one of the nodes having many consecutive primes. There needs to be gaps. Gaps will not be consistent from high primes to low primes. The high prime gaps will probably be more important.

Do you have any theories?

Herbert Kociemba · 04.12.2017, 03:15

mertz в сообщении #1271209 писал(а):

Do you have any theories?

No, I do not see any pattern which gives a certain good node an advantage in comparison to another good node.

I try to estimate how many good nodes are needed to have a fair chance to find a subset of N nodes among them which are a valid solution for problem size N.
Let M := N(N-1)/2 be again the number of primes. Then there are M!/N! different sets of N nodes which are valid solutions. The easiest way to see this if you think of the complete graph K_N. There are M! ways to name the edges but the N! permutations of the nodes do not change the adjacency list representation of the graph K_N.
Each node has N-1 neighbours and hence consists of N-1 different primes, so there are Binomial(M,N-1) different nodes.

We must take into account that if we have N-1 nodes of a valid solution then the N-th node is forced. So it is sufficient to search only for a set of N-1"compatible nodes" which we call a "partial solution" here. Each of the M!/N! N-node solutions produces N partial solutions with N-1 nodes, so there are M!/(N-1)! partial solutions in total.
On the other hand there are Binomial(Binomial(M,N-1) ,N-1) subsets with N-1 nodes build from the Binomial(M,N-1) different nodes (usually these are no partial solutions of course) .
So the probability, that a subset of N-1 random nodes is a partial solution is p:= M!/(N-1)!/ Binomial(Binomial(M,N-1) ,N-1).
Now if we have a pool of K "good nodes" we can build Binomial(K,N-1) different subsets with N-1 nodes. On average p*Binomial(K,N-1) of these subsets will be partial solutions provided that the "good nodes" do not show some special behaviour in this context. So we should choose K such that

M! Binomial(K,N-1) /((N-1)! Binomial(Binomial(M,N-1) ,N-1)) >>1

It should be mentioned that the forced N-th node may be eventually not in the pool of the K good nodes.

With this formula we get the following lower bounds for K:

(N, K): (6, 32), (7, 87), (8, 249), (9, 729), (10, 2164), (11, 6487), (12, 19572), (13, 59343), .....(28, 1.3e12)

If we take for N=11 a pool with 11589 good nodes the above formula estimates 332 partial solutions. This corresponds to 332/11 or about 30 full solutions (if we assume the forced node is in the pool) which gives the right order of magnitude compared with the 14 actual solutions .

Herbert Kociemba · 04.12.2017, 11:58

Herbert Kociemba в сообщении #1271771 писал(а):

If we take for N=11 a pool with 11589 good nodes the above formula estimates 332 partial solutions. This corresponds to 332/11 or about 30 full solutions (if we assume the forced node is in the pool) which gives the right order of magnitude compared with the 14 actual solutions .

I rerun N=11 and found 15 full solutions within the good nodes and three partial solutions. This means that there are 168 partial solutions in total, in 15*11=165 cases the forced node also is a good node and only in three cases not.

mertz · 04.12.2017, 17:28

Here are four examples from the 45k nodes. prime index/diff.
The math above above goes way over my head.
My optimal solutions for N < 12 and my best solutions for N=12, do not have any nodes that resemble these.

3 4 21 22 30 43 44 45 46 47 48
1 17 1 8 13 1 1 1 1 1 18

9 10 11 27 28 29 30 31 35 46 47
1 1 16 1 1 1 1 4 11 1 19

16 17 18 19 20 21 23 32 33 39 40
1 1 1 1 1 2 9 1 6 1 26

0 14 15 16 17 34 51 55 63 64 65
14 1 1 1 17 17 4 8 1 1 1

10k of the 45k could be eliminated because of too many sequential. The third node is an example of a very small range.
With only one computer, no way to test ideas.
Post contest discussion will be interesting.

dobrichev · 05.12.2017, 10:22

Reading Herbert's estimations from practical work flow perspective answers some questions.

For example the huge number (28, 1.3e12) is reducible after involving N-2, N-3 ... N-m multipliers in the formulas.

Herbert Kociemba · 06.12.2017, 15:04

If we do not want the rawscore for a node exceed a certain value R we can ask for the number the number $N_g$ of the "good nodes" which have this property. Surprisingly there is a quite simple fomula which gives an excellent approximation for $N_g$ .

For the calculation we do not work with products of the prime numbers but with the sums of their natural logarithms. For problem size N a node consists of N-1 of the N(N-1)/2:=M primes, so there are Binomial(M,N-1) different nodes. The energy E(n) of a node n then is the sum of the logarithms of all primes of this node in this logarithmic representation. We define all nodes to have the same probability and in this way E(n) is a random variable in the space of all nodes. We want to know more about the probability distribution of E.

Here is a look at the distribution for N=7 with M= 21 and Binomial(21,6) = 54264 nodes. The bin size of the histogram is 0.1.

I is not difficult to show that the the mean value $\mu$ is exactly $\mu = \frac{2}{N}\sum\limits_{i=1}^{M} Log(Prime(i))$ ,
which for example is 18.8219 for N=7.
This also is the target energy for a node in the logarithmic representation, $Exp(\mu)$ is the target energy for a node in the usual representation.

The variance $\sigma^2$ can be shown to be exactly
$\sigma^2 = \frac{2}{N(N^2-1)} \sum\limits_{i=1}^{M}[(N-1)Log(Prime(i))-\mu]^2$ .

Already for N=7 the graph shows a good approximation to a Gaussian distribution. This is a consequence of the central limit theorem. With the formula for this distribution (https://en.wikipedia.org/wiki/Normal_distribution)we can approximate the number $N_g$ of good nodes in a small interval $[\mu-\epsilon,\mu+\epsilon]$ around the mean value:

$N_g \approx Binomial(M, N-1) \frac{2\epsilon}{\sqrt{2\pi}\sigma}$

I also can show that we have to choose $\epsilon= \sqrt{2 *R*Exp(-\mu)}$
for all nodes which should contribute with no higher value than R to the total rawscore. If we choose for R the "Best Raw Score" in Al's table we count all nodes which could be candidates for a solution. So finally we get for the number of good nodes

$N_g \approx 2 Binomial(N(N-1)/2, N-1) \frac{\sqrt{R*Exp(-\mu)}}{\sqrt{\pi}\sigma}$

For N in [4..28] we have
$\mu=$ {5.15, 9.04, 13.7, 18.8, 24.4, 30.4, 36.7, 43.2, 50.0, 57.0, 64.3, 71.7, 79.2, 87.0, 94.8, 103., 111., 119., 128., 136., 145., 153., 162., 171., 180.}
$\sigma=$ {0.896, 1.39, 1.83, 2.21, 2.54, 2.84, 3.11, 3.36, 3.58, 3.80, 4.00, 4.19, 4.37, 4.53, 4.70, 4.85, 5.00, 5.14, 5.28, 5.42, 5.54, 5.67, 5.79, 5.91, 6.03}

and from this with the current best Raw Scores we get
$N_g \approx$ {9, 19, 32, 70, 683, 1073, 5274, 11985, 47345, 216334., 1.817*10^6, 2.352*10^7, 2.778*10^8, 3.823*10^9, 7.209*10^10, 1.432*10^12, 2.42*10^13, 8.321*10^14, 1.314*10^16, 2.369*10^17, 6.696*10^18, 2.036*10^20, 1.869*10^22, 6.952*10^23, 1.353*10^25}

Since I generated the good nodes for N= 4 to 13 with my program we can compare this with the real number of good nodes:

$N_g =$ {10, 17, 30, 68, 619, 1002, 5114, 11589, 45758, 210322}

The approximation seems to be very good and the quality increases with higher N.

Herbert Kociemba · 07.12.2017, 05:41

Herbert Kociemba в сообщении #1272594 писал(а):

$N_g \approx$ {9, 19, 32, 70, 683, 1073,

Copy error, sorry... $N_g \approx$ {9, 26, 32, 123, 683, 1073,..

With the ideas used in my last two posts it is possible to estimate the Best Raw Scores.

1. For a given problem size N compute the number $N_g$ of good nodes needed to get 1 expected (partial solution). For this take $N_g$ such that with M:=N(N-1)/2

$$ M! Binomial(N_g$, N-1) /((N-1)! Binomial(Binomial(M, N-1) , N-1)) $\approx$ 1$

2. Now solve this equation for R:

$N_g \approx 2 Binomial(N(N-1)/2, N-1) \frac{\sqrt{R*Exp(-\mu)}}{\sqrt{\pi}\sigma}$

R is the maximal raw score of a single node of the solution. We can take R as a low estimate for the best possible raw score of the complete solution. Since the solution has N nodes and the raw scores of a node can vary between 0 and R we take R*N/2 as the estimate for the best possible raw score. For N in [4,28] we get for the low estimate

{5, 49, 238, 1470, 8960, 56700, 377000, 2.55*10^6, 1.74*10^7, 1.25*10^8, 8.98*10^8, 6.61*10^9, 4.93*10^10, 3.69*10^11, 2.81*10^12, 2.21*10^13, 1.72*10^14, 1.37*10^15, 1.11*10^16, 8.99*10^16, 7.33*10^17, 6.06*10^18, 5.1*10^19, 4.34*10^20, 3.70*10^21}

The following graph shows, that the best raw score and the estimates agree up to N=12 where presumably the optimal solutions are know. It also shows that presumably for N>12 there is much room for better solutions...

Herbert Kociemba · 07.12.2017, 12:54

Herbert Kociemba в сообщении #1272792 писал(а):

It also shows that presumably for N>12 there is much room for better solutions...

It also shows that presumably for N>13 there is much room for better solutions...

as73251 · 13.12.2017, 12:12

as73251 в сообщении #1267770 писал(а):

Полный расчет на одном ядре для n=11 занял 32.5 мин.

Усовершенствованный алгоритмический подход позволил сократить время расчета до 17 минут на одном ядре. Если применить векторизацию, то время расчета на одном ядре можно уменьшить в несколько раз. Но это уже техническая, а не алгоритмическая проблема и поэтому не хочется на это тратить время. Интересно, что все идеи, которые привели к этому результату, применимы и для быстрого расчета. Попытаюсь, если позволит время, реализовать быстрый алгоритм до 23 декабря, а если он будет интересен форумчанам, то и позже.

Научный форум dxdy

Al Zimmermann: Primorial Soup