When generating a pseudo-random binary sequence, it’s hard to create a series of bits that is evenly distributed for all lengths of sub-sequence; by that I mean that even if you get roughly the same number of 0s and 1s, you might get wildly different counts of 00s, 01s, 10s and 11s. And you might have different counts of 000s, 001s, 010s, 011s and so on for all sub-sequences. Any of these variances can signify a non-random sequence. It has (so I’m told) been proven that it is impossible to generate a genuine pseudo-random sequence that satisfies the requirement of having a reasonably similar distribution of all sub-sequences. But I had this idea a few years back. Why not keep a count of all the sub-sequences, and select the next bit based on the longest sub-sequence that is deficient? With a powerful scripting language like Python, it’s pretty simple to knock up an experiment.
First of all, here’s a manual example that should demonstrate the procedure. Let’s choose a default bit that we will use when there is no clear winner from the count tables. It doesn’t matter which bit we choose, but let’s make it 0. OK, when we start with the first bit, we don’t yet have any history, so we have no choice but to use our default bit, 0. Sequence: 0 Now we update our count tables based on the bit we’ve just chosen:
|
|
|
| SS |
= |
Sub-sequence |
| # |
= |
Number of occurrences within sequence |
|
For the second bit, we look to our tables (of which there’s only one so far) to decide which bit will help even out the longest sub-sequence. Well, if we choose a 0, then there’ll be two 0s and no 1s. But if we choose a 1, then there’ll be one 0 and one 1, which is much more even. So, 1 it is. Sequence: 01 Let’s update the count tables:
|
|
|
| 2 bit |
| SS |
# |
| 00 |
0 |
| 01 |
1 |
| 10 |
0 |
| 11 |
0 |
|
Now we have to decide on the third bit. First of all, we’ll see if the 2 bit counts can help us decide. If we choose a 0, then we’ll have one 10 and no 11s. If we choose a 1, then we’ll have no 10s and one 11. No way to decide between them, so we look at the 1 bit table. Again, the counts are the same, so we have to resort to using the default bit, 0. Sequence: 010
|
|
|
| 2 bit |
| SS |
# |
| 00 |
0 |
| 01 |
1 |
| 10 |
1 |
| 11 |
0 |
|
|
| 3 bit |
| SS |
# |
| 000 |
0 |
| 001 |
0 |
| 010 |
1 |
| 011 |
0 |
| 100 |
0 |
| 101 |
0 |
| 110 |
0 |
| 111 |
0 |
|
For the fourth bit, we look at the 3 bit table. 100 and 101 have equal counts, so no help there. But from the 2 bit table, we see that there is already one 01 and no 00. So by choosing a 0, we help even up the 2 bit tally. Sequence: 0100
|
|
|
| 2 bit |
| SS |
# |
| 00 |
1 |
| 01 |
1 |
| 10 |
1 |
| 11 |
0 |
|
|
| 3 bit |
| SS |
# |
| 000 |
0 |
| 001 |
0 |
| 010 |
1 |
| 011 |
0 |
| 100 |
1 |
| 101 |
0 |
| 110 |
0 |
| 111 |
0 |
|
|
| 4 bit |
| SS |
# |
| 0000 |
0 |
| 0001 |
0 |
| 0010 |
0 |
| 0011 |
0 |
| 0100 |
1 |
| 0101 |
0 |
| 0110 |
0 |
| 0111 |
0 |
| 1000 |
0 |
| 1001 |
0 |
| 1010 |
0 |
| 1011 |
0 |
| 1100 |
0 |
| 1101 |
0 |
| 1110 |
0 |
| 1111 |
0 |
|
And so on. In practice, the default bit is used about three times in the first few bits, and then hardly ever again. If you select 1 as your default bit, then you get the complement of this sequence (which should be obvious by symmetry). I make no pretense that this sequence is genuinely random. It has some quirks; for example, think about the difference in the maximum possible run-length of 0s and 1s at the start of the sequence, and that after (say) a million bits. And most obviously, it can’t be random because it is deterministically generated! I’ve written a little Python script (4dc5.py) that will generate a sequence of bits. It’s not designed for efficiency, but for ease of understanding. You can change the number of bits generated, but be warned that above a thousand or so bits the deficiencies in the design of the code make themselves very obvious. This code also prints out a count of occurrences of each sub-sequence. It doesn’t display those that occur only once, because there are too many of them (and they’re not very interesting). You can easily hack the code if you do want to see them. Here are the first 1024 bits of the 4dc5 Sequence, along with some counts of the shorter length sub-sequences. The bits that were chosen by resort to the default bit are shown in bold.
0100110111000101000011110101100100011000001001011111001110110100 1111110111100001011011000111001010100100110011010101110100010000 0001101101111111000110100000111010100011111010010100110001001110 0000010101101011110110011110010010000100010111001100101100001100 1001111011101110010000010111100110110010100010101010110001011001 1000010011010010001001001010111111001011101100000000100001101011 0111010111000011100011110001000111011111011011010100111010011011 0100011011110100001010010110100001100011001110011111000001100001 1111111101010100000011110011110100111001001101111100010111111110 0110001111110000110111011011100111000100101000001000110101000010 0100011110110101100000111110010001010011110000000101100011011000 0101010001001100000011001100111111101100010000101110101011011011 0011011110010100100000010011111011100000101000110010101100111010 1101001011100011000101011100101101111011111101000111000010000011 0100110010000111011001001011001011110101001010101111000111010000 0000011100110100010110101010100110101110111101011111010111010110
|
|
|
| 2 bit |
| SS |
# |
| 00 |
251 |
| 01 |
257 |
| 10 |
257 |
| 11 |
258 |
|
|
| 3 bit |
| SS |
# |
| 000 |
126 |
| 001 |
125 |
| 010 |
127 |
| 011 |
130 |
| 100 |
125 |
| 101 |
131 |
| 110 |
130 |
| 111 |
128 |
|
|
| 4 bit |
| SS |
# |
| 0000 |
63 |
| 0001 |
63 |
| 0010 |
61 |
| 0011 |
64 |
| 0100 |
61 |
| 0101 |
66 |
| 0110 |
66 |
| 0111 |
64 |
| 1000 |
63 |
| 1001 |
62 |
| 1010 |
65 |
| 1011 |
66 |
| 1100 |
64 |
| 1101 |
65 |
| 1110 |
64 |
| 1111 |
64 |
|
I have also written a version in C (4dc5.c). This is much more efficient than the Python script, as it uses a binary tree to store the subsequence counts. It also presents the output as hex (the first two bytes being 4d and c5, hence the sequence name). Here are the first few bytes:
4D C5 0F 59 18 25 F3 B4 FD E1 6C 72 A4 CD 5D 10 1B 7F 1A 0E A3 E9 4C 4E 05 6B D9 E4 84 5C CB 0C 9E EE 41 79 B2 8A AC 59 84 D2 24 AF CB B0 08 6B 75 C3 8F 11 DF 6D 4E 9B 46 F4 29 68 63 39 F0 61 FF 54 0F 3D 39 37 C5 FE 63 F0 DD B9 C4 A0 8D 42 47 B5 83 E4 53 C0 58 D8 54 4C 0C CF EC 42 EA DB 37 94 81 3E E0 A3 2B 3A D2 E3 15 CB 7B F4 70 83 4C 87 64 B2 F5 2A F1 D0 07 34 5A A9 AE F5 F5 D6