Index of coincidence (Bureau of Security and Signals Intelligence Forum, A special National Cipher Challenge for extraordinary times)
24th April 2020 at 5:30 pm #47887
Seeing as this is a more relaxed cipher challenge, I assume my fellow competitors will be more open to sharing their secrets? I’ve spent the last few days trying to come up with a good model for the distribution of the IoC of a text based on its length. A normal distribution with mean equal to the index of coincidence for English seems to work pretty well, but I have to set the variance to 0.00148 * (length ^ -0.654), which seems just a bit random. It’s a very good fit and was obtained by getting my computer to read lots about war and peace (which it is now an expert in), but I was just wondering if there was a more mathematical approach that might give a more exact figure?
28th April 2020 at 3:51 pm #47890
I don’t know that that is really the best use of IoC. See this page on how to use IoC to find the period of a periodic substitution cipher:
http://practicalcryptography.com/cryptanalysis/stochastic-searching/cryptanalysis-vigenere-cipher/
30th April 2020 at 1:24 pm #48006
Who said anything about a good use? I do know that method and I was basically using it, but I was trying to add some automation to get the program to pick the first likely option rather than offer multiple possibilities. It calculates the index of coincidence for each number of columns and accepts that number if the IoC is, say, 90% likely to happen; otherwise it just accepts the one with the highest IoC. But now I’m really just curious about the variance. It feels like it should have some nice stats behind it, but that might just be wishful thinking.
30th April 2020 at 4:39 pm #48012
In my experience, the IoC of the best option can be quite low if the ciphertext is short. A good cutoff is about 1.55 or 1.6 (in the normalization where 1 = random and 1.75 = English).
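For what it's worth, here is a minimal Python sketch of a period search using that kind of cutoff. The function names, the `max_period` default, and the rule of returning the smallest period that clears the cutoff (falling back to the highest score) are assumptions for illustration; only the 1.6 figure and the normalization come from this post.

```python
from collections import Counter


def normalized_ioc(text: str) -> float:
    """Index of coincidence scaled so uniformly random letters score about 1."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return 26.0 * sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def mean_column_ioc(text: str, period: int) -> float:
    """Average normalized IoC over the columns made of every period-th letter."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    columns = [letters[i::period] for i in range(period)]
    return sum(normalized_ioc(col) for col in columns) / period


def guess_period(text: str, max_period: int = 20, cutoff: float = 1.6) -> int:
    """Smallest period whose mean column IoC clears the cutoff (assumed rule).

    If no period clears it, fall back to the period with the highest score.
    """
    scores = {p: mean_column_ioc(text, p) for p in range(1, max_period + 1)}
    for period in sorted(scores):
        if scores[period] >= cutoff:
            return period
    return max(scores, key=scores.get)
```

Scanning periods in increasing order means a multiple of the true period is never preferred over the true period itself, as long as the true period clears the cutoff.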
If you really want to know how it varies, you could take a novel, randomly choose texts from it, and find the IoC of each. You will find that the variation depends on the period as well as the length of the text. Prime periods will vary less. Non-prime periods will have secondary and tertiary peaks in the graph of IoC vs. period, which are interesting. (The height of such peaks is roughly IoC(random) + (IoC(English) - IoC(random)) * gcd(trueperiod, period) / trueperiod.) I tried fitting to the peaks of the distribution, but if I remember correctly, it wasn’t any more reliable than the method you use.
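The sampling experiment could be sketched along these lines in Python. This is only an illustration: the function names, the number of trials, and the fixed seed are my assumptions, and `fitted_variance` simply restates the empirical fit quoted in the first post so the two can be compared.

```python
import random
import statistics
from collections import Counter


def ioc(letters: str) -> float:
    """Raw index of coincidence of a string of letters (needs at least 2)."""
    n = len(letters)
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def sample_ioc_stats(corpus: str, length: int, trials: int = 2000, seed: int = 0):
    """Mean and sample variance of the IoC over random windows of the corpus."""
    letters = "".join(c for c in corpus.upper() if "A" <= c <= "Z")
    if not 2 <= length <= len(letters):
        raise ValueError("length must be between 2 and the number of letters")
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        # Pick a random starting point and take a contiguous window.
        start = rng.randrange(len(letters) - length + 1)
        values.append(ioc(letters[start:start + length]))
    return statistics.mean(values), statistics.variance(values)


def fitted_variance(length: int) -> float:
    """Empirical fit quoted in the first post of this thread."""
    return 0.00148 * length ** -0.654
```

With the novel loaded into a string `corpus`, comparing `sample_ioc_stats(corpus, n)[1]` with `fitted_variance(n)` over a range of `n` shows how well the fit holds. Note that the windows overlap, so the samples are not independent, and that this looks at plain windows of text rather than the columns of a particular period.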
There is another method called the “twist algorithm” that is more reliable. There is an article by Barr and Simoson that proposes it. It doesn’t suffer as much from the problem of false positives, compared to using the IoC. (Also see https://www.researchgate.net/publication/336192751_Finding_the_key_length_of_a_Vigenere_cipher_How_to_improve_the_twist_algorithm)
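A rough Python sketch of the twist index, as I understand it from the Barr and Simoson article: for each candidate period, sort the letter frequencies of every column, add the sorted lists together, and measure how far the top half outweighs the bottom half. Treat the details here (the 13/13 split, the 100/period scaling, and the name `twist_index`) as my reading of the method, not a faithful reproduction, and check the article before relying on it.

```python
import string
from collections import Counter


def twist_index(ciphertext: str, period: int) -> float:
    """Twist index of a ciphertext for one candidate period (assumed definition)."""
    if period < 1:
        raise ValueError("period must be at least 1")
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    summed = [0.0] * 26  # rank-by-rank sum of the sorted column frequencies
    for i in range(period):
        column = letters[i::period]
        if not column:
            continue
        counts = Counter(column)
        # Relative frequencies of A..Z in this column, smallest first.
        freqs = sorted(counts.get(ch, 0) / len(column) for ch in string.ascii_uppercase)
        for rank, freq in enumerate(freqs):
            summed[rank] += freq
    # Difference between the 13 largest and 13 smallest ranks, scaled per column.
    return 100.0 * (sum(summed[13:]) - sum(summed[:13])) / period
```

A larger value suggests a better candidate period, since columns enciphered with a single alphabet have a much more lopsided frequency profile than mixed ones. The linked paper is about improving how the candidates are compared.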
That’s as helpful as I can be on what little sleep Harry lets me get.