# Theory

### Binomial Distribution

The contents of each cTag can be considered as one half of a binomial process; the other half is the cTag whose relative distance we want to measure. We call these two cTags the source and the target respectively.

#### Bernoulli Process

Every pos in common between source and target represents a single trial.

Altogether, the Bernoulli process comprises 2^br trials, where br is the smaller of the breadths of the source and target meiotic cTags. (See the specification for more details, including br in the case of a synthetic source and/or target.)
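As a minimal sketch of the trial count, assuming the number of trials is 2 raised to the power br (consistent with the worked example later in this section, where br=11 gives 2048 pos). The function name `trial_count` is hypothetical and covers only the meiotic case; synthetic cTags are left to the specification.

```python
def trial_count(source_breadth: int, target_breadth: int) -> int:
    """Number of Bernoulli trials for two meiotic cTags.

    Assumption: br is the smaller of the two breadths and the
    number of pos in common is 2**br.
    """
    br = min(source_breadth, target_breadth)
    return 2 ** br


print(trial_count(12, 11))  # breadths 12 and 11 give br=11, i.e. 2048 trials
```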

#### Intra-cTag value distribution

The probability of zero / one at any pos in any cTag is approximately 0.5. The Method of Synthesis ensures good distribution of values in synthetic cTags. Meiosis does not tend to reduce intra-cTag entropy.

#### Inter-cTag value distribution

The probability of zero / one at any pos for a whole culture is approximately 0.5. Meiosis, through reversal, ensures entropy does not dissipate at a pos within a culture.

### Interpretation

#### False positive

Akin results are interpreted by an agent who determines, based on the evidence and their tolerance for uncertainty, whether it is appropriate to adjudge a source/target pair to be related or unrelated.

For such a judgement to have pertinence, the agent needs to be able to quantify the probability of a false positive, i.e. how likely it is that source and target would have similarity S despite not actually being related.

The probability of a false positive is the probability of an outcome of a Bernoulli process, where the trial probability is 0.5 and the number of trials is governed by breadth (br), as described above.
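The following is a sketch of that calculation, not the method used to produce the charts in this document. It assumes the number of trials is 2 raised to the power br and computes the one-sided tail probability of seeing at least a given number of similar bits by chance. The function name `false_positive_probability` is hypothetical, and whether reversal calls for a two-sided tail (roughly double this value) is left open here.

```python
from fractions import Fraction
from math import comb


def false_positive_probability(br: int, similar_bits: int) -> Fraction:
    """One-sided tail P(X >= similar_bits) for X ~ Binomial(2**br, 0.5).

    Assumption: 2**br pos are compared and each pos matches by
    chance with probability 0.5, independently of the others.
    """
    n = 2 ** br
    # With p = 0.5 every outcome has weight 1 / 2**n, so the tail is
    # a sum of binomial coefficients over a power of two.
    favourable = sum(comb(n, k) for k in range(similar_bits, n + 1))
    return Fraction(favourable, 2 ** n)


p = false_positive_probability(11, 1125)
print(float(p))
```

Exact integer arithmetic is used so that no precision is lost before the final conversion to a float.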

#### Meiotic Signal Decay

At each successive meiotic generation there is a signal decay rate which, on average, reduces the number of bits of similarity by a fixed percentage. This average rate is not affected by the breadth of the cTags. However, variance from the rate is affected by breadth.

The significance of this chart is that (ignoring variance) it places an upper bound on the degree of similarity that may be ascribed to meiosis for a given number of generations distance. For example, meiosis cannot be responsible for more than 23.44% of similarity at 4 generations distance.

Since the number of generations is generally not known, this kind of inference is not ideally useful. It does not follow that more than 23.44% can never be observed at greater than 4 generations distance.

#### Plot: Signal decay / false positive

The chart below plots binomial probability of false positive against the average number of bits of signal expected at each meiotic depth, for breadths 8 to 12.

Binomial results were provided by Wolfram|Alpha. This series only goes up to breadth=12 because it is computationally too expensive to calculate binomials precisely for larger numbers of trials.

#### Interpreting the chart

To understand the false positive chart it is helpful to follow an example. Source and Target are two meiotic cTags of breadth 12 and 11 respectively. Assuming we are happy to compare the maximum number of pos available, this gives us br=11, or 2048 pos in common between Source and Target.

The number of bits of similarity between Source and Target happens to be 923. Reversal turns this into 1125 bits of similarity (2048 - 923).

1125 out of 2048 bits of similarity is equivalent to 54.93%, or 4.93% remaining signal (54.93% less the 50% expected by chance).
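The arithmetic of this example can be reproduced with a short sketch. Two assumptions are made here: that reversal takes the larger of the matching and non-matching bit counts, and that remaining signal is the similarity fraction less 0.5. Both function names are hypothetical.

```python
def reversed_similarity(matching_bits: int, total_pos: int) -> int:
    """Similarity after reversal (assumed: the larger of matches and mismatches)."""
    return max(matching_bits, total_pos - matching_bits)


def remaining_signal(similar_bits: int, total_pos: int) -> float:
    """Fraction of similarity above the 0.5 expected by chance (assumed definition)."""
    return similar_bits / total_pos - 0.5


total_pos = 2 ** 11                             # br=11
similar = reversed_similarity(923, total_pos)   # 2048 - 923
signal = remaining_signal(similar, total_pos)
print(similar, f"{similar / total_pos:.2%}", f"{signal:.2%}")
```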

We want to validate an assumption that meiosis is present at seven generations or more.

This chart is custom-made for 4.93% remaining signal. By tracing meiotic depth 7 to the green line (breadth 11), we read off around 0.00001 from the Y axis, which is on a log10 scale.

The result should be understood to mean that the evidence in the similarity of these two cTags is indicative of a kin relation of around seven generations. For this to be a false positive, i.e. for the observed amount of similarity to arise by chance alone, would require on average around 100,000 random cTags.

#### Judgment

An application's agent should make a judgment with respect to the assignment of a kin relation between source and target. In such circumstances, the agent may reference the number of cTags in the culture from which the indigenous target was chosen. Clearly, for the same prior probability, if the culture comprises 10,000 cTags the judgment would be less favourable than if it contains only 200 cTags. Here, (subjective) Bayesian reasoning might be applicable.
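One way to see why culture size matters is sketched below. It is illustrative only and not part of the specification. It assumes every cTag in the culture is compared independently with the source, and that each comparison has the same false positive probability (0.00001, the value read from the chart in the example above). Both function names are hypothetical.

```python
def expected_comparisons(p_false_positive: float) -> float:
    """Average number of random cTags needed to see one chance match."""
    return 1.0 / p_false_positive


def chance_match_in_culture(p_false_positive: float, culture_size: int) -> float:
    """Probability that at least one unrelated cTag in the culture matches by chance.

    Assumption: comparisons are independent and share the same probability.
    """
    return 1.0 - (1.0 - p_false_positive) ** culture_size


p = 0.00001  # read from the chart in the worked example
print(round(expected_comparisons(p)))
for size in (200, 10_000):
    print(size, chance_match_in_culture(p, size))
```

Under these assumptions the chance of a spurious match is roughly 0.2% in a culture of 200 cTags and roughly 9.5% in a culture of 10,000, which is the sense in which the larger culture makes the judgment less favourable.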