Mate Select
The MateSelect heuristic is specific to the Ingendex application. Though not a part of the specification of Akinity, it corresponds loosely to Akinity's Selection.
Purpose
The heuristic is designed to automatically select a suitor's mate from amongst a list of cTag candidates in an operational culture.
Usage
MateSelect is used in many of the interface operations. In most use cases involving intercourse, a foreign cTag needs a suitable local cTag with which to perform meiosis. If not specified by user choice, this selection can be done automatically using MateSelect.
Basis of selection
The basis of MateSelect's selection is proximity and mutual specialness within the group. The desired outcome is that MateSelect selects a candidate having the following properties:
- Proximity. High similarity between suitor and selected mate overall.
- Mutually special. High similarity between suitor and selected mate, especially at those pos where the bit value of the suitor is not equal to the most common bit value at the Norm.
The latter property attempts to weight more heavily those similarities which are peculiar to the pair, but are uncommon in the whole culture. In this respect MateSelect resembles TF-IDF.
Effectiveness
MateSelect does not always locate the optimal candidate in the culture, due to effects such as local minima and turbulence. Also, there is no clear definition of what would constitute optimal mate selection.
MateSelect is a default selection mechanism of Ingendex. It can be overridden by manual selection of a candidate.
It somewhat resembles a fitness function in a genetic algorithm. Unlike a genetic algorithm, however, MateSelect's choice is not towards some domain-specific desired output. Rather, it is over a domain-generic similarity measurement.
Norm is a value for each pos, denoting the relative occurrence of zero / one at the pos in the entire operational culture.
Inputs
Inputs to MateSelect are:
- Focus cTag
- Operational culture having aligned polarity
Align polarities
Aligning polarity means choosing, for each cTag, the polarity that makes it more similar to the rest of the group rather than less. This is achieved by picking one cTag (arbitrarily, the culture's founder) and aligning the polarity of all other cTags with it, reversing any cTag as required.
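The alignment step can be sketched as follows. This is only an illustration under stated assumptions: a cTag is modelled as a list of 0/1 bits, and "reversing" a cTag is taken to mean inverting every bit. Neither the representation nor the helper names (`similarity`, `align_to_founder`) come from the source.

```python
def similarity(a, b):
    """Fraction of equal bits over the pos that both cTags share."""
    shared = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / shared


def align_to_founder(founder, ctags):
    """Return copies of ctags, inverting any that agree with the founder
    on fewer than half of the shared pos."""
    aligned = []
    for tag in ctags:
        if similarity(founder, tag) < 0.5:
            # Assumption: reversing polarity means flipping every bit.
            tag = [1 - bit for bit in tag]
        aligned.append(list(tag))
    return aligned
```

For example, with founder `[0, 0, 1, 1]`, the cTag `[1, 1, 0, 0]` agrees on no shared pos and is therefore inverted, while `[0, 0, 1, 0]` is left as it is.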
Categorise the result type (Type A or Type B) for each pos, from 0 up to the limit set by the breadth of the cTag (see table below).
| Norm value at Pos | Suitor Bit | Candidate Bit | Type |
| 0 | 0 | 0 | Type A |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | Type B |
| 1 | 0 | 0 | Type B |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | Type A |
If the bit values of suitor and candidate both differ from the bit value at Norm, then the candidate will be more strongly weighted at this pos. If they are both the same as Norm, then the candidate will be less strongly weighted at this pos. If suitor and candidate differ from each other, then zero weight is given for this pos.
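The categorisation in the table above can be expressed as a small function. This is a sketch only; the function name `result_type` and the use of `None` for the non-matching rows (shown as 0 in the table) are illustrative choices, not part of Ingendex.

```python
def result_type(norm_bit, suitor_bit, candidate_bit):
    """Classify one pos as 'A', 'B', or None (non-matching)."""
    if suitor_bit != candidate_bit:
        return None   # suitor and candidate disagree: zero weight
    if suitor_bit == norm_bit:
        return "A"    # the pair agrees with the most common value
    return "B"        # the pair agrees with each other but not with Norm
```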
Scores for each pos are totalled to give the overall score for the candidate. The candidate with the highest overall score is a suitable candidate to be co-parent for the suitor.
The table below illustrates an example in which the average score for all pos in common between A and C is slightly higher than the average score for all pos in common between A and B. Candidate C is therefore a better match for suitor A than is candidate B.
Yellow highlighted values are fixed inputs. Grey highlighted values are calculated inputs. Pink highlighted values are calculated outputs.
Pos
is the position of a boolean value in a cTag. All pos in a cTag are included in the calculation. If a pos is not present in a cTag (because its breadth is relatively low) then that pos is simply excluded from the calculation. In the example below, candidate C's pos from 512-1023 are not used because suitor A's breadth is insufficient.
Qty
is the total number of cTags in the culture having a bit value assigned at the pos. In version 0.1 of Akinity all cTags must be at least breadth=8, therefore the Qty at each of the first 256 pos is equal to the number of (enabled) cTags in the culture.
Norm
Norm is a pseudo cTag that can be calculated dynamically from the members of a culture. It is calculated after first aligning the polarity of all the operational culture's cTags.
Norm serves a similar purpose in MateSelect as a corpus does in TF-IDF.
Most Common Value
is the boolean value having the higher distribution at this pos across the entire culture. If zeros and ones are equally distributed in the culture then MCV is determined by the modulo 2 of pos.
# similar to MCV
is the quantity of cTags in the culture having the same value as the Most Common Value. Naturally this number cannot be less than half the Qty.
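Qty, Most Common Value and # similar to MCV can be derived together for a single pos. The sketch below assumes already-aligned cTags modelled as lists of 0/1 bits of varying length; the function name `norm_at_pos` is hypothetical. The tie-break follows the rule as worded above (pos modulo 2).

```python
def norm_at_pos(ctags, pos):
    """Return (qty, mcv, similar) for one pos, or None if no cTag reaches it."""
    # Only cTags broad enough to have a bit at this pos are counted.
    bits = [tag[pos] for tag in ctags if pos < len(tag)]
    qty = len(bits)
    if qty == 0:
        return None
    ones = sum(bits)
    zeros = qty - ones
    if ones == zeros:
        mcv = pos % 2                 # tie-break described in the text
    else:
        mcv = 1 if ones > zeros else 0
    similar = ones if mcv == 1 else zeros
    return qty, mcv, similar
```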
Pos entropy
is calculated from Qty and # similar to MCV using the standard formula. Since the cTags are already aligned at input, only the first half of the calculation is needed (i.e. not the inverse probability component).
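The source does not write the formula out. The sketch below assumes Pos entropy = 2 · p · log2(1/p), where p = (# similar to MCV) / Qty. This form is inferred from the values in the example table, which it reproduces, and should be read as an assumption rather than as the definitive Ingendex calculation. The function name `pos_entropy` is hypothetical.

```python
import math


def pos_entropy(qty, similar_to_mcv):
    """Assumed form: twice the 'first half' of the binary entropy term."""
    p = similar_to_mcv / qty          # share of cTags matching the MCV
    return 2 * p * math.log2(qty / similar_to_mcv)
```

Under this assumption, a pos where the culture is split evenly (for example Qty 982, # similar 491) gives 1, and a pos where every cTag agrees (Qty 2, # similar 2) gives 0.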
| Breadth | Pos | Qty | Norm: most common value | Norm: # similar to MCV | Norm: Pos entropy | Suitor A (aligned, breadth=9): bit value | Candidate B (aligned, breadth=8): bit value | Candidate B: Score | Candidate C (aligned, breadth=10): bit value | Candidate C: Score |
| 8 | 0 | 1205 | 0 | 725 | 0.88200942 | 0 | 0 | 0.88200942 | 0 | 0.88200942 |
| 8 | 1 | 1205 | 0 | 713 | 0.89590570 | 1 | 0 | 0 | 1 | 1.10409430 |
| 8 | .. | 1205 | | | | | | | | |
| 8 | 254 | 1205 | 0 | 605 | 0.99815069 | 1 | 0 | 0 | 0 | 0 |
| 8 | 255 | 1205 | 0 | 699 | 0.91150619 | 1 | 1 | 1.08849381 | 0 | 0 |
| 9 | 256 | 982 | 1 | 567 | 0.91502286 | 0 | n/a | n/a | 0 | 1.08497714 |
| 9 | .. | 982 | | | | | n/a | n/a | | |
| 9 | 511 | 982 | 0 | 491 | 1.00000000 | 0 | n/a | n/a | 1 | 0 |
| 10 | 512 | 387 | 1 | 210 | 0.95714879 | n/a | n/a | n/a | 0 | n/a |
| 10 | .. | 387 | | | | n/a | n/a | n/a | | n/a |
| 10 | 1023 | 387 | 0 | 240 | 0.85494470 | n/a | n/a | n/a | 0 | n/a |
| .. | | | | | | n/a | n/a | n/a | n/a | n/a |
| 15 | 16384 | 2 | 0 | 2 | 0.00000000 | n/a | n/a | n/a | n/a | n/a |
| 15 | .. | 2 | | | | n/a | n/a | n/a | n/a | n/a |
| 15 | 32767 | 2 | 1 | 1 | 1.00000000 | n/a | n/a | n/a | n/a | n/a |
| Average score | | | | | | | | 0.49262581 | | 0.51184681 |
Score
For each candidate, Score at every pos is calculated according to the formula:
Result Type A = Pos entropy
Result Type B = 2 - Pos entropy
Other (non-matching) Result Types = 0
The overall score for the candidate is calculated by taking the average of all pos scores for the candidate, including zero scores and excluding any pos that were ignored (marked n/a in the example).
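The scoring and averaging rules can be sketched as below. The function name `candidate_score` and the list-based representation are assumptions for illustration. The sample data are the rows actually displayed in the example table (pos 0, 1, 254, 255, 256 and 511), treated here as if they were contiguous.

```python
def candidate_score(mcv, entropy, suitor, candidate):
    """Average per-pos score over the pos shared by suitor and candidate."""
    shared = min(len(suitor), len(candidate))   # pos beyond this are n/a
    total = 0.0
    for pos in range(shared):
        if suitor[pos] != candidate[pos]:
            continue                            # non-matching: contributes 0
        if suitor[pos] == mcv[pos]:
            total += entropy[pos]               # Result Type A
        else:
            total += 2 - entropy[pos]           # Result Type B
    return total / shared if shared else 0.0


# Displayed rows of the example table, in order of pos.
mcv = [0, 0, 0, 0, 1, 0]
entropy = [0.88200942, 0.89590570, 0.99815069,
           0.91150619, 0.91502286, 1.00000000]
suitor_a = [0, 1, 1, 1, 0, 0]
candidate_b = [0, 0, 0, 1]            # breadth=8: only the first four rows
candidate_c = [0, 1, 0, 0, 0, 1]

score_b = candidate_score(mcv, entropy, suitor_a, candidate_b)
score_c = candidate_score(mcv, entropy, suitor_a, candidate_c)
```

With these rows, `score_b` and `score_c` agree with the average scores in the table (0.49262581 and 0.51184681) to within rounding, and C ranks above B.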
Culture Traversal
Taking the initial candidate from the culture either at random or based on previous experience, MateSelect traverses the operational culture through X-Parent relations in both directions. It follows a breadcrumb trail of increasing akin score, at each step moving up or down the ancestral tree as indicated. Zero, one or more than one next step may be taken from any previously calculated candidate.
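A single-threaded sketch of such a traversal is given below. It is not the Ingendex implementation. It assumes the X-Parent relations are available as a dictionary mapping each cTag identifier to its neighbours in both directions, and that scores are precomputed in a dictionary. The names `traverse`, `neighbours`, `score` and `max_ctags` are hypothetical; `max_ctags` only loosely plays the role of a scope limit.

```python
def traverse(start, neighbours, score, max_ctags=100):
    """Follow X-Parent links while the score increases; return the best cTag seen."""
    seen = {start}
    frontier = [start]
    best = start
    while frontier and len(seen) < max_ctags:
        current = frontier.pop()
        for nxt in neighbours.get(current, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            if score[nxt] > score[current]:   # only follow increasing trails
                frontier.append(nxt)          # zero, one or several next steps
                if score[nxt] > score[best]:
                    best = nxt
    return best
```

Because only increasing trails are followed, a high-scoring cTag that sits behind a lower-scoring one is never reached. This is one way the search can settle on a local rather than a global optimum.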
Multi-threaded
The traversal process is inherently multi-threaded and Ingendex uses this fact to run some processes in parallel.
Limits of traversal
The parent process collates and maintains a record of all its child processes' results so far, including a high-watermark of the best unexplored result so far.
The MateSelect process operates within parameters given by the application, such as the desired maximum number of cTags in Scope (ScopeMaxcTags) and the maximum distance of any cTag from the Focus. Optionally, a timeout may be passed from the application.
Turbulence
Mateseek does not operate beyond the turbulence threshold, where the signal of breadcrumb trails is considered at significant enough risk of producing incorrect results.
Conclusion
The formula used here seems a useful starting point, but the result quality is not verified. Much further research may be done in this area. Competing selection strategies may become a strong differentiator between different applications.