Specification

Version

This specification document is under version control. The version number of this document is 0.1.

document sources

The HTML master of this specification is here. The PDF translation of this document is here.

A complete list of past and present versions of the specification is here.

The most recently published version of the specification is here.

This specification document extensively references a glossary, which is also under version control. Version 0.1 of the glossary SHOULD be considered as part of this specification.

compatibility

If a system component is said to be compatible with version p.q of Akinity, what is meant is that the component wholly conforms to version p.q of the specification document.

draft / released

Version 0.1 is the first draft of the Akinity specification.

Versions from 0.1 up to, but not including, 1.0 signify draft. Versions 1.0 and higher signify released.

Released versions of Akinity are to be fully compatible with all earlier released versions. The same is not necessarily true for draft versions, nor are released versions necessarily compatible with draft versions.

stable / unstable

Stable versions of the specification have been through a more rigorous quality control process than unstable versions. Unstable versions are marked using version identifiers such as Alpha and Beta.

All draft versions should be considered unstable.

Language

This specification is written in English. Translations should make reference to the English master.

Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, [RFC2119].

Sections

This document specifies how to implement the Akinity system. It is in two sections. The first section specifies Akinity's four main components: cTag; synthesis; meiosis; akin. Implementation of cTag is REQUIRED in every implementation of Akinity. Implementation of any or all of the other three components is OPTIONAL.

The second section covers general matters that are not specifically dealt with in the first section. 

Data Structure

cTag

cTag is the basic component of information exchange. Implementation of cTag is REQUIRED in any application of the Akinity system.

Interoperability

Akinity expects that any application which supports the technology in which a cTag has been implemented should be able to recognise a cTag and process it in accordance with the specified cTag data format. It is therefore crucial that developers implementing Akinity follow this specification when creating or processing cTags in any technology.

Substrate

cTag is a generic data structure, independent of the underlying technology substrate of its implementation.

In principle, a cTag may be implemented in any digital data format that supports text and (some proxy for) binary data. In practice, XML is the only significant implementation of cTag to be released with this version of the specification. JSON has also been implemented, but only as an automatic translation from XML.

It is anticipated that for future versions of this specification, akinity.org will continue to maintain a reference implementation of cTag in one substrate only, which will be XML. The XML implementation will continue to be referenced by the specification. However, the predominance of XML will diminish as additional technologies are deployed in the field. akinity.org maintains a process so that developers implementing cTag in other technology substrates can submit their reference implementation for dissemination and recommendation by akinity.org as a best practice.

Developers planning to implement cTag in XML, JSON or another substrate should study both this specification and its reference implementation in XML. In particular, the XML Schema, which constrains a cTag implemented in XML, should be properly understood. Any best practices published by akinity.org for the applicable substrate should also be studied, where available.

Whenever a non-XML substrate is to be implemented, all cTag functionality available in XML MUST be translated to that substrate.

Reference implementation

This specification relies heavily on the XML reference implementation to specify cTag.

XML Schema

The XML schema should be studied in order to grasp the cTag data structure.

This schema should be used either directly for XML implementations or as a template for implementations in any alternative substrates.

Documentation

XML Schema Documentation for the reference implementation provides a readable web interface to the XSD specification. It includes comments and notes intended to aid developers' understanding.

Schema version

A versioned instance of the XML Schema constrains every valid cTag implemented in XML for that version of Akinity.

Schema applicability

The cTag validity section below applies to every cTag contained in an XML document, whether the cTag is stand-alone, in a discrete list or in a linked list of ancestral relations.

cTag validity

For any cTag implemented in XML to be valid the containing XML document MUST:

  • reference exactly one cTag.xsd file at the akinity.org domain, according to the schema location specified below
  • reference a cTag.xsd file whose version is at least as high as the highest version of any cTag in the document
  • contain only valid XML, that is, XML that is well-formed and valid with respect to the cTag.xsd file

Schema location

The absolute path to the schema file is given as:

http://akinity.org/version/[version#]/cTag/XML/cTag.xsd

where [version#] is a substitution variable indicating a version of Akinity (not necessarily a stable release).

The example below is a valid reference to an XML schema because the path format is valid and the specified version exists:

http://akinity.org/version/0.1/cTag/XML/cTag.xsd

It is RECOMMENDED that an XML cTag reference the XSD using a versioned path (as above).
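As an illustration of the versioned path format above, the following is a minimal sketch only. The function name versionedSchemaUrl is hypothetical, and the sketch assumes version numbers of the plain form p.q; how unstable identifiers such as Alpha or Beta would appear in a path is not covered here.

```javascript
// Hypothetical helper (not part of the specification): builds the versioned
// schema path http://akinity.org/version/[version#]/cTag/XML/cTag.xsd
function versionedSchemaUrl(version) {
  // Assumption: only plain "p.q" version numbers are accepted, e.g. "0.1".
  if (!/^\d+\.\d+$/.test(version)) {
    throw new Error("Expected a version of the form p.q, got: " + version);
  }
  return "http://akinity.org/version/" + version + "/cTag/XML/cTag.xsd";
}

console.log(versionedSchemaUrl("0.1"));
```

Whether a given version actually exists at akinity.org still has to be checked separately; the helper only checks the shape of the path.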

The alternative path below will always contain the latest stable version of the XML schema document. If no released version is available yet, it will contain the latest draft.

This path will work for all stable released versions, but it is not guaranteed to always work with draft versions of the Akinity specification. Any cTag using this path is at risk of breaking compatibility with draft cTags, should full backward compatibility with draft cTags not be maintained in some version of the specification.

http://akinity.org/cTag/XML/cTag.xsd

Sample cTags in XML

Sample cTags in JSON

Binary in XML

According to the XML Schema document (XSD) which constrains this reference application, binary components of an XML cTag may be encoded in either hex or base64 text, so as to conform to XML, whose specification does not support binary data.
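The two text encodings mentioned above can be sketched as follows. This is an illustration only: it uses the Buffer class of Node.js, the four sample bytes are arbitrary, and the XML elements that would carry the encoded text are defined by the XSD and are not shown.

```javascript
// Arbitrary sample bytes standing in for a binary component of a cTag.
const sampleBytes = Buffer.from([0xde, 0xad, 0xbe, 0xef]);

// Encode the same bytes as hex text and as base64 text.
const asHex = sampleBytes.toString("hex");
const asBase64 = sampleBytes.toString("base64");

// Decoding either text form recovers the original bytes.
const fromHex = Buffer.from(asHex, "hex");
const fromBase64 = Buffer.from(asBase64, "base64");

console.log(asHex, asBase64, fromHex.equals(sampleBytes), fromBase64.equals(sampleBytes));
```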

Extended functionality

If an application supports the relevant technical formats, cTags implemented in different substrates must be no less interoperable than cTags implemented in a common substrate. This implies that Akinity must meet a minimum standard of functionality in any substrate and must not exceed the minimum. Minimum functionality is defined by the capabilities of the XML implementation.

If additional functionality is required, Akinity may be extended. Since it is not part of the core specification, extension is the responsibility of an application's designers and not of Akinity.org.

Akinity.org is the master repository for an open reference implementation for any officially supported technology.

Media Type

cTag is denoted by a MIME media type of the form:

application/x-cTag

where the implementation technology, such as XML, can be appended, e.g.

application/x-cTag+xml

This media type is currently informal (indicated by the /x-). However, there is a plan to push for standardisation at an appropriate time.
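A small sketch of how an application might form the media type string is given below. The function name cTagMediaType is hypothetical, and lower-casing the substrate suffix is an assumption based on the +xml example above.

```javascript
// Hypothetical helper: returns the informal cTag media type, optionally
// with the implementation technology appended after a "+".
function cTagMediaType(substrate) {
  const base = "application/x-cTag";
  // Assumption: the suffix is written in lower case, as in "+xml".
  return substrate ? base + "+" + substrate.toLowerCase() : base;
}

console.log(cTagMediaType());      // no substrate given
console.log(cTagMediaType("XML")); // XML substrate
```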

Functions

Conception and Akin are the key functions in Akinity. Respectively they produce and use cTags.

Conception comprises two functions:

Synthesis takes data input and produces a new cTag

Meiosis takes two cTags as input and produces a new cTag

Akin takes two cTags as input and returns a distance measured in entropy.

Optional implementation

Separately, each function is OPTIONAL for any application. However, at least one of these three functions MUST be implemented, since intercourse is defined as an application performing one of the key functions.

Determinate results

All three of the key functions produce results which are determinate. An application can and SHOULD validate its implementation of the functions against the results of the reference implementation for the same inputs.

Reference implementation

In addition to the cTag data structure, all the key functions have been implemented in the Javascript reference application. Application developers are invited to improve their understanding of the specification of these functions by referring to this Javascript implementation, which includes many in-line comments and annotations.

Synthesis

Synthesis is Akinity's way of assimilating digital data in any external (non-Akinity) schema. The synthesis algorithm takes some digital data as input and produces a new synthetic cTag as output.

Rule of consistency

Under the same version of Akinity, the contents of two synthetic cTags, which were independently created by synthesis from exactly the same source data, SHOULD always be consistent. This rule allows common input to synthesis to be recognisable, even where synthesis occurred in different applications in different cultures.

To ensure that the consistency rule always holds, developers of Akinity applications MUST adhere closely to the specification. Developers can validate adherence to the consistency rule by testing their application's synthetic cTags against the sample synthetic cTags provided.

Method of Synthesis

Version 0.1 of Akinity specifies a single method of synthesis (MoS). It is expected that the method will be unchanging. However, until version 1.0 it is possible that MoS could be re-specified.

The MoS in Version 0.1 is SHA-256.
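The sketch below shows SHA-256 applied to some source data using the crypto module of Node.js. It is an illustration, not the reference implementation: the helper names are hypothetical, and the bit order used when unpacking the digest (most significant bit first) is an assumption, since the mapping of the digest into cTag contents is defined elsewhere.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical helper: SHA-256 digest of the source data as lower-case hex.
function sha256Hex(data) {
  return nodeCrypto.createHash("sha256").update(data).digest("hex");
}

// Hypothetical helper: the same digest as an array of 256 bits (0 or 1).
// Assumption: each byte is unpacked most significant bit first.
function sha256Bits(data) {
  const digest = nodeCrypto.createHash("sha256").update(data).digest();
  const bits = [];
  for (const byte of digest) {
    for (let i = 7; i >= 0; i--) bits.push((byte >> i) & 1);
  }
  return bits;
}

// The same source data always gives the same digest (rule of consistency).
console.log(sha256Hex("abc") === sha256Hex("abc"), sha256Bits("abc").length);
```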

Expansion

An important corollary to the Method of Synthesis is Expansion. Expansion is necessary because applications and their users may have different requirements for precision. A cTag of breadth n+1 has twice as many bits in its contents as a cTag of breadth n and is consequently of higher precision. Two synthetic cTags can still be consistent despite having different breadths.

Pattern of Expansion

The Pattern of Expansion (PoE) is a component of MoS. It permits any synthetic cTag of breadth=n to be expanded to breadth=n+x, up to the maximum breadth specified by the version of Akinity.

PoE takes as input an array of n bits, where n is the length of the output of MoS. For output, PoE gives an array of 2^p bits, where p is the required cTag breadth.

In version 0.1 MoS is the SHA-256 algorithm, which produces output of length 256 bits. PoE therefore takes 256 bits and expands these contents to the required breadth.
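The relation between breadth and contents length can be sketched as below. The helper names are hypothetical; the sketch covers only the arithmetic (each extra unit of breadth doubles the number of bits) and says nothing about how PoE derives the additional bits.

```javascript
// Hypothetical helper: number of contents bits for a given breadth.
function contentsLength(breadth) {
  if (!Number.isInteger(breadth) || breadth < 0) {
    throw new Error("breadth must be a non-negative integer");
  }
  return 2 ** breadth;
}

// Hypothetical helper: breadth implied by a contents length that is a power of two.
function breadthForLength(length) {
  const breadth = Math.log2(length);
  if (!Number.isInteger(breadth)) throw new Error("length must be a power of two");
  return breadth;
}

// The 256 bits produced by SHA-256 correspond to breadth 8; breadth 9 doubles that.
console.log(breadthForLength(256), contentsLength(9));
```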

PoE is a deterministic component of the deterministic MoS algorithm. For any organelle there is only one correct output of breadth=p.

Like a daisy chain, PoE takes the (non-cumulative) output of the previous discrete expansion step as input to the current step. Unlike a daisy chain, the PoE algorithm allows a degree of parallelism. Each previous step's output is input to two current steps of expansion. At the discretion of the application designer, these steps may be run in separate process threads.

In version 0.1, PoE is SHA-256. For more details on the specifics of this algorithm, refer to the Pattern of Expansion appendix to this specification.

Meiosis

Meiosis is arguably the most novel algorithm in Akinity. It is the function that simulates sexual reproduction for binary data.

Meiosis takes two cTags as input and it produces a new meiotic cTag as output.

Overview

For an overview of how meiosis works, let's consider just one pos of contents, for example pos #213. The process is identical for every pos, for as many pos as there are in the required breadth.

At pos 213, the X-parent and the Y-parent are each either one or zero. And the Z-child, likewise can have one of two values. This makes eight possible input/output scenarios, which are numbered below. The final column in the table below is the number of similarities scored to both parents under meiosis.

# X Y Z sim
0 0 0 0 2
1 0 0 1 0
2 0 1 0 1
3 0 1 1 1
4 1 0 0 1
5 1 0 1 1
6 1 1 0 0
7 1 1 1 2

If inputs and outputs were all at random we could expect, over many iterations of meiosis and many pos, to get high entropy between all cTags. However, these eight scenarios are not all favoured equally by meiosis. Meiosis produces low entropy between output and each input. It also tries to improve the likelihood of high entropy between future inputs by controlling its own output.

For scenarios where both parents at a pos have the same value, the outcome of meiosis is straight-forward:

Accept: (scenarios 0,7) output is the same as both inputs. Meiosis will always produce these outputs for these inputs.

Reject: (scenarios 1,6) output is different to both inputs. Meiosis will never produce output as in these scenarios.

The other scenarios (2,3,4,5), where the two parents each have a different value, are called ambivalent scenarios. Here, the outcome is determined by another scheme. Before going into this scheme, we can clearly see that whatever the scheme is, Z will inevitably resemble one of its two parents and not the other one, since the parents are different at this pos.

For all eight scenarios, there is a total of 8 * 2 = 16 opportunities for Z to resemble one of its two parents. Under a purely random output, the similarity scores (over many runs) would approach 8/16. But due to its discrimination for some scenarios, meiosis produces a lower entropy outcome than random. In the table below, rejected outcomes are replaced by the equivalent accepted ones.

# X Y Z sim
0 0 0 0 2
1 0 0 1 0   (rejected; replaced by the row below)
1 0 0 0 2
2 0 1 0 1
3 0 1 1 1
4 1 0 0 1
5 1 0 1 1
6 1 1 0 0   (rejected; replaced by the row below)
6 1 1 1 2
7 1 1 1 2

Meiosis scores 12/16 similarity to both parents.
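The two totals (8/16 for random output, 12/16 for meiosis) can be checked by enumerating the eight scenarios. The sketch below is illustrative only; the variable and function names are hypothetical.

```javascript
// sim: how many of the two parents (0, 1 or 2) the child matches at one pos.
function sim(x, y, z) {
  return (z === x ? 1 : 0) + (z === y ? 1 : 0);
}

let randomTotal = 0;
let meioticTotal = 0;
for (const x of [0, 1]) {
  for (const y of [0, 1]) {
    for (const z of [0, 1]) {
      // Purely random output: all eight scenarios count as they stand.
      randomTotal += sim(x, y, z);
      // Meiosis: where the parents agree, the child always takes the shared
      // value (accept) and never the opposite one (reject).
      const meioticZ = x === y ? x : z;
      meioticTotal += sim(x, y, meioticZ);
    }
  }
}

console.log(randomTotal + "/16", meioticTotal + "/16");
```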

It is the gap between 8/16 (max entropy) and 12/16 (meiosis) which the akin function uses to discern apparent kin relation. This works because over many pos, the law of large numbers ensures a very low statistical chance of 12/16 similarity (many times over) being caused by chance alone.

To protect the base case (high entropy when no kin relation), Meiosis must strive to ensure that all input scenarios are equally likely to occur. This is the main purpose of reversal, which keeps the system-wide probability of zero / one at any pos close to maximum entropy.

Reversal

Relative to each other, the two inputs to meiosis must be more similar, less similar or have exactly 50% similarity.

If the inputs are more similar to each other, the initial condition is deemed 'alike'. If the inputs are less similar, initial condition is 'unalike'. If neither 'alike' nor 'unalike' then initial condition is 'neutral'. See the median line in a binomial distribution chart.

e.g. breadth=8 denotes length=256 bits. From 0 to 256, there are 257 possible input similarity conditions. So 127 bits or fewer is 'unalike', 128 bits is 'neutral' and 129 bits or more is 'alike'. 
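The classification of the initial condition can be sketched as follows. This is an illustration under stated assumptions: contents are represented as arrays of 0 and 1, similarity is the count of pos at which the two inputs hold the same value, and the function names are hypothetical.

```javascript
// Counts the pos at which the two inputs hold the same value.
function similarityScore(x, y) {
  if (x.length !== y.length) throw new Error("inputs must have equal length");
  let score = 0;
  for (let pos = 0; pos < x.length; pos++) {
    if (x[pos] === y[pos]) score++;
  }
  return score;
}

// Classifies the inputs relative to exactly 50% similarity.
function initialCondition(x, y) {
  const score = similarityScore(x, y);
  const half = x.length / 2;
  if (score > half) return "alike";
  if (score < half) return "unalike";
  return "neutral";
}

// breadth=8: X is all zeros, Y matches X at exactly 128 of 256 pos.
const demoX = new Array(256).fill(0);
const demoY = demoX.map((bit, pos) => (pos < 128 ? 0 : 1));
console.log(initialCondition(demoX, demoY));
```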

when initial condition is 'unalike' :

One of the inputs (Y-Parent) is reversed. This action is for the benefit of this meiosis process; it does not affect the underlying cTag data.

Output is not treated

when initial condition is 'neutral' :

Inputs are not treated

Output is not treated

when initial condition is 'alike' :

Inputs are not treated. Meiosis proceeds with the inputs in their original polarity.

After meiosis selects values for the new cTag at every pos, those values are all reversed. The new cTag is represented in this reversed polarity.
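The three treatments can be sketched as below. The sketch assumes that reversal means flipping the polarity of every bit (0 becomes 1 and 1 becomes 0); the function names are hypothetical and the condition strings follow the text above.

```javascript
// Assumption: reversal flips the polarity of every bit. A new array is
// returned, so the underlying cTag data is left untouched.
function reversePolarity(bits) {
  return bits.map((bit) => 1 - bit);
}

// 'unalike': the Y-Parent is reversed for this run of meiosis only.
function treatInputs(condition, x, y) {
  return condition === "unalike" ? { x: x, y: reversePolarity(y) } : { x: x, y: y };
}

// 'alike': the values selected for the new cTag are all reversed.
function treatOutput(condition, z) {
  return condition === "alike" ? reversePolarity(z) : z;
}

console.log(treatInputs("unalike", [0, 1], [1, 1]).y, treatOutput("alike", [0, 1]));
```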

Offset

Akinity maintains relatively low entropy between related cTags, even through many generations of meiosis.

Meiosis achieves this low entropy by means of the scheme that is used to decide which of the two parents to resemble when their contents values are different (scenarios 2,3,4,5). This is the purpose of the offset.

The scheme for maintaining offset values through meiosis is specified in the wiki.

 

Align breadths

Meiosis has to operate on input cTags of equal breadth. By default this is the lower breadth of the two parents (X or Y). Meiosis accepts an optional parameter that specifies that the child's breadth should be something other than the default.

The logic for determining the Z child's breadth depends on the value of cTag breadth of parents X and Y and on the parameter specified (if any). The Z child's breadth also depends on whether X and Y are meiotic or synthetic.

In the decision table below min breadth <= a <= b <= c <= max breadth.


meiotic synthetic meiotic synthetic meiotic
param X Y Z

a
b
a
a b
c
a
b
any a
a
a
any b
a

b

a b


a
b b
b
a
c b
c
a
b c

When the required breadth of the Z child is less than the breadth of a parent, that parent is first contracted.

When the required breadth of the Z child is greater than the breadth of a synthetic parent, that parent is first expanded.

 

Organelle data

Meiosis can retain description and URI data from either or neither of the two parents through to the next generation. The child cannot inherit both parents' organelle data.

Meiosis takes two optional parameters.

The Transaction parameter

This is typically set by the user for a single instance of meiosis. If set, the Transaction parameter takes precedence over all others. There is no default for this parameter.

The Application parameter

This is typically set by the user for all instances of meiosis. The Application parameter has valid values "NULL", "X", "X|Y", "Y", "Y|X". If not set, the default value for this parameter is "Y|X".

The decision table below shows whence organelle data in the Z child is derived:

parameter
Transaction  Application  X     Y     Z
text         any          any   any   text
(not set)    "NULL"       any   any   null
(not set)    "X"          text  any   text
(not set)    "X"          null  any   null
(not set)    "X|Y"        text  any   text
(not set)    "X|Y"        null  text  text
(not set)    any          null  null  null
(not set)    "Y"          any   text  text
(not set)    "Y"          any   null  null
(not set)    "Y|X"        any   text  text
(not set)    "Y|X"        text  null  text
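The decision table can be expressed as a small function. This is a sketch under stated assumptions: the function name is hypothetical, organelle data is represented as a string or null, and the Transaction parameter is assumed to carry the text that the Z child receives.

```javascript
// Hypothetical helper: returns the organelle data for the Z child.
// x, y: the parents' organelle data (string, or null when absent).
// application: "NULL", "X", "X|Y", "Y" or "Y|X" (default "Y|X").
// transaction: optional; assumed to be the text to use when set.
function inheritOrganelle(x, y, application, transaction) {
  // If set, the Transaction parameter takes precedence over all others.
  if (transaction !== undefined && transaction !== null) return transaction;
  switch (application === undefined ? "Y|X" : application) {
    case "NULL": return null;
    case "X":    return x;
    case "X|Y":  return x !== null ? x : y; // prefer X, fall back to Y
    case "Y":    return y;
    case "Y|X":  return y !== null ? y : x; // prefer Y, fall back to X
    default:     throw new Error("Invalid Application parameter: " + application);
  }
}

console.log(inheritOrganelle("from X", null));        // default "Y|X" falls back to X
console.log(inheritOrganelle("from X", "from Y", "X")); // X only
```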


Assign depth

By default, the depth of the Z child created by meiosis is the depth of the parent (X or Y) which has the lowest depth, incremented by one. However, a lower depth can be specified in a parameter to meiosis.

In the decision table below 0 <= a <= b < c = maxDepth  (where maxDepth is the maximum depth allowed by the version of Akinity specified as the version required for the child).

Parameter  X    Y    Z
(not set)  a    b    a+1
(not set)  b    a    a+1
b          a    a    a+1
(not set)  c+a  c+b  c
a          b    c    a
a          c    b    a


Ambivalent pos scheme

The assignments of new contents values made in this part of the algorithm represent the section of meiosis that maximises similarity over many generations. It works like two captains alternately picking their preferred football team from a pool of 20 other players. In this case, the player pool comprises all 'ambivalent' pos, i.e. pos which were not effectively the same and therefore were not automatically accepted in an earlier step of meiosis. ('Effectively' refers to contents values after reversal, if applicable).

The two captains are the parents, each of whose objective is to pick so as to preserve the Z child's maximum similarity to that parent's own ancestry.

Pick rounds

The selection process goes in rounds; each round consists of one pick for each parent. There are as many rounds as there are ambivalent scenarios. In every round, each captain picks the member that they prefer from all pos with ambivalent scenarios that have not yet been picked by either parent. There is a slight advantage to the captain who gets to pick first in each round, which is the X-Parent.

Picking rounds continue until all ambivalent pos have been exhausted.

Ranking criteria

To rank their selections, the parents each use available offset values from both parents at each pos. They follow the same procedures, but each according to its own interests first. There are three prioritised sort criteria.

  1. Offset values in 'own' cTag. i.e. X-Parent ranks according to the offset in the X cTag. Highest first. (self-interest)
  2. Reverse offset values in 'other' cTag. i.e. X-Parent ranks according to the offset in the Y cTag. Lowest first. (be nice)
  3. Pos number. Lowest first. (arbitrary, finally deterministic)

The second criterion is used as a tie-breaker for the first. The third as a tie-breaker for the second.

value assignment

When a pos is picked by either parent the Z child acquires the picking parent's boolean value in contents for that pos.
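The pick rounds, ranking criteria and value assignment described above can be sketched as follows. This is an illustration, not the reference implementation: the function names are hypothetical, offsets are assumed to be plain numbers indexed by pos, and bits are assumed to be the contents values after reversal (if applicable).

```javascript
// Builds a comparator for one captain from the three prioritised criteria.
function rankFor(ownOffsets, otherOffsets) {
  return function (p, q) {
    return (
      ownOffsets[q] - ownOffsets[p] ||     // 1. own offset, highest first
      otherOffsets[p] - otherOffsets[q] || // 2. other parent's offset, lowest first
      p - q                                // 3. pos number, lowest first
    );
  };
}

// x, y: { bits: [...], offsets: [...] } (hypothetical representation).
// Returns an object mapping each ambivalent pos to the Z child's value.
function pickAmbivalent(ambivalentPos, x, y) {
  const remaining = new Set(ambivalentPos);
  const captains = [
    { parent: x, compare: rankFor(x.offsets, y.offsets) }, // X-Parent picks first
    { parent: y, compare: rankFor(y.offsets, x.offsets) },
  ];
  const z = {};
  while (remaining.size > 0) {
    for (const captain of captains) {
      if (remaining.size === 0) break;
      const best = Array.from(remaining).sort(captain.compare)[0];
      z[best] = captain.parent.bits[best]; // child takes the picking parent's value
      remaining.delete(best);
    }
  }
  return z;
}

const parentX = { bits: [0, 0, 0, 0], offsets: [1, 5, 5, 2] };
const parentY = { bits: [1, 1, 1, 1], offsets: [3, 4, 2, 9] };
console.log(pickAmbivalent([0, 1, 2, 3], parentX, parentY));
```

In this example X first picks pos 2 (tied with pos 1 on its own offset, but Y's offset is lower at pos 2), Y picks pos 3, X picks pos 1, and Y is left with pos 0.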

fairness

Since each parent ranks all ambivalent pos according to their own preferences there is no incentive for a process to attempt to cheat by, say, artificially increasing offset values in a cTag.

 

Akin

Akin is the function of Akinity which delivers utility to applications. It takes two cTags as input and returns the normalised distance between them, in units of entropy.

Deterministic

Like the other key functions in Akinity, akin is a deterministic function. For any given inputs, akin's output is always the same in any implementation, notwithstanding rounding due to variable precision.

Model

distance = akin (source, target, breadth)

Input types

Akin accepts as input synthetic cTag or meiotic cTag in any combination of source and target. The result of akin is not affected by input type(s).

Input sequence

The two input cTags are known as source and target. Nevertheless, because Akinity has the property of symmetry, the order in which the inputs are presented to akin SHOULD NOT affect the function's result.

Offset

The offset values of the input cTags do not affect the result of akin.

Normalised output

The output of akin is a measure of normalised entropy between the two input cTags. The lower and upper bounds of the scale are respectively zero and one.

Output precision

Akinity specifies a minimum requirement for precision of akin's output. Output must be accurate to at least eight decimal places, with no upper bound on precision.

Trailing zeros imply precision to the end of the sequence.

For instance, if the result of akin to 15 decimal places is 0.998414426937447 then the following are examples of correct output:

0.998414426937447; 0.99841442693745; 0.9984144269374; 0.998414426937; 0.99841442694; 0.9984144269; 0.998414427; 0.99841443

and the following are examples of incorrect output:

0.9984144; 0.998414426937450; 0.9984144269375
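The precision rule can be checked mechanically, as in the sketch below. The function names are hypothetical, values are handled as decimal strings to avoid floating point error, and rounding half up is an assumption (the examples above do not distinguish it from other rounding modes). An output with more decimal places than the reference cannot be verified and is treated as incorrect.

```javascript
// Rounds a non-negative decimal string half up to the given number of places.
// Assumes places is no greater than the number of places in value.
function roundDecimalString(value, places) {
  const parts = value.split(".");
  const whole = parts[0];
  const frac = parts[1] || "";
  let digits = BigInt(whole + frac.slice(0, places));
  if (places < frac.length && frac[places] >= "5") digits += 1n;
  const text = digits.toString().padStart(places + 1, "0");
  return text.slice(0, text.length - places) + "." + text.slice(text.length - places);
}

// True when output has at least eight decimal places and equals the
// reference value rounded to that many places.
function isCorrectOutput(output, reference) {
  const places = (output.split(".")[1] || "").length;
  const referencePlaces = (reference.split(".")[1] || "").length;
  if (places < 8 || places > referencePlaces) return false;
  return output === roundDecimalString(reference, places);
}

const akinResult = "0.998414426937447";
console.log(isCorrectOutput("0.99841443", akinResult));      // correct to 8 places
console.log(isCorrectOutput("0.9984144269375", akinResult)); // wrongly rounded
```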

Parameter

A user may optionally specify one parameter to the akin function:

Breadth

This parameter restricts the range of pos used to calculate the result of akin. Its purpose is to limit the cost of processing in cases where broadly accurate results are sufficient.

The calculation's range is from pos=0 to the number of pos implied by the breadth parameter.

e.g. breadth parameter = 9 implies that the pos range used to calculate the result is 0 to 511 (511 = 2^9 - 1).

The breadth parameter value MUST be an integer between (inclusively) the higher minimum breadth of both input cTag versions and:

  • (if both inputs are meiotic) the lower breadth of both cTags
  • (if one input is meiotic) the breadth of the meiotic cTag
  • (if both inputs are synthetic) the lower maximum breadth of both cTag versions

Default value is the lower breadth of both input cTags.
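The bounds and default of the breadth parameter can be sketched as below. The function names and the field names (kind, breadth, minBreadth, maxBreadth) are hypothetical; minBreadth and maxBreadth stand for the limits set by each cTag's version of Akinity.

```javascript
// Hypothetical helper: valid range and default for the breadth parameter.
function breadthBounds(source, target) {
  const lower = Math.max(source.minBreadth, target.minBreadth);
  const meiotic = [source, target].filter((cTag) => cTag.kind === "meiotic");
  let upper;
  if (meiotic.length === 2) {
    upper = Math.min(source.breadth, target.breadth); // both meiotic
  } else if (meiotic.length === 1) {
    upper = meiotic[0].breadth;                       // one meiotic
  } else {
    upper = Math.min(source.maxBreadth, target.maxBreadth); // both synthetic
  }
  return { lower: lower, upper: upper, defaultBreadth: Math.min(source.breadth, target.breadth) };
}

// Hypothetical helper: the pos range implied by a breadth parameter.
function posRange(breadth) {
  return { first: 0, last: 2 ** breadth - 1 };
}

const meioticTag = { kind: "meiotic", breadth: 10, minBreadth: 8, maxBreadth: 16 };
const syntheticTag = { kind: "synthetic", breadth: 9, minBreadth: 8, maxBreadth: 12 };
console.log(breadthBounds(meioticTag, syntheticTag), posRange(9));
```

The numeric limits in the sample objects are made up for the example and are not taken from any version of the specification.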

Calculation

Akin calculates distance using the standard formula for calculating entropy.

entropy formula

Because we are dealing with the binomial distribution, entropy calculations in Akinity use log base 2.

The Javascript code below is taken from the akin module of the reference application.

            distance = -
              /* polarity0 */     ((    score / length)   * (Math.log((      score / length))   / Math.LN2)  +
              /* polarity1 */     (1 - (score / length)) * (Math.log((1 - (score / length))) / Math.LN2))

Due to reversal, there are two polarities, whose separate results are aggregated in the formula.

distance is the result of the calculation; the measure of entropy in the observed similarity.

score is the number of bits of similarity observed between the contents of the two cTags.

length is the number of pos from which score was derived.
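For readers who want to experiment, the formula can be wrapped in a self-contained function, as sketched below. This is not the reference code: the function names are hypothetical, contents are assumed to be arrays of 0 and 1 of equal length, and the breadth parameter is left out. The sketch also treats 0 * log(0) as 0, so that a score of 0 or of length gives a distance of 0 rather than NaN.

```javascript
// One term of the entropy sum, with the convention 0 * log2(0) = 0.
function entropyTerm(p) {
  return p === 0 ? 0 : p * Math.log2(p);
}

// Hypothetical sketch of akin over two bit arrays of equal length.
function akinDistance(source, target) {
  if (source.length !== target.length || source.length === 0) {
    throw new Error("inputs must be non-empty and of equal length");
  }
  let score = 0; // number of pos at which the contents agree
  for (let pos = 0; pos < source.length; pos++) {
    if (source[pos] === target[pos]) score++;
  }
  const p = score / source.length;
  const sum = entropyTerm(p) + entropyTerm(1 - p);
  return sum === 0 ? 0 : -sum; // avoid returning negative zero
}

console.log(akinDistance([0, 1, 0, 1], [0, 1, 0, 1])); // identical contents
console.log(akinDistance([0, 0, 1, 1], [0, 1, 0, 1])); // half of the pos agree
```

Identical contents give the lower bound of the scale and 50% similarity gives the upper bound; swapping source and target does not change the result.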

 

Section 2


Appendix

The documents linked below are part of this specification:

Specification attachments

General

Glossary

Synthesis

Pattern of Expansion

Data Structure

cTag.xsd

XML Schema Documentation


Supporting documents

The documents linked below support, but are not part of, the specification:

Web app (Javascript / Gears)

Application

Functions in Javascript

Akin

Meiosis

Synthesis

Data Structure

samples