FOM: Canonical codings

JoeShipman@aol.com JoeShipman at aol.com
Mon Sep 3 03:18:29 EDT 2001


Axel Boldt's question about canonical non-computable numbers raises some 
interesting issues.  Some codings involve arbitrary choices, others seem to 
be more canonical, but precise definitions are lacking.

The following bijective codings of sets of finite objects seem to be arguably 
"canonical":

Successor:{non-negative integers}-->{positive integers}
Successor(x)=x+1

Fold:{integers}-->{positive integers}
Fold(x)=0.5 + |2x-0.5| (this "counts" the integers in the sequence 
0,1,-1,2,-2,3,-3,...; the only other candidate for canonicity counts 
0,-1,1,-2,2,-3,3,...)

Godelnumber:{eventually-zero infinite sequences of non-negative integers}-->
{positive integers} 
Godelnumber((x1,x2,x3,...,x_n,0,0,0,0,.....))=(p1^x1)*(p2^x2)*...*(p_n^x_n) 
where p1,p2,... is the sequence of primes 2,3,5,...

Basetwo:{finite sets of non-negative integers}-->{non-negative integers}
Basetwo({x1,x2,...,x_n}) = 2^x1+2^x2+...+2^x_n (the empty set goes to an 
empty sum which is 0, so combined with Successor this counts finite sets of 
non-negative integers {},{0},{1},{0,1},{2},{0,2},{1,2},{0,1,2},{3},...

HF:{hereditarily finite sets}-->{non-negative integers}
HF(X)=sum over x in X of 2^HF(x)  
This counts V_omega as 
{},
{{}},
{{{}}},
{{},{{}}},
{{{{}}}},
{{},{{{}}}},
{{{}},{{{}}}},
{{},{{}},{{{}}}}},
{{{},{{}}}},
etc.

What is missing?  We might like a canonical enumeration of rationals and of 
ordered pairs of non-negative integers.  For pairs there are a number of ways 
to do it, but none seems to be obviously "most canonical".  My candidate 
would be

OPair:{ordered pairs of non-negative integers}-->non-negative integers
OPair((x,y))=x + (x+y)(x+y+1)/2
This counts pairs 
(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0),(0,4),...

To canonically enumerate the rationals we have a problem.  The usual way to 
do it is to just enumerate positive rationals (x+1)/(y+1) by enumerating the 
corresponding pairs (x,y), and discarding those expressions not in lowest 
terms.  But this "generate and test" procedure is not naturally bijective, in 
particular the inverse function is no longer straightforward to calculate.

A somewhat better way would be to use Farey fractions and enumerate rationals 
between 0/1 and 1/1 as follows: 
Preparatory stage: 0/1,1/1                                  
Stage 0: 1/2, 
Stage 1: 1/3,2/3,
Stage 2: 1/4,2/5,3/5,3/4,
Stage 3: 1/5,2/7,3/8,3/7,4/7,5/8,5/7,4/5,
Stage 4: 1/6,...

The pattern here is that at the nth stage we enumerate the 2^n fractions 
formed by taking "Farey averages" of adjacent fractions from the previous 
stages.  The "Farey average" of a/b and c/d is (a+c)/(b+d).  It is easy to 
show this is a bijection of the fractions generated starting with stage 0 and 
the rationals between 0 and 1; this can be composed with the "Fold" and 
"Successor" transformations twice to add 1/1, get all positive rationals, add 
0/1, and get all rationals.

But this still doesn't seem compellingly non-arbitrary.  Any better 
suggestions?

We've seen that we can get a canonical coding of Diophantine equations (with 
non-negative integer coefficients) by using the Godelnumber coding twice, 
once for the sequence of coefficients in a polynomial and once for the 
sequence of exponents in a monomial.  

Word problems are harder to do canonically because of the asymmetry involved 
(you have to give the generators and relations and separately a word to test 
for triviality).  Fortunately, for both groups and semigroups the triviality 
problem is also unsolvable and instances can be represented more 
symmetrically.

Triviality problem for semigroups
Instance: a finite set of relations (pairs of words in a finite alphabet 
which are declared to be equal; assume that every letter in the alphabet 
appears in at least one relation so that we don't need to specify generators 
separately--we can do this because if some generator isn't involved in any 
relations the semigroup is nontrivial). 
Question: whether the relations imply that all the letters (1-letter words) 
in the alphabet are equal to the identity element (corresponding to the empty 
word).

The problem with canonically enumerating instances here is that we don't have 
such a great pairing function, and a relation needs to be a pair of words 
because we don't have inverses available.

Triviality problem for groups
Instance: a finite set W={w1,w2,...,w_n} of words in the doubly infinite 
alphabet x_1,x_-1,x_2,x_-2,x_3,x_-3,....
Question: Whether W generates a subgroup of the free group on 
{x_1,x_2,x_3,....} that is equal to the subgroup generated by exactly those 
generators x_i or x_-i which appear in some word in W.  (As before, we don't 
care about generators that aren't involved in any relation word because 
they'd prevent triviality.)

This has some hope of being canonical.  We can use Fold to make the alphabet 
singly infinite and Basetwo to enumerate finite sets; ***the only problem is 
canonically enumerating the set of words in a denumerably infinite 
alphabet***.

Here is one dopey way: enumerate all words beginning with x in the 2-letter 
alphabet {x,'} and then reinterpret such a word as a word in the denumerably 
infinite alphabet {x, x', x'', x'''',x'''',...} in the obvious way, tacking 
the empty word on the front. 

Can anyone suggest a better way to do this?  

(Yes, I know that there is a relatively small integer n such that the word 
problem for groups with n-generator presentations is already unsolvable, and 
it is easy to enumerate words in a finite alphabet, but the choice of n is 
arbitrary until we can find the minimal such n, and we are nowhere near able 
to do that!)

-- Joe Shipman




More information about the FOM mailing list