Computation theory

Function equality: Two functions are intensionally equal (\(f=g\)) iff they have the same definition. Two functions are equivalent (extensionally equal) iff their outputs always match. Formally, \(f \equiv g\) iff \(f~x = g~x\) for all \(x\). We choose extensionalism.

The power of a machine is the set of all functions it can compute. The power of \(m\) is the set of all \(m\)-computable functions.

A programmable machine has a program in its state; its state is thus of the form \((p,s) : (P,T)\) where \(p\) is the program and \(s\) is the program state. The program selects which function the machine computes, and thus we can think of \(p\) as having the type \(T \to T\). In theory, usually a machine’s program set is countable. A self-modifying machine is a programmable machine that changes its own program while running. A self-improving machine is a self-modifying machine that tries to minimizes \(h\) at every step: \[ \begin{align*} f~(p,s) &= (p',s') \\ h~s' &\le h~s \end{align*} \]

Relevant Wikipedia entry: recursive self-improvement.

Matt Mahoney in a model for recursively self-improving program:

We formally define recursively self improving programs and show that such programs exist, but that their algorithmic complexity cannot grow faster than \(O(\log n)\) given fixed goals.

Machine learning (learning theory) is related to approximation theory Universal function approximation.

Related topics: automatic programming, machine behavior, iterated function.

summary of machine learning

Stephen Omohundro’s the nature of self-improving artificial intelligence

A nondeterministic machine simply generalizes the transition function with a transition relation, but we can turn any relation into a function. We can turn \(R \subseteq A \times B\) to its image function \(im~R : 2^A \to 2^B\) where \(im~X = \{ y ~|~ x \in X, (x,y) \in R \}\); this allows us to transform a nondeterministic machine \((S,R)\) into a deterministic machine \((2^S, im~R)\). Bounded nondeterminism: image always finite. Bounded nondeterminism should give no additional power over determinism.

A universal machine can compute everything that any other machine can compute. Machine \(m\) is \(M\)-universal iff the power of \(m\) is the union of the powers of all machines in \(M\). If \(m\) is universal, there are \(\aleph_0\) \(m\)-computable functions. An \(M\)-universal machine can compute what every machine in \(M\) can compute.

Lambda-calculus universal machine: A state is \((m,s)\) where \(m\) is the simulated machine and \(s\) is a state of the simulated machine. We can do this because \(\lvert \mathbb{N}^2 \rvert = \lvert \mathbb{N} \rvert\). \[ step~u~(m,s) = (m, step~m~s) \] which is just function application.

There is a bijection between a machine’s step function and \(\mathbb{N}\)?

We define a complexity measure \(c\) as a function mapping a computation to a number in the following monotonic sense: \[ p \subseteq q \implies c~p \le c~q, \] which in English means that a path is never less complex than any of its subpath. Such complexity measure also has to satisfy \[ \begin{align*} c~p &\ge 0 \\ c~p + c~q &\le c~(p \cup q) \end{align*} \]

Two commonly used complexity measures are space complexity and time complexity. If we can define the space usage of each state, then we can define the space complexity of a path as the maximum space usage among all its vertices. The time complexity is simply the length of the path.

Despite having a ‘measure’ in its name, a complexity measure is not a measure.


A machine that does not do much

\[ \begin{align*} S &= \mathbb{N} \\ f~x &= 0 \end{align*} \]

A machine that does even less

\[ \begin{align*} f~x &= x \end{align*} \]

A smallest machine that never terminates

\[ \begin{align*} S &= \{ 0,1 \} \\ f~0 &= 1 \\ f~1 &= 0 \end{align*} \]

A machine that adds one

\[ \begin{align*} S &= (T,\mathbb{N}) \\ T &= \{Init,Term\} \\ f~(Init,n) &= (Term,N~n) \\ f~(Term,n) &= (Term,n) \end{align*} \]

A machine that determines whether a number is even

\[ \begin{align*} S &= \mathbb{N} \\ f~Z &= Z \\ f~(N~Z) &= N~Z \\ f~(N~(N~n)) &= n \end{align*} \]

Contrived ordering of states

Iff the machine is acyclic, we can define this order on \(S\):

Started on any state, acyclic machines always eventually terminate.

Started on some state, cyclic machines enter bigger cycle (don’t terminate).

Complexity measures

\(space~m~x\) is the maximum state size of \(m\) while running with input \(x\). We must define state size function having type \(S \to \mathbb{N}\) first.

\(time~m~x\) is the number of steps \(m\) needs to finish running with input \(x\).

Restricting the machine to be more realistic

Let \(S\) be the type of untyped lambda-calculus terms with currying (each lambda abstraction has exactly one parameter).

Let \(f\) be the normal-order \(\beta\)-reduction.

A state is an untyped lambda-calculus term. A state is terminal iff it is in head-normal form. A state \(s\) is terminal iff \(f~s = s\).

Two states are equal if they are equal modulo \(\alpha\)-conversion.

Example input function

\[ i~n = (\lambda x.x+1)~(u~n) \] where \(u~n\) is the Church encoding of \(n\).

How many lambda-calculus terms are there (modulo \(\alpha\)-conversion)?

From the grammar of combinatory calculus: \[ T ::= x ~|~ P ~|~ (TT) \] we can infer that there are \(\aleph_0\) terms. (How?)

We can write a program to enumerate all lambda-calculus terms (modulo \(\alpha\)-conversion).

Nondeterministic machine

The step relation?

The step function is \(f : S \to Set~S\). May be ‘partial’: \[ f~x = \{\} \]

A state \(s\) is terminal iff \(f~s = \{s\}\).

To simplify, define \(g : Set~S \to Set~S\): \[ g = liftedStep~m \]

g xs =
        x be an element of xs
        if f x == {x}
            then f x
            else union (f x) (xs - {x})

Termination: \(g~u = u\).

Given a function \(g : A \to A\), we can construct a free machine called a \(g\)-oracle machine that computes that function. The state space of the machine is \(\{0,1\} \times A\). The transition function is \(f~(0,x) = (1,g~x)\) and \(f~(1,x) = (1,x)\). The input embedding is \(i~x = (0,x)\). The output extraction is \(o~(a,x) = x\). Such machine computes \(g\) in one step.

Old approach

Program implementing function on machine

A program \(p\) implements function \(f:t\to u\) on machine \(m\) iff \[ cod~m~S~u \circ run~m~p \circ cod~m~t~S \equiv f \] which is the same as \[ run~m~p \equiv cod~m~u~S \circ f \circ cod~m~S~t. \]

\(cod~m\) is the coding bijection of \(m\). \(cod~m~a~b\) is a transformation from \(a\) to \(b\). \(cod~m\) must satisfy \[ \begin{align*} cod~m~a~a &\equiv id \\ cod~m~b~a \circ cod~m~a~b &\equiv id \\ cod~m~b~c \circ cod~m~a~b &\equiv cod~m~a~c \\ (cod~m~c~d \circ cod~m~b~c) \circ cod~m~a~c &\equiv cod~m~c~d \circ (cod~m~b~c \circ cod~m~a~c). \end{align*} \]

Computable function

A function is computable iff there is a program implementing it. Function \(f\) is \(m\)-computable iff there is a program implementing \(f\) on \(m\).

A machine has \(\aleph_0\) possible programs.

Program equivalence

Two programs are equivalent iff they implement two equivalent functions on the same machine. Formally, two programs \(p\) and \(q\) are \(m\)-equivalent iff \(run~m~p \equiv run~m~q\).

Let \(impset~m~f\) be the set of all programs on \(m\) implementing \(f\). This is an equivalence class of programs.

Iff \(p\) implements \(f\) and \(p\) implements \(g\), then \(f \equiv g\).

Iff \(f \not\equiv g\) then \(impset~m~f \cap impset~m~g \equiv \varnothing\).

Iff \(f \equiv g\) then \(impset~m~f \equiv impset~m~g\).

Complexity measure

A program \(p\) is a shortest program for \(f\) on \(m\) iff \(p\) implements \(f\) on \(m\) and there is no shorter programs implementing \(f\) on \(m\).

The \(m\)-algorithmic-complexity of \(f\) is the length of a shortest program for \(f\) on \(m\).

Long enough shortest program

There is always a function whose shortest program is long enough.

There are \(\aleph_0\) computable functions. The set of all programs shorter than \(n\) bits can only represent at most \(2^n - 1\) functions. There is always a function not in the set.

Slow enough fastest program

There is always a function whose fastest program is slow enough.

Fastest program

\(fastest~m~f\) is the fastest program implementing \(f\) on \(m\).

Is there such class whose fastest program is slow?

Is there a constant-time algorithm for every problem?

Is there a problem for which brute force is the fastest possible algorithm?

A problem can be solved infinitely slowly.

Can it be solved infinitely fast?

Graph theory

Deterministic machine (partial function approach)

A deterministic machine has state space \(S\) (with cardinality \(\aleph_0\)) and step function \(f : S \to S\) (partial).

A state \(s\) is terminal iff \(f~s\) is undefined.


A computation is the maximal iteration of the step function on an initial state.

The result of the computation that \(m\) performs from initial state \(x\) is \(mit~(step~m)~x\) where \(mit\) stands for ‘maximal iteration’.

\(mit~f~x\) means repeatedly applying the step function to the state until a terminal state is reached: \[ mit~f~x = \begin{cases} mit~f~(f~x) &\text{if $f~x$ is defined} \\ x &\text{otherwise} \end{cases} \]

\(fun~m\) is the function computed by \(m\): \[ mit~(step~m) \circ enc = enc \circ fun~m \\ mit~(step~m) \cong fun~m \] where \(enc : \mathbb{N} \to B^*\) is the binary encoding.

The function \(m\) computes is isomorphic to the maximal iteration of the step function of \(m\).

Complexity measure

The running time of the machine from initial state \(x\) is the smallest natural number \(n\) such that \(f^n~x\) is terminal.

Let \(I\) be the set of all initial states.

Let \(T\) be the set of all terminal states.

Let \(C~x~n\) be the set of all states reachable in at most \(n\) steps from \(x\).

\(\mu~s\) is the size of state \(s\).

Computation graph

A state is a vertex in the computation graph.

The edge \((u,v)\) is in the computation graph iff \(f~u = v\).

Corollary: A terminal state is a vertex with no outgoing edge.

A deterministic computation graph does not have a vertex with more than one outgoing edge.

If the computation graph has no cycles, then the machine always eventually terminates.

Universal nondeterministic machine

\(g : S \to Set~S\) is a universal nondeterministic machine.

The running time of a deterministic machine \(f\) from initial state \(x\) is the smallest nonnegative \(n\) such that \(c~f~x~n = c~f~x~(n-1)\) where \[ \begin{align*} c~f~x~0 &= x \\ c~f~x~n &= c~f~x~(n-1) \triangleright f \end{align*} \] where \[ \begin{align*} x \triangleright f &= f~x \\ x \triangleright f \triangleright g &= (x \triangleright f) \triangleright g. \end{align*} \]

Iff there is no such \(n\), the program doesn’t terminate.

The running time is the smallest \(n\) such that \(k~g~x~n = k~g~x~(n-1)\) where \[ \begin{align*} k~g~x~0 &= \{x\} \\ k~g~x~n &= k~g~x~(n-1) >>= g \end{align*} \] where \[ \begin{align*} x >>= g &= concatMap~g~x \\ x >>= g >>= h &= (x >>= g) >>= h \end{align*} \]

This is the same but with \(\triangleright\) replaced by \(>>=\).

Natural-number functions

There are \(2^{\aleph_0}\) functions of type \(\mathbb{N} \to B\).

There are \(2^{\aleph_0}\) functions of type \(B^* \to B\).

There are \(\aleph_0^{\aleph_0}\) functions of type \(\mathbb{N} \to \mathbb{N}\).

Both are continuum:

\[ 2^{\aleph_0} = \aleph_0^{\aleph_0} = c \]

Therefore there is a bijection between \(B^* \to B\) (predicates) and \(\mathbb{R}\) (real numbers).

Example bijection between \(B^* \to B\) and \([0,1]\)

Encode the predicate \(p\) to the real number \(r = (0 . r_0 r_1 r_2 \ldots)_2\): \[ r_n = p~(b~n) \] where \(b~n\) is the binary encoding of \(n\). This transformation is similar to Gödel numbering.

A computable real number corresponds to a decidable language.


The always-false predicate corresponds with the number \((.000\ldots)_2 = 0\).

The always-true predicate corresponds with the number \((.111\ldots)_2 = 1\).

The predicate of whether a number is zero corresponds with the number \((.100000\ldots)_2 = 1/2\).

The predicate of whether a number is divisible by 2 corresponds with the number \((.101010\ldots)_2 = 2/3\).

The predicate of whether bit 1 of a string is set corresponds with the number \((.0011~0011~0011\ldots)_2 = 3/16\).

A divisibility-test becomes a fraction (but not the other way).

The predicate of whether a number is prime corresponds with the number \((.0011~0101~0001~0100~0101~0001~0000~0101\ldots)_2\).

\(1 - r\) corresponds to \(not \circ p\).

Nondeterministic lambda calculus

Dependently-typed amb operator

Nondeterministic lambda-calculus machine can use McCarthy’s ambiguous operator.

\[ amb~t : t \]

Given a type \(t\), \(amb\) picks any inhabitant of \(t\) that makes the current computation succeed.

\[ (\lambda x. eq~0~(x^2 + 2x + 1))~(amb~\mathbb{R}) \]

\[ eq~0~(at~[3,2,1,0]~(amb~\mathbb{N})) \]

\[ (\lambda x y z. (x \wedge y) \vee z)~(amb~B)~(amb~B)~(amb~B) \]

Polymorphic amb operator

Lambda calculus with quantified type system.

\[ pam : \forall~(\lambda t. t) \\ pam : \forall~id \\ id : \forall~(\lambda t. t \to t) \]

The the following \(pam\) specializes to \(Bool\). \[ (\lambda x y z. (x \wedge y) \vee z)~pam~pam~pam \to true \\ (\lambda x . (x \wedge \neg x))~pam \to \bot \]

\(pam\) can be more complex than that. \[ (\lambda f x. eq~x~(f~x))~pam_f~pam_x \to true \\ pam_f : a \to a \\ pam_x : a \]

Oracle for satisfiability problem \(p : A \to \{0,1\}\): \[ p~pam \]

\(\beta^{pam}\)-reduction: If \(p : a \to B\), rewrite \(p~pam\) to \(p~x\) such that \(p~x\) is true.

Satisfier operator

\(\beta^{sat}\)-reduction reduces \(sat~p\) to any \(x\) such that \(p~x\) is true.

\[ sat~(\lambda(x,y). x \wedge y) \to (true,true) \\ sat~(\lambda(x,y). x \vee y) \to (true,false) \\ sat~(\lambda(x,y). x \vee y) \to (false,true) \\ sat~(\lambda x. x : \mathbb{R} \wedge x^2 - 1 = 0) \to 1 \\ sat~(\lambda x. x : \mathbb{R} \wedge x^2 - 1 = 0) \to -1 \newcommand\Nat{\mathsf{Nat}} \newcommand\Bit{\mathsf{Bit}} \]

SAT-LEN (limited-length satisfiability): given \(n : \Nat\) and \(f : \Bit^* \to \Bit\), find \(x : \Bit^n\) such that \(f~x = 1\).

The questions

Can nondetism asymptotically quicken a computation?

When can it?

Why/how can it?