# Computation theory

Function equality: Two functions are intensionally equal ($$f=g$$) iff they have the same definition. Two functions are equivalent (extensionally equal) iff their outputs always match. Formally, $$f \equiv g$$ iff $$f~x = g~x$$ for all $$x$$. We choose extensionalism.

The power of a machine is the set of all functions it can compute. The power of $$m$$ is the set of all $$m$$-computable functions.

A programmable machine has a program in its state; its state is thus of the form $$(p,s) : (P,T)$$ where $$p$$ is the program and $$s$$ is the program state. The program selects which function the machine computes, and thus we can think of $$p$$ as having the type $$T \to T$$. In theory, usually a machine’s program set is countable. A self-modifying machine is a programmable machine that changes its own program while running. A self-improving machine is a self-modifying machine that tries to minimizes $$h$$ at every step: \begin{align*} f~(p,s) &= (p',s') \\ h~s' &\le h~s \end{align*}

Relevant Wikipedia entry: recursive self-improvement.

Matt Mahoney in a model for recursively self-improving program:

We formally define recursively self improving programs and show that such programs exist, but that their algorithmic complexity cannot grow faster than $$O(\log n)$$ given fixed goals.

Machine learning (learning theory) is related to approximation theory Universal function approximation.

Related topics: automatic programming, machine behavior, iterated function.

summary of machine learning

Stephen Omohundro’s the nature of self-improving artificial intelligence

A nondeterministic machine simply generalizes the transition function with a transition relation, but we can turn any relation into a function. We can turn $$R \subseteq A \times B$$ to its image function $$im~R : 2^A \to 2^B$$ where $$im~X = \{ y ~|~ x \in X, (x,y) \in R \}$$; this allows us to transform a nondeterministic machine $$(S,R)$$ into a deterministic machine $$(2^S, im~R)$$. Bounded nondeterminism: image always finite. Bounded nondeterminism should give no additional power over determinism.

A universal machine can compute everything that any other machine can compute. Machine $$m$$ is $$M$$-universal iff the power of $$m$$ is the union of the powers of all machines in $$M$$. If $$m$$ is universal, there are $$\aleph_0$$ $$m$$-computable functions. An $$M$$-universal machine can compute what every machine in $$M$$ can compute.

Lambda-calculus universal machine: A state is $$(m,s)$$ where $$m$$ is the simulated machine and $$s$$ is a state of the simulated machine. We can do this because $$\lvert \mathbb{N}^2 \rvert = \lvert \mathbb{N} \rvert$$. $step~u~(m,s) = (m, step~m~s)$ which is just function application.

There is a bijection between a machine’s step function and $$\mathbb{N}$$?

We define a complexity measure $$c$$ as a function mapping a computation to a number in the following monotonic sense: $p \subseteq q \implies c~p \le c~q,$ which in English means that a path is never less complex than any of its subpath. Such complexity measure also has to satisfy \begin{align*} c~p &\ge 0 \\ c~p + c~q &\le c~(p \cup q) \end{align*}

Two commonly used complexity measures are space complexity and time complexity. If we can define the space usage of each state, then we can define the space complexity of a path as the maximum space usage among all its vertices. The time complexity is simply the length of the path.

Despite having a ‘measure’ in its name, a complexity measure is not a measure.

### Examples

#### A machine that does not do much

\begin{align*} S &= \mathbb{N} \\ f~x &= 0 \end{align*}

#### A machine that does even less

\begin{align*} f~x &= x \end{align*}

#### A smallest machine that never terminates

\begin{align*} S &= \{ 0,1 \} \\ f~0 &= 1 \\ f~1 &= 0 \end{align*}

#### A machine that adds one

\begin{align*} S &= (T,\mathbb{N}) \\ T &= \{Init,Term\} \\ f~(Init,n) &= (Term,N~n) \\ f~(Term,n) &= (Term,n) \end{align*}

#### A machine that determines whether a number is even

\begin{align*} S &= \mathbb{N} \\ f~Z &= Z \\ f~(N~Z) &= N~Z \\ f~(N~(N~n)) &= n \end{align*}

### Contrived ordering of states

Iff the machine is acyclic, we can define this order on $$S$$:

• If $$f~x = y$$ then $$x \le y$$.
• If $$x \le y$$ and $$y \le z$$ then $$x \le z$$.

Started on any state, acyclic machines always eventually terminate.

Started on some state, cyclic machines enter bigger cycle (don’t terminate).

### Complexity measures

$$space~m~x$$ is the maximum state size of $$m$$ while running with input $$x$$. We must define state size function having type $$S \to \mathbb{N}$$ first.

$$time~m~x$$ is the number of steps $$m$$ needs to finish running with input $$x$$.

### Restricting the machine to be more realistic

Let $$S$$ be the type of untyped lambda-calculus terms with currying (each lambda abstraction has exactly one parameter).

Let $$f$$ be the normal-order $$\beta$$-reduction.

A state is an untyped lambda-calculus term. A state is terminal iff it is in head-normal form. A state $$s$$ is terminal iff $$f~s = s$$.

Two states are equal if they are equal modulo $$\alpha$$-conversion.

#### Example input function

$i~n = (\lambda x.x+1)~(u~n)$ where $$u~n$$ is the Church encoding of $$n$$.

How many lambda-calculus terms are there (modulo $$\alpha$$-conversion)?

From the grammar of combinatory calculus: $T ::= x ~|~ P ~|~ (TT)$ we can infer that there are $$\aleph_0$$ terms. (How?)

We can write a program to enumerate all lambda-calculus terms (modulo $$\alpha$$-conversion).

## Nondeterministic machine

The step relation?

The step function is $$f : S \to Set~S$$. May be ‘partial’: $f~x = \{\}$

A state $$s$$ is terminal iff $$f~s = \{s\}$$.

To simplify, define $$g : Set~S \to Set~S$$: $g = liftedStep~m$

g xs =
let
x be an element of xs
in
if f x == {x}
then f x
else union (f x) (xs - {x})

Termination: $$g~u = u$$.

Given a function $$g : A \to A$$, we can construct a free machine called a $$g$$-oracle machine that computes that function. The state space of the machine is $$\{0,1\} \times A$$. The transition function is $$f~(0,x) = (1,g~x)$$ and $$f~(1,x) = (1,x)$$. The input embedding is $$i~x = (0,x)$$. The output extraction is $$o~(a,x) = x$$. Such machine computes $$g$$ in one step.

## Old approach

### Program implementing function on machine

A program $$p$$ implements function $$f:t\to u$$ on machine $$m$$ iff $cod~m~S~u \circ run~m~p \circ cod~m~t~S \equiv f$ which is the same as $run~m~p \equiv cod~m~u~S \circ f \circ cod~m~S~t.$

$$cod~m$$ is the coding bijection of $$m$$. $$cod~m~a~b$$ is a transformation from $$a$$ to $$b$$. $$cod~m$$ must satisfy \begin{align*} cod~m~a~a &\equiv id \\ cod~m~b~a \circ cod~m~a~b &\equiv id \\ cod~m~b~c \circ cod~m~a~b &\equiv cod~m~a~c \\ (cod~m~c~d \circ cod~m~b~c) \circ cod~m~a~c &\equiv cod~m~c~d \circ (cod~m~b~c \circ cod~m~a~c). \end{align*}

### Computable function

A function is computable iff there is a program implementing it. Function $$f$$ is $$m$$-computable iff there is a program implementing $$f$$ on $$m$$.

A machine has $$\aleph_0$$ possible programs.

### Program equivalence

Two programs are equivalent iff they implement two equivalent functions on the same machine. Formally, two programs $$p$$ and $$q$$ are $$m$$-equivalent iff $$run~m~p \equiv run~m~q$$.

Let $$impset~m~f$$ be the set of all programs on $$m$$ implementing $$f$$. This is an equivalence class of programs.

Iff $$p$$ implements $$f$$ and $$p$$ implements $$g$$, then $$f \equiv g$$.

Iff $$f \not\equiv g$$ then $$impset~m~f \cap impset~m~g \equiv \varnothing$$.

Iff $$f \equiv g$$ then $$impset~m~f \equiv impset~m~g$$.

### Complexity measure

A program $$p$$ is a shortest program for $$f$$ on $$m$$ iff $$p$$ implements $$f$$ on $$m$$ and there is no shorter programs implementing $$f$$ on $$m$$.

The $$m$$-algorithmic-complexity of $$f$$ is the length of a shortest program for $$f$$ on $$m$$.

### Long enough shortest program

There is always a function whose shortest program is long enough.

There are $$\aleph_0$$ computable functions. The set of all programs shorter than $$n$$ bits can only represent at most $$2^n - 1$$ functions. There is always a function not in the set.

### Slow enough fastest program

There is always a function whose fastest program is slow enough.

### Fastest program

$$fastest~m~f$$ is the fastest program implementing $$f$$ on $$m$$.

Is there such class whose fastest program is slow?

Is there a constant-time algorithm for every problem?

Is there a problem for which brute force is the fastest possible algorithm?

A problem can be solved infinitely slowly.

Can it be solved infinitely fast?

## Graph theory

### Deterministic machine (partial function approach)

A deterministic machine has state space $$S$$ (with cardinality $$\aleph_0$$) and step function $$f : S \to S$$ (partial).

A state $$s$$ is terminal iff $$f~s$$ is undefined.

#### Computation

A computation is the maximal iteration of the step function on an initial state.

The result of the computation that $$m$$ performs from initial state $$x$$ is $$mit~(step~m)~x$$ where $$mit$$ stands for ‘maximal iteration’.

$$mit~f~x$$ means repeatedly applying the step function to the state until a terminal state is reached: $mit~f~x = \begin{cases} mit~f~(f~x) &\text{if f~x is defined} \\ x &\text{otherwise} \end{cases}$

$$fun~m$$ is the function computed by $$m$$: $mit~(step~m) \circ enc = enc \circ fun~m \\ mit~(step~m) \cong fun~m$ where $$enc : \mathbb{N} \to B^*$$ is the binary encoding.

The function $$m$$ computes is isomorphic to the maximal iteration of the step function of $$m$$.

### Complexity measure

The running time of the machine from initial state $$x$$ is the smallest natural number $$n$$ such that $$f^n~x$$ is terminal.

Let $$I$$ be the set of all initial states.

Let $$T$$ be the set of all terminal states.

Let $$C~x~n$$ be the set of all states reachable in at most $$n$$ steps from $$x$$.

$$\mu~s$$ is the size of state $$s$$.

### Computation graph

A state is a vertex in the computation graph.

The edge $$(u,v)$$ is in the computation graph iff $$f~u = v$$.

Corollary: A terminal state is a vertex with no outgoing edge.

A deterministic computation graph does not have a vertex with more than one outgoing edge.

If the computation graph has no cycles, then the machine always eventually terminates.

### Universal nondeterministic machine

$$g : S \to Set~S$$ is a universal nondeterministic machine.

The running time of a deterministic machine $$f$$ from initial state $$x$$ is the smallest nonnegative $$n$$ such that $$c~f~x~n = c~f~x~(n-1)$$ where \begin{align*} c~f~x~0 &= x \\ c~f~x~n &= c~f~x~(n-1) \triangleright f \end{align*} where \begin{align*} x \triangleright f &= f~x \\ x \triangleright f \triangleright g &= (x \triangleright f) \triangleright g. \end{align*}

Iff there is no such $$n$$, the program doesn’t terminate.

The running time is the smallest $$n$$ such that $$k~g~x~n = k~g~x~(n-1)$$ where \begin{align*} k~g~x~0 &= \{x\} \\ k~g~x~n &= k~g~x~(n-1) >>= g \end{align*} where \begin{align*} x >>= g &= concatMap~g~x \\ x >>= g >>= h &= (x >>= g) >>= h \end{align*}

This is the same but with $$\triangleright$$ replaced by $$>>=$$.

## Natural-number functions

There are $$2^{\aleph_0}$$ functions of type $$\mathbb{N} \to B$$.

There are $$2^{\aleph_0}$$ functions of type $$B^* \to B$$.

There are $$\aleph_0^{\aleph_0}$$ functions of type $$\mathbb{N} \to \mathbb{N}$$.

Both are continuum:

$2^{\aleph_0} = \aleph_0^{\aleph_0} = c$

Therefore there is a bijection between $$B^* \to B$$ (predicates) and $$\mathbb{R}$$ (real numbers).

### Example bijection between $$B^* \to B$$ and $$[0,1]$$

Encode the predicate $$p$$ to the real number $$r = (0 . r_0 r_1 r_2 \ldots)_2$$: $r_n = p~(b~n)$ where $$b~n$$ is the binary encoding of $$n$$. This transformation is similar to Gödel numbering.

A computable real number corresponds to a decidable language.

#### Examples

The always-false predicate corresponds with the number $$(.000\ldots)_2 = 0$$.

The always-true predicate corresponds with the number $$(.111\ldots)_2 = 1$$.

The predicate of whether a number is zero corresponds with the number $$(.100000\ldots)_2 = 1/2$$.

The predicate of whether a number is divisible by 2 corresponds with the number $$(.101010\ldots)_2 = 2/3$$.

The predicate of whether bit 1 of a string is set corresponds with the number $$(.0011~0011~0011\ldots)_2 = 3/16$$.

A divisibility-test becomes a fraction (but not the other way).

The predicate of whether a number is prime corresponds with the number $$(.0011~0101~0001~0100~0101~0001~0000~0101\ldots)_2$$.

$$1 - r$$ corresponds to $$not \circ p$$.

## Nondeterministic lambda calculus

### Dependently-typed amb operator

Nondeterministic lambda-calculus machine can use McCarthy’s ambiguous operator.

$amb~t : t$

Given a type $$t$$, $$amb$$ picks any inhabitant of $$t$$ that makes the current computation succeed.

$(\lambda x. eq~0~(x^2 + 2x + 1))~(amb~\mathbb{R})$

$eq~0~(at~[3,2,1,0]~(amb~\mathbb{N}))$

$(\lambda x y z. (x \wedge y) \vee z)~(amb~B)~(amb~B)~(amb~B)$

### Polymorphic amb operator

Lambda calculus with quantified type system.

$pam : \forall~(\lambda t. t) \\ pam : \forall~id \\ id : \forall~(\lambda t. t \to t)$

The the following $$pam$$ specializes to $$Bool$$. $(\lambda x y z. (x \wedge y) \vee z)~pam~pam~pam \to true \\ (\lambda x . (x \wedge \neg x))~pam \to \bot$

$$pam$$ can be more complex than that. $(\lambda f x. eq~x~(f~x))~pam_f~pam_x \to true \\ pam_f : a \to a \\ pam_x : a$

Oracle for satisfiability problem $$p : A \to \{0,1\}$$: $p~pam$

$$\beta^{pam}$$-reduction: If $$p : a \to B$$, rewrite $$p~pam$$ to $$p~x$$ such that $$p~x$$ is true.

### Satisfier operator

$$\beta^{sat}$$-reduction reduces $$sat~p$$ to any $$x$$ such that $$p~x$$ is true.

$sat~(\lambda(x,y). x \wedge y) \to (true,true) \\ sat~(\lambda(x,y). x \vee y) \to (true,false) \\ sat~(\lambda(x,y). x \vee y) \to (false,true) \\ sat~(\lambda x. x : \mathbb{R} \wedge x^2 - 1 = 0) \to 1 \\ sat~(\lambda x. x : \mathbb{R} \wedge x^2 - 1 = 0) \to -1 \newcommand\Nat{\mathsf{Nat}} \newcommand\Bit{\mathsf{Bit}}$

SAT-LEN (limited-length satisfiability): given $$n : \Nat$$ and $$f : \Bit^* \to \Bit$$, find $$x : \Bit^n$$ such that $$f~x = 1$$.

## The questions

Can nondetism asymptotically quicken a computation?

When can it?

Why/how can it?