Church

Church is a new language designed in the spirit of the Java platform; its aim is to copy the many good things that Java brought to the development community but to address Java's perceived failure to provide a ubiquitous model for computation on the internet.

Introduction

Despite Java's many extraordinary achievements in both the desktop and server-side domains, it has not had the success that many of us had hoped for in the way it is used on the internet.

The web has the wonderful property that one can exchange URLs knowing that what others see in their browsers will be what we see on ours. It is easy to forget what a remarkable achievement this was. To have a consistent presentation of web pages that was effective on different pieces of hardware, running different operating systems, running different browsers seems both an obvious and necessary first step to doing almost anything now. Unfortunately the necessity for doing this with computer programs does not seem to enjoy the same consensus.

Java had all of the difficult pieces of plumbing required to make this a reality built-in from the start. That said, neither Applets or JWS applications appear to have done enough to make client-side Java useful to the average internet user. Is a global deployment of such powerful technology possible? The success of Macromedia's Flash plug-in gives us hope that it is. Flash is fully capable of performing arbitrary computations and yet quickly became ubiquitous despite having to face the same technical and anthropological hurdles that appear to have stunted client-side Java.

To many of us, the anchors preventing Java's adoption as a general purpose computational engine for web clients seem frustratingly clear: the JRE is too big, it changes too much and it changes too often.

Java's SDK is currently released roughly once every 18 months. The first JRE weighed in at about 1Mb in 1996 - when internet connections provided an average bandwidth of about 4Kbytes/s - leaving a download of the 1Mb JRE to take about 40 minutes. Adoption of broadband services is raising the global average of network bandwidth, and today the average bandwidth is now at least ten times what it was ten years ago. Interestingly though, the download time for today's 15Mb JRE is roughly what it was in 1996. Nielson's 'law' claims that average bandwidth rises by about 50% a year and, if the size of the JRE continues to increase at a similar rate, half hour downloads of the Java VM will be a constant of its deployment model. This level of inconvenience limits the penetration of up-to-date VMs to machines that are owned by Java enthusiasts - and, if these quasi-laws hold in the future, things seem destined to stay this way.

So JREs seem destined to be different - can they be compatible?

While enormous effort is made by Sun and others to make their revisions backward-compatible, true backward-compatibility has never really been achieved between any two major revisions. Ten years on, it seems sensible to consider the possibility that this is not actually achievable for something as large as the JRE.

Can Java be fixed?

Yes! Java's VM has all the properties we want - being both small and stable enough. One proposal for fixing Java runs as follows: instead of bundling a useful set of libraries with the JRE, remove all the libraries and retain only the core functionality that is required to load code over the network and run it. Prototypical systems like this were developed by the author and others at Sun around 2000 but they did not get traction at Sun the time. Unfortunately, third parties that attempt to provide this infrastructure for Java would be in breach of Sun's distribution licence since it (quite reasonably) precludes the disaggregation of their distribution.

[Since writing the above paragraph I have been tipped off of a new effort at Sun to create a browser edition of the JRE. This is a good thing - though initial results seem to show that the 'kernel VM' which is now 2.5Mb - roughly 2.5 times larger than it was when we looked at it in 2000. (This is due to the independent expansion of the JRE - the techniques being used to produce the 'minimal VM seem similar to the ones we used before). And as one of the contributors to this blog points out, this is about 500 times the size of the 8Kb Fourth VMs of three decades ago. All this makes me tempted to propose my own law: that bits of software get 50% bigger every year that people fiddle with them.]

If we accept the lines of reasoning above it seems likely that Java's current deployment model will prevent the Java VM from becoming a ubiquitous computing engine for internet clients. Further, that any reasonable changes made by Sun, either to the VM or its deployment model, will have to balance new priorities with the need to support the millions of people who currently rely on Java in the areas where it is both effective and extremely successful. For all its disadvantages, a fresh start has one significant upside: that a new set of priorities may be pursued more freely in an environment that is not yet shackled by success.

Summary

Church's central aim then is to do for programs what the web has done for documents.

We summarise the high-level design goals of the Church VM as follows:
  1. Small and stable
  2. Fast
  3. Broad application domain
  4. Robust
  5. Easy to use

Contents

Features

Very small VM that doesn't need to be updated (often)

Size
Church provides a platform-neutral virtual machine that is very small (<100k); only those features that are required to load external resources, compile and run them are provided in the VM.

Like the Java VM there are 'class libraries' that provide support for common services and the manipulation of common data structures - but these are not distributed with the VM and are not even centrally managed. Anyone can write a List interface, XML reader or even GUI toolkit and seek to make it the standard. 'Standard' packages are the ones that most people use - the ones you are likely to find with a search engine.

Wikipedia is fast becoming a standard place to look for definitions of things, yet its position was not prescribed by those who created the infrastructure for the web. Instead it has earned its place on merit, by adopting an extremely effective method of construction and maintenance. We aim to copy this de-centralised model rather than the traditional model of platform evolution.
Stability
Robustness is dealt with in a different section, by stability we mean that we aim to create a VM specification that can quickly become final - I.e. reach a state where it doesn't need to be changed. Making the VM small, as lauded above, is probably the most important step in realising this goal but for the indispensable kernel of the VM we shoot for purity as the easiest way to avoid the ambiguity and arbitrariness that are the primary sources of specification churn.

Before virtual machines there were real machines, and before real machines there was a beautiful general theory of computation - called the Lambda Calculus. The Lambda Calculus, invented around 1936 by the American mathematician Alonzo Church, is widely considered to be one of the purest, simplest and most powerful systems of computation that have ever been created. Two reduction rules describe all of what may be computed using the Lambda Calculus and yet Church's Lambda Calculus can be shown to be a Turing-equivalent - a system that is capable of computing anything that can be computed.

For the author, the Lambda Calculus has an additional feather in its cap that is harder to define. Lambda Calculus provides a framework that promotes a concise representation of ideas - in a way that today's languages do not. The Object-Oriented paradigm, for example, is widely touted as providing abstractions that are not easy to produce in procedural languages like C. It took decades for procedural languages to acquire the features that embrace Object-Oriented development where, as we will see later, the Lambda Calculus could be argued to have been Object-Oriented from the start. The use of the term Object-Oriented is something of a stretch here, though, incredibly, it is the Lambda Calculus that is denigrated in this loose language as Lambda Calculus provides us with abstractions that are a strict superset of those that can be considered Object-Oriented. In Object-Oriented languages some ideas come up over and over again and yet there is no way to factor the repeated constructs into one place. Instead, ideas that cannot be formally abstracted in the O-O paradigm are baptised patterns and taught to developers who are then expected to in-line them into their software by hand. In Church we aim to stop this nonsense, by allowing deep abstractions to be written in the language and interpreted by the VM.

Performance identical to C

Second only to being easily moved around, performance of the VM is paramount. Performance is an entropy-like quantity that only ever gets worse as abstractions are laid on top of each other. Traditionally, fast languages with poor abstractions have won the day over more elegant languages that are slower and it is interesting to consider why this is.

Provided the poor abstractions support a Turing-equivalent model of computation, they can be used to implement better languages which inherit their speed. By contrast, it is not possible to write an interpreter or cross-compiler that will make a given task run efficiently on top of a language that is itself inefficient. The fast language can therefore usurp the role of the slow one, whose universal acceptance depends on the intractable problem of reaching agreement on what is elegant.

All Church implementations are compiled to object code that is as fast as object code derived from C.

Functions, not classes, are the atomic units of computation

By function we really mean closure: a function that may have captured state. Class systems, like Java's, which support the 'dynamic' invocation of methods may be implemented in Church but are not part of the specification of the VM. The dynamic invokation of methods - while providing an excellent model of abstraction - incurs a performance penalty which we do not wish to impose on those who do not use it.

Functions are converted to a canonical form in Church and so may be compared in the same way that ints can be compared in Java. When two functions are equal they return equal results when supplied with equal arguments - I.e. it is not possible to tell the difference between two equal functions by calling them.

Syntax

Church's syntax is supported by a BNF description which can be replaced. Different grammars may therefore be used to input programs to the VM in different syntaxes. We aim to produce a good syntax for Church programs that is widely used but the strategy for doing this is to use feedback to build a consensus on what is best, rather than making the compiler impenetrable to the syntax enthusiast. A function written in two different syntaxes will appear the same to the VM - so libraries written in different syntaxes will interoperate without any intervention. Like Java, we seek to define the VM in a way that is independent of the syntax of the languages it can support. As with Java, we expect very few people to care about the distinction between language and VM but for those who do; the rest of this section defines a syntax for a language that is also called Church.

Statements at the outer level of the Church interpreter are semicolon terminated.
Literals and arithmetic expressions
Conventional syntax is used for arithmetic expressions. Numeric literals use the same conventions as Java and the =, +, -, *, /, %, ^, &, and | infix binary operators have their usual precedences. Unary prefix operators: +, - and ~ are also supported. Destructive operators like: '++', '--' and '+=' are not supported. Parentheses may be used as normal. Double quotes are used to delimit strings and single quotes for characters. Each of the characters representing the binary operators above may also be used as literals - and have the values of the functions they represent. So (x + y) can also be written +(x, y). <PROBLEM1>
Functional application
A function is called in Church in the conventional way I.e.:

f(x, y);

[This replaces the convention used in Lambda Calculus where Curried functional application is implicit - I.e. Lambda Calculus syntax would use (f x y) - where the outer parentheses simply serve to associate this group of three elements together. If this expression were found inside another expression I.e (g (f x y) z) the parentheses make it clear that this expression means: g(f(x, y), z) and not g(f, x, y, z). We choose to copy the conventions used in most languages and use parentheses for both function application and association - not because this a good idea but because, in rare deference to pragmatism, very few people would read much further if we didn't.]
Functional abstraction (or definition)
Like anonymous inner classes in Java, functions can be anonymous in Church. To create an unnamed function in Church we use the infix -> operator which gathers the arguments of the function from its left hand side and the lexical body of the function from its right.

Example:

The identity function that returns its argument is written:

x -> x;

We can use the let construct to assign this value to a name:

let identity = (x -> x);

[The parentheses are not required in this example.]

Like Scheme's define macro, the let construct has smarts to allow a concise form for the common case of defining a named function. Arguments placed on the left hand side of the '=' symbol to implicitly define a function:

let identity(x) = x;

In Java this could be written:

static Object identity(Object x) { return x; }
Multiple arguments and Currying
Functions defined on multiple arguments are written in the usual way so:

let average(x, y) = (x + y) / 2;

which is the same as:

let average = (x, y) -> (x + y) / 2;

Both constructs define an average function that must take two arguments.

We may use the semicolon separator instead of a comma to define a second version of the average function in which the second argument is optional:

let average(x;  y) =  (x + y) / 2; 

this defines the Curried form of the function - a function that accepts one argument and returns another function that accepts the second. This is equivalent to: 

let average = x -> (y -> (x + y) / 2);

[The -> operator is right-associating so the outer parentheses above are not required.]

The Curried version allows us to supply the two arguments to the function in separate places. E.g.:

let z = average(3);

then:

z(5);

returns 4.

In Java, the Curried version could be defined using an anonymous inner class as follows:

interface Int_Int {
    int apply(int y);
}

static Int_Int average(final int x) {
    return new Int_Int() {
        int apply(int y) {
           return (x + y) / 2;
        }  
    }  
}

We could then assert

    average(3).apply(5) = 4;

Note that Java programs created from Church only ever need one method name; we use "apply" here. Because of this, method use is implicit. In addition, Curried function application syntax is optional so average(3)(5) is equivalent to average(3, 5). Both will be translated use the Curried version only when Currying is required. <PROBLEM?>

Multiple return values

Multiple return values have no special syntax or support in Church - but they have a convenient natural representation, as always, through Currying.

In Java, two possibilities exist for providing a type-safe mechanism to handle the idea of a function returning multiple values. Either the method can create a simple struct-like class for returning its values or it may accept another argument that is a function to be applied to the two values. The latter option is implemented in Java by creating an interface with a single method. In Java, both techniques create a syntactic distraction that can obscure the purpose of the method that uses them.

In Church, not only can both of the above techniques be written much more concisely - they turn out to be the same! 

Example:

The divide function takes two arguments and returns two results: the quotient and remainder. We will calculate each of these results using the / and % operators though native implementations could choose to do something more efficient.

let divide(x, y; f) = f(x / y, x % y);

We can use this function as follows:

divide(7, 2, +);

returns 4: the sum of the quotient, 3, and the remainder, 1.

Similarly, we can split up the lexical grouping of arguments to enable us to pass around the multiple values returned.

let quotientAndRemander = divide(7, 2);

So:

quotientAndRemander(+);

returns 4.

Referential transparency, thread safety and parallelism

All Church programs are referentially transparent; by which we mean that we may substitute the value of any lexically bound variable without changing the meaning of the program. Church does not have an assignment operator and does not provide any other functions that have side-effects. Because functions don't have side-effects, no special primitives or considerations are required to achieve thread safety in Church programs: all Church programs are thread-safe. For the same reason VMs are free to deploy multiple processors in the evaluation of functions without requiring that changes be made to the programs.
Dealing with side effects in the target platform
Both of Church's target platforms are imperative in nature and support constructs that have side effects. To allow Church programs to call methods in its target systems we must ensure that the process of evaluating expressions in the Church VM does not change the order that 'native' methods are called. To do this we need to impose some constraints on the way methods with side-effects are presented to Church programs.

Let us take Java's List interface which has an 'add' method that may be used to destructively append an element to the List.

public interface List {
    ...
    public void add(Object element);
    ...
}

In Church we present this method as a function that returns a new List containing the new element.

As if:

public interface List {
    ...
    public List add(Object element);
    ...
}

The purely functional view of this method would leave its original argument unchanged, where in reality the Java implementation will have modified it. To accommodate this we now leave the state of the original argument undefined and force correct Church programs to operate on the returned value. <PROBLEM2>

So to add two elements to a Java List in Church we must write:

add(add(list, e1), e2)

or, if the add operation were overloaded onto the infix '+' operator (which it can be), we could write:

list + e1 + e2;

The resulting Java code is emitted as follows:

static List add(List l, Object e) { l.add(e); return l; }

add(add(l, e1), e2);

The Church VM is now forced by the structure of the program to perform the operations in the order we stated them, allowing the functional language to be used to control imperative operations in the underlying system without ambiguity. However the Church VM chooses to perform its computations - either in multiple threads or different processes - it is forced to preserve the order in which side effects take place and thus preserve the intended behaviour of the underlying classes.

Currying support

Procedural languages, like C, have no implicit construct that provides the user with access to a garbage collected heap. Java presents such an interface via the keyword "new" which allocates instances of classes. Church does not have classes but provides access to heap-allocated state via the Currying of functions.

Interactive 'interpreter'

Calling the provision of an interactive shell an 'interpreter' is slightly misleading here as it implies that a slower evaluation mechanism will be deployed to evaluate expressions in this mode. In practice, we will compile each function on the fly so that functions defined and invoked in the interactive mode have the same performance as those defined in files.
System shells
Libraries to support file handling will be available to the interpreter so that it may be used as a system shell that can perform some of the roles of UNIX's sh, bash, csh, tcsh programs.
Calculator
Operations on integers are routed through to Java's BigInteger package via overloading - allowing numeric literals and normal infix operators to be used on infinite precision integers. This allows the shell to perform some of the functions of UNIX's bc (calculator) command.
Pattern matching
The Church VM contains a comprehensive library for handling both regular expressions (Chomsky Type-3 languages) and the more general context-free grammars that define typical computer languages (Chomsky Type-2 languages). This allows the shell to perform some of the roles of scripting languages like Perl, Python and Ruby.

Run-time compiler

In Java, a distinction is made between the JDK, which contains the compiler, and the JRE, which runs programs that were created with it. The Church VM puts both facilities in one environment, allowing the compiler to be used at run-time.

Syntactic analysis in Church is entirely defined by a formal (BNF-style) grammar. Its parser correctly handles all unambiguous context-free (Chomsky Type-2) grammars rather than the subsets (e.g. LL(K), LALR(2)) that are supported by conventional compiler compilers like ANTLR.

VM is a library rather than an executable

The Church VM is a library and so may be incorporated to existing Java systems. The C version (to appear) will be linkable with native C systems though will obviously lack the ability to make use of Java's libraries.

Implementation

Church can manipulate programs written with respect to external type systems that are referentially transparent (as Java's is). So the first implementation of the Church VM is written in Java and translates Church programs into Java sources. The Java sources are then compiled by a Java compiler (Janino) that is written in Java and can be executed at run-time. Going this route we can interoperate with raw Java code; calling Java code from Church and vice-versa. Later versions may translate directly to C - though the the C-based VM will need to include a C compiler, assembler, linker and garbage collector.

Types

Optional type declarations

Most Church functions are defined without type information and are considered 'generic'. To host our implementations on the Java platform, functions (methods in Java) must have their arguments and local variables typed as they are in Java. Explicitly-typed functions are derived from their generic form by a type inference phase that precedes execution.

Conceptually a function is considered to be defined on all types if none is specified. The outermost function call, the one that starts the execution of the VM, will receive parameters of a given type and these types are used to start a type inference computation that will derive the concrete types of all functions that may be called in the ensuing execution. These functions are made specific (I.e. given concrete types as arguments), emitted from the system to the underlying platform (Java) then compiled, loaded and linked. At the end of all of that the entry level function can be called - in the knowledge that all possible paths of execution have specific implementations that are already compiled.

Type inference

Church's type system is designed to bring as much of the feel of dynamic binding as possible to a statically typed environment. This requires a completely new approach - to avoid the crippling limitations of conventional static typing systems. Despite placing much more adventurous demands on the typing system, the definition of Church's type system is very simple to state: computations on types exactly mirror computations on values. Instead of juxtaposing type information with each data item in the heap we keep them separate and perform two calculations: the first on types - to produce type-safe programs, the second on values - to produce results.

To get all this off the ground we place one restriction on Church programs that is not present in dynamically bound systems: the return type of a function is a function of the types of its parameters. This sounds harmless enough but it prevents the following Java method being written in Church.

Object sqrt(float x) {
    if (x<0) {
        return new Complex(0, Math.sqrt(x));
    }
    else {
         return new Float(Math.sqrt(x));
    }
}

The class of the return value of this sqrt implementation is a function of the value of its argument rather than the type of its argument - and this is not supported in Church.

Since a Church implementation cannot throw errors, a sqrt implementation written in Church would have to either return the more general 'Complex' type in all cases or return the magnitude of its potentially complex result. [In pure Church - I.e. an implementation that creates its own representation of numbers - signed integers happen to be distinguished from non-negative integers by type so the problem highlighted in this particular example doesn't apply. Other examples can nevertheless be constructed to demonstrate the same restriction over pure Lambda Calculus. ]

No subclassing or type hierarchy

Church's type system is functional rather than hierarchical. When a new type is generated, the synthesised type is a function of types that returns a type. So synthesised types are subject to application and abstraction (with Currying) - just like values. Since the manipulation of types mirrors the manipulation of values, all type manipulation is implicit/automatic in Church. Explicit references to types are only required in the specification of the formal parameters to an overloaded function.

The types of the underlying platform, both primitive and reference, may be used in Church programs as base types that are treated as symbols or constants. A synthesised type may be given an auto-generated name and emitted to the underlying platform to define the intermediate results of Curried functions - but no subclassing primitives are provided by the Church language itself.

Static typing

A type system of some kind is required if we wish to use a single operator, '-' say, to represent the negation of an integer (int) and floating point number (float). In this example, the 32 bits that make up an int or a float can only be distinguished by a separate piece of information that indicates how the bits should be manipulated to produce its additive inverse. Dynamic typing requires that the object itself carries with it a reference to the type of number that the collection of bits represents. The overhead for dynamic typing on arithmetic operations is large and this has historically lead to pure dynamically typed languages, like Lisp and SmallTalk, being abandoned for applications that perform number crunching.

Consider a code fragment in which a variable is subject to a negation operation in the body of a loop. It does indeed seem wasteful to copy the type information that could be associated with the variable into each of the potentially enormous number of values the variable may take during the execution of the program.

Java takes a typically pragmatic view here and introduces primitives to provide a compromise that works well for the vast majority of typical use cases. Java's primitives allow lexical blocks of Java to be compiled into code whose efficiency is the same as that of C. It could be argued that this single step widened the potential audience of Java to a level many orders of magnitude beyond that of the languages that embraced the 'everything is an object' paradigm.
On the other hand, a few years of programming in Java are all that are needed to acquire a passionate resentment of Java's primitives for the plethora of irritating anomalies they bring to the rest of the language and its libraries. To retain the lofty goal of providing abstractions that are effective to as many communities as possible, including scientific computing and low-level systemic work, we simply cannot abandon primitives and therefore choose to throw out all those parts of the Java language that are inconsistent with them. We know how we want the rest of the language to work - as Java does - but we are forced to abandon dynamic typing as an implementation method.

Worst-case costs of dynamic binding

Consider the LISP's primitive data structure, cons, that takes two objects and returns a cell that points to them both. We will model this first in Java, with a class called Pair which has instance variables first and rest. The second variable is called rest (instead of second) to reflect typical usage where arbitrary length Lists are stored by placing the first element in the first slot and all the rest of the elements in another Pair pointed to by rest.

class Pair {
    Object first;
    Object rest;

    Pair(Object first, Object rest) {
       this.first = first;
       this.rest = rest;
    }
}

The absence of useful type information in the class above forces coercion to be used by algorithms that make use of it. In turn, this forces dynamic binding on us as an implementation method. Although flexible enough to serve as a general data structure, the performance penalty for using this structure to store, say a list of characters, is very large. On a typical 32 bit implementation each character takes about five words - three for each Pair, and two for each Character. So a String stored this way would take about ten times as much space as an ordinary Java String. While a precise value for this performance degradation is dependent on VM specifics the problem is not due to Java itself but is fundamental to dynamically bound systems in general - all of which suffer some space penalty to bring consistency to their treatment of objects.

Part of the problem with this class is caused by the fact that the Pair object is not able to hold type information itself and therefore has to force type information into objects that are too small to hold it efficiently. In Java 1.5 we can use generics to introduce useful type information to this class as follows:

class Pair<A, B> {
    A first;
    B rest;

    Pair(A first, B rest) {
       this.first = first;
       this.rest = rest;
    }
}

At this point (and for many good reasons) Java chooses to force the type parameters, A and B, to be reference types - I.e. they can refer to the Character wrapper class but not the char primitive type. So, while Java's generics system serves us well in allowing the removal of many error prone coercion operations, we need to extend it to provide the option of generating specific implementations for handling primitives so as to remove the run-time performance hit associated with autoboxing.

Currying

Currying of values
In Church we can use Currying to produce an analogue of Lisp's cons primitive and Java's constructors.

let pair(first, rest; f) = f(first, rest);

Here the function f is an accessor function that returns one of the two 'instance variables' that are bound to the closure.

So the first accessor is:

let first(x, y) = x;

and the rest accessor is:

let rest(x, y) = y;

The following fragment shows a construction and accessor operation in action:

pair(1, 2, first);

returns 1.

Currying introduces the idea that the third argument may not be provided in the same lexical context as the first two. The use of the semicolon, instead of a comma, in the specification of the formal parameters of the pair function allows it to be called with either two arguments or three. We can now create the data structure in one place and retrieve its contents somewhere else:

let p1 = pair(1, 2);

then later:

p1(first) or p1((x, y) -> x)

returns 1.
Currying of types
Suppose that our underlying system has an int type defined as in Java's VM specification.

In the example above, the return type of the pair function is a synthesised type. We don't need to write anything to get Church to create this type but, underneath the bonnet, the following concrete types will be emitted to the underlying Java platform. At the point we see the lexical construct, pair(1, 2), the Church compiler can deduce that we need an implementation of pair that can be applied to two int arguments. In Java we need a new class and a new interface to support the ideas of variable capture and reference respectively. The new class effects effectively copies state from the stack to the heap. The new interface implements the inverse of this operation - copying information from the heap to the stack.

public class IntInt {
    private int first;
    private int rest;

    public IntInt(int first, int rest) {
       this.first = first;
       this.rest = rest;
    }

    public int apply(Accessor a) {
        return a.apply(this.first, this.rest);  
    }

    public interface Accessor {
       int apply(int first, int rest);
    }
}
 
The accessors, first and rest, given above would be translated to Java as follows:

static IntInt.Accessor first = new IntInt.Accessor() { apply(int first, int rest) { return first; }

static IntInt.Accessor rest = new IntInt.Accessor() { apply(int first, int rest) { return rest; }

To summarise, access to the heap is provided by Currying in Church. In all cases the Church VM emits type-safe code to the underlying Java VM so that the Java VM can use its own type system to optimise the implementations. Type information is gathered together from instances and factored into larger synthetic types that are associated with programs. The result is that Church's 'Objects' do not need class pointers and may be handled the same way as primitives.

All 'primitive' data types external to the VM

The fundamental data type of the Church VM is not the bit, the int, the word or the class but the function. It is difficult to imagine a VM without ints - how, for example can I run a loop without a variable to increment? In one of the use cases in a later section we will see that we can implement integers from scratch. Programs may therefore make use of the fundamental properties of integers without relying on an implementation method.

It is a common misconception that processors spend most of their time performing arithmetic operations, like addition, bit-wise and, etc., on collections of bits. While the mechanics of processors are indeed binary, this view of computation is quite misleading. About a quarter of the byte codes available to Java's interpreter are associated with arithmetic operations. During the execution of a typical Java program these arithmetic operations, in total, comprise less than 10% of the instructions invoked by the processor. So what does the other 90% of CPU time get spent on? - setting up stack frames, copying pointers and the like: things that have far more to do with the evaluation of functions than arithmetic.

Jettisoning the primitive types from the VM specification is actually very similar to the method adopted by the Java VM for operators that have hardware support but didn't get a byte-code: they are provided as 'native' methods in special class files. The sine function, for example, is provided as a static method in the Math class with a native implementation. Java's JIT compiler in-lines these methods and emits code that makes use of appropriate hardware support so very little seems to be lost in pushing their definition out of the VM.

Ease of use

Unification of object and reference equality

Because Church itself does not permit side effects, two objects that are equal will always be equal. We can therefore unify object equality and, since there is no naked assignment operator, use the '=' operator to represent it. As with most Java implementations of equals, a check for pointer equality may be made by the VM to improve performance.

Obviate the use of StringBuffers in toString() methods.

Efficient toString() methods often require the temporary allocation of a StringBuffer to reduce the number of intermediate String objects that are created during its execution. In Church, the Object::toString() method is replaced with an interface (given here in Java):

public interface Writable {
    public Stream<Character> write(Stream<Character> s);
}

Where Stream is a simple interface that can be implemented by Files, Strings and Buffers. The object that implements the Stream interface may be arbitrarily complicated and is a good place to put other stream related state such as the IdentityHashMap that is often required to handle cyclic data structures. The write method may also be overloaded to have different implementations on different types of stream: allowing an object to write itself in a variety of formats.

Implicit package names

There is no equivalent of Java's  package directive in Church. Package names are implied by the names of the directories and they are found in. To change the name of a package, simply rename the directory that contains the source files. To move a package to a different part of the package hierarchy, move the directory.

No class loaders

Any source of characters may be converted into an executable function at runtime (provided it can be compiled).

Hide class/object files

Church programs and libraries are distributed in source form. Local copies of external resources may be compiled into machine dependent formats and/or cached locally but this process is made transparent to the developer.

Zip/Jar files or some other mechanism to download sets of related files at once are needed to mitigate the latency involved in opening remote network connections. This happens automatically and local caches of 'jar' files are kept up-to-date by checking the originating site for new versions.

The enforced distribution of sources has a number of legal implications for big systems but, as even the least hirsute Java professional will know, the transformation of Java sources to class files is easy to reverse. A source code obfuscator can therefore be considered to provide protection that is fundamentally equivalent to that provided by byte-code obfuscators.

No classpaths!

References to external code are made by deriving full URLs from import statements and named references in the code that depends on them. Local copies may be made for performance reasons but they are kept up to date automatically so this process is transparent.

Church's 'jar' files have to have the name of the directory they represented when they were created. This allows their contents to be accessed automatically.

Versioning

All providers of external resources are obliged to maintain all previous versions of their public releases, indexed by date.

Automatic versioned persistence for all 'objects'

All state (in the form of closures) may be written to a file for evaluation by a VM invoked at a later date. If the persistent form includes references to external resources, they are accessed with a qualified name/date pair so that the state will be recreated exactly as it was in the original image. Incompatibilities may arise between this old structure and current versions of functions that only apply to new data structures. The type system will normally report these problems but their solution requires manual intervention. <Get some views on this - would it be better to in-line all external resources to remove the dependencies and make them free-standing?>

Absence of security directives

A Church program operates against a VM whose capabilities are defined when it is launched. The VM has a number of switches that can be used to permit access to certain OS level functions such as file system and network access. All programs inherit the capabilities of the VM and no control or manipulation of those capabilities is available to a Church program.

Robustness

No null reference

Church's closures are very much like objects - except that closures must have all their state given to them at creation time. This obviates the need for a universal null reference - the source of many trivial but annoying bugs in systems implemented in Java.

[All Object-Oriented systems need to have the idea of null - to represent uninitialised parts of an object during initialisation.

Example:

Given an class with a single instance variable and accessor methods it is possible to create a pair of objects o1 and o2 which 'point to' each other. At some stage in the creation of this circular data structure it must be the case that a valid program has access to the uninitialised state of one of the objects. Any (erroneous) attempt to execute a method on this vacant part of the object is open to ambiguity and in Java results in the raising of an exception.

Interestingly, Objective-C makes a different choice here - simply defining all messages sent to null to return null. I have been tracking this in my Java coding and noticed the last half dozen or so NullPointerExceptions I have fixed in my Java programs would all have worked correctly in Objective-C I.e. my fixes all happened to implement the general rule used by Objective-C.]

Programs cannot fail or throw exceptions

It is not possible to write a program that will throw an internal exception in Church and Church does not provide mechanisms to catch exceptions thrown by foreign functions. The type system prevents all inappropriate applications of functions to arguments, so everything that gets past the type system can be executed. All programs therefore proceed until one of the following conditions applies:
These limits are normally set by the user (or process) that launches the VM. They may be monitored by Church programs but the systemic exceptions that arise when the limits are exceeded unconditionally terminate the VM.

Use cases

We will try to use cases from as disparate and testing a set of communities as possible - not because they are representative but because the union of their requirements will give us the greatest possible coverage. If the language can be made to perform adequately in all of these environments we can be optimistic it will perform well for the vast majority of more typical programming tasks.

Arithmetic fundamentals

Church numerals, are an example of an even deeper fundamental property of objects that are supported by the Church VM. In general, we never ask what something is, instead we only ever ask what it does. A function is completely defined by the effect it has on its arguments - and numbers are no exception. A Church Numeral defines the counting numbers, 0, 1, 2, ... as functions of two arguments, f and x say. 0 is the function of f and x that applies f zero times to x and returns the result - I.e. it returns x. 1 is the function of f and x that applies f once, as in f(x), and 2 is the function of f and x that applies f twice to x to return f(f(x)). Given these definitions of the counting numbers we can define the Church numerals and their arithmetic operators as follows (this example uses the +, * and ^ operators in their explicit prefix form to avoid confusion when they are Curried):

let 0(f, x) = x;
let 1(f, x) = f(x);
let 2(f, x) = f(f(x));
etc.

let +(a, b; f; x) = a(f, b(f, x)); 
let *(a, b; f) = a(b(f));
let ^(a, b) = b(a);

assert *(3, 7) = 21;

Scientific computing

A rare exception to Java's impressive acceptance story has been the scientific computing community who have remained resistant to using Java - primarily because of performance concerns. Below is a  Fast-Fourier Transform routine written in FORTRAN:

<FFT>
<Use Gaussian elimination as an example instead?>.

We aim to split this piece of code into its algorithmic and arithmetic parts, so that the Church version will use general routines for performing complex arithmetic and leave the algorithmic part in its simpler and more general form. The Church compiler should be able to in-line the functions that define complex arithmetic and essentially recreate the FORTRAN style program above. The result should be a presentation of the algorithm that is concise and general and yet as fast as the more specific FORTRAN routine above.

Symbolic/Algebraic Computing

Symbolic algebra systems, like Mathematica, typically use special data structures to represent polynomials and other mathematical formulae. Rule-based reduction systems are then written in some meta-language that operates on these data structures. Often, facilities are provided to take the results of symbolic computations and write them to files as FORTRAN or C functions that can be compiled and used independently. We would like to integrate all this, so that rule-based algorithms can be applied to ordinary programs instead of data.

Show that the basic arithmetic operations can be implemented for a new univariate polynomial type and that a canonical form can be maintained. Show that (x + 1) ^2 = x^2 + 2*x + 1;

Rule-based programming

Some problems, like the differentiation operator in Calculus, have no general solution that can be expressed as an ordinary function of a functional input. The symbolic result can only be implemented as a series of rules that will differentiate a subset of all possible functions and return the unanswerable parts of the problem unchanged.

Of course, a general procedure does exist for calculating numerical approximations to the derivative of functions over an approximate domain such as Java's double. In Church, such a function could be defined as:

let differentiate(f, x) = (f(x+0.01) - f(x))/0.01;

But the symbolic version of differentiate has many advantages over this numerical approach. Firstly, the accuracy of the numerical result is dependent the 'delta' value (0.01) - and good values are domain specific. Secondly, in the case of, say, the function x -> 2*x, the symbolic result x -> 2 is much more efficient as an implementation than its partially-evaluated numerical equivalent: x -> (2*(x+0.01) - 2*x)/0.01.

The use-case here then will be to provide a handful of rules that define a symbolic differentiation operator, diff, that can be applied to functions. So if: 

let f(x) = (x+1)^2;

diff(f);

returns the unnamed function:

x  -> 2 * (x + 1);

diff^2(f); 

calculates 2(diff, f) or diff(diff(f)), the second derivative, to return:

x -> 2;

Note that this is not equal to the number 2. This is a 'constant' function that accepts an argument and returns 2.

Database access

Log into a remote database and search for a record satisfying given criteria.

Cryptography

Implement the RSA encryption algorithm, compare speed with C.

Number crunching

 Implement one of the factorisation algorithms, compare speed with C.

GUI programming

Put up a GUI to accept two numbers and present their product when a 'calculate' button is pressed.

Animation

Pop up a window that presents a synthesised animation, suitable for users of recreational drugs.

Graphics

Demonstrate a simple ray-tracing algorithm in Church to render a three dimensional (CSG) model by recursive subdivision. Demonstrate the performance improvement of replacing numerical differentiation methods with symbolic differentiation as defined above.