Church
Church is a new language designed in the spirit of the Java platform;
its aim is to copy the many good things that Java brought to the
development community but to address Java's perceived failure to
provide a ubiquitous model for computation on the internet.
Introduction
Despite Java's many extraordinary achievements in both the desktop and
server-side domains, it has not had the success that many of us had
hoped for in the way it is used on the internet.
The web has the wonderful property that one can exchange URLs knowing
that what others see in their browsers will be what we
see on ours. It is easy to forget what a remarkable achievement this
was. To have a consistent presentation of web pages that was effective
on different pieces of hardware, running different operating systems,
running different browsers seems both an obvious and necessary first
step to doing almost anything now. Unfortunately the necessity for
doing this with computer programs does not seem to enjoy the same
consensus.
Java had all of the difficult pieces of plumbing required to make this
a reality built-in from the start. That said, neither Applets or JWS applications
appear to have done enough to make client-side Java useful to the
average internet user. Is a global
deployment of such powerful technology possible? The success of
Macromedia's
Flash plug-in gives us hope that it is. Flash is fully capable of
performing arbitrary computations and yet quickly became ubiquitous
despite having to face the same technical and anthropological hurdles that appear to have stunted client-side
Java.
To many of us, the anchors preventing Java's adoption as a general
purpose computational engine for web clients seem frustratingly clear:
the
JRE is too big, it changes too much and it changes too often.
Java's SDK is
currently released roughly once every 18 months. The first JRE weighed
in at about 1Mb in 1996 - when internet connections provided an
average bandwidth of about 4Kbytes/s - leaving a download of the 1Mb
JRE to
take about 40 minutes. Adoption of broadband services is raising the
global average of network bandwidth, and today the average bandwidth is
now at
least ten times what it was ten years ago. Interestingly though, the
download time for today's 15Mb JRE is roughly what it was in 1996.
Nielson's 'law'
claims that average bandwidth rises by about 50% a year
and, if the size of the JRE continues to increase at a similar rate,
half hour
downloads of the Java VM will be a constant of its deployment model.
This
level of inconvenience limits the penetration of up-to-date VMs to
machines that are owned by Java enthusiasts - and, if these quasi-laws
hold in the future, things seem destined to stay this way.
So JREs seem destined to be different - can they be compatible?
While enormous effort is made by Sun and others to make their revisions
backward-compatible, true backward-compatibility has never
really been achieved between any two major revisions. Ten years on, it
seems
sensible to consider the possibility
that this is not actually achievable for something as large as the JRE.
Can Java be fixed?
Yes! Java's VM has all the properties we want - being both small and
stable enough. One proposal for fixing Java runs as follows: instead of
bundling a useful set of
libraries with the JRE, remove all the libraries and retain only the
core functionality that is required to load code over the network and
run it. Prototypical systems like this were developed by the author and
others at Sun around 2000 but they did not get traction at Sun the
time. Unfortunately, third parties that attempt to provide this
infrastructure for
Java would be in breach of Sun's distribution licence since it (quite
reasonably) precludes the disaggregation of their distribution.
[Since writing the above paragraph I have been tipped off of a new
effort at Sun to create a browser
edition of the JRE. This is a good thing - though initial results
seem to show that the 'kernel VM' which is now 2.5Mb - roughly 2.5
times
larger than it was when we looked at it in 2000. (This is due to the
independent expansion of the JRE - the techniques being used to produce
the 'minimal VM seem similar to the ones we used before). And as one of
the
contributors to this blog points out, this is about 500 times the size
of the 8Kb Fourth VMs of three decades ago. All this makes me tempted
to propose my own law: that bits of software get 50% bigger every year
that
people fiddle with them.]
If we accept the lines of reasoning above it seems likely that Java's
current deployment model will prevent the Java VM from becoming a
ubiquitous computing engine for internet clients. Further, that any
reasonable changes made by Sun, either to the VM or its deployment
model, will have to balance new priorities with the need to support the
millions of people who currently rely
on
Java in the areas where it is both effective and extremely
successful. For all its disadvantages, a fresh start has one
significant upside: that a new set of priorities may be pursued more
freely in an environment that is not yet shackled by success.
Summary
Church's central aim then is to do for programs what the web has done
for
documents.
We summarise the high-level design goals of the Church VM as follows:
- Small and stable
- Fast
- Broad application domain
- Robust
- Easy to use
Contents
Features
Very small VM that
doesn't need to be updated (often)
Size
Church provides a platform-neutral virtual machine that is very small
(<100k); only those features that are required to load external
resources, compile and run them are provided in the VM.
Like the Java VM there are 'class libraries' that provide support for
common services and the
manipulation of common data structures - but these are not distributed
with the VM and are not even centrally managed. Anyone can write a List
interface, XML reader or even GUI toolkit and seek to make it the
standard. 'Standard'
packages are the ones that most
people use - the ones you are likely to find with a search engine.
Wikipedia is fast becoming a standard place to look
for definitions of things, yet its position was not prescribed by those
who created the infrastructure for the
web. Instead it has earned its place on merit, by adopting an extremely
effective
method of construction and maintenance. We aim to copy this
de-centralised model
rather than the traditional model of platform evolution.
Stability
Robustness is dealt with in a
different section, by stability we
mean that we aim to create a VM
specification that can quickly become final - I.e. reach a state where
it doesn't need to be changed. Making the VM small, as lauded above, is
probably the most important step in realising this goal but for the
indispensable kernel of the VM we shoot for purity as
the easiest way to avoid the ambiguity and arbitrariness that are the
primary sources of specification churn.
Before virtual machines there were real machines, and before real
machines there was a beautiful general theory of computation - called
the Lambda
Calculus. The Lambda Calculus, invented around 1936 by the
American mathematician Alonzo Church,
is widely considered to be one of the purest, simplest and most
powerful systems of computation that have ever been created. Two reduction rules describe all of
what may be computed using the Lambda
Calculus and yet Church's Lambda Calculus can be shown to be a
Turing-equivalent - a system that is capable of computing anything that
can be computed.
For the author, the Lambda Calculus has an additional feather in its
cap that is harder to define. Lambda Calculus provides a framework that
promotes a concise
representation of
ideas - in a way that today's languages do not. The Object-Oriented
paradigm, for example, is widely touted as providing abstractions that
are not easy to produce in procedural languages like C. It took decades
for procedural languages to acquire the features that embrace
Object-Oriented development where, as we will see later, the Lambda
Calculus
could be argued to have been Object-Oriented from the start. The use of
the term Object-Oriented is something of a stretch here,
though, incredibly, it is the Lambda Calculus that is denigrated in
this
loose language as Lambda Calculus provides us with abstractions
that are a strict superset of those that can be considered
Object-Oriented. In Object-Oriented languages some ideas come up over
and over again and yet there is no way to factor the repeated
constructs into one place. Instead, ideas that cannot be formally
abstracted in the O-O paradigm are baptised patterns and taught to developers
who are then expected to in-line them into their software by hand. In
Church we aim to stop this nonsense, by allowing deep abstractions to
be written in the language and interpreted by the VM.
Performance identical
to C
Second only to being easily moved around, performance of the VM is
paramount. Performance is an entropy-like quantity that only ever gets
worse as abstractions are laid on top of each other. Traditionally,
fast
languages with poor abstractions have won the day over more elegant
languages that are slower and it is interesting to consider why this
is.
Provided the poor abstractions support a Turing-equivalent model of
computation, they can be used to implement better languages which
inherit their speed. By contrast, it is not possible to write an
interpreter or cross-compiler that will make a given task run
efficiently
on top of a language that is itself inefficient. The fast language can
therefore usurp the role of the slow one, whose universal acceptance
depends on the intractable problem of reaching agreement on what is
elegant.
All Church implementations are compiled to object code that is as fast
as
object
code derived from C.
Functions, not
classes, are the atomic units of computation
By function we really mean closure: a function that may
have captured state. Class systems, like Java's, which support the
'dynamic'
invocation of methods may be implemented in Church but are
not part of the specification of the VM. The dynamic invokation of
methods - while providing an excellent model of abstraction - incurs a
performance penalty which we do not wish to impose on those who do not
use it.
Functions are converted to a canonical form in Church and so may be
compared in the same way that ints
can be compared in Java. When two functions are equal they return equal
results when supplied with equal arguments - I.e. it is not possible to
tell the difference between two equal functions by calling them.
Syntax
Church's syntax is supported by a BNF description which can be
replaced. Different grammars may therefore be used to input programs to
the VM in different syntaxes. We aim to produce a good syntax for
Church programs that is
widely used but the strategy for doing this is to use feedback to build
a consensus on what is best, rather than making the compiler
impenetrable to the syntax enthusiast. A function written in two
different syntaxes will appear the
same to the VM - so libraries written in different syntaxes will
interoperate without any intervention. Like Java, we seek to define the
VM in a way
that is independent of the syntax of the languages it can support. As
with Java, we expect very few people to care about the distinction
between
language and VM but for those who do; the rest of this section defines
a syntax for a language that is also called Church.
Statements at the outer level of the Church interpreter are semicolon
terminated.
Literals and
arithmetic expressions
Conventional syntax is used for arithmetic expressions. Numeric
literals use the same conventions as Java and the =, +, -, *,
/, %, ^, &, and | infix binary operators have their usual
precedences. Unary prefix operators: +, - and ~ are also supported.
Destructive operators like: '++', '--' and '+=' are not supported.
Parentheses may be used as normal. Double quotes are used to delimit
strings and single quotes for characters. Each of the characters
representing the binary operators above may also be used as literals -
and have the values of the functions they represent. So (x + y) can
also be written +(x, y). <PROBLEM1>
Functional application
A function is called in Church in the conventional way I.e.:
f(x, y);
[This replaces the convention used in Lambda Calculus where Curried
functional application is implicit - I.e. Lambda Calculus syntax would
use (f x y) - where the outer parentheses simply serve to associate
this
group of three elements together. If this expression were found inside
another expression I.e (g (f x y) z) the parentheses make it clear that
this expression means: g(f(x, y), z) and not g(f, x, y, z). We choose
to copy the conventions used in most languages and use parentheses for
both function application and association - not because this a good
idea
but because, in rare deference to pragmatism, very few people would
read much further if we didn't.]
Functional abstraction
(or definition)
Like anonymous inner
classes in Java, functions can be anonymous in Church. To create an
unnamed
function in Church we use the infix -> operator which gathers the
arguments of the function from its left hand side and the lexical body
of the function from its right.
Example:
The identity function that returns its argument is
written:
x -> x;
We can use the let construct
to
assign this value to a name:
let identity = (x -> x);
[The
parentheses are not required in this example.]
Like Scheme's
define macro, the let construct
has smarts to allow a
concise form for the common case of defining a named function.
Arguments placed on the left hand side of the '=' symbol to implicitly
define a function:
let identity(x) = x;
In Java this could be written:
static Object identity(Object x) { return x; }
Multiple arguments
and Currying
Functions defined on multiple arguments are written in the usual way
so:
let average(x, y) = (x + y) / 2;
which is the same as:
let average = (x, y) -> (x + y) / 2;
Both constructs define an average
function that must take two arguments.
We may use the semicolon separator instead of a comma to define a
second version of the
average function in which the second argument is optional:
let average(x; y) = (x + y) / 2;
this defines the Curried
form of the function - a function that accepts
one argument and returns another function that accepts the second. This
is equivalent to:
let average = x -> (y -> (x + y) / 2);
[The -> operator is right-associating so the outer parentheses above
are
not required.]
The Curried version allows us to supply the two arguments to the
function in separate places. E.g.:
let z = average(3);
then:
z(5);
returns 4.
In Java, the Curried version could be defined using an anonymous inner
class as follows:
interface Int_Int {
int apply(int y);
}
static Int_Int average(final int x) {
return new Int_Int() {
int apply(int y) {
return (x + y) / 2;
}
}
}
We could then assert
average(3).apply(5) = 4;
Note that Java programs created from Church only ever need one method
name; we use "apply" here. Because of this, method use is implicit. In
addition, Curried function application syntax is optional so
average(3)(5) is equivalent to average(3, 5). Both will be translated
use the Curried version only when Currying is required. <PROBLEM?>
Multiple return values
Multiple return values have no special syntax or support in Church -
but they have a convenient natural representation, as always, through
Currying.
In Java, two possibilities exist for providing a type-safe mechanism to
handle the idea of a function returning multiple values. Either the
method can create a simple struct-like class for returning its values
or it may accept another argument that is a function to be applied to
the
two values. The latter option is implemented
in Java by creating an interface with a single method. In Java, both
techniques create a syntactic distraction that can obscure the purpose
of the method that uses them.
In Church, not only can both of the above techniques be written much
more concisely - they turn out to be the same!
Example:
The divide function takes two
arguments and returns two results: the quotient and remainder. We will
calculate each of these results using the / and % operators though
native implementations could choose to do something more efficient.
let divide(x, y; f) = f(x / y, x % y);
We can use this function as follows:
divide(7, 2, +);
returns 4: the sum of the quotient, 3, and the
remainder, 1.
Similarly, we can split up the lexical grouping of arguments to enable
us to pass around the multiple values
returned.
let quotientAndRemander = divide(7, 2);
So:
quotientAndRemander(+);
returns 4.
Referential
transparency, thread safety and parallelism
All Church programs are referentially transparent; by which we mean
that we may substitute the value of any lexically bound variable
without changing the
meaning of the program. Church does not have an assignment operator and
does not provide any other functions that have side-effects. Because
functions don't have side-effects, no special primitives or
considerations are required to achieve thread safety in Church
programs: all Church programs are thread-safe. For the same reason VMs
are free to deploy multiple processors in the evaluation of functions
without requiring that changes be made to the programs.
Dealing with side
effects in the target platform
Both of Church's target platforms are imperative in nature and support
constructs that have side effects. To allow Church programs to call
methods in its target systems we must ensure that the process of
evaluating expressions in the Church VM does not change the order that
'native' methods are called. To do this we need to impose some
constraints on the way methods with side-effects are presented to
Church programs.
Let us take Java's List interface which has an 'add' method that may be
used to destructively append an element to the List.
public interface List {
...
public void add(Object element);
...
}
In Church we present this method as a function that returns a new List
containing the new element.
As if:
public interface List {
...
public List add(Object element);
...
}
The purely functional view of this method would leave its original
argument
unchanged, where in reality the Java implementation will have modified
it. To accommodate this we now leave the
state of the original argument undefined and force correct Church
programs to operate on the returned value. <PROBLEM2>
So to add two elements to a Java List in Church we must write:
add(add(list, e1), e2)
or, if the add operation were overloaded onto the infix '+' operator
(which it can be), we could write:
list + e1 + e2;
The resulting Java code is emitted as follows:
static List add(List l, Object e) { l.add(e); return l; }
add(add(l, e1), e2);
The Church VM is now forced by the structure of the program to
perform the operations in the order we stated them, allowing the
functional language to be used to control imperative operations in the
underlying system without ambiguity. However the Church VM chooses to
perform its computations - either in multiple threads or different
processes - it is forced to preserve the order in which side effects
take place and thus preserve the intended behaviour of the underlying
classes.
Currying support
Procedural languages, like C, have
no implicit construct that provides the user with access to a garbage
collected heap. Java presents such an interface via the keyword "new"
which allocates instances of classes. Church does not have classes but
provides access to heap-allocated state via the Currying of functions.
Interactive
'interpreter'
Calling the provision of an interactive shell an 'interpreter' is
slightly misleading here as it implies that a slower evaluation
mechanism will be deployed to evaluate expressions in this mode. In
practice, we
will compile each function on the fly so that functions defined and
invoked in the interactive mode have the same performance as those
defined in files.
System shells
Libraries to support file handling will be available to the interpreter
so that it may be used as a system shell that can perform some of the
roles of
UNIX's sh, bash, csh, tcsh programs.
Calculator
Operations on integers are routed through to Java's BigInteger package
via overloading - allowing numeric literals and normal infix operators
to be used on infinite precision integers. This allows the shell
to perform some of the functions of UNIX's bc (calculator) command.
Pattern matching
The Church VM contains a comprehensive library for handling both
regular
expressions (Chomsky
Type-3 languages) and the more general context-free grammars that
define typical computer languages (Chomsky
Type-2 languages). This
allows the shell to perform some of the roles of scripting languages
like Perl, Python and Ruby.
Run-time compiler
In Java, a distinction is made between the JDK, which contains the
compiler, and the JRE, which runs programs that were created with it.
The Church VM puts both facilities in one environment, allowing the
compiler to be used at run-time.
Syntactic analysis in Church is entirely
defined by a formal (BNF-style)
grammar. Its parser correctly handles all unambiguous context-free (Chomsky
Type-2) grammars
rather than the subsets (e.g. LL(K), LALR(2)) that are supported
by
conventional compiler compilers like ANTLR.
VM is a library
rather than an executable
The Church VM is a library and so may be incorporated to existing Java
systems. The C version (to appear) will be linkable with native C
systems though will obviously lack the ability to make use of Java's
libraries.
Implementation
Church can manipulate programs written with respect to external type
systems that are referentially
transparent (as Java's is). So the first implementation of the
Church VM is written in Java and translates Church programs into Java
sources. The Java sources are then compiled by a Java compiler
(Janino) that is written in Java
and can be executed at
run-time. Going this route we can interoperate with raw Java code;
calling Java code from Church and vice-versa. Later versions may
translate directly to C - though the the C-based VM will need to
include a C
compiler, assembler, linker and garbage collector.
Types
Optional type
declarations
Most Church functions are defined without type information and are
considered 'generic'. To host
our implementations on the Java platform, functions (methods in Java)
must have their
arguments and local variables typed as they are in Java.
Explicitly-typed functions are derived from their generic form by a
type inference phase that precedes execution.
Conceptually a function
is considered to be defined on all types if none is specified. The
outermost function call, the one that starts the execution of the VM,
will receive parameters of a given type and these types are used to
start a type inference
computation that will derive
the concrete types of all functions that may be called in the ensuing
execution. These functions are made specific (I.e. given concrete types
as arguments), emitted from the system to the underlying platform
(Java) then compiled, loaded and linked. At the end of all of that the
entry level function can be called - in the knowledge that all possible
paths of execution have specific implementations that are already
compiled.
Type inference
Church's type system is designed to bring as much of the feel of
dynamic binding as possible to a statically typed environment. This
requires a completely new approach - to avoid the crippling limitations
of conventional static typing systems. Despite placing much more
adventurous demands on the typing system, the definition of Church's
type system is very simple to state: computations
on types exactly mirror computations on values. Instead of
juxtaposing type information with each data item in the heap we keep
them separate and perform two calculations: the first on types - to
produce type-safe programs, the second on values - to produce results.
To get all this off the ground we place one restriction on Church
programs that is not present in dynamically bound systems: the return
type of a function is a function of the
types of its parameters. This sounds harmless enough but it
prevents the following Java method being written in Church.
Object sqrt(float x) {
if (x<0) {
return new Complex(0,
Math.sqrt(x));
}
else {
return new Float(Math.sqrt(x));
}
}
The class of the return value of this sqrt implementation is a function
of the value of its argument
rather than the type of its
argument - and this is not supported in Church.
Since a Church implementation cannot throw errors, a sqrt
implementation written in Church would have to either return the more
general 'Complex' type in all cases or return the magnitude of its
potentially complex result. [In pure Church - I.e. an implementation
that creates its own representation of numbers - signed integers happen
to be distinguished from non-negative integers by type so the problem
highlighted in this particular example doesn't apply. Other examples
can nevertheless be constructed to demonstrate the same restriction
over pure Lambda Calculus. ]
No subclassing or
type hierarchy
Church's type system is functional rather than hierarchical. When a new
type is generated, the synthesised type is a function of types that
returns a type. So synthesised types are subject to application and
abstraction (with Currying) - just like values. Since the manipulation
of types mirrors the manipulation of values, all type manipulation is
implicit/automatic in Church. Explicit references to types are only
required in the specification of the formal parameters to an overloaded
function.
The types of the underlying platform, both primitive and reference, may
be used in Church programs as base
types that are treated as symbols or constants. A synthesised
type may be given
an auto-generated name and emitted to the underlying platform to define
the intermediate results of Curried functions - but no subclassing
primitives are provided by the Church language itself.
Static typing
A type system of some kind is required if we wish to use a single
operator, '-' say, to represent the negation of an integer (int) and
floating point number (float). In this example, the 32 bits that make
up an int or a float can only be distinguished by a separate piece of
information that indicates how the bits should be manipulated to
produce its
additive inverse. Dynamic typing requires that the object itself
carries with it a
reference to the type of number that the collection of bits represents.
The
overhead
for dynamic typing on arithmetic operations is large and this has
historically lead to pure dynamically typed languages, like Lisp and
SmallTalk, being abandoned for applications that perform number
crunching.
Consider a code fragment in which a variable is subject to a negation
operation in the body of a loop. It
does indeed seem wasteful to copy the type information that could be
associated with the variable into
each of the potentially enormous number of values the variable may take
during the execution of the program.
Java takes a typically pragmatic view here and
introduces primitives to
provide a compromise that works well for the vast majority of typical
use cases. Java's primitives allow lexical blocks of Java to be
compiled
into code whose efficiency is the same as that of C. It could be argued
that this single step widened the potential audience of Java to a level
many orders of magnitude beyond that of the languages that embraced the
'everything is an object' paradigm.
On the other hand, a few
years of programming in Java are all that are needed to acquire a
passionate resentment of Java's primitives for the plethora of
irritating
anomalies they bring to the rest of the language and its libraries. To
retain the lofty goal
of providing abstractions that are effective to as many communities as
possible, including scientific computing and low-level systemic work,
we simply cannot abandon
primitives and therefore choose to throw out all those parts of
the Java language that are inconsistent with them. We know how we want
the rest of the language to work - as Java does - but we are forced to
abandon dynamic typing as an implementation method.
Worst-case costs of
dynamic binding
Consider the LISP's primitive data structure, cons, that takes two objects and
returns a cell that points to them both. We will model this first in
Java, with a class called Pair which
has instance variables first and
rest. The second variable is
called rest (instead of second) to reflect typical usage
where arbitrary length Lists are stored by placing the first element in
the first slot and all the
rest of the elements in another Pair pointed to by rest.
class Pair {
Object first;
Object rest;
Pair(Object first, Object rest) {
this.first = first;
this.rest = rest;
}
}
The absence of useful type information in the class above forces
coercion to be used by algorithms that make use of it. In turn, this
forces dynamic binding on us as an implementation method. Although
flexible enough to serve as a general data structure, the performance
penalty for using this structure to store, say a list of characters, is
very large. On a typical 32 bit implementation each character takes
about five words - three for each Pair, and two for each Character.
So a String stored this way would take about ten times as much space as
an ordinary Java String. While a precise value for this performance
degradation is dependent on VM specifics the problem is not due to Java
itself but is fundamental to dynamically bound systems in general - all
of which suffer some space penalty to bring consistency to their
treatment of objects.
Part of the problem with this class is caused by the fact that the Pair
object is not able to hold type information itself and therefore has to
force type information into objects that are too small to hold it
efficiently. In Java 1.5 we can use generics
to introduce useful type
information to this class as follows:
class Pair<A, B> {
A first;
B rest;
Pair(A first, B rest) {
this.first = first;
this.rest = rest;
}
}
At this point (and for many
good reasons) Java chooses to force the type parameters, A and
B, to be reference types - I.e. they can refer to the Character wrapper class but not
the char primitive type. So,
while Java's generics system serves us well in allowing the removal of
many error prone coercion operations, we need to extend it to provide
the option of generating specific implementations for handling
primitives so as to remove the
run-time performance hit associated with autoboxing.
Currying
Currying of values
In Church we can use Currying to
produce an analogue of Lisp's cons
primitive and Java's constructors.
let pair(first, rest; f) = f(first, rest);
Here the function f is an
accessor function that returns one of the two 'instance variables' that
are bound to the closure.
So the first accessor is:
let first(x, y) = x;
and the rest accessor is:
let rest(x, y) = y;
The following fragment shows a construction and accessor operation in
action:
pair(1, 2, first);
returns 1.
Currying introduces the idea that the third argument may not be
provided in the same lexical context as the first two. The use of the
semicolon, instead of a comma, in the specification of the formal
parameters of the pair function allows it to be called with either two
arguments or three. We can now create the data structure in one place
and retrieve its contents somewhere else:
let p1 = pair(1, 2);
then later:
p1(first) or p1((x, y) -> x)
returns 1.
Currying of types
Suppose that our underlying system has an int type defined as in Java's VM
specification.
In the example above, the return type of the pair function is a
synthesised type. We don't need to write anything to get Church to
create this type but, underneath the bonnet, the following concrete
types will be emitted to the underlying Java platform. At the point we
see the lexical construct, pair(1, 2), the Church compiler can deduce
that we need an implementation of pair that can be applied to two int arguments. In Java we need a
new class and a new interface to support the ideas of variable capture
and reference respectively. The new class effects effectively copies
state from the stack to the heap. The new interface implements the
inverse of this operation - copying information from the heap to the
stack.
public class IntInt {
private int first;
private int rest;
public IntInt(int first, int rest) {
this.first = first;
this.rest = rest;
}
public int apply(Accessor a) {
return a.apply(this.first,
this.rest);
}
public interface Accessor {
int apply(int first, int rest);
}
}
The accessors, first and rest, given above would be
translated to
Java as follows:
static IntInt.Accessor first = new IntInt.Accessor() { apply(int
first, int rest) { return first; }
static IntInt.Accessor rest = new IntInt.Accessor() { apply(int
first, int rest) { return rest; }
To summarise, access to the heap is provided by Currying in Church. In
all cases the Church VM emits type-safe code to the underlying Java VM
so that the Java VM can use its own type system to optimise the
implementations. Type information is gathered together from instances
and factored into larger synthetic types that are associated with
programs. The result is that Church's 'Objects' do not need class
pointers and may be handled the same way as primitives.
All 'primitive' data
types external to the VM
The fundamental data type of the Church VM is not the bit, the int, the word or the class but the function. It is difficult to
imagine a VM without ints - how, for example can I run a loop without a
variable to increment? In one of the use cases in a later section we
will see that we can
implement integers from scratch. Programs may therefore make use of the
fundamental properties of integers without relying on an implementation
method.
It is a common misconception that processors spend most of their time
performing arithmetic operations, like addition, bit-wise and, etc., on
collections of bits. While the mechanics of processors are indeed
binary, this view of computation is quite misleading. About a quarter
of the byte
codes available to Java's interpreter are associated with arithmetic
operations. During the execution of a typical Java program these
arithmetic operations, in total, comprise less than 10% of the
instructions
invoked by the processor. So what does the other 90% of CPU time get
spent on? - setting up stack frames, copying pointers and the
like: things that have far more to do with the evaluation of functions
than
arithmetic.
Jettisoning the primitive types from the VM specification is actually
very
similar to the method adopted by the Java VM for
operators that have hardware support but didn't get a byte-code: they
are provided as 'native' methods in special class files. The sine function, for example, is
provided as a static method in the Math
class with a native implementation. Java's JIT compiler in-lines
these methods and emits code that makes use of appropriate hardware
support so very little seems to
be lost in pushing their definition out of the VM.
Ease of use
Unification of object
and reference equality
Because Church itself does not permit side effects, two objects that
are
equal will always be equal. We can therefore unify object
equality and, since there is no naked assignment operator, use the '='
operator to represent it. As
with most Java implementations of equals, a check for
pointer equality may be made by the VM to improve performance.
Obviate the use of
StringBuffers in toString() methods.
Efficient toString() methods often require the temporary allocation of
a StringBuffer to reduce the number of intermediate String objects that
are created during its execution. In Church, the Object::toString()
method is replaced with an interface (given here in Java):
public interface Writable {
public Stream<Character>
write(Stream<Character> s);
}
Where Stream is a simple interface that can be implemented by Files,
Strings and Buffers. The object that implements the Stream interface
may be arbitrarily complicated and is a good place to put other stream
related state such as the IdentityHashMap that is often required to
handle cyclic data structures. The write method may also be overloaded
to have different implementations on different types of stream:
allowing an object to write itself in a variety of formats.
Implicit package names
There is no equivalent of Java's package directive in Church.
Package names are implied by the names of the directories and they are
found in. To change the name of a package, simply rename the directory
that contains the source files. To move a package to a different part
of the package hierarchy, move the directory.
No class loaders
Any source of characters may be converted into an executable function
at
runtime (provided it can be compiled).
Hide class/object
files
Church programs and libraries are distributed in source form. Local
copies of
external resources
may be compiled into machine dependent formats
and/or cached locally but this process is made transparent to the
developer.
Zip/Jar files or some other mechanism to download sets of related files
at once are needed to mitigate the latency involved in opening remote
network connections. This happens automatically and local caches
of 'jar' files are kept up-to-date by checking the originating site for
new versions.
The enforced
distribution of sources has a number of legal implications for big
systems but, as even the least hirsute Java professional will
know, the transformation of Java sources to class files is easy to
reverse. A source code obfuscator can therefore be considered to
provide protection that is fundamentally equivalent to that provided by
byte-code obfuscators.
No classpaths!
References to external code are made by deriving full URLs from import
statements and named references in the code that
depends on them. Local copies may be made for performance reasons but
they are kept up to date automatically so this process is transparent.
Church's 'jar' files
have to have the name of the directory they represented when they were
created. This allows their contents
to be accessed automatically.
Versioning
All providers of external resources are obliged to maintain all
previous versions of their public releases, indexed by date.
Automatic versioned
persistence for all 'objects'
All state (in the form of closures) may be written
to a file for evaluation by a VM invoked at a later date. If the
persistent form includes references to external resources, they are
accessed with a qualified name/date pair so that the state will be
recreated
exactly as it was in the original image. Incompatibilities may arise
between this old structure and current versions of functions that only
apply to new data structures. The type system will normally report
these problems but their solution requires manual intervention. <Get
some views on this - would it be better to in-line all
external resources to remove the dependencies and make them
free-standing?>
Absence of security
directives
A Church program operates against a VM whose capabilities are defined
when it is launched. The VM has a number of switches that can be used
to permit access to certain OS level functions such as file system and
network access. All
programs inherit the capabilities of the VM and no control or
manipulation of those capabilities is available to a Church
program.
Robustness
No null reference
Church's closures are very much like objects - except that closures
must have all
their state given to them at creation time. This obviates the need for
a universal null reference -
the source of many trivial but annoying bugs in systems implemented in
Java.
[All Object-Oriented systems need to have the idea of null - to
represent uninitialised parts of an object during initialisation.
Example:
Given an class with a single instance variable and accessor methods it
is possible to create a pair of objects o1 and o2 which
'point to' each other. At some stage in the creation of this circular
data structure it must be the case that a valid program has access to
the uninitialised state of one of the objects. Any (erroneous) attempt
to
execute a method on this vacant part of the object is open to ambiguity
and in Java results in the raising of an exception.
Interestingly, Objective-C makes a different choice here - simply
defining all messages sent to null to return null. I have been tracking
this in my Java coding and noticed the last half dozen or so
NullPointerExceptions I have fixed in my Java programs would all have
worked correctly in Objective-C I.e. my fixes all happened to implement
the general rule used by Objective-C.]
Programs cannot fail
or throw exceptions
It is not possible to write a program that will throw an internal
exception in Church and Church does not provide mechanisms to catch
exceptions thrown by foreign functions. The type system prevents all
inappropriate applications of functions to arguments, so everything
that
gets past the type system can be executed. All programs therefore
proceed until one of the following conditions applies:
- the VM has successfully evaluated the input and has finished,
having computed the specified the result
- the VM is awaiting further input from an input stream
- the VM has terminated because execution of the program has
exceeded one of the following VM limits:
- heap space
- stack space
- processing time
These limits are normally set by the user (or process) that launches
the VM. They may be monitored by Church programs but the
systemic exceptions that arise when the limits are exceeded
unconditionally terminate
the VM.
Use cases
We will try to use cases from as disparate and testing a set of
communities as possible - not because they are representative but
because the union of their requirements will give us the greatest
possible coverage. If the language can be made to perform adequately in
all of these environments we can be optimistic it will perform well for
the vast majority of more typical programming tasks.
Arithmetic
fundamentals
Church numerals, are an example of an even deeper fundamental property
of objects that are supported by the Church VM. In general, we never
ask what something is,
instead we only ever ask what it does.
A function is completely defined by the effect it has on its arguments
- and numbers are no exception. A Church Numeral defines the counting
numbers, 0, 1, 2, ... as functions of two arguments, f and x say. 0 is the function of f and x
that applies f zero times to x and returns the result - I.e. it returns
x. 1 is the function of f and x that applies f once, as in f(x), and 2
is the function of f and x that applies f twice to x to return f(f(x)).
Given these definitions of the counting numbers we can define the
Church numerals and their arithmetic operators as follows (this example
uses the +, * and ^ operators in their explicit prefix form to avoid
confusion when they are Curried):
let 0(f, x) = x;
let 1(f, x) = f(x);
let 2(f, x) = f(f(x));
etc.
let +(a, b; f; x) = a(f, b(f, x));
let *(a, b; f) = a(b(f));
let ^(a, b) = b(a);
assert *(3, 7) = 21;
Scientific computing
A rare exception to Java's impressive acceptance story has been the
scientific computing community who have remained resistant to using
Java - primarily because of performance concerns. Below is a
Fast-Fourier Transform routine written in FORTRAN:
<FFT>
<Use Gaussian elimination as an example instead?>.
We aim to split this piece of code into its algorithmic and arithmetic
parts, so that the Church version will use general routines for
performing complex arithmetic and leave the algorithmic part in its
simpler and more general form. The Church compiler should be able to
in-line the functions that define complex arithmetic and essentially
recreate the FORTRAN style program above. The result should be a
presentation of the algorithm that is concise and general and yet as
fast as the more specific FORTRAN routine above.
Symbolic/Algebraic
Computing
Symbolic algebra systems, like Mathematica, typically use special data
structures to represent polynomials and other mathematical formulae.
Rule-based reduction systems are then written in some meta-language
that operates on these data structures. Often, facilities are provided
to take the results of symbolic computations and write them to files as
FORTRAN or C functions that can be compiled and used independently. We
would like to integrate all this, so that rule-based algorithms can be
applied to ordinary programs instead
of data.
Show that the basic arithmetic operations can be implemented for a new
univariate polynomial type and that a canonical form can be maintained.
Show that (x + 1) ^2 = x^2 + 2*x + 1;
Rule-based
programming
Some problems, like the differentiation operator in Calculus, have no
general solution that can be expressed as an ordinary function of a
functional input. The symbolic result can only be implemented as a
series of rules that will differentiate a subset of all possible
functions and return the unanswerable parts of the problem unchanged.
Of course, a general procedure does exist for calculating numerical approximations to the
derivative of functions over an approximate domain such as Java's double. In Church, such a function
could be defined as:
let differentiate(f, x) = (f(x+0.01) - f(x))/0.01;
But the symbolic version of differentiate has many advantages over this
numerical approach. Firstly, the accuracy of the numerical result is
dependent the 'delta' value (0.01) - and good values are domain
specific. Secondly, in the case of, say, the function x -> 2*x, the
symbolic result x -> 2 is much more efficient as an implementation
than its partially-evaluated numerical equivalent: x -> (2*(x+0.01)
- 2*x)/0.01.
The use-case here then will be to provide a handful of rules that
define a symbolic differentiation operator, diff, that can be applied to
functions. So if:
let f(x) = (x+1)^2;
diff(f);
returns the unnamed function:
x -> 2 * (x + 1);
diff^2(f);
calculates 2(diff, f) or diff(diff(f)), the second derivative, to
return:
x -> 2;
Note that this is not equal to the number 2. This is a 'constant'
function that accepts an argument and returns 2.
Database access
Log into a remote database and search for a record satisfying given
criteria.
Cryptography
Implement the RSA encryption algorithm, compare speed with C.
Number crunching
Implement one of the factorisation algorithms, compare speed with
C.
GUI programming
Put up a GUI to accept two numbers and present their product when a
'calculate' button is pressed.
Animation
Pop up a window that presents a synthesised animation, suitable for
users of recreational drugs.
Graphics
Demonstrate a simple ray-tracing algorithm in Church to render a three
dimensional (CSG) model by recursive subdivision. Demonstrate the
performance improvement of replacing numerical differentiation methods
with symbolic differentiation as defined above.