There are several features in Modula-3 that support the structuring of large
systems.
First is the separation of interface from implementation. This allows for
system evolution as implementations evolve without affecting the clients of
those interfaces; no one is dependent on *how* you implement something, only
*what* you implement. As long as the *what* stays constant, the *how* can
change as much as is needed. Second, Modula-3 provides a simple
single-inheritance object system. There is a fair amount of controversy over
what the proper model for multiple inheritance (MI) is. I have built systems
that use multiple inheritance extensively and have implemented programming
environments for a language that supports MI. Experience has taught me that MI
can complicate a
language tremendously (both conceptually and in terms of implementation) and
can also complicate applications. Modula-3 has a particularly simple definition
of an object. In Modula-3, an object is a record on the heap with an associated
method suite. The data fields of the object define the state and the method
suite defines the behavior. The Modula-3 language allows the state of an object
to be hidden in an implementation module with only the behavior visible in the
interface. This is different from C++, where a class definition lists both the
member data and the member functions. The C++ model reveals what is essentially
private information (namely the state) to the entire world. With Modula-3
objects, what should be private can really be private.
Without garbage collection, programmers wind up adopting conventions, such as
the explicit use of reference counts, to determine
when it is safe to deallocate storage. Unfortunately, programmers are not very
good about following conventions. The net result is that programs develop
storage leaks or the same piece of storage is mistakenly used for two different
purposes. Also, in error situations, it may be difficult to free the storage.
In 'C', a 'longjmp' may cause storage to be lost if the procedure being unwound
doesn't get a chance to clean up. Exception handling in C++ has the same
problems. In general, it is very difficult to manually reclaim storage in the
face of failure. Having garbage collection in the language removes all of these
problems. Better yet, the garbage collector provided with the SRC
implementation of Modula-3 has excellent performance. It is the result of
several years of production use and tuning.
Most modern systems and applications have some flavor of asynchrony in them.
Certainly all GUI-based applications are
essentially asynchronous. The user drives inputs to a GUI-based application.
Multiprocessor and multi-machine applications are essentially asynchronous as
well. Given this, it is surprising that very few languages provide any support
at all for managing concurrency. Instead, they "leave it up to the
programmer". More often than not, programmers do this through the use of
timers and signal handlers. While this approach suffices for fairly simple
applications, it quickly falls apart as applications grow in complexity or when
an application uses two different libraries, both of which try to implement
concurrency in their own way. If you have ever programmed with Xt or Motif,
then you are aware of the problems with nested event loops. There needs to be
some standard mechanism for concurrency. Modula-3 provides such a standard
interface for creating threads. In addition, the language itself includes
support for managing locks. The standard
libraries provided in the SRC implementation are all thread-safe.
Trestle, a library providing an interface to X, is not only
thread-safe but also uses threads itself to carry out long operations in the
background. With a Trestle-based application, you can create a thread to carry
out some potentially long-running operation in response to a mouse-button
click. This thread runs in the background without tying up the user interface.
It is a lot simpler and less error-prone than trying to accomplish the same
thing with signal handlers and timers.
Generic interfaces and modules are a key to
reuse. One of the principal uses is in defining container types such as stacks,
lists, and queues. They allow container objects to be independent of the type
of entity contained. Thus, one needs to define only a single "Table"
interface that is then instantiated to provide the needed kind of
"Table", whether an integer table or a floating-point table or some
other type of table is needed. Modula-3 generics are cleaner than C++
parameterized types, but provide much of the same flexibility.
The sections below describe in more detail the features that make Modula-3 the
exceptional language that it is today.
One of Modula-2's most successful features is the provision for explicit
interfaces
between modules. Interfaces are retained with essentially no changes in
Modula-3. An interface to a module is a collection of declarations that reveal
the public parts of a module; things in the module that are not declared in the
interface are private. A module imports the interfaces it depends on
and exports the interface (or, in Modula-3, the interfaces) that it
implements. Interfaces make separate compilation type-safe; but it does them an
injustice to look at them in such a limited way. Interfaces make it possible to
think about large systems without holding the whole system in your head at
once. Programmers who have never used Modula-style interfaces tend to
underestimate them, observing, for example, that anything that can be done with
interfaces can also be done with C-style include files. This misses the point:
many things can be done with include files that cannot be done with interfaces.
For example, the meaning of an include file can be changed by defining macros
in the environment into which it is included. Include files tempt programmers
into shortcuts across abstraction boundaries. To keep large programs well
structured, you either need super-human will power, or proper language support
for interfaces.
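As a minimal sketch of the import/export relationship described above, consider a small counter. The names Counter, Next, and Main are hypothetical, and the client assumes the IO and Fmt interfaces of the SRC standard library. The variable count is private because it is not declared in the interface.

```modula3
(* Counter.i3: the public part of the module (hypothetical example). *)
INTERFACE Counter;

PROCEDURE Next(): INTEGER;
(* Return the next value in the sequence 1, 2, 3, ... *)

END Counter.

(* Counter.m3: the implementation. A module exports the interface of the
   same name unless an EXPORTS clause says otherwise. *)
MODULE Counter;

VAR count := 0;  (* private: invisible to clients of Counter *)

PROCEDURE Next(): INTEGER =
  BEGIN
    INC(count);
    RETURN count;
  END Next;

BEGIN
END Counter.

(* Main.m3: a client that depends only on the interface. *)
MODULE Main;
IMPORT Counter, IO, Fmt;
BEGIN
  IO.Put(Fmt.Int(Counter.Next()) & "\n");
END Main.
```

The client can be recompiled against Counter.i3 alone; changing how count is stored in Counter.m3 does not affect it.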
One of the important lessons from C was that there are times that real systems
need to be programmed essentially at the machine level. This power has been
nicely integrated into Modula-3. Any module that is marked as 'unsafe' has full
access to machine-dependent operations such as pointer arithmetic,
unconstrained allocation and deallocation of memory, and machine-dependent
arithmetic. These capabilities are exploited in the implementation of the
Modula-3 IO system. The lowest-levels of the IO system are written to make
heavy use of machine-dependent operations to eliminate bottlenecks. In
addition, existing (non-Modula-3) libraries can be imported. Many existing C
libraries make extensive use of machine-dependent operations. These can be
imported as "unsafe" interfaces. Then, safer interfaces can be built
on top of these while still allowing access to the unsafe features of the
libraries for those applications that need them.
The better we understand our programs, the bigger the building blocks we use
to structure them. After the instruction came the statement, after the
statement came the procedure, after the procedure came the interface. The next
step seems to be the abstract type. At the theoretical level, an abstract type
is a type defined by the specifications of its
operations instead of by the representation of its data. As realized in modern
programming languages, a value of an abstract type is represented by an
"object" whose operations are implemented by a suite of procedure
values called the object's "methods". A new object type can be defined as a
subtype of an existing type, in which case the new type has all the methods of
the old type, and possibly new ones as well (inheritance). The new type can
provide new implementations for the old methods (overriding). The farsighted
designers of Simula invented objects in the mid-sixties. Objects in Modula-3 are
very much like objects in Simula: they are always references, they have both
data fields and methods, and they have single inheritance but not multiple
inheritance.
Small examples are often used to get across the basic idea: truck as a subtype
of vehicle; rectangle as a subtype of polygon. Modula-3 aims at larger systems
that illustrate how object types provide structure for large programs. In
Modula-3 the main design effort is concentrated into specifying the properties
of a single abstract type---a stream of characters, a window on the screen.
Then dozens of interfaces and modules are coded that provide useful subtypes of
the central abstraction. The abstract type provides the blueprint for a whole
family of interfaces and modules. If the central abstraction is well designed
then useful subtypes can be produced easily, and the original design cost will
be repaid with interest. The combination of object types with Modula-2 opaque
types produces something new: the partially opaque type, where some of
an object's fields are visible in a scope and others are hidden. Because the
committee had no experience with partially opaque types, the first version of
Modula-3 restricted them severely; but after a year of experience it was clear
that they were a good thing, and the language was revised to remove the
restrictions. It is possible to use object-oriented techniques even in
languages that were not designed to support them, by explicitly allocating the
data records and method suites. This approach works reasonably smoothly when
there are no subtypes; however, it is through subtyping that object-oriented
techniques offer the most leverage. The approach works badly when subtyping is
needed: either you allocate the data records for the different parts of the
object individually (which is expensive and notationally cumbersome) or you
must rely on unchecked type transfers, which is unsafe. Whichever approach is
taken, the subtype relations are all in the programmer's head: only with an
object-oriented language is it possible to get object-oriented static type
checking.
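A partially opaque type can be sketched as follows. This is an illustration under assumed names (Accumulator, Public, add, total are all hypothetical): the interface reveals only the methods, while the module reveals the data field to itself.

```modula3
(* Accumulator.i3: clients see the methods but not the state. *)
INTERFACE Accumulator;

TYPE
  T <: Public;            (* T is some subtype of Public; the rest is hidden *)
  Public = OBJECT
  METHODS
    add(x: INTEGER);
    total(): INTEGER;
  END;

END Accumulator.

(* Accumulator.m3: the full revelation, visible only in this module. *)
MODULE Accumulator;

REVEAL
  T = Public BRANDED OBJECT
    sum: INTEGER := 0;    (* hidden state *)
  OVERRIDES
    add := Add;
    total := Total;
  END;

PROCEDURE Add(self: T; x: INTEGER) =
  BEGIN
    INC(self.sum, x);
  END Add;

PROCEDURE Total(self: T): INTEGER =
  BEGIN
    RETURN self.sum;
  END Total;

BEGIN
END Accumulator.
```

A client can write NEW(Accumulator.T) and call add and total on the result, but any attempt to name the sum field outside the module is rejected by the compiler.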
The following example defines two types: "File.T", an object with two methods
that get and put a single character, respectively; and "FileServer.T", an
object that manages file objects. A server someplace defines a concrete
implementation of these abstract types.
INTERFACE File;
IMPORT NetObj;
(* A file exposed as a network object. *)
TYPE T = NetObj.T OBJECT
  METHODS
    getChar(): CHAR;     (* read one character *)
    putChar(c: CHAR);    (* write one character *)
  END;
END File.
INTERFACE FileServer;
IMPORT NetObj, File;
(* A server object that hands out File.T objects by name. *)
TYPE T = NetObj.T OBJECT
  METHODS
    create(name: TEXT): File.T;
    open(name: TEXT): File.T;
  END;
END FileServer.
A generic module is a template in which some of the imported interfaces are
regarded as formal parameters, to be bound to actual interfaces when the
generic is
instantiated. For example, a generic hash table module could be instantiated to
produce tables of integers, tables of text strings, or tables of any desired
type. The different generic instances are compiled independently: the source
program is reused, but the compiled code will generally be different for
different instances. To keep Modula-3 generics simple, they are confined to the
module level: generic procedures and types do not exist in isolation, and
generic parameters must be entire interfaces. In the same spirit of simplicity,
there is no separate type checking associated with generics. Implementations
are expected to expand the generic and type-check the result. The alternative
would be to invent a polymorphic type system flexible enough to express the
constraints on the parameter interfaces that are necessary in order for the
generic body to compile. This has been achieved for ML and CLU, but it has not
yet been achieved satisfactorily in the Algol family of languages, where the
type systems are less uniform. (The rules associated with Ada generics are too
complicated for our taste.) Modula-3 generics are cleaner than C++
parameterized types, but provide much of the same flexibility.
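The module-level generics described above might look like the following sketch of a stack. The names Stack, IntElem, and IntStack are illustrative assumptions; the only requirement placed on the parameter interface Elem is that it declares a type T.

```modula3
(* Stack.ig: generic interface. Elem stands for any interface that
   declares a type T (not an open array type). *)
GENERIC INTERFACE Stack(Elem);
TYPE T <: REFANY;
PROCEDURE New(): T;
PROCEDURE Push(s: T; x: Elem.T);
PROCEDURE Pop(s: T): Elem.T;   (* checked runtime error if s is empty *)
END Stack.

(* Stack.mg: generic module, type-checked only when instantiated. *)
GENERIC MODULE Stack(Elem);

TYPE Node = REF RECORD value: Elem.T; next: Node END;

REVEAL T = BRANDED REF RECORD top: Node := NIL END;

PROCEDURE New(): T =
  BEGIN
    RETURN NEW(T);
  END New;

PROCEDURE Push(s: T; x: Elem.T) =
  BEGIN
    s.top := NEW(Node, value := x, next := s.top);
  END Push;

PROCEDURE Pop(s: T): Elem.T =
  VAR x := s.top.value;        (* dereferences NIL if the stack is empty *)
  BEGIN
    s.top := s.top.next;
    RETURN x;
  END Pop;

BEGIN
END Stack.

(* One instantiation: an actual interface plus two one-line instances. *)
INTERFACE IntElem; TYPE T = INTEGER; END IntElem.
INTERFACE IntStack = Stack(IntElem) END IntStack.
MODULE IntStack = Stack(IntElem) END IntStack.
```

A stack of text strings would need only a second parameter interface declaring T = TEXT and two more one-line instances; the generic source is reused unchanged.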
Dividing a computation into concurrent processes (or threads of control) is a
fundamental method of separating concerns. For example, suppose you are
programming a terminal emulator with a blinking cursor: the most satisfactory
way to separate the cursor blinking code from the rest of the program is to
make it a separate thread. Or suppose you are augmenting a program with a new
module that communicates over a buffered channel. Without threads, the rest of
the program will be blocked whenever the new module blocks on its buffer, and
conversely, the new module will be unable to service the buffer whenever any
other part of the program blocks. If this is unacceptable (as it almost always
is) there is no way to add the new module without finding and modifying every
statement of the program that might block. These modifications destroy the
structure of the program by introducing undesirable dependencies between what
would otherwise be independent modules. The provisions for threads in Modula-2
are weak, amounting essentially to coroutines. Hoare's monitors are a sounder
basis for concurrent programming. Monitors were used in Mesa, where they worked
well; except that the requirement that a monitored data structure be an entire
module was irksome. For example, it is often useful for a monitored data
structure to be an object instead of a module. Mesa relaxed this requirement,
made a slight change in the details of the semantics of Hoare's Signal
primitive, and introduced the Broadcast primitive as a convenience [Lampson].
The Mesa primitives were simplified in the
Modula-2+ design, and the result was successful enough to be incorporated with
no substantial changes in Modula-3. A threads package is a tool with a very
sharp edge. A common programming error is to access a shared variable without
obtaining the necessary lock. This introduces a race condition that can lie
dormant throughout testing and strike after the program is shipped. Theoretical
work on process algebra has raised hopes that the rendezvous model of
concurrency may be safer than the shared memory model, but the experience with
Ada, which adopted the rendezvous, lends at best equivocal support for this
hope, since Ada still allows shared variables, and apparently they are widely
used.
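The shared-variable discipline discussed above can be sketched with the standard Thread interface and the built-in LOCK statement. Worker, Work, mu, and total are hypothetical names, and IO and Fmt are assumed to be the SRC library interfaces. Every access to total happens inside LOCK, which is exactly the step that is easy to forget.

```modula3
MODULE Main;
IMPORT Thread, IO, Fmt;

VAR
  mu := NEW(MUTEX);
  total := 0;                  (* shared variable, protected by mu *)

TYPE
  Worker = Thread.Closure OBJECT
  OVERRIDES
    apply := Work;             (* the body run by the forked thread *)
  END;

PROCEDURE Work(<*UNUSED*> self: Worker): REFANY =
  BEGIN
    FOR i := 1 TO 1000 DO
      LOCK mu DO               (* acquire mu; released on any exit from the block *)
        INC(total);
      END;
    END;
    RETURN NIL;
  END Work;

VAR t1, t2: Thread.T;

BEGIN
  t1 := Thread.Fork(NEW(Worker));
  t2 := Thread.Fork(NEW(Worker));
  EVAL Thread.Join(t1);        (* wait for both workers to finish *)
  EVAL Thread.Join(t2);
  IO.Put(Fmt.Int(total) & "\n");
END Main.
```

Removing the LOCK statement still compiles, which illustrates the "sharp edge": the race it introduces would only show up as an occasionally wrong total.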
A language feature is unsafe if its misuse can corrupt the runtime system so
that further execution of the
program is not faithful to the language semantics. An example of an unsafe
feature is array assignment without bounds checking: if the index is out of
bounds, then an arbitrary location can be clobbered and the address space can
become fatally corrupted. An error in a safe program can cause the computation
to abort with a run-time error message or to give the wrong answer, but it
can't cause the computation to crash in a rubble of bits. Safe programs can share
the same address space, each safe from corruption by errors in the others. To
get similar protection for unsafe programs requires placing them in separate
address spaces. As large address spaces become available, and programmers use
them to produce tightly coupled applications, safety becomes more and more
important. Unfortunately, it is generally impossible to program the lowest
levels of a system with complete safety. Neither the compiler nor the runtime
system can check the validity of a bus address for an I/O controller, nor can
they limit the ensuing havoc if it is invalid. This presents the language
designer with a dilemma. If he holds out for safety, then low-level code will
have to be programmed in another language. But if he adopts unsafe features,
then his safety guarantee becomes void everywhere. The languages of the BCPL
family are full of unsafe features; the languages of the Lisp family generally
have none (or none that are documented). In this area Modula-3 follows the lead
of Cedar by adopting a small number of unsafe features that are allowed only in modules explicitly
labeled unsafe. In a safe module, the compiler prevents any errors that could
corrupt the runtime system; in an unsafe module, it is the programmer's
responsibility to avoid them.
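A small sketch of what the UNSAFE label permits, assuming the SRC IO and Fmt interfaces for output (the variable names are hypothetical). Address arithmetic and LOOPHOLE are both rejected by the compiler in a module that is not marked UNSAFE.

```modula3
UNSAFE MODULE Main;
IMPORT IO, Fmt;

VAR
  a := ARRAY [0..3] OF INTEGER {10, 20, 30, 40};
  addr: ADDRESS := ADR(a[0]);  (* ADR is available only in unsafe modules *)

BEGIN
  (* Address arithmetic: step from a[0] to a[1]. Nothing checks the bounds,
     so a wrong offset would silently read some other memory. *)
  addr := addr + BYTESIZE(INTEGER);

  (* LOOPHOLE reinterprets the bits of addr as a typed pointer. *)
  IO.Put(Fmt.Int(LOOPHOLE(addr, UNTRACED REF INTEGER)^) & "\n");
END Main.
```

Because the unsafe operations are confined to a module carrying the label, a reader hunting for memory corruption knows where to look first.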
One of the most important features in Modula-3 is garbage collection. Garbage
collection really enables robust, long-lived systems. Without garbage collection,
you need to define conventions about who owns a piece of storage. For instance,
if I pass you a pointer to a structure, are you allowed to store that pointer
somewhere? A classic unsafe runtime error is to free a data structure that is
still reachable by active references (or "dangling pointers"). The
error plants a time bomb that explodes later, when the storage is reused. If on
the other hand the programmer fails to free records that have become
unreachable, the result will be a "storage leak" and the computation
space will grow without bound. Problems due to dangling pointers and storage
leaks tend to persist long after other errors have been found and removed. The
only sure way to avoid these problems is the automatic freeing of unreachable
storage, or garbage collection. Modula-3 therefore provides "traced
references", which are like Modula-2 pointers except that the storage they
point to is kept in the "traced heap" where it will be freed
automatically when all references to it are gone. Another great benefit of
garbage collection is that it simplifies interfaces. Without garbage
collection, an interface must specify whether the client or the implementation
has the responsibility for freeing each allocated reference, and the conditions
under which it is safe to do so. This can swamp the interface in complexity.
For example, Modula-3 supports text strings by a simple required interface Text, rather than with a built-in type.
Without garbage collection, this approach would not be nearly as attractive.
New refinements in garbage collection have appeared continually for more than
twenty years, but it is still difficult to implement efficiently. For many
programs, the programming time saved by simplifying interfaces and eliminating
storage leaks and dangling pointers makes garbage collection a bargain, but the
lowest levels of a system may not be able to afford it. For example, in SRC's
Topaz system, the part of the operating system that manages files and
heavyweight processes relies on garbage collection, but the inner
"nub" that implements virtual memory and thread context switching
does not. Essentially all Topaz application programs rely on garbage
collection. For programs that cannot afford garbage collection, Modula-3
provides a set of reference types that are not traced by the garbage collector.
In most other respects, traced and untraced references behave identically.
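The difference between the two kinds of reference can be sketched as follows (the type and variable names are hypothetical). Allocation looks the same for both; only the untraced record must be freed explicitly, and DISPOSE is available only in an unsafe module.

```modula3
UNSAFE MODULE Main;

TYPE
  Traced   = REF RECORD x: INTEGER END;           (* lives in the traced heap *)
  Untraced = UNTRACED REF RECORD x: INTEGER END;  (* ignored by the collector *)

VAR
  t: Traced   := NEW(Traced, x := 1);
  u: Untraced := NEW(Untraced, x := 2);

BEGIN
  t := NIL;      (* the record is now unreachable; the collector may reclaim it *)
  DISPOSE(u);    (* explicit free; also sets u to NIL *)
END Main.
```

If another untraced reference to the same record existed, it would be left dangling after the DISPOSE, which is why the operation is classed as unsafe.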
An exception is a control construct that exits many scopes at once. Raising an
exception exits active scopes repeatedly until a handler is found for the
exception, and transfers control to the handler. If there is no handler, the
computation terminates in some system-dependent way---for example, by entering
the debugger. There are many arguments for and against exceptions, most of
which revolve around inconclusive issues of style and taste. One argument in
their favor that has the weight of experience behind it is that exceptions are
a good way to handle any runtime error that is usually, but not necessarily,
fatal. If exceptions are not available, each procedure that might encounter a
runtime error must return an additional code to the caller to identify whether
an error has occurred. This can be clumsy, and has the practical drawback that
even careful programmers may inadvertently omit the test for the error return
code. The frequency with which returned error codes are ignored has become
something of a standing joke in the Unix/C world. Raising an exception is more
robust, since it stops the program unless there is an explicit handler for it.
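As a minimal sketch of raising and handling an exception (DivideByZero and Quotient are hypothetical names; IO and Fmt are assumed to be the SRC library interfaces): the RAISES clause documents what the procedure can raise, and the caller cannot silently ignore the error the way it could ignore a return code.

```modula3
MODULE Main;
IMPORT IO, Fmt;

EXCEPTION DivideByZero;

PROCEDURE Quotient(a, b: INTEGER): INTEGER RAISES {DivideByZero} =
  BEGIN
    IF b = 0 THEN RAISE DivideByZero END;
    RETURN a DIV b;
  END Quotient;

BEGIN
  TRY
    IO.Put(Fmt.Int(Quotient(7, 2)) & "\n");  (* normal return *)
    IO.Put(Fmt.Int(Quotient(1, 0)) & "\n");  (* raises; the Put is never reached *)
  EXCEPT
  | DivideByZero => IO.Put("division by zero\n");
  END;
END Main.
```

Without the TRY ... EXCEPT block, the second call would terminate the program rather than continue with a meaningless result.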
Like all languages in the Algol family, Modula-3 is strongly typed. The basic
idea of strong typing is to partition the value space into types, restrict
variables to hold values of a single type, and restrict operations to apply to
operands of fixed types. In actuality, strong typing is rarely so simple. For
example, each of the following complications is present in at least one
language of the Algol family: a variable of type [0..9] may be safely assigned
to an INTEGER, but not vice versa (subtyping). Operations like absolute value
may apply both to REALs and to INTEGERs instead of to a single type
(overloading). The types of literals (for example, NIL) can be ambiguous. The
type of an
expression may be determined by how it is used (target-typing). Type mismatches
may cause automatic conversions instead of errors (as when a fractional real is
rounded upon assignment to an integer). We adopted several principles in order
to make Modula-3's type system as uniform as possible. First, there are no
ambiguous types or target typing: the type of every expression is determined by
its subexpressions, not by its use. Second, there are no automatic
conversions. In some cases the representation
of a value changes when it is assigned (for example, when assigning to a
packed field of a record type) but the abstract value itself is transferred
without change. Third, the rules for type compatibility are defined in terms of
a single subtype relation. The subtype relation is required for treating
objects with inheritance, but it is also useful for defining the type
compatibility rules for conventional types.
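These principles can be illustrated with a short sketch (variable names are hypothetical; IO and Fmt are assumed to be the SRC library interfaces). Assignment between a subrange and INTEGER follows from the subtype relation, while any change between INTEGER and REAL has to be written out.

```modula3
MODULE Main;
IMPORT IO, Fmt;

VAR
  digit: [0..9] := 7;
  n: INTEGER;
  r: REAL := 2.6;

BEGIN
  n := digit;       (* [0..9] is a subtype of INTEGER: always safe *)
  digit := n;       (* accepted, but checked at run time: n must lie in [0..9] *)

  (* n := r;           rejected at compile time: no automatic conversion *)
  n := ROUND(r);    (* conversions are explicit *)
  r := FLOAT(n);    (* FLOAT yields a REAL by default *)

  IO.Put(Fmt.Int(n) & "\n");
END Main.
```

The type of each right-hand side is fixed by its own subexpressions; the type of the variable on the left only determines whether the assignment is legal.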
In the early days of the Ada project, a general in the Ada Program Office opined
that "obviously the Department of Defense is not interested in an
artificially simplified language such as Pascal". Modula-3 represents the
opposite point of view. We used every artifice that we could find or invent to
make the language simple. C. A. R.
Hoare has suggested that as a rule of thumb a language is too complicated if it
can't be described precisely and readably in fifty pages. The Modula-3
committee elevated this to a design principle: we gave ourselves a
"complexity budget" of fifty pages, and chose the most useful
features that we could accommodate within this budget. In the end, we were over
budget by six lines plus the syntax equations. This policy is a bit arbitrary,
but there are so many good ideas in programming language design that some kind
of arbitrary budget seems necessary to keep a language from getting too
complicated.
In conclusion, the features were directed toward two main goals. Interfaces,
objects, generics, and threads provide fundamental patterns of abstraction that
help to structure large programs. The isolation of unsafe code, garbage
collection, and exceptions help make programs safer and more robust. Of the
techniques that we used to keep the language internally consistent, the most
important was the definition of a clean type system based on a subtype
relation. There is no special novelty in any one of these features
individually, but there is simplicity and power in their combination.
The links to this information are located at these sites:
1. http://www.research.digital.com/SRC/modula-3/html/home.html
2. ftp://ftp.vlsi.polymtl.ca/pub/m3/m3-faq.ps
3. [Modula-3 home page]