Go to the first, previous, next, last section, table of contents.
The Oberon-2 language as implemented by OOC is based on the paper
The Programming Language Oberon-2
H. Moessenboeck, N. Wirth
Institut fuer Computersysteme, ETH Zurich, March 1995
ftp://ftp.inf.ethz.ch/pub/Oberon/Docu/Oberon2.Report.ps.gz
This paper describes the language in just 16 pages. Additional appendices cover implementation details of Oberon-2 and its implementation as part of the Oberon operating system for the Ceres workstation. These appendices are system dependent, and are only partially implemented by OOC.
This chapter is intended as a supplement to the language report. Together with the report, it describes the programming language accepted by OOC compilers. It specifies details left open by the report and those features of OOC that go beyond the scope of the language report.
OOC does not impose a limit on the length of identifiers or string literals. The Oakwood guidelines suggest that a compiler should support at least 23 significant characters for identifiers, but there is no such suggestion for strings. The pragma variables `IdentLength' and `StringLength' control the length of identifiers and string literals accepted by the compiler (see section Option and Pragma Variables). Note that other compilers may impose arbitrary limits on identifiers and strings.
The basic types of OOC are defined as follows:
============================================================= Name Size in Bytes MIN(T) MAX(T) ------------------------------------------------------------- BOOLEAN 1 n/a n/a CHAR 1 0X 0FFX SHORTINT 1 -128 127 INTEGER 2 -32768 32767 LONGINT 4 -2147483648 2147483647 HUGEINT 8 -2^63 2^63-1 REAL 4 -3.40282347E+38 3.40282347E+38 LONGREAL 8 -1.79769313D+308 1.79769313D+308 SET 4 0 31 -------------------------------------------------------------
The integer type HUGEINT is required to be implemented only by
compilers for 64-bit target architectures; that is, for targets whose
address size is 8 bytes. HUGEINT is optional for all other
implementations of OOC.
The size and extreme values of the real numeric types depend on the floating
point representation used on the target system. For the vast majority of
modern systems, this will be based on the IEEE standard 754-1985. In this
case, REAL is encoded as IEEE single precision number in 4 bytes, and
LONGREAL as double precision number in 8 bytes. The numbers given in
the table above assume IEEE representation for real numeric types. Note
that the extreme values as shown for LONGREAL are truncated; for the
exact absolute value, refer to the constant LowLReal.large.
The string constant `""' and the character constant `0X' are interchangeable. This implies that a string constant cannot contain the character `0X'.
Arrays of length zero are permitted, but the compiler will emit a warning if it detects such an array during compilation (unless warnings are disabled).
So, a declaration of the form TYPE T=ARRAY 0 OF CHAR is allowed, and
variables of type T can be defined, but any attempt to access
elements in the array will cause an `index out of range' error either
during compilation (if detectable) or at run-time. Similarly, open array
instances of length zero can be created (by passing a length of zero to
NEW), but, of course, no elements can be accessed.
Note that, in any case, specifying a negative length will trigger an error either during compilation or at program run-time.
OOC does not impose any limit on the number of levels a record type may be extended. The Oakwood guidelines suggest that the number of levels of type extension should not be less than 8 levels, including the base type.
If an extended record type redefines a type-bound procedure, then the redefinition must appear after the base declaration in the source text. The following example violates this rule:
TYPE R0 = RECORD END; R1 = RECORD (R0) END; PROCEDURE (VAR r: R1) P; END P; PROCEDURE (VAR r: R0) P; END P;
In order to make this example legal, the order of declaration of the two procedures must be reversed. Note that this problem can only arise if the record types, and their corresponding type-bound procedures, are defined in the same module. This restriction is inherently present in many Oberon-2 compilers, although it is not stated in the language report.
Hexadecimal constants can be specified over the full unsigned range of the largest integer type. A constant value greater than the maximum value of the largest signed integer type is mapped onto a negative value by interpreting it as an unsigned representation of an integer number.
For example, if the compiler does not support the HUGEINT type,
hexadecimal constants in the range `08000000H..0FFFFFFFFH' are mapped
into the range `MIN(LONGINT)..-1' by matching the bit pattern of the
constant onto the negative value. However, if HUGEINT constants are
supported by the compiler, these values are mapped into the positive
range `2^31..2^32-1'. This means that the interpretation of such
constants is compiler-dependent. Without special precautions, modules using
this constant representation are not portable to systems that support
additional, larger integer types.
Note that the extended mapping of constant literals applies only to values given in hexadecimal format; a decimal integer constant will cause an overflow if it exceeds the maximum integer value. Special handling of hexadecimal constants is not part of the language report, but is implemented in many Oberon-2 compilers, presumably to ease the implementation of low-level modules that need a convenient way to define bit pattern values of word size.
DIV and MODThe results of integer division and modulus are defined as
x=(x DIV y)*y + x MOD y
where `0 <= x MOD y < y' or `y < x MOD y <= 0'.
Example:
================================= x y x DIV y x MOD y --------------------------------- 5 3 1 2 5 -3 -2 -1 -5 3 -2 1 -5 -3 1 -2 ---------------------------------
Note that with this definition the equation `x DIV y = ENTIER (x/y)' holds.
NEW
There are two points of clarification regarding OOC's implementation of
NEW:
NEW is
illegal. An exception is raised during any attempt to create an open array
with a negative length as one of its dimensions. A length of zero is
permitted, but no read operations on such an array are possible; note that,
writing `NEW(a,0)' triggers a compile-time warning (if warnings are
enabled).
NIL
value in `v'. If the program cannot obtain the required amount of
memory, an exception is raised.
HALT and ASSERTThe statements `HALT(n)' and `ASSERT(FALSE, n)' are equivalent to the C function invocation `exit(n)' unless the raised exception is caught explicitly. The value of `n' must be from the range `[0..255]'. The statement `ASSERT(FALSE)' is equivalent to `ASSERT(FALSE, 1)'.
SYSTEM.MOVE
There are several points to consider when using SYSTEM.MOVE:
SYSTEM.MOVE with a block size of zero will do nothing.
Also note that, due to the way OOC represents source code internally,
SYSTEM.MOVE should not be applied to local scalar variables of a
procedure. OOC will not recognize such a memory-level copy as a defining
instruction of a variable. The generated code would be valid, but it is
possible that a warning will be emitted, stating that the variable is not
defined before its use.
OOC has two, slightly different, modes of operation: conformant mode and non-conformant mode. Conformant mode emulates the behaviour of ETH compilers, whereas non-conformant mode is generally closer to the language report. The mode in use is determined by the boolean pragma variable `ConformantMode' (see section Option and Pragma Variables); non-conformant mode is the default setting. The differences between both modes are described in the following sections.
WITH
In older ETH compilers, the implementation of the WITH statement is
faulty. If a guarded variable is a formal parameter, the formal parameter
type is actually modified by the WITH statement; that is, the
interface of the procedure is changed for any calls to it within the scope
of the regional type guard.
Example:
MODULE TestWith;
TYPE
R0 = RECORD END; P0 = POINTER TO R0;
R1 = RECORD (R0) END; P1 = POINTER TO R1;
PROCEDURE Proc0 (VAR r: R0);
VAR r0: R0;
BEGIN
WITH r: R1 DO Proc0 (r0) END
END Proc0;
PROCEDURE Proc1 (p: P0);
VAR p0: P0;
BEGIN
WITH p: P1 DO Proc1 (p0) END
END Proc1;
END TestWith.
In conformant mode, the compiler warns that the formal parameter type is
modified by a WITH statement, and two errors are issued: the argument
of the first procedure call `Proc0 (r0)' is not compatible to a formal
parameter of type `R1', and in the second call `Proc1 (p0)', the
argument is not assignment compatible to type `P1'. This example
module is legal in non-conformant mode.
The language report specifies that a variable of pointer type cannot be compared with a procedure constant. Appendix A states that, for an expression `p0 = p1' where `p0' and `p1' are procedure values, both operands have to have the same type. However, the type rules imply that the type of any one procedure is distinct from all other types.
Example:
MODULE ProcCmp; VAR p0, p1: PROCEDURE; bool: BOOLEAN; PROCEDURE P; END P; BEGIN p0 := P; bool := (p0 = P) END ProcCmp.
The assignment to `p0' is legal, but the comparison between the procedure variable and the procedure value is not. Oberon-2 relaxes the type rules for assignments involving procedure types by requiring only structural equivalence rather than name equivalence. It would be natural to extend this to comparisons of procedure values as well. Because omitting this extension is most likely an oversight by the language designers, OOC implements the less restrictive rule. This means, that a comparison like in the example is legal in non-conformant mode, but will be marked as faulty in conformant mode.
INC and DECThe report defines `INC(v, n)' to be functionally equivalent to `v := v + n', and gives a similar definition for `DEC(v, n)'. It does not specify any type rules for these predefined procedures. ETH compilers implement the procedures in such a way that `INC(v, n)' is legal, if
Because this implementation is questionable when compared with the language report, OOC applies a less restrictive rule in non-conformant mode; `INC(v, n)' and `DEC(v, n)' are valid if the type of `n' is included in that of `v', regardless of `n' being a constant value or not. In conformant mode, the more restrictive interpretation of the ETH compilers is applied.
The language report specifies that, when redefining a type-bound
procedure, the formal parameters of the original procedure and the
redefinition must match. OOC enfores this rule in conformant
mode. However, in non-conformant mode, the compiler applies a less
restrictive rule that was first introduced by Oberon/F: If P' is
a redefinition of the type-bound procedure P, the formal
parameters of P and P' must match with the
exception of the function result of P', which may be an extension
of the result of P.
Example:
MODULE Redef; TYPE R0 = RECORD END; P0 = POINTER TO R0; R1 = RECORD (R0) END; P1 = POINTER TO R1; PROCEDURE (p: P0) Copy(): P0; ... END Copy; PROCEDURE (p: P1) Copy(): P1; ... END Copy; END Redef.
In conformant mode, the compiler will complain that the formal parameters of the second procedure do not match the ones of the first. Conformant mode requires the result type of the second procedure to be `P0', whereas non-conformant mode allows the extended type `P1' as valid.
By allowing the less restrictive rule for return types, it is possible to design class hierachies distributed over several modules where an extended class can be used without direct knowledge of the base class. An example of this is found in the module `Files': If adhering to the report, the type-bound procedures `NewReader' and `NewWriter' would need to return types imported from module `Channel'. Any module using these procedures would also need to import `Channel' to define variables to hold the result of these procedures. With the relaxed redefinition rule, the procedures can return types from module `Files', and can therefore be used separately from the base module `Channel'.
OOC provides several data types beyond the standard types of Oberon-2. They are introduced to add Unicode support and to ease interfacing to foreign code and migration to 64-bit target architectures.
In order to support the Unicode character set, OOC adds the type
LONGCHAR and introduces the concept of long strings. The
character types are now CHAR and LONGCHAR, and
string constants can be either String or LongString.
The basic character types are as follows:
CHAR
LONGCHAR
The character type LONGCHAR includes the values of type CHAR
according to the following hierarchy:
LONGCHAR >= CHAR
Character constants are denoted by the ordinal number of the character in
hexadecimal notation followed by the letter `X'. The type of a
character constant is the minimal type to which the constant value belongs.
(i.e., If the constant value is in the range `0X..0FFX', its type is
CHAR; otherwise, it is LONGCHAR.)
String constants are sequences of characters enclosed in single (')
or double (") quote marks; strings can also be represented using the
string concatenation operator `+' and a combination of characters or
string constants. String constants that consist solely of characters in the
range `0X..0FFX' are of type String, all others are of type
LongString. For example, the following is of type LongString:
Example:
CONST
aLongString = 0C0ACX + 0C6A9X + " " + 0C2E4X + 0D328X;
The string type LongString includes the values of type String
according to the following hierarchy:
LongString => String
This means that a string constant composed of CHAR range values can
be used in place of a constant string composed of LONGCHAR range
values. The expected implicit type conversion rules (similar to integer
types) apply to character values and string constants, too; that is, it is
possible to compare a LongString with a String, or assign a
String to an ARRAY OF LONGCHAR variable. (CHAR and
Strings are promoted to LONGCHAR and LongString when
necessary.)
The following predeclared function procedures support these additional operations:
Name Argument type Result type Function
CAP(x) CHAR CHAR if x is a letter,
LONGCHAR LONGCHAR corresponding capital
letter; otherwise,
identity (see note below).
LONG(x) CHAR LONGCHAR identity
String LongString identity
LONGCHR(x) integer type LONGCHAR long character with
ordinal value x
ORD(x) LONGCHAR LONGINT ordinal value of x
SHORT(x) LONGCHAR CHAR projection
LongString String projection
Please note:
CAP(x) maps lower case letters in the ISO-Latin-1 range to the
capital counterparts and produces identity for all other characters (meaning
that most Unicode characters are mapped to identity). There are two
exceptions: `U+00DF' (LATIN SMALL LETTER SHARP S) whose uppercase
version is the two letter sequence `SS', and `U+00FF' (LATIN SMALL
LETTER Y WITH DIARESIS) whose capital version is outside the ISO-Latin-1
range (it has the code `U+0178'); these two characers are also mapped
onto themselves.
SHORT(x), where x is of type LONGCHAR, can result in
overflow, which triggers a compilation or run-time error. The result of an
operation that causes an overflow, but is not detected as such, is
undefined.
The predeclared procedure COPY(x, v) also supports
LongStrings:
Name Argument type Function
COPY(x, v) x: character array, string v := x
v: character array
Note that, COPY(x, v) is invalid if x is of type ARRAY
OF CHAR, and v is of type LongString or ARRAY OF
LONGCHAR.
Characters, string constants, and arrays of characters are assignment compatible as follows:
An expression e of type Te is assignment compatible
with a variable v of type Tv if one of the following
conditions hold:
Tv and Te are character types and Tv includes
Te;
Tv is an ARRAY n OF LONGCHAR, Te is a string constant
with m characters, and m < n;
Tv is an ARRAY n OF CHAR, Te is a String with
m characters, and m < n;
String constants and arrays of characters are array compatible as follows:
An actual parameter a of type Ta is array compatible with a
formal parameter f of type Tf if
f is a value parameter of type ARRAY OF CHAR and a is a
String, or
f is a value parameter of type ARRAY OF LONGCHAR and a
is a string constant.
Character and string types are expression compatible as follows:
Operator First operand Second operand Result type
= # < <= > >= character type character type BOOLEAN
string type string type BOOLEAN
Please note: Implicit type conversion rules apply to both character values and constant strings. They do not apply to character arrays.
The following modules provide support for LONGCHAR and
LongString:
LONGCHAR and LongString as binary data.
LONGCHAR and LongString.
Additional mapper classes, which are extended from TextRider and
UnicodeRider, can be added to handle additional 8- and 16-bit encodings.
These classes are used to map from another encoding (e.g., "KSC5601", a
standard Korean character encoding) to Unicode or Latin-1 (as appropriate),
and vice versa. Here "encoding" means both the encoding of n bit
values in byte streams and translation of character codes between the
two standards. (See section Standard Mappers.)
In addition to the standard set type SET, module SYSTEM
defines several other set types: SET8, SET16, SET32,
and--optionally---SET64. The set type SET64 is required to
be implemented only by compilers for 64-bit target architectures. Note that
the type SET32 is an alias for the standard type SET.
============================================================= Name Size in Bytes MIN(T) MAX(T) ------------------------------------------------------------- SET8 1 0 7 SET16 2 0 15 SET32 4 0 31 SET64 8 0 63 -------------------------------------------------------------
To assign a value to a variable of one of these set types, prefix the set constructor with the type name.
Example:
set8 := SYSTEM.SET8{0..3};
set16 := SYSTEM.SET16{0, 8..x};
Operations for the additional set types are defined just like their
counterparts for SET. However, both sides of a dyadic operator must
be of the same type. That is, set operations such as intersection, union,
and so on, are only permitted if both the left and right side are of the
same set type.
The predefined procedures EXCL, INCL, and the membership test
IN are applicable to all set types. The predefined function
LONG converts a set value to the identic value of the next larger set
type (if available), whereas SHORT converts to the next smaller set
(if available). SHORT will remove all elements that are not included
in the smaller type.
Variables of type SET8 can be assigned to variables of type BYTE.
If a formal variable parameter is of type BYTE, then the corresponding
actual parameter can be of type SET8.
The Oberon-2 language specification does not give fixed values for the size of its basic data types. Any actual language implementation, in the form of a compiler, has to define the size of its types as fixed values. Most modern Oberon-2 compilers use the type sizes listed in the table in section Specifications. However, problems arise when changing to a 64-bit target system.
Unfortunately, the pseudo module SYSTEM as defined in the language
report uses an integer type, LONGINT, to represent memory addresses.
For a 64-bit machine a 64-bit integer type is required to represent
addresses. Defining LONGINT to be a 8 byte type on those machines
introduces a number of problems:
LONGINT grows, possibly
doubling the amount of memory required to store them.
LONGINT. Accordingly, all modules using these interfaces also need
to be rewritten.
Therefore, OOC introduces a new integer type called HUGEINT, and
leaves existing integer types unchanged. Values of type HUGEINT are
stored in 8 bytes, with a minimum value of `-2^63' and a maximum of
`2^63-1'. This extends the integer type inclusion hierachy at its
upper end. All integer operations also apply to HUGEINT.
The integer type HUGEINT is required to be implemented only by
compilers for 64-bit target architectures; that is, for targets whose
address size is 8 bytes. HUGEINT is optional for all other
implementations of OOC. 64-bit systems are also required to implement the
set type SYSTEM.SET64, which is of the same size as HUGEINT.
The pseudo module SYSTEM declares the type ADDRESS. On 32-bit
systems, it is an alias for LONGINT, and on 64-bit machines, it is an
alias for HUGEINT. The following relation holds for all systems:
SIZE(SYSTEM.ADDRESS) = SIZE(POINTER) = SIZE(SYSTEM.PTR).
The predefined function ADR returns a value of type ADDRESS.
Likewise, the type of the address arguments of MOVE, GET,
PUT, and BIT is ADDRESS. Note that, for 32-bit
systems, this corresponds to the language report because ADDRESS
is an alias for LONGINT on these systems.
Language extensions are implemented by OOC in the framework of system flags (see section System Flags). Instead of extending the syntax of the language Oberon-2, additional flags are defined to specify non-standard attributes of data types and declarations. To enable any OOC specific extensions for a module, the module must be declared with the `OOC_EXTENSIONS' flag:
MODULE Foo [OOC_EXTENSIONS];
To convert a module employing OOC specific extensions to standard Oberon-2, use the command oocn --filter --strip-system-flags (see section Source Code Analysis and Transformation).
Please note: This section was contributed by Stewart Greenhill.
Abstract classes are an important technique for separating interfaces from their implementations. Abstract classes allow the methods of a class to be defined without specifying an implementation of those methods. An abstract class can never be instantiated because its behaviour is undefined. Only concrete (non-abstract) extensions of an abstract class can be instantiated; these must define an implementation for each abstract method.
The Oberon-2 language lacks a mechanism for abstract classes, so typically,
abstract classes are simulated using concrete types: HALT statements
are used in the body of "abstract" methods, so that a run-time exception
occurs if the class is accidentally instantiated and an abstract method is
called. This is not a safe solution to the problem. To preserve the
semantics of abstract classes, it is expected that the compiler will:
This section describes an extension to the OOC compiler that supports these essential features. The extension does not change the Oberon-2 language itself, but instead uses the `ABSTRACT' flag to distinguish abstract classes and methods. Compilers that do not recognise the flag should ignore it and compile the code as normal. Code that compiles correctly using the OOC abstract classes model should work correctly under other Oberon-2 compilers.
The following rules describe the semantics of abstract classes, as implemented by the compiler:
BEGIN statement.
These rules ensure that non-implemented methods are never called, which ensures safety of the system. OOC permits classes to be partially abstract, that is, to have a mixture of concrete and abstract methods. OOC also permits concrete classes to have abstract extensions. The following example illustrates some of the features of this mechanism:
MODULE Abstract [OOC_EXTENSIONS]; TYPE PtrA = POINTER TO A; A* = RECORD [ABSTRACT] END; (* abstract type *) PtrB = POINTER TO B; B* = RECORD (A) END; (* concrete extension of A *) VAR vA: A; (* illegal! *) pA: PtrA; pB: PtrB; (* abstract definition of F *) PROCEDURE (VAR v: A) [ABSTRACT] F*; END F; (* implementation of F *) PROCEDURE (VAR v: B) F*; BEGIN END F; BEGIN NEW(pA); (* illegal! *) NEW(pB); pB.F; pB.F^; (* illegal! *) pA := pB; pA.F; END Abstract.
Another example of the use of abstract classes is the module `Channel' (see section Module Channel). All record types and most of the methods defined in this module are declared abstract.
When invoking a procedure, local copies are created for all arguments that are passed to value parameters. If the value parameter is a large array or an open array parameter, allocating storage for the variable and creating the copy can be expensive in both stack space and execution time. During compile-time, OOC tries to detect situations where it can avoid the local copy and access the original variable instead. The algorithm used cannot detect all situations where this is possible; it must conservatively choose in favor of a local copy if there is the slightest chance that omitting it would change the semantics of the procedure. This can be overidden by the programmer by setting the flag `NO_COPY' for the value parameter in question. The flag can be applied only to value parameters of structured type.
Example:
MODULE Out [OOC_EXTENSIONS];
PROCEDURE String* (s[NO_COPY]: ARRAY OF CHAR);
BEGIN
writer. WriteString (s)
END String;
END Out.
Because the procedure `Out.String' is just a wrapper for the channel's `WriteString' function, there is no need to create a local copy of any arguments passed to parameter `s'.
If used without care, the flag `NO_COPY' can create a parameter that is neither variable nor value. It should be applied only in situations where
An example for a situation where `NO_COPY' cannot be used is OOC's implementation of `Strings.Append'. If the parameter `source' would be marked with `NO_COPY', the procedure would break for calls that use the same variable for both `source' and `destination'.
During the execution of a program, it is possible for an operation to be performed that is either illegal or that leaves the program in an undefined state. This sort of operation is usually not intended by the programmer, and should therefore be brought to his attention. For this reason, a typical Oberon-2 run-time system provides mechanisms to detect illegal or undefined operations and report them to the user.
For OOC, an invalid operation is signaled by the facilities defined in
module Exception (see section Exception Handling). If detected, such an
operation will raise an exception of source Exception.runtime with an
exception number from the list defined in Exception. If the raised
exception is not handled by the program, it will cause the run-time system
to emit an appropriate error message and terminate the program.
The following is a list of exception numbers, which indicate run-time errors
and are applicable to source Exception.runtime:
NIL. Such a dereference can happen as a result of
one of the following:
^ as part of a designator
IN, INCL, or
EXCL is given an element value that is not a member of the set type
being used. Note that an element of the standard set type SET has to
be in the range `0..31'.
RETURN statement.
CASE does not match any label of
the listed branches and there is no ELSE part.
NIL.
WITH statement is valid and there
is no ELSE part.
NEW is invoked for an open array
pointer type with a negative value for one of the array's lengths. This
run-time check cannot be disabled.
NEW or SYSTEM.NEW failed because of
insufficient memory. NEW guarantees that the pointer variable will
refer to a valid heap object after completion; if this fails, an
outOfMemory exception is raised. This run-time check cannot be
disabled.
During compilation, each run-time check (except the checks performed for
NEW) can be enabled or disabled by a pragma variable. If the
corresponding pragma variable is set to TRUE, appropriate code is
generated to catch the class of errors in question. If it is FALSE,
and such an error occurs at run-time, it might go by unnoticed, lead to
unexpected program results, or even cause program termination in the form of
a core dump.
The performance impact of run-time checks depends mostly on the target
system. Testing for an illegal operation might be done efficiently on one
target architecture, but be very costly on another. Not all run-time checks
are supported by all OOC compilers. For example, oo2c does not
support overflow checks for integer or real operations.
A special note should be made about operations on strings. Recall that the
Oberon-2 definition of a string is a character array that contains the
character 0X as a terminator. Oberon-2 string operations
(COPY, `=', `#', `<', etc.) are undefined for
unterminated character arrays. Here OOC follows the Oakwood guidelines, and
does not provide any checks to guard against such undefined operations. For
instance, if the source of a COPY operation is not terminated by a
character 0X, the contents of the target array are undefined after
completion.
Go to the first, previous, next, last section, table of contents.