CHAPTER 1: SYNTAX AND SEMANTICS

A programming language has two main aspects: (A) the permitted sequences of symbols making up a program, and (B) their meanings. (A) is often called the syntax and (B) the semantics of the language. However, the notion of "semantics" or "meaning" in this context is often ambiguous: sometimes people who talk about the meaning of a program are referring to the objects it creates and manipulates in the computer, and sometimes they are referring to things in the world that the program models or represents. For example, if a program builds a database of information about rooms using a list of lists, then a portion of the program has one meaning insofar as it specifies which lists are constructed or examined in the computer, and a totally different meaning insofar as the information manipulated is about rooms and their measurements, i.e. things that have nothing to do with the computer.

Internal and External Semantics

It might be useful to call the first "internal semantics" and the second "external semantics". The internal semantics will be concerned with manipulation of symbolic structures in the machine. These, like the external programming language, will have a SYNTAX, i.e. there will be rules specifying which structures can be built and how they can be manipulated. And these internal structures may themselves have a semantics, insofar as they refer to things in the world. In that case programs have a syntax and internal and external semantics. The internal structures are the internal semantics of the programs, but they too have a syntax and an external semantics. Generally a programming language is defined in terms of its syntax and its internal semantics. It's up to the user to determine how to give it an external semantics, by applying the language to different sorts of problems.

Compile Time VS Run Time Processes

There is another "imperative" aspect of programming language constructs. So far we've mentioned the internal actions that are produced when the commands are obeyed by the computer, i.e. at "run time". There is an earlier process that occurs when your instructions are read in by the Pop-11 system, e.g. from a file on the disk, or from the editor buffer, or from what you type at a terminal. Pop-11 has a "compiler" which reads the commands, analyses them and translates them into "low level" machine instructions which will later be obeyed by the computer. So the Pop-11 expressions cause additional processes to occur BEFORE the program is run. These processes include:

Lexical analysis - breaking the "input stream of characters" into separate symbols, namely words, numbers and strings.

Syntactic analysis - working out how to analyze complex sequences of symbols, like the analysis of (x + y)*(y - x) given above.

Code generation - i.e. production of the compiled procedure containing machine instructions.

This is a slight oversimplification, but it will suffice for most purposes. These processes happen at "compile time", i.e. when the program text is being read in, analyzed, and translated into a machine code version. When the program is later obeyed and the machine code instructions executed, things happen at "run time". We could think of the internal semantics of the program as having two aspects: the compile time semantics, which determine the processes that translate source code into machine code, and the run time semantics which determine what happens later when the machine code instructions are executed.

Understanding the difference between compile time and run time is important because different things can go wrong at those times. E.g. at compile time you can get syntactic errors, like leaving out a closing bracket or a semicolon. At run time you get "semantic" errors, like trying to add a number to a string, or trying to examine the 15th element of a list that has only 14 elements. Moreover, in languages like Lisp and Pop-11, whose syntax users can extend by defining new so-called macros or syntax words, the processes that occur at compile time can be modified by users, and doing this requires fairly detailed knowledge of what happens at compile time. This primer will not go into such details. Experts who wish to know more can read the online documentation in REF PROGLIST, REF ITEMISE, REF POPCOMPILE, and HELP MACRO.
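The difference can be seen in a small sketch. The procedure name add_badly is made up purely for illustration. The definition is syntactically correct, so it should compile without complaint, and the mistake (adding a number to a string) should only be reported when the procedure is actually run:

```pop11
;;; add_badly is a hypothetical name, used only for this illustration.
;;; The body is well formed, so nothing goes wrong at compile time.
define add_badly();
    3 + 'four'
enddefine;

;;; The error arises only now, at run time, when the compiled
;;; instructions try to add a number to a string.
add_badly() =>
```

By contrast, leaving out the final "enddefine" would be noticed while the text was still being read in, before any of it was obeyed.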

Some languages have interpreters instead of compilers, and they do something different for the last stage, i.e. they build a structure that does not contain machine instructions, but symbols that can later be interpreted by another program, the interpreter, which performs actions under the control of the symbols being interpreted. In Pop-11, and many widely used languages, a compiler is used and the machine itself (the CPU, or central processing unit) is the interpreter, rather than another program.

Understanding all those processes is not essential for understanding how to design and develop programs, but it can help you design programs that are more efficient, and it can help you understand what goes wrong when there are obscure errors, especially in a language like Pop-11 that does not have a fixed syntax, but allows you to extend it by defining new forms of expressions and imperatives.

Itemisation rules

We have so far assumed that we can treat Pop-11 programs as made of numbers and words which can be combined to form expressions, or imperatives, or sequences thereof. But what is actually typed in, or read in from a file, is a sequence of characters. For instance the following is a sequence of six characters, which has to be broken up into four items: the number 3, the word "+", the number 55 and the word "=>":

3+55=>
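
One way to watch this happen is sketched below. It assumes that the library procedures stringin (which turns a string into a character repeater) and incharitem (which turns a character repeater into an item repeater) behave as described in REF ITEMISE. The variable names get_item and this_item are invented for the example:

```pop11
;;; Build an item repeater that reads from the characters of a string.
;;; Assumption: stringin and incharitem behave as documented in REF ITEMISE.
vars get_item = incharitem(stringin('3+55=>'));
vars this_item;

;;; Print each item in turn until the repeater signals the end with termin.
until (get_item() ->> this_item) == termin do
    this_item =>
enduntil;
```

Each line printed should correspond to one of the items listed above, showing that the characters have been grouped without any spaces being needed.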

The Pop-11 `itemiser' applies quite complex rules to decide how to divide up the stream of characters into meaningful chunks. For instance, if you type:

[a little list,and,65]

this is read as 9 items:

[ a little list , and , 65 ]

and in fact they will be interpreted as an instruction to build a list containing seven items: six words and one number. (Non alphabetic characters can also be used to form words, e.g. "+++", "##@##".)
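
Rather than counting by eye, you can ask Pop-11 how the text was divided. This is only a small sketch, and the variable name example_list is invented for it. It prints the number of elements in the list, and then checks whether the final element was read as a single integer rather than as separate digits:

```pop11
;;; example_list is an illustrative name, not part of Pop-11.
vars example_list = [a little list,and,65];

;;; How many items did the itemiser put in the list?
length(example_list) =>

;;; Was the final item read as one integer?
isinteger(last(example_list)) =>
```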

To do this Pop-11 needs `lexical' rules saying which sorts of characters can be joined up with which, since you do not have to use spaces to separate things. Besides things like spaces, tabs and newlines, which are normally ignored by Pop-11, there are the following types of characters:

Numeric: 0 1 2 3 4 5 6 7 8 9

Alphabetic: a b c d e f g ... z

A B C D E F G ... Z

Signs: ! # $ & + - : < = > ? @ \ ^ | ~ / *

Underscore: _

Separators: ; " % ( ) , . [ ] { }

String quote: '

Character quote: `

Word formation

Unfortunately, Pop-11 has fairly complex rules for grouping characters in the text input stream into words, although words created by programs can contain arbitrary characters.

During program compilation, a letter followed by a series of letters and numbers will be formed into a single word, e.g. list1, list2. But if a text item starts with a number, then as soon as a non-number is reached (e.g. the "l" in "1list") Pop-11 assumes that it should insert a break. I.e. the text is separated into a number followed by a word. This can be shown by typing in the following instructions to create and print out lists:

[list3] =>
** [list3]

[3list] =>
** [3 list]

The second list is taken to have two elements, a number and a word. The first has a single word "list3".

A numeric character may be buried in the middle of a word which starts with letters, e.g. "list3a". Thus a word that starts with a letter can be followed by any combination of numbers and letters.

The word quote symbol """ can be used to tell Pop-11 that you wish to refer to a word, instead of using it as the name of something else (i.e. as a variable):

"list3" =>
** list3

But if you give it an illegal combination of characters you will get an error:

"3list" =>
;;; MISHAP: IQW INCORRECT QUOTED WORD
;;; INVOLVING: 3

You can also make a word out of certain non-alpha-numeric characters, i.e. sign characters:

"++*::\/^" =>
** ++*::\/^

But you cannot mix letters and sign characters:

"+x" =>
;;; MISHAP: IQW INCORRECT QUOTED WORD
;;; INVOLVING: +

and in a list they will be separated into two:
[+x] =>
** [+ x]

However, the underscore character can be used to join alphanumeric type characters to sign characters, e.g. here are two lists each containing only one word:
[+_x] =>
** [+_x]
[apple_#@$=>] =>
** [apple_#@$=>]

The underscore can also be used as a convenient way of producing long names which are readable. E.g. the following is the name of a system variable:
pop_readline_prompt

Using the underscore to join letters and sign characters

In general a sequence of characters made of "sign" characters and letters will be broken at the point where the two sorts of characters meet, unless they are joined by an underscore symbol "_", e.g.

fast_++
++_lists_++
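
The effect of the underscore can be checked by counting list elements. The following is a minimal sketch that uses only the built-in procedure length; according to the rules just given, the first list should hold two words, the second one word, and the third two words:

```pop11
;;; A sign character next to a letter: broken where the two sorts meet.
length([+x]) =>

;;; The same characters joined by an underscore: a single word.
length([+_x]) =>

;;; Two longer words, each mixing letters and signs via underscores.
length([fast_++ ++_lists_++]) =>
```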

Two of the sign characters, "/" and "*", play a special role in that they can be combined to form the `comment brackets', explained above. So "/*" and "*/" cannot be used as ordinary Pop-11 words.

The separator characters cannot be used to join up with anything else, except for the use of "." in decimal numbers. This is because separators play a special role in the syntax of Pop-11. E.g. the following is a list of seven items:

[(.,)a"!] =>
** [( . , ) a " !]

The semicolon, though normally a separator which marks the end of an imperative, has a special role if repeated three times without anything between: it marks an `end of line' comment, as explained previously.

6 * 6 => ;;; this bit on the right is ignored!
** 36

Some of the characters have special roles which will not be explained fully till later. In particular `%' can be used both in creating procedure closures by `partial application' and in `unquoting' part of a list expression. (The file TEACH PERCENT gives a tutorial introduction to both.)

Strings can contain arbitrary characters

Strings, created using the string quote character, can contain arbitrary characters:

'this is a +++ string %&$%$ of rubbish!!!'
except that if you wish to include the string quote itself in the string it must be preceded by the backslash character \ to indicate that it does not mark the end of the string. Here is a string containing the string quote:
'isn\'t it' =>
** isn't it
Note that strings are normally printed without the outer quotes. To make the quotes appear, do true -> pop_pr_quotes;
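
As a brief sketch of that last point, the same string can be printed before and after changing pop_pr_quotes. The final line assumes that the usual setting is false, which is what "normally printed without the outer quotes" suggests:

```pop11
'isn\'t it' =>              ;;; printed without the outer quotes
true -> pop_pr_quotes;
'isn\'t it' =>              ;;; now printed with the outer quotes
false -> pop_pr_quotes;     ;;; put things back as they were (assumed default)
```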

Character quotes and string quotes

Characters themselves are represented by positive integers less than 256. Since it is difficult to remember which number represents which character (the so called `ASCII code'), the character quote can be used to tell Pop-11 to read a character as representing the number. The character quote, sometimes referred to as the "backquote", is the backward sloping single quote character. It should not be confused with the forward sloping (sometimes displayed as vertical) single quote character used to begin and end string expressions. Depending on the printer used, the string and character quotes in this document may have different appearances.

Here is the character quote symbol: `
Here is the string quote symbol: '
Unfortunately neither symbol has a predictable location on keyboards: they appear in different places on different keyboards.
Here are some examples using the character quote to represent characters (as integers) without remembering their integer values. The letter `A` has the code 65 and the numerals start from 48:
`A` =>
** 65
`B` =>
** 66
;;; lower case codes are different
`a` =>
** 97
`0` =>
** 48
`5` =>
** 53
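
Because a quoted character is just an integer, it can be used wherever a number can. The lines below are a small sketch using the built-in procedures subscrs (which fetches the Nth character of a string) and consstring (which builds a string from character codes followed by a count):

```pop11
;;; Arithmetic on character codes: the gap between lower and upper case.
`a` - `A` =>
** 32

;;; The first character of a string, as an integer.
subscrs(1, 'Apple') =>
** 65

;;; Build a three-character string from three character codes.
consstring(`P`, `o`, `p`, 3) =>
** Pop
```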
If you wish to include non-printing characters in a string, see the details in HELP ASCII. In particular you can use the following:
\s = a space
\t = a tab
\n = a newline
\r = the return character (ascii 13)

'\nA string\n\twith text\n\t\s\son three lines' =>
**
A string
        with text
          on three lines

Double quotes with single quotes can form arbitrary words

If you really need to have a word containing arbitrary characters you can create it by putting word quotes around the corresponding string.

For example
vars funny_word = "'A word with spaces and junk:&=%][)))'";

isword(funny_word) =>
** <true>

funny_word =>
** A word with spaces and junk:&=%][)))
However, you would not be able to use such a word as the name of a variable, since typing something like
vars 'A word with spaces and junk:&=%][)))' = 999;
will produce an error.
;;; MISHAP - vars STATEMENT: IDENTIFIER NAME EXPECTED
;;; INVOLVING: 'A word with spaces and junk:&=%][)))'
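
A program can also build such a word at run time from a string, using the built-in procedure consword. This is a minimal sketch, and the variable name made_word is invented for it:

```pop11
;;; made_word is an illustrative name.
;;; consword turns a string into the word with the same characters.
vars made_word = consword('two words');

isword(made_word) =>
** <true>

made_word =>
** two words
```

This is one way in which words created by programs can contain characters that the itemiser would never group together when reading program text.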