Exploring Context Free Grammars

NATIONAL CAPODISTRIAN UNIVERSITY OF ATHENS

FACULTY OF SCIENCE DEPARTMENT OF INFORMATICS

Development and Exploration Tool for Context-Free Grammars created with Java

Greek Introduction - Ελληνική Εισαγωγή

Introductory Concepts

The grammars which are handled by this tool follow the BNF standarization. The only difference is that the symbolism used by the YACC meta-compiler has been preferred not only for reasons of simplicity but, also in order to keep the tool compatible with some future extensions which are currently under consideration. This decision is also due to the majot approval YACC has been enjoying from the Informatics community.

Grammar symbols are divided into terminals and nonterminals:

Terminals are the basic symbols from which grammar strings are formed.

Nonterminals are syntactic variables that denote sets of strings, which help define the language generated by the grammar. They also impose an hierarchical structure on the language that is useful for both syntax analysis and translation. In a grammar one nonterminal is distinguished as the start symbol and the set of strings it denotes is the language defined by the grammar.

The productions of a grammar specify the way in which the terminals and nonterminals can be combined to form strings. Each production consists of a nonterminal, followed by a colon ":", followed by a string of nonterminals and terminals. Productions with the same lefthand part are grouped into a rule which has the common lefthand part and as right hand part the righthand parts of the productions divided by "|".

The following grammar that describes numerical expressions is an example:

E : E + T | T

T : T * F | F

F : ( E ) | id

In this grammar, the rules are the three lines of the grammar and the productions are the following:

E : E + T

E : T

T : T * F

T : F

F : ( E )

F : id

In this example the set of the terminals is: {+, *, (, ), id}, while the nonterminals are : {E, T, F}. The start symbols is E (the left hand part of the first rule).

Although the dominative trend is that of writing the terminals of a language with small latin letters and the nonterminals with capitals, the only convention respected by this tool is that terminals start with a latin letter.

The order of the productions within a rule is insignificant. In the contrary, the order of the rules is important, not only in the process of the initial symbol recognition but also in order to recognize and eliminate left recursions, as we will see next.

There is a principle that symbols appearing at the left part of a rule are acceptable only if they have previously appeared in the right hand part of a preceding rule.

For practical reasons, the division of symbols with at least one space is imposed in order to enable the tool to distinguish the symbols of the grammar. On the other hand, productions of a grammar will be accepted for further exploration only if they are grouped in rules. The empty string is denoted with null.

If these principles are taken into account, the tool can be used to recognize the symbols of a grammar.

Grammar Symbols Recognition-Syntax Tree-Left Productions

Grammatical Properties and Transformations

A grammar is left recursive if it has a nonterminal A such that there is a derivation of the string A a through one (immediate left recursion) or multiple (non immediate left recursion) production of the grammar. If a is the empty string, then the recursion is a cycle. In case of a recursive grammar without cycles, the recursions can be eliminated.

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing.

The elimination of the recursions together with left factoring are needed in order to proceed to the next stages of syntax analysis.

Enhancements of recognition for such properties along with exploration of the necessary transformations are provided by the tool.

Grammar Properties & Transformations

Nonrecursive Predictive Parser Operation

After the recognition of the above properties and the application of the necessary transformations to the considered grammar, it is possible to examine if a string can be produced by the grammar. This end is served by the nonrecursive predictive parser with respect to the contents of the parsing table, which is a two-dimensional array M[A,a], where A is a nonterminal ,and a is a terminal or the symbol $.

This tool enables the user to examine if a string can be produced by the considered grammar, which means to find out if this string belongs to the language described by the grammar.

Predictive Parser Operation

Parsing Table Construction

The parsing table M is constructed with respect to the values of two functions, FIRST and FOLLOW.

If a is any string of grammar symbols, let FIRST(a) be the set of terminals that begin the strings derived from a. If the empty string can be derived from a as well, then it is also in FIRST(a).

Define FOLLOW(A), for nonterminal A , to be the set of terminals a that can appear immediately to the right of A in some sentential form. that is, the set of terminals a such that there exists a derivation that starts from the start symbol S and concludes to a string of the form c A a b, for some c and b. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).

The computation of the values for both functions (FIRST and FOLLOW) as well as the construction of the parsing table M are described in detail by the tool.

Daveloping & Exploring Grammars

Vassilis Apostolopoulos - Thanos Varias - Nikolaos Theocharis
November 1998

The source of the above data is the book "Compilers: Principals Techniques and Tools" by Aho, Sethi και Ullman, also known as "dragonbook".
The described tool was created by the three signees from October 1996 to November 1998 as a final year project submitted for the degree of the BSc under the supervision and cooperation of the Assistant Professor dr. J.Y.Cotronis and the the technical support and contribution of mr. Dimitrios Theotokis, PhD student, Research Fellow of the University of Athens and also a Certified Java Instructor.