Exploring Context Free Grammars

NATIONAL CAPODISTRIAN UNIVERSITY OF ATHENS

FACULTY OF SCIENCE

DEPARTMENT OF INFORMATICS

Development and Exploration Tool

for Context-Free Grammars

created with Java

Greek Version - Ελληνική Έκδοση

Introductory Concepts

The grammars which are handled by this tool follow the BNF standarization. The only difference is that the symbolism used by the YACC meta-compiler has been preferred not only for reasons of simplicity but, also in order to keep the tool compatible with some future extensions which are currently under consideration. This decision is also due to the majot approval YACC has been enjoying from the Informatics community.

Grammar symbols are divided into terminals and nonterminals:

Terminals are the basic symbols from which grammar strings are formed.
Nonterminals are syntactic variables that denote sets of strings, which help define the language generated by the grammar. They also impose an hierarchical structure on the language that is useful for both syntax analysis and translation. In a grammar one nonterminal is distinguished as the start symbol and the set of strings it denotes is the language defined by the grammar.

The productions of a grammar specify the way in which the terminals and nonterminals can be combined to form strings. Each production consists of a nonterminal, followed by a colon ":", followed by a string of nonterminals and terminals. Productions with the same lefthand part are grouped into a rule which has the common lefthand part and as right hand part the righthand parts of the productions divided by "|".

The following grammar that describes numerical expressions is an example:

E : E + T | T

T : T * F | F

F : ( E ) | id

In this grammar, the rules are the three lines of the grammar and the productions are the following:

E : E + T

E : T

T : T * F

T : F

F : ( E )

F : id

In this example the set of the terminals is: {+, *, (, ), id}, while the nonterminals are : {E, T, F}. The start symbols is E (the left hand part of the first rule).

Although the dominative trend is that of writing the terminals of a language with small latin letters and the nonterminals with capitals, the only convention respected by this tool is that terminals start with a latin letter.

The order of the productions within a rule is insignificant. In the contrary, the order of the rules is important, not only in the process of the initial symbol recognition but also in order to recognize and eliminate left recursions, as we will see next.

There is a principle that symbols appearing at the left part of a rule are acceptable only if they have previously appeared in the right hand part of a preceding rule.

For practical reasons, the division of symbols with at least one space is imposed in order to enable the tool to distinguish the symbols of the grammar. On the other hand, productions of a grammar will be accepted for further exploration only if they are grouped in rules. The empty string is denoted with null.

If these principles are taken into account, the tool can be used to recognize the symbols of a grammar.

Grammatical Properties and Transformations

A grammar is left recursive if it has a nonterminal A such that there is a derivation of the string A a through one (immediate left recursion) or multiple (non immediate left recursion) production of the grammar. If a is the empty string, then the recursion is a cycle. In case of a recursive grammar without cycles, the recursions can be eliminated.

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing.

The elimination of the recursions together with left factoring are needed in order to proceed to the next stages of syntax analysis.

Enhancements of recognition for such properties along with exploration of the necessary transformations are provided by the tool.

Nonrecursive Predictive Parser Operation

After the recognition of the above properties and the application of the necessary transformations to the considered grammar, it is possible to examine if a string can be produced by the grammar. This end is served by the nonrecursive predictive parser with respect to the contents of the parsing table, which is a two-dimensional array M[A,a], where A is a nonterminal ,and a is a terminal or the symbol $.

This tool enables the user to examine if a string can be produced by the considered grammar, which means to find out if this string belongs to the language described by the grammar.

Parsing Table Construction

The parsing table M is constructed with respect to the values of two functions, FIRST and FOLLOW.

If a is any string of grammar symbols, let FIRST(a) be the set of terminals that begin the strings derived from a. If the empty string can be derived from a as well, then it is also in FIRST(a).

Define FOLLOW(A), for nonterminal A , to be the set of terminals a that can appear immediately to the right of A in some sentential form. that is, the set of terminals a such that there exists a derivation that starts from the start symbol S and concludes to a string of the form c A a b, for some c and b. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).

The computation of the values for both functions (FIRST and FOLLOW) as well as the construction of the parsing table M are described in detail by the tool.

Vassilis Apostolopoulos- Thanos Varias
July 8th, 1997

The source of the above data is the book "Compilers: Principals Techniques and Tools" by Aho, Sethi êáé Ullman, also known as "dragonbook".
The described tool was created by the two signees from October 1996 to July 1997 as a final year project submitted for the degree of the BSc under the supervision and cooperation of the Assistant Professor dr. J.Y.Cotronis and the the technical support and contribution of mr. Dimitrios Theotokis, PhD student, Research Fellow of the University of Athens and also a Certified Java Instructor.