NATIONAL CAPODISTRIAN UNIVERSITY OF ATHENS
FACULTY OF SCIENCE
DEPARTMENT OF INFORMATICS
Development and Exploration
Tool
for Context-Free Grammars
created with Java
The productions of a grammar specify the way in which the terminals
and nonterminals can be combined to form strings. Each production consists
of a nonterminal, followed by a colon ":", followed by a string
of nonterminals and terminals. Productions with the same lefthand part
are grouped into a rule which has the common lefthand part and as
right hand part the righthand parts of the productions divided by "|".
The following grammar that describes numerical expressions is an example:
E : E + T | T
T : T * F | F
F : ( E ) | id
In this grammar, the rules are the three lines of the grammar and the productions are the following:
E : E + T
E : T
T : T * F
T : F
F : ( E )
F : id
In this example the set of the terminals is: {+, *, (,
), id}, while the nonterminals are : {E, T,
F}. The start symbols is E (the left hand part of the first
rule).
Although the dominative trend is that of writing the terminals of a language
with small latin letters and the nonterminals with capitals, the only convention
respected by this tool is that terminals start with a latin letter.
The order of the productions within a rule is insignificant. In the contrary,
the order of the rules is important, not only in the process of the initial
symbol recognition but also in order to recognize and eliminate left recursions,
as we will see next.
There is a principle that symbols appearing at the left part of a rule
are acceptable only if they have previously appeared in the right hand
part of a preceding rule.
For practical reasons, the division of symbols with at least one space
is imposed in order to enable the tool to distinguish the symbols of the
grammar. On the other hand, productions of a grammar will be accepted for
further exploration only if they are grouped in rules. The empty string
is denoted with null.
If these principles are taken into account, the tool
can be used to recognize the symbols of a grammar.
Grammatical Properties and Transformations
A grammar is left recursive if it has a nonterminal A such that
there is a derivation of the string A a through one (immediate
left recursion) or multiple (non immediate left recursion) production
of the grammar. If a is the empty string, then the recursion is a cycle.
In case of a recursive grammar without cycles, the recursions can be eliminated.
Left factoring is a grammar transformation that is useful for producing
a grammar suitable for predictive parsing.
The elimination of the recursions together with left factoring are needed
in order to proceed to the next stages of syntax analysis.
Enhancements of recognition for such properties along
with exploration of the necessary transformations are provided by the tool.
Nonrecursive Predictive Parser Operation
After the recognition of the above properties and the application of the
necessary transformations to the considered grammar, it is possible to
examine if a string can be produced by the grammar. This end is served
by the nonrecursive predictive parser with respect to the contents of the
parsing table, which is a two-dimensional array M[A,a], where A is a nonterminal
,and a is a terminal or the symbol $.
This tool enables the user to examine if a string
can be produced by the considered grammar, which means to find out if this
string belongs to the language described by the grammar.
Parsing Table Construction
The parsing table M is constructed with respect to
the values of two functions, FIRST and FOLLOW.
If a is any string of grammar symbols, let FIRST(a) be the set of terminals
that begin the strings derived from a. If the empty string can be derived
from a as well, then it is also in FIRST(a).
Define FOLLOW(A), for nonterminal A , to be the set of terminals
a that can appear immediately to the right of A in some sentential form.
that is, the set of terminals a such that there exists a derivation that
starts from the start symbol S and concludes to a string of the form c
A a b, for some c and b. If A can be the rightmost symbol in some
sentential form, then $ is in FOLLOW(A).
The computation of the values for both functions (FIRST and FOLLOW) as
well as the construction of the parsing table M are described in detail
by the tool.
Vassilis Apostolopoulos- Thanos
Varias
July 8th, 1997