COMPILER DESIGN 10CS63 VTU NOTES UNIT-2
UNIT II: SYNTAX ANALYSIS - 1

Syntax Analyzer creates the syntactic structure of the given source program.
This syntactic structure is mostly a parse tree.
Syntax Analyzer is also known as parser.
The syntax of a programming language is described by a context-free grammar (CFG). We
will use BNF (Backus-Naur Form) notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the
rules implied by a context-free grammar or not.
If it satisfies, the parser creates the parse tree of that program.
Otherwise the parser gives error messages.
A context-free grammar
gives a precise syntactic specification of a programming language.
The design of the grammar is an initial phase of the design of a compiler.
A grammar can be directly converted into a parser by some tools.
Parser works on a stream of tokens.
The smallest item is a token.
Fig: Position of parser in the compiler model (the parser and the other phases consult the SYMBOL TABLE)
We categorize the parsers into two groups:
1. Top-Down Parser: the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from left to right (one
symbol at a time).
Efficient top-down and bottom-up parsers can be implemented only for subclasses
of context-free grammars.
LL for top-down parsing
LR for bottom-up parsing
Syntax Error Handling
Common programming errors can occur at many different levels.
1. Lexical errors: include misspelling of identifiers, keywords, or operators.
2. Syntactic errors : include misplaced semicolons or extra or missing braces.
3. Semantic errors: include type mismatches between operators and operands.
4. Logical errors: can be anything from incorrect reasoning on the part of the
programmer.
Goals of the Parser
Report the presence of errors clearly and accurately
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.
Error-Recovery Strategies
Panic-Mode Recovery
Phrase-Level Recovery
Error Productions
Global Correction
Panic-Mode Recovery
On discovering an error, the parser discards input symbols one at a time until one
of a designated set of Synchronizing tokens is found.
Synchronizing tokens are usually delimiters.
Ex: semicolon or } whose role in the source program is clear and unambiguous.
It often skips a considerable amount of input without checking it for additional
errors.
Advantage:
Simplicity
Is guaranteed not to go into an infinite loop
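The skipping step above can be sketched in a few lines of Python. This is an illustrative sketch only; the token list and the synchronizing set are assumptions, not part of the notes.

```python
# A minimal sketch of panic-mode recovery: on an error, discard input
# tokens one at a time until a synchronizing token appears.

SYNC_TOKENS = {";", "}"}  # assumed set of synchronizing delimiters

def panic_mode_skip(tokens, pos):
    """Advance pos past input tokens until a synchronizing token is found.

    Returns the index of the synchronizing token, or len(tokens) if none
    remains. Everything skipped is never checked for further errors."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos

# Example: after detecting an error at index 1, skip to the next ';'
tokens = ["x", "@", "y", "+", "1", ";", "z"]
resume = panic_mode_skip(tokens, 1)
```

Note how simple the strategy is: a single scan forward, which is also why it cannot loop forever.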
Phrase-Level Recovery
A parser may perform local correction on the remaining input, i.e.,
it may replace a prefix of the remaining input by some string that allows the parser to
continue.
Ex: replace a comma by a semicolon, insert a missing semicolon
Local correction is left to the compiler designer.
It is used in several error-repairing compilers, as it can correct any input string.
Difficulty in coping with the situations in which the actual error has occurred
before the point of detection.
Error Productions
We can augment the grammar for the language at hand with productions that
generate the erroneous constructs.
Then we can use the grammar augmented by these error productions to
construct a parser.
If an error production is used by the parser, we can generate appropriate error
diagnostics to indicate the erroneous construct that has been recognized in the
input.
Global Correction
We use algorithms that perform minimal sequence of changes to obtain a globally
least cost correction.
Given an incorrect input string x and grammar G, these algorithms will find a
parse tree for a related string y.
Such that the number of insertions, deletions and changes of tokens required to
transform x into y is as small as possible.
It is too costly to implement in terms of time and space, so these techniques are
currently only of theoretical interest.
Context-Free Grammars
Inherently recursive structures of a programming language are defined by a
context-free grammar.
In a context-free grammar, we have:
A finite set of terminals (in our case, this will be the set of tokens)
A finite set of non-terminals (syntactic-variables)
A finite set of production rules in the following form:
A → α where A is a non-terminal and
α is a string of terminals and non-terminals (including the empty
string)
A start symbol (one of the non-terminal symbols)
NOTATIONAL CONVENTIONS
1. Symbols used for terminals are :
Lower case letters early in the alphabet (such as a, b, c, . . .)
Operator symbols (such as +, *, . . . )
Punctuation symbols (such as parentheses, comma and so on)
The digits (0, 1, ..., 9)
Boldface strings and keywords (such as id or if) each of which represents
a single terminal symbol
2. Symbols used for non terminals are:
Uppercase letters early in the alphabet (such as A, B, C, ...)
The letter S, which when it appears is usually the start symbol.
Lowercase, italic names (such as expr or stmt).
3. Lower case Greek letters, such as α, β, γ, represent (possibly empty)
strings of grammar symbols.
Example: using the above notations, list out the terminals, non-terminals and start symbol in
the following grammar
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Here the terminals are +, -, *, /, (, ), id
Non-terminals are E, T, F
Start symbol is E
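The grammar above can be encoded directly as a data structure. The following Python encoding (the dict layout is an assumption, not part of the notes) also shows how terminals and non-terminals are told apart mechanically:

```python
# The expression grammar, encoded as a dict: each non-terminal maps to a
# list of alternatives; each alternative is a tuple of grammar symbols.
# The empty tuple () would stand for the empty string (epsilon).

grammar = {
    "E": [("E", "+", "T"), ("E", "-", "T"), ("T",)],
    "T": [("T", "*", "F"), ("T", "/", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
start_symbol = "E"

# Non-terminals are exactly the keys; terminals are every other symbol
# that appears on some right-hand side.
nonterminals = set(grammar)
terminals = {sym for alts in grammar.values()
             for alt in alts for sym in alt} - nonterminals
```

Deriving the terminal set from the productions, rather than listing it separately, keeps the two from drifting apart.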
Derivations
E ⇒ E+E
E+E derives from E
we can replace E by E+E
to be able to do this, we have to have a production rule E → E+E in our
grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
A sequence of replacements of non-terminal symbols is called a derivation of
id+id from E.
In general, a derivation step is
αAβ ⇒ αγβ if there is a production rule A → γ in our grammar
where α and β are arbitrary strings of terminal and non-terminal
symbols
α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)
⇒ : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
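A single derivation step as defined above can be sketched in Python. This sketch assumes the left-most non-terminal is the one rewritten (the left-most derivation discussed later); the list encoding is an assumption:

```python
# One left-most derivation step: replace the left-most non-terminal in a
# sentential form using a given production rule lhs -> rhs.

def leftmost_step(sentential, nonterminals, lhs, rhs):
    """Apply lhs -> rhs to the left-most non-terminal of the form."""
    for i, sym in enumerate(sentential):
        if sym in nonterminals:
            assert sym == lhs, "must rewrite the left-most non-terminal"
            return sentential[:i] + list(rhs) + sentential[i + 1:]
    return sentential  # no non-terminal left: the form is a sentence

# The derivation E => E+E => id+E => id+id from the notes:
nts = {"E"}
form = ["E"]
form = leftmost_step(form, nts, "E", ["E", "+", "E"])  # E => E+E
form = leftmost_step(form, nts, "E", ["id"])           # => id+E
form = leftmost_step(form, nts, "E", ["id"])           # => id+id
```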
CFG Terminology
L(G) is the language of G (the language generated by G) which is a set of
sentences.
A sentence of L(G) is a string of terminal symbols of G.
If S is the start symbol of G then
ω is a sentence of L(G) iff S ⇒+ ω where ω is a string of terminals of G.
If G is a context-free grammar, L(G) is a context-free language.
Two grammars are equivalent if they produce the same language.
S ⇒* α - If α contains non-terminals, it is called a sentential form of G.
- If α does not contain non-terminals, it is called a sentence of
G.
Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
At each derivation step, we can choose any of the non-terminals in the sentential
form of G for the replacement.
If we always choose the left-most non-terminal in each derivation step, the
derivation is called a left-most derivation.
If we always choose the right-most non-terminal in each derivation step, the
derivation is called a right-most derivation.
Left-Most and Right-Most Derivations
Left-Most Derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Right-Most Derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
We will see that the top-down parsers try to find the left-most derivation of the
given source program.
We will see that the bottom-up parsers try to find the right-most derivation of the
given source program in the reverse order.
Parse Tree
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.
A parse tree can be seen as a graphical representation of a derivation.
Problems on derivation of a string with parse tree:
1. Consider the grammar S → (L) | a
L → L,S | S
i. What are the terminals, non-terminals and the start symbol?
ii. Find the parse tree for the following sentences:
a. (a,a)
b. (a, (a, a))
c. (a, ((a,a), (a,a)))
iii. Construct the LMD and RMD for each.
2. Do the above steps for the grammar S → aS | aSbS | ε for the string aaabaab
Ambiguity
A grammar that produces more than one parse tree for some sentence is
called an ambiguous grammar.
For most parsers, the grammar must be unambiguous.
unambiguous grammar
→ unique selection of the parse tree for a sentence
We should eliminate the ambiguity in the grammar during the design phase of the
compiler.
An ambiguous grammar should be rewritten to eliminate the ambiguity.
To disambiguate an ambiguous grammar, we have to prefer one of the parse trees of a
sentence and restrict the grammar to this choice.
Example:
stmt → if expr then stmt |
if expr then stmt else stmt | otherstmts
if E1 then if E2 then S1 else S2
Fig: two parse trees for this sentence - in the first, the else attaches to the outer if;
in the second, the else attaches to the inner (closest) if.
We prefer the second parse tree (else matches with closest if).
So, we have to disambiguate our grammar to reflect this choice.
The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt |
if expr then matchedstmt else unmatchedstmt
Problems on ambiguous grammars:
Show that the following grammars are ambiguous by constructing either
two LMDs or two RMDs for the given string.
1. S → S(S)S | ε with the string ( ( ) ( ) )
2. S → S+S | SS | (S) | S* | a with the string (a+a)*a
3. S → aS | aSbS | ε with the string abab
Ambiguity: Operator Precedence
Ambiguous grammars (because of ambiguous operators) can be disambiguated
according to the precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
disambiguate the grammar with
precedence: ^ (right to left)
* (left to right)
+ (left to right)
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
Left Recursion
A grammar is left recursive if it has a non-terminal A such that there is a
derivation
A ⇒+ Aα for some string α
Top-down parsing techniques cannot handle left-recursive grammars.
So, we have to convert our left-recursive grammar into an equivalent grammar
which is not left-recursive.
The left-recursion may appear in a single step of the derivation (immediate left-recursion),
or may appear in more than one step of the derivation.
Immediate Left-Recursion
A → Aα | β where β does not start with A
⇒ eliminate immediate left recursion
A → βA'
A' → αA' | ε an equivalent grammar
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn where β1 ... βn do not start with A
⇒ eliminate immediate left recursion
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε an equivalent grammar
Example
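As an illustration, the transformation above can be sketched in Python (the tuple encoding of productions and the primed name are assumptions of this sketch):

```python
# Eliminate immediate left recursion for one non-terminal, following the
# scheme A -> beta A', A' -> alpha A' | eps. Productions are tuples of
# symbols; the empty tuple () stands for epsilon.

def eliminate_immediate(A, productions):
    alphas = [p[1:] for p in productions if p and p[0] == A]  # A -> A alpha
    betas = [p for p in productions if not p or p[0] != A]    # A -> beta
    if not alphas:
        return {A: productions}  # no immediate left recursion: unchanged
    A1 = A + "'"                 # fresh non-terminal A'
    return {
        A:  [beta + (A1,) for beta in betas],
        A1: [alpha + (A1,) for alpha in alphas] + [()],
    }

# E -> E+T | T  becomes  E -> T E',  E' -> +T E' | eps
result = eliminate_immediate("E", [("E", "+", "T"), ("T",)])
```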
Left-Recursion Problem
A grammar may not be immediately left-recursive, but it still can be
left-recursive.
By just eliminating the immediate left-recursion, we may not get
a grammar which is not left-recursive.
S → Aa | b
A → Sc | d This grammar is not immediately left-recursive,
but it is still left-recursive.
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes a left-recursion
So, we have to eliminate all left-recursions from our grammar.
Eliminate Left-Recursion Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
-   for j from 1 to i-1 do {
      replace each production
      Ai → Aj γ
      by
      Ai → β1 γ | ... | βk γ
      where Aj → β1 | ... | βk
    }
-   eliminate immediate left-recursions among the Ai productions
}
Example2:
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left-recursion in A
A → SdA' | fA'
A' → cA' | ε
for S:
- Replace S → Aa with S → SdA'a | fA'a
So, we will have S → SdA'a | fA'a | b
- Eliminate the immediate left-recursion in S
S → fA'aS' | bS'
S' → dA'aS' | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε
Problems on left recursion:
1. S → S(S)S | ε
2. S → S+S | SS | (S) | S* | a
3. S → SS+ | SS* | a
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
5. S → (L) | a, L → L,S | S
Left-Factoring
A predictive parser (a top-down parser without backtracking) insists that the
grammar must be left-factored.
grammar → a new equivalent grammar suitable for predictive parsing
stmt → if expr then stmt else stmt |
if expr then stmt
when we see if, we cannot know which production rule to choose to re-write stmt
in the derivation.
In general,
A → αβ1 | αβ2 where α is non-empty and the first symbols
of β1 and β2 (if they have one) are different.
when processing α we cannot know whether to expand
A to αβ1 or
A to αβ2
But, if we re-write the grammar as follows
A → αA'
A' → β1 | β2 so, we can immediately expand A to αA'
Left-Factoring Algorithm
For each non-terminal A with two or more alternatives (production rules) with a
common non-empty prefix, let say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA' | γ1 | ... | γm
A' → β1 | ... | βn
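One step of this algorithm can be sketched in Python. This is a simplified sketch: it groups alternatives only by their first symbol and factors the longest common prefix of each group (the encoding and fresh-name scheme are assumptions):

```python
# One left-factoring step: alternatives sharing a common non-empty prefix
# alpha are replaced by alpha A', and A' receives the suffixes.
# Productions are tuples of symbols; () stands for epsilon.

def common_prefix(alts):
    """Longest common prefix of a list of tuples, as a tuple."""
    prefix = []
    for syms in zip(*alts):            # stops at the shortest alternative
        if len(set(syms)) == 1:
            prefix.append(syms[0])
        else:
            break
    return tuple(prefix)

def left_factor(A, productions):
    groups = {}
    for p in productions:
        groups.setdefault(p[:1], []).append(p)   # group by first symbol
    factored = {A: []}
    counter = 0
    for alts in groups.values():
        if len(alts) == 1:
            factored[A].append(alts[0])          # nothing to factor
            continue
        alpha = common_prefix(alts)
        counter += 1
        A1 = A + "'" * counter                   # fresh non-terminal
        factored[A].append(alpha + (A1,))
        factored[A1] = [p[len(alpha):] for p in alts]  # () if suffix empty
    return factored

# A -> ad | a | ab | abc | b  becomes  A -> aA' | b,  A' -> d | eps | b | bc
result = left_factor("A", [("a", "d"), ("a",), ("a", "b"),
                           ("a", "b", "c"), ("b",)])
```

This reproduces the first step of Example2 below; a full implementation would repeat the step until no group needs factoring.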
Left-Factoring Example1
A → abB | aB | cdg | cdeB | cdfB
⇓
A → aA' | cdg | cdeB | cdfB
A' → bB | B
⇓
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
Example2
A → ad | a | ab | abc | b
⇓
A → aA' | b
A' → d | ε | b | bc
⇓
A → aA' | b
A' → d | ε | bA''
A'' → ε | c
Problems on left factoring:
1. S → iEtS | iEtSeS | a, E → b
2. S → S(S)S | ε
3. S → aS | aSbS | ε
4. S → SS+ | SS* | a
5. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
6. S → 0S1 | 01
7. S → S+S | SS | (S) | S* | a
8. S → (L) | a, L → L,S | S
9. rexpr → rexpr + rterm | rterm
   rterm → rterm rfactor | rfactor
   rfactor → rfactor* | rprimary
   rprimary → a | b
   (do both left factoring and left recursion)
Non-Context Free Language Constructs
There are some language constructions in the programming languages which are
not context-free. This means that, we cannot write a context-free grammar for
these constructions.
L1 = { ωcω | ω is in (a | b)* } is not context-free
Declaring an identifier and checking whether it is declared or not later.
We cannot do this with a context-free language. We need a semantic analyzer (this
check is not context-free).
L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free
Declaring two functions (one with n parameters, the other one with
m parameters), and then calling them with actual parameters.
Top-down parsers
  Recursive Descent parser
  Predictive parser
    LL(1) parser, LL(k) parser
First L stands for left-to-right scan
Second L stands for LMD
(1) stands for only one input symbol of lookahead used to predict the parser action
(k) stands for k input symbols of lookahead used to predict the parser action
The parse tree is created top to bottom.
Top-down parser
Recursive-Descent Parsing
Backtracking is needed (If a choice of a production rule does not
work, we backtrack to try other alternatives.)
It is a general parsing technique, but not widely used.
Not efficient
Predictive Parsing
no backtracking
efficient
Needs a special form of grammars (LL (1) grammars).
Recursive Predictive Parsing is a special form of Recursive
Descent parsing without backtracking.
Non-Recursive (Table Driven) Predictive Parser is also known as LL (1) parser.
Recursive-Descent Parsing (uses Backtracking)
Backtracking is needed.
It tries to find the left-most derivation.
S → aBc
B → bc | b
input: abc
With B → bc the parse of abc fails (no input remains for the final c of S),
so the parser backtracks and then succeeds with B → b.
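This backtracking behaviour can be sketched in Python. The sketch returns, for each symbol, every input position it can reach, so a failed alternative is naturally retried (the encoding is an assumption of this sketch):

```python
# Recursive-descent parsing with backtracking for S -> aBc, B -> bc | b.
# parse() returns every input position reachable after deriving sym,
# so alternatives that fail simply contribute no positions.

GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],   # B -> bc is tried first, then B -> b
}

def parse(sym, text, pos):
    """Return all positions reachable after deriving sym from text[pos:]."""
    if sym not in GRAMMAR:                       # terminal: must match
        return [pos + 1] if text[pos:pos + 1] == sym else []
    results = []
    for alt in GRAMMAR[sym]:                     # try alternatives in order
        positions = [pos]
        for s in alt:                            # thread positions through
            positions = [q for p in positions for q in parse(s, text, p)]
        results.extend(positions)
    return results

# "abc" is accepted iff some derivation of S consumes the whole input.
accepted = len("abc") in parse("S", "abc", 0)
```

With input abc, the alternative B → bc consumes b and c but leaves nothing for the final c of S; the alternative B → b then succeeds, exactly as in the trace above.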
Predictive Parser
When re-writing a non-terminal in a derivation step, a predictive parser can
uniquely choose a production rule by just looking at the current symbol in the input
string.
A → α1 | ... | αn input: ... a ... (a is the current token)
example
stmt → if ...... |
while ...... |
begin ...... |
for .....
When we are trying to write the non-terminal stmt, if the current token is if, we
have to choose the first production rule.
When we are trying to write the non-terminal stmt, we can uniquely choose the
production rule by just looking at the current token.
We eliminate the left recursion in the grammar and left factor it. But the result still
may not be suitable for predictive parsing (it may not be an LL(1) grammar).
Non-Recursive Predictive Parsing -- LL(1) Parser
Non-Recursive predictive parsing is a table-driven parser.
It is a top-down parser.
It is also known as LL(1) Parser.
LL(1) Parser
input buffer
our string to be parsed. We will assume that its end is marked with a
special symbol $.
output
a production rule representing a step of the derivation sequence (left-most
derivation) of the string in the input buffer.
stack
contains the grammar symbols
at the bottom of the stack, there is a special end marker symbol $.
initially the stack contains only the symbol $ and the starting symbol S
($S is the initial stack)
when the stack is emptied (i.e. only $ is left in the stack), the parsing is
completed.
parsing table
a two-dimensional array M[A, a]
each row is a non-terminal symbol
each column is a terminal symbol or the special symbol $
each entry holds a production rule.
Constructing LL(1) Parsing Tables
Two functions are used in the construction of LL(1) parsing tables:
FIRST and FOLLOW
FIRST(α) is the set of the terminal symbols which occur as first symbols in strings
derived from α, where α is any string of grammar symbols.
If α derives ε, then ε is also in FIRST(α).
FOLLOW(A) is the set of the terminals which occur immediately after (follow)
the non-terminal A in the strings derived from the starting symbol.
a terminal a is in FOLLOW(A) if S ⇒* αAaβ
$ is in FOLLOW(A) if S ⇒* αA
Compute FIRST for Any String X
If X is a terminal symbol, FIRST(X) = {X}.
If X is a non-terminal symbol and X → ε is a production rule, ε is in
FIRST(X).
If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule:
if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in
FIRST(X).
if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).
If X is ε, FIRST(X) = {ε}.
If X is a string Y1Y2..Yn:
if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for
j=1,...,i-1, then a is in FIRST(X).
if ε is in all FIRST(Yj) for j=1,...,n,
then ε is in FIRST(X).
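These rules are applied repeatedly until the FIRST sets stop growing. A Python sketch of that fixed-point computation, run on the expression grammar from Example2 below (the string "eps" standing for ε is an assumption of the encoding):

```python
# Fixed-point computation of FIRST sets. Productions are tuples of
# symbols; the empty tuple () is the epsilon production; EPS marks epsilon.

EPS = "eps"

def compute_first(grammar):
    nonterminals = set(grammar)
    first = {A: set() for A in nonterminals}

    def first_of_string(symbols):
        """FIRST of a string of grammar symbols, per the rules above."""
        out = set()
        for sym in symbols:
            if sym not in nonterminals:          # terminal: its own FIRST
                out.add(sym)
                return out
            out |= first[sym] - {EPS}
            if EPS not in first[sym]:
                return out
        out.add(EPS)                             # every symbol can vanish
        return out

    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

grammar = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
first = compute_first(grammar)
```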
Compute FOLLOW (for non-terminals)
If S is the start symbol, $ is in FOLLOW(S).
If A → αBβ is a production rule, everything in FIRST(β) except ε is in
FOLLOW(B).
If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in
FIRST(β)),
everything in FOLLOW(A) is in
FOLLOW(B).
We apply these rules until nothing more can be added to any FOLLOW set.
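The FOLLOW rules are likewise iterated to a fixed point. A Python sketch on the same expression grammar, with its FIRST sets supplied as data (the "eps" marker and the encoding are assumptions):

```python
# Fixed-point computation of FOLLOW sets for E -> TE', E' -> +TE' | eps,
# T -> FT', T' -> *FT' | eps, F -> (E) | id, with FIRST given as data.

EPS = "eps"
START = "E"
grammar = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def first_of_string(symbols):
    """FIRST of a string of symbols, using the FIRST table above."""
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})        # a terminal is its own FIRST
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                                 # whole string can vanish
    return out

follow = {A: set() for A in grammar}
follow[START].add("$")                           # rule 1
changed = True
while changed:
    changed = False
    for A, alts in grammar.items():
        for alt in alts:
            for i, B in enumerate(alt):
                if B not in grammar:             # only non-terminals get FOLLOW
                    continue
                beta_first = first_of_string(alt[i + 1:])
                new = beta_first - {EPS}         # rule 2: FIRST(beta) - eps
                if EPS in beta_first:            # rule 3: beta can vanish
                    new |= follow[A]
                if not new <= follow[B]:
                    follow[B] |= new
                    changed = True
```

The result matches the FOLLOW sets worked out in the example below.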
LL(1) Parser: Parser Actions
The symbol at the top of the stack (say X) and the current symbol in the input
string (say a) determine the parser action.
There are four possible parser actions.
1. If X and a are both $, the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $),
the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal,
the parser looks at the parsing table entry M[X, a]. If M[X, a] holds a production
rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The
parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error
all empty entries in the parsing table are errors.
If X is a terminal symbol different from a, this is also an error case.
Non-Recursive predictive parsing Algorithm
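A minimal Python sketch of this algorithm, using the S → aBa, B → bB | ε grammar and table from Example1 below (the table layout, with [] encoding ε, is an assumption of the sketch):

```python
# Table-driven LL(1) parsing loop implementing the four actions above.

TABLE = {                                        # M[non-terminal][token]
    "S": {"a": ["a", "B", "a"]},
    "B": {"a": [], "b": ["b", "B"]},             # [] encodes B -> eps
}

def ll1_parse(tokens):
    tokens = tokens + ["$"]                      # end marker on the input
    stack = ["$", "S"]                           # start symbol on top
    pos = 0
    output = []                                  # production steps taken
    while stack:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":
            return output                        # action 1: accept
        if X == a:                               # action 2: match terminal
            stack.pop()
            pos += 1
        elif X in TABLE and a in TABLE[X]:       # action 3: expand
            rhs = TABLE[X][a]
            output.append((X, rhs))
            stack.pop()
            stack.extend(reversed(rhs))          # push Yk ... Y1
        else:                                    # action 4: error
            raise SyntaxError(f"unexpected {a!r} with {X!r} on stack")
    raise SyntaxError("stack exhausted")

steps = ll1_parse(list("abba"))
```

Running it on abba reproduces the stack trace shown in Example1.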
LL(1) Parser Example1
S → aBa
B → bB | ε
FIRST FUNCTION
FIRST(S) = {a}      FIRST(aBa) = {a}
FIRST(B) = {b, ε}   FIRST(bB) = {b}   FIRST(ε) = {ε}
FOLLOW FUNCTION
FOLLOW(S) = {$}     FOLLOW(B) = {a}
LL(1) Parsing Table:
      a          b         $
S     S → aBa
B     B → ε      B → bB
stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         accept, successful completion
Outputs: S → aBa, B → bB, B → bB, B → ε
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
Example2
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Soln:
FIRST Example
FIRST(F) = {(, id}    FIRST(TE') = {(, id}
FIRST(T') = {*, ε}    FIRST(+TE') = {+}
FIRST(T) = {(, id}    FIRST(ε) = {ε}
FIRST(E') = {+, ε}    FIRST(FT') = {(, id}
FIRST(E) = {(, id}    FIRST(*FT') = {*}
FIRST((E)) = {(}      FIRST(id) = {id}
FOLLOW Example
FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Constructing LL(1) Parsing Table Algorithm
for each production rule A → α of a grammar G:
for each terminal a in FIRST(α), add A → α to M[A, a]
If ε is in FIRST(α), for each terminal a in FOLLOW(A) add
A → α to M[A, a]
If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $]
All other undefined entries of the parsing table are error entries.
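This algorithm can be sketched in Python on the small Example1 grammar (S → aBa, B → bB | ε), with FIRST and FOLLOW supplied as data; collecting rules in lists also makes multiply-defined entries, and hence the LL(1) test discussed later, easy to detect. The encoding is an assumption of the sketch:

```python
# Build the LL(1) table M[A, a] for S -> aBa, B -> bB | eps.
# Productions are (lhs, rhs) pairs; () is epsilon; EPS marks epsilon.

EPS = "eps"
productions = [("S", ("a", "B", "a")), ("B", ("b", "B")), ("B", ())]
FIRST_RHS = {("a", "B", "a"): {"a"}, ("b", "B"): {"b"}, (): {EPS}}
FOLLOW = {"S": {"$"}, "B": {"a"}}

table = {}
for A, alpha in productions:
    for a in FIRST_RHS[alpha] - {EPS}:           # first rule
        table.setdefault((A, a), []).append(alpha)
    if EPS in FIRST_RHS[alpha]:                  # epsilon rules; $ is
        for a in FOLLOW[A]:                      # already inside FOLLOW
            table.setdefault((A, a), []).append(alpha)

# A grammar is LL(1) iff no entry holds more than one production.
conflict_free = all(len(rules) == 1 for rules in table.values())
```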
Constructing LL(1) Parsing Table Example
E → TE':   FIRST(TE') = {(, id}, so E → TE' goes into M[E, (] and M[E, id]
E' → +TE': FIRST(+TE') = {+}, so E' → +TE' goes into M[E', +]
E' → ε:    FIRST(ε) = {ε}, so none by the first rule,
           but since ε is in FIRST(ε) and FOLLOW(E') = {$, )},
           E' → ε goes into M[E', $] and M[E', )]
T → FT':   FIRST(FT') = {(, id}, so T → FT' goes into M[T, (] and M[T, id]
T' → *FT': FIRST(*FT') = {*}, so T' → *FT' goes into M[T', *]
T' → ε:    FIRST(ε) = {ε}, so none by the first rule,
           but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +},
           T' → ε goes into M[T', $], M[T', )] and M[T', +]
F → (E):   FIRST((E)) = {(}, so F → (E) goes into M[F, (]
F → id:    FIRST(id) = {id}, so F → id goes into M[F, id]
      id        +           *           (         )        $
E     E → TE'                           E → TE'
E'              E' → +TE'                         E' → ε   E' → ε
T     T → FT'                           T → FT'
T'              T' → ε      T' → *FT'             T' → ε   T' → ε
F     F → id                            F → (E)
stack      input      output
$E         id+id$     E → TE'
$E'T       id+id$     T → FT'
$E'T'F     id+id$     F → id
$E'T'id    id+id$
$E'T'      +id$       T' → ε
$E'        +id$       E' → +TE'
$E'T+      +id$
$E'T       id$        T → FT'
$E'T'F     id$        F → id
$E'T'id    id$
$E'T'      $          T' → ε
$E'        $          E' → ε
$          $          accept
Construct the predictive parser LL(1) for the following grammars and parse the
given strings:
1. S → S(S)S | ε with the string ( ( ) ( ) )
2. S → +SS | *SS | a with the string +*aaa
3. S → aSbS | bSaS | ε with the string aabbbab
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
   with the string not(true or false)
5. S → 0S1 | 01 with the string 00011
6. S → aB | aC | Sd | Se
   B → bBc | f
   C → g
7. P → Ra | Qba
   R → aba | caba | Rbc
   Q → bbc | bc
   with the string cababca
8. S → PQR
   P → a | Rb | ε
   Q → c | dP | ε
   R → e | f
   with the string adeb
9. E → E+T | T
   T → id | id[] | id[X]
   X → E,E | E
   with the string id[id]
10. S → (A) | 0
    A → SB
    B → ,SB | ε
    with the string (0, (0,0))
11. S → a | ε | (T)
    T → T,S | S
    with the strings (a,(a,a)) and ((a,a), ε, (a),a)
LL(1) Grammars
A grammar whose parsing table has no multiply-defined entries is said to be an
LL(1) grammar.
LL(1): input scanned from left to right, left-most derivation, one input symbol of
lookahead used to determine the parser action.
The parsing table of a grammar may hold more than one production rule in an entry. In
this case, we say that it is not an LL(1) grammar.
A Grammar which is not LL(1)
S → iCtSE | a
E → eS | ε
C → b
FIRST(iCtSE) = {i}   FOLLOW(S) = {$, e}
FIRST(a) = {a}       FOLLOW(E) = {$, e}
FIRST(eS) = {e}      FOLLOW(C) = {t}
FIRST(ε) = {ε}
FIRST(b) = {b}
      a        b        e           i            t       $
S     S → a                         S → iCtSE
E                       E → eS
                        E → ε                            E → ε
C              C → b
two production rules for M[E, e]
Problem: ambiguity
What do we have to do if the resulting parsing table contains multiply-defined
entries?
If we didn't eliminate left recursion, eliminate the left recursion in the
grammar.
If the grammar is not left-factored, we have to left factor the grammar.
If its (the new grammar's) parsing table still contains multiply-defined
entries, that grammar is ambiguous or it is inherently not an LL(1)
grammar.
A left-recursive grammar cannot be an LL(1) grammar.
A → Aα | β
any terminal that appears in FIRST(β) also appears in FIRST(Aα)
because Aα ⇒ βα.
If β is ε, any terminal that appears in FIRST(α) also appears in
FIRST(Aα) and FOLLOW(A).
A grammar which is not left-factored cannot be an LL(1) grammar.
A → αβ1 | αβ2
any terminal that appears in FIRST(αβ1) also appears in
FIRST(αβ2).
An ambiguous grammar cannot be an LL(1) grammar.
Properties of LL(1) Grammars
A grammar G is LL(1) if and only if the following conditions hold for each pair of
distinct production rules A → α and A → β:
1. Both α and β cannot derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a
terminal in FOLLOW(A).
Error Recovery in Predictive Parsing
An error may occur in the predictive parsing (LL(1) parsing)
if the terminal symbol on the top of stack does not match with the current
input symbol.
if the top of stack is a non-terminal A, the current input symbol is a, and
the parsing table entry M[A, a] is empty.
What should the parser do in an error case?
The parser should be able to give an error message (as much as possible
meaningful error message).
It should recover from that error case, and it should be able to
continue parsing with the rest of the input.
Error Recovery Techniques
Panic-Mode Error Recovery
Skipping the input symbols until a synchronizing token is found.
Phrase-Level Error Recovery
Each empty entry in the parsing table is filled with a pointer to a specific
error routine to take care of that error case.
Error-Productions
If we have a good idea of the common errors that might be encountered,
we can augment the grammar with productions that generate erroneous
constructs.
When an error production is used by the parser, we can generate
appropriate error diagnostics.
Since it is almost impossible to know all the errors that can be made by the
programmers, this method is not practical.
Global-Correction
Ideally, we would like a compiler to make as few changes as possible in
processing incorrect inputs.
We have to globally analyze the input to find the error.
This is an expensive method, and it is not used in practice.
Panic-Mode Error Recovery in LL (1) Parsing
In panic-mode error recovery, we skip all the input symbols until a synchronizing
token is found.
What is the synchronizing token?
All the terminal-symbols in the follow set of a non-terminal can be used as
a synchronizing token set for that non-terminal.
So, a simple panic-mode error recovery for LL(1) parsing:
All the empty entries are marked as synch to indicate that the parser will
skip the input symbols until a symbol in the follow set of the non-terminal
A which is on the top of the stack appears. Then the parser will pop that
non-terminal A from the stack. The parsing continues from that state.
To handle unmatched terminal symbols, the parser pops that unmatched
terminal symbol from the stack and it issues an error message saying that
that unmatched terminal is inserted.
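The two recovery cases just described can be sketched as a single Python helper. The mid-parse stack state and the FOLLOW data in the example call are illustrative assumptions:

```python
# One panic-mode recovery step inside an LL(1) parser: an unmatched
# terminal on the stack is popped ("inserted"); for an empty M[A, a]
# entry, input is skipped to a token in FOLLOW(A), then A is popped.

def recover(stack, tokens, pos, follow, errors):
    """Perform one recovery step; returns the new input position."""
    X = stack[-1]
    if X not in follow:                          # X is an unmatched terminal
        stack.pop()
        errors.append(f"missing {X!r}, inserted")
        return pos
    while tokens[pos] not in follow[X] and tokens[pos] != "$":
        pos += 1                                 # skip to a synch token
    stack.pop()                                  # then drop the non-terminal
    errors.append(f"illegal {X!r}, input skipped")
    return pos

# Hypothetical mid-parse state: A on top of the stack, input starts with
# e, and FOLLOW(A) = {b, d} as in the example below.
errors = []
stack = ["$", "b", "A"]
pos = recover(stack, ["e", "a", "d", "b", "$"], 0, {"A": {"b", "d"}}, errors)
```

This mirrors the second trace below, where the tokens before d are skipped and A is popped.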
Panic-Mode Error Recovery Example
S → AbS | e | ε
A → a | cAd
Soln:
FIRST(S) = {a, c, e, ε}
FIRST(A) = {a, c}
FOLLOW(S) = {$}
FOLLOW(A) = {b, d}
      a          b       c          d       e        $
S     S → AbS    sync    S → AbS    sync    S → e    S → ε
A     A → a      sync    A → cAd    sync    sync     sync
Eg: input string aab
stack     input     output
$S        aab$      S → AbS
$SbA      aab$      A → a
$Sba      aab$
$Sb       ab$       Error: missing b, inserted
$S        ab$       S → AbS
$SbA      ab$       A → a
$Sba      ab$
$Sb       b$
$S        $         S → ε
$         $         accept
Eg: Another input string ceadb
stack     input      output
$S        ceadb$     S → AbS
$SbA      ceadb$     A → cAd
$SbdAc    ceadb$
$SbdA     eadb$      Error: unexpected e (illegal A)
                     (Remove all input tokens until the first b or d, pop A)
$Sbd      db$
$Sb       b$
$S        $          S → ε
$         $          accept
Phrase-Level Error Recovery
Each empty entry in the parsing table is filled with a pointer to a special error
routine which will take care of that error case.
These error routines may:
Change, insert, or delete input symbols.
Issue appropriate error messages.
Pop items from the stack.
We should be careful when we design these error routines, because we may put
the parser into an infinite loop.