
**COMPILER DESIGN 10CS63 VTU NOTES UNIT-2**

## UNIT II: SYNTAX ANALYSIS - 1

The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is mostly a parse tree. The syntax analyzer is also known as the parser.

The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.

The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not. If it satisfies, the parser creates the parse tree of that program. Otherwise, the parser reports error messages.

A context-free grammar:

- gives a precise syntactic specification of a programming language.
- the design of the grammar is an initial phase of the design of a compiler.
- a grammar can be directly converted into a parser by some tools.

The parser works on a stream of tokens; the smallest item is a token.

Fig: Position of the parser in the compiler model (the parser and the other phases share the symbol table).

We categorize the parsers into two groups:

1. Top-Down Parser: the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves.

Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time). Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars:

- LL for top-down parsing
- LR for bottom-up parsing

Syntax Error Handling

Common programming errors can occur at many different levels:

1. Lexical errors: include misspellings of identifiers, keywords, or operators.
2. Syntactic errors: include misplaced semicolons or extra or missing braces.
3. Semantic errors: include type mismatches between operators and operands.
4. Logical errors: can be anything from incorrect reasoning on the part of the programmer.

Goals of the Parser

- Report the presence of errors clearly and accurately.
- Recover from each error quickly enough to detect subsequent errors.
- Add minimal overhead to the processing of correct programs.

Error-Recovery Strategies

1. Panic-Mode Recovery
2. Phrase-Level Recovery
3. Error Productions
4. Global Correction

Panic-Mode Recovery

On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. Synchronizing tokens are usually delimiters, e.g. a semicolon or }, whose role in the source program is clear and unambiguous. It often skips a considerable amount of input without checking it for additional errors.

Advantages:

- Simplicity
- Guaranteed not to go into an infinite loop

Phrase-Level Recovery

A parser may perform local correction on the remaining input; i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue. Ex: replace a comma by a semicolon, or insert a missing semicolon.

The choice of local correction is left to the compiler designer. It is used in several error-repairing compilers, as it can correct any input string.

Drawback: difficulty in coping with situations in which the actual error occurred before the point of detection.

Error Productions

We can augment the grammar for the language at hand with productions that generate the erroneous constructs, and then use the grammar augmented by these error productions to construct a parser. If an error production is used by the parser, we can generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.

Global Correction

We use algorithms that perform a minimal sequence of changes to obtain a globally least-cost correction. Given an incorrect input string x and grammar G, these algorithms find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. It is too costly to implement in terms of time and space, so these techniques are only of theoretical interest.

Context-Free Grammars

Inherently recursive structures of a programming language are defined by a context-free grammar. In a context-free grammar, we have:

- A finite set of terminals (in our case, this will be the set of tokens)
- A finite set of non-terminals (syntactic variables)
- A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
- A start symbol (one of the non-terminal symbols)

NOTATIONAL CONVENTIONS

1. Symbols used for terminals are:
   - Lowercase letters early in the alphabet (such as a, b, c, ...)
   - Operator symbols (such as +, *, ...)
   - Punctuation symbols (such as parentheses, comma, and so on)
   - The digits 0, 1, ..., 9
   - Boldface strings and keywords (such as id or if), each of which represents a single terminal symbol
2. Symbols used for non-terminals are:
   - Uppercase letters early in the alphabet (such as A, B, C, ...)
   - The letter S, which when it appears is usually the start symbol
   - Lowercase, italic names (such as expr or stmt)
3. Lowercase Greek letters (α, β, γ, for example) represent (possibly empty) strings of grammar symbols.

Example: using the above notations, list the terminals, non-terminals, and start symbol of the following grammar:

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id

Here the terminals are +, -, *, /, (, ), id; the non-terminals are E, T, F; and the start symbol is E.

Derivations

E ⇒ E+E

E+E derives from E: we can replace E by E+E. To be able to do this, we have to have a production rule E → E+E in our grammar.

E ⇒ E+E ⇒ id+E ⇒ id+id

A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.

In general, a derivation step is

αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.

α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)

- ⇒ : derives in one step
- ⇒* : derives in zero or more steps
- ⇒+ : derives in one or more steps

CFG Terminology

L(G) is the language of G (the language generated by G), which is a set of sentences. A sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G.

If G is a context-free grammar, L(G) is a context-free language. Two grammars are equivalent if they produce the same language.

S ⇒* α:

- If α contains non-terminals, it is called a sentential form of G.
- If α does not contain non-terminals, it is called a sentence of G.

Derivation Example

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

OR

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement. If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Left-Most and Right-Most Derivations

Left-Most Derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

Right-Most Derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

We will see that top-down parsers try to find the left-most derivation of the given source program, while bottom-up parsers try to find the right-most derivation of the given source program in the reverse order.
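The two derivations above differ only in which occurrence of a non-terminal is rewritten at each step. A minimal sketch of this (the grammar encoding and function names here are my own, not from the notes):

```python
# Single-character nonterminals; a derivation step replaces one occurrence
# of a nonterminal in the sentential form with a chosen production body.
PRODUCTIONS = {"E": ["E+E", "(E)", "id", "-E"]}

def is_nonterminal(ch):
    return ch in PRODUCTIONS

def leftmost_step(sentential, body):
    """Replace the left-most nonterminal with the given production body."""
    for i, ch in enumerate(sentential):
        if is_nonterminal(ch):
            return sentential[:i] + body + sentential[i + 1:]
    raise ValueError("no nonterminal left")

def rightmost_step(sentential, body):
    """Replace the right-most nonterminal with the given production body."""
    for i in range(len(sentential) - 1, -1, -1):
        if is_nonterminal(sentential[i]):
            return sentential[:i] + body + sentential[i + 1:]
    raise ValueError("no nonterminal left")

# Left-most derivation of -(id+id):  E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
s = "E"
for body in ["-E", "(E)", "E+E", "id", "id"]:
    s = leftmost_step(s, body)
print(s)  # -(id+id)
```

Applying `rightmost_step` with the same production choices reproduces the second derivation above: the intermediate sentential forms become -(E+E), then -(E+id), then -(id+id).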

Parse Tree

Inner nodes of a parse tree are non-terminal symbols. The leaves of a parse tree are terminal symbols. A parse tree can be seen as a graphical representation of a derivation.

Problems on derivation of a string with parse tree:

1. Consider the grammar S → (L) | a, L → L,S | S
   i. What are the terminals, non-terminals, and the start symbol?
   ii. Find the parse tree for the following sentences:
      a. (a,a)
      b. (a, (a, a))
      c. (a, ((a,a),(a,a)))
   iii. Construct the LMD and RMD for each.
2. Do the above steps for the grammar S → aS | aSbS | ε for the string aaabaab

Ambiguity

A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. For most parsers, the grammar must be unambiguous:

unambiguous grammar → unique selection of the parse tree for a sentence

We should eliminate the ambiguity in the grammar during the design phase of the compiler; an ambiguous grammar should be rewritten to eliminate the ambiguity. We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) to disambiguate that grammar, and restrict the grammar to this choice.

Ambiguity (cont.)

Example:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmts

For the sentence

if E1 then if E2 then S1 else S2

this grammar produces two parse trees:

1. The else matches the outer if: if E1 then (if E2 then S1) else S2
2. The else matches the inner if: if E1 then (if E2 then S1 else S2)

We prefer the second parse tree (the else matches the closest if). So, we have to disambiguate our grammar to reflect this choice.

The unambiguous grammar will be:

stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt
              | if expr then matchedstmt else unmatchedstmt

Problems on ambiguous grammars: show that the following grammars are ambiguous by constructing either 2 LMDs or 2 RMDs for the given string.

1. S → S(S)S | ε with the string ( ( ) ( ) )
2. S → S+S | SS | (S) | S* | a with the string (a+a)*a
3. S → aS | aSbS | ε with the string abab

Ambiguity: Operator Precedence

Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules.

E → E+E | E*E | E^E | id | (E)

To disambiguate the grammar, use the precedence (highest first):

- ^ (right to left)
- * (left to right)
- + (left to right)

E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)

Left Recursion

A grammar is left-recursive if it has a non-terminal A such that there is a derivation

A ⇒+ Aα for some string α

Top-down parsing techniques cannot handle left-recursive grammars, so we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive. The left recursion may appear in a single step of the derivation (immediate left recursion), or may appear in more than one step of the derivation.

Immediate Left-Recursion

A → Aα | β   where β does not start with A

⇓ eliminate immediate left recursion

A → βA'
A' → αA' | ε   (an equivalent grammar)

In general:

A → Aα1 | ... | Aαm | β1 | ... | βn   where β1, ..., βn do not start with A

⇓ eliminate immediate left recursion

A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε   (an equivalent grammar)


Left-Recursion Problem

A grammar may not be immediately left-recursive, but it still can be left-recursive. By just eliminating the immediate left recursion, we may not get a grammar which is not left-recursive.

S → Aa | b
A → Sc | d

This grammar is not immediately left-recursive, but it is still left-recursive:

S ⇒ Aa ⇒ Sca, or
A ⇒ Sc ⇒ Aac causes a left recursion

So, we have to eliminate all left recursions from our grammar.

Eliminate Left-Recursion Algorithm

- Arrange non-terminals in some order: A1, ..., An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
        Ai → Aj γ
        by
        Ai → α1 γ | ... | αk γ
        where Aj → α1 | ... | αk
      }
    - eliminate immediate left recursion among the Ai productions
  }

Example 2:

S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: A, S

For A:
- We do not enter the inner loop.
- Eliminate the immediate left recursion in A:
  A → SdA' | fA'
  A' → cA' | ε

For S:
- Replace S → Aa with S → SdA'a | fA'a
  So, we will have S → SdA'a | fA'a | b
- Eliminate the immediate left recursion in S:
  S → fA'aS' | bS'
  S' → dA'aS' | ε

So, the resulting equivalent grammar which is not left-recursive is:

S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε
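The immediate-left-recursion transformation above is mechanical enough to sketch in code. A minimal sketch (the grammar representation, with production bodies as lists of symbols and `[]` for ε, is my own, not from the notes):

```python
def eliminate_immediate_left_recursion(head, bodies):
    """A -> A a1 | ... | A am | b1 | ... | bn  becomes
       A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | eps.
    bodies: list of symbol lists. Returns a dict {nonterminal: bodies}."""
    recursive = [b[1:] for b in bodies if b and b[0] == head]    # the alphas
    others = [b for b in bodies if not (b and b[0] == head)]     # the betas
    if not recursive:                      # nothing to do
        return {head: bodies}
    new = head + "'"                       # the primed helper nonterminal
    return {
        head: [b + [new] for b in others],            # A  -> beta A'
        new:  [a + [new] for a in recursive] + [[]],  # A' -> alpha A' | eps
    }

# E -> E+T | T   becomes   E -> T E'  and  E' -> +T E' | eps
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

For the full algorithm one would first substitute earlier nonterminals' bodies (the inner loop above) and then call this on each nonterminal in order.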

Problems on left recursion: eliminate the left recursion from the following grammars.

1. S → S(S)S | ε
2. S → S+S | SS | (S) | S* | a
3. S → SS+ | SS* | a
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
5. S → (L) | a, L → L,S | S

Left-Factoring

A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored:

grammar → a new equivalent grammar suitable for predictive parsing

stmt → if expr then stmt else stmt
     | if expr then stmt

When we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.

In general:

A → αβ1 | αβ2   where α is non-empty and the first symbols of β1 and β2 (if they have one) are different.

When processing α, we cannot know whether to expand A to αβ1 or to αβ2. But if we rewrite the grammar as follows:

A → αA'
A' → β1 | β2

then we can immediately expand A to αA'.

Left-Factoring Algorithm

For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say

A → αβ1 | ... | αβn | γ1 | ... | γm

convert it into

A → αA' | γ1 | ... | γm
A' → β1 | ... | βn

Left-Factoring Example 1

A → abB | aB | cdg | cdeB | cdfB

⇓

A → aA' | cdg | cdeB | cdfB
A' → bB | B

⇓

A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB

Example 2

A → ad | a | ab | abc | b

⇓

A → aA' | b
A' → d | ε | b | bc

⇓

A → aA' | b
A' → d | ε | bA''
A'' → ε | c
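One pass of the left-factoring step in Example 2 can be sketched as follows (the representation, with bodies as symbol lists and `[]` for ε, is my own, not from the notes):

```python
def common_prefix(seqs):
    """Longest prefix shared by all sequences in seqs."""
    prefix = []
    for symbols in zip(*seqs):        # zip stops at the shortest body
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor_once(head, bodies):
    """One left-factoring pass: group alternatives by their first symbol,
    and factor the longest common prefix out of each group of 2+ bodies."""
    groups = {}
    for b in bodies:
        groups.setdefault(b[0] if b else None, []).append(b)
    result, primes = {head: []}, 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)            # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new = head + "'" * primes                 # A', A'', ...
        result[head].append(prefix + [new])       # A  -> prefix A'
        result[new] = [b[len(prefix):] for b in group]  # A' -> suffixes ([] = eps)
    return result

# A -> ad | a | ab | abc | b   becomes   A -> aA' | b ;  A' -> d | eps | b | bc
print(left_factor_once("A", [["a", "d"], ["a"], ["a", "b"], ["a", "b", "c"], ["b"]]))
```

Repeating the pass on the new A' alternatives would then factor out the shared b, reproducing the second step of Example 2.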

Problems on left factoring:

1. S → iEtS | iEtSeS | a, E → b
2. S → S(S)S | ε
3. S → aS | aSbS | ε
4. S → SS+ | SS* | a
5. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
6. S → 0S1 | 01
7. S → S+S | SS | (S) | S* | a
8. S → (L) | a, L → L,S | S
9. rexpr → rexpr + rterm | rterm
   rterm → rterm rfactor | rfactor
   rfactor → rfactor * | rprimary
   rprimary → a | b
   (do both left factoring and left-recursion elimination)

Non-Context-Free Language Constructs

There are some language constructs in programming languages which are not context-free; this means we cannot write a context-free grammar for these constructs.

L1 = {wcw | w is in (a|b)*} is not context-free.

- Corresponds to declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free language; we need a semantic analyzer (which is not context-free).

L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1} is not context-free.

- Corresponds to declaring two functions (one with n parameters, the other with m parameters), and then calling them with the matching numbers of actual parameters.

Top-Down Parsers

- Recursive-Descent parser
- Predictive parser: LL(1) parser, LL(k) parser

In the name LL(k):

- The first L stands for left-to-right scan of the input.
- The second L stands for LMD (left-most derivation).
- (1) stands for one input symbol of lookahead used to predict the parser action.
- (k) stands for k input symbols of lookahead.

The parse tree is created top to bottom.

Top-Down Parsing

Recursive-Descent Parsing:

- Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
- It is a general parsing technique, but not widely used.
- Not efficient.

Predictive Parsing:

- No backtracking.
- Efficient.
- Needs a special form of grammar (LL(1) grammars).
- Recursive predictive parsing is a special form of recursive-descent parsing without backtracking.
- The non-recursive (table-driven) predictive parser is also known as the LL(1) parser.

Recursive-Descent Parsing (uses Backtracking)

Backtracking is needed. It tries to find the left-most derivation.

S → aBc
B → bc | b

input: abc

Trying S → aBc with the alternative B → bc consumes all of abc, but then no c remains for S, so the parser backtracks and tries B → b, which succeeds: a, then B ⇒ b, then c.
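The backtracking in this example can be sketched directly (the function names and position-based matching scheme are my own, not from the notes):

```python
# Recursive descent with backtracking for  S -> aBc ,  B -> bc | b .
# Each nonterminal's function reports the input positions where its
# match could end; the caller tries them in order and backtracks.

def parse_B(s, i):
    """Yield every end position where B can match starting at i."""
    if s[i:i + 2] == "bc":      # alternative B -> bc
        yield i + 2
    if s[i:i + 1] == "b":       # alternative B -> b (tried on backtracking)
        yield i + 1

def parse_S(s, i=0):
    """Return True iff S -> aBc derives exactly s[i:]."""
    if s[i:i + 1] != "a":
        return False
    for j in parse_B(s, i + 1):             # try B -> bc first, then B -> b
        if s[j:j + 1] == "c" and j + 1 == len(s):
            return True                     # matched a B c, input consumed
    return False

print(parse_S("abc"))    # True: B -> bc leaves no c, so backtrack to B -> b
print(parse_S("abcc"))   # True via B -> bc
print(parse_S("ab"))     # False
```

Note how `parse_S` only succeeds on "abc" after the first choice for B fails, which is exactly the backtracking step described above.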

Predictive Parser

When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

A → α1 | ... | αn    input: ... a ...   (a is the current token)

Example:

stmt → if ...... | while ...... | begin ...... | for ......

When we are trying to rewrite the non-terminal stmt, if the current token is if we have to choose the first production rule. That is, we can uniquely choose the production rule by just looking at the current token.

We first eliminate the left recursion in the grammar, and then left-factor it. But the resulting grammar may still not be suitable for predictive parsing (it may not be an LL(1) grammar).

Non-Recursive Predictive Parsing -- LL(1) Parser

Non-recursive predictive parsing is a table-driven parser. It is a top-down parser, also known as the LL(1) parser.

LL(1) parser components:

- input buffer: our string to be parsed. We will assume that its end is marked with a special symbol $.
- output: a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
- stack: contains the grammar symbols. At the bottom of the stack, there is a special end-marker symbol $. Initially the stack contains only the symbol $ and the start symbol S ($S is the initial stack). When the stack is emptied (i.e. only $ is left in the stack), parsing is complete.
- parsing table: a two-dimensional array M[A, a]. Each row is a non-terminal symbol, each column is a terminal symbol or the special symbol $, and each entry holds a production rule.

Constructing LL(1) Parsing Tables

Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.

FIRST(α) is the set of the terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols. If α derives ε, then ε is also in FIRST(α).

FOLLOW(A) is the set of the terminals which occur immediately after (follow) the non-terminal A in strings derived from the start symbol:

- a terminal a is in FOLLOW(A) if S ⇒* αAaβ
- $ is in FOLLOW(A) if S ⇒* αA

Compute FIRST for any string X:

- If X is a terminal symbol, FIRST(X) = {X}.
- If X is ε, FIRST(X) = {ε}.
- If X is a non-terminal symbol and X → ε is a production rule, then ε is in FIRST(X).
- If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule:
  - if a terminal a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1), then a is in FIRST(X);
  - if ε is in all of FIRST(Y1), ..., FIRST(Yn), then ε is in FIRST(X).
- If X is a string Y1Y2...Yn, the same two rules apply: a terminal a is in FIRST(X) if a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); ε is in FIRST(X) if ε is in all of FIRST(Y1), ..., FIRST(Yn).

Compute FOLLOW (for non-terminals):

- If S is the start symbol, $ is in FOLLOW(S).
- If A → αBβ is a production rule, everything in FIRST(β) except ε is in FOLLOW(B).
- If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)), then everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any FOLLOW set.
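The "apply until nothing changes" wording is a fixed-point iteration, which can be sketched directly. A sketch (the grammar encoding, with `""` for ε, is my own; the grammar is the left-factored expression grammar used in the examples below, written here with primes as `E'`, `T'`):

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

FIRST = {A: set() for A in GRAMMAR}

def first_of(symbols):
    """FIRST of a string of grammar symbols ("" stands for epsilon)."""
    result = set()
    for X in symbols:
        if X not in GRAMMAR:            # X is a terminal
            result.add(X)
            return result
        result |= FIRST[X] - {""}
        if "" not in FIRST[X]:          # X cannot vanish: stop here
            return result
    result.add("")                      # every symbol can derive epsilon
    return result

changed = True
while changed:                          # iterate until a fixed point
    changed = False
    for A, bodies in GRAMMAR.items():
        for body in bodies:
            new = first_of(body) - FIRST[A]
            if new:
                FIRST[A] |= new
                changed = True

FOLLOW = {A: set() for A in GRAMMAR}
FOLLOW[START].add("$")
changed = True
while changed:
    changed = False
    for A, bodies in GRAMMAR.items():
        for body in bodies:
            for i, B in enumerate(body):
                if B not in GRAMMAR:
                    continue
                trailer = first_of(body[i + 1:])
                new = (trailer - {""}) | (FOLLOW[A] if "" in trailer else set())
                if new - FOLLOW[B]:
                    FOLLOW[B] |= new
                    changed = True

print(sorted(FIRST["E"]))    # ['(', 'id']
print(sorted(FOLLOW["F"]))   # ['$', ')', '*', '+']
```

The computed sets match the FIRST and FOLLOW tables worked out by hand in Example 2 below.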

LL(1) Parser -- Parser Actions

The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action. There are four possible parser actions:

1. If X and a are both $: the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $): the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal: the parser looks at the parsing table entry M[X, a]. If M[X, a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error.
   - All empty entries in the parsing table are errors.
   - If X is a terminal symbol different from a, this is also an error case.
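The four actions above form a simple loop. A sketch (the encoding is my own, not from the notes), using the table for the grammar S → aBa, B → bB | ε worked out in Example 1 below:

```python
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): [],                        # B -> eps (placed via FOLLOW(B))
}
NONTERMINALS = {"S", "B"}

def ll1_parse(tokens):
    """Return the list of productions used, or raise SyntaxError."""
    stack, output = ["$", "S"], []
    tokens = tokens + ["$"]
    pos = 0
    while stack:
        X, a = stack.pop(), tokens[pos]
        if X == "$" and a == "$":
            return output                  # action 1: successful completion
        if X not in NONTERMINALS:          # action 2: terminal on top
            if X != a:
                raise SyntaxError(f"expected {X!r}, got {a!r}")
            pos += 1
            continue
        body = TABLE.get((X, a))           # action 3: expand a nonterminal
        if body is None:
            raise SyntaxError(f"no entry M[{X}, {a}]")   # action 4: error
        output.append((X, body))
        stack.extend(reversed(body))       # push Yk, ..., Y1
    raise SyntaxError("stack exhausted before input")

for prod in ll1_parse(list("abba")):
    print(prod)
```

Running this on the input abba reproduces the output sequence of the parsing trace in Example 1: S → aBa, B → bB, B → bB, B → ε.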


Non-Recursive Predictive Parsing Algorithm

LL(1) Parser Example 1

S → aBa
B → bB | ε

FIRST function:

FIRST(S) = {a}    FIRST(aBa) = {a}
FIRST(B) = {b, ε}    FIRST(bB) = {b}    FIRST(ε) = {ε}

FOLLOW function:

FOLLOW(S) = {$}    FOLLOW(B) = {a}

LL(1) parsing table:

|   | a       | b      | $ |
|---|---------|--------|---|
| S | S → aBa |        |   |
| B | B → ε   | B → bB |   |

Parsing trace for the input abba:

| stack | input | output  |
|-------|-------|---------|
| $S    | abba$ | S → aBa |
| $aBa  | abba$ |         |
| $aB   | bba$  | B → bB  |
| $aBb  | bba$  |         |
| $aB   | ba$   | B → bB  |
| $aBb  | ba$   |         |
| $aB   | a$    | B → ε   |
| $a    | a$    |         |
| $     | $     | accept, successful completion |

Outputs: S → aBa, B → bB, B → bB, B → ε

Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba

Example 2:

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST sets:

FIRST(F) = {(, id}    FIRST(TE') = {(, id}
FIRST(T') = {*, ε}    FIRST(+TE') = {+}
FIRST(T) = {(, id}    FIRST(ε) = {ε}
FIRST(E') = {+, ε}    FIRST(FT') = {(, id}
FIRST(E) = {(, id}    FIRST(*FT') = {*}
FIRST((E)) = {(}      FIRST(id) = {id}

FOLLOW sets:

FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

Constructing the LL(1) Parsing Table -- Algorithm

For each production rule A → α of a grammar G:

- For each terminal a in FIRST(α), add A → α to M[A, a].
- If ε is in FIRST(α), then for each terminal a in FOLLOW(A), add A → α to M[A, a].
- If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].

All other undefined entries of the parsing table are error entries.
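The three rules above can be sketched as one loop over the productions. A sketch (the encoding is my own; the FIRST/FOLLOW sets for the grammar S → aBa, B → bB | ε are taken as precomputed inputs, as a fixed-point pass would produce them):

```python
GRAMMAR = {"S": [["a", "B", "a"]], "B": [["b", "B"], []]}
FIRST_OF_BODY = {("S", 0): {"a"}, ("B", 0): {"b"}, ("B", 1): {""}}  # "" = eps
FOLLOW = {"S": {"$"}, "B": {"a"}}

def build_table():
    """Apply the three table-construction rules; record any conflicts."""
    table, conflicts = {}, []
    for A, bodies in GRAMMAR.items():
        for k, body in enumerate(bodies):
            first = FIRST_OF_BODY[(A, k)]
            # rule 1: every terminal in FIRST(alpha);
            # rules 2 and 3: if alpha can vanish, every symbol in FOLLOW(A)
            # (FOLLOW already contains $ where applicable)
            targets = (first - {""}) | (FOLLOW[A] if "" in first else set())
            for a in targets:
                if (A, a) in table:        # multiply-defined entry: not LL(1)
                    conflicts.append((A, a))
                table[(A, a)] = body
    return table, conflicts

table, conflicts = build_table()
print(table[("B", "a")])   # []  (B -> eps landed in M[B, a] via FOLLOW(B))
print(conflicts)           # []  so this grammar is LL(1)
```

An empty `conflicts` list is exactly the "no multiply-defined entries" condition used later to define LL(1) grammars.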

Constructing the LL(1) Parsing Table -- Example

E → TE'      FIRST(TE') = {(, id}: E → TE' into M[E, (] and M[E, id]
E' → +TE'    FIRST(+TE') = {+}: E' → +TE' into M[E', +]
E' → ε       FIRST(ε) = {ε}: none directly, but since ε is in FIRST(ε) and FOLLOW(E') = {$, )}: E' → ε into M[E', $] and M[E', )]
T → FT'      FIRST(FT') = {(, id}: T → FT' into M[T, (] and M[T, id]
T' → *FT'    FIRST(*FT') = {*}: T' → *FT' into M[T', *]
T' → ε       FIRST(ε) = {ε}: none directly, but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +}: T' → ε into M[T', $], M[T', )] and M[T', +]
F → (E)      FIRST((E)) = {(}: F → (E) into M[F, (]
F → id       FIRST(id) = {id}: F → id into M[F, id]

|    | id      | +         | *         | (       | )      | $      |
|----|---------|-----------|-----------|---------|--------|--------|
| E  | E → TE' |           |           | E → TE' |        |        |
| E' |         | E' → +TE' |           |         | E' → ε | E' → ε |
| T  | T → FT' |           |           | T → FT' |        |        |
| T' |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε |
| F  | F → id  |           |           | F → (E) |        |        |

Parsing trace for the input id+id:

| stack   | input  | output    |
|---------|--------|-----------|
| $E      | id+id$ | E → TE'   |
| $E'T    | id+id$ | T → FT'   |
| $E'T'F  | id+id$ | F → id    |
| $E'T'id | id+id$ |           |
| $E'T'   | +id$   | T' → ε    |
| $E'     | +id$   | E' → +TE' |
| $E'T+   | +id$   |           |
| $E'T    | id$    | T → FT'   |
| $E'T'F  | id$    | F → id    |
| $E'T'id | id$    |           |
| $E'T'   | $      | T' → ε    |
| $E'     | $      | E' → ε    |
| $       | $      | accept    |

Construct the predictive LL(1) parser for the following grammars and parse the given strings:

1. S → S(S)S | ε with the string ( ( ) ( ) )
2. S → +SS | *SS | a with the string +*aaa
3. S → aSbS | bSaS | ε with the string aabbbab
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
   with the string not(true or false)
5. S → 0S1 | 01 with the string 00011
6. S → aB | aC | Sd | Se
   B → bBc | f
   C → g
7. P → Ra | Qba
   R → aba | caba | Rbc
   Q → bbc | bc
   with the string cababca
8. S → PQR
   P → a | Rb | ε
   Q → c | dP | ε
   R → e | f
   with the string adeb
9. E → E+T | T
   T → id | id[ ] | id[X]
   X → E, E | E
   with the string id[id]
10. S → (A) | 0
    A → SB
    B → ,SB | ε
    with the string (0, (0,0))
11. S → a | ε | (T)
    T → T,S | S
    with the strings (a,(a,a)) and ((a,a), ε, (a),a)

LL(1) Grammars

A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar:

- input scanned from left to right (the first L)
- left-most derivation (the second L)
- one input symbol used as lookahead to determine the parser action (the 1)

If some entry of the parsing table contains more than one production rule, we say that the grammar is not LL(1).

A Grammar which is not LL(1)

S → iCtSE | a
E → eS | ε
C → b

FIRST(iCtSE) = {i}    FOLLOW(S) = {$, e}
FIRST(a) = {a}        FOLLOW(E) = {$, e}
FIRST(eS) = {e}       FOLLOW(C) = {t}
FIRST(ε) = {ε}
FIRST(b) = {b}

|   | a     | b     | e              | i         | t | $     |
|---|-------|-------|----------------|-----------|---|-------|
| S | S → a |       |                | S → iCtSE |   |       |
| E |       |       | E → eS / E → ε |           |   | E → ε |
| C |       | C → b |                |           |   |       |

There are two production rules in M[E, e], so the grammar is not LL(1). The underlying problem is ambiguity (this is the dangling-else grammar).

What do we have to do if the resulting parsing table contains multiply-defined entries?

- If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
- If the grammar is not left-factored, left-factor the grammar.
- If the new grammar's parsing table still contains multiply-defined entries, the grammar is ambiguous, or it is inherently not an LL(1) grammar.

A left-recursive grammar cannot be an LL(1) grammar:

A → Aα | β

- Any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα.
- If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and in FOLLOW(A).

A grammar that is not left-factored cannot be an LL(1) grammar:

A → αβ1 | αβ2

- Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).

An ambiguous grammar cannot be an LL(1) grammar.

Properties of LL(1) Grammars

A grammar G is LL(1) if and only if the following conditions hold for every pair of distinct production rules A → α and A → β:

1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).

Error Recovery in Predictive Parsing

An error may occur in predictive parsing (LL(1) parsing):

- if the terminal symbol on the top of the stack does not match the current input symbol, or
- if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A, a] is empty.

What should the parser do in an error case?

- The parser should give an error message (as meaningful an error message as possible).
- It should recover from the error, and it should be able to continue parsing the rest of the input.

Error Recovery Techniques

- Panic-Mode Error Recovery: skip the input symbols until a synchronizing token is found.
- Phrase-Level Error Recovery: each empty entry in the parsing table is filled with a pointer to a specific error routine that takes care of that error case.
- Error Productions: if we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate erroneous constructs. When an error production is used by the parser, we can generate appropriate error diagnostics. Since it is almost impossible to know all the errors that can be made by programmers, this method is not practical.
- Global Correction: ideally, we would like a compiler to make as few changes as possible in processing incorrect inputs. We have to globally analyze the input to find the error. This is an expensive method, and it is not used in practice.

Panic-Mode Error Recovery in LL(1) Parsing

In panic-mode error recovery, we skip input symbols until a synchronizing token is found. What is a synchronizing token? All the terminal symbols in the FOLLOW set of a non-terminal can be used as the synchronizing token set for that non-terminal.

So, a simple panic-mode error recovery for LL(1) parsing:

- All the empty entries are marked as "sync" to indicate that the parser will skip input symbols until a symbol in the FOLLOW set of the non-terminal A on the top of the stack appears. Then the parser pops that non-terminal A from the stack, and parsing continues from that state.
- To handle unmatched terminal symbols, the parser pops the unmatched terminal symbol from the stack and issues an error message saying that the unmatched terminal was inserted.

**Panic-Mode Error Recovery Example**

S → AbS | e | ε
A → a | cAd

Solution:

FIRST(S) = {a, c, e, ε}
FIRST(A) = {a, c}
FOLLOW(S) = {$}
FOLLOW(A) = {b, d}

|   | a       | b    | c       | d    | e     | $     |
|---|---------|------|---------|------|-------|-------|
| S | S → AbS | sync | S → AbS | sync | S → e | S → ε |
| A | A → a   | sync | A → cAd | sync | sync  | sync  |

Eg: input string aab

| stack | input | output |
|-------|-------|--------|
| $S    | aab$  | S → AbS |
| $SbA  | aab$  | A → a |
| $Sba  | aab$  |  |
| $Sb   | ab$   | Error: missing b, inserted |
| $S    | ab$   | S → AbS |
| $SbA  | ab$   | A → a |
| $Sba  | ab$   |  |
| $Sb   | b$    |  |
| $S    | $     | S → ε |
| $     | $     | accept |

Eg: another input string, ceadb

| stack  | input  | output |
|--------|--------|--------|
| $S     | ceadb$ | S → AbS |
| $SbA   | ceadb$ | A → cAd |
| $SbdAc | ceadb$ |  |
| $SbdA  | eadb$  | Error: unexpected e (illegal A) (remove input tokens until the first b or d, pop A) |
| $Sbd   | db$    |  |
| $Sb    | b$     |  |
| $S     | $      | S → ε |
| $      | $      | accept |
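The traces above extend the plain LL(1) loop with two recovery moves: pop an unmatched terminal ("inserted"), and skip input or pop the nonterminal on an empty entry. A sketch of a FOLLOW-based variant (the encoding and error-message strings are my own; on the ceadb input it recovers slightly earlier than the hand trace above, since it resumes as soon as a symbol with a table entry appears):

```python
TABLE = {
    ("S", "a"): ["A", "b", "S"], ("S", "c"): ["A", "b", "S"],
    ("S", "e"): ["e"], ("S", "$"): [],
    ("A", "a"): ["a"], ("A", "c"): ["c", "A", "d"],
}
# sync sets come from the FOLLOW sets: FOLLOW(S) = {$}, FOLLOW(A) = {b, d}
SYNC = {("A", "b"), ("A", "d")}
NONTERMINALS = {"S", "A"}

def parse_with_recovery(tokens):
    """Parse, collecting error messages instead of stopping at the first."""
    stack, errors, pos = ["$", "S"], [], 0
    tokens = tokens + ["$"]
    while stack:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":
            return errors
        if X not in NONTERMINALS:          # terminal on top of the stack
            stack.pop()
            if X == a:
                pos += 1
            else:                          # unmatched terminal: report, pop
                errors.append(f"missing {X!r}, inserted")
        elif (X, a) in TABLE:              # normal expansion
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))
        elif (X, a) in SYNC:               # a is in FOLLOW(X): give up on X
            stack.pop()
            errors.append(f"illegal {X}, popped")
        elif a == "$":
            errors.append("unexpected end of input")
            return errors
        else:                              # skip input until a usable token
            errors.append(f"skipping {a!r}")
            pos += 1
    return errors

print(parse_with_recovery(list("aab")))   # ["missing 'b', inserted"]
```

On the first example input aab this reproduces the trace above exactly: one "missing b, inserted" error, then a successful parse of the remainder.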

**Phrase-Level Error Recovery**

Each empty entry in the parsing table is filled with a pointer to a special error routine which will take care of that error case. These error routines may:

- change, insert, or delete input symbols,
- issue appropriate error messages, and
- pop items from the stack.

We should be careful when designing these error routines, because we may put the parser into an infinite loop.