
**COMPILER DESIGN 10CS63 VTU NOTES UNIT-2**

## UNIT II: SYNTAX ANALYSIS - 1

The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is mostly a parse tree. The syntax analyzer is also known as the parser.

The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.

The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not. If it satisfies, the parser creates the parse tree of that program. Otherwise, the parser reports error messages.

A context-free grammar:

- gives a precise syntactic specification of a programming language.
- the design of the grammar is an initial phase of the design of a compiler.
- a grammar can be directly converted into a parser by some tools.

The parser works on a stream of tokens; the smallest item is a token.

Fig: Position of the parser in the compiler model (the parser and the other phases share the symbol table).

We categorize the parsers into two groups:

1. Top-Down Parser: the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves.

Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time). Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars:

- LL for top-down parsing
- LR for bottom-up parsing

Syntax Error Handling

Common programming errors can occur at many different levels:

1. Lexical errors: include misspellings of identifiers, keywords, or operators.
2. Syntactic errors: include misplaced semicolons or extra or missing braces.
3. Semantic errors: include type mismatches between operators and operands.
4. Logical errors: can be anything from incorrect reasoning on the part of the programmer.

Goals of the Parser

- Report the presence of errors clearly and accurately.
- Recover from each error quickly enough to detect subsequent errors.
- Add minimal overhead to the processing of correct programs.

Error-Recovery Strategies

1. Panic-Mode Recovery
2. Phrase-Level Recovery
3. Error Productions
4. Global Correction

Panic-Mode Recovery

On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. Synchronizing tokens are usually delimiters, e.g. a semicolon or }, whose role in the source program is clear and unambiguous. It often skips a considerable amount of input without checking it for additional errors.

Advantages:

- Simplicity
- Guaranteed not to go into an infinite loop

Phrase-Level Recovery

A parser may perform local correction on the remaining input; i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue. Ex: replace a comma by a semicolon, or insert a missing semicolon.

The choice of local correction is left to the compiler designer. It is used in several error-repairing compilers, as it can correct any input string.

Drawback: difficulty in coping with situations in which the actual error occurred before the point of detection.

Error Productions

We can augment the grammar for the language at hand with productions that generate the erroneous constructs, and then use the grammar augmented by these error productions to construct a parser. If an error production is used by the parser, we can generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.

Global Correction

We use algorithms that perform a minimal sequence of changes to obtain a globally least-cost correction. Given an incorrect input string x and grammar G, these algorithms find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. It is too costly to implement in terms of time and space, so these techniques are only of theoretical interest.

Context-Free Grammars

Inherently recursive structures of a programming language are defined by a context-free grammar. In a context-free grammar, we have:

- A finite set of terminals (in our case, this will be the set of tokens)
- A finite set of non-terminals (syntactic variables)
- A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
- A start symbol (one of the non-terminal symbols)

NOTATIONAL CONVENTIONS

1. Symbols used for terminals are:
   - Lowercase letters early in the alphabet (such as a, b, c, ...)
   - Operator symbols (such as +, *, ...)
   - Punctuation symbols (such as parentheses, comma, and so on)
   - The digits 0, 1, ..., 9
   - Boldface strings and keywords (such as id or if), each of which represents a single terminal symbol
2. Symbols used for non-terminals are:
   - Uppercase letters early in the alphabet (such as A, B, C, ...)
   - The letter S, which when it appears is usually the start symbol
   - Lowercase, italic names (such as expr or stmt)
3. Lowercase Greek letters (α, β, γ, for example) represent (possibly empty) strings of grammar symbols.

Example: using the above notations, list the terminals, non-terminals, and start symbol of the following grammar:

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id

Here the terminals are +, -, *, /, (, ), id; the non-terminals are E, T, F; and the start symbol is E.

Derivations

E ⇒ E+E

E+E derives from E: we can replace E by E+E. To be able to do this, we have to have a production rule E → E+E in our grammar.

E ⇒ E+E ⇒ id+E ⇒ id+id

A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.

In general, a derivation step is

αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols.

α1 ⇒ α2 ⇒ ... ⇒ αn (αn derives from α1, or α1 derives αn)

- ⇒ : derives in one step
- ⇒* : derives in zero or more steps
- ⇒+ : derives in one or more steps

CFG Terminology

L(G) is the language of G (the language generated by G), which is a set of sentences. A sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒* ω, where ω is a string of terminals of G.

If G is a context-free grammar, L(G) is a context-free language. Two grammars are equivalent if they produce the same language.

S ⇒* α:

- If α contains non-terminals, it is called a sentential form of G.
- If α does not contain non-terminals, it is called a sentence of G.

Derivation Example

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

OR

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement. If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Left-Most and Right-Most Derivations

Left-Most Derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

Right-Most Derivation:

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

We will see that top-down parsers try to find the left-most derivation of the given source program, while bottom-up parsers try to find the right-most derivation of the given source program in the reverse order.
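The two derivations above differ only in which occurrence of a non-terminal is rewritten at each step. A minimal sketch of this (the grammar encoding and function names here are my own, not from the notes):

```python
# Single-character nonterminals; a derivation step replaces one occurrence
# of a nonterminal in the sentential form with a chosen production body.
PRODUCTIONS = {"E": ["E+E", "(E)", "id", "-E"]}

def is_nonterminal(ch):
    return ch in PRODUCTIONS

def leftmost_step(sentential, body):
    """Replace the left-most nonterminal with the given production body."""
    for i, ch in enumerate(sentential):
        if is_nonterminal(ch):
            return sentential[:i] + body + sentential[i + 1:]
    raise ValueError("no nonterminal left")

def rightmost_step(sentential, body):
    """Replace the right-most nonterminal with the given production body."""
    for i in range(len(sentential) - 1, -1, -1):
        if is_nonterminal(sentential[i]):
            return sentential[:i] + body + sentential[i + 1:]
    raise ValueError("no nonterminal left")

# Left-most derivation of -(id+id):  E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
s = "E"
for body in ["-E", "(E)", "E+E", "id", "id"]:
    s = leftmost_step(s, body)
print(s)  # -(id+id)
```

Applying `rightmost_step` with the same production choices reproduces the second derivation above: the intermediate sentential forms become -(E+E), then -(E+id), then -(id+id).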

Parse Tree

Inner nodes of a parse tree are non-terminal symbols. The leaves of a parse tree are terminal symbols. A parse tree can be seen as a graphical representation of a derivation.

Problems on derivation of a string with parse tree:

1. Consider the grammar S → (L) | a, L → L,S | S
   i. What are the terminals, non-terminals, and the start symbol?
   ii. Find the parse tree for the following sentences:
      a. (a,a)
      b. (a, (a, a))
      c. (a, ((a,a),(a,a)))
   iii. Construct the LMD and RMD for each.
2. Do the above steps for the grammar S → aS | aSbS | ε for the string aaabaab

Ambiguity

A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. For most parsers, the grammar must be unambiguous:

unambiguous grammar → unique selection of the parse tree for a sentence

We should eliminate the ambiguity in the grammar during the design phase of the compiler; an ambiguous grammar should be rewritten to eliminate the ambiguity. We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) to disambiguate that grammar, and restrict the grammar to this choice.

Ambiguity (cont.)

Example:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmts

For the sentence

if E1 then if E2 then S1 else S2

this grammar produces two parse trees:

1. The else matches the outer if: if E1 then (if E2 then S1) else S2
2. The else matches the inner if: if E1 then (if E2 then S1 else S2)

We prefer the second parse tree (the else matches the closest if). So, we have to disambiguate our grammar to reflect this choice.

The unambiguous grammar will be:

stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt
              | if expr then matchedstmt else unmatchedstmt

Problems on ambiguous grammars: show that the following grammars are ambiguous by constructing either 2 LMDs or 2 RMDs for the given string.

1. S → S(S)S | ε with the string ( ( ) ( ) )
2. S → S+S | SS | (S) | S* | a with the string (a+a)*a
3. S → aS | aSbS | ε with the string abab

Ambiguity: Operator Precedence

Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules.

E → E+E | E*E | E^E | id | (E)

To disambiguate the grammar, use the precedence (highest first):

- ^ (right to left)
- * (left to right)
- + (left to right)

E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)

Left Recursion

A grammar is left-recursive if it has a non-terminal A such that there is a derivation

A ⇒+ Aα for some string α

Top-down parsing techniques cannot handle left-recursive grammars, so we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive. The left recursion may appear in a single step of the derivation (immediate left recursion), or may appear in more than one step of the derivation.

Immediate Left-Recursion

A → Aα | β   where β does not start with A

⇓ eliminate immediate left recursion

A → βA'
A' → αA' | ε   (an equivalent grammar)

In general:

A → Aα1 | ... | Aαm | β1 | ... | βn   where β1, ..., βn do not start with A

⇓ eliminate immediate left recursion

A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε   (an equivalent grammar)


Left-Recursion Problem

A grammar may not be immediately left-recursive, but it still can be left-recursive. By just eliminating the immediate left recursion, we may not get a grammar which is not left-recursive.

S → Aa | b
A → Sc | d

This grammar is not immediately left-recursive, but it is still left-recursive:

S ⇒ Aa ⇒ Sca, or
A ⇒ Sc ⇒ Aac causes a left recursion

So, we have to eliminate all left recursions from our grammar.

Eliminate Left-Recursion Algorithm

- Arrange non-terminals in some order: A1, ..., An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
        Ai → Aj γ
        by
        Ai → α1 γ | ... | αk γ
        where Aj → α1 | ... | αk
      }
    - eliminate immediate left recursion among the Ai productions
  }

Example 2:

S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: A, S

For A:
- We do not enter the inner loop.
- Eliminate the immediate left recursion in A:
  A → SdA' | fA'
  A' → cA' | ε

For S:
- Replace S → Aa with S → SdA'a | fA'a
  So, we will have S → SdA'a | fA'a | b
- Eliminate the immediate left recursion in S:
  S → fA'aS' | bS'
  S' → dA'aS' | ε

So, the resulting equivalent grammar which is not left-recursive is:

S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε
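The immediate-left-recursion transformation above is mechanical enough to sketch in code. A minimal sketch (the grammar representation, with production bodies as lists of symbols and `[]` for ε, is my own, not from the notes):

```python
def eliminate_immediate_left_recursion(head, bodies):
    """A -> A a1 | ... | A am | b1 | ... | bn  becomes
       A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | eps.
    bodies: list of symbol lists. Returns a dict {nonterminal: bodies}."""
    recursive = [b[1:] for b in bodies if b and b[0] == head]    # the alphas
    others = [b for b in bodies if not (b and b[0] == head)]     # the betas
    if not recursive:                      # nothing to do
        return {head: bodies}
    new = head + "'"                       # the primed helper nonterminal
    return {
        head: [b + [new] for b in others],            # A  -> beta A'
        new:  [a + [new] for a in recursive] + [[]],  # A' -> alpha A' | eps
    }

# E -> E+T | T   becomes   E -> T E'  and  E' -> +T E' | eps
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

For the full algorithm one would first substitute earlier nonterminals' bodies (the inner loop above) and then call this on each nonterminal in order.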

Problems on left recursion: eliminate the left recursion from the following grammars.

1. S → S(S)S | ε
2. S → S+S | SS | (S) | S* | a
3. S → SS+ | SS* | a
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
5. S → (L) | a, L → L,S | S

Left-Factoring

A predictive parser (a top-down parser without backtracking) insists that the grammar must be left-factored:

grammar → a new equivalent grammar suitable for predictive parsing

stmt → if expr then stmt else stmt
     | if expr then stmt

When we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.

In general:

A → αβ1 | αβ2   where α is non-empty and the first symbols of β1 and β2 (if they have one) are different.

When processing α, we cannot know whether to expand A to αβ1 or to αβ2. But if we rewrite the grammar as follows:

A → αA'
A' → β1 | β2

then we can immediately expand A to αA'.

Left-Factoring Algorithm

For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say

A → αβ1 | ... | αβn | γ1 | ... | γm

convert it into

A → αA' | γ1 | ... | γm
A' → β1 | ... | βn

Left-Factoring Example 1

A → abB | aB | cdg | cdeB | cdfB

⇓

A → aA' | cdg | cdeB | cdfB
A' → bB | B

⇓

A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB

Example 2

A → ad | a | ab | abc | b

⇓

A → aA' | b
A' → d | ε | b | bc

⇓

A → aA' | b
A' → d | ε | bA''
A'' → ε | c
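One pass of the left-factoring step in Example 2 can be sketched as follows (the representation, with bodies as symbol lists and `[]` for ε, is my own, not from the notes):

```python
def common_prefix(seqs):
    """Longest prefix shared by all sequences in seqs."""
    prefix = []
    for symbols in zip(*seqs):        # zip stops at the shortest body
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor_once(head, bodies):
    """One left-factoring pass: group alternatives by their first symbol,
    and factor the longest common prefix out of each group of 2+ bodies."""
    groups = {}
    for b in bodies:
        groups.setdefault(b[0] if b else None, []).append(b)
    result, primes = {head: []}, 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)            # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new = head + "'" * primes                 # A', A'', ...
        result[head].append(prefix + [new])       # A  -> prefix A'
        result[new] = [b[len(prefix):] for b in group]  # A' -> suffixes ([] = eps)
    return result

# A -> ad | a | ab | abc | b   becomes   A -> aA' | b ;  A' -> d | eps | b | bc
print(left_factor_once("A", [["a", "d"], ["a"], ["a", "b"], ["a", "b", "c"], ["b"]]))
```

Repeating the pass on the new A' alternatives would then factor out the shared b, reproducing the second step of Example 2.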

Problems on left factoring:

1. S → iEtS | iEtSeS | a, E → b
2. S → S(S)S | ε
3. S → aS | aSbS | ε
4. S → SS+ | SS* | a
5. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
6. S → 0S1 | 01
7. S → S+S | SS | (S) | S* | a
8. S → (L) | a, L → L,S | S
9. rexpr → rexpr + rterm | rterm
   rterm → rterm rfactor | rfactor
   rfactor → rfactor * | rprimary
   rprimary → a | b
   (do both left factoring and left-recursion elimination)

Non-Context-Free Language Constructs

There are some language constructs in programming languages which are not context-free; this means we cannot write a context-free grammar for these constructs.

L1 = {wcw | w is in (a|b)*} is not context-free.

- Corresponds to declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free language; we need a semantic analyzer (which is not context-free).

L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1} is not context-free.

- Corresponds to declaring two functions (one with n parameters, the other with m parameters), and then calling them with the matching numbers of actual parameters.

Top-Down Parsers

- Recursive-Descent parser
- Predictive parser: LL(1) parser, LL(k) parser

In the name LL(k):

- The first L stands for left-to-right scan of the input.
- The second L stands for LMD (left-most derivation).
- (1) stands for one input symbol of lookahead used to predict the parser action.
- (k) stands for k input symbols of lookahead.

The parse tree is created top to bottom.

Top-Down Parsing

Recursive-Descent Parsing:

- Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
- It is a general parsing technique, but not widely used.
- Not efficient.

Predictive Parsing:

- No backtracking.
- Efficient.
- Needs a special form of grammar (LL(1) grammars).
- Recursive predictive parsing is a special form of recursive-descent parsing without backtracking.
- The non-recursive (table-driven) predictive parser is also known as the LL(1) parser.

Recursive-Descent Parsing (uses Backtracking)

Backtracking is needed. It tries to find the left-most derivation.

S → aBc
B → bc | b

input: abc

Trying S → aBc with the alternative B → bc consumes all of abc, but then no c remains for S, so the parser backtracks and tries B → b, which succeeds: a, then B ⇒ b, then c.
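The backtracking in this example can be sketched directly (the function names and position-based matching scheme are my own, not from the notes):

```python
# Recursive descent with backtracking for  S -> aBc ,  B -> bc | b .
# Each nonterminal's function reports the input positions where its
# match could end; the caller tries them in order and backtracks.

def parse_B(s, i):
    """Yield every end position where B can match starting at i."""
    if s[i:i + 2] == "bc":      # alternative B -> bc
        yield i + 2
    if s[i:i + 1] == "b":       # alternative B -> b (tried on backtracking)
        yield i + 1

def parse_S(s, i=0):
    """Return True iff S -> aBc derives exactly s[i:]."""
    if s[i:i + 1] != "a":
        return False
    for j in parse_B(s, i + 1):             # try B -> bc first, then B -> b
        if s[j:j + 1] == "c" and j + 1 == len(s):
            return True                     # matched a B c, input consumed
    return False

print(parse_S("abc"))    # True: B -> bc leaves no c, so backtrack to B -> b
print(parse_S("abcc"))   # True via B -> bc
print(parse_S("ab"))     # False
```

Note how `parse_S` only succeeds on "abc" after the first choice for B fails, which is exactly the backtracking step described above.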

Predictive Parser

When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

A → α1 | ... | αn    input: ... a ...   (a is the current token)

Example:

stmt → if ...... | while ...... | begin ...... | for ......

When we are trying to rewrite the non-terminal stmt, if the current token is if we have to choose the first production rule. That is, we can uniquely choose the production rule by just looking at the current token.

We first eliminate the left recursion in the grammar, and then left-factor it. But the resulting grammar may still not be suitable for predictive parsing (it may not be an LL(1) grammar).

Non-Recursive Predictive Parsing -- LL(1) Parser

Non-recursive predictive parsing is a table-driven parser. It is a top-down parser, also known as the LL(1) parser.

LL(1) parser components:

- input buffer: our string to be parsed. We will assume that its end is marked with a special symbol $.
- output: a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.
- stack: contains the grammar symbols. At the bottom of the stack, there is a special end-marker symbol $. Initially the stack contains only the symbol $ and the start symbol S ($S is the initial stack). When the stack is emptied (i.e. only $ is left in the stack), parsing is complete.
- parsing table: a two-dimensional array M[A, a]. Each row is a non-terminal symbol, each column is a terminal symbol or the special symbol $, and each entry holds a production rule.

Constructing LL(1) Parsing Tables

Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.

FIRST(α) is the set of the terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols. If α derives ε, then ε is also in FIRST(α).

FOLLOW(A) is the set of the terminals which occur immediately after (follow) the non-terminal A in strings derived from the start symbol:

- a terminal a is in FOLLOW(A) if S ⇒* αAaβ
- $ is in FOLLOW(A) if S ⇒* αA

Compute FIRST for any string X:

- If X is a terminal symbol, FIRST(X) = {X}.
- If X is ε, FIRST(X) = {ε}.
- If X is a non-terminal symbol and X → ε is a production rule, then ε is in FIRST(X).
- If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule:
  - if a terminal a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1), then a is in FIRST(X);
  - if ε is in all of FIRST(Y1), ..., FIRST(Yn), then ε is in FIRST(X).
- If X is a string Y1Y2...Yn, the same two rules apply: a terminal a is in FIRST(X) if a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); ε is in FIRST(X) if ε is in all of FIRST(Y1), ..., FIRST(Yn).

Compute FOLLOW (for non-terminals):

- If S is the start symbol, $ is in FOLLOW(S).
- If A → αBβ is a production rule, everything in FIRST(β) except ε is in FOLLOW(B).
- If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)), then everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any FOLLOW set.
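The "apply until nothing changes" wording is a fixed-point iteration, which can be sketched directly. A sketch (the grammar encoding, with `""` for ε, is my own; the grammar is the left-factored expression grammar used in the examples below, written here with primes as `E'`, `T'`):

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

FIRST = {A: set() for A in GRAMMAR}

def first_of(symbols):
    """FIRST of a string of grammar symbols ("" stands for epsilon)."""
    result = set()
    for X in symbols:
        if X not in GRAMMAR:            # X is a terminal
            result.add(X)
            return result
        result |= FIRST[X] - {""}
        if "" not in FIRST[X]:          # X cannot vanish: stop here
            return result
    result.add("")                      # every symbol can derive epsilon
    return result

changed = True
while changed:                          # iterate until a fixed point
    changed = False
    for A, bodies in GRAMMAR.items():
        for body in bodies:
            new = first_of(body) - FIRST[A]
            if new:
                FIRST[A] |= new
                changed = True

FOLLOW = {A: set() for A in GRAMMAR}
FOLLOW[START].add("$")
changed = True
while changed:
    changed = False
    for A, bodies in GRAMMAR.items():
        for body in bodies:
            for i, B in enumerate(body):
                if B not in GRAMMAR:
                    continue
                trailer = first_of(body[i + 1:])
                new = (trailer - {""}) | (FOLLOW[A] if "" in trailer else set())
                if new - FOLLOW[B]:
                    FOLLOW[B] |= new
                    changed = True

print(sorted(FIRST["E"]))    # ['(', 'id']
print(sorted(FOLLOW["F"]))   # ['$', ')', '*', '+']
```

The computed sets match the FIRST and FOLLOW tables worked out by hand in Example 2 below.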

LL(1) Parser -- Parser Actions

The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action. There are four possible parser actions:

1. If X and a are both $: the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $): the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal: the parser looks at the parsing table entry M[X, a]. If M[X, a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error.
   - All empty entries in the parsing table are errors.
   - If X is a terminal symbol different from a, this is also an error case.
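The four actions above form a simple loop. A sketch (the encoding is my own, not from the notes), using the table for the grammar S → aBa, B → bB | ε worked out in Example 1 below:

```python
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): [],                        # B -> eps (placed via FOLLOW(B))
}
NONTERMINALS = {"S", "B"}

def ll1_parse(tokens):
    """Return the list of productions used, or raise SyntaxError."""
    stack, output = ["$", "S"], []
    tokens = tokens + ["$"]
    pos = 0
    while stack:
        X, a = stack.pop(), tokens[pos]
        if X == "$" and a == "$":
            return output                  # action 1: successful completion
        if X not in NONTERMINALS:          # action 2: terminal on top
            if X != a:
                raise SyntaxError(f"expected {X!r}, got {a!r}")
            pos += 1
            continue
        body = TABLE.get((X, a))           # action 3: expand a nonterminal
        if body is None:
            raise SyntaxError(f"no entry M[{X}, {a}]")   # action 4: error
        output.append((X, body))
        stack.extend(reversed(body))       # push Yk, ..., Y1
    raise SyntaxError("stack exhausted before input")

for prod in ll1_parse(list("abba")):
    print(prod)
```

Running this on the input abba reproduces the output sequence of the parsing trace in Example 1: S → aBa, B → bB, B → bB, B → ε.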


Non-Recursive Predictive Parsing Algorithm

LL(1) Parser Example 1

S → aBa
B → bB | ε

FIRST function:

FIRST(S) = {a}    FIRST(aBa) = {a}
FIRST(B) = {b, ε}    FIRST(bB) = {b}    FIRST(ε) = {ε}

FOLLOW function:

FOLLOW(S) = {$}    FOLLOW(B) = {a}

LL(1) parsing table:

|   | a       | b      | $ |
|---|---------|--------|---|
| S | S → aBa |        |   |
| B | B → ε   | B → bB |   |

Parsing trace for the input abba:

| stack | input | output  |
|-------|-------|---------|
| $S    | abba$ | S → aBa |
| $aBa  | abba$ |         |
| $aB   | bba$  | B → bB  |
| $aBb  | bba$  |         |
| $aB   | ba$   | B → bB  |
| $aBb  | ba$   |         |
| $aB   | a$    | B → ε   |
| $a    | a$    |         |
| $     | $     | accept, successful completion |

Outputs: S → aBa, B → bB, B → bB, B → ε

Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba

Example 2:

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST sets:

FIRST(F) = {(, id}    FIRST(TE') = {(, id}
FIRST(T') = {*, ε}    FIRST(+TE') = {+}
FIRST(T) = {(, id}    FIRST(ε) = {ε}
FIRST(E') = {+, ε}    FIRST(FT') = {(, id}
FIRST(E) = {(, id}    FIRST(*FT') = {*}
FIRST((E)) = {(}      FIRST(id) = {id}

FOLLOW sets:

FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}

Constructing the LL(1) Parsing Table -- Algorithm

For each production rule A → α of a grammar G:

- For each terminal a in FIRST(α), add A → α to M[A, a].
- If ε is in FIRST(α), then for each terminal a in FOLLOW(A), add A → α to M[A, a].
- If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].

All other undefined entries of the parsing table are error entries.
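The three rules above can be sketched as one loop over the productions. A sketch (the encoding is my own; the FIRST/FOLLOW sets for the grammar S → aBa, B → bB | ε are taken as precomputed inputs, as a fixed-point pass would produce them):

```python
GRAMMAR = {"S": [["a", "B", "a"]], "B": [["b", "B"], []]}
FIRST_OF_BODY = {("S", 0): {"a"}, ("B", 0): {"b"}, ("B", 1): {""}}  # "" = eps
FOLLOW = {"S": {"$"}, "B": {"a"}}

def build_table():
    """Apply the three table-construction rules; record any conflicts."""
    table, conflicts = {}, []
    for A, bodies in GRAMMAR.items():
        for k, body in enumerate(bodies):
            first = FIRST_OF_BODY[(A, k)]
            # rule 1: every terminal in FIRST(alpha);
            # rules 2 and 3: if alpha can vanish, every symbol in FOLLOW(A)
            # (FOLLOW already contains $ where applicable)
            targets = (first - {""}) | (FOLLOW[A] if "" in first else set())
            for a in targets:
                if (A, a) in table:        # multiply-defined entry: not LL(1)
                    conflicts.append((A, a))
                table[(A, a)] = body
    return table, conflicts

table, conflicts = build_table()
print(table[("B", "a")])   # []  (B -> eps landed in M[B, a] via FOLLOW(B))
print(conflicts)           # []  so this grammar is LL(1)
```

An empty `conflicts` list is exactly the "no multiply-defined entries" condition used later to define LL(1) grammars.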

Constructing the LL(1) Parsing Table -- Example

E → TE'      FIRST(TE') = {(, id}: E → TE' into M[E, (] and M[E, id]
E' → +TE'    FIRST(+TE') = {+}: E' → +TE' into M[E', +]
E' → ε       FIRST(ε) = {ε}: none directly, but since ε is in FIRST(ε) and FOLLOW(E') = {$, )}: E' → ε into M[E', $] and M[E', )]
T → FT'      FIRST(FT') = {(, id}: T → FT' into M[T, (] and M[T, id]
T' → *FT'    FIRST(*FT') = {*}: T' → *FT' into M[T', *]
T' → ε       FIRST(ε) = {ε}: none directly, but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +}: T' → ε into M[T', $], M[T', )] and M[T', +]
F → (E)      FIRST((E)) = {(}: F → (E) into M[F, (]
F → id       FIRST(id) = {id}: F → id into M[F, id]

|    | id      | +         | *         | (       | )      | $      |
|----|---------|-----------|-----------|---------|--------|--------|
| E  | E → TE' |           |           | E → TE' |        |        |
| E' |         | E' → +TE' |           |         | E' → ε | E' → ε |
| T  | T → FT' |           |           | T → FT' |        |        |
| T' |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε |
| F  | F → id  |           |           | F → (E) |        |        |

Parsing trace for the input id+id:

| stack   | input  | output    |
|---------|--------|-----------|
| $E      | id+id$ | E → TE'   |
| $E'T    | id+id$ | T → FT'   |
| $E'T'F  | id+id$ | F → id    |
| $E'T'id | id+id$ |           |
| $E'T'   | +id$   | T' → ε    |
| $E'     | +id$   | E' → +TE' |
| $E'T+   | +id$   |           |
| $E'T    | id$    | T → FT'   |
| $E'T'F  | id$    | F → id    |
| $E'T'id | id$    |           |
| $E'T'   | $      | T' → ε    |
| $E'     | $      | E' → ε    |
| $       | $      | accept    |

Construct the predictive LL(1) parser for the following grammars and parse the given strings:

1. S → S(S)S | ε with the string ( ( ) ( ) )
2. S → +SS | *SS | a with the string +*aaa
3. S → aSbS | bSaS | ε with the string aabbbab
4. bexpr → bexpr or bterm | bterm
   bterm → bterm and bfactor | bfactor
   bfactor → not bfactor | (bexpr) | true | false
   with the string not(true or false)
5. S → 0S1 | 01 with the string 00011
6. S → aB | aC | Sd | Se
   B → bBc | f
   C → g
7. P → Ra | Qba
   R → aba | caba | Rbc
   Q → bbc | bc
   with the string cababca
8. S → PQR
   P → a | Rb | ε
   Q → c | dP | ε
   R → e | f
   with the string adeb
9. E → E+T | T
   T → id | id[ ] | id[X]
   X → E, E | E
   with the string id[id]
10. S → (A) | 0
    A → SB
    B → ,SB | ε
    with the string (0, (0,0))
11. S → a | ε | (T)
    T → T,S | S
    with the strings (a,(a,a)) and ((a,a), ε, (a),a)

LL(1) Grammars

A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar:

- input scanned from left to right (the first L)
- left-most derivation (the second L)
- one input symbol used as lookahead to determine the parser action (the 1)

If some entry of the parsing table contains more than one production rule, we say that the grammar is not LL(1).

A Grammar which is not LL(1)

S → iCtSE | a
E → eS | ε
C → b

FIRST(iCtSE) = {i}    FOLLOW(S) = {$, e}
FIRST(a) = {a}        FOLLOW(E) = {$, e}
FIRST(eS) = {e}       FOLLOW(C) = {t}
FIRST(ε) = {ε}
FIRST(b) = {b}

|   | a     | b     | e              | i         | t | $     |
|---|-------|-------|----------------|-----------|---|-------|
| S | S → a |       |                | S → iCtSE |   |       |
| E |       |       | E → eS / E → ε |           |   | E → ε |
| C |       | C → b |                |           |   |       |

There are two production rules in M[E, e], so the grammar is not LL(1). The underlying problem is ambiguity (this is the dangling-else grammar).

What do we have to do if the resulting parsing table contains multiply-defined entries?

- If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
- If the grammar is not left-factored, left-factor the grammar.
- If the new grammar's parsing table still contains multiply-defined entries, the grammar is ambiguous, or it is inherently not an LL(1) grammar.

A left-recursive grammar cannot be an LL(1) grammar:

A → Aα | β

- Any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα.
- If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and in FOLLOW(A).

A grammar that is not left-factored cannot be an LL(1) grammar:

A → αβ1 | αβ2

- Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).

An ambiguous grammar cannot be an LL(1) grammar.

Properties of LL(1) Grammars

A grammar G is LL(1) if and only if the following conditions hold for every pair of distinct production rules A → α and A → β:

1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).

Error Recovery in Predictive Parsing

An error may occur in predictive parsing (LL(1) parsing):

- if the terminal symbol on the top of the stack does not match the current input symbol, or
- if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A, a] is empty.

What should the parser do in an error case?

- The parser should give an error message (as meaningful an error message as possible).
- It should recover from the error, and it should be able to continue parsing the rest of the input.

Error Recovery Techniques

- Panic-Mode Error Recovery: skip the input symbols until a synchronizing token is found.
- Phrase-Level Error Recovery: each empty entry in the parsing table is filled with a pointer to a specific error routine that takes care of that error case.
- Error Productions: if we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate erroneous constructs. When an error production is used by the parser, we can generate appropriate error diagnostics. Since it is almost impossible to know all the errors that can be made by programmers, this method is not practical.
- Global Correction: ideally, we would like a compiler to make as few changes as possible in processing incorrect inputs. We have to globally analyze the input to find the error. This is an expensive method, and it is not used in practice.

Panic-Mode Error Recovery in LL(1) Parsing

In panic-mode error recovery, we skip input symbols until a synchronizing token is found. What is a synchronizing token? All the terminal symbols in the FOLLOW set of a non-terminal can be used as the synchronizing token set for that non-terminal.

So, a simple panic-mode error recovery for LL(1) parsing:

- All the empty entries are marked as "sync" to indicate that the parser will skip input symbols until a symbol in the FOLLOW set of the non-terminal A on the top of the stack appears. Then the parser pops that non-terminal A from the stack, and parsing continues from that state.
- To handle unmatched terminal symbols, the parser pops the unmatched terminal symbol from the stack and issues an error message saying that the unmatched terminal was inserted.

**Panic-Mode Error Recovery Example**

S → AbS | e | ε
A → a | cAd

Solution:

FIRST(S) = {a, c, e, ε}
FIRST(A) = {a, c}
FOLLOW(S) = {$}
FOLLOW(A) = {b, d}

|   | a       | b    | c       | d    | e     | $     |
|---|---------|------|---------|------|-------|-------|
| S | S → AbS | sync | S → AbS | sync | S → e | S → ε |
| A | A → a   | sync | A → cAd | sync | sync  | sync  |

Eg: input string aab

| stack | input | output |
|-------|-------|--------|
| $S    | aab$  | S → AbS |
| $SbA  | aab$  | A → a |
| $Sba  | aab$  |  |
| $Sb   | ab$   | Error: missing b, inserted |
| $S    | ab$   | S → AbS |
| $SbA  | ab$   | A → a |
| $Sba  | ab$   |  |
| $Sb   | b$    |  |
| $S    | $     | S → ε |
| $     | $     | accept |

Eg: another input string, ceadb

| stack  | input  | output |
|--------|--------|--------|
| $S     | ceadb$ | S → AbS |
| $SbA   | ceadb$ | A → cAd |
| $SbdAc | ceadb$ |  |
| $SbdA  | eadb$  | Error: unexpected e (illegal A) (remove input tokens until the first b or d, pop A) |
| $Sbd   | db$    |  |
| $Sb    | b$     |  |
| $S     | $      | S → ε |
| $      | $      | accept |
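The traces above extend the plain LL(1) loop with two recovery moves: pop an unmatched terminal ("inserted"), and skip input or pop the nonterminal on an empty entry. A sketch of a FOLLOW-based variant (the encoding and error-message strings are my own; on the ceadb input it recovers slightly earlier than the hand trace above, since it resumes as soon as a symbol with a table entry appears):

```python
TABLE = {
    ("S", "a"): ["A", "b", "S"], ("S", "c"): ["A", "b", "S"],
    ("S", "e"): ["e"], ("S", "$"): [],
    ("A", "a"): ["a"], ("A", "c"): ["c", "A", "d"],
}
# sync sets come from the FOLLOW sets: FOLLOW(S) = {$}, FOLLOW(A) = {b, d}
SYNC = {("A", "b"), ("A", "d")}
NONTERMINALS = {"S", "A"}

def parse_with_recovery(tokens):
    """Parse, collecting error messages instead of stopping at the first."""
    stack, errors, pos = ["$", "S"], [], 0
    tokens = tokens + ["$"]
    while stack:
        X, a = stack[-1], tokens[pos]
        if X == "$" and a == "$":
            return errors
        if X not in NONTERMINALS:          # terminal on top of the stack
            stack.pop()
            if X == a:
                pos += 1
            else:                          # unmatched terminal: report, pop
                errors.append(f"missing {X!r}, inserted")
        elif (X, a) in TABLE:              # normal expansion
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))
        elif (X, a) in SYNC:               # a is in FOLLOW(X): give up on X
            stack.pop()
            errors.append(f"illegal {X}, popped")
        elif a == "$":
            errors.append("unexpected end of input")
            return errors
        else:                              # skip input until a usable token
            errors.append(f"skipping {a!r}")
            pos += 1
    return errors

print(parse_with_recovery(list("aab")))   # ["missing 'b', inserted"]
```

On the first example input aab this reproduces the trace above exactly: one "missing b, inserted" error, then a successful parse of the remainder.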

**Phrase-Level Error Recovery**

Each empty entry in the parsing table is filled with a pointer to a special error routine which will take care of that error case. These error routines may:

- change, insert, or delete input symbols,
- issue appropriate error messages, and
- pop items from the stack.

We should be careful when designing these error routines, because we may put the parser into an infinite loop.