CD Notes for 6th Sem (Compiler Design)
Units and Preliminary Requirements
PART A
UNIT 1: 8 Hours
Introduction, Lexical analysis: Language processors; The structure of a Compiler; The
evolution of programming languages; The science of building a Compiler; Applications
of compiler technology; Programming language basics.
Lexical analysis: The Role of Lexical Analyzer; Input Buffering; Specifications of
Tokens; Recognition of Tokens.
UNIT 2: 6 Hours
Syntax Analysis 1: Introduction; Context-free Grammars; Writing a Grammar.
UNIT 3: 6 Hours
Syntax Analysis 2: Top-down Parsing; Bottom-up Parsing.
UNIT 4: 6 Hours
Syntax Analysis 3: Introduction to LR Parsing: Simple LR; More powerful LR parsers
(excluding efficient construction and compaction of parsing tables); Using ambiguous
grammars; Parser Generators.
PART B
UNIT 5: 7 Hours
Syntax-Directed Translation: Syntax-directed definitions; Evaluation orders for SDDs;
Applications of syntax-directed translation; Syntax-directed translation schemes.
UNIT 6: 6 Hours
Intermediate Code Generation: Variants of syntax trees; Three-address code;
Translation of expressions; Control flow; Back patching; Switch statements; Procedure
calls.
UNIT 7: 6 Hours
Run-Time Environments: Storage Organization; Stack allocation of space; Access to
non-local data on the stack; Heap management; Introduction to garbage collection.
UNIT 8: 7 Hours
Code Generation: Issues in the design of Code Generator; The Target Language;
Addresses in the target code; Basic blocks and Flow graphs; Optimization of basic
blocks; A Simple Code Generator
Text Books:
1. Alfred V Aho, Monica S.Lam, Ravi Sethi, Jeffrey D Ullman: Compilers- Principles,
Techniques and Tools, 2nd Edition, Pearson Education, 2007.
(Chapters 1, 3.1 to 3.4, 4 excluding 4.7.5 and 4.7.6, 5.1 to 5.4, 6.1, 6.2, 6.4, 6.6, 6.7 to
6.9, 7.1 to 7.5, 8.1 to 8.6.)
Reference Books:
1. Charles N. Fischer, Richard J. LeBlanc, Jr.: Crafting a Compiler with C, Pearson
Education, 1991.
2. Andrew W. Appel: Modern Compiler Implementation in C, Cambridge University
Press, 1997.
3. Kenneth C. Louden: Compiler Construction: Principles & Practice, Cengage Learning,
1997.
Preliminaries Required
Basic knowledge of programming languages.
Basic knowledge of DFA and NFA (FAFL concepts).
Knowledge of a high-level programming language for the programming assignments.
Textbook:
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman,
Compilers: Principles, Techniques, and Tools
Addison-Wesley, 2nd Edition.
Course Outline
Introduction to Compiling
Lexical Analysis
Syntax Analysis
Context Free Grammars
Top-Down Parsing, LL Parsing
Bottom-Up Parsing, LR Parsing
Syntax-Directed Translation
Attribute Definitions
Evaluation of Attribute Definitions
Semantic Analysis, Type Checking
Run-Time Organization
Intermediate Code Generation
UNIT I: INTRODUCTION, LEXICAL ANALYSIS
SYLLABUS:
Introduction: Language processors;
The structure of a Compiler;
The evolution of programming languages;
The science of building a Compiler;
Applications of compiler technology;
Programming language basics.
Lexical analysis: The Role of Lexical Analyzer;
Input Buffering;
Specifications of Tokens;
Recognition of Tokens.
LANGUAGE PROCESSORS
A language processor is software that processes a program given in a certain source language.
Types: compilers, interpreters, preprocessors, assemblers, etc.
COMPILER
Pictorial representation of a compiler:
source program --> [ COMPILER ] --> target program
If the target program is executable machine language, it can then be run directly on its input:
input --> [ target program ] --> output
INTERPRETER
Pictorial representation of an interpreter:
source program + input --> [ INTERPRETER ] --> output
HYBRID COMPILER
source program --> [ Translator ] --> intermediate program
intermediate program + input --> [ Virtual Machine ] --> output
COMPILERS
A compiler is a program that takes a program written in a source language and
translates it into an equivalent program in a target language.
source program --> [ COMPILER ] --> target program
                       |
                       v
                 error messages
Other Applications
In addition to the development of a compiler, the techniques used in compiler
design can be applied to many other problems in computer science.
Techniques used in a lexical analyzer can be used in text editors,
information retrieval systems, and pattern recognition programs.
Techniques used in a parser can be used in a query processing system such
as SQL.
Many kinds of software with a complex front end need the techniques used in
compiler design.
For example, a symbolic equation solver takes an equation as input; such a
program must parse the given input equation.
Most of the techniques used in compiler design can be used in Natural
Language Processing (NLP) systems.
Major Parts of Compilers
There are two major parts of a compiler: Analysis and Synthesis
In the analysis phase, an intermediate representation is created from the given source
program.
Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of
this phase.
In the synthesis phase, the equivalent target program is created from this intermediate
representation.
Intermediate Code Generator, Code Generator, and Code Optimizer are
the parts of this phase.
source program --> [ Analysis ] --> intermediate representation --> [ Synthesis ] --> target program
Phases of a Compiler
source program
  --> Lexical Analyzer
  --> Syntax Analyzer
  --> Semantic Analyzer
  --> Intermediate Code Generator
  --> Code Optimizer
  --> Code Generator
  --> target program
Each phase transforms the source program from one representation
into another representation.
They communicate with error handlers.
They communicate with the symbol table.
Compiler v/s Interpreter
Compiler:    source program --> [ COMPILER ] --> target program; then input --> [ target program ] --> output
Interpreter: source program + input --> [ INTERPRETER ] --> output
Assignments
1. What is the difference between a compiler and an interpreter?
2. What are the advantages of (i) a compiler over an interpreter and (ii) an interpreter
over a compiler?
Translation of an assignment
Consider the statement: position = initial + rate * 60

1. The Lexical Analyzer produces the token stream:
   <id,1> <=> <id,2> <+> <id,3> <*> <60>
2. The Syntax Analyzer builds the syntax tree:
            =
          /   \
     <id,1>     +
              /   \
         <id,2>     *
                  /   \
             <id,3>    60
3. The Semantic Analyzer inserts a conversion of the integer 60 to floating point:
   inttofloat(60)
4. The Intermediate Code Generator emits three-address code:
   t1 = inttofloat(60)
   t2 = id3 * t1
   t3 = id2 + t2
   id1 = t3
5. The Code Optimizer simplifies this to:
   t1 = id3 * 60.0
   id1 = id2 + t1
6. The Code Generator produces target code:
   LDF  R2, id3
   MULF R2, R2, #60.0
   LDF  R1, id2
   ADDF R1, R1, R2
   STF  id1, R1
Lexical Analyzer
Lexical Analyzer reads the source program character by character and returns the
tokens of the source program.
The Lexical Analyzer is also called a Scanner.
It reads the stream of characters making up the source program and groups the
characters into meaningful sequences called Lexemes.
For each lexeme, it produces as output a token of the form <token_name, attribute_value>:
token_name is an abstract symbol that is used during syntax analysis, and
the second component attribute_value points to an entry in the symbol
table for this token.
A token describes a pattern of characters having the same meaning in the source
program (such as identifiers, operators, keywords, numbers, delimiters, and so
on).
Ex: newval := oldval + 12  =>  tokens:
newval   identifier
:=       assignment operator
oldval   identifier
+        add operator
12       a number
It puts information about identifiers into the symbol table (not all attributes are known at this stage).
Regular expressions are used to describe tokens (lexical constructs).
A (Deterministic) Finite State Automaton (DFA) can be used in the
implementation of a lexical analyzer.
Blank spaces are removed in the process.
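The notes above say a DFA can drive the lexical analyzer; as an illustration (not part of the original notes), here is a minimal C sketch of a DFA-style scanner loop that groups characters into identifier and number tokens. The token codes, buffer handling, and use of stdin are assumptions made for this example.

#include <stdio.h>
#include <ctype.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_ID, TOK_NUM, TOK_OTHER, TOK_EOF };

/* Reads one token from stdin and stores its lexeme in buf. */
int next_token(char *buf, int size)
{
    int c = getchar();
    while (c == ' ' || c == '\t' || c == '\n')   /* skip blanks */
        c = getchar();
    if (c == EOF)
        return TOK_EOF;
    int i = 0;
    if (isalpha(c)) {                    /* state: in an identifier */
        while (isalnum(c) && i < size - 1) {
            buf[i++] = c;
            c = getchar();
        }
        ungetc(c, stdin);                /* retract the lookahead character */
        buf[i] = '\0';
        return TOK_ID;
    }
    if (isdigit(c)) {                    /* state: in a number */
        while (isdigit(c) && i < size - 1) {
            buf[i++] = c;
            c = getchar();
        }
        ungetc(c, stdin);
        buf[i] = '\0';
        return TOK_NUM;
    }
    buf[0] = c; buf[1] = '\0';           /* operators, delimiters, ... */
    return TOK_OTHER;
}

Each if branch plays the role of a DFA state, and the retraction with ungetc corresponds to the forward pointer backing up after reading one character past the end of the lexeme.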
Syntax Analyzer
A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the
given program.
A syntax analyzer is also called a parser.
A parse tree describes a syntactic structure.
In a parse tree, all terminals are at leaves.
All inner nodes are non-terminals of a context-free grammar.
The syntax of a language is specified by a context-free grammar (CFG).
The rules in a CFG are mostly recursive.
A syntax analyzer checks whether a given program satisfies the rules implied by a
CFG or not.
If the program satisfies the rules, the syntax analyzer creates a parse tree for the given
program.
Ex: We use BNF (Backus-Naur Form) to specify a CFG:
1. E -> Identifier
2. E -> Number
3. E -> E + E
4. E -> E * E
5. E -> ( E )
where E stands for an expression.
For newval = oldval + 12, the syntax analyzer works out the following:
By rule 1, newval is an expression.
By rule 1, oldval is an expression.
By rule 2, 12 is an expression.
By rule 3, oldval + 12 is an expression.
Syntax Analyzer versus Lexical Analyzer
Which constructs of a program should be recognized by the lexical analyzer, and
which ones by the syntax analyzer?
Both of them do similar things, but the lexical analyzer deals with simple,
non-recursive constructs of the language.
The syntax analyzer deals with recursive constructs of the language.
Parse tree for newval = oldval + 12:
         =
       /   \
  newval     +
           /   \
      oldval     12
The lexical analyzer simplifies the job of the syntax analyzer.
The lexical analyzer recognizes the smallest meaningful units (tokens) in a
source program.
The syntax analyzer works on the smallest meaningful units (tokens) in a
source program to recognize meaningful structures in our programming
language.
Parsing Techniques
Depending on how the parse tree is created, there are different parsing techniques.
These parsing techniques are categorized into two groups:
Top-Down Parsing, Bottom-Up Parsing
Top-Down Parsing:
Construction of the parse tree starts at the root, and proceeds towards the
leaves.
Efficient top-down parsers can be easily constructed by hand.
Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL
Parsing).
Bottom-Up Parsing:
Construction of the parse tree starts at the leaves, and proceeds towards
the root.
Normally efficient bottom-up parsers are created with the help of some
software tools.
Bottom-up parsing is also known as shift-reduce parsing.
Operator-Precedence Parsing: simple and easy to implement, but restrictive.
LR Parsing: a much more general form of shift-reduce parsing (LR, SLR, LALR).
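To make recursive predictive parsing concrete, here is a hedged C sketch for the toy grammar E -> T { + T }, T -> digit, a non-left-recursive variant of the expression grammar above; the single-character lookahead scheme is an assumption of this example.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

static int lookahead;                 /* current input symbol */

static void error(void)
{
    printf("syntax error\n");
    exit(1);
}

static void match(int t)              /* consume the expected symbol */
{
    if (lookahead == t)
        lookahead = getchar();
    else
        error();
}

static void T(void)                   /* T -> digit */
{
    if (isdigit(lookahead))
        match(lookahead);
    else
        error();
}

static void E(void)                   /* E -> T { + T } */
{
    T();
    while (lookahead == '+') {
        match('+');
        T();
    }
}

int main(void)
{
    lookahead = getchar();
    E();
    printf("accepted\n");
    return 0;
}

There is one procedure per non-terminal and the parse tree is implicit in the call tree, which is why efficient top-down parsers are easy to construct by hand.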
Semantic Analyzer
A semantic analyzer checks the source program for semantic errors and collects
the type information for the code generation.
It determines the meaning of the source string, for example:
matching of parentheses,
matching of if-else statements,
checking the scope of operations.
Type checking is an important part of the semantic analyzer.
Normally, semantic information cannot be represented by the context-free language
used in syntax analyzers.
The context-free grammar used in syntax analysis is therefore integrated with attributes
(semantic rules); the result is a syntax-directed translation, described by an
attribute grammar.
Ex:
newval = oldval + 12
The type of the identifier newval must match the type of the
expression (oldval + 12).
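A minimal sketch, assumed rather than taken from the notes, of how a semantic analyzer might type-check the + node of newval = oldval + 12; the node layout and type names are illustrative only.

enum type { T_INT, T_FLOAT };

struct node {
    enum type type;
    struct node *left, *right;       /* children in the syntax tree */
};

/* Type-check a '+' node: if the operand types differ, the result is
   float, conceptually wrapping the int operand in inttofloat(). */
enum type check_add(struct node *n)
{
    enum type lt = n->left->type;
    enum type rt = n->right->type;
    if (lt == rt)
        n->type = lt;                /* types match: no coercion needed */
    else
        n->type = T_FLOAT;           /* coerce the int operand to float */
    return n->type;
}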
Intermediate Code Generation
A compiler may produce an explicit intermediate code representing the source
program.
It is a kind of code that is:
easy to generate, and
easily converted to target code.
It can be three-address code, quadruples, triples, etc.
Ex: newval = oldval + 1, i.e. id1 = id2 + 1, in three-address code:
t1 = id2 + 1
id1 = t1
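For concreteness, a small hedged C sketch of the two helpers an intermediate code generator needs: a fresh-temporary generator and an emit routine (all names here are assumptions for illustration).

#include <stdio.h>

static int temp_count = 0;

/* Returns the number of a fresh temporary t1, t2, ... */
static int new_temp(void)
{
    return ++temp_count;
}

/* Emits one three-address instruction: tN = left op right. */
static void emit(int t, const char *left, char op, const char *right)
{
    printf("t%d = %s %c %s\n", t, left, op, right);
}

int main(void)
{
    int t1 = new_temp();
    emit(t1, "id2", '+', "1");       /* t1 = id2 + 1 */
    printf("id1 = t%d\n", t1);       /* id1 = t1 */
    return 0;
}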
Code Optimizer (for Intermediate Code Generator)
The code optimizer improves the code produced by the intermediate code
generator in terms of time and space.
Ex:
t1 = id2 * id3
id1 = t1 + 1
Code Generator
It produces the target program for a specific architecture.
The target program is normally a relocatable object file containing the machine
code.
If the target code is machine code, then registers (or memory locations) are selected for
each variable in the program. The intermediate instructions are then translated into
sequences of machine instructions that perform the same task.
Ex (assume an architecture in which at least one of the operands of each instruction
is a machine register):
MOVE id2, R1
MULT id3, R1
ADD  #1, R1
MOVE R1, id1
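As a hedged sketch of this last step (the MOVE/MULT/ADD mnemonics follow the example above and are not a real instruction set), a simple code generator for one three-address statement result = left op right might look like this:

#include <stdio.h>

/* Emits target code for: result = left op right.
   Register R1 is fixed here; a real generator would perform
   register allocation. */
void gen_binop(const char *result, const char *left,
               char op, const char *right)
{
    printf("MOVE %s, R1\n", left);         /* load the left operand */
    if (op == '*')
        printf("MULT %s, R1\n", right);    /* R1 = R1 * right */
    else
        printf("ADD  %s, R1\n", right);    /* R1 = R1 + right */
    printf("MOVE R1, %s\n", result);       /* store the result */
}

Calling gen_binop("id1", "id2", '+', "#1") emits the same sequence as the example above.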
Symbol Table Management
The symbol table records the variable names used in the source program
and stores the attributes of each name:
o e.g., the name, its type, its scope
o the method of passing each argument (by value or by reference)
o the return type
The symbol table can be implemented as either a linear list or a hash table. There will
be n entries and e enquiries to fetch information from this table.
When n and e are large, the linear-list method performs poorly; a hash table performs
much better.
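A compact C sketch, under the assumption of chained hashing, of the insert and lookup operations such a symbol table needs; the table size and hash function are illustrative choices, and strdup is POSIX.

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211               /* a prime spreads the hash values well */

struct symbol {
    char *name;
    char *type;                      /* one attribute: the type of the name */
    struct symbol *next;             /* chain for colliding entries */
};

static struct symbol *table[TABLE_SIZE];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Returns the entry for name, or NULL if it is not in the table. */
struct symbol *lookup(const char *name)
{
    struct symbol *p;
    for (p = table[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Inserts a new entry at the head of its bucket and returns it. */
struct symbol *insert(const char *name, const char *type)
{
    unsigned h = hash(name);
    struct symbol *p = malloc(sizeof *p);
    p->name = strdup(name);
    p->type = strdup(type);
    p->next = table[h];
    table[h] = p;
    return p;
}

Lookup time stays near constant as long as the chains stay short, which is why the hash table beats the linear list when n and e grow.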
Grouping of phases into passes
In an implementation, activities from several phases may be grouped together
into a pass that reads an input file and writes an output file.
For example,
o the front-end phases of lexical analysis, syntax analysis, semantic analysis, and
intermediate code generation might be grouped together into one pass;
o code optimization might be an optional pass;
o then there could be a back-end pass consisting of code generation for a
particular target machine.
CLASSIFICATION OF LANGUAGES
1. Based on generation:
a) First-generation languages: machine languages.
b) Second-generation languages: assembly languages.
c) Third-generation languages: higher-level languages.
d) Fourth-generation languages: designed for specific applications, like SQL for database
applications.
e) Fifth-generation languages: logic- and constraint-based languages like Prolog
and OPS5.
Imperative languages:
languages in which a program specifies how a computation is to be done.
E.g.: C, C++.
Declarative languages:
languages in which a program specifies what computation is to be done.
E.g.: Prolog.
The science of building a compiler
Compiler design deals with complicated real-world problems:
First the problem is taken.
A mathematical abstraction of the problem is formulated.
It is solved using mathematical techniques.
This requires the right mathematical model and the right algorithm.
Fundamental models: finite-state machines, regular expressions, and context-free
grammars.
The science of code optimization
Optimization: an attempt made by the compiler to produce code that is more efficient
than the obvious code.
Compiler optimizations-Design objectives
Must improve performance of many programs.
Optimization must be correct.
Compilation time must be kept reasonable.
Engineering effort required must be manageable.
APPLICATIONS OF COMPILER TECHNOLOGY
IMPLEMENTATION OF HIGH-LEVEL PROGRAMMING LANGUAGES
The programmer expresses an algorithm using the language, and the compiler must translate
that program to the target language.
Generally, high-level languages are easier to program in, but are less efficient, i.e., the target
programs run more slowly.
Programmers using low-level languages have more control over a computation and can produce
more efficient code.
Unfortunately, low-level programs are harder to write and, still worse, less portable, more prone
to errors, and harder to maintain.
Optimizing compilers include techniques to improve the performance of generated
code, thus offsetting the inefficiency introduced by high-level abstractions.
OPTIMIZATIONS FOR COMPUTER ARCHITECTURES
The rapid evolution of computer architecture has also led to an insatiable demand for
new compiler technology.
Almost all high-performance systems take advantage of the same two basic
techniques: parallelism and memory hierarchies.
Parallelism can be found at several levels: at the instruction level, where multiple
operations are executed simultaneously, and at the processor level, where different
threads of the same application are run on different processors.
Memory hierarchies are a response to the basic limitation that we can build very
fast storage or very large storage, but not storage that is both fast and large.
PARALLELISM
All modern microprocessors exploit instruction-level parallelism; this can be
hidden from the programmer.
The hardware scheduler dynamically checks for dependencies in the sequential
instruction stream and issues instructions in parallel when possible.
Whether or not the hardware reorders the instructions, compilers can rearrange the
instructions to make instruction-level parallelism more effective.
MEMORY HIERARCHIES.
A memory hierarchy consists of several levels of storage with different speeds and
sizes.
A processor usually has a small number of registers totalling hundreds of bytes,
several levels of caches containing kilobytes to megabytes, and finally secondary
storage that contains gigabytes and beyond.
Correspondingly, the speed of access between adjacent levels of the hierarchy
can differ by two or three orders of magnitude.
The performance of a system is often limited not by the speed of the processor but
by the performance of the memory subsystem.
While compilers traditionally focus on optimizing processor execution, more
emphasis is now placed on making the memory hierarchy more effective.
DESIGN OF NEW COMPUTER ARCHITECTURES.
In modern computer architecture development, compilers are developed in the
processor-design stage, and compiled code, running on simulators, is used to evaluate the
proposed architectural design.
One of the best-known examples of how compilers influenced the design of computer
architecture was the invention of the RISC (reduced instruction-set computer) architecture.
Over the last three decades, many architectural concepts have been proposed. They
include data-flow machines, vector machines, VLIW (very long instruction word)
machines, and multiprocessors with shared memory and with distributed memory.
The development of each of these architectural concepts was accompanied by the
research and development of corresponding compiler technology.
Compiler technology is not only needed to support programming of these architectures,
but also to evaluate the proposed architectural designs.
PROGRAM TRANSLATIONS
Although we normally think of compiling as translation from a high-level language to
machine language, the same technology can be applied to translate between different
kinds of languages.
The following are some of the important applications of program-translation techniques:
BINARY TRANSLATION
Compiler technology can be used to translate the binary code for one machine to
that of another, allowing a machine to run programs originally compiled for
another instruction set.
This technology has been used by various computer companies to increase the
availability of software for their machines.
HARDWARE SYNTHESIS
Not only is most software written in high-level languages; even hardware designs
are mostly described in high-level hardware description languages like Verilog and
VHDL (Very High Speed Integrated Circuit Hardware Description Language).
Hardware designs are typically described at the register transfer level (RTL).
Hardware-synthesis tools translate RTL descriptions automatically into gates,
which are then mapped to transistors and eventually to a physical layout. Unlike
compilation of programming languages, this process takes long hours to optimize
the circuits.
DATABASE QUERY INTERPRETERS
Query languages like SQL are used to search databases.
These database queries consist of relational and Boolean operators.
They can be compiled into commands to search a database for records satisfying
the query.
SOFTWARE PRODUCTIVITY TOOLS
Program-analysis and program-building techniques, originally developed to optimize
code in compilers, have improved software productivity in several ways:
TYPE CHECKING
BOUNDS CHECKING
MEMORY MANAGEMENT TOOLS.
Programming Language Basics
Static/Dynamic Distinction
If a language uses a policy that allows the compiler to decide an issue, then that
language uses a static policy: the issue is decided at compile time.
If the decision can be made only during execution of a program, the policy is said to
be dynamic: the decision is made at run time.
An example is the scope of declarations.
E.g.: public static int x;
Environments and States
Changes occurring as the program runs affect the values of data elements,
e.g. x = y + 1.
The environment maps names to locations, and the state maps locations to values.
Eg:
int i;            /* global i */
...
void f(...)
{
    int i;        /* local i */
    ...
    i = 3;        /* assigns to the local i */
    ...
}
...
x = i + 1;        /* here i refers to the global i */
Static Scope and Block Structure
Most languages, including C and its family, use static scope. The scope rules for
C are based on program structure: the scope of a declaration is determined implicitly
by where the declaration appears in the program.
Languages such as Java and C++ also provide explicit control over scopes through
keywords like public, private, and protected.
Static-scope rules apply to a language with blocks, i.e., groupings of declarations and
statements; e.g., C uses braces to delimit blocks.
main()
{
    int a = 1;
    int b = 1;
    {
        int b = 2;
        {
            int a = 3;
            cout << a << b;    /* prints 3 2 */
        }
        cout << a << b;        /* prints 1 2 */
    }
    cout << a << b;            /* prints 1 1 */
}
1. Example source fragment (used in the lexeme/token table below):
int max(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}
2. Examples of tokens
Token        Informal description                      Sample lexemes
if           Characters i, f                           if
else         Characters e, l, s, e                     else
comparison   < or > or <= or >= or == or !=            <=, !=
id           Letter followed by letters and digits     pi, score, D2
number       Any numeric constant                      3.14, 0.6, 20
literal      Anything but ", surrounded by "s          "total = %d\n", "core dumped"
In FORTRAN, blanks are insignificant:
DO 5 I = 1.25    here DO5I is a lexeme (an identifier)
DO 5 I = 1, 25   a do-statement
Lexeme   Token
int      keyword
max      identifier
(        operator
a        identifier
,        operator
...      ...
Disadvantage of the Lexical Analyzer
If instead of the statement if (a == b) we mistype fi (a == b), the lexical analyzer will
not catch this mistake: fi is a valid lexeme (an identifier), so the error is only detected
by a later phase.
Advantages of a Separate Lexical Analyzer (Lexical Analysis v/s Parsing)
Simplicity of design is the most important consideration.
Compiler efficiency is improved:
- specialized techniques that serve only the lexical task can be applied;
- specialized buffering techniques for reading input characters can speed up the
compiler.
Compiler portability is enhanced:
- input-device-specific peculiarities can be restricted to the lexical analyzer.
Input Buffering
To recognize tokens, the source program must be read from the hard disk. Accessing
the hard disk for each character is time consuming, so special buffering techniques have
been developed to reduce the amount of overhead required.
- One such technique is the two-buffer scheme, in which two buffers are alternately
loaded.
- The size of each buffer is N (the size of a disk block), e.g. 4096 bytes.
One read command is used to read N characters.
If fewer than N characters remain in the input file, then a special character,
represented by eof, marks the end of the source file.
A sentinel is a special character that cannot be part of the source program; eof is used
as the sentinel.
Two pointers to the input are maintained:
Pointer lexemeBegin marks the beginning of the current lexeme, whose
extent we are attempting to determine.
Pointer forward scans ahead until a pattern match is found.
Lookahead code with sentinels:
switch (*forward++)
{
case eof:
    if (forward is at end of first buffer)
    {   reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer)
    {   reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters
}
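For concreteness, a hedged sketch of the declarations this two-buffer scheme assumes, with N the disk-block size and a sentinel slot after each half; the reload function and buffer layout are assumptions for illustration.

#include <stdio.h>

#define N 4096                        /* size of each buffer half (disk block) */

static char buf[2 * N + 2];           /* two halves, one sentinel slot each */
static char *lexemeBegin = buf;       /* start of the current lexeme */
static char *forward = buf;           /* scans ahead for a pattern match */

/* Reloads one half (buf or buf + N + 1) from the source file and
   plants the eof sentinel after the characters actually read. */
static void reload(FILE *src, char *half)
{
    size_t n = fread(half, 1, N, src);
    half[n] = (char)EOF;              /* sentinel: marks the end of the half,
                                         or the end of file if n < N */
}

The lookahead code above then advances forward through this array and, on hitting a sentinel at the boundary of a half, reloads the other half.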
Terminology of Languages
Alphabet: a finite set of symbols (e.g., the ASCII characters).