
CD notes for 6th sem (VTU)

Units and preliminary requirements


UNIT – 1 8 Hours Introduction, Lexical analysis: Language processors; The structure of a Compiler; The evolution of programming languages; The science of building a Compiler; Applications of compiler technology; Programming language basics. Lexical analysis: The Role of Lexical Analyzer; Input Buffering; Specifications of Tokens; Recognition of Tokens.

UNIT – 2 6 Hours Syntax Analysis – 1: Introduction; Context-free Grammars; Writing a Grammar; Top-down Parsing; Bottom-up Parsing.

UNIT – 3 6 Hours Syntax Analysis – 2: Top-down Parsing; Bottom-up Parsing.

UNIT – 4 6 Hours Syntax Analysis – 3: Introduction to LR Parsing: Simple LR; More powerful LR parsers (excluding Efficient construction and compaction of parsing tables) ; Using ambiguous grammars; Parser Generators.


UNIT – 5 7 Hours Syntax-Directed Translation: Syntax-directed definitions; Evaluation orders for SDDs; Applications of syntax-directed translation; Syntax-directed translation schemes.

UNIT – 6 6 Hours Intermediate Code Generation: Variants of syntax trees; Three-address code; Translation of expressions; Control flow; Back patching; Switch statements; Procedure calls.

UNIT – 7 6 Hours Run-Time Environments: Storage Organization; Stack allocation of space; Access to non-local data on the stack; Heap management; Introduction to garbage collection.

UNIT – 8 7 Hours Code Generation: Issues in the design of a Code Generator; The Target Language; Addresses in the target code; Basic blocks and Flow graphs; Optimization of basic blocks; A Simple Code Generator. (DEPT. OF CSE, SJBIT)

Text Books: 1. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman: Compilers: Principles, Techniques and Tools, 2nd Edition, Pearson Education, 2007. (Chapters 1, 3.1 to 3.4, 4 excluding 4.7.5 and 4.7.6, 5.1 to 5.4, 6.1, 6.2, 6.4, 6.6, 6.7 to 6.9, 7.1 to 7.5, 8.1 to 8.6.)

Reference Books: 1. Charles N. Fischer, Richard J. LeBlanc, Jr.: Crafting a Compiler with C, Pearson Education, 1991. 2. Andrew W. Appel: Modern Compiler Implementation in C, Cambridge University Press, 1997. 3. Kenneth C. Louden: Compiler Construction: Principles & Practice, Cengage Learning, 1997.

Preliminaries Required

• Basic knowledge of programming languages.

• Basic knowledge of DFA and NFA (FAFL concepts).

• Knowledge of a high-level programming language for the programming assignments. Textbook: Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, “Compilers: Principles, Techniques, and Tools”, Addison-Wesley, 2nd Edition.

Course Outline

• Introduction to Compiling

• Lexical Analysis

• Syntax Analysis – Context Free Grammars – Top-Down Parsing, LL Parsing – Bottom-Up Parsing, LR Parsing

• Syntax-Directed Translation – Attribute Definitions – Evaluation of Attribute Definitions

• Semantic Analysis, Type Checking

• Run-Time Organization

• Intermediate Code Generation


• Lexical analysis: Language processors;

• The structure of a Compiler;

• The evolution of programming languages;

• The science of building a Compiler;

• Applications of compiler technology;

• Programming language basics.

• Lexical analysis: The Role of Lexical Analyzer;

• Input Buffering;

• Specifications of Tokens;

• Recognition of Tokens.


Language Processors

• A language processor is software that processes a program given in a certain source language.

• Types: compilers, interpreters, preprocessors, assemblers, etc.

COMPILER – A compiler transforms a source program into a target program. If the target program is in executable machine language, it can then be run on an input to produce output.

INTERPRETER – An interpreter takes the source program and its input together and directly produces the output.

HYBRID COMPILER – A translator first converts the source program into an intermediate program; a virtual machine then runs the intermediate program on the input to produce the output.

COMPILERS

• A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language, reporting error messages to the user along the way.

Other Applications

• In addition to the development of a compiler, the techniques used in compiler design can be applied to many other problems in computer science. – Techniques used in a lexical analyzer can be used in text editors, information retrieval systems, and pattern recognition programs. – Techniques used in a parser can be used in a query processing system such as SQL. – Much software with a complex front end may need techniques used in compiler design.

• A symbolic equation solver, which takes an equation as input, must parse the given input equation. – Most of the techniques used in compiler design can be used in Natural Language Processing (NLP) systems.

Major Parts of Compilers

• There are two major parts of a compiler: Analysis and Synthesis

• In the analysis phase, an intermediate representation is created from the given source program. – The Lexical Analyzer, Syntax Analyzer and Semantic Analyzer are the parts of this phase.

• In the synthesis phase, the equivalent target program is created from this intermediate representation. – The Intermediate Code Generator, Code Optimizer and Code Generator are the parts of this phase.

Phases of a Compiler

Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program

• Each phase transforms the source program from one representation into another representation.

• They communicate with error handlers.

• They communicate with the symbol table.

Compiler v/s Interpreter

• A compiler translates the source program into a target program; the target program is then run on its input to produce output.

• An interpreter takes the source program and its input together and produces the output directly.

• Assignments

• 1. What is the difference between a compiler and an interpreter?

• 2. What are the advantages of (i) a compiler over an interpreter and (ii) an interpreter over a compiler?

Translation of an assignment

• Position = initial + rate * 60

• After lexical analysis: <id,1> <=> <id,2> <+> <id,3> <*> <60>

• Intermediate (three-address) code:

t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

• After code optimization:

t1 = id3 * 60.0
id1 = id2 + t1

• Target code:

LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1

Lexical Analyzer

• The lexical analyzer reads the source program character by character and returns the tokens of the source program.

• The lexical analyzer is also called a scanner.

• It reads the stream of characters making up the source program and groups the characters into meaningful sequences called Lexemes.

• For each lexeme, it produces as output a token of the form <token_name, attribute_value>.

• token_name is an abstract symbol that is used during syntax analysis, and the second component attribute_value points to an entry in the symbol table for this token.

• A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters and so on). Ex: newval := oldval + 12 gives the tokens:

newval   identifier
:=       assignment operator
oldval   identifier
+        add operator
12       number

• It puts information about identifiers into the symbol table (not all of their attributes).

• Regular expressions are used to describe tokens (lexical constructs).

• A (Deterministic) Finite State Automaton (DFA) can be used in the implementation of a lexical analyzer.

• Blank spaces are removed.

Syntax Analyzer

• A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program.

• A syntax analyzer is also called a parser.

• A parse tree describes a syntactic structure.

• In a parse tree, all terminals are at leaves.

• All inner nodes are non-terminals of a context-free grammar.

• The syntax of a language is specified by a context-free grammar (CFG).

• The rules in a CFG are mostly recursive.

• A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not. If it does, the syntax analyzer creates a parse tree for the given program.

• Ex: We use BNF (Backus–Naur Form) to specify a CFG:

1. EIdentifier

2. ENumber

3. EE+E

4. EE*E 5. E(E)

• where E is an expression. The syntax analyzer concludes: by rule 1, newval is an expression; by rule 1, oldval is an expression; by rule 2, 12 is an expression; by rule 3, oldval + 12 is an expression.

Syntax Analyzer versus Lexical Analyzer

• Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer? – Both of them do similar things, but the lexical analyzer deals with simple, non-recursive constructs of the language, while the syntax analyzer deals with recursive constructs. – The lexical analyzer simplifies the job of the syntax analyzer. – The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program. – The syntax analyzer works on those tokens to recognize meaningful structures in the programming language.

Parsing Techniques

• Depending on how the parse tree is created, there are different parsing techniques.

• These parsing techniques are categorized into two groups: – Top-Down Parsing, Bottom-Up Parsing

• Top-Down Parsing: – Construction of the parse tree starts at the root and proceeds towards the leaves. – Efficient top-down parsers can be easily constructed by hand. – Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).

• Bottom-Up Parsing: – Construction of the parse tree starts at the leaves and proceeds towards the root. – Normally, efficient bottom-up parsers are created with the help of software tools. – Bottom-up parsing is also known as shift-reduce parsing. – Operator-Precedence Parsing: simple, restrictive, easy to implement. – LR Parsing: a much more general form of shift-reduce parsing (LR, SLR, LALR).

Semantic Analyzer

• A semantic analyzer checks the source program for semantic errors and collects type information for code generation.

• It determines the meaning of the source string: matching of parentheses, matching if-else statements, checking the scope of operations.

• Type-checking is an important part of the semantic analyzer.

• Normally, semantic information cannot be represented by the context-free languages used in syntax analyzers.

• Context-free grammars used in syntax analysis are integrated with attributes (semantic rules); the result is a syntax-directed translation (attribute grammars).

• Ex: newval = oldval + 12

• The type of the identifier newval must match the type of the expression (oldval + 12).

Intermediate Code Generation

• A compiler may produce an explicit intermediate code representing the source program.

• It is a machine-independent form of code, between source and target code.

• Easy to generate

• Easily converted to target code

• It can be in three-address code, quadruples, triples, etc.

• Ex: newval = oldval + 1 (after lexical analysis: id1 = id2 + 1) as three-address code:

t1 = id2 + 1
id1 = t1

Code Optimizer (for Intermediate Code Generator)

• The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.

• Ex:

t1 = id2 * id3
id1 = t1 + 1

Code Generator

• The code generator produces the target program for a specific architecture.

• The target program is normally a relocatable object file containing the machine code.

• If the target code is machine code, then registers (or memory locations) are selected for each variable in the program, and the intermediate instructions are translated into sequences of machine instructions that perform the same task.

• Ex (assume an architecture in which at least one operand of each instruction is a machine register):

MOVE id2, R1
MULT id3, R1
ADD #1, R1
MOVE R1, id1

Symbol Table Management

• The symbol table records the variable names used in the source program.

• It stores the attributes of each name:

o E.g., the name, its type, its scope.

o The method of passing each argument (by value or by reference).

o The return type.

The symbol table can be implemented as either a linear list or a hash table. There will be n entries and e inquiries to fetch information from the table. If n and e are large, the linear-list method performs poorly; a hash table performs much better.

Grouping of phases into passes

• In an implementation, activities from several phases may be grouped together into a pass that reads an input file and writes an output file.

• For example,

o The front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped together into one pass.

o Code optimization might be an optional pass.

o Then there could be a back-end pass consisting of code generation for a particular target machine.


The evolution of programming languages

1. Based on generation:

a) First generation languages – machine languages.

b) Second generation languages – assembly languages.

c) Third generation languages – higher-level languages.

d) Fourth generation languages – designed for specific applications, like SQL for database applications.

e) Fifth generation languages – logic- and constraint-based languages like Prolog and OPS5.

Imperative languages: languages in which a program specifies how a computation is to be done, e.g., C, C++.

Declarative languages: languages in which a program specifies what computation is to be done, e.g., Prolog.

The science of building a compiler

• Compiler design deals with complicated real world problems.

• First, the problem is taken up and a mathematical abstraction of it is formulated.

• The abstraction is then solved using mathematical techniques.

• This requires the right mathematical model and the right algorithm.

• Fundamental models: finite state machines, regular expressions, context-free grammars.

The science of code optimization

Optimization is an attempt made by a compiler to produce code that is more efficient than the obvious one.

Compiler optimizations – design objectives

• It must improve the performance of many programs.

• The optimization must be correct.

• Compilation time must be kept reasonable.

• The engineering effort required must be manageable.


• The programmer expresses an algorithm using the language, and the compiler must translate that program to the target language.

• Generally, high-level programming languages are easier to program in, but are less efficient; that is, the target programs run more slowly.

• Programmers using a low-level programming language have more control over a computation and can produce more efficient code.

• Unfortunately, low-level programs are harder to write and, still worse, less portable, more prone to errors, and harder to maintain.

• Optimizing compilers include techniques to improve the performance of generated code, thus offsetting the inefficiency introduced by high-level abstractions.


• The rapid evolution of computer architectures has also led to an insatiable demand for new compiler technology.

• Almost all high-performance systems take advantage of the same two basic techniques: parallelism and memory hierarchies.

• Parallelism can be found at several levels: at the instruction level, where multiple operations are executed simultaneously, and at the processor level, where different threads of the same application are run on different processors.

• Memory hierarchies are a response to the basic limitation that we can build very fast storage or very large storage, but not storage that is both fast and large.


• All modern microprocessors exploit instruction-level parallelism. This can be hidden from the programmer.

• The hardware scheduler dynamically checks for dependencies in the sequential instruction stream and issues them in parallel when possible.

• Whether the hardware reorders the instructions or not, compilers can rearrange the instructions to make instruction-level parallelism more effective.


• A memory hierarchy consists of several levels of storage with different speeds and sizes.

• A processor usually has a small number of registers totaling a few hundred bytes, several levels of caches containing kilobytes to megabytes, and finally secondary storage that contains gigabytes and beyond.

• Correspondingly, the speed of accesses between adjacent levels of the hierarchy can differ by two or three orders of magnitude.

• The performance of a system is often limited not by the speed of the processor but by the performance of the memory subsystem.

• While compilers traditionally focus on optimizing the processor execution, more emphasis is now placed on making the memory hierarchy more effective.


• In modern computer architecture development, compilers are developed in the processor-design stage, and compiled code, running on simulators, is used to evaluate the proposed architectural design.

• One of the best-known examples of how compilers influenced the design of computer architecture was the invention of the RISC (reduced instruction-set computer) architecture.

• Over the last three decades, many architectural concepts have been proposed. They include data-flow machines, vector machines, VLIW (very long instruction word) machines, and multiprocessors with shared memory and with distributed memory.

• The development of each of these architectural concepts was accompanied by the research and development of corresponding compiler technology.

• Compiler technology is not only needed to support programming of these architectures, but also to evaluate the proposed architectural designs.

PROGRAM TRANSLATIONS

Normally we think of compiling as the translation of a high-level language to machine-level language, but the same technology can be applied to translate between different kinds of languages. The following are some of the important applications of program translation techniques:


• Compiler technology can be used to translate the binary code for one machine to that of another, allowing a machine to run programs originally compiled for another instruction set.

• This technique has been used by various computer companies to increase the availability of software for their machines.


• Not only is most software written in high-level languages; even hardware designs are mostly described in high-level hardware description languages like Verilog and VHDL (very high speed integrated circuit hardware description language).

• Hardware designs are typically described at the register transfer level (RTL).

• Hardware synthesis tools translate RTL descriptions automatically into gates, which are then mapped to transistors and eventually to a physical layout. Unlike compilers for programming languages, this process takes long hours to optimize the circuits.


• Query languages like SQL are used to search databases.

• These database queries consist of relational and Boolean operators.

• They can be compiled into commands to search the database for records satisfying the query.

SOFTWARE PRODUCTIVITY TOOLS

Program analysis techniques, originally developed to optimize code in compilers, have improved software productivity in several ways.


Programming Language Basics

Static/Dynamic Distinction

• If a language uses a policy that allows the compiler to decide an issue, then the language uses a static policy: the issue can be decided at compile time.

• If the decision can be made only during the execution of the program, the policy is said to be dynamic: the decision is made at run time.

• Example: the scope of declarations.

• E.g.: public static int x;

Environments and States

Changes occurring as the program runs affect the values of data elements; e.g., the assignment x = y + 1 changes the value denoted by x.

The environment maps names to storage locations; the state maps locations to their values. E.g.:

int i;            /* global i */
...
void f(...) {
    int i;        /* local i */
    ...
    i = 3;        /* assigns to the local i */
    ...
}
...
x = i + 1;        /* uses the global i */

Static Scope and Block Structure

Most languages, including C and its family, use static scope. The scope rules for C are based on program structure: the scope of a declaration is determined implicitly by where the declaration appears in the program. Languages such as C++ and Java also provide explicit control over scopes through keywords like public, private, and protected.

Static-scope rules for a language with blocks – groupings of declarations and statements – e.g., C uses braces:

main() {
    int a = 1;
    int b = 1;
    {
        int b = 2;
        {
            int a = 3;
            cout << a << b;    /* the innermost a and b are in scope: prints 3 2 */
        }
    }
}

1. Example of lexemes – consider the C function:

int max(int a, int b) { if (a > b) return a; else return b; }

2. Examples of tokens:

Token — Informal description — Sample lexemes
if — the characters i, f — if
else — the characters e, l, s, e — else
comparison — < or > or <= or >= or == or != — <=, !=
id — a letter followed by letters and digits — pi, score, D2
number — any numeric constant — 3.14, 0.6, 20
literal — anything surrounded by " " — "total= %d\n", "core dumped"

In FORTRAN, blanks are insignificant: in DO 5 I = 1.25, DO5I is a single lexeme (an assignment), while DO 5 I = 1, 25 is a do-statement.

Lexemes and tokens of int max(int a, int b)...:

Lexeme — Token
int — keyword
max — identifier
( — operator
a — identifier
, — operator
. . . — . . .

Disadvantage of a Lexical Analyzer: if instead of the statement if(a==b) we mistype fi(a==b), the lexical analyzer will not catch the mistake, since fi is a valid identifier.

Advantages of a Lexical Analyzer – Lexical analysis v/s Parsing

• Simplicity of design is the most important consideration.

• Compiler efficiency is improved: specialized techniques that serve only the lexical task can be applied, and specialized buffering techniques for reading input characters can speed up the compiler.

• Compiler portability is enhanced: input-device-specific peculiarities can be restricted to the lexical analyzer.


Input Buffering

To recognize tokens, the source program must be read from the hard disk, and accessing the disk for each character is time-consuming, so special buffering techniques have been developed to reduce the amount of overhead required.

- One such technique is the two-buffer scheme, in which the two buffers are alternately reloaded.
- The size of each buffer is N, the size of a disk block, e.g., 4096 bytes.
- One read command is used to read N characters.
- If fewer than N characters remain in the input file, a special character, represented by eof, marks the end of the source file.
- A sentinel is a special character that cannot be part of the source program; eof is used as the sentinel.

• Two pointers into the input are maintained: – lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine. – forward scans ahead until a pattern match is found.

Lookahead code with sentinels:

switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
/* cases for the other characters */
}

Terminology of Languages

• Alphabet: a finite set of symbols (e.g., the ASCII characters)

