A compiler translates a source program written in a high-level
programming language into an equivalent target program in machine
language. Because compilation is a complex process, it is divided into
several phases. A phase is a logically cohesive operation
that takes input in one representation and produces output in
another representation. The structure of a compiler comprises various
phases as shown in Figure
Lexical analysis phase: Lexical analysis (also known as scanning)
is the first phase of a compiler. The lexical analyzer, or scanner, reads the
source program as a character stream and groups logically
related characters into units known as lexemes. For each lexeme, the
lexical analyzer generates a token. The resulting stream of tokens is
the output of the lexical analysis phase and serves as the input to
the syntax analysis phase. Tokens can be of different types, such as keywords, identifiers, constants, punctuation symbols, and operator symbols. The general form of a token is:
<token_name, value>
where token_name is the name or symbol that
is used during the syntax analysis phase and value is the location of
that token in the symbol table.
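The grouping of characters into lexemes and the emission of (token_name, value) pairs can be sketched in Python as follows. This is only an illustrative sketch: the token classes, the keyword set, and the function name tokenize are assumptions made for the example, and the choice to carry a constant's lexeme directly as its value (instead of a symbol table location) is a simplification.

```python
import re

# Hypothetical token classes; a real language defines its own set.
KEYWORDS = {"if", "else", "while", "int", "float"}
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[+\-*/=]"),
    ("punct", r"[();]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for the given source string."""
    symbol_table = []  # an identifier's index in this list is its token value
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace separates lexemes but produces no token
        if kind == "id" and lexeme in KEYWORDS:
            tokens.append((lexeme, None))  # a keyword is its own token name
        elif kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme)))
        elif kind == "number":
            tokens.append(("number", lexeme))  # assumption: keep the lexeme as the value
        else:
            tokens.append((lexeme, None))  # operator or punctuation symbol
    return tokens, symbol_table


tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table)
```

For this input, the three identifiers receive the symbol table locations 0, 1, and 2, while the operator tokens carry no value.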
Syntax analysis phase: The syntax analysis phase is also known as parsing. It can be further divided into two parts, namely, syntax analysis and semantic analysis.
• Syntax analysis: The parser uses the token_name component of each token in the token stream to build a tree-like structure known as a syntax tree or parse tree. The parse tree illustrates the grammatical structure of the token stream.
• Semantic analysis: The semantic analyzer
uses the parse tree and the symbol table to check that the source
program is semantically consistent with the language definition. The main
function of semantic analysis is type checking, in which the semantic
analyzer checks whether each operator has operands of matching types.
The semantic analyzer gathers type information and saves it either in
the symbol table or in the parse tree.
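The two parts described above can be sketched together in Python. The grammar (expressions over + and * with parentheses), the tuple-based tree layout, and the names parse and check_types are all assumptions chosen for illustration; the tree built here is a simplified syntax tree, and the dictionary symbol_types stands in for the type information held in the symbol table.

```python
def parse(tokens):
    """Parse  expr -> term ('+' term)* ;  term -> factor ('*' factor)* ."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in ("+", "*", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return ("leaf", advance())

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def check_types(node, symbol_types):
    """Return the type of a subtree; raise TypeError on mismatched operands."""
    if node[0] == "leaf":
        name = node[1]
        if name not in symbol_types:
            raise NameError(f"undeclared identifier {name!r}")
        return symbol_types[name]
    op, left, right = node
    left_type = check_types(left, symbol_types)
    right_type = check_types(right, symbol_types)
    if left_type != right_type:
        raise TypeError(f"operands of {op!r} have types {left_type} and {right_type}")
    return left_type


tree = parse(["a", "+", "b", "*", "c"])
print(tree)
print(check_types(tree, {"a": "int", "b": "int", "c": "int"}))
```

Because * binds tighter than + in this grammar, the multiplication appears as a subtree of the addition. A real semantic analyzer would usually also handle implicit conversions, which this sketch rejects as mismatches.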
Intermediate code generation phase:
In the intermediate code generation phase, the parse tree representation of
the source code is converted into a low-level, machine-like
intermediate representation. The intermediate code should be easy to
generate and easy to translate into machine language. There are several
forms for representing the intermediate code. Three-address code is a
widely used form for representing intermediate code. An example of
three-address code is given below.
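As a hedged illustration (not the author's original example), the following Python sketch walks a small expression tree and emits three-address instructions, each with at most one operator and three addresses. The tree layout (nested tuples with plain strings as leaves), the temporary names t1, t2, and the function name to_three_address are assumptions made for this sketch.

```python
def to_three_address(tree):
    """Return (code, result_address) for an expression tree.

    A tree is either a string (a variable name) or a tuple (op, left, right).
    Each emitted instruction is a tuple (dest, arg1, op, arg2).
    """
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if isinstance(node, str):
            return node  # a variable is already an address
        op, left, right = node
        left_addr = walk(left)
        right_addr = walk(right)
        temp = new_temp()  # holds the result of this sub-expression
        code.append((temp, left_addr, op, right_addr))
        return temp

    result = walk(tree)
    return code, result


# a + b * c
tree = ("+", "a", ("*", "b", "c"))
code, result = to_three_address(tree)
for dest, arg1, op, arg2 in code:
    print(f"{dest} = {arg1} {op} {arg2}")
# prints:
# t1 = b * c
# t2 = a + t1
```

Each compiler-generated temporary holds the value of one sub-expression, which is what makes this form easy to translate into machine instructions later.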
Code optimization phase:
The code optimization phase, which is optional, optimizes the
intermediate code. Optimization means making the
code shorter and less complex, so that it executes faster and takes
less space. The output of the code optimization phase is also
intermediate code, which performs the same task as the input code but
requires less time and space.
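One simple optimization, constant folding, is sketched below in Python to show how the output stays in the same intermediate form while becoming shorter. Constant folding is chosen here only as an example of an optimization; the instruction layout (dest, arg1, op, arg2) and the function name fold_constants are assumptions. The sketch also assumes that folded destinations are compiler temporaries that are not needed outside the instruction list.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Evaluate at compile time the instructions whose operands are constants.

    Assumption: a folded destination is a temporary used only by later
    instructions in the same list, so its defining instruction can be dropped.
    """
    known = {}  # temporaries whose value is known at compile time
    optimized = []
    for dest, arg1, op, arg2 in code:
        # Substitute values that were computed by earlier folded instructions.
        arg1 = known.get(arg1, arg1)
        arg2 = known.get(arg2, arg2)
        if isinstance(arg1, int) and isinstance(arg2, int) and op in OPS:
            known[dest] = OPS[op](arg1, arg2)
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            optimized.append((dest, arg1, op, arg2))
    return optimized


code = [
    ("t1", 60, "*", 60),
    ("t2", "rate", "*", "t1"),
    ("t3", "initial", "+", "t2"),
]
for dest, arg1, op, arg2 in fold_constants(code):
    print(f"{dest} = {arg1} {op} {arg2}")
# prints:
# t2 = rate * 3600
# t3 = initial + t2
```

The optimized list computes the same value in t3 with one instruction fewer.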
Code generation phase:
The code generation phase translates the intermediate representation
of the source program into the target language program. If the target
program is in machine language, the code generator produces the target
code by assigning registers or memory locations to store the variables
defined in the program and to hold intermediate computation results.
The machine code produced by the code generation phase can be executed
directly on the machine.
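The translation step can be sketched in Python with a deliberately naive strategy: every variable and temporary is assigned a memory location named after itself, and two fixed registers hold the operands of each instruction. The mnemonics LOAD, STORE, ADD, SUB, and MUL and the registers R1 and R2 belong to a hypothetical target machine, not to any real instruction set, and the function name generate is likewise an assumption.

```python
# Hypothetical instruction names for an imaginary target machine.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(code):
    """Translate (dest, arg1, op, arg2) tuples into pseudo-assembly lines."""
    lines = []
    for dest, arg1, op, arg2 in code:
        lines.append(f"LOAD R1, {arg1}")   # first operand into register R1
        lines.append(f"LOAD R2, {arg2}")   # second operand into register R2
        lines.append(f"{MNEMONICS[op]} R1, R1, R2")
        lines.append(f"STORE {dest}, R1")  # result back to dest's memory location
    return lines


code = [("t1", "b", "*", "c"), ("t2", "a", "+", "t1")]
for line in generate(code):
    print(line)
```

A practical code generator would keep frequently used values in registers instead of storing and reloading them, which requires tracking which values are still needed.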
Symbol table management: A symbol table
is a data structure used by the compiler to record
information about source program constructs, such as variable names,
together with their attributes (for example, the type and scope of a
variable and the storage space it occupies). A
symbol table should be designed so that the compiler can
locate the record for each name quickly and can
retrieve data from the records rapidly.
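A minimal sketch of such a table in Python uses one dictionary per scope, so that locating a record takes a small number of hash lookups. The class name SymbolTable, its method names, and the recorded attributes (type, size, depth) are assumptions chosen for illustration.

```python
class SymbolTable:
    """A stack of dictionaries, one per scope; the innermost scope is last."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_, size):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = {"type": type_, "size": size, "depth": len(self.scopes) - 1}

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("x", "int", 4)
table.enter_scope()
table.declare("x", "float", 8)       # shadows the outer x
inner = table.lookup("x")
table.exit_scope()
outer = table.lookup("x")
print(inner["type"], outer["type"])  # prints: float int
```

Searching from the innermost scope outwards means that an inner declaration hides an outer one with the same name until its scope is exited.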
Error handler: The error handler is invoked whenever a fault occurs during the compilation of the source program.
Both symbol table management and error handling are associated with all phases of the compiler.