Lexical Analysis in Compiler
Posted by Unknown
One of the most important parts of a compiler is LEXICAL ANALYSIS. In the following posts we are going to discuss this phase. It is the first phase in the working of a compiler.
What is a lexical analyser?
Prerequisites of a lexical analyser
Tasks of a lexical analyser
Example showing how a lexical analyser identifies tokens
Limitations of a lexical analyser
· The lexical analyser takes a source program as input and produces a stream of tokens as output.
· Its task is to separate the continuous string of characters into groups that make sense. Each such group is classified as a TOKEN.
· A token may correspond to a single character or to a sequence of characters. The character sequence that was actually matched is known as the LEXEME.
· In order to perform this task, the lexical analyser must know the KEYWORDS, IDENTIFIERS, OPERATORS, DELIMITERS and PUNCTUATION symbols of the language to be implemented.
· While grouping characters into symbols, the lexical analyser should also deal with spaces and remove comments and other characters not relevant to later stages of analysis.
· Specify the tokens of the language: using REGULAR EXPRESSIONS
· Recognize tokens efficiently: using a DETERMINISTIC FINITE AUTOMATON (DFA)
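To make the last two points concrete, here is a minimal sketch of a table-driven DFA in Python. It assumes identifiers follow the regular expression [A-Za-z_][A-Za-z0-9_]* and integer constants follow [0-9]+. The state names, token names and the helper functions char_class and classify are made up for illustration and do not come from any particular compiler.

```python
# Hypothetical DFA with three states; the state and token names are
# illustrative assumptions, not taken from any particular compiler.
START, IN_IDENT, IN_INT = "start", "in_ident", "in_int"

# Accepting states mapped to the token they recognise.
ACCEPTING = {IN_IDENT: "IDENTIFIER", IN_INT: "INTEGER_CONSTANT"}


def char_class(ch):
    """Collapse a character into the input class used by the transition table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# Transition table: (state, input class) -> next state.
# A missing entry means the DFA rejects the input.
TRANSITIONS = {
    (START, "letter"): IN_IDENT,
    (START, "digit"): IN_INT,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
    (IN_INT, "digit"): IN_INT,
}


def classify(lexeme):
    """Run the DFA over the whole lexeme; return its token name or None."""
    state = START
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return ACCEPTING.get(state)
```

With this table, classify("oldsum") gives "IDENTIFIER" and classify("10") gives "INTEGER_CONSTANT", while classify("2x") gives None because no transition leaves the integer state on a letter. Each character costs one table lookup, which is why a DFA is an efficient way to recognise tokens.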
Example:
sum=oldsum-value/10
Lexeme | Token
sum | Identifier
= | Assignment operator
oldsum | Identifier
- | Subtraction operator
value | Identifier
/ | Division operator
10 | Integer constant
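The same result can be reproduced with a few lines of Python. This is only a sketch: the patterns in TOKEN_SPEC and the function name tokenize are assumptions chosen to cover this one statement, not a complete language definition.

```python
import re

# Token names follow the example table; the patterns are assumptions that
# cover only this one statement.
TOKEN_SPEC = [
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("Integer constant", re.compile(r"[0-9]+")),
    ("Assignment operator", re.compile(r"=")),
    ("Subtraction operator", re.compile(r"-")),
    ("Division operator", re.compile(r"/")),
    ("Whitespace", re.compile(r"\s+")),
]


def tokenize(source):
    """Return a list of (lexeme, token) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                if name != "Whitespace":
                    tokens.append((match.group(), name))
                pos = match.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens


for lexeme, token in tokenize("sum=oldsum-value/10"):
    print(f"{lexeme} | {token}")
```

Running it prints the seven lexeme and token pairs in source order, from sum | Identifier to 10 | Integer constant. A character that matches no pattern, such as @, raises a ValueError, which is about the only kind of error a lexical analyser can report by itself.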
· Although it identifies tokens, it does not know which token should come where.
· The lexical analyser has no context to work with. After processing one symbol it has no knowledge of the symbols that preceded it or that will follow it.
· It does not look for garbled sequences, tokens out of place, undeclared identifiers, misspelled keywords or mismatched types.
Example: int a double } switch b[2] =;
The scanner has no idea how tokens are grouped. In the above sequence, it returns b, [, 2, and ] as four separate tokens, having no idea that they collectively form an array access.
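A small Python sketch shows this limitation in practice. The keyword set, the LEXEME pattern and the function name scan are assumptions made just for this example. The scanner accepts the garbled statement without any complaint.

```python
import re

# Assumed keyword subset, just enough for the example statement.
KEYWORDS = {"int", "double", "switch"}

# One alternative per lexeme shape: word, integer, or single symbol.
LEXEME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[^\sA-Za-z0-9_]")


def scan(source):
    """Classify each lexeme on its own; no check of order or grouping."""
    tokens = []
    for lexeme in LEXEME.findall(source):
        if lexeme in KEYWORDS:
            kind = "keyword"
        elif lexeme[0].isdigit():
            kind = "integer constant"
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            kind = "identifier"
        else:
            kind = "symbol"
        tokens.append((lexeme, kind))
    return tokens


for lexeme, kind in scan("int a double } switch b[2] =;"):
    print(f"{lexeme} | {kind}")
```

The call returns eleven tokens and raises no error, with b, [, 2 and ] reported one by one. A misspelled keyword such as swich would simply come back as an identifier. Spotting these problems is left to the later phases of the compiler.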




