Shared Flashcard Set


Comp2010 - Flashcard Set 1 - Lexical Analysis
Comp2010 - Flashcard Set 1 - Lexical Analysis
Computer Science
Undergraduate 2

Additional Computer Science Flashcards




What does lexical analysis consist of?
  • Splits the source into words or tokents
  • Computes semantic values
    • Convert numeric strings into numebrs
    • Convert names into identifiers
  • Discard white space (spaces & comments)
What is a token?

A token is a sequence of characters that is treated as a unit in the grammar of the programming language.

A specific semantic value can have different lexemes:

Sematic value: REAL(42.0)

Lexemes: 4.2e1, 42.0, 42.

Why a separate lexical analysis makes syntax analysis simpler and easier?
  • if spaces and comments are allowed everywhere, we can rewrite "if n==/*base*/ 0 then return 0" to "IF ID(n) ROP(==) NUM(0) THEN RETURN NUM(0)"
  • Tokens group characters and encapsulate variability
    • Syntax rules become simpler
  • Scanner uses more efficient mechanism than parser
    • Finite-state machine vs. push-down automaton
What would be a theory based approach to lexical analysis?
  1. Regular expressions to describe tokens
    • Conversion 
  2. Non-deterministic finite automaton
    • Subset construction
  3. Deterministic finite automaton
    • Minimization
  4. Linear-time algorithm in C, Java to recognize tokns
Define alphabet, string and language
Alphabet is a finite set of symbols - string is a finite sequence of symbols in the alphabet - language is a possibly infinite set of string
What is a regular expression M?
  • concisely describes a regular language L(M)
    • Finite representation of infinite sets
    • Built form operators: a ε | • * ( ) + 
Define a finite automaton

Consists of:

  • A set of input symbols (alphabet) ∑
  • a finite set of states S
  • a finite set of labeled edges E proper subset of (∑ U ε) x S x S 
    • Notation : s1 -a> s2
  • A set of final states F proper subset of S
What is the difference between a deterministic and non-deterministic finite automata?

Difference is in the edges:

  • A non deterministic finite automaton (NFA)
    • May have edges with label ε (ε-edges)
    • May have multiple edges with the same label (starting at the same state)
  • A deterministic finte automaton (DFA)
    • Has no ε-edges
    • Has at most one edge with label a for each state s and symbol a
How can regular expressions be converted into finite automata?


  • Recurses over structure fo regular expression
    • Defines one pattern for each opearator
      • X | Y would be state s1 and s2 where x and y would both have an edge from s1 to s2
      • XY would be states s1, s2, s3, where s1 -x> s2 and s2 -y> s3
      • X* would be s1 -x> s1
    • uses ε-edges to glue together sub-automata
    • each automaton
      • has a single accepting state
      • has a start state with no incoming edge
What is the formal definition of acceptance of a string by a DFA?
  • A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 of staets with
    • s1 is the initial state
    • sn+1 is a final state
    • s1 -a1> s2 ... sn -an> Sn+1
  • DFA accepts ε iff s1 is in F



What is the formal definition of acceptance of a string by a DFA?


  • A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 and t1 ... tn states with
    • s1 is the initial state
    • sn+1 is a final state
    • t1 -a1> s2 ... tn -an> Sn+1
    • ti is a ε-closure(si) iff there exists a sequence of s2, ..., s-1 states with
      • s -ε> s2, ... sn-1 -ε> t
  • DFA accepts ε iff s1 is in F
How is a DFA runned over a string to check acceptace?


  • curr = s1;
  • for i = 1 to n
    • curr:= E[curr, w[i]]; 
  • return curr in F;

w = string to be checked

E = edges represented as state array indexed by state x symbol

curr  = current state

What do we need to define to convert an NFA into DFA?

DFA state ==  set(NFA states)

DFA edge == NFA edge + ε-edges


  • We let:
    • s, t be NFA states
    • T be a set of NFA states (= single DFA state)
    • 'a' be an input symbol
  • We define
    • move(T, a) = {t | Ê s in T and S -a> t}
    • ε-closure(T) = union of all s in T of ε-closure(s)
What is the algorithm for a subset construction?

Let D = set of DFA states

Let DTran[T, a] be the DFA transition table 

  • D = ε-closure(s1);
  • while (There exists unmarked state T in D) {
    • mark T;
    • for each (a in Σ) {
      • V = ε-closure(Move(T,a));
      • if (V not in D) D = D U {V};
      • DTran[T, a] = V;
    • }
  • }
What is ε-closure of a state s?

It is all the states you can reach from state s by following all the ε paths

Intuitively explain the conversion from NFA to DFA

  1. We write a table where the columns are the transition inputs followed by ε*, and the rows are the states - initially we just have the starting state 
  2. We take the starting state and see all the possible transitions with the inputs followed by ε* and we write the results in the respective cell all the states we can reach with that input ε*
  3. We then add all the states we obtained as a state (if they contain a final state, they are a final state, so we circle them
  4. We then draw all the states we obtained, and we join them with their corresponding inputs
How to minimize a DFA?
  1. Add "dead" state so that transitions are defined for all symbols in all states
  2. Partition states into final and non-final
  3. Partition each group so that two states s, t are in the same (new) subgroup iff for all a in ∑, s -a> S' and t -a> t' then s' and t' are in the same (old) group
  4. Repeat 3 until no further change
  5. Merge equivalent states
  6. Remove dead states
Explain intuitively how to minimize a DFA

  1. From the DFA we make two sets, the first set will hav all the final sets, and the second set will have the rest of the sets
  2. We take one set and for each element of that set we try all the inputs and observe which are the states it reaches. 
    • If the states it reaches are the same states that we have in the set we don't have to do anything
    • Otherwise, if it reaches states outside its own set we classify it as a new set.
      • If elements reach the same sets, they would belong to the same set
  3. We repeat S3 until we see no changes
Supporting users have an ad free experience!