Term 
        
        | What does lexical analysis consist of? |  
          | 
        
        
        Definition 
        
        
- Splits the source into words or tokents
 
- Computes semantic values
 
- Convert numeric strings into numebrs
 
- Convert names into identifiers
 
 
- Discard white space (spaces & comments)
 
  |  
          | 
        
        
         | 
        
        
        Term 
         | 
        
        
        Definition 
        
        A token is a sequence of characters that is treated as a unit in the grammar of the programming language. 
A specific semantic value can have different lexemes: 
Sematic value: REAL(42.0) 
Lexemes: 4.2e1, 42.0, 42.  |  
          | 
        
        
         | 
        
        
        Term 
        
        | Why a separate lexical analysis makes syntax analysis simpler and easier? |  
          | 
        
        
        Definition 
        
        
- if spaces and comments are allowed everywhere, we can rewrite "if n==/*base*/ 0 then return 0" to "IF ID(n) ROP(==) NUM(0) THEN RETURN NUM(0)"
 
- Tokens group characters and encapsulate variability
 
- Syntax rules become simpler
 
 
- Scanner uses more efficient mechanism than parser
 
- Finite-state machine vs. push-down automaton
 
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | What would be a theory based approach to lexical analysis? |  
          | 
        
        
        Definition 
        
        
- Regular expressions to describe tokens
 
- Non-deterministic finite automaton
 
- Deterministic finite automaton
 
- Linear-time algorithm in C, Java to recognize tokns
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | Define alphabet, string and language |  
          | 
        
        
        Definition 
        
        | Alphabet is a finite set of symbols - string is a finite sequence of symbols in the alphabet - language is a possibly infinite set of string |  
          | 
        
        
         | 
        
        
        Term 
        
        | What is a regular expression M? |  
          | 
        
        
        Definition 
        
        
- concisely describes a regular language L(M)
 
- Finite representation of infinite sets
 
- Built form operators: a ε | • * ( ) + 
 
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | Define a finite automaton |  
          | 
        
        
        Definition 
        
        Consists of: 
- A set of input symbols (alphabet) ∑
 
- a finite set of states S
 
- a finite set of labeled edges E proper subset of (∑ U ε) x S x S 
 
- A set of final states F proper subset of S
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | What is the difference between a deterministic and non-deterministic finite automata? |  
          | 
        
        
        Definition 
        
        Difference is in the edges: 
- A non deterministic finite automaton (NFA)
 
- May have edges with label ε (ε-edges)
 
- May have multiple edges with the same label (starting at the same state)
 
 
- A deterministic finte automaton (DFA)
 
- Has no ε-edges
 
- Has at most one edge with label a for each state s and symbol a
 
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | How can regular expressions be converted into finite automata? |  
          | 
        
        
        Definition 
        
        McNaughton/Yamada/Tompson-algorithm: 
- Recurses over structure fo regular expression
 
- Defines one pattern for each opearator
 
- X | Y would be state s1 and s2 where x and y would both have an edge from s1 to s2
 
- XY would be states s1, s2, s3, where s1 -x> s2 and s2 -y> s3
 
- X* would be s1 -x> s1
 
 
- uses ε-edges to glue together sub-automata
 
- each automaton
 
- has a single accepting state
 
- has a start state with no incoming edge
 
 
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | What is the formal definition of acceptance of a string by a DFA? |  
          | 
        
        
        Definition 
        
        
- A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 of staets with
 
- s1 is the initial state
 
- sn+1 is a final state
 
- s1 -a1> s2 ... sn -an> Sn+1
 
 
- DFA accepts ε iff s1 is in F
 
 
   |  
          | 
        
        
         | 
        
        
        Term 
        
        What is the formal definition of acceptance of a string by a DFA? 
   |  
          | 
        
        
        Definition 
        
        
- A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 and t1 ... tn states with
 
- s1 is the initial state
 
- sn+1 is a final state
 
- t1 -a1> s2 ... tn -an> Sn+1
 
- ti is a ε-closure(si) iff there exists a sequence of s2, ..., s-1 states with
 
 
- DFA accepts ε iff s1 is in F
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | How is a DFA runned over a string to check acceptace? |  
          | 
        
        
        Definition 
        
        Algorithm 
- curr = s1;
 
- for i = 1 to n
 
- return curr in F;
 
 
w = string to be checked 
E = edges represented as state array indexed by state x symbol 
curr  = current state  |  
          | 
        
        
         | 
        
        
        Term 
        
        | What do we need to define to convert an NFA into DFA? |  
          | 
        
        
        Definition 
        
        DFA state ==  set(NFA states) 
DFA edge == NFA edge + ε-edges 
  
- We let:
 
- s, t be NFA states
 
- T be a set of NFA states (= single DFA state)
 
- 'a' be an input symbol
 
 
- We define
 
- move(T, a) = {t | Ê s in T and S -a> t}
 
- ε-closure(T) = union of all s in T of ε-closure(s)
 
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | What is the algorithm for a subset construction? |  
          | 
        
        
        Definition 
        
        Let D = set of DFA states 
Let DTran[T, a] be the DFA transition table  
- D = ε-closure(s1);
 
- while (There exists unmarked state T in D) {
 
- mark T;
 
- for each (a in Σ) {
 
- V = ε-closure(Move(T,a));
 
- if (V not in D) D = D U {V};
 
- DTran[T, a] = V;
 
 
- }
 
 
- }
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | What is ε-closure of a state s? |  
          | 
        
        
        Definition 
        
        It is all the states you can reach from state s by following all the ε paths  |  
          | 
        
        
         | 
        
        
        Term 
        
        | Intuitively explain the conversion from NFA to DFA |  
          | 
        
        
        Definition 
        
        http://www.youtube.com/watch?v=taClnxU-nao 
- We write a table where the columns are the transition inputs followed by ε*, and the rows are the states - initially we just have the starting state 
 
- We take the starting state and see all the possible transitions with the inputs followed by ε* and we write the results in the respective cell all the states we can reach with that input ε*
 
- We then add all the states we obtained as a state (if they contain a final state, they are a final state, so we circle them
 
- We then draw all the states we obtained, and we join them with their corresponding inputs
 
  |  
          | 
        
        
         | 
        
        
        Term 
         | 
        
        
        Definition 
        
        
- Add "dead" state so that transitions are defined for all symbols in all states
 
- Partition states into final and non-final
 
- Partition each group so that two states s, t are in the same (new) subgroup iff for all a in ∑, s -a> S' and t -a> t' then s' and t' are in the same (old) group
 
- Repeat 3 until no further change
 
- Merge equivalent states
 
- Remove dead states
 
  |  
          | 
        
        
         | 
        
        
        Term 
        
        | Explain intuitively how to minimize a DFA |  
          | 
        
        
        Definition 
        
        http://www.youtube.com/watch?v=TvMEX2htBYw 
- From the DFA we make two sets, the first set will hav all the final sets, and the second set will have the rest of the sets
 
- We take one set and for each element of that set we try all the inputs and observe which are the states it reaches. 
 
- If the states it reaches are the same states that we have in the set we don't have to do anything
 
- Otherwise, if it reaches states outside its own set we classify it as a new set.
 
- If elements reach the same sets, they would belong to the same set
 
 
 
- We repeat S3 until we see no changes
 
  |  
          | 
        
        
         |