Term
What does lexical analysis consist of? 

Definition
 Splits the source into words or tokents
 Computes semantic values
 Convert numeric strings into numebrs
 Convert names into identifiers
 Discard white space (spaces & comments)



Term

Definition
A token is a sequence of characters that is treated as a unit in the grammar of the programming language.
A specific semantic value can have different lexemes:
Sematic value: REAL(42.0)
Lexemes: 4.2e1, 42.0, 42. 


Term
Why a separate lexical analysis makes syntax analysis simpler and easier? 

Definition
 if spaces and comments are allowed everywhere, we can rewrite "if n==/*base*/ 0 then return 0" to "IF ID(n) ROP(==) NUM(0) THEN RETURN NUM(0)"
 Tokens group characters and encapsulate variability
 Syntax rules become simpler
 Scanner uses more efficient mechanism than parser
 Finitestate machine vs. pushdown automaton



Term
What would be a theory based approach to lexical analysis? 

Definition
 Regular expressions to describe tokens
 Nondeterministic finite automaton
 Deterministic finite automaton
 Lineartime algorithm in C, Java to recognize tokns



Term
Define alphabet, string and language 

Definition
Alphabet is a finite set of symbols  string is a finite sequence of symbols in the alphabet  language is a possibly infinite set of string 


Term
What is a regular expression M? 

Definition
 concisely describes a regular language L(M)
 Finite representation of infinite sets
 Built form operators: a ε  • * ( ) +



Term
Define a finite automaton 

Definition
Consists of:
 A set of input symbols (alphabet) ∑
 a finite set of states S
 a finite set of labeled edges E proper subset of (∑ U ε) x S x S
 A set of final states F proper subset of S



Term
What is the difference between a deterministic and nondeterministic finite automata? 

Definition
Difference is in the edges:
 A non deterministic finite automaton (NFA)
 May have edges with label ε (εedges)
 May have multiple edges with the same label (starting at the same state)
 A deterministic finte automaton (DFA)
 Has no εedges
 Has at most one edge with label a for each state s and symbol a



Term
How can regular expressions be converted into finite automata? 

Definition
McNaughton/Yamada/Tompsonalgorithm:
 Recurses over structure fo regular expression
 Defines one pattern for each opearator
 X  Y would be state s1 and s2 where x and y would both have an edge from s1 to s2
 XY would be states s1, s2, s3, where s1 _{x}> s2 and s2 _{y}> s3
 X* would be s1 _{x}> s1
 uses εedges to glue together subautomata
 each automaton
 has a single accepting state
 has a start state with no incoming edge



Term
What is the formal definition of acceptance of a string by a DFA? 

Definition
 A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 of staets with
 s1 is the initial state
 sn+1 is a final state
 s1 _{a1}> s2 ... sn _{an}> Sn+1
 DFA accepts ε iff s1 is in F



Term
What is the formal definition of acceptance of a string by a DFA?


Definition
 A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 and t1 ... tn states with
 s1 is the initial state
 sn+1 is a final state
 t1 _{a1}> s2 ... tn _{an}> Sn+1
 ti is a εclosure(si) iff there exists a sequence of s2, ..., s1 states with
 DFA accepts ε iff s1 is in F



Term
How is a DFA runned over a string to check acceptace? 

Definition
Algorithm
 curr = s1;
 for i = 1 to n
 return curr in F;
w = string to be checked
E = edges represented as state array indexed by state x symbol
curr = current state 


Term
What do we need to define to convert an NFA into DFA? 

Definition
DFA state == set(NFA states)
DFA edge == NFA edge + εedges
 We let:
 s, t be NFA states
 T be a set of NFA states (= single DFA state)
 'a' be an input symbol
 We define
 move(T, a) = {t  Ê s in T and S a> t}
 εclosure(T) = union of all s in T of εclosure(s)



Term
What is the algorithm for a subset construction? 

Definition
Let D = set of DFA states
Let DTran[T, a] be the DFA transition table
 D = εclosure(s1);
 while (There exists unmarked state T in D) {
 mark T;
 for each (a in Σ) {
 V = εclosure(Move(T,a));
 if (V not in D) D = D U {V};
 DTran[T, a] = V;
 }
 }



Term
What is εclosure of a state s? 

Definition
It is all the states you can reach from state s by following all the ε paths 


Term
Intuitively explain the conversion from NFA to DFA 

Definition
http://www.youtube.com/watch?v=taClnxUnao
 We write a table where the columns are the transition inputs followed by ε*, and the rows are the states  initially we just have the starting state
 We take the starting state and see all the possible transitions with the inputs followed by ε* and we write the results in the respective cell all the states we can reach with that input ε*
 We then add all the states we obtained as a state (if they contain a final state, they are a final state, so we circle them
 We then draw all the states we obtained, and we join them with their corresponding inputs



Term

Definition
 Add "dead" state so that transitions are defined for all symbols in all states
 Partition states into final and nonfinal
 Partition each group so that two states s, t are in the same (new) subgroup iff for all a in ∑, s a> S' and t a> t' then s' and t' are in the same (old) group
 Repeat 3 until no further change
 Merge equivalent states
 Remove dead states



Term
Explain intuitively how to minimize a DFA 

Definition
http://www.youtube.com/watch?v=TvMEX2htBYw
 From the DFA we make two sets, the first set will hav all the final sets, and the second set will have the rest of the sets
 We take one set and for each element of that set we try all the inputs and observe which are the states it reaches.
 If the states it reaches are the same states that we have in the set we don't have to do anything
 Otherwise, if it reaches states outside its own set we classify it as a new set.
 If elements reach the same sets, they would belong to the same set
 We repeat S3 until we see no changes


