Ayush Sharma (2019101004)
Code for following conversion :
- Regular Expression → NFA [ code in q1.py ]
- NFA → DFA [ code in q2.py ]
- DFA → Regular Expression [ code in q3.py ]
- Minimizing a DFA [ code in q4.py ]
Working Code Video link:
- One Drive Link - https://iiitaphyd-my.sharepoint.com/:v:/g/personal/ayush_sharma_students_iiit_ac_in/EcnheHymBjlIlqr9yJGylmUBUBRsTA8SeE0LnNC1r2qwrw?e=klRil7
- Google Drive Link - https://drive.google.com/file/d/1tAZSP8VnlR3KN4RiIgzm3U0D832QHaSH/view
-
Checking input file format. Loading if correct using
json
module. -
Called
convertToNFA()
function, which does the following:--
If
regex
is empy string then it will returnNFA
containing 2 state with epsilon on the arc connecting them. -
Otherwise, I'll add character
.
, if possible, between the characters for recognising as concatenation operator for two NFA. -
Then, converted the given regex which is in
infix
format to new regex inpostfix
format using stack. Precedence order :()
>*
>.
>+
. -
Next,
makeNFA(postfixRegex)
is called, that utilises method known asThompson contruction rules
to finally produce theε-NFA
. (Explained in below points) -
Working of
makeNFA(postfixRegex)
function:- Made an empty stack.
- Iterate through
postfixRegex
. - If the character is either of ε or an alphabet , push NFA for that using
unitLenRegexToNFA(alphabet)
function in the stack. - When our expression contains union or concatenation, we take two NFAs from the stack and combine them with the operator. Then push the new NFA in the stack. Function used :
unionNFA(NFA1, NFA2)
orconcatenationNFA(NFA1,NFA2)
repectively. - In the kleene star operation, only one element from the stack is popped and the operation is done on it using
starNFA(NFA1)
. The new NFA is again pushed in the stack. - At the end of string
postfixRegex
, we have only one NFA in the stack. That is our finalε-NFA
.
-
This construction exploits these 2 facts:-
- Closure property of a NFA over
Kleene star
,Union
&concatenation
operation. ε-transitions
contributes nothing.
Note : Using ε-transitions
, this approach creates a regular expression from its elements. The ε-transitions
serve as a "glue" or "mortar" for the NFA subcomponents. Since concatenation with the empty string leaves a regular expression unchanged (concatenation ε with is the identity operation), hence ε-transition contributes nothing.
-
The NFA’s for single character regular expressions ε, a, b :
a/b/ε →(Q0)----------->((Q1)) ; Q0 start states & Q1 is final States
-
UNION : The NFA for the union of N1 and N2:
N1+N2
is made up of individual NFAs with the NFA acting as "glue." Individual accepting states can be removed and replaced with the general accepting state as following: -
Concatenation : The initial state of
N1
is the initial state of the whole NFA. The final state ofN1
becomes the initial state ofN2
. The final state ofN2
is the final state of the whole NFA. -
Kleene Star : The NFA for the star of N1:
N*
is made by havingε-transitions
in two ways:- from new Global start state to all old start states and from all final states to old start states.
- from new Global start state to all old start states , from all final states to old start states, from all final states to new Final state and rom new Global start state to new Final state.
A Deterministic Finite Automaton (DFA) is a good basis for a transition table since it has at most one edge from each state for a given symbol. We have to use subset construction to get rid of the ε-transitions.
Some Terms used :
ε-closure(s) : Set of NFA states reachable from NFA state s on ε-transitions alone. Implemented usng function computeNfaEpsilonClosureDict()
, which creates a dictionary for each state of NFA as key with it's value to be ε-closure()
.
ε-closure(T) : Set of NFA states reachable from set of states T on ε-transitions alone. Implemented using function epsilonClosure(stateList)
, which returns a list of state i.e. ε-closure(stateList)
.
-
Checking input file format. Loading if correct using
json
module. -
Called
convertNFAToDFA(NFA)
function, which does the following:-- Copy corresponding letter from
NFA
, as language is same for DFA. - Calculated power set of the NFA states. For DFA's state set.
- Then collected elements of power set having states as one of the NFA's final accept state. This collection is our DFA's final accept state.
- Created above described dicitionary named
unitStateEpsilonClosure
i.e. usingcomputeNfaEpsilonClosureDict()
function.[ Used concept ofDFS(Depth First Search)
. ] - Calculated
ε-closure()
for start state, which wll be our final DFA state. - Calculated
transition_function/transition_matrix
for final DFA by calculatingtransitionedState
i.e. set of states to which there is a transition on input symbol a from some DFA state in T. Then appending totransition_function/transition_matrix
list[T,a,epsilonClosure(transitionedState)]
. - Finaly, returned
objectFA(DFA_S,DFA_L,DFA_TM,DFA_SS,DFA_FS)
i.e. our final DFA.
- Copy corresponding letter from
def convertNFAToDFA(NFA):
DFA_L = NFA['letters'][:]
DFA_S = allStatesCombination(NFA['states'])
DFA_FS = getFinalState(NFA['final_states'],DFA_S)
for st in NFA['states']:
for st1 in NFA['states']:
visited[st1] = False
nfaEpsilonClosureDict[st1]=[]
unitStateEpsilonClosure[st] = computeNfaEpsilonClosureDict(st,NFA['transition_matrix'])
DFA_SS = [ epsilonClosure([sst]) for sst in NFA['start_states'] ]
DFA_TM = formTransitionMatrix(NFA['transition_matrix'], DFA_S, DFA_L)
return objectFA(DFA_S,DFA_L,DFA_TM,DFA_SS,DFA_FS)
For the required cnversion there 3 thumb rules to be followed:
- If the initial state has any incoming edges, construct a new initial state that does not have any incoming edges.
- Convert all final states in the DFA to non-final states and create a new single final state where there are multiple final states.
- If the final state has any outgoing edges, build a new final state that does not have any outgoing edges.
-
Checking input file format. Loading if correct using
json
module. -
Called
convertDFAToRegex(myDFA)
function, which does the following:--
Converted DFA → GNFA, by following steps:-
- Add a new start state with an ε-arc to the old start state.
- Add a new accept state with ε-arc from old accept state.
- If any arc has multiple labels we replace each with a single arc whose label is a union of the previous labels.
- Finally we add arc labelled
None
between states that had no arcs between them. - This last step won't change the language recognise because a transition labelled with
None
can never be used. - This has been achieved using function
makeGNFA(DFA)
that returnsnew start state
,new final state
anddeltaFunction
.deltaFunction
is new transition matrix kind of thing for GNFA.
-
Performed
state elimination
method until no states found in the GNFA other thannew start state
andnew final state
. Steps involved are:-- For each state
ripState
in DFA state, get parents & children nodes. - Performed the following arc-updation using function
updateArc()
whereQrip
isripState
i.e. to be removed andQi
is one of the parent and 'Qj` is one of the children :- - Finaly returned the value at the arc from
new start state
tonew final state
. This will be our required Regular Expression.
- For each state
-
Used Myhill-Nerode Theorem :
Myhill nerode theorem states that a language is regular if and only if the number of equivalence classes in R_L is finite moreover, if finite, the number of classes is the number of state in the smallest dfa for language L.
So followed the following steps to achive the minimization of DFA:-
-
Construct a table with all pairs of states (Qi, Qj) that aren't necessarily related [all are unmarked at first]. Achieve this by making class named
MinimizeDFA()
& putting following variable in__init__()
function while creating a instance for it.self.table = [ [ 0 for _ in range(len(self.s)) ] for _ in range(len(self.s))]
-
Consider and mark of state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F or vice versa. [The sequence of final states is denoted by the letter F.]. Function
__initializeTable(self)
accounts for the same. -
While not converge( basically no more partition of state space is possible) :-
- If the pair {δ (Qi, A), δ (Qi, A)} is labelled for any input alphabet, mark pair (Qi, Qj) if not marked.
-
In the reduced DFA, combine all the unmarked pairs (Qi, Qj) into a single state.