Skip to content

Regular Expressions and Regular Languages

Regular Expressions (RE)

A regular expression describes a regular language using operators:

  • Union (R+S or RS): L(R+S)=L(R)L(S)
  • Concatenation (RS): L(RS)={xyxL(R),yL(S)}
  • Kleene Star (R): L(R)=i0L(R)i={ε}L(R)L(RR)

Examples

  • L(01)={01}
  • L(01+0)={01,0}
  • L(0)={ε,0,00,000,}
  • L(01)={0i1ji,j0}

Converting RE to ε-NFA

Theorem: For every regular expression, there is an ε-NFA that accepts the same language.

Proof by induction on the number of operators in the RE. Each constructed ε-NFA has exactly:

  • One start state (no incoming arcs)
  • One accepting state (no outgoing arcs)

Basis (0 operators)

  • Symbol a: Two states connected by a transition on a

Induction

Union (R+S):

  • Take ε-NFAs for R and S
  • Add new start and final states
  • ε-transitions from new start to R and S start states
  • ε-transitions from R and S final states to new final state

Concatenation (RS):

  • Take ε-NFAs for R and S
  • Merge R's final state with S's start state (via ε-transition)

Kleene Star (R):

  • Add new start and final states
  • ε-transition from new start to R start, from R final to new final
  • ε-transition from new start to new final (for ε in R)
  • ε-transition from R final back to R start (for repetition)

Converting DFA to Regular Expression

k-Path Method

States of the DFA are named 1,2,,n. A k-path is a path through the transition graph that goes through no intermediate state numbered higher than k.

Let Rij(k) = regular expression for the set of strings that take the DFA from state i to state j using only k-paths.

Recursive Construction

  • Base (k=0): Rij(0) is:

    • The union of labels of arcs from i to j, or if none
    • Plus ε if i=j
  • Induction: Rij(k)=Rij(k1)+Rik(k1)(Rkk(k1))Rkj(k1)

    Intuitively: go from i to j either without using state k, or go to k, loop at k zero or more times, then go from k to j.

Final RE

The language of the DFA = qjFR1j(n), where n is the number of states and 1 is the start state.


Equivalence of RE and Finite Automata

We now have a complete cycle:

REinductionε-NFAε-removalNFAsubset constructionDFAk-pathRE

All four representations define exactly the same class — regular languages.


Closure Properties of Regular Languages

Regular languages are closed under:

  • Union: Take RE R for L1, S for L2, then R+S for L1L2
  • Concatenation: RS for L1L2
  • Kleene Star: R for L1
  • Intersection: Product construction of DFAs
  • Complement: Swap final/non-final states in DFA
  • Difference: L1L2=L1L2
  • Reversal: Reverse all transitions, swap start/final, convert NFA to DFA
  • Homomorphism: Replace each symbol a by the RE for h(a)
  • Inverse Homomorphism: Modify DFA transition function

Proving Non-Regularity: Pumping Lemma for Regular Languages

Statement: If L is regular, then there exists n1 (pumping length) such that for all wL with |w|n, we can write w=xyz where:

  1. |xy|n
  2. |y|>0 (non-empty pumpable portion)
  3. For all k0, xykzL

Intuition: A DFA with n states, when processing a string of length n, must repeat a state (pigeonhole principle). The substring between occurrences can be pumped.

Strategy for Proving Non-Regularity

  1. Assume L is regular. Let n be the PL constant.
  2. Choose wL with |w|n (carefully — must work for ALL legal decompositions)
  3. Consider any decomposition w=xyz with |xy|n and |y|>0
  4. Find k such that xykzL
  5. ContradictionL is not regular

Classic Non-Regular Examples

  • L={0n1nn1}: Pick w=0n1n. Since |xy|n, y consists of only 0's. Pumping gives 0n+|y|(k1)1nL for k1.
  • L={ww has equal number of 0 and 1}: Same idea, y is all 0's.
  • L={0i1ji>j}: Pick w=0n+11n. Pumping down (k=0) removes 0's, giving at most n zeros and n ones — contradiction.