Normal Forms for Context-Free Grammars
A CFG can be transformed into equivalent but structurally simpler forms. These normal forms are essential for parsing algorithms (like CYK) and theoretical proofs.
Eliminating Useless Variables
A symbol is useful if it appears in some derivation
Two Kinds of Uselessness
1. Variables That Derive Nothing
Example:
Discovery Algorithm:
- Basis: If
where , then derives something - Induction: If
and all either are terminals or already discovered variables, then derives something - Elimination: Remove all undiscovered variables and all productions containing them
Proof: Induction on parse tree height — if
2. Unreachable Symbols
A symbol is reachable if it appears in some sentential form derived from
Discovery Algorithm:
- Basis:
is reachable - Induction: If
is reachable and , then all symbols in are reachable - Elimination: Remove unreachable symbols and their productions
Correct Two-Step Elimination
Important: Eliminate non-deriving variables FIRST, then unreachable symbols. Doing it in reverse order can leave useless symbols.
Why: Eliminating non-deriving variables may make previously reachable symbols unreachable.
Eliminating -Productions
A production of the form
Goal: Eliminate all
Nullable Symbols
Discovery Algorithm:
- Basis: If
, is nullable - Induction: If
and ALL are nullable, then is nullable
Elimination Method
Key idea: Turn each production
Example:
- Nullable:
(because of ), then (because ), then (because ) - New productions for
: (omit , omit , omit both — but removed) : (omit nullable in body) :
Handling
If
Eliminating Unit Productions
A unit production is one whose body consists of a single variable:
Finding Unit Pairs
Compute all pairs
Algorithm: Find all such pairs by transitive closure.
- Basis:
is a unit pair for all - Induction: If
is a unit pair and , then is a unit pair
Elimination
For each unit pair
- If
(where is not a single variable), add
Remove all unit productions.
Chomsky Normal Form (CNF)
A CFG is in Chomsky Normal Form if every production is of the form:
(two variables) (single terminal)
(Plus
CNF Theorem
Every CFL (without
Proof (Construction)
Step 0 (
Step 1 (Clean grammar): Eliminate useless variables,
Step 2: For every terminal
Example:
Step 3: Break right sides longer than 2 into a chain of productions, each with two variables.
Example:
Significance of CNF
- Every derivation of a string of length
uses exactly steps - Parse trees are binary — every interior node has exactly 2 children
- Enables the CYK parsing algorithm (
)