CFG Removing ambiguity

Creator
Creator
Seonglae ChoSeonglae Cho
Created
Created
2022 Oct 5 3:50
Editor
Edited
Edited
2022 Oct 5 3:55
Refs
Refs
No algorithgm - make higher binding strength terminal to lower part of tree
  1. no grouping of sequences - make precedence grouping symbol
  1. Since there is no precedence - solve by binding strength factor : cannot be broken apart by an adjacent term : cannot be broken by small binding strength expressions : all Identifier : operand for operation like terminal
    • various left / right most derivation imply various parse tree
      • notion image
        → left, right distinct derivation go along

     

    Simplifications

    1. Remove λ-productions
    1. Remove unit productions
    1. Remove redundant symbols - unreachable, non-terminating
    We only consider CFLs that do not have the empty string λ since it makes easy to understand many claims and proofs. This does not lose generality
    • just by adding new start symbol method math: S_0 \rightarrow S \, | \, \lambda
    • To any nontrivial results, there is method of obtaining G' math: L(G') = L(G) \, \backslash \, {\lambda}
     

    Substitution rule

    G' constructed by replacing variable A,B with A' in G has same CFL - by first and second line, we can construct final line (y can be sentential form)

    Proof - easy

    • if math: w \in L(G), then math: w \in L(G')
    1. by assumption, math: S →^*_G w
    1. if this derivation does not use production math: A → x_1Bx_2, it is immediate that math: S \rightarrow^*_{G'} w
    1. if it does, at G, math: S →^*_G u_1Au_2 →_G u_1x_1Bx_2u_2 →_G u_1x_1y_jx_2u_2
    1. if it does, at G, math: S →^*_{G'} u_1Au_2 →_{G'} u_1x_1y_jx_2u_2
    1. we reach the same sentential form with G and G'
    1. by induction of rewriting steps, we have then part
    • if math: w \in L(G'), then math: w \in L(G) case also has same sentential form
     

    2. Removing redundant symbols

    All terminals in T are always terminating
    1. non-terminating : unable to derive any terminal strings
    1. unreachable : does not appear in any sentential form
    Redundant : non-terminating remove by substitution and unreachable removed directly

    Algorithms

    • detect terminating symbols
    A in V has a production that consists solely of terminating symbols then A is terminating - start from terminal and lambda
    • detect reachable symbols (terminal is symbol either)
    if A is reachable then all symbols in A are reachable - start from S
     

    3. Removing λ-productions

    λ-free : G property if G has no null production
    λ-variable : math: A →^+ λ
    If no λ-productions → length monotonically increasing
    1. Find set math: V_N of all λ-variables of G (a) Find all λ-productions and put that variable into math: V_N (b) Foll all production B → math: A_1A_2 \dots A_n, where math: A_1A_2 \dots A_n \in V_N then put B into math: V_N
    1. For each element A of math: V_N
      1. Remove all λ-productions
      1. Then finally remove all non-terminating symbols
      λ-productions = empty production = null production production means of 'or' part, not all of right side of p in P
       

      4. Removing unit productions

      Unit production : A → B by substitution rule

      Claim
      Given a CFG G = (V,T,P,S) without λ-productions, there is an equivalent CFG G' = (V 0,T,S,P') that does not have any unit productions
      Method
      If a unit production is A → A, then, we can remove it from P immediately without affecting L(G).
      In case of A → B, for each variable A, we find all variables such that A* ⇒ B using a dependency graph for variables in V based on P. (A*⇒B holds when there is a path from A to B in the graph.)
      1. Put into P' all non-unit productions of P
      And then remove unreachable
      2. For all A and B of A*⇒B(has transitive property), we add to P' A → y1 | y2 |···| yn, where B → y1 | y2 |···| yn is the set of all rules in P' with B on the left from not dependency which has not B
       
      • in this picture remove B
      • final is new + right down - unit production
      notion image

      So by these 4 Simplification, given a CFG, there is an equivalent CFG that does not have any redundant symbols, λ-productions or unit productions.
       
       
       
       
       
       
       
       

      Recommendations