Context Free Grammars

31 Jan 04:30 PM

Prof. Calvin

CSCI 5100

Sketch

  • Context Free Grammars
    • Rules
    • Formal Definition
    • Examples
    • Ambiguity

Rules

Context Free Grammars

\[ \begin{align*} G_1& \\ &\left. \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \right\} \quad \text{(Substitution) Rules} \end{align*} \]

  • We use \(\varepsilon\) to denote the empty string.

Rules

  • Term these “rules”
  • They are of form:
    • Symbol
    • String of symbols

Terms

  • Rule: Statements of form Variable → (string of symbols and terminals)
  • Variable: Those symbols on the left-hand side (LHS) of a → in a rule
  • Terminals: Those symbols which appear in only on the right-hand side (RHS)
  • Starting variable: The topmost symbol.

3 Rules

  • \(S \rightarrow 0S1\)
  • \(S \rightarrow R\)
  • \(R \rightarrow \varepsilon\)

2 Variables / 2 Terminals

  • Variables
    • \(S\)
    • \(R\)
  • Terminals
    • \(0\)
    • \(1\)

Generation

  1. Write down the start variable.
  2. Substitute the start variable according to any rule.
  3. Substitute any variable for any rule until no variables remain.
  • That is, only terminals.
  1. Every possible string is the language \(L(G)\)

Example

\[ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \]

  • \(S\)
  • \(0S1\)
  • \(00S11\)
  • \(00R11\)
  • \(0011\)
  • \(S\)
  • \(0S1\)
  • \(0R1\)
  • \(01\)

Graphically

finite_automata 1 S 2 0 1->2 3 S 1->3 4 1 1->4 5 0 3->5 6 S 3->6 7 1 3->7 8 R 6->8 9 ε 8->9

  • \(S\)
  • \(0S1\)
  • \(00S11\)
  • \(00R11\)
  • \(0011\)

Graphically

finite_automata 1 S 2 0 1->2 3 S 1->3 4 1 1->4 5 0 3->5 6 S 3->6 7 1 3->7 8 R 6->8 9 ε 8->9

finite_automata 1 S 2 0 1->2 3 S 1->3 4 1 1->4 5 0 2->5 3->5 6 S 3->6 7 1 3->7 9 ε 5->9 8 R 6->8 7->4 8->9 9->7

Example

\[ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \]

  • \(L(G_1) = \{ 0^k1^k | k \in \mathbb{N} \}\)
  • This is a language a CFG can do.
    • That a DFA/NFA/GNFA cannot.

Shorthand

\[ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \]

\[ \begin{aligned} &S \rightarrow 0S1\quad|\quad R\\ &R \rightarrow \varepsilon \end{aligned} \]

Formal Definition

Context Free Grammar

  • A Context Free Grammar (CFG)
  • A CFG is a 4-tuple \((V, \Sigma, R, S)\)
    • \(V\) : finite set of variables
    • \(\Sigma\) : finite set of terminal symbols
      • We note this is the alphabet, a la DFA
    • \(R\) finite set of rules
      • Of form \(V \rightarrow (V \cup \Sigma)*\)
    • \(S\) start variable

Substitutions

  • \(\forall u,v \in (V \cup \Sigma)*\)
  • \(u \Rightarrow v := \exists R : R(u) \rightarrow v\)
  • \(u \overset{*}{\Rightarrow} v := \exists u_1, u_2, ... u_n : \underset{i \leq n}{\Rightarrow} u_i = v\)
    • \(\overset{*}{\Rightarrow}\) is the transitive closure of \(\Rightarrow\)
    • Term this a derivation of \(v\) from \(u\)
    • If \(u = S\) term this the derivation of \(v\)
  • \(L(G) = \{ w | w \in \Sigma* \land S \overset{*}{\Rightarrow} w \}\)

Check in

  • Which of the following is a CFG

\[ \begin{align*} C_1& \\ &\left. \begin{aligned} &B \rightarrow 0B1 \\ &B1 \rightarrow 1B \\ &0B \rightarrow 0B \end{aligned} \right. \end{align*} \]

  • This has a context.

\[ \begin{align*} C_2& \\ &\left. \begin{aligned} &S \rightarrow 0S \quad | \quad S1 \\ &R \rightarrow RR \end{aligned} \right. \end{align*} \]

  • This happens to be the empty language.

Example

Arithmetic

\[ \begin{align*} G_2& \\ &\left. \begin{aligned} &E \rightarrow E+T \quad | \quad T\\ &T \rightarrow T\times F \quad | \quad F \\ &F \rightarrow ( E ) \quad | \quad a \end{aligned} \right. \end{align*} \]

\[ \begin{aligned} &V = \{ E, T, F \}\\ &\Sigma = \{+, \times, (, ), a \}\\ &R = \text{As shown}\\ &S = E \end{aligned} \]

Graphically

finite_automata 1 E 2 E 1->2 3 + 1->3 4 T 1->4 5 T 2->5 6 T 4->6 7 × 4->7 8 F 4->8 9 F 5->9 A F 6->A B a 8->B C a 9->C D a A->D

  • \(E\)
  • \(E+T\)
  • \(T+T\times F\)
  • \(F+F\times a\)
  • \(a+a\times a\)

Graphically

finite_automata 1 E 2 E 1->2 3 + 1->3 4 T 1->4 5 T 2->5 6 T 4->6 7 × 4->7 8 F 4->8 9 F 5->9 A F 6->A B a 8->B C a 9->C D a A->D

finite_automata 1 E 2 E 1->2 3 + 1->3 4 T 1->4 5 T 2->5 D a 3->D 6 T 4->6 7 × 4->7 8 F 4->8 9 F 5->9 A F 6->A B a 7->B 8->B C a 9->C A->D C->3 D->7

Takeaways

  • This is a viable way to decribe a programming language.
  • Parse trees may encode e.g. precedence
    • \(\times\) over \(+\)
  • If a string can be derived by different substitutions it is ambigious.

Ambiguity

Arithmetic

\[ \begin{align*} G_2& \\ &\left. \begin{aligned} &E \rightarrow E+T \quad | \quad T\\ &T \rightarrow T\times F \quad | \quad F \\ &F \rightarrow ( E ) \quad | \quad a \end{aligned} \right. \end{align*} \]

\[ \begin{align*} G_3& \\ &\left. \begin{aligned} &E \rightarrow E+E \quad | \quad E \times E \quad | \quad ( E ) \quad | \quad a\\ \end{aligned} \right. \end{align*} \]

  • These represent the same language (\(L(G_2) = L(G_3)\))!
  • But \(G_3\) is ambigious!

\(a + a \times a\)

finite_automata 1 E 2 E 1->2 3 1->3 4 E 1->4 5 2->5 6 3->6 7 E 4->7 8 4->8 9 E 4->9 A a 5->A B + 6->B C a 7->C D × 8->D E a 9->E

finite_automata 1 E 2 E 1->2 3 + 1->3 4 E 1->4 5 E 2->5 6 2->6 7 E 2->7 8 3->8 9 4->9 A a 5->A B + 6->B C a 7->C D × 8->D E a 9->E

Exercise

Python

\[ \begin{align*} G_3& \\ &\left. \begin{aligned} &E \rightarrow E+E \quad | \quad E \times E \quad | \quad ( E ) \quad | \quad a\\ \end{aligned} \right. \end{align*} \]

  • Express CFG \(G_3\) in Python.

Solution

from itertools import count # infinite iterator
from itertools import combinations_with_replacement as cwr
# define variables, so we can use them
rules = (
  lambda s : s.replace("E", "E+E"),
  lambda s : s.replace("E", "ExE"), 
  lambda s : s.replace("E", "(E)"), 
  lambda s : s.replace("E", "a")
)
trans = lambda fs, s: fs[0](trans(fs[1:], s)) if len(fs) else s

G_2 = (trans(fs,'E') for n in count() for fs in cwr(rules, n))

next(G_2), next(G_2), next(G_2)
('E', 'E+E', 'ExE')
  • This isn’t quite right. What’s wrong?

End of Day :)

  • Good work!