Context Free Grammars

Automata

Author

Prof. Calvin

Sketch

Context Free Grammars
- Rules
- Formal Definition
- Examples
- Ambiguity

Rules

Context Free Grammars

\[ \begin{align*} G_1& \\ &\left. \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \right\} \quad \text{(Substitution) Rules} \end{align*} \]

We use $\varepsilon$ to denote the empty string.

Rules

Term these “rules”
They are of form:
- Symbol
- →
- String of symbols

Terms

Rule: Statements of form Variable → (string of symbols and terminals)
Variable: Those symbols on the left-hand side (LHS) of a → in a rule
Terminals: Those symbols which appear in only on the right-hand side (RHS)
Starting variable: The topmost symbol.

3 Rules

$S \rightarrow 0S1$
$S \rightarrow R$
$R \rightarrow \varepsilon$

2 Variables / 2 Terminals

Variables
- $S$
- $R$
Terminals
- $0$
- $1$

Generation

Write down the start variable.
Substitute the start variable according to any rule.
Substitute any variable for any rule until no variables remain.

That is, only terminals.

Every possible string is the language $L(G)$

Example

\[ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \]

$S$
$0S1$
$00S11$
$00R11$
$0011$

$S$
$0S1$
$0R1$
$01$

Graphically

$S$
$0S1$
$00S11$
$00R11$
$0011$

Graphically

Example

\[ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \]

$L(G_1) = \{ 0^k1^k | k \in \mathbb{N} \}$
This is a language a CFG can do.
- That a DFA/NFA/GNFA cannot.

Shorthand

\[ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \]

\[ \begin{aligned} &S \rightarrow 0S1\quad|\quad R\\ &R \rightarrow \varepsilon \end{aligned} \]

Formal Definition

Context Free Grammar

A Context Free Grammar (CFG)
A CFG is a 4-tuple $(V, \Sigma, R, S)$
- $V$ : finite set of variables
- $\Sigma$ : finite set of terminal symbols
  - We note this is the alphabet, a la DFA
- $R$ finite set of rules
  - Of form $V \rightarrow (V \cup \Sigma)*$
- $S$ start variable

Substitutions

$\forall u,v \in (V \cup \Sigma)*$
$u \Rightarrow v := \exists R : R(u) \rightarrow v$
$u \overset{*}{\Rightarrow} v := \exists u_1, u_2, ... u_n : \underset{i \leq n}{\Rightarrow} u_i = v$
- $\overset{*}{\Rightarrow}$ is the transitive closure of $\Rightarrow$
- Term this a derivation of $v$ from $u$
- If $u = S$ term this the derivation of $v$
$L(G) = \{ w | w \in \Sigma* \land S \overset{*}{\Rightarrow} w \}$

Check in

Which of the following is a CFG

\[ \begin{align*} C_1& \\ &\left. \begin{aligned} &B \rightarrow 0B1 \\ &B1 \rightarrow 1B \\ &0B \rightarrow 0B \end{aligned} \right. \end{align*} \]

This has a context.

\[ \begin{align*} C_2& \\ &\left. \begin{aligned} &S \rightarrow 0S \quad | \quad S1 \\ &R \rightarrow RR \end{aligned} \right. \end{align*} \]

This happens to be the empty language.

Example

Arithmetic

\[ \begin{align*} G_2& \\ &\left. \begin{aligned} &E \rightarrow E+T \quad | \quad T\\ &T \rightarrow T\times F \quad | \quad F \\ &F \rightarrow ( E ) \quad | \quad a \end{aligned} \right. \end{align*} \]

\[ \begin{aligned} &V = \{ E, T, F \}\\ &\Sigma = \{+, \times, (, ), a \}\\ &R = \text{As shown}\\ &S = E \end{aligned} \]

Graphically

$E$
$E+T$
$T+T\times F$
$F+F\times a$
$a+a\times a$

Graphically

Takeaways

This is a viable way to decribe a programming language.
- In fact I wrote a series of grammars for my doctoral thesis (pg 18).
Parse trees may encode e.g. precedence
- $\times$ over $+$
If a string can be derived by different substitutions it is ambigious.

Ambiguity

Arithmetic

\[ \begin{align*} G_3& \\ &\left. \begin{aligned} &E \rightarrow E+E \quad | \quad E \times E \quad | \quad ( E ) \quad | \quad a\\ \end{aligned} \right. \end{align*} \]

These represent the same language ($L(G_2) = L(G_3)$)!
But $G_3$ is ambigious!

$a + a \times a$

Exercise

Python

\[ \begin{align*} G_3& \\ &\left. \begin{aligned} &E \rightarrow E+E \quad | \quad E \times E \quad | \quad ( E ) \quad | \quad a\\ \end{aligned} \right. \end{align*} \]

Express CFG $G_3$ in Python.

Solution

from itertools import count # infinite iterator
from itertools import combinations_with_replacement as cwr
# define variables, so we can use them
rules = (
  lambda s : s.replace("E", "E+E"),
  lambda s : s.replace("E", "ExE"), 
  lambda s : s.replace("E", "(E)"), 
  lambda s : s.replace("E", "a")
)
trans = lambda fs, s: fs[0](trans(fs[1:], s)) if len(fs) else s

G_2 = (trans(fs,'E') for n in count() for fs in cwr(rules, n))

next(G_2), next(G_2), next(G_2)

('E', 'E+E', 'ExE')

This isn’t quite right. What’s wrong?

End of Day :)

Good work!

--- title: "Context Free Grammars" --- ## Sketch ::: {.nonincremental} - Context Free Grammars - Rules - Formal Definition - Examples - Ambiguity ::: # Rules ## Context Free Grammars $$ \begin{align*} G_1& \\ &\left. \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} \right\} \quad \text{(Substitution) Rules} \end{align*} $$ - We use $\varepsilon$ to denote the empty string. ## Rules - Term these "rules" - They are of form: - Symbol - → - String of symbols ## Terms - *Rule*: Statements of form *Variable* → (string of symbols and terminals) - *Variable*: Those symbols on the left-hand side (LHS) of a → in a rule - *Terminals*: Those symbols which appear in only on the right-hand side (RHS) - *Starting* variable: The topmost symbol. ## 3 Rules - $S \rightarrow 0S1$ - $S \rightarrow R$ - $R \rightarrow \varepsilon$ ## 2 Variables / 2 Terminals - Variables - $S$ - $R$ - Terminals - $0$ - $1$ ## Generation 1. Write down the start variable. 2. Substitute the start variable according to any rule. 3. Substitute any variable for any rule until no variables remain. - That is, only terminals. 4. Every possible string is the language $L(G)$ ## Example :::: {.columns} ::: {.column width="50%"} $$ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} $$ ::: ::: {.column width="25%"} - $S$ - $0S1$ - $00S11$ - $00R11$ - $0011$ ::: ::: {.column width="25%"} - $S$ - $0S1$ - $0R1$ - $01$ ::: :::: ## Graphically :::: {.columns} ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="S"]; 2 [label="0"]; 3 [label="S"]; 4 [label="1"]; 5 [label="0"]; 6 [label="S"]; 7 [label="1"]; 8 [label="R"]; 9 [label="ε"]; 1 -> 2 1 -> 3 1 -> 4 3 -> 5 3 -> 6 3 -> 7 6 -> 8 8 -> 9 } ``` ::: ::: {.column } :::{.nonincremental} - $S$ - $0S1$ - $00S11$ - $00R11$ - $0011$ ::: ::: :::: ## Graphically :::: {.columns} ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="S"]; 2 [label="0"]; 3 [label="S"]; 4 [label="1"]; 5 [label="0"]; 6 [label="S"]; 7 [label="1"]; 8 [label="R"]; 9 [label="ε"]; 1 -> 2 1 -> 3 1 -> 4 3 -> 5 3 -> 6 3 -> 7 6 -> 8 8 -> 9 } ``` ::: ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="S"]; 2 [label="0"]; 3 [label="S"]; 4 [label="1"]; 5 [label="0"]; 6 [label="S"]; 7 [label="1"]; 8 [label="R"]; 9 [label="ε"]; 1 -> 2 1 -> 3 1 -> 4 3 -> 5 3 -> 6 3 -> 7 6 -> 8 8 -> 9 2 -> 5 5 -> 9 9 -> 7 7 -> 4 } ``` ::: :::: ## Example :::: {.columns} ::: {.column width="50%"} $$ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} $$ ::: ::: {.column width="50%"} - $L(G_1) = \{ 0^k1^k | k \in \mathbb{N} \}$ - This is a language a CFG can do. - That a DFA/NFA/GNFA cannot. ::: :::: ## Shorthand :::: {.columns} ::: {.column width="50%"} $$ \begin{aligned} &S \rightarrow 0S1 \\ &S \rightarrow R \\ &R \rightarrow \varepsilon \end{aligned} $$ ::: ::: {.column width="50%"} $$ \begin{aligned} &S \rightarrow 0S1\quad|\quad R\\ &R \rightarrow \varepsilon \end{aligned} $$ ::: :::: # Formal Definition ## Context Free Grammar - A *Context Free Grammar* (CFG) - A CFG is a 4-tuple $(V, \Sigma, R, S)$ - $V$ : finite set of variables - $\Sigma$ : finite set of terminal symbols - We note this is the alphabet, a la DFA - $R$ finite set of rules - Of form $V \rightarrow (V \cup \Sigma)*$ - $S$ start variable ## Substitutions - $\forall u,v \in (V \cup \Sigma)*$ - $u \Rightarrow v := \exists R : R(u) \rightarrow v$ - $u \overset{*}{\Rightarrow} v := \exists u_1, u_2, ... u_n : \underset{i \leq n}{\Rightarrow} u_i = v$ - $\overset{*}{\Rightarrow}$ is the transitive closure of $\Rightarrow$ - Term this a *derivation* of $v$ from $u$ - If $u = S$ term this the *derivation* of $v$ - $L(G) = \{ w | w \in \Sigma* \land S \overset{*}{\Rightarrow} w \}$ ## Check in - Which of the following is a CFG :::: {.columns} ::: {.column width="50%"} $$ \begin{align*} C_1& \\ &\left. \begin{aligned} &B \rightarrow 0B1 \\ &B1 \rightarrow 1B \\ &0B \rightarrow 0B \end{aligned} \right. \end{align*} $$ - This has a context. ::: ::: {.column width="50%"} $$ \begin{align*} C_2& \\ &\left. \begin{aligned} &S \rightarrow 0S \quad | \quad S1 \\ &R \rightarrow RR \end{aligned} \right. \end{align*} $$ - This happens to be the empty language. ::: :::: # Example ## Arithmetic :::: {.columns} ::: {.column width="50%"} $$ \begin{align*} G_2& \\ &\left. \begin{aligned} &E \rightarrow E+T \quad | \quad T\\ &T \rightarrow T\times F \quad | \quad F \\ &F \rightarrow ( E ) \quad | \quad a \end{aligned} \right. \end{align*} $$ ::: ::: {.column width="50%"} $$ \begin{aligned} &V = \{ E, T, F \}\\ &\Sigma = \{+, \times, (, ), a \}\\ &R = \text{As shown}\\ &S = E \end{aligned} $$ ::: :::: ## Graphically :::: {.columns} ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="E"]; 2 [label="E"]; 3 [label="+"]; 4 [label="T"]; 5 [label="T"]; 6 [label="T"]; 7 [label="×"]; 8 [label="F"]; 9 [label="F"]; A [label="F"]; B [label="a"]; C [label="a"]; D [label="a"]; 1 -> 2 1 -> 3 1 -> 4 2 -> 5 4 -> 6 4 -> 7 4 -> 8 5 -> 9 6 -> A 8 -> B 9 -> C A -> D } ``` ::: ::: {.column } :::{.nonincremental} - $E$ - $E+T$ - $T+T\times F$ - $F+F\times a$ - $a+a\times a$ ::: ::: :::: ## Graphically :::: {.columns} ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="E"]; 2 [label="E"]; 3 [label="+"]; 4 [label="T"]; 5 [label="T"]; 6 [label="T"]; 7 [label="×"]; 8 [label="F"]; 9 [label="F"]; A [label="F"]; B [label="a"]; C [label="a"]; D [label="a"]; 1 -> 2 1 -> 3 1 -> 4 2 -> 5 4 -> 6 4 -> 7 4 -> 8 5 -> 9 6 -> A 8 -> B 9 -> C A -> D } ``` ::: ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="E"]; 2 [label="E"]; 3 [label="+"]; 4 [label="T"]; 5 [label="T"]; 6 [label="T"]; 7 [label="×"]; 8 [label="F"]; 9 [label="F"]; A [label="F"]; B [label="a"]; C [label="a"]; D [label="a"]; 1 -> 2 1 -> 3 1 -> 4 2 -> 5 4 -> 6 4 -> 7 4 -> 8 5 -> 9 6 -> A 8 -> B 9 -> C A -> D C -> 3 3 -> D D -> 7 7 -> B } ``` ::: :::: ## Takeaways - This is a viable way to decribe a programming language. - In fact I wrote a series of grammars for my [doctoral thesis](https://cd-public.github.io/papers/2021/CD2021.pdf) (pg 18). - Parse trees may encode e.g. precedence - $\times$ over $+$ - If a string can be *derived* by different substitutions it is *ambigious*. # Ambiguity ## Arithmetic $$ \begin{align*} G_2& \\ &\left. \begin{aligned} &E \rightarrow E+T \quad | \quad T\\ &T \rightarrow T\times F \quad | \quad F \\ &F \rightarrow ( E ) \quad | \quad a \end{aligned} \right. \end{align*} $$ $$ \begin{align*} G_3& \\ &\left. \begin{aligned} &E \rightarrow E+E \quad | \quad E \times E \quad | \quad ( E ) \quad | \quad a\\ \end{aligned} \right. \end{align*} $$ - These represent the same language ($L(G_2) = L(G_3)$)! - But $G_3$ is ambigious! # $a + a \times a$ :::: {.columns} ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="E"]; 2 [label="E"]; 3 [label=""]; 4 [label="E"]; 5 [label=""]; 6 [label=""]; 7 [label="E"]; 8 [label=""]; 9 [label="E"]; A [label="a"]; B [label="+"]; C [label="a"]; D [label="×"]; E [label="a"]; 1 -> 2 1 -> 3 1 -> 4 2 -> 5 3 -> 6 4 -> 7 4 -> 8 4 -> 9 5 -> A 6 -> B 7 -> C 8 -> D 9 -> E } ``` ::: ::: {.column } ```{dot} // | echo: false // | fig-width: 500px digraph finite_automata { rankdir=TB; bgcolor="#191919"; node [fontcolor = "#ffffff", color = "#ffffff", fontsize="30"] edge [color = "#ffffff",fontcolor = "#ffffff"] node [shape=plaintext]; 1 [label="E"]; 2 [label="E"]; 3 [label="+"]; 4 [label="E"]; 5 [label="E"]; 6 [label=""]; 7 [label="E"]; 8 [label=""]; 9 [label=""]; A [label="a"]; B [label="+"]; C [label="a"]; D [label="×"]; E [label="a"]; 1 -> 2 1 -> 3 1 -> 4 2 -> 5 2 -> 6 2 -> 7 3 -> 8 4 -> 9 5 -> A 6 -> B 7 -> C 8 -> D 9 -> E } ``` ::: :::: # Exercise ## Python $$ \begin{align*} G_3& \\ &\left. \begin{aligned} &E \rightarrow E+E \quad | \quad E \times E \quad | \quad ( E ) \quad | \quad a\\ \end{aligned} \right. \end{align*} $$ - Express CFG $G_3$ in Python. ## Solution ```{python} from itertools import count # infinite iterator from itertools import combinations_with_replacement as cwr # define variables, so we can use them rules = ( lambda s : s.replace("E", "E+E"), lambda s : s.replace("E", "ExE"), lambda s : s.replace("E", "(E)"), lambda s : s.replace("E", "a") ) trans = lambda fs, s: fs[0](trans(fs[1:], s)) if len(fs) else s G_2 = (trans(fs,'E') for n in count() for fs in cwr(rules, n)) next(G_2), next(G_2), next(G_2) ``` - This isn't quite right. What's wrong? # End of Day :) - Good work!

Sketch

Rules

Context Free Grammars

Rules

Terms

3 Rules

2 Variables / 2 Terminals

Generation

Example

Graphically

Graphically

Example

Shorthand

Formal Definition

Context Free Grammar

Substitutions

Check in

Example

Arithmetic

Graphically

Graphically

Takeaways

Ambiguity

Arithmetic

\(a + a \times a\)

Exercise

Python

Solution

End of Day :)