Explanation: Chomsky's Hierarchy of Languages
(related to Chapter: Classification of Formal Languages)
The definitions of the previously introduced types of grammars get (from type-3, over type-2 and type-1, to type-0) more and more general. This structure became known as the Chomsky hierarchy.
data:image/s3,"s3://crabby-images/8e04f/8e04fd1996a1a21d4b3c25448569b9be18003d90" alt="chomskyhierarchy1"
The question arises, whether or not these generalizations induce proper inclusions of the language classes generated by the grammars, i.e. if $$\mathcal L_3\subseteq\mathcal L_2\subseteq\mathcal L_1\subseteq\mathcal L_0,$$ or if $$\mathcal L_3\subset\mathcal L_2\subset\mathcal L_1\subset\mathcal L_0.$$
The following examples demonstrate that the inclusions are, indeed, proper. For instance, all regular languages are context-free languages but not vice-versa.
Example of a Regular Language ($\in\mathcal L_3\subset\mathcal L_2\subset\mathcal L_1\subset\mathcal L_0$)
- The grammar $G=(\{S,B,C\},\{a,b\},R,S)$ with the rules $R$: $$\begin{array}{rcl}
S&\to&aB,\\
B&\to&bC,\\
C&\to&\epsilon \mid aB
\end{array}$$ is regular since it allows the replacement of only one variable and every rule ends with a single variable.
- Example generation of $ababab$: $$\begin{array}{rcl}
S&\to&aB\\
&\to&abC\\
&\to&abaB\\
&\to&ababC\\
&\to&ababaB\\
&\to&abababC\\
&\to&ababab\\
\end{array}$$
- The grammar generates the language $$L(G)=\{(ab)^n\mid n\ge 1, n\in\mathbb N\}=\{ab,abab,ababab,abababab,\ldots\}.$$
- The generated language $L(G)$ is (trivially and by definitions of the grammars) is also a context-free, a context-sensitive and a phrase-structure one.
Example of a Context-Free, Not Regular Language ($\not\in\mathcal L_3$ and $\in\mathcal L_2\subset\mathcal L_1\subset\mathcal L_0$)
- The grammar $G=(\{S\},\{a,b\},R,S)$ with the rules $R$: $$\begin{array}{rcl}
S&\to&aSb\mid ab,\\
\end{array}$$ is context-free since it allows the replacement of only one variable. In contrast to the first example, every rule can involve more variables.
- Example generation of $aaabbb$: $$\begin{array}{rcl}
S&\to&aSb\\
&\to&aaSbb\\
&\to&aaabbb
\end{array}$$
- The grammar generates the language $$L(G)=\{a^nb^n\mid n\ge 1, n\in\mathbb N\}=\{ab,aabb,aaabbb,\ldots\}.$$
- Apparently, the language is not regular. It is not hard to see why it cannot be regular. The reason for this is the order of the symbols $a$ and $b$. If we wanted to generate the language $\{a^nb^n\}$ by a type-3 grammar, it would be necessary to "remember" the number $n$ of $a$'s we have generated first, before we can generate the same number of $b$'s. Because the value $n$ is not bounded from above, a type-3 language would have to allow an unlimited number of variables. This is not possible, a grammar cannot have an infinite number of variables, by definition.
- The generated language $L(G)$ is (trivially and by definitions of the grammars) is context-sensitive and a phrase-structure one, but not regular.
Example of a Context-sensitive, Not Context-free Language ($\not\in\mathcal L_3,\not\in\mathcal L_2,$ and $\in\mathcal L_1\subset\mathcal L_0$)
- The grammar $G=(\{S,B,C,D,E\},\{a,b,c\},R,S)$ with the rules $R$: $$\begin{array}{rcl}
S&\to&aBC\mid aSBC,\\
CB&\to&CE,\\
CE&\to&DE,\\
DE&\to&DC,\\
DC&\to&BC,\\
aB&\to&ab,\\
bB&\to&bb,\\
bC&\to&bc,\\
cC&\to&cc,\\
\end{array}$$ is context-sensitive, since any rule $P\to Q\in R$ fulfills the property $|P|\le|Q|.$ Since $S\to\epsilon$ is not included, $S$ may be part of a some $Q$ of rule (as is the case in the first rule of this example). Apparently, the language is not context-free.
- Example generation of $aabbcc$: $$\begin{array}{rcl}
S&\to&aSBC\\
&\to&aaBCBC\\
&\to&aaBCEC\\
&\to&aaBDEC\\
&\to&aaBDCC\\
&\to&aaBBCC\\
&\to&aabBCC\\
&\to&aabbCC\\
&\to&aabbcC\\
&\to&aabbcc\\
\end{array}$$
- The grammar generates the language $$L(G)=\{a^nb^nc^n\mid n\ge 1, n\in\mathbb N\}=\{abc,aabbcc,aaabbbccc,\ldots\}.$$
- It is not hard to see why it cannot be context-free. The symbol $c$ can only be generated, after some of the symbols $a,$ and then after some of the symbols $b$ have already been generated (only in this context). It is neither possible to start with a $c$ nor to continue with a $c$ after having generated only $a$'s.
Example of a Recursively-Enumarable, Not Context-Sensitive Language ($\not\in\mathcal L_3,\not\in\mathcal L_2,\not\in\mathcal L_1,$ and $\in\mathcal L_0$)
An example can be a language, the words of which code a terminating Turing machine. We will see later what it is and why this language is not context-sensitive.
Thank you to the contributors under CC BY-SA 4.0!
data:image/s3,"s3://crabby-images/75a92/75a927a6aac36996bfafb93ff7e0eb07aa4f5900" alt=""
- Github:
-
data:image/s3,"s3://crabby-images/792d1/792d1a4fe9be7f47943b37dceb0d6f4592553e6c" alt="bookofproofs"
References
Bibliography
- Erk, Katrin; Priese, Lutz: "Theoretische Informatik", Springer Verlag, 2000, 2nd Edition
- Hoffmann, Dirk: "Theoretische Informatik, 3. Auflage", Hanser, 2015
Footnotes