According to Kleene's theorem, any regular expression you can associate a finite machine, which is a formal model of the algorithm for recognizing lexemes denoted by a given regular expression. In the most general terms, a state machine-the recognizer is determined by a finite set of characteristic states of the input stream and transitions between them. A state change occurs when symbols of the input stream are received from a given alphabet in accordance with the transition function, which determines possible subsequent states based on the input symbol and the current state. Among the possible states, the initial one stands out(initial) and final(allowing) states in which the finite automaton recognizer can be, respectively, at the beginning and completion of processing tokens of the input stream. If the input sequence of symbols can generate a sequence of transitions that can transfer the finite state machine from the initial state to one of the final states, then it is considered admitting and belongs to the regular set recognized by it.

(00|11)*((01|10)(00|11)*(01|10)(00|11)*)*

Table 1

Lexical analysis. Regular languages and finite state machines

by the total number of characters of the alphabet of symbols and operation signs and parentheses in the entry r.

Basis. Automata for expressions of length 1: and are shown in the following figure.

Rice. 5.1.

Note that for each of these three automata, the set of final states consists of a single state.

Induction step. Now suppose that for each regular expression of length = 1, the word w can be divided into k subwords: w=w 1 w 2 ... w k and that's it. For each i= 1,... ,k the word w i translates q 0 1 into q f 1 . Then for the word w in the diagram M there is a path

Hence, . Conversely, if some word translates q 0 into q f , then either it exists or it is carried by a path that, having passed from q 0 to q 0 1 and then passing several times along the path from q 0 1 to q f 1 and returning from q f 1 to q 0 1 by -transition, ultimately from q f 1 by -transition ends in q f . Therefore such a word.

From Theorems 4.2 and 5.1 we directly obtain

Corollary 5.1. For each regular expression, one can effectively build a deterministic finite state machine that recognizes the language represented by that expression.

This statement is one of the examples of synthesis theorems: based on the description of a task (language as a regular expression), a program (DFA) that performs it is effectively constructed. The converse is also true - a theorem of analysis.

Theorem 5.2. For each deterministic (or non-deterministic) finite state machine, it is possible to construct a regular expression that represents the language recognized by that machine.

The proof of this theorem is quite technical and beyond the scope of our course.

Thus, we can conclude that the class of finitely automata languages coincides with the class of regular languages. From now on we will simply call it the class of automata languages.

The automaton M r, which is constructed in the proof of Theorem 5.1

Basic definitions Regular expressions in the alphabet Σ and the regular sets they denote are defined recursively as follows: 1) – a regular expression denoting a regular set; 2) e – regular expression denoting a regular set (e); 3) if a Σ, then a is a regular expression denoting a regular set (a); 4) if p and q are regular expressions denoting regular sets P and Q, then a) (p+q) is a regular expression denoting P Q; b) pq – regular expression denoting PQ; c) p* – regular expression denoting P*; 5) nothing else is a regular expression.

Basic definitions Prioritization: * (iteration) – highest priority; concatenation; + (union). So 0 + 10* = (0 + (1 (0*))). Examples: 1. 01 means (01); 2. 0* – (0*); 3. (0+1)* – (0, 1)*; 4. (0+1)* 011 – means the set of all chains made up of 0 and 1 and ending with the chain 011; 5. (a+b) (a+b+0+1)* means the set of all chains (0, 1, a, b)* starting with a or b.

Basic definitions of the Lemma: 1) α + β = β + α 2) * = e 3) α + (β + γ) = (α + β) + γ 4) α(βγ) = (αβ)γ 5) α( β + γ) = αβ + αγ 6) (α + β)γ = αγ + βγ 7) αe = eα = α 8) α = 9) α+α* = α* 10) (α*)* = α* 11) α+α = α 12) α+ = α

Relationship between RT and RM RM are languages generated by RT. For example: x = a+b, y = c+d, x X = (a, b), y Y = (c, d), x + y X Y = (a, b, c, d). Concatenation: xy XY = (ac, ad, bc, bd). k(u+o)t (k)(u, o)(t) = (whale, cat) or by Lemmas No. 5 and No. 6 k(u+o)t = whale + cat (whale, cat). Iteration: x = a, x* X* = (e, a, aaa, …), i.e. x* = e + xxx + …

Connection between RP and RM Iteration of concatenation and union: (xy)* = e + xyxyxy + … (x + y)* = e + (x + y)(x + y) + … = = e + xx + xy + yx + yy + xxx + … Example: 0 + 1(0+1)* (0) ((1) (0, 1)*) = (0, 1, 10, 11, 100, 101, 110, 111…). Union is commutative: x + y = y + x Concatenation is not: xy ≠ yx

Communication between RM and RM Examples for priority: x + yz (x, yz), (x + y)z (xz, yz), x + y* (e, x, y, yyy, yyyy, ...), (x + y)* (e, x, y, xx, xy, yx, yy, xxx, …), (xy)* (e, xyxy, …), xy* (x, xyy, xyyy, …). New lemmas: a* + e = a*; (a + e)* = a*; a*a* = a*; e* = e; etc.

Regular systems of equations Equation with regular coefficients X = a. X + b has a solution (smallest fixed point) a*b: aa*b + b = (aa* + e)b = a*b System of equations with regular coefficients: X 1 = α 10 + α 11 X 1 + α 12 X 2 + … + α 1 n. Xn X 2 = α 20 + α 21 X 1 + α 22 X 2 + … + α 2 n. Xn…………………………. . Xn = αn 0 + αn 1 X 1 + αn 2 X 2 + … + αnn. Xn Unknowns – Δ = (X 1, X 2, …, Xn).

Regular systems of equations Solution algorithm (Gauss method): Step 1. Set i = 1. Step 2. If i = n, go to step 4. Otherwise, write the equations for Xi in the form Xi = αXi + β (β = β 0 + βi +1 Xi+1 + … + βn. Xn). Then, on the right-hand sides for the equations Xi+1, ..., Xn, we replace Xi with the regular expression α*β. Step 3. Increase i by 1 and return to step 2. Step 4. Write the equation for Xn as Xn = αXn + β. Go to step 5 (with i = n). Step 5. The equation for Xi is Xi = αXi + β. Write at the output Xi = α*β, in the equations for Xi– 1, …, X 1, substituting α*β instead of Xi. Step 6. If i = 1, stop, otherwise decrease i by 1 and return to step 5.

Transformation of DFA into RT For DFA M = (Q, Σ, δ, q 0, F) we compose a system with regular coefficients where Δ = Q: 1. set αij: = ; 2. if δ(Xi, a) = Xj, a Σ, then αij: = αij + a; 3. if Xi F or δ(Xi,) = HALT, then αi 0: = e. After solving, the desired PV will be X 1 = q 0.

Converting DFA to RV Example: for a fixed-point number we obtain the system q 0 = + q 0 + sq 1 + pq 2 + dq 3 + q 4 q 1 = + q 0 + q 1 + pq 2 + dq 3 + q 4 q 2 = + q 0 + q 1 + q 2 + q 3 + dq 4 q 3 = e + q 0 + q 1 + q 2 + dq 3 + pq 4 = e + q 0 + q 1 + q 2 + q 3 + dq 4 Here: s – sign of the number, s = “+” + “–”; p – decimal point, p = "."; d – numbers, d = “0” + “1” + … + “9”.

Converting DFA to RT Solution: q 0 = *(sq 1 + pq 2 + dq 3 + q 4 +) = sq 1 + pq 2 + dq 3 q 1 = + q 0 + q 1 + pq 2 + dq 3 + q 4 = pq 2 + dq 3, q 2 = + q 0 + q 1 + q 2 + q 3 + dq 4 = dq 4, q 3 = e + q 0 + q 1 + q 2 + dq 3 + pq 4 = dq 3 + pq 4 + e, q 4 = e + q 0 + q 1 + q 2 + q 3 + dq 4 = dq 4 + e. From the third equation: q 3 = dq 3 + pq 4 + e = d*(pq 4 + e). From the fourth equation: q 4 = dq 4 + e = d*.

Conversion of DFA to RT Reverse stroke: q 3 = d*(pq 4 + e) = d*(pd* + e), q 2 = dq 4 = dd*, q 1 = pq 2 + dq 3 = pdd* + dd *(pd* + e), q 0 = sq 1 + pq 2 + dq 3 = s(pdd* + dd*(pd* + e)) + pdd* + dd*(pd* + e). Thus, this DFA corresponds to the RT s(pdd* + dd*(pd* + e)) + pdd* + dd*(pd* + e). Let's simplify: s(pdd* + dd*(pd* + e)) + pdd* + dd*(pd* + e) = = spdd* + sdd*(pd* + e) + pdd* + dd*(pd* + e) = (s + e)(pdd* + dd*(pd* + e)) For a shorter notation, you can use the positive iteration aa* = a*a = a+: (s + e)(pdd* + dd* (pd* + e)) = (s + e)(pd+ + d+pd* + d+)

Converting DFA to RT Mapping the DFA transition function graph to basic operations with regular expressions: q 0 a b a q 1 q 2 q 1 q 0 a+b a b ab q 2 a*

Converting DFA to RT More complex combinations of operations: q 0 a q 1 b b b q 0 a q 2 q 1 (a + e)b c b q 0 q 2 ab(cab)* q 0 (a + b)* q 0 a q 1 aa* = a+ q 0 a q 1 a b a a a (ab)+ q 2 b q 1 c e + (a + b)c*

Conversion of DFA to RT For RT (s + e)(pd+ + d+(pd* + e)): q 0 p q 2 d s p q 1 d d q 3 d p q 4 d q 5 d

RV programming Regular expressions: Built into many programming languages (PHP, Java. Script, ...); Implemented as plug-in components (for example, the Regex class for the .NET platform). Differences in notation forms: x? = x + e x(1, 3) = x + xxx, etc.

Programming RT Constructs of the Regex class (System. Text. Regular. Expressions): Symbol Interpretation of the Escape sequence b When used in square brackets, it corresponds to the symbol “←” (u 0008) t, r, n, a, f, v Tab (u 0009), carriage return (u 000 D), new line (u 000 A), etc. c. X Control character (eg c. C is Ctrl+C, u 0003) e Escape (u 001 B) ooo ASCII character in octal xhh ASCII character in hexadecimal uhhhh Unicode character The following character is not a special RT character. This symbol must be used to escape all special characters. Example (the example shows a pattern and a search string, the found matches are underlined in the string): @"rnw+" – "rn. There are two lines here."

Programming RV Subsets of symbols. Any character except the end of the line (n) Any character from the set [^xxx] Any character except characters from the set Any character from the range ] Subtracting one set or range from another p(name) Any character specified by the Unicode category named name P (name) Any character other than those specified by the Unicode category name w The set of characters used in specifying identifiers W The set of characters not used in specifying identifiers s Spaces S All except spaces d Numbers D Non-digits Examples: @". +" – "rn. There are ntwo lines"; @"+" – "0 xabcfx"; @"[^fx]+" – "0 xabcfx"; @"+" – "0 xabcfx"; @"[^a-f]+" – "0 xabcfx"; @"]+" – "0 xabcfx"; @"p(Lu)" – "City Lights"; // Lu – capital letters @"P(Lu)" – "City"; @"p(Is. Cyrillic)" – "ha. OS"; //Is. Cyrillic – Russian letters

Programming PB Anchor ^, A At the beginning of the line $, Z At the end of the line or before the "n" character at the end of the line z At the end of the line G Where the previous match ends b Word boundary B Any position not on a word boundary Examples: @ "G(d)" – "(1)(3)(5)(9)"; // three matches (1), (2) and (3) @"bnS*ionb" – "nation donation"; @"Bendw*b" – "end sends endure lender".

Programming RT Operations (quantifiers) *, *? Iteration +, +? Positive iteration? , ? ? Zero or one match (n), (n)? Exactly n matches (n, ), (n, )? At least n matches (n, m), (n, m)? From n to m matches Examples (the first quantifiers are greedy, they look for as many elements as possible, the second are lazy, they look for as few elements as possible): @"d(3, )" – "888 -5555"; @"^d(3)" – "913 -913"; @"-d(3)$" – "913 -913"; @"5+? 5" – "888 -5555"; // three matches - 55, 55 and 55 @"5+5" - "888 -5555".

RV programming Grouping () A group that automatically receives a number (? :) Do not save the group (?) or (? "name") If a match is found, a named group is created (?) or Delete a previously defined group and (? "name - name") saving in a new group a substring between a previously defined group and a new group (? imnsx:) Enables or disables any of five (? –imnsx:) possible options in a group: i – case insensitive; s – one line (then “.” is any character); m – multi-line mode (“^”, “$” – the beginning and end of each line); n – do not capture unnamed groups; x – exclude non-escaped spaces from the pattern and include comments after the number sign (#) (? =) A zero-length positive lookahead statement

RT programming (? !) Negative zero-length lookahead statement (?) Non-returnable (greedy) part of an expression Examples: @"(an)+" – "bananas annals"; @"an+" – "bananas annals"; // compare, three matches - an, an and ann @"(? i: an)+" - "ba. NAnas annals"; @"+(? =d)" – "abc xyz 12 555 w"; @"(?

Src="https://site/presentation/-112203859_437213351/image-24.jpg" alt="RV Programming Links number Link to group k Link to named group Examples: @"> Программирование РВ Ссылки число Ссылка на группу k Ссылка на именованную группу Примеры: @"(w)1" – "deep"; @"(? w)k " – "deep". Конструкции изменения | Альтернатива (соответствует операции объединения) (? (выражение)да|нет) Сопоставляется с частью «да» , если выражение соответствует; в противном случае сопоставляется с необязательной частью «нет» (? (имя)да|нет), Сопоставляется с частью «да» , если названное имя (? (число)да|нет) захвата имеет соответствие; в противном случае сопоставляется с необязательной частью «нет» Пример: @"th(e|is|at)" – "this is the day";!}

Programming RV Substitutions $number Replaces the part of the string corresponding to the group with the specified number $(name) Replaces the part of the string corresponding to the group with the specified name $$ Substitutes $ $& Replaces with a copy of the full match $` Replaces the text of the input string until it matches $" Replaces the text of the input lines after match $+ Replace last captured group $_ Replace entire line Comments (? #) Inline comment # Comment to end of line

Programming RT Results of Regex: Regex Matches() Match. Collection Match Groups() Group. Collection Group Captures() Capture. Collection Captures()

Programming RV Example in C++ CLI (Visual C++/CLR/CLR Console Application): int main() ( Regex ^r = gcnew Regex(L"((\d)+)+"); Match ^m = r-> Match(L"123 456"); int match. Count = 0; while (m->Success) ( Console: : Write. Line(L"Match (0)", ++match. Count); for (int i = 1; i Groups->Count; i++) ( Group ^g = m->Groups[i]; Console: : Write. Line(L" Group (0) = "(1)"", i, g-> Value); for (int j = 0; j Captures->Count; j++) ( Capture ^c = g->Captures[j]; Console: : Write. Line(L" Capture (0) = "(1)" , position = (2), length = (3)", j, c, c->Index, c->Length); ) ) m = m->Next. Match(); ) return 0; ) System: : Text: : Regular. Expressions

Enabling actions and finding errors Limiting the number of significant digits in a number: (s + e)(pd+ + d+(pd* + e)) s = +|p = . d = d s + e = s? = (+|-)? pd* + e = (pd*)? = (.d*)? @"(+|-)? (. d+|d+(. d*)?)" or @"^(+|-)? (. d+|d+(. d*)?)$" Regex r = new Regex (@"^(+|-)? (. (? "digit"d)+|(? "digit"d)+(. (? "digit"d)*)?)$"); Match m = r. Match("+1. 23456789"); if (m. Success) ( Group g = m. Groups["digit"]; if (g. Captures. Count

Enabling actions and finding errors Determining the error position: Regex r = new Regex(@"(+|-)? (. (? "digit"d)+|(? "digit"d)+(. (? "digit"d )*)?)"); string str = "+1. 2345!678"; Match m = r. Match(str); if (m. Success) ( Group g = m. Groups["digit"]; if (g. Captures. Count 0) Console. Write. Line("Error at position 1: unexpected character "(0)"", str ); else if (m. Length

Enabling actions and searching for errors Determining the position of the error: 1. the first position of the input chain (1), if the first match does not start from the position Index = 0; 2. the position following the last match (match. Length + 1), if it does not match the last position of the input chain; 3. position of the first gap between matches (match[i]. Index + match[i]. Length + 1), if the character following the previous match is not the first character of the next match.

Index) break; index = m[i]. Index + m[i]. Length; ) Console. Write. Line("Error at position (0) "(1)"", index + 1, str); ) "abc. xyz. pqr" – correct; "+abc. xyz. pqr” – error in position 1 (“+”); "abc. xyz. pqr! – error in position 12 (“!”); "abc. xyz!. pqr" – error in position 8 (“!”).

Enabling actions and searching for errors But! "abc. xyz. +pqr” – error at position 8 (“.”). New template option: @"w+(. w+)*(. (? !$))? " Validation: "abc. xyz. +pqr” – error in position 9 (“+”); "abc. xyz. pqr. " – error in position 12 (".").

Balanced definitions: "(? "x")" adds one element to the collection named "x"; “(? “-x”)” removes one element from the collection “x”; "(? (x)(? !))" checks that there are no elements left in the collection "x". L language describing Pascal nested operators “begin end; ": @"^s*((? "begins+)+(? "-begin"ends*; s*)+)*(? (begin)(? !))$".

It is more convenient to describe the vocabulary of a language in the form of regular expressions, and to recognize the language using CA. Therefore, it is important to be able to convert language definitions in the form of regular expressions into a definition in the form of a CA. This transformation was proposed by Kenneth Thompson.

The finite state machine is five (S, S, d, S 0 , F)

S is a finite set of states.

S is a finite set of valid input signals.

d - transition function. It reflects the set Sx(SÈ(e)) into the set of states of a non-deterministic finite automaton. For a deterministic automaton, the transition function reflects the set SxS into the set of automaton states. In other words, depending on the state and the input symbol, d determines the new state of the machine.

S 0 - initial state of the finite automaton, S 0 О S.

F is the set of final states of the automaton, F О S.

The operation of a finite state machine is a sequence of steps. The step is determined by the state of the machine and the input symbol. The step itself consists of changing the state of the machine and reading the next symbol of the input sequence.

There are the following rules for converting regular expressions into a state machine.

1 The regular expression “e” is transformed into an automaton of two states and an e-transition between them (Figure 1).

Figure 1. – Automatic device for e-transition

2 A regular expression from one character “a” is converted into a finite automaton of two states and a transition between them based on the input signal a (Figure 2).

Figure 2. – Automatic machine for moving by symbol a

3 Let there be a regular expression rs and finite state machines for the expression r and the expression s have already been built. Then two machines are connected in series. Figure 3 shows the original automata for languages r and s. The figure shows an automaton for recognizing the concatenation of these languages.

Machine for r Machine for s

Figure 3. – Initial automata

Figure 4. – Language concatenation machine

4 Let there be a regular expression r | s and finite state machines have already been built for the expression r and the expression s (Figure 3). Then the resulting automaton must have an alternative to executing one of the two automata. That is, an automaton for the expression r | s for automata for r and s from Figure 3 has the form shown in Figure 5.

Figure 5. – Automatic machine for combining languages

5 Let there be a regular expression r* for the constructed finite automaton r. In this case, two new states are introduced to make it possible to bypass the expression automaton r, and also introduce an e-transition between the final and initial states to make it possible to repeat the automaton r multiple times. If for the regular expression r an automaton similar to Figure 3 is constructed, then the regular expression r* corresponds to the finite automaton presented in Figure 6.

	0	1
Q1	Q4	Q2
Q2	Q3	Q1
Q3	Q2	Q4
Q4	Q1	Q3

The columns of the transition table indicate the characters of the input alphabet, and the rows correspond to the current states of the DFA. The elements of each line indicate the states of the DFA, to which it must transition from the current state upon receiving the corresponding characters of the input alphabet. In particular, from the first line of this transition table it follows that receiving the symbols 0 and 1 in the initial state Q1 translates the DFA to states Q4 and Q2, respectively.

When recognizing an input sequence using the transition table, it is easy to trace changes in the state of the DFA in order to determine whether one of the allowing states is achieved or not. In particular, for the binary vector 01001000 with an even number of zeros and ones, the considered DFA generates the following sequence of transitions, where each transition is labeled by the symbol of the input alphabet that causes it:

Q1 0 Q4 1 Q3 0 Q2 0 Q3 1 Q4 1 Q1 0 Q4 0 Q1

This sequence of transitions ends with the admitting state Q1, therefore, the binary vector 01001000 belongs to the regular set recognized by the considered DFA and satisfies the above regular expression.

In conclusion, it should be noted that the considered informal method of constructing

For further study of the properties of finite automata and, in particular, for solving the synthesis problem, the following theorem is important.

Theorem 7.7 (determinization theorem). For any finite automaton, an equivalent deterministic finite automaton can be constructed.

In order to prove the theorem, it is necessary, firstly, to describe the algorithm for constructing a deterministic finite automaton from the original one; second, to justify this algorithm by rigorously proving that it does indeed produce a state machine that is deterministic and equivalent to the original one. Here we present only the algorithm for constructing a deterministic automaton.

The transformation of an arbitrary finite automaton to an equivalent deterministic one is carried out in two stages: first, arcs with the label \lambda are removed, then the determinization itself is carried out.

1. Removing λ-transitions (arcs labeled \lambda).

To go from the original finite automaton M=(V,Q,q_0,F,\delta) to the equivalent finite automaton M"=(V,Q",q_0,F",\delta") without λ-transitions, it is enough in the original make the following transformations in graph M.

A. All states, except the initial one, into which only arcs with the \lambda label enter, are deleted; thereby defining the set Q" of the finite automaton M". It is clear that Q"\subseteq Q. At the same time, we assume that the initial state remains the same.

b. The set of arcs of the finite automaton M" and their labels (thereby the transition function M" ) is defined as follows: for any two states p,r\in Q",~ p\to_(a)r occurs if and only if a \in V , and in the graph M one of two things holds: either there is an arc from p to r whose label contains the symbol a, or there is a state q such that p\Rightarrow_(\lambda)^(+)q and q\ to_(a)r In this case, the vertex q, generally speaking, may not belong to the set Q", i.e. it may disappear when passing to the automaton M" (Fig. 7.11). If q\in Q" , then, naturally, the arc (q,r) will be preserved in M" and the symbol a will be one of the symbols belonging to the label of this arc (Fig. 7.12).

Thus, in M" all arcs M are stored whose labels are different from \lambda and which connect a pair (vertices) of states from the set Q" (not deleted according to part a). In addition, for any triple of states p,q,r (not necessarily different!), such that p,r\in Q" and there is a path of non-zero length from p to q, the label of which is equal to \lambda (i.e. the path by λ-transitions), and an arc leads from q to r, the label of which contains the symbol a of the input alphabet; in M" an arc is constructed from p to r, the label of which contains the symbol a (see Fig. 7.11).

V. The set of final states F" of the finite automaton M" contains all states q\in Q", i.e. states of the finite automaton M that are not deleted according to paragraph a, for which q\Rightarrow_(\lambda)^(\ast) holds q_f for some q_f\in F (i.e. either state q itself is the final state of the finite automaton M, or a path of non-zero length leads from it along arcs labeled \lambda to one of the final states of the finite automaton M) (Fig. 7.13) .

2. Determination itself.

Let M=(Q,V,q_0,F,\delta) be a finite automaton without λ-transitions. Let us construct a deterministic finite automaton M_1 equivalent to M.

This finite automaton is defined in such a way that its state set is the set of all subsets of the state set of the finite automaton M. This means that each individual state of the finite automaton M_1 is defined as a certain subset of the set of states of the finite automaton M. In this case, the initial state of the new finite state machine (i.e. M_1) is a singleton subset containing the initial state of the old finite state machine (i.e. M), and the final states of the new finite state machine are all such subsets Q that contain at least one final the vertex of the original finite automaton M.

Henceforth, allowing some freedom of speech, we will sometimes call the states of the finite automaton M_1 states-sets. It is important, however, to clearly understand that each such state-set is a separate state of a new finite automaton, but not a set of its states. At the same time, for the original ("old") finite automaton M this is precisely the set of its states. Figuratively speaking, each subset of states of the old finite automaton is “collapsed” into one state of the new finite automaton*.

*Formally, we should define the set Q_1 as a set that is in one-to-one correspondence with the set 2^Q, but it is still more convenient for us to assume that Q_1 coincides with 2^Q - after all, the set of states of a finite automaton can be any non-empty finite set.

The transition function of the new finite automaton is defined so that from the state-set S by the input symbol a the finite automaton M_1 goes to the state-set, which is the union of all sets of states of the old finite automaton into which this old finite automaton goes by the symbol a from each state sets S. Thus, the finite automaton M_1 is deterministic by construction.

The above verbal description can be translated into formulas as follows: we build a finite machine M_1 so that

M_1=(Q_1,V,$q_0$,F_1,\delta_1) , where

\begin(cases)Q_1=2^Q,\quad F_1=$T\colon\, T\cap F\ne\varnothing,~T\in2^Q$,\\ (\forall S\subseteq Q) (\forall a\in V)\Bigl(\delta_1(S,a)= \bigcup\limits_(q\in S)\delta(q,a)\Bigr). \end(cases)

Let us pay attention to the fact that among the states of the new finite automaton there is a state \varnothing , and, according to (7.8), \delta_1(\varnothing,a)=\varnothing for any input symbol a . This means that, once in such a state, the state machine M_1 will not leave it. In general, any state q of a finite automaton such that for any input symbol a we have \delta(q,a)=q is called an absorbing state of the finite automaton. Thus, the state \varnothing of the deterministic finite state machine M_1 is absorbing. It is also useful to note that \delta_1(S,a)=\varnothing if and only if for each q\in S (states of the old finite state machine from the set of states S ) \delta(q,a)=\varnothing , i.e. e. in the graph M, no arc exits from each such state q, marked with the symbol a.

It can be proven that the finite automaton obtained using such an algorithm is equivalent to the original one.

Example 7.9. Let us determine the finite automaton shown in Fig. 7.14.

An equivalent finite automaton without λ-transitions is shown in Fig. 7.15. Note that vertex q_2 disappears, since only “empty” arcs enter it.

To determine the resulting automaton, it is not at all necessary to write down all of its 2^3=8 states, many of which may not be reachable from the initial state $q_0$ . To obtain states that are reachable from $q_0$, and only them, we will use the so-called pulling method.

This method can generally be described as follows.

In the initial finite automaton (without empty arcs) we define all sets of states that are reachable from the initial one, i.e. for each input symbol a we find the set \delta(q_0,a) . Each such set in the new automaton is a state directly accessible from the initial one.

For each of the defined state-sets S and each input symbol a, we find the set \textstyle(\mathop(\bigcup\limits_(q\in S) \delta(q,a))\limits^(\phantom(A)^( .))) . All states obtained at this step will be states of a new (deterministic) automaton, reachable from the initial vertex along a path of length 2. We repeat the described procedure until no new set states (including the empty one!) appear. It can be shown that this produces all states of the finite automaton M_1 that are reachable from the initial state $q_0$ .

For the finite state machine in Fig. 7.15 we have:

\begin(aligned)& \delta_1($q_0$,a)=$q_1$;\qquad \delta_1($q_0$,b)=$q_1,q_3$;\\ & \ delta_1($q_1$,a)=$q_1$;\qquad \delta_1($q_1$,b)=$q_1$;\\ & \delta_1($q_1,q_3$ ,a)= \delta(q_1,a)\cup \delta(q_3,a)= $q_1$\cup$q_1$=$q_1$;\\ & \delta_1($q_1, q_3$,b)= \delta(q_1,b)\cup \delta(q_3,b)= $q_1$\cup$q_1$=$q_1$. \end(aligned)

Since no new states-sets have appeared, the “pulling” procedure ends here, and we get the graph shown in Fig. 7.16.

Regular language addition

One of the important theoretical consequences of the determinization theorem is the following theorem.

Theorem 7.8. The complement of a regular language is a regular language.

Let L be a regular language in the alphabet V. Then the complement of the language L (as a set of words) is the language \overline(L)=V^(\ast)\setminus L .

According to Theorem 7.7, for a regular language L, a deterministic finite automaton M can be constructed that admits L. Since in a deterministic automaton from each vertex for each input symbol a transition to exactly one vertex is defined, then whatever the chain x is in the alphabet V, there is a unique path for it in M, starting in the initial state on which the chain x is read. It is clear that the chain x is allowed by the automaton M , that is, x\in L(M) , if and only if the last state of the specified path is final. It follows that the chain x\notin L(M) if and only if the last state of the specified path is not final. But we just need a finite automaton M" that admits a chain x if and only if it is not allowed by the original finite automaton M. Therefore, turning each final state of M into a non-final state and vice versa, we obtain a deterministic automaton that admits the addition of the language L .

The proven theorem allows us to build a finite automaton that does not admit a certain set of chains, using the following method: we first build an automaton that admits a given set of chains, then we determinize it and go to the automaton for addition as indicated in the proof of Theorem 7.8.

Example 7.10. A. Let's build a finite automaton that accepts all chains in the alphabet $0;1$ except chain 101.

First, let's build a finite automaton that allows a single chain 101. This automaton is shown in Fig. 7.17.

This automaton is quasi-deterministic, but not deterministic, since it is not fully defined. Let us determine it and obtain a deterministic equivalent finite automaton shown in Fig. 7.18.

And finally, moving to the addition (and renaming the states), we obtain the automaton shown in Fig. 7.19.

Note that in the resulting automaton all vertices, except vertex s_3, are final.

Note also that the transition to complement, which is discussed in the proof of Theorem 7.8, can only be carried out in a deterministic automaton. If we were to swap the roles of final and non-final vertices in the automaton shown in Fig. 7.17, then we would get an automaton that admits the language $\lambda,1,10$ , which is not - as one can easily imagine - the set of all chains other than chain 101.

Note also that the finite state machine in Fig. 7.19 allows all strings that contain an occurrence of string 101 but do not match that string itself. Here, for example, is the path carrying chain 1011: s_0,s_1,s_2,s_3,t.

b. Let's build a finite automaton that accepts all chains in the alphabet $0;1$ except those that contain an occurrence of the chain 101. Consider a language L, each chain of which contains an occurrence of the chain 101. It can be defined as follows:

L=(0+1)^(\ast)101(0+1)^(\ast).

We need to build an automaton to complement the language L.

In this case, using the regular expression directly, it is easy to construct a finite automaton that accepts the language L (Fig. 7.20).

Then we will carry out determinization using the “pull” method. The result of determination is presented in Fig. 7.21.

To completely solve the problem, all that remains is in Fig. 7.21 swap the roles of final and non-final vertices (Fig. 7.22).

V. Let's discuss the idea of constructing a finite automaton that allows those and only those chains in the alphabet $0;1$ that do not begin with the chain 01 and do not end with the chain 11 (i.e., chains of the form 01x and chains of the type y11 are not allowed, whatever their there were chains x,y\in$0;1$ ).

In this case, the complement of the language for which it is necessary to construct a finite automaton is the set of all such chains of zeros and ones that begin with the chain 01 or end with the chain 11. An automaton that allows this set of chains is constructed as an automaton for combining 01(0+1)^(\ ast)+(0+1)^(\ast)11 in the same way as described in the proof of Kleene’s theorem (see Theorem 7.6).

From the property that the class of regular languages is closed with respect to complement (see Theorem 7.8) it immediately follows that this class is closed with respect to intersection, set-theoretic and symmetric difference.

Corollary 7.3. For any two regular languages L_1 and L_2 the following statements are true:

1) the intersection of L_1\cap L_2 is regular;
2) the difference L_1\setminus L_2 is regular;
3) the symmetric difference L_1\vartriangle L_2 is regular.

The validity of the statements follows from the identities:

\begin(aligned) &(\scriptstyle(\mathsf(1))))\quad L_1\cap L_2= \overline(\overline(L_1) \cup\overline(L_2))\,;\\ &(\scriptstyle (\mathsf(2))))\quad L_1\setminus L_2= L_1\cap \overline(L_2)\,;\\ &(\scriptstyle(\mathsf(3))))\quad L_1\,\triangle\ ,L_2 = (L_1\cup L_2)\setminus (L_1\cap L_2).\end(aligned)

Firstly, the results obtained allow us to assert that the class of regular languages with respect to the operations of union, intersection and addition is a Boolean algebra, in which the unit is the universal language, and the zero is the empty language. Secondly, these algebraic properties of the family of regular languages allow us to solve the important problem of recognizing the equivalence of two arbitrary finite automata.

According to Definition 7.10, finite state machines are equivalent if the languages they accept are the same. Therefore, to verify the equivalence of the automata M_1 and M_2, it is enough to prove that the symmetric difference of the languages L(M_1) and L(M_2) is empty. To do this, in turn, it is enough to construct an automaton that admits this difference and make sure that the language it admits is empty. In general, the problem of recognizing that a state machine's language is empty is called the state machine's emptiness problem. To solve this problem, it is enough to find the set of final states of the automaton that are reachable from the initial state. Since a finite state machine is a directed graph, this problem can be solved, for example, using breadth-first search. A language allowed by a finite state machine is empty if and only if the set of final states reachable from the initial state is empty. In practice, it is preferable to recognize the equivalence of finite automata using a minimization algorithm, but now it is important for us to emphasize that the fundamental possibility of solving the equivalence problem follows from Theorem 7.7 and its algebraic consequences.

Lexical analysis. Regular languages ​​and finite state machines

Lexical analysis. Regular languages and finite state machines