A simple explanation of Bayes' theorem. Total Probability Formula

When deriving the total probability formula, it was assumed that the event A, whose probability had to be determined, could occur together with one of the events H_1, H_2, …, H_n, which form a complete group of pairwise incompatible events, and that the probabilities of these events (hypotheses) were known in advance. Suppose now that an experiment has been carried out and, as a result, event A has occurred. This additional information allows us to re-evaluate the probabilities of the hypotheses H_i by calculating P(H_i|A).

By the multiplication theorem, P(H_i|A) = P(H_i A) / P(A) = P(H_i) P(A|H_i) / P(A), or, using the total probability formula, we get

P(H_i|A) = P(H_i) P(A|H_i) / Σ_{k=1}^{n} P(H_k) P(A|H_k).

This formula is called Bayes' formula, or the hypothesis theorem. Bayes' formula allows one to "revise" the probabilities of the hypotheses once the result of the experiment in which event A occurred becomes known.

The probabilities P(H_i) are the prior probabilities of the hypotheses (computed before the experiment). The probabilities P(H_i|A) are the posterior probabilities of the hypotheses (computed after the experiment). Bayes' formula allows one to compute the posterior probabilities from the prior probabilities and from the conditional probabilities P(A|H_i) of the event A.

Example. It is known that 5% of all men and 0.25% of all women are color-blind. A person selected at random by medical card number turns out to be color-blind. What is the probability that this person is a man?

Solution. Event A: the selected person is color-blind. The space of elementary events for the experiment (a person is selected by medical card number) is Ω = {H_1, H_2} and consists of 2 events:

H_1 - a man is selected,

H_2 - a woman is selected.

These events can be selected as hypotheses.

According to the conditions of the problem (random choice), the probabilities of these events are equal: P(H_1) = 0.5; P(H_2) = 0.5.

In this case, the conditional probabilities that a person suffers from color blindness are equal, respectively:

P(A|H_1) = 0.05 = 1/20; P(A|H_2) = 0.0025 = 1/400.

Since it is known that the selected person is color-blind, i.e. event A has occurred, we use Bayes' formula to re-evaluate the first hypothesis:

P(H_1|A) = P(H_1) P(A|H_1) / (P(H_1) P(A|H_1) + P(H_2) P(A|H_2)) = 0.025 / (0.025 + 0.00125) ≈ 0.95.

Example. There are three identical-looking boxes. The first box contains 20 white balls, the second box contains 10 white and 10 black balls, and the third box contains 20 black balls. A white ball is taken from a box chosen at random. Calculate the probability that the ball is drawn from the first box.

Solution. Let A denote the event that a white ball is drawn. Three assumptions (hypotheses) can be made about the choice of box: H_1, H_2, H_3 - the first, second, or third box is selected, respectively.

Since the choice of any of the boxes is equally possible, the probabilities of the hypotheses are the same:

P(H_1) = P(H_2) = P(H_3) = 1/3.

According to the problem, the probability of drawing a white ball from the first box is P(A|H_1) = 1.

The probability of drawing a white ball from the second box is P(A|H_2) = 1/2.



The probability of drawing a white ball from the third box is P(A|H_3) = 0.

We find the desired probability using Bayes' formula:

P(H_1|A) = P(H_1) P(A|H_1) / (P(H_1) P(A|H_1) + P(H_2) P(A|H_2) + P(H_3) P(A|H_3)) = (1/3) / (1/3 + 1/6 + 0) = 2/3.
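To make the recipe concrete, here is a small Python sketch (my own addition, not part of the original text) that performs the same update for an arbitrary complete group of hypotheses and checks it against the two examples above; the function name bayes_posteriors is an arbitrary choice.

```python
def bayes_posteriors(priors, likelihoods):
    """Return posterior probabilities P(H_i|A) given priors P(H_i) and likelihoods P(A|H_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(H_i) * P(A|H_i)
    total = sum(joint)                                    # total probability P(A)
    return [j / total for j in joint]

# Color-blindness example: H1 = a man is selected, H2 = a woman is selected.
print(bayes_posteriors([0.5, 0.5], [0.05, 0.0025]))  # ~[0.952, 0.048]

# Three-boxes example: H1, H2, H3; evidence A = a white ball is drawn.
print(bayes_posteriors([1/3, 1/3, 1/3], [1, 0.5, 0]))  # [2/3, 1/3, 0]
```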

Repetition of trials. Bernoulli's formula.

Suppose n trials are carried out, in each of which event A may or may not occur, and the probability of event A in each individual trial is constant, i.e. it does not change from trial to trial. We already know how to find the probability of event A in a single trial.

Of particular interest is the probability that event A occurs a certain number of times (m times) in n trials. Such problems are easily solved if the trials are independent.

Def. Several trials are called independent with respect to event A if the probability of event A in each of them does not depend on the outcomes of the other trials.

We seek the probability P_n(m) that event A occurs exactly m times in these n trials (and thus fails to occur n - m times); event A can appear m times in many different orders.

P_n(m) = C(n, m) p^m q^(n-m), where q = 1 - p, is the Bernoulli formula.

The following formulas are obvious:

P_n(m < k) = P_n(0) + P_n(1) + … + P_n(k-1) - the probability that event A occurs fewer than k times in n trials.

P_n(m > k) = P_n(k+1) + P_n(k+2) + … + P_n(n) - the probability that event A occurs more than k times in n trials.
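As an illustration (my own, with arbitrarily chosen n, m and p), the Bernoulli formula and the two cumulative variants above can be computed as follows:

```python
from math import comb

def bernoulli(n, m, p):
    """P_n(m): probability that event A occurs exactly m times in n independent trials."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

def less_than(n, k, p):
    """P_n(m < k) = P_n(0) + ... + P_n(k-1)."""
    return sum(bernoulli(n, m, p) for m in range(k))

def more_than(n, k, p):
    """P_n(m > k) = P_n(k+1) + ... + P_n(n)."""
    return sum(bernoulli(n, m, p) for m in range(k + 1, n + 1))

# For example, the probability of exactly 3 occurrences in 5 trials with p = 0.4:
print(bernoulli(5, 3, 0.4))                                                  # 0.2304
print(less_than(5, 3, 0.4) + bernoulli(5, 3, 0.4) + more_than(5, 3, 0.4))    # 1.0 (sanity check)
```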

Let's start with an example. The urn in front of you may, with equal probability, contain (1) two white balls, (2) one white and one black ball, or (3) two black balls. You draw a ball, and it turns out to be white. How would you now assess the probabilities of these three options (hypotheses)? Obviously, the probability of hypothesis (3), with two black balls, is now 0. But how can the probabilities of the two remaining hypotheses be calculated? This can be done with Bayes' formula, which in our case has the following form (the number of each formula corresponds to the number of the hypothesis being tested):

P(x_1|y_1) = P(y_1|x_1) * P(x_1) / [P(y_1|x_1) * P(x_1) + P(y_1|x_2) * P(x_2) + P(y_1|x_3) * P(x_3)],   (1)

P(x_2|y_1) = P(y_1|x_2) * P(x_2) / [P(y_1|x_1) * P(x_1) + P(y_1|x_2) * P(x_2) + P(y_1|x_3) * P(x_3)],   (2)

P(x_3|y_1) = P(y_1|x_3) * P(x_3) / [P(y_1|x_1) * P(x_1) + P(y_1|x_2) * P(x_2) + P(y_1|x_3) * P(x_3)].   (3)


X is a random variable (hypothesis) taking the values: x_1 - two white balls, x_2 - one white and one black, x_3 - two black; y is a random variable (event) taking the values: y_1 - a white ball is drawn, y_2 - a black ball is drawn. P(x_1) is the probability of the first hypothesis before drawing the ball (the prior probability, or probability before the experiment) = 1/3; P(x_2) is the probability of the second hypothesis before drawing the ball = 1/3; P(x_3) is the probability of the third hypothesis before drawing the ball = 1/3; P(y_1|x_1) is the conditional probability of drawing a white ball if the first hypothesis is true (both balls are white) = 1; P(y_1|x_2) is the probability of drawing a white ball if the second hypothesis is true (one ball is white, the other black) = 1/2; P(y_1|x_3) is the probability of drawing a white ball if the third hypothesis is true (both black) = 0; P(y_1) is the probability of drawing a white ball = 1/2; P(y_2) is the probability of drawing a black ball = 1/2; and finally, what we are looking for: P(x_1|y_1) is the probability that the first hypothesis is true (both balls white), given that we drew a white ball (the posterior probability, or probability after the experiment); P(x_2|y_1) is the probability that the second hypothesis is true (one white, one black), given that we drew a white ball.

The probability that the first hypothesis (two white balls) is true, given that we drew a white ball:

P(x_1|y_1) = P(y_1|x_1) * P(x_1) / P(y_1) = 1 * (1/3) / (1/2) = 2/3.

The probability that the second hypothesis (one white, one black) is true, given that we drew a white ball:

P(x_2|y_1) = P(y_1|x_2) * P(x_2) / P(y_1) = (1/2) * (1/3) / (1/2) = 1/3.

The probability that the third hypothesis (two black balls) is true, given that we drew a white ball:

P(x_3|y_1) = P(y_1|x_3) * P(x_3) / P(y_1) = 0 * (1/3) / (1/2) = 0.

What does Bayes' formula do? It makes it possible, based on the prior probabilities of the hypotheses P(x_1), P(x_2), P(x_3) and the probabilities of the events P(y_1), P(y_2), to calculate the posterior probabilities of the hypotheses, for example the probability of the first hypothesis given that a white ball was drawn, P(x_1|y_1).

Let's return once again to formula (1). The initial probability of the first hypothesis was P(x_1) = 1/3. With probability P(y_1) = 1/2 we could draw a white ball, and with probability P(y_2) = 1/2 a black one. We drew a white one. The probability of drawing white, given that the first hypothesis is true, is P(y_1|x_1) = 1. Bayes' formula says that, since white was drawn, the probability of the first hypothesis has increased to 2/3, the probability of the second hypothesis is still 1/3, and the probability of the third hypothesis has become zero.

It is easy to check that if we had drawn a black ball, the posterior probabilities would change symmetrically: P(x_1|y_2) = 0, P(x_2|y_2) = 1/3, P(x_3|y_2) = 2/3.

Here's what Pierre Simon Laplace wrote about Bayes' formula in a work published in 1814:

This is the basic principle of that branch of contingency analysis that deals with transitions from events to causes.

Why is Bayes' formula so difficult to grasp? In my opinion, because our usual approach is reasoning from causes to effects. For example: if there are 36 balls in an urn, 6 of which are black and the rest white, what is the probability of drawing a white ball? Bayes' formula allows us to go from events to causes (hypotheses): if we had three hypotheses and an event occurred, how did that event (rather than its alternative) affect the initial probabilities of the hypotheses? How have these probabilities changed?

I believe that Bayes' formula is not just about probabilities. It changes the paradigm of perception. What is the thought process in the deterministic paradigm? If an event occurred, what was its cause? If there was an accident, an emergency, a military conflict, who or what was to blame? What does a Bayesian observer think? What structure of reality could, in this particular case, have led to this particular manifestation? The Bayesian understands that under other circumstances the result could have been different.

Let's write the symbols in formulas (1) and (2) a little differently:

P(x_1|y_1) = P(x_1) * [P(y_1|x_1) / P(y_1)],   (4)

P(x_2|y_1) = P(x_2) * [P(y_1|x_2) / P(y_1)].   (5)

Let's talk again about what we see. With equal initial (prior) probability, one of the three hypotheses could be true. With equal probability we could draw a white or a black ball. We drew a white one. In light of this new, additional information, our assessment of the hypotheses should be reconsidered. Bayes' formula allows us to do this numerically. The prior probability of the first hypothesis (formula 7) was P(x_1); a white ball was drawn, and the posterior probability of the first hypothesis became P(x_1|y_1). These probabilities differ by the factor P(y_1|x_1) / P(y_1).

The event y_1 is called evidence that to a greater or lesser extent confirms or refutes the hypothesis x_1. The coefficient P(y_1|x_1) / P(y_1) is sometimes called the power of the evidence. The more powerful the evidence (the more the coefficient differs from one), the more the fact of observing y_1 changes the prior probability and the more the posterior probability differs from the prior. If the evidence is weak (the coefficient is close to 1), the posterior probability is almost equal to the prior.

The evidence y_1, with power 2, doubled the prior probability of hypothesis x_1 (formula 4). At the same time, the evidence y_1 did not change the probability of hypothesis x_2, since its power there is 1 (formula 5).

In general, Bayes' formula has the following form:

P(x_i|y_j) = P(x_i) * P(y_j|x_i) / Σ_{k=1}^{n} P(x_k) * P(y_j|x_k).

X is a random variable (a set of mutually exclusive hypotheses) taking the values x_1, x_2, …, x_n; y is a random variable (a set of mutually exclusive events) taking the values y_1, y_2, …. Bayes' formula allows one to find the posterior probability of a hypothesis x_i given the occurrence of an event y_j. The numerator is the product of the prior probability of the hypothesis, P(x_i), and the probability of the event y_j occurring if the hypothesis x_i is true, P(y_j|x_i). The denominator is the sum of the same products as in the numerator, but over all hypotheses. If we compute the denominator, we get the total probability of the event y_j occurring (if any one of the hypotheses is true), P(y_j) (as in formulas 1-3).

Once again about evidence. The event y_j provides additional information, which allows us to revise the prior probability of the hypothesis x_i. The power of the evidence, P(y_j|x_i) / P(y_j), contains in the numerator the probability of the event y_j occurring if the hypothesis x_i is true, and in the denominator the total probability of the event y_j occurring (the probability of the event y_j averaged over all hypotheses). If the probability of the event y_j occurring is higher under hypothesis x_i than on average over all hypotheses, the evidence plays into the hands of hypothesis x_i, increasing its posterior probability P(x_i|y_j). If the probability of the event y_j occurring is lower under hypothesis x_i than on average over all hypotheses, the evidence lowers the posterior probability P(x_i|y_j) of hypothesis x_i. If the probability of the event y_j under hypothesis x_i is the same as the average over all hypotheses, the evidence does not change the posterior probability P(x_i|y_j) of hypothesis x_i.
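The following short sketch (mine, not from the original note) computes these powers of evidence and the corresponding posteriors for the two-ball urn example discussed above:

```python
priors = {"x1": 1/3, "x2": 1/3, "x3": 1/3}           # two white / one of each / two black
p_white_given = {"x1": 1.0, "x2": 0.5, "x3": 0.0}    # P(y1 | x_i)

p_white = sum(priors[x] * p_white_given[x] for x in priors)  # total probability P(y1) = 1/2

for x in priors:
    power = p_white_given[x] / p_white    # power of the evidence y1 for hypothesis x
    posterior = priors[x] * power         # P(x | y1) = P(x) * P(y1|x) / P(y1)
    print(x, round(power, 3), round(posterior, 3))
# x1: power 2.0, posterior 2/3;  x2: power 1.0, posterior 1/3;  x3: power 0.0, posterior 0
```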

Here are a few examples that I hope will reinforce your understanding of Bayes' formula.

Problem 2. Two shooters independently shoot at the same target, each firing one shot. The probability of hitting the target is 0.8 for the first shooter and 0.4 for the second. After shooting, one hole was found in the target. Find the probability that this hole belongs to the first shooter.

Task 3. The object being monitored can be in one of two states: H_1 = (functioning) and H_2 = (not functioning). The prior probabilities of these states are P(H_1) = 0.7, P(H_2) = 0.3. There are two sources of information that give contradictory reports about the state of the object: the first source reports that the object is not functioning, the second that it is functioning. It is known that the first source provides correct information with probability 0.9 and incorrect information with probability 0.1. The second source is less reliable: it provides correct information with probability 0.7 and incorrect information with probability 0.3. Find the posterior probabilities of the hypotheses.

Problems 1–3 are taken from the textbook by E.S. Ventzel, L.A. Ovcharov. Probability theory and its engineering applications, section 2.6 Hypothesis theorem (Bayes formula).

Problem 4 taken from the book, section 4.3 Bayes' Theorem.

INFORMATION TECHNOLOGY, COMPUTER SCIENCE, AND MANAGEMENT

On the applicability of Bayes' formula

DOI 10.12737/16076

A. I. Dolgov

Joint-Stock Company "Design Bureau for Radio Monitoring of Control, Navigation and Communication Systems", Rostov-on-Don, Russian Federation


The subject of this study is the Bayes formula. The purpose of this work is to analyze and expand the scope of application of the formula. The primary task is to study publications devoted to this problem, which made it possible to identify shortcomings in the use of the Bayes formula, leading to incorrect results. The next task is to construct modifications of the Bayes formula that take into account various single pieces of evidence and obtain correct results. And finally, using the example of specific source data, the incorrect results obtained using the Bayes formula are compared with the correct results calculated using the proposed modifications. Two methods were used to conduct the study. First, an analysis of the principles of constructing known expressions used to write the Bayes formula and its modifications was carried out. Secondly, a comparative assessment of the results (including quantitative) was performed. The proposed modifications ensure wider application of the Bayes formula in theory and practice, including in solving applied problems.

Key words: conditional probabilities, inconsistent hypotheses, compatible and incompatible evidence, normalization.


Introduction. Bayes' formula is increasingly used in theory and practice, including in solving applied problems using computer technology. The use of mutually independent computational procedures makes it possible to apply this formula especially effectively when solving problems on multiprocessor computing systems, since in that case the parallel implementation is performed at the level of the general scheme, and when another algorithm or class of problems is added there is no need to redo the parallelization.

The subject of this study is the applicability of Bayes' formula for the comparative assessment of the posterior conditional probabilities of inconsistent hypotheses given various single pieces of evidence. As the analysis shows, in such cases the normalized probabilities of incompatible combined events belonging to different complete groups of events are compared.


At the same time, the compared results turn out to be inadequate to the real statistical data. This is due to the following factors:

Incorrect normalization is used;

The presence or absence of intersections of the pieces of evidence taken into account is ignored.

In order to eliminate the identified shortcomings, cases of applicability of the Bayes formula are identified. If the specified formula is not applicable, the problem of constructing its modification is solved, ensuring that various single pieces of evidence are taken into account and correct results are obtained. Using specific initial data as an example, a comparative assessment of the results was performed:

Incorrect - obtained using the Bayes formula;

Correct - calculated using the proposed modification.

Initial provisions. The statements below are based on the principle of preserving probability ratios: "Correct processing of event probabilities is feasible only with normalization using one common normalizing divisor, which ensures that the ratios of the normalized probabilities are equal to the ratios of the corresponding unnormalized probabilities." This principle represents the subjective basis of probability theory, but it is not properly reflected in the modern educational and scientific-technical literature.

If this principle is violated, information about the degree of possibility of the events under consideration is distorted. The results and decisions made based on distorted information turn out to be inadequate to real statistical data.

This article will use the following concepts:

An elementary event is an event that is not divisible into elements;

Combined event - an event representing one or another combination of elementary events;

Compatible events are events that in some cases of comparative assessment of their probabilities may be incompatible, and in other cases compatible;

Incompatible events are events that are incompatible in all cases.

According to the probability multiplication theorem, the probability P(H_k E) of the product of the elementary events H_k and E is calculated as the product of probabilities P(H_k E) = P(E) P(H_k|E). In this connection, Bayes' formula is often written in the form P(H_k|E) = P(H_k E) / P(E), describing the determination of the posterior conditional probabilities of the hypotheses H_k (k = 1,…,n) on the basis of normalizing the prior probabilities P(H_k E) of the combined incompatible events H_k E taken into account. Each such event is a product whose factors are one of the hypotheses under consideration and the piece of evidence taken into account. All the possible events H_k E (k = 1,…,n) form a complete group of incompatible combined events, in view of which their probabilities P(H_k E) should be normalized taking into account the total probability formula, according to which P(E) = Σ_{k=1}^{n} P(H_k) P(E|H_k). Therefore, Bayes' formula is most often written in the most commonly used form:

P(H_k|E) = P(H_k) P(E|H_k) / Σ_{k=1}^{n} P(H_k) P(E|H_k).   (1)


Analysis of the features of constructing Bayes' formula aimed at solving applied problems, as well as examples of its practical application, allows an important conclusion to be drawn regarding the choice of the complete group of combined events compared by degree of possibility (each of which is the product of two elementary events: one of the hypotheses and the evidence taken into account). Such a choice is made subjectively by the decision maker on the basis of objective input data inherent in the typical situational setting: the types and number of hypotheses being evaluated and the evidence specifically taken into account.

Noncomparable probabilities of hypotheses given single incompatible evidence. Bayes' formula is traditionally used to determine posterior conditional probabilities, not compared by degree of possibility, of hypotheses H_k given single incompatible pieces of evidence, each of which can "appear only in combination with one of these hypotheses." In this case, complete groups ∪_{k=1}^{n} H_k E_i of combined events are selected in the form of products whose factors are one of the pieces of evidence E_i (i = 1,…,m) and one of the n hypotheses under consideration.

Bayes' formula is used for a comparative assessment of the probabilities of the combined events of each such complete group, which differs from the other complete groups not only in the evidence E_i taken into account but also, in the general case, in the types of the hypotheses H_k and (or) their number n (see, for example, the cited literature):

P(H_k|E_i) = P(H_k) P(E_i|H_k) / Σ_{k=1}^{n} P(H_k) P(E_i|H_k).

In the special case with n = 2

P(H_k|E_i) = P(H_k) P(E_i|H_k) / Σ_{k=1}^{2} P(H_k) P(E_i|H_k),

and the results obtained are correct, due to the principle of conservation of probability ratios:

P(H_1|E_i) / P(H_2|E_i) = [P(H_1) P(E_i|H_1) / Σ_{k=1}^{2} P(H_k) P(E_i|H_k)] / [P(H_2) P(E_i|H_2) / Σ_{k=1}^{2} P(H_k) P(E_i|H_k)] = P(H_1) P(E_i|H_1) / (P(H_2) P(E_i|H_2)).

The subjectivity of choosing the complete group of combined events compared by degree of possibility (involving one set of elementary events or another) makes it possible to select the complete group of events ∪_{k=1}^{n} H_k Ē_i with the negation Ē_i of the elementary event E_i (i = 1,…,m) and to write Bayes' formula as follows:

P(H_k|Ē_i) = P(H_k) P(Ē_i|H_k) / Σ_{k=1}^{n} P(H_k) P(Ē_i|H_k).

This formula is also applicable, and it makes it possible to obtain correct results if the calculated normalized probabilities are compared across the different hypotheses under consideration, but not across different pieces of evidence.

Comparable probabilities of hypotheses given single incompatible evidence. Judging by the well-known publications, Bayes' formula is also used for the comparative assessment of the posterior conditional probabilities of hypotheses given various single pieces of evidence. At the same time, no attention is paid to the following fact. In these cases the normalized probabilities of incompatible combined events belonging to different complete groups of events are compared. However, in this case Bayes' formula is not applicable, since combined events that are not included in one complete group are compared, and the normalization of their probabilities is carried out using different normalizing divisors. Normalized probabilities of incompatible combined events can be compared only if they belong to the same complete group of events and are normalized using a common divisor equal to the sum of the probabilities of all the normalized events included in the complete group.

In general, the following may be considered incompatible evidence:

Two pieces of evidence (for example, a piece of evidence and its negation);

Three pieces of evidence (for example, in a gaming situation: a win, a loss and a draw);

Four pieces of evidence (in particular, in sports: a win, a loss, a draw and a replay), etc.

Let us consider a fairly simple example (corresponding to an example given in the cited literature) of using Bayes' formula to determine the posterior conditional probabilities of the hypothesis H_k for two incompatible events in the form of the evidence A_j and its negation Ā_j:

P(H_k|A_j) = P(H_k) P(A_j|H_k) / Σ_{k=1}^{n} P(H_k) P(A_j|H_k),   (2)

P(H_k|Ā_j) = P(H_k) P(Ā_j|H_k) / Σ_{k=1}^{n} P(H_k) P(Ā_j|H_k).   (3)

In cases (2) and (3), the subjectively selected complete groups of combined events compared by degree of possibility are, respectively, the sets ∪_{k=1}^{n} H_k A_j and ∪_{k=1}^{n} H_k Ā_j. This is a case where Bayes' formula is not applicable, since the principle of conservation of probability ratios is violated: the ratios of the normalized probabilities are not equal to the ratios of the corresponding unnormalized probabilities:

P(H_k|A_j) / P(H_k|Ā_j) = [P(H_k) P(A_j|H_k) / Σ_{k=1}^{n} P(H_k) P(A_j|H_k)] / [P(H_k) P(Ā_j|H_k) / Σ_{k=1}^{n} P(H_k) P(Ā_j|H_k)] ≠ P(H_k) P(A_j|H_k) / (P(H_k) P(Ā_j|H_k)).

According to the principle of preserving probability ratios, correct processing of event probabilities is possible only when normalizing with one common normalizing divisor equal to the sum of all the compared unnormalized expressions. Therefore

Σ_{k=1}^{n} P(H_k) P(A_j|H_k) + Σ_{k=1}^{n} P(H_k) P(Ā_j|H_k) = Σ_{k=1}^{n} P(H_k) [P(A_j|H_k) + P(Ā_j|H_k)] = Σ_{k=1}^{n} P(H_k) = 1.

Thus, it turns out that there are varieties of Bayes' formula that differ from the known ones by the absence of a normalizing divisor:

P(H_k|A_j) = P(H_k) P(A_j|H_k),   P(H_k|Ā_j) = P(H_k) P(Ā_j|H_k).   (4)

In this case, the ratios of the normalized probabilities are equal to the ratios of the corresponding unnormalized probabilities:

P(H_k|A_j) / P(H_k|Ā_j) = P(H_k) P(A_j|H_k) / (P(H_k) P(Ā_j|H_k)).

Based on the subjective choice of unconventionally written complete groups of incompatible combined events, the number of modifications of Bayes' formula can be increased by including pieces of evidence as well as a certain number of their negations. For example, the most complete group of combined events ∪_{i=1}^{m} ∪_{k=1}^{n} H_k E_i ∪ ∪_{i=1}^{m} ∪_{k=1}^{n} H_k Ē_i corresponds (taking into account the absence of a normalizing divisor) to the modification of Bayes' formula

P(H_k|Ẽ_i) = P(H_k) P(Ẽ_i|H_k),

where the elementary event in the form of the evidence Ẽ_i ∈ {E_i} ∪ {Ē_i} is one of the elements of the specified set. In the absence of negations of the evidence, that is, when Ẽ_i = E_i,

P(H_k|E_i) = P(H_k) P(E_i|H_k) / Σ_{k=1}^{n} P(H_k) P(E_i|H_k).

Thus, the modification of Bayes' formula intended for determining the conditional probabilities of hypotheses, compared by degree of possibility, given single incompatible pieces of evidence looks as follows. The numerator contains the unnormalized probability of one of the combined incompatible events forming the complete group, expressed as a product of prior probabilities, and the denominator contains the sum of all the unnormalized probabilities. In this case the principle of preserving probability ratios is observed, and the result obtained is correct.

Probabilities of hypotheses given single compatible evidence. Bayes' formulas are traditionally used to determine comparable posterior conditional probabilities of hypotheses H_k (k = 1,…,n) given one of several considered compatible pieces of evidence E_i (i = 1,…,m). In particular, when determining the posterior conditional probabilities P(H_1|E_1) and P(H_1|E_2) for each of two compatible pieces of evidence E_1 and E_2, formulas of the following form are used:

P(H_1|E_1) = P(H_1) P(E_1|H_1) / Σ_{k=1}^{n} P(H_k) P(E_1|H_k)   and   P(H_1|E_2) = P(H_1) P(E_2|H_1) / Σ_{k=1}^{n} P(H_k) P(E_2|H_k).   (5)

Note that this is another case where Bayes' formula is not applicable. Moreover, in this case two shortcomings must be eliminated:

The normalization of the probabilities of the combined events shown is incorrect, due to the fact that the events under consideration belong to different complete groups;

The symbolic records of the combined events H_kE_1 and H_kE_2 do not reflect the fact that the pieces of evidence E_1 and E_2 taken into account are compatible.

To eliminate the last drawback, a more detailed record of the combined events can be used, taking into account the fact that the compatible pieces of evidence E_1 and E_2 may in some cases be incompatible and in others compatible:

H_kE_1 = H_kE_1Ē_2 + H_kE_1E_2 and H_kE_2 = H_kĒ_1E_2 + H_kE_1E_2, where Ē_1 and Ē_2 are the pieces of evidence opposite to E_1 and E_2.

Obviously, in such cases the product of events H_kE_1E_2 is taken into account twice. Moreover, it could also be taken into account once more separately, but this is not done. The point is that in the situation under consideration the assessed setting is influenced by three possible incompatible combined events: H_kE_1Ē_2, H_kĒ_1E_2 and H_kE_1E_2. At the same time, the decision maker is interested in assessing the degree of possibility of only two incompatible combined events, H_kE_1Ē_2 and H_kĒ_1E_2, which corresponds to considering only single pieces of evidence.

Thus, when constructing a modification of Bayes' formula for determining the posterior conditional probabilities of hypotheses given single compatible pieces of evidence, one must proceed from the following. The decision maker is interested in which elementary event, represented by one or another piece of evidence from among those under consideration, actually occurred under the specific conditions. If another elementary event in the form of a single piece of evidence occurs, the decision must be revised based on the results of a comparative assessment of the posterior conditional probabilities of the hypotheses, with indispensable consideration of the other conditions affecting the actual setting.

Let us introduce the following notation: H_kĔ_i denotes one (and only one) incompatible combined event consisting in the fact that, of the m > 1 elementary events E_i (i = 1,…,m) under consideration, exactly one elementary event E_i occurred together with the hypothesis H_k, and no other elementary events occurred.

In the simplest case, two single incompatible pieces of evidence are considered. If the occurrence of one of them is expected, the conditional probability of the corresponding single evidence is expressed in general form by the formula

P(Ĕ_i|H_k) = P(E_i|H_k) - P(E_1E_2|H_k),   i = 1, 2.   (6)

The validity of the formula can be clearly seen (Fig. 1).

Fig. 1. Geometric interpretation of the calculation of P(Ĕ_i|H_k) for i = 1, 2

For conditionally independent pieces of evidence

P(E_1E_2|H_k) = P(E_1|H_k) P(E_2|H_k),

therefore, taking into account (6),

P(Ĕ_i|H_k) = P(E_i|H_k) - P(E_1|H_k) P(E_2|H_k),   i = 1, 2.   (7)

Similarly, the probability P(Ĕ_i|H_k) of one of three (i = 1,…,3) incompatible events H_kĔ_i is expressed by the formula

P(Ĕ_i|H_k) = P(E_i|H_k) - Σ_{j≠i} P(E_iE_j|H_k) + P(E_1E_2E_3|H_k).

For example, for i = 1:

P(Ĕ_1|H_k) = P(E_1|H_k) - P(E_1E_2|H_k) - P(E_1E_3|H_k) + P(E_1E_2E_3|H_k).

The validity of this formula is clearly confirmed by the geometric interpretation presented in Fig. 2.

Fig. 2. Geometric interpretation of the calculation of P(Ĕ_i|H_k) for i = 1,…,3

Using the method of mathematical induction, one can prove the general formula for the probability P(Ĕ_i|H_k) for any number m of pieces of evidence E_i (i = 1,…,m):

P(Ĕ_i|H_k) = P(E_i|H_k) - Σ_{j≠i} P(E_i|H_k) P(E_j|H_k) + Σ_{j,l≠i, j<l} P(E_i|H_k) P(E_j|H_k) P(E_l|H_k) + … + (-1)^{m-1} Π_{j=1}^{m} P(E_j|H_k).   (8)

Using the probability multiplication theorem, we write the probability P(H_kĔ_i) in two forms:

P(H_kĔ_i) = P(H_k) P(Ĕ_i|H_k) = P(Ĕ_i) P(H_k|Ĕ_i),

from which it follows that

P(H_k|Ĕ_i) = P(H_kĔ_i) / P(Ĕ_i).

Using the total probability formula P(Ĕ_i) = Σ_{k=1}^{n} P(H_k) P(Ĕ_i|H_k), it turns out that

P(H_k|Ĕ_i) = P(H_kĔ_i) / Σ_{k=1}^{n} P(H_kĔ_i).

Substituting the expressions for P(Ĕ_i|H_k) in the form of the right-hand side of (8) into the resulting formula, we obtain the final form of the formula for determining the posterior conditional probabilities of the hypotheses H_k (k = 1,…,n) for one of several considered incompatible single pieces of evidence:

P(H_k|Ĕ_i) = P(H_k) [P(E_i|H_k) - Σ_{j≠i} P(E_i|H_k) P(E_j|H_k) + … + (-1)^{m-1} Π_{j=1}^{m} P(E_j|H_k)] / Σ_{k=1}^{n} P(H_k) Σ_{i=1}^{m} [P(E_i|H_k) - Σ_{j≠i} P(E_i|H_k) P(E_j|H_k) + … + (-1)^{m-1} Π_{j=1}^{m} P(E_j|H_k)].   (9)

Comparative assessments. Quite simple but illustrative examples are considered, limited to the analysis of the calculated posterior conditional probabilities of one of two hypotheses given two single pieces of evidence. 1. Probabilities of hypotheses given incompatible single pieces of evidence. Let us compare the results obtained using Bayes' formulas (2) and (3), using the example of the two pieces of evidence A_j = A and Ā_j = Ā, with the initial data:

P(H_1) = 0.7; P(H_2) = 0.3; P(A|H_1) = 0.1; P(Ā|H_1) = 0.9; P(A|H_2) = 0.6; P(Ā|H_2) = 0.4. In the examples considered, for hypothesis H_1 the traditional formulas (2) and (3) lead to the following results:

P(H_1|A) = P(H_1) P(A|H_1) / Σ_{k=1}^{2} P(H_k) P(A|H_k) = 0.07 / 0.25 = 0.28,

P(H_1|Ā) = P(H_1) P(Ā|H_1) / Σ_{k=1}^{2} P(H_k) P(Ā|H_k) = 0.63 / 0.75 = 0.84,

with normalizing divisors 0.25 and 0.75, while the proposed formulas (4), which have no normalizing divisors, give these values directly:

P(H_1|A) = P(H_1) P(A|H_1) = 0.07,   P(H_1|Ā) = P(H_1) P(Ā|H_1) = 0.63.

Thus, in the case of applying the proposed formulas, the ratio of the normalized probabilities is equal to the ratio of the unnormalized probabilities:

P(H_1|A) / P(H_1|Ā) = P(H_1) P(A|H_1) / (P(H_1) P(Ā|H_1)) = 0.07 / 0.63 = 0.11.   (10)

When using the known formulas, with the same ratio 0.11 of the unnormalized probabilities indicated in the numerators, the ratio of the resulting normalized probabilities is:

P(H_1|A) / P(H_1|Ā) = 0.28 / 0.84 = 0.33.   (11)

That is, the principle of preserving probability ratios is not respected, and incorrect results are obtained. Moreover, in the case of using the known formulas, the relative deviation of the ratio (11) of the posterior conditional probabilities of the hypotheses from the correct result (10) turns out to be very significant, as it amounts to 242%.
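For readers who want to verify the arithmetic, the following Python sketch (my own, not part of the paper) reproduces the numbers of this first comparison, i.e. the traditional formulas (2) and (3) versus the proposed formulas (4):

```python
P_H = [0.7, 0.3]            # P(H1), P(H2)
P_A = [0.1, 0.6]            # P(A|H1), P(A|H2)
P_notA = [0.9, 0.4]         # P(Ā|H1), P(Ā|H2)

def traditional(k, lik):    # formulas (2), (3): normalization within one group
    return P_H[k] * lik[k] / sum(P_H[i] * lik[i] for i in range(2))

def proposed(k, lik):       # formulas (4): no normalizing divisor
    return P_H[k] * lik[k]

print(round(traditional(0, P_A), 2), round(traditional(0, P_notA), 2))  # 0.28, 0.84 -> ratio 0.33
print(round(proposed(0, P_A), 2), round(proposed(0, P_notA), 2))        # 0.07, 0.63 -> ratio 0.11
```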

2. Probabilities of hypotheses given compatible single pieces of evidence. Let us compare the results obtained using Bayes' formulas (5) and the constructed correct modification (9), using the following initial data:

P(H_1) = 0.7; P(H_2) = 0.3; P(E_1|H_1) = 0.4; P(E_2|H_1) = 0.8; P(E_1|H_2) = 0.7; P(E_2|H_2) = 0.2.

In the examples considered, for hypothesis H_2, the traditional formulas (5) give:

P(H_2|E_1) = P(H_2) P(E_1|H_2) / Σ_{k=1}^{2} P(H_k) P(E_1|H_k) = 0.21 / 0.49 = 0.429,

P(H_2|E_2) = P(H_2) P(E_2|H_2) / Σ_{k=1}^{2} P(H_k) P(E_2|H_k) = 0.06 / 0.62 = 0.097.

In the case of applying the proposed formula (9), taking into account (7):

P(H_2|Ĕ_1) = P(H_2) [P(E_1|H_2) - P(E_1|H_2) P(E_2|H_2)] / Σ_{k=1}^{2} P(H_k) Σ_{i=1}^{2} [P(E_i|H_k) - P(E_1|H_k) P(E_2|H_k)] = 0.168 / 0.578 = 0.291,

P(H_2|Ĕ_2) = P(H_2) [P(E_2|H_2) - P(E_1|H_2) P(E_2|H_2)] / Σ_{k=1}^{2} P(H_k) Σ_{i=1}^{2} [P(E_i|H_k) - P(E_1|H_k) P(E_2|H_k)] = 0.018 / 0.578 = 0.031.

When using the proposed correct formulas, owing to the identical denominators, the ratio of the unnormalized probabilities indicated in the numerators is equal to the ratio of the normalized probabilities:

P(H_2|Ĕ_1) / P(H_2|Ĕ_2) = 0.291 / 0.031 = 9.387,   (12)

which coincides (up to rounding) with the ratio of the numerators 0.168 / 0.018. That is, the principle of conservation of probability ratios is observed.

However, when using the known formulas, with the ratio of the unnormalized probabilities indicated in the numerators equal to

P(H_2) P(E_1|H_2) / (P(H_2) P(E_2|H_2)) = 0.21 / 0.06 = 3.5,

the ratio of the normalized probabilities is:

P(H_2|E_1) / P(H_2|E_2) = 0.429 / 0.097 = 4.423.   (13)

That is, the principle of maintaining probability ratios, as before, is not observed. Moreover, in the case of using known formulas, the value of the relative deviation of the ratio (13) of the posterior conditional probabilities of hypotheses from the correct results (12) also turns out to be very significant:

(9.387 - 4.423) / 9.387 × 100% = 52.9%.
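Similarly, a short sketch (mine) reproduces this second comparison: the traditional formulas (5) versus the modification (9), with single compatible evidence handled through (7) under the conditional-independence assumption stated above:

```python
P_H = [0.7, 0.3]
P_E = [[0.4, 0.8],   # P(E1|H1), P(E2|H1)
       [0.7, 0.2]]   # P(E1|H2), P(E2|H2)

def traditional(k, i):      # formula (5)
    return P_H[k] * P_E[k][i] / sum(P_H[j] * P_E[j][i] for j in range(2))

def single(k, i):           # formula (7): evidence E_i occurred, the other one did not
    return P_E[k][i] - P_E[k][0] * P_E[k][1]

def modified(k, i):         # formula (9): one common normalizing divisor over all k and i
    den = sum(P_H[j] * sum(single(j, m) for m in range(2)) for j in range(2))
    return P_H[k] * single(k, i) / den

print(round(traditional(1, 0), 3), round(traditional(1, 1), 3))  # 0.429, 0.097
print(round(modified(1, 0), 3), round(modified(1, 1), 3))        # 0.291, 0.031
```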

Conclusion. Analysis of the construction of the specific formula relations that implement Bayes' formula, and of the modifications proposed for solving practical problems, allows the following to be stated. The complete group of compared combined events can be selected subjectively by the decision maker. This choice is based on the objective initial data taken into account that are characteristic of the typical setting (the specific types and number of elementary events, i.e. the hypotheses being evaluated and the evidence). Of practical interest is the subjective choice of other variants of the complete group of combined events compared by degree of possibility; this provides a significant variety of formula relations when constructing non-traditional variants of modifications of Bayes' formula. This, in turn, can serve as a basis for improving the mathematical support of software implementations, as well as for expanding the scope of application of the new formula relations for solving applied problems.

Bibliography

1. Gnedenko, B. V., Khinchin, A. Ya. An Elementary Introduction to the Theory of Probability. New York: Dover Publications, 1962. 144 p.

2. Ventzel, E. S. Theory of Probability. 10th ed., stereotyped. Moscow: Vysshaya Shkola, 2006. 575 p.

3. Andronov, A. M., Kopytov, E. A., Gringlaz, L. Ya. Probability Theory and Mathematical Statistics. St. Petersburg: Piter, 2004. 481 p.

4. Zmitrovich, A. I. Intelligent Information Systems. Minsk: TetraSystems, 1997. 496 p.

5. Chernorutsky, I. G. Decision-Making Methods. St. Petersburg: BHV-Petersburg, 2005. 416 p.

6. Naylor, C.-M. Build Your Own Expert System. Chichester: John Wiley & Sons, 1987. 289 p.

7. Romanov, V. P. Intelligent Information Systems in Economics. 2nd ed., stereotyped. Moscow: Ekzamen, 2007. 496 p.

8. Muromtsev, D. Yu., et al. Economic Efficiency and Competitiveness. Tambov: Tambov State Technical University Publishing House, 2007. 96 p.

9. Dolgov, A. I. Correct modifications of the Bayes formula for parallel programming. In: Supercomputer Technologies: Proceedings of the 3rd All-Russian Scientific-Technical Conference. Rostov-on-Don, 2014, vol. 1, pp. 122-126.

10. Dolgov, A. I. On the correctness of modifications of the Bayes formula. Vestnik of Don State Technical University, 2014, vol. 14, no. 3 (78), pp. 13-20.


Who is Bayes? And what does he have to do with management? A completely fair question may follow. For now, take my word for it: this is very important!.. and interesting (at least to me).

What is the paradigm in which most managers operate? If I observe something, what conclusions can I draw from it? And what does Bayes teach? What must really be the case for me to observe this something? This is exactly how all sciences develop, and he writes about this (I quote from memory): a person who does not have a theory in his head will shy away from one idea to another under the influence of various events (observations). It is not for nothing that they say there is nothing more practical than a good theory.

Example from practice. My subordinate makes a mistake, and my colleague (the head of another department) says that it would be necessary to exert managerial influence on the negligent employee (in other words, punish/scold). And I know that this employee performs 4–5 thousand of the same type of operations per month, and during this time makes no more than 10 mistakes. Do you feel the difference in the paradigm? My colleague reacts to the observation, and I have a priori knowledge that the employee makes a certain number of mistakes, so another one did not affect this knowledge... Now, if at the end of the month it turns out that there are, for example, 15 such mistakes!.. This will already be a reason to study the reasons for non-compliance with standards.

Convinced of the importance of the Bayesian approach? Intrigued? I hope so. And now, the fly in the ointment. Unfortunately, Bayesian ideas rarely come across on the first try. I was frankly unlucky, since I first became acquainted with these ideas through popular literature, after reading which many questions remained. When planning to write this note, I collected everything I had previously noted down about Bayes and also studied what is written on the Internet. I present to you my best attempt at the topic: Introduction to Bayesian Probability.

Derivation of Bayes' theorem

Consider the following experiment: we name an arbitrary number lying on the segment [0, 1] and record when this number falls, for example, between 0.1 and 0.4 (Fig. 1a). The probability of this event is equal to the ratio of the length of the segment [0.1; 0.4] to the total length of the segment [0, 1], provided that numbers appear on the segment with equal probability. Mathematically this can be written as p(0.1 ≤ x ≤ 0.4) = 0.3, or briefly P(X) = 0.3, where P is the probability and X is the random variable falling in the range [0.1; 0.4]. That is, the probability of hitting the segment [0.1; 0.4] is 30%.

Fig. 1. Graphical interpretation of probabilities

Now consider the unit square [0, 1] × [0, 1] (Fig. 1b). Suppose we have to name pairs of numbers (x, y), each of which is greater than zero and less than one. The probability that x (the first number) falls within the segment [0.1; 0.4] (blue area 1) is equal to the ratio of the area of the blue area to the area of the entire square, that is, (0.4 - 0.1) * (1 - 0) / (1 * 1) = 0.3, the same 30%. The probability that y falls inside the segment [0.5; 0.7] (green area 2) is equal to the ratio of the area of the green area to the area of the entire square: p(0.5 ≤ y ≤ 0.7) = 0.2, or briefly P(Y) = 0.2.

What can we say about the values of x and y simultaneously? For example, what is the probability that both x and y fall in their respective segments at the same time? To find out, one needs to calculate the ratio of the area of region 3 (the intersection of the green and blue stripes) to the area of the entire square: p(X, Y) = (0.4 - 0.1) * (0.7 - 0.5) / (1 * 1) = 0.06.

Now suppose we want to know the probability that y is in the interval [0.5; 0.7] given that x is already in the range [0.1; 0.4]. That is, in effect we have a filter: when we name pairs (x, y), we immediately discard the pairs that do not satisfy the condition that x lies in the given interval, and then, among the filtered pairs, we count those for which y satisfies our condition, taking the probability as the ratio of the number of pairs for which y lies in the above segment to the total number of filtered pairs (that is, for which x lies in the segment [0.1; 0.4]). We can write this probability as p(Y|X), "the probability that Y is in [0.5; 0.7] given that X hit the range [0.1; 0.4]." Obviously, this probability is equal to the ratio of the area of region 3 to the area of blue region 1. The area of region 3 is (0.4 - 0.1) * (0.7 - 0.5) = 0.06, and the area of blue region 1 is (0.4 - 0.1) * (1 - 0) = 0.3, so their ratio is 0.06 / 0.3 = 0.2. In other words, the probability of finding y in the segment [0.5; 0.7], given that x belongs to the segment [0.1; 0.4], is p(Y|X) = 0.2.

In the previous paragraph we actually formulated the identity: p(Y|X) = p(X, Y) / p(X). It reads: "the probability that y falls in the range [0.5; 0.7], given that x hit the range [0.1; 0.4], equals the ratio of the probability that x and y simultaneously fall in their ranges to the probability that x falls in its range."

By analogy, consider the probability p(X|Y). We name pairs (x, y) and filter those for which y lies between 0.5 and 0.7; then the probability that x is in the interval [0.1; 0.4], given that y belongs to the segment [0.5; 0.7], is equal to the ratio of the area of region 3 to the area of green region 2: p(X|Y) = p(X, Y) / p(Y).

Note that the probabilities p(X, Y) and p(Y, X) are equal, and both equal the ratio of the area of zone 3 to the area of the entire square, but the probabilities p(Y|X) and p(X|Y) are not equal: the probability p(Y|X) is the ratio of the area of region 3 to region 1, while p(X|Y) is that of region 3 to region 2. Note also that p(X, Y) is often denoted p(X&Y).
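A quick Monte Carlo check (my own illustration, not part of the original note) of these areas and conditional probabilities:

```python
import random

N = 1_000_000
in_x = in_y = in_both = 0
for _ in range(N):
    x, y = random.random(), random.random()   # a random point in the unit square
    a = 0.1 <= x <= 0.4                       # event X
    b = 0.5 <= y <= 0.7                       # event Y
    in_x += a
    in_y += b
    in_both += a and b

print(in_x / N, in_y / N, in_both / N)   # ~0.3, ~0.2, ~0.06
print(in_both / in_x, in_both / in_y)    # p(Y|X) ~ 0.2, p(X|Y) ~ 0.3
```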

So we have introduced two definitions: p(Y|X) = p(X, Y) / p(X) and p(X|Y) = p(X, Y) / p(Y).

Let us rewrite these equalities in the form: p(X, Y) = p(Y|X) * p(X) and p(X, Y) = p(X|Y) * p(Y).

Since the left-hand sides are equal, the right-hand sides are equal too: p(Y|X) * p(X) = p(X|Y) * p(Y).

Or we can rewrite the last equality as: p(Y|X) = p(X|Y) * p(Y) / p(X).

This is Bayes' theorem!

Do such simple (almost tautological) transformations really give rise to a great theorem? Don't rush to conclusions. Let's talk again about what we got. There was a certain initial (prior) probability P(X) that the random variable X, uniformly distributed on the segment [0, 1], falls within the range [0.1; 0.4]. An event Y occurred, as a result of which we obtained the posterior probability of the same random variable X, P(X|Y), and this probability differs from P(X) by the coefficient p(Y|X) / p(Y). The event Y is called evidence that to a greater or lesser extent confirms or refutes X. This coefficient is sometimes called the power of the evidence. The stronger the evidence, the more the fact of observing Y changes the prior probability and the more the posterior probability differs from the prior. If the evidence is weak, the posterior probability is almost equal to the prior.

Bayes' formula for discrete random variables

In the previous section we derived Bayes' formula for continuous random variables x and y defined on the interval [0, 1]. Let's now consider an example with discrete random variables, each taking two possible values. During routine medical examinations it was found that, at the age of forty, 1% of women suffer from breast cancer. 80% of women with cancer receive positive mammogram results. 9.6% of healthy women also receive positive mammogram results. During an examination, a woman in this age group received a positive mammography result. What is the probability that she actually has breast cancer?

The line of reasoning/calculation is as follows. Of the 1% of women with cancer, mammography gives positive results for 80%: 1% * 80% = 0.8%. Of the 99% of healthy women, mammography gives positive results for 9.6%: 99% * 9.6% = 9.504%. In total, of the 10.304% (9.504% + 0.8%) with positive mammography results, only 0.8% are sick and the remaining 9.504% are healthy. Thus, the probability that a woman with a positive mammogram has cancer is 0.8% / 10.304% = 7.764%. Did you think it was 80% or so?
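The same arithmetic in a few lines of Python (my own sketch of the calculation above):

```python
p_cancer = 0.01                 # prior probability of breast cancer
p_pos_sick = 0.80               # P(positive | cancer)
p_pos_healthy = 0.096           # P(positive | healthy)

p_pos = p_cancer * p_pos_sick + (1 - p_cancer) * p_pos_healthy   # total probability of a positive result
p_cancer_pos = p_cancer * p_pos_sick / p_pos                     # posterior probability of cancer

print(round(p_cancer * p_pos_sick, 5), round((1 - p_cancer) * p_pos_healthy, 5))  # 0.008, 0.09504
print(round(p_cancer_pos, 5))   # 0.07764, i.e. about 7.8%
```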

In our example, Bayes' formula takes the following form:

p(X_1|Y_1) = p(X_1) * p(Y_1|X_1) / [p(X_1) * p(Y_1|X_1) + p(X_2) * p(Y_1|X_2)].

Let's talk about the "physical" meaning of this formula once more. X is a random variable (the diagnosis), taking the values X_1 (sick) and X_2 (healthy); Y is a random variable (the measurement result, i.e. mammography), taking the values Y_1 (positive result) and Y_2 (negative result); p(X_1) is the probability of illness before the mammography (the prior probability), equal to 1%; p(Y_1|X_1) is the probability of a positive result if the patient is sick (a conditional probability, since it must be specified in the conditions of the problem), equal to 80%; p(Y_1|X_2) is the probability of a positive result if the patient is healthy (also a conditional probability), equal to 9.6%; p(X_2) is the probability that the patient is healthy before the mammography (the prior probability), equal to 99%; p(X_1|Y_1) is the probability that the patient is sick given a positive mammography result (the posterior probability).

It can be seen that the posterior probability (what we are looking for) is proportional to the prior (initial) probability with a slightly more complex coefficient, p(Y_1|X_1) / p(Y_1). Let me emphasize this again. In my opinion, this is a fundamental aspect of the Bayesian approach. The measurement (Y) added a certain amount of information to what was initially available (a priori), which refined our knowledge about the object.

Examples

To consolidate the material you have covered, try solving several problems.

Example 1. There are 3 urns; in the first there are 3 white balls and 1 black; in the second - 2 white balls and 3 black; in the third there are 3 white balls. Someone approaches one of the urns at random and takes out 1 ball from it. This ball turned out to be white. Find the posterior probabilities that the ball is drawn from the 1st, 2nd, 3rd urn.

Solution. We have three hypotheses: H 1 = (the first urn is selected), H 2 = (the second urn is selected), H 3 = (the third urn is selected). Since the urn is chosen at random, the a priori probabilities of the hypotheses are equal: P(H 1) = P(H 2) = P(H 3) = 1/3.

As a result of the experiment, the event A = appeared (a white ball was drawn from the selected urn). Conditional probabilities of event A under hypotheses H 1, H 2, H 3: P(A|H 1) = 3/4, P(A|H 2) = 2/5, P(A|H 3) = 1. For example , the first equality reads like this: “the probability of drawing a white ball if the first urn is chosen is 3/4 (since there are 4 balls in the first urn, and 3 of them are white).”

Using Bayes' formula, we find the posterior probabilities of the hypotheses:

P(H_1|A) = (1/3 * 3/4) / (1/3 * 3/4 + 1/3 * 2/5 + 1/3 * 1) = 15/43 ≈ 0.35,
P(H_2|A) = (1/3 * 2/5) / (1/3 * 3/4 + 1/3 * 2/5 + 1/3 * 1) = 8/43 ≈ 0.19,
P(H_3|A) = (1/3 * 1) / (1/3 * 3/4 + 1/3 * 2/5 + 1/3 * 1) = 20/43 ≈ 0.47.

Thus, in the light of information about the occurrence of event A, the probabilities of the hypotheses changed: hypothesis H 3 became the most probable, hypothesis H 2 became the least probable.

Example 2. Two shooters independently shoot at the same target, each firing one shot. The probability of hitting the target for the first shooter is 0.8, for the second - 0.4. After shooting, one hole was found in the target. Find the probability that this hole belongs to the first shooter (The outcome (both holes coincided) is discarded as negligibly unlikely).

Solution. Before the experiment, the following hypotheses are possible: H_1 = (neither the first nor the second shooter hits), H_2 = (both shooters hit), H_3 = (the first shooter hits and the second does not), H_4 = (the first shooter does not hit and the second does). The prior probabilities of the hypotheses:

P(H 1) = 0.2*0.6 = 0.12; P(H2) = 0.8*0.4 = 0.32; P (H 3) = 0.8 * 0.6 = 0.48; P(H 4) = 0.2*0.4 = 0.08.

The conditional probabilities of the observed event A = (there is one hole in the target) under these hypotheses are equal: P(A|H 1) = P(A|H 2) = 0; P(A|H 3) = P(A|H 4) = 1

After the experiment, hypotheses H_1 and H_2 become impossible, and the posterior probabilities of hypotheses H_3 and H_4 according to Bayes' formula are:

P(H_3|A) = 0.48 / (0.48 + 0.08) = 6/7 ≈ 0.857,   P(H_4|A) = 0.08 / (0.48 + 0.08) = 1/7 ≈ 0.143.
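A short sketch (my own) of this example: the priors over the four hypotheses are built from the two independent shots, and the update uses the observation "exactly one hole was found":

```python
p1, p2 = 0.8, 0.4   # hit probabilities of the first and second shooter
priors = {
    "H1: both miss":        (1 - p1) * (1 - p2),   # 0.12
    "H2: both hit":         p1 * p2,               # 0.32
    "H3: only first hits":  p1 * (1 - p2),         # 0.48
    "H4: only second hits": (1 - p1) * p2,         # 0.08
}
one_hole = {"H1: both miss": 0, "H2: both hit": 0,
            "H3: only first hits": 1, "H4: only second hits": 1}   # P(A | H_i)

total = sum(priors[h] * one_hole[h] for h in priors)   # P(A) = 0.56
for h in priors:
    print(h, round(priors[h] * one_hole[h] / total, 3))
# H3 -> 0.857, H4 -> 0.143, the others -> 0
```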

Bayes against spam

Bayes' formula has found wide application in the development of spam filters. Suppose you want to train a computer to determine which emails are spam. We will rely on a dictionary of words and phrases, using Bayesian estimates. Let us first create the space of hypotheses. For any letter we have 2 hypotheses: H_A - it is spam, H_B - it is not spam but a normal, needed letter.

First, let's "train" our future anti-spam system. Take all the letters we have and divide them into two "piles" of 10 letters each. Put the spam letters in one and call it the H_A pile; put the needed correspondence in the other and call it the H_B pile. Now let's see which words and phrases occur in spam and in needed letters, and with what frequency. We will call these words and phrases evidence and denote them E_1, E_2, …. It turns out that commonly used words (for example, the words "like" and "your") occur in the piles H_A and H_B with approximately the same frequency. Thus, the presence of these words in a letter tells us nothing about which pile to assign it to (weak evidence). Let's assign these words a neutral "spam" probability score, say 0.5.

Let the phrase “spoken English” appear in only 10 letters, and more often in spam letters (for example, in 7 spam letters out of all 10) than in necessary ones (in 3 out of 10). Let's give this phrase a higher rating for spam: 7/10, and a lower rating for normal emails: 3/10. Conversely, it turned out that the word “buddy” appeared more often in normal letters (6 out of 10). And then we received a short letter: “My friend! How is your spoken English?”. Let's try to evaluate its “spammyness”. We will give general estimates P(H A), P(H B) of a letter’s belonging to each heap using a somewhat simplified Bayes formula and our approximate estimates:

P(H_A) = A / (A + B), where A = p_a1 * p_a2 * … * p_an, B = p_b1 * p_b2 * … * p_bn = (1 - p_a1) * (1 - p_a2) * … * (1 - p_an).

Table 1. Simplified (and incomplete) Bayesian estimate for the letter.

Thus, our hypothetical letter received a probability of belonging score with an emphasis on “spammy”. Can we decide to throw the letter into one of the piles? Let's set decision thresholds:

  • We will assume that the letter belongs to the heap H i if P(H i) ≥ T.
  • A letter does not belong to the heap if P(H i) ≤ L.
  • If L ≤ P(H i) ≤ T, then no decision can be made.

You can take T = 0.95 and L = 0.05. Since for the letter in question 0.05 < P(H_A) < 0.95 and 0.05 < P(H_B) < 0.95, we cannot decide where to assign this letter: to spam (H_A) or to the needed letters (H_B). Can the estimate be improved by using more information?

Yes. Let's calculate the score for each piece of evidence in a different way, just as Bayes actually proposed. Let:

F_a be the total number of spam letters;

F_ai be the number of letters containing evidence i in the spam pile;

F_b be the total number of needed letters;

F_bi be the number of letters containing evidence i in the pile of needed (relevant) letters.

Then: p_ai = F_ai / F_a, p_bi = F_bi / F_b, P(H_A) = A / (A + B), P(H_B) = B / (A + B), where A = p_a1 * p_a2 * … * p_an, B = p_b1 * p_b2 * … * p_bn.

Please note that assessments of evidence words p ai and p bi have become objective and can be calculated without human intervention.

Table 2. More accurate (but incomplete) Bayesian estimate based on the available features of the letter

We received a very definite result - with a large advantage, the letter can be classified as the right letter, since P(H B) = 0.997 > T = 0.95. Why did the result change? Because we used more information - we took into account the number of letters in each of the piles and, by the way, determined the estimates p ai and p bi much more correctly. They were determined as Bayes himself did, by calculating conditional probabilities. In other words, p a3 is the probability of the word “buddy” appearing in a letter, provided that this letter already belongs to the spam heap H A . The result was not long in coming - it seems that we can make a decision with greater certainty.
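A sketch of this counting-based estimate in Python. Note that the article's tables are not reproduced here, so the word counts below are hypothetical placeholders; only the formulas p_ai = F_ai/F_a, p_bi = F_bi/F_b, P(H_A) = A/(A+B) follow the text, and the output will not match Table 2.

```python
F_a, F_b = 10, 10                       # total spam letters / total needed letters
spam_counts   = {"buddy": 4, "your": 5, "spoken English": 7}   # F_ai ("buddy" count is hypothetical)
needed_counts = {"buddy": 6, "your": 5, "spoken English": 3}   # F_bi

letter_words = ["buddy", "your", "spoken English"]   # evidence extracted from the letter

A = B = 1.0
for w in letter_words:
    A *= spam_counts[w] / F_a       # p_ai = F_ai / F_a
    B *= needed_counts[w] / F_b     # p_bi = F_bi / F_b

P_HA, P_HB = A / (A + B), B / (A + B)
print(round(P_HA, 3), round(P_HB, 3))

T, L = 0.95, 0.05                   # decision thresholds from the text
verdict = "spam" if P_HA >= T else ("needed" if P_HA <= L else "no decision")
print(verdict)
```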

Bayes against corporate fraud

An interesting application of the Bayesian approach was described by MAGNUS8.

My current project (IS for detecting fraud at a manufacturing enterprise) uses the Bayes formula to determine the probability of fraud (fraud) in the presence/absence of several facts that indirectly testify in favor of the hypothesis about the possibility of committing fraud. The algorithm is self-learning (with feedback), i.e. recalculates its coefficients (conditional probabilities) upon actual confirmation or non-confirmation of fraud during an inspection by the economic security service.

It’s probably worth saying that such methods when designing algorithms require a fairly high mathematical culture of the developer, because the slightest error in the derivation and/or implementation of computational formulas will nullify and discredit the entire method. Probabilistic methods are especially prone to this, since human thinking is not adapted to work with probabilistic categories and, accordingly, there is no “visibility” and understanding of the “physical meaning” of intermediate and final probabilistic parameters. This understanding exists only for the basic concepts of probability theory, and then you just need to very carefully combine and derive complex things according to the laws of probability theory - common sense will no longer help for composite objects. This, in particular, is associated with quite serious methodological battles taking place on the pages of modern books on the philosophy of probability, as well as a large number of sophisms, paradoxes and curious puzzles on this topic.

Another nuance I have had to face is that, unfortunately, almost everything that is even more or less USEFUL IN PRACTICE on this topic is written in English. In Russian-language sources there is mainly only the well-known theory, with demonstration examples for the most primitive cases only.

I completely agree with the last remark. For example, a Google search for something like "Bayesian probability book" did not produce anything intelligible. True, it did report that a book on Bayesian statistics had been banned in China. (Statistics professor Andrew Gelman reported on a Columbia University blog that his book Data Analysis Using Regression and Multilevel/Hierarchical Models was banned from publication in China; the publisher there reported that "the book was not approved by the authorities due to various politically sensitive material in the text.") I wonder whether a similar reason explains the lack of books on Bayesian probability in Russia?

Conservatism in human information processing

Probabilities quantify uncertainty. Probability, according both to Bayes and to our intuition, is simply a number between zero and one that represents the degree to which a somewhat idealized person believes a statement to be true. The person is somewhat idealized because the sum of his probabilities for two mutually exclusive events must equal his probability that either of the events will occur. The property of additivity has such consequences that few real people can satisfy all of them.

Bayes' theorem is a trivial consequence of the property of additivity, indisputable and agreed upon by all probabilists, Bayesian and otherwise. One way to write it is as follows. If P(H A |D) is the posterior probability of hypothesis A after datum D has been observed, P(H A) is its prior probability before datum D was observed, P(D|H A) is the probability that datum D will be observed if H A is true, and P(D) is the unconditional probability of datum D, then

(1) P(H A |D) = P(D|H A) * P(H A) / P(D)

P(D) is best thought of as a normalizing constant that makes the posterior probabilities add up to one over the exhaustive set of mutually exclusive hypotheses under consideration. If it needs to be calculated, it can be computed as:

P(D) = P(D|H A) * P(H A) + P(D|H B) * P(H B) + … (the sum running over all the hypotheses considered)

But more often P(D) is eliminated rather than calculated. A convenient way to eliminate it is to transform Bayes' theorem into its odds-likelihood-ratio form.

Consider another hypothesis, H B , which is mutually exclusive with H A , and change your mind about it based on the same given quantity that changed your mind about H A . Bayes' theorem says that

(2) P(H B |D) = P(D|H B) * P(H B) / P(D)

Now let's divide Equation 1 by Equation 2; the result is:

(3) Ω 1 = Ω 0 * L,

where Ω 1 = P(H A |D) / P(H B |D) is the posterior odds in favor of H A over H B, Ω 0 = P(H A) / P(H B) is the prior odds, and L = P(D|H A) / P(D|H B) is the quantity familiar to statisticians as the likelihood ratio. Equation 3 is just as relevant a version of Bayes' theorem as Equation 1 and is often considerably more useful, especially for experiments involving hypotheses. Bayesians argue that Bayes' theorem is a formally optimal rule for how to revise opinions in the light of new evidence.

We are interested in comparing the ideal behavior defined by Bayes' theorem with the actual behavior of people. To give you some idea of what this means, let us try an experiment with you as the subject. This bag contains 1000 poker chips. I have two such bags, one containing 700 red and 300 blue chips, and the other containing 300 red and 700 blue. I tossed a coin to decide which one to use, so, if our opinions agree, your current probability that this is the bag containing mostly red chips is 0.5. Now you draw a random sample, with replacement after each chip. In 12 chips you get 8 red and 4 blue. Now, on the basis of everything you know, what is the probability that this is the predominantly red bag? Please do not read further until you have written down your estimate.

If you are like a typical subject, your estimate fell in the range 0.7 to 0.8. The correct calculation, however, gives 0.97. It is indeed very rare for a person who has not previously been shown the influence of conservatism to arrive at such a high estimate, even if he was familiar with Bayes' theorem.

If the proportion of red chips in the bag is p, then the probability of obtaining r red chips and (n - r) blue ones in n draws with replacement is proportional to p^r (1 - p)^(n - r). So, in the typical bag-and-poker-chips experiment, if H A means that the proportion of red chips is p A and H B means that the proportion is p B, then the likelihood ratio is:

L = [p A^r (1 - p A)^(n - r)] / [p B^r (1 - p B)^(n - r)]
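
A quick way to check the 0.97 figure from the bag-and-chips example is sketched below; the numbers (700/300 composition, 8 red chips in 12 draws, prior 0.5) are taken from the text, and the code simply applies the odds form of Bayes' theorem (Equation 3).

# Sketch: posterior probability of the predominantly red bag.
p_A, p_B = 0.7, 0.3      # proportion of red chips under H A and H B
r, n = 8, 12             # 8 red chips in 12 draws with replacement

prior_odds = 0.5 / 0.5                                                        # Omega 0
likelihood_ratio = (p_A**r * (1 - p_A)**(n - r)) / (p_B**r * (1 - p_B)**(n - r))  # L
posterior_odds = prior_odds * likelihood_ratio                                # Omega 1
posterior = posterior_odds / (1 + posterior_odds)                             # P(H A | D)
print(round(posterior, 3))   # about 0.967, i.e. roughly 0.97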

When applying Bayes' formula, one needs to consider only the probability of the actual observation, and not the probabilities of other observations that might have been made but were not. This principle has broad implications for all statistical and non-statistical applications of Bayes' theorem; it is the most important technical tool of Bayesian reasoning.

Bayesian revolution

Your friends and colleagues are talking about something called "Bayes' theorem" or "Bayes' rule", or about something called Bayesian reasoning. They are really interested in it, so you go online and find a page about Bayes' theorem and... it is an equation. And that is all... Why does a mathematical concept create such enthusiasm in people's minds? What kind of "Bayesian revolution" is happening among scientists, and why is it argued that even the experimental method itself can be described as a special case of it? What is the secret that the Bayesians know? What kind of light have they seen?

The Bayesian revolution in science did not happen because more and more cognitive scientists suddenly began to notice that mental phenomena have a Bayesian structure, nor because scientists in every field began to use the Bayesian method, but because science itself is a special case of Bayes' theorem: experimental evidence is Bayesian evidence. Bayesian revolutionaries argue that when you perform an experiment and obtain evidence that "confirms" or "refutes" your theory, that confirmation or refutation occurs according to Bayesian rules. For example, you must take into account not only that your theory can explain a phenomenon, but also that there are other possible explanations that can predict that phenomenon as well.

Previously, the most popular philosophy of science was Karl Popper's falsificationism; this is the old philosophy that the Bayesian revolution has displaced. Popper's idea that theories can be definitely falsified but never definitely confirmed is another special case of the Bayesian rules: if p(X|A) ≈ 1, i.e. if the theory makes correct predictions, then observing ~X falsifies A very strongly. On the other hand, if p(X|A) ≈ 1 and we observe X, this does not strongly confirm the theory; some other condition B may be possible such that p(X|B) ≈ 1, in which case observing X does not testify in favor of A but in favor of B. For observing X to definitely confirm A, we would have to know not that p(X|A) ≈ 1, but that p(X|~A) ≈ 0, which we cannot know because we cannot consider all possible alternative explanations. For example, when Einstein's theory of general relativity superseded Newton's well-supported theory of gravity, it made all the predictions of Newton's theory a special case of its own predictions.

In a similar way, Popper's demand that an idea must be falsifiable can be interpreted as a manifestation of the Bayesian rule of conservation of probability: if result X is positive evidence for the theory, then result ~X must disconfirm the theory to some extent. If you try to interpret both X and ~X as "confirming" the theory, the Bayesian rules say this is impossible! To increase the probability of a theory you must subject it to tests that can potentially reduce its probability; this is not just a rule for exposing charlatans in science, but a corollary of Bayesian probability theory. On the other hand, Popper's idea that only falsification matters and confirmation counts for nothing is incorrect. Bayes' theorem shows that falsification is very strong evidence compared with confirmation, but falsification is still probabilistic in nature; it is not governed by fundamentally different rules and in this respect does not differ from confirmation, contrary to what Popper claimed.

Thus, we find that many phenomena in the cognitive sciences, plus the statistical methods used by scientists, plus the scientific method itself, are all special cases of Bayes' theorem. This is the Bayesian revolution.

Welcome to the Bayesian Conspiracy!

Literature on Bayesian probability

2. A great many different applications of Bayes are described by the Nobel laureate in economics Daniel Kahneman (and his colleagues) in a wonderful book. In my brief summary of that very large book alone I counted 27 mentions of the name of the Presbyterian minister. Minimal formulas. (…) I really liked it. True, it is a bit difficult and there is a lot of mathematics (and where would we be without it), but individual chapters (for example, Chapter 4, Information) are clearly on topic. I recommend it to everyone. Even if mathematics is hard for you, read every other line, skipping the math, and fish out the useful grains...

14. (Addition dated January 15, 2017.) A chapter from Tony Crilly's book 50 Mathematical Ideas You Really Need to Know.

Nobel laureate physicist Richard Feynman, speaking of one philosopher with a particularly high opinion of himself, once said: "What irritates me is not philosophy as a science, but the pomposity that is created around it. If only philosophers could laugh at themselves! If only they could say: 'I say it is like this, but Von Leipzig thought it was different, and he also knows something about it.' If only they remembered to clarify that it is only their guess."

Events form a complete group if at least one of them will definitely occur as a result of the experiment and they are pairwise incompatible.

Let us assume that event A can occur only together with one of several pairwise incompatible events H 1, H 2, …, H n forming a complete group. We will call the events H i (i = 1, 2, …, n) hypotheses; their probabilities are known before the experiment (a priori). The probability of occurrence of event A is given by the total probability formula:

P(A) = P(H 1) * P(A|H 1) + P(H 2) * P(A|H 2) + … + P(H n) * P(A|H n)

Example 16. There are three urns. The first urn contains 5 white and 3 black balls, the second contains 4 white and 4 black balls, and the third contains 8 white balls. One of the urns is selected at random (this could mean, for example, that the choice is made from an auxiliary urn containing three balls numbered 1, 2 and 3). A ball is drawn at random from this urn. What is the probability that it will be black?

Solution. Event A – a black ball is drawn. If it were known from which urn the ball was drawn, the desired probability could be calculated using the classical definition of probability. Let us introduce assumptions (hypotheses) about which urn the ball is drawn from.

The ball can be drawn either from the first urn (hypothesis H 1), or from the second (hypothesis H 2), or from the third (hypothesis H 3). Since there are equal chances of choosing any of the urns, P(H 1) = P(H 2) = P(H 3) = 1/3.

The conditional probabilities of drawing a black ball are P(A|H 1) = 3/8, P(A|H 2) = 4/8 = 1/2, P(A|H 3) = 0. It follows that

P(A) = 1/3 * (3/8 + 1/2 + 0) = 7/24 ≈ 0.29.

Example 17. Electric lamps are manufactured at three plants. The first plant produces 30% of the total number of electric lamps, the second 25%, and the third the rest. The products of the first plant contain 1% defective electric lamps, the second 1.5%, the third 2%. The store receives products from all three plants. What is the probability that a lamp purchased in the store turns out to be defective?

Solution. Assumptions must be made about which plant the lamp was manufactured at; knowing this, we can find the probability that it is defective. Let us introduce notation for the events: A – the purchased electric lamp turned out to be defective; H 1 – the lamp was manufactured by the first plant; H 2 – the lamp was manufactured by the second plant; H 3 – the lamp was manufactured by the third plant.

We find the desired probability using the total probability formula:

P(A) = P(H 1) * P(A|H 1) + P(H 2) * P(A|H 2) + P(H 3) * P(A|H 3) = 0.30 * 0.01 + 0.25 * 0.015 + 0.45 * 0.02 = 0.01575.
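
The same total-probability calculation can be sketched in a few lines of Python; the numbers are exactly those from the problem statement.

# Sketch: total probability of buying a defective lamp (Example 17).
shares = [0.30, 0.25, 0.45]          # shares of plants 1, 2, 3 in production
defect_rates = [0.01, 0.015, 0.02]   # P(defective | plant i)

P_A = sum(s * d for s, d in zip(shares, defect_rates))
print(round(P_A, 5))   # 0.01575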

Bayes' formula. Let H 1, H 2, …, H n be a complete group of pairwise incompatible events (hypotheses), and let A be a random event. Then

P(H i |A) = P(H i) * P(A|H i) / [P(H 1) * P(A|H 1) + P(H 2) * P(A|H 2) + … + P(H n) * P(A|H n)].

This formula, which allows one to re-estimate the probabilities of the hypotheses once the result of the trial in which event A occurred is known, is called Bayes' formula.

Example 18. On average, 50% of patients admitted to a specialized hospital have disease K, 30% have disease L, and 20% have disease M. The probability of a complete cure of disease K is 0.7; for diseases L and M these probabilities are 0.8 and 0.9, respectively. A patient admitted to the hospital was discharged healthy. Find the probability that this patient suffered from disease K.


Solution. Let us introduce the hypotheses: H 1 – the patient suffered from disease K; H 2 – the patient suffered from disease L; H 3 – the patient suffered from disease M.

Then, according to the conditions of the problem, P(H 1) = 0.5, P(H 2) = 0.3, P(H 3) = 0.2. Let us introduce the event A – the patient admitted to the hospital was discharged healthy. By condition, P(A|H 1) = 0.7, P(A|H 2) = 0.8, P(A|H 3) = 0.9.

Using the total probability formula we get:

P(A) = 0.5 * 0.7 + 0.3 * 0.8 + 0.2 * 0.9 = 0.77.

According to Bayes' formula, P(H 1 |A) = 0.5 * 0.7 / 0.77 = 0.35 / 0.77 ≈ 0.45.
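
A short sketch of the same computation; the priors and cure probabilities are taken from the problem statement.

# Sketch: posterior probability that a patient discharged healthy had disease K (Example 18).
priors = [0.5, 0.3, 0.2]   # P(K), P(L), P(M)
cure = [0.7, 0.8, 0.9]     # P(healthy | K), P(healthy | L), P(healthy | M)

P_A = sum(p * c for p, c in zip(priors, cure))   # total probability, 0.77
posterior_K = priors[0] * cure[0] / P_A          # Bayes' formula
print(round(posterior_K, 2))                     # about 0.45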

Example 19. Let there be five balls in an urn, and let all assumptions about the number of white balls be equally possible. A ball taken at random from the urn turns out to be white. Which assumption about the initial composition of the urn is most probable?

Solution. Let H i be the hypothesis that there are i white balls in the urn (i = 0, 1, 2, 3, 4, 5), i.e., six assumptions can be made. Then, according to the conditions of the problem, P(H i) = 1/6.

Let us introduce the event A – a ball taken at random is white. Let us calculate P(A). Since P(A|H i) = i/5, by the total probability formula P(A) = (1/6) * (0/5 + 1/5 + 2/5 + 3/5 + 4/5 + 5/5) = 1/2, and according to Bayes' formula we have:

P(H i |A) = (1/6) * (i/5) / (1/2) = i/15.

Thus, the most probable hypothesis is H 5 (all five balls are white), because P(H 5 |A) = 5/15 = 1/3 is the largest of the posterior probabilities.
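
A small sketch enumerating all six posterior probabilities; the priors 1/6 and likelihoods i/5 come directly from the solution above.

# Sketch: posterior probabilities of the six urn compositions (Example 19).
priors = [1 / 6] * 6                       # P(H i), i = 0..5 white balls
likelihoods = [i / 5 for i in range(6)]    # P(white | H i)

P_A = sum(p * l for p, l in zip(priors, likelihoods))            # 0.5
posteriors = [p * l / P_A for p, l in zip(priors, likelihoods)]  # i/15 for i = 0..5
print(posteriors.index(max(posteriors)), round(max(posteriors), 3))  # 5, about 0.333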

Example 20. Two of the three independently operating elements of a computing device have failed. Find the probability that the first and second elements failed, given that the probabilities of failure of the first, second and third elements are, respectively, 0.2, 0.4 and 0.3.

Solution. Let us denote by A the event that two elements have failed. The following hypotheses can be made:

H 1 – the first and second elements have failed, but the third is operational; H 2 – the first and third elements have failed, but the second is operational; H 3 – the second and third elements have failed, but the first is operational. Since the elements operate independently, the multiplication theorem applies:

P(H 1) = 0.2 * 0.4 * 0.7 = 0.056, P(H 2) = 0.2 * 0.6 * 0.3 = 0.036, P(H 3) = 0.8 * 0.4 * 0.3 = 0.096.

The desired probability is P(H 1 |A) = 0.056 / (0.056 + 0.036 + 0.096) = 0.056 / 0.188 ≈ 0.3.
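
A short numerical check of Example 20; the failure probabilities 0.2, 0.4 and 0.3 are those given in the statement.

# Sketch: probability that exactly the first and second elements failed,
# given that exactly two of the three elements failed (Example 20).
q = [0.2, 0.4, 0.3]        # failure probabilities of elements 1, 2, 3
p = [1 - x for x in q]     # probabilities of working

H1 = q[0] * q[1] * p[2]    # elements 1 and 2 failed, 3 works: 0.056
H2 = q[0] * p[1] * q[2]    # elements 1 and 3 failed, 2 works: 0.036
H3 = p[0] * q[1] * q[2]    # elements 2 and 3 failed, 1 works: 0.096

print(round(H1 / (H1 + H2 + H3), 3))   # about 0.298, i.e. roughly 0.3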
