Let's build a confidence interval in MS EXCEL for estimating the mean value of the distribution in the case of a known value of the variance.

The choice of confidence level depends entirely on the task at hand. For example, an air passenger's confidence in the reliability of an aircraft should certainly be higher than a buyer's confidence in the reliability of a light bulb.

Task Formulation

Suppose a sample of size n is drawn from a population. The standard deviation σ of this distribution is assumed to be known. Based on this sample, we need to estimate the unknown mean of the distribution (μ) and construct the corresponding two-sided confidence interval.

Point Estimation

As is known, the sample mean statistic (denote it X̄) is an unbiased estimate of the mean of this population and has the distribution N(μ; σ²/n).

Note: What if you need to build a confidence interval for a distribution that is not normal? In that case the central limit theorem (CLT) comes to the rescue: for a sufficiently large sample size n drawn from a non-normal distribution, the sampling distribution of the statistic X̄ is approximately normal with parameters N(μ; σ²/n).
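A quick simulation can make this note concrete. The sketch below is only an illustration: the exponential distribution, the sample size n = 50, the number of trials and the helper name `sample_means` are assumptions chosen for the demonstration and are not part of the example file.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from Exp(1) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

# Exp(1) is clearly non-normal; its mean and standard deviation both equal 1.
n = 50
means = sample_means(n, trials=5_000)

observed_sd = statistics.stdev(means)  # spread of the simulated sample means
predicted_sd = 1.0 / n ** 0.5          # sigma / sqrt(n), as the CLT predicts
print(f"observed {observed_sd:.3f}, predicted {predicted_sd:.3f}")
```

The two printed values should be close, which is what N(μ; σ²/n) predicts for the sample mean.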

So, our point estimate of the distribution mean is the sample mean, X̄. Now let's turn to the confidence interval.

Building a confidence interval

Usually, knowing a distribution and its parameters, we can calculate the probability that a random variable takes a value in a given interval. Now let's do the opposite: find the interval into which the random variable falls with a given probability. For example, from the properties of the normal distribution it is known that a normally distributed random variable falls within approximately ±2 standard deviations of its mean with a probability of 95%. This interval will serve as the prototype for our confidence interval.

Now let's see whether we know the distribution well enough to calculate this interval. To answer this question, we must specify the form of the distribution and its parameters.

We know the form of the distribution: it is normal (remember that we are talking about the sampling distribution of the statistic X̄, the sample mean).

The parameter μ is unknown to us (it is exactly what we need to estimate with the confidence interval), but we have its estimate X̄, calculated from the sample, which we can use.

The second parameter, the standard deviation of the sample mean, is known: it equals σ/√n.

Because we do not know μ, we build the interval of ±2 standard deviations not around the mean but around its known estimate X̄. That is, when calculating the confidence interval we do NOT say that X̄ falls within ±2 standard deviations of μ with a probability of 95%; instead we say that the interval of ±2 standard deviations around X̄ covers μ, the mean of the population from which the sample was drawn, with a probability of 95%. The two statements are equivalent, but the second one lets us construct the confidence interval.

In addition, we refine the interval: with a probability of 95%, a normally distributed random variable falls within ±1.960 standard deviations of its mean, not ±2. This value can be calculated with the formula =NORM.S.INV((1+0.95)/2); see the example file, sheet Spacing.

Now we can formulate the probabilistic statement that will be used to form the confidence interval:
"The probability that the population mean lies no farther than 1.960 standard deviations of the sample mean from the sample average is equal to 95%."

The probability mentioned in this statement has a special name, the confidence level, which is related to the significance level α (alpha) by the simple expression confidence level = 1 − α. In our case the significance level is α = 1 − 0.95 = 0.05.
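The meaning of "covers μ with a probability of 95%" can be checked by simulation. The following is a minimal sketch under stated assumptions: the true values μ = 78 and σ = 8, the sample size 25, the number of trials and the function name `coverage` are all chosen here for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level, trials, seed=1):
    """Fraction of simulated samples whose interval around the sample mean covers mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.960 for level = 0.95
    half_width = z * sigma / n ** 0.5          # z standard deviations of the sample mean
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage(mu=78.0, sigma=8.0, n=25, level=0.95, trials=10_000)
print(f"share of intervals covering mu: {rate:.3f}")
```

The printed share should be close to the chosen confidence level, because it is the interval, not μ, that changes from sample to sample.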

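Now, based on this probabilistic statement, we write the expression for calculating the confidence interval:

$$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where X̄ is the sample mean and Zα/2 is the upper α/2-quantile of the standard normal distribution (the value of the random variable z such that P(z ≥ Zα/2) = α/2).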

Note: The upper α/2-quantile defines the width of the confidence interval in standard deviations of the sample mean. The upper α/2-quantile of the standard normal distribution is always greater than 0, which is very convenient.

In our case, with α = 0.05, the upper α/2-quantile equals 1.960. For other significance levels α (10%; 1%), the upper α/2-quantile Zα/2 can be calculated with the formula =NORM.S.INV(1-α/2) or, if the confidence level is known, =NORM.S.INV((1+confidence level)/2).

Usually, when building confidence intervals for the mean, only the upper α/2-quantile is used, not the lower one. This is possible because the density of the standard normal distribution is symmetric about its mean, i.e. about 0. Therefore there is no need to calculate the lower α/2-quantile (it is simply called the α/2-quantile): it equals the upper α/2-quantile with a minus sign.
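For readers who want to check the quantiles outside MS EXCEL, here is a small sketch using Python's standard library. `NormalDist().inv_cdf` plays the role of NORM.S.INV; the helper names `upper_quantile` and `lower_quantile` are assumptions introduced only for this illustration.

```python
from statistics import NormalDist

def upper_quantile(alpha):
    """Upper alpha/2-quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def lower_quantile(alpha):
    """Lower alpha/2-quantile; by symmetry it is minus the upper one."""
    return NormalDist().inv_cdf(alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(upper_quantile(alpha), 3), round(lower_quantile(alpha), 3))
```

For α = 0.10, 0.05 and 0.01 the upper quantiles come out at roughly 1.645, 1.960 and 2.576, and each lower quantile is the same number with a minus sign.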

Recall that, regardless of the shape of the distribution of x, the corresponding random variable X̄ (the sample mean) is approximately normally distributed, N(μ; σ²/n), for a sufficiently large n. Therefore, in general, the above expression for the confidence interval is only approximate. If x itself is normally distributed, N(μ; σ²), the expression for the confidence interval is exact.

Calculation of confidence interval in MS EXCEL

Let's solve the problem.
The response time of an electronic component to an input signal is an important characteristic of a device. An engineer wants to construct a confidence interval for the average response time at a confidence level of 95%. From previous experience, the engineer knows that the standard deviation of the response time is 8 ms. To estimate the response time, the engineer made 25 measurements; their average was 78 ms.

Solution: An engineer wants to know the response time of an electronic device, but he understands that the response time is not fixed, but a random variable that has its own distribution. So the best he can hope for is to determine the parameters and shape of this distribution.

Unfortunately, the problem statement does not tell us the form of the response-time distribution (it does not have to be normal). The mean of this distribution is also unknown. Only its standard deviation is known: σ = 8. Therefore, for now, we cannot calculate probabilities or construct a confidence interval.

However, although we do not know the distribution of an individual response time, we know that, according to the central limit theorem (CLT), the sampling distribution of the average response time is approximately normal (we assume that the conditions of the CLT are met because the sample size is large enough, n = 25).

Moreover, the mean of this sampling distribution equals the mean of the distribution of an individual response, i.e. μ, and its standard deviation (σ/√n) can be calculated with the formula =8/SQRT(25).

It is also known that the engineer obtained a point estimate of the parameter μ equal to 78 ms (the sample mean X̄). Therefore, we can now calculate probabilities, because we know the form of the distribution (normal) and its parameters (X̄ and σ/√n).

The engineer wants to know the expected value μ of the response-time distribution. As stated above, this μ equals the expected value of the sampling distribution of the average response time. If we use the normal distribution N(X̄; σ/√n), the desired μ lies within X̄ ± 2·σ/√n with a probability of approximately 95%.

Significance level equals 1-0.95=0.05.

Finally, find the left and right borders of the confidence interval.
Left border: =78-NORM.S.INV(1-0.05/2)*8/SQRT(25) = 74.864
Right border: =78+NORM.S.INV(1-0.05/2)*8/SQRT(25) = 81.136

Equivalently:
Left border: =NORM.INV(0.05/2, 78, 8/SQRT(25))
Right border: =NORM.INV(1-0.05/2, 78, 8/SQRT(25))

Answer: the confidence interval at the 95% confidence level with σ = 8 ms is 78 ± 3.136 ms.
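The same calculation can be reproduced outside MS EXCEL. The sketch below uses Python's standard library; the function name `mean_ci_known_sigma` is an assumption introduced for illustration and is not an Excel or library function.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, level=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # analogue of NORM.S.INV
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Engineer's data from the problem: 25 measurements, mean 78 ms, sigma 8 ms.
left, right = mean_ci_known_sigma(xbar=78.0, sigma=8.0, n=25, level=0.95)
print(f"{left:.3f} .. {right:.3f}")  # 74.864 .. 81.136
```

`NormalDist(78, 8 / sqrt(25)).inv_cdf(0.05 / 2)` gives the left border directly, mirroring the NORM.INV variant of the formula.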

The example file, on the sheet Sigma known, contains a form for calculating and constructing a two-sided confidence interval for an arbitrary sample with a given σ and significance level.

CONFIDENCE.NORM() function

If the sample values are in the range B20:B79 and the significance level is 0.05, then the MS EXCEL formula
=AVERAGE(B20:B79)-CONFIDENCE.NORM(0.05,σ,COUNT(B20:B79))
returns the left border of the confidence interval.

The same border can be calculated with the formula:
=AVERAGE(B20:B79)-NORM.S.INV(1-0.05/2)*σ/SQRT(COUNT(B20:B79))

Note: The CONFIDENCE.NORM() function appeared in MS EXCEL 2010. Earlier versions of MS EXCEL used the CONFIDENCE() function.
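To see what CONFIDENCE.NORM() returns, it may help to write the same quantity by hand. The sketch below is an assumption-laden illustration: `confidence_norm` is a hypothetical Python helper that mimics the half-width calculation, and the five measurements are made-up values, not data from the example file.

```python
from math import sqrt
from statistics import NormalDist, fmean

def confidence_norm(alpha, sigma, size):
    """Half-width of the interval: Z(alpha/2) * sigma / sqrt(size)."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(size)

# Hypothetical measurements, used only to show the call pattern.
sample = [76.0, 81.5, 79.2, 74.8, 78.5]
sigma = 8.0  # assumed known standard deviation

margin = confidence_norm(0.05, sigma, len(sample))
left = fmean(sample) - margin   # analogue of AVERAGE(...) - CONFIDENCE.NORM(...)
right = fmean(sample) + margin
print(f"{left:.3f} .. {right:.3f}")
```

The function returns only the margin; the borders are obtained by subtracting it from and adding it to the sample mean.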

A confidence interval for the mathematical expectation is an interval, calculated from the data, that contains the mathematical expectation of the general population with a known probability. The natural estimate of the mathematical expectation is the arithmetic mean of the observed values, so for the rest of the lesson we will use the terms "average" and "average value". In confidence-interval problems, the answer most often required has the form "The confidence interval for the average [value in the specific problem] is from [lower value] to [higher value]". A confidence interval can be used to estimate not only mean values but also the share of a particular feature in the general population. Mean values, variance, standard deviation and standard error, which lead us to the new definitions and formulas, are covered in the lesson Sample and Population Characteristics.

Point and interval estimates of the mean

If the mean of the general population is estimated by a single number (a point), the mean calculated from a sample of observations is taken as the estimate of the unknown population mean. The sample mean is a random variable and, in general, does not coincide with the population mean. Therefore, when reporting the sample mean, the sampling error should be reported as well. The standard error, which is expressed in the same units as the mean, is used as the measure of sampling error. For this reason the mean is often reported together with its standard error, in the form "mean ± standard error".

If the estimate of the mean must be associated with a certain probability, the population parameter of interest has to be estimated not by a single number but by an interval. A confidence interval is an interval that contains the estimated population parameter with a certain probability P. The confidence interval that contains the population mean μ with probability P = 1 − α is calculated as follows:

$$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where Zα/2 is the critical value of the standard normal distribution for the significance level α = 1 − P, which can be found in the appendix to almost any book on statistics.

In practice, the population mean and variance are not known, so the population variance is replaced by the sample variance, and the population mean by the sample mean. Thus, in most cases the confidence interval is calculated as follows:

$$\bar{X} - Z_{\alpha/2}\,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X} + Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

where s is the sample standard deviation.

The confidence interval formula can be used to estimate the population mean if

the standard deviation of the general population is known;
or the standard deviation of the population is not known, but the sample size is greater than 30.

The sample mean is an unbiased estimate of the population mean. The sample variance, in turn, is not an unbiased estimate of the population variance. To obtain an unbiased estimate of the population variance, the sample size n in the denominator of the sample variance formula should be replaced with n − 1.
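The difference between dividing by n and by n − 1 is easy to see on a tiny data set. In the sketch below the five observations are hypothetical and serve only as an illustration; `statistics.pvariance` divides by n and `statistics.variance` divides by n − 1.

```python
import statistics

# Hypothetical observations, chosen only so the arithmetic is easy to follow.
sample = [8, 12, 9, 11, 10]
n = len(sample)

biased = statistics.pvariance(sample)   # sum of squared deviations / n
unbiased = statistics.variance(sample)  # sum of squared deviations / (n - 1)

print(biased, unbiased)
# The two estimates differ exactly by the factor n / (n - 1).
print(biased * n / (n - 1))
```

For large samples the factor n / (n − 1) is close to 1, so the correction matters mostly for small n.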

Example 1. Data collected from 100 randomly selected cafes in a certain city show that the average number of employees is 10.5 with a standard deviation of 4.6. Determine the 95% confidence interval for the number of cafe employees.

Substituting the values into the expression for the confidence interval gives

$$10.5 - 1.96\cdot\frac{4.6}{\sqrt{100}} \;\le\; \mu \;\le\; 10.5 + 1.96\cdot\frac{4.6}{\sqrt{100}}$$

where 1.96 is the critical value of the standard normal distribution for the significance level α = 0.05.

Thus, the 95% confidence interval for the average number of cafe employees is from 9.6 to 11.4.

Example 2. For a random sample of 64 observations from a general population, the following totals were calculated:

sum of values in observations ,

sum of squared deviations of values from the mean .

Calculate the 95% confidence interval for the expected value.

calculate the standard deviation:

calculate the average value:

Substitute the values in the expression for the confidence interval:

where the critical value of the standard normal distribution for the significance level α = 0.05 is 1.96.

We get:

Thus, the 95% confidence interval for the mathematical expectation is from 7.484 to 11.266.

Example 3. For a random sample of 100 observations from a general population, a mean of 15.2 and a standard deviation of 3.2 were calculated. Calculate the 95% confidence interval for the expected value, then the 99% confidence interval. If the sample size and its variation stay the same but the confidence level increases, will the confidence interval narrow or widen?

We substitute these values into the expression for the confidence interval:

$$15.2 - 1.96\cdot\frac{3.2}{\sqrt{100}} \;\le\; \mu \;\le\; 15.2 + 1.96\cdot\frac{3.2}{\sqrt{100}}$$

where 1.96 is the critical value of the standard normal distribution for the significance level α = 0.05.

We get:

$$14.57 \le \mu \le 15.83$$

Thus, the 95% confidence interval for the mean is from 14.57 to 15.83.

Again, we substitute these values into the expression for the confidence interval:

$$15.2 - 2.576\cdot\frac{3.2}{\sqrt{100}} \;\le\; \mu \;\le\; 15.2 + 2.576\cdot\frac{3.2}{\sqrt{100}}$$

where 2.576 is the critical value of the standard normal distribution for the significance level α = 0.01.

We get:

$$14.38 \le \mu \le 16.02$$

Thus, the 99% confidence interval for the mean is from 14.38 to 16.02.

As you can see, as the confidence level increases, the critical value of the standard normal distribution increases too, so the end points of the interval lie farther from the mean and the confidence interval for the mathematical expectation widens.
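The widening can be verified numerically. The following sketch recomputes the half-widths for the data of Example 3 (n = 100, mean 15.2, standard deviation 3.2); the helper name `half_width` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def half_width(level, s, n):
    """Half-width of the interval: critical value times standard error."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * s / sqrt(n)

mean, s, n = 15.2, 3.2, 100
hw95 = half_width(0.95, s, n)
hw99 = half_width(0.99, s, n)

print(f"95%: {mean - hw95:.2f} .. {mean + hw95:.2f}")
print(f"99%: {mean - hw99:.2f} .. {mean + hw99:.2f}")
```

Only the critical value changes between the two calls, so the 99% interval is wider by the ratio of the two critical values.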

Point and interval estimates of a proportion

The share of some feature in the sample can be interpreted as a point estimate of the share p of the same feature in the general population. If this value must be associated with a probability, the confidence interval for the share p of the feature in the general population is calculated with probability P = 1 − α:

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where p̂ is the share observed in the sample, n is the sample size, and Zα/2 is the critical value of the standard normal distribution.

Example 4. In a certain city, two candidates, A and B, are running for mayor. 200 residents of the city were randomly polled: 46% answered that they would vote for candidate A, 26% for candidate B, and 28% did not know whom they would vote for. Determine the 95% confidence interval for the proportion of city residents who support candidate A.
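One way to work Example 4 is sketched below. It assumes the usual normal-approximation interval for a share, sample share ± Z·√(p̂(1 − p̂)/n); the function name `proportion_ci` is introduced here only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation confidence interval for a population share."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    std_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_error, p_hat + z * std_error

# Example 4: 46% of 200 polled residents support candidate A.
low, high = proportion_ci(p_hat=0.46, n=200, level=0.95)
print(f"{low:.3f} .. {high:.3f}")
```

Under these assumptions the interval runs from roughly 39% to 53%, so the poll alone does not show whether candidate A has majority support.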

A confidence interval gives the limits within which a statistical quantity will lie, with a given confidence probability γ, for a larger sample size. It is denoted P(θ − ε < x < θ + ε) = γ. In practice, the confidence probability γ is chosen from values sufficiently close to one: γ = 0.9, γ = 0.95, γ = 0.99.


Example #1. On a collective farm, 100 sheep out of a total flock of 1,000 were selected for a control shearing. The average wool clip was found to be 4.2 kg per sheep. Determine, with a probability of 0.99, the standard error of the sample in estimating the average wool clip per sheep and the limits within which the average clip lies, if the variance is 2.5. The sample is drawn without replacement.
Example #2. From a batch of imported products at a post of the Moscow Northern Customs, 20 samples of product "A" were taken by random sampling with replacement. The check established the average moisture content of product "A" in the sample, which turned out to be 6% with a standard deviation of 1%.
Determine, with a probability of 0.683, the limits of the average moisture content of the product in the entire batch of imported products.
Example #3. A survey of 36 students showed that the average number of textbooks they read per academic year was 6. Assuming that the number of textbooks read by a student per semester is normally distributed with a standard deviation of 6, find: A) an interval estimate, with a reliability of 0.99, for the mathematical expectation of this random variable; B) the probability with which it can be stated that the average number of textbooks read by a student per semester, calculated from this sample, deviates from the mathematical expectation by no more than 2 in absolute value.
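As an illustration, Example #3 can be worked with a few lines of Python. This is a sketch rather than the source's own solution; it assumes the known-σ formulas for a normal population, and all variable names are chosen here for readability.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Example #3: 36 students, sample mean 6, known standard deviation 6.
n, xbar, sigma = 36, 6.0, 6.0
std_error = sigma / sqrt(n)  # standard deviation of the sample mean

# A) interval estimate of the expectation with reliability 0.99
t = std_normal.inv_cdf((1 + 0.99) / 2)
interval = (xbar - t * std_error, xbar + t * std_error)

# B) probability that the sample mean deviates from the expectation by at most 2
prob = 2 * std_normal.cdf(2 / std_error) - 1

print(f"A) {interval[0]:.3f} .. {interval[1]:.3f}")
print(f"B) {prob:.4f}")
```

Part A is the usual interval with the 0.99 critical value; part B reads the same relation in the opposite direction, from a given deviation to a probability.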

Classification of confidence intervals

By the type of parameter being evaluated:

By sample type:

Confidence interval for an infinite sample;
Confidence interval for a finite sample;

A sample is said to be drawn with replacement if the selected object is returned to the general population before the next one is chosen. A sample is said to be drawn without replacement if the selected object is not returned to the general population. In practice, one usually deals with samples drawn without replacement.

Calculation of the mean sampling error for random selection

The discrepancy between the values of indicators obtained from the sample and the corresponding parameters of the general population is called representativeness error.
Designations of the main parameters of the general and sample population.

Mathematics and informatics. Study guide throughout the course

Let the random variable X of the general population be normally distributed, and let the variance D and the standard deviation σ of this distribution be known. The unknown mathematical expectation a must be estimated from the sample mean. The problem then reduces to finding a confidence interval for the mathematical expectation with reliability β. If we fix the confidence probability (reliability) β, the probability that the unknown mathematical expectation falls into the interval can be found using formula (6.9a):

$$P(|\bar{x} - a| < \varepsilon) = 2\Phi(t), \qquad t = \frac{\varepsilon\sqrt{n}}{\sigma},$$

where Φ(t) is the Laplace function (5.17a).

As a result, we can formulate an algorithm for finding the borders of the confidence interval for the mathematical expectation when the variance D = σ² is known:

1. Set the reliability value β.
2. From (6.14) express Φ(t) = 0.5·β. Using this value of Φ(t), select t from the table of the Laplace function (see Appendix 1).
3. Calculate the deviation ε using formula (6.10).
4. Write the confidence interval according to formula (6.12), such that the following inequality holds with probability β:

$$\bar{x} - \varepsilon < a < \bar{x} + \varepsilon$$

Example 5.

The random variable X has a normal distribution. Find the confidence interval for estimating the unknown mean a with reliability β = 0.96, given:

1) the general standard deviation σ = 5;

2) the sample mean x̄ = 30;

3) the sample size n = 49.

In formula (6.15) for the interval estimate of the mathematical expectation a with reliability β, all quantities except t are known. The value of t can be found using (6.14): β = 2Φ(t) = 0.96, so Φ(t) = 0.48.

From the table in Appendix 1, the value of the Laplace function Φ(t) = 0.48 corresponds to t = 2.06. Hence, ε = tσ/√n = 2.06·5/√49 ≈ 1.47. Substituting the calculated value of ε into formula (6.12), we obtain the confidence interval: 30 − 1.47 < a < 30 + 1.47.

The desired confidence interval for estimating the unknown mathematical expectation with reliability β = 0.96 is: 28.53 < a < 31.47.
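The table lookup in this example can be replaced by a direct calculation. The sketch below assumes that the Laplace function is Φ(t) = F(t) − 0.5, where F is the standard normal distribution function; the helper names `laplace` and `inverse_laplace` are introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def laplace(t):
    """Laplace function: area under the standard normal density from 0 to t."""
    return NormalDist().cdf(t) - 0.5

def inverse_laplace(value):
    """Value of t for which the Laplace function equals `value`."""
    return NormalDist().inv_cdf(0.5 + value)

# Data of Example 5: reliability 0.96, sigma = 5, sample mean 30, n = 49.
beta, sigma, xbar, n = 0.96, 5.0, 30.0, 49

t = inverse_laplace(beta / 2)  # solves 2 * laplace(t) = beta
eps = t * sigma / sqrt(n)      # deviation (accuracy of the estimate)
low, high = xbar - eps, xbar + eps
print(f"t = {t:.3f}, eps = {eps:.3f}, interval {low:.2f} .. {high:.2f}")
```

The computed t is slightly smaller than the tabulated 2.06 because the table is rounded to two decimals; the interval borders agree with the example to two decimals.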

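Sample Mean Error Formulas
	sampling with replacement	sampling without replacement
for the mean	μ = √(σ²/n)	μ = √(σ²/n · (1 − n/N))
for the share	μ = √(w(1 − w)/n)	μ = √(w(1 − w)/n · (1 − n/N))

Here μ is the mean sampling error, σ² is the variance, w is the sample share, n is the sample size and N is the size of the general population.

The ratio between the marginal sampling error (Δ), guaranteed with some probability P(t), and the mean sampling error μ has the form Δ = t·μ, where t is the confidence coefficient, determined from the probability level P(t) using the table of the integral Laplace function.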
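As an illustration of Δ = t·μ, the sketch below applies it to the sheep-shearing problem (Example #1: 100 of 1,000 sheep, variance 2.5, mean clip 4.2 kg, probability 0.99). It assumes the standard mean-error formula √(σ²/n) with the finite-population factor (1 − n/N) for sampling without replacement; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_sampling_error(variance, n, population=None):
    """Mean sampling error; pass `population` for sampling without replacement."""
    error = variance / n
    if population is not None:
        error *= 1 - n / population  # finite-population correction
    return sqrt(error)

def marginal_error(probability, mean_error):
    """Marginal error: confidence coefficient t times the mean error."""
    t = NormalDist().inv_cdf((1 + probability) / 2)
    return t * mean_error

mu = mean_sampling_error(variance=2.5, n=100, population=1000)
delta = marginal_error(0.99, mu)
print(f"mean error {mu:.3f}, limits {4.2 - delta:.3f} .. {4.2 + delta:.3f}")
```

Without the correction factor the mean error would be slightly larger, which is why sampling without replacement gives a narrower interval for the same n.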

Formulas for calculating the sample size with a proper random selection method

Let the random variable X form the general population and let β be an unknown parameter of X. If the statistical estimate β* is consistent, then the larger the sample size, the more accurately β* approximates β. In practice, however, samples are not very large, so high accuracy cannot be guaranteed.

The reliability γ, or confidence probability, of the estimate of β by β* is the probability γ with which the inequality |β* − β| < δ holds, i.e.

$$\gamma = P(|\beta^* - \beta| < \delta)$$

Usually the reliability γ is set in advance, and a number close to 1 is taken for γ (0.9; 0.95; 0.99; ...).

Since the inequality |β* − β| < δ is equivalent to the double inequality β* − δ < β < β* + δ, we obtain:

$$P(\beta^* - \delta < \beta < \beta^* + \delta) = \gamma$$

The interval (β* − δ, β* + δ) is called the confidence interval, i.e. the confidence interval covers the unknown parameter β with probability γ. Note that the ends of the confidence interval are random and vary from sample to sample, so it is more accurate to say that the interval (β* − δ, β* + δ) covers the unknown parameter β than that β belongs to this interval.

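Let the general population be given by a random variable X distributed according to the normal law, and let the standard deviation σ be known. The mathematical expectation a = M(X) is unknown. It is required to find a confidence interval for a for a given reliability γ.

The sample mean

$$\bar{x}_B = \frac{1}{n}\sum_{i=1}^{n} x_i$$

is a statistical estimate of the general mean, which equals a.

Theorem. If X has a normal distribution, then the random variable X̄_B also has a normal distribution, with

$$M(\bar{X}_B) = a, \qquad \sigma(\bar{X}_B) = \frac{\sigma}{\sqrt{n}},$$

where σ = √D(X) and a = M(X).

The confidence interval for a has the form:

$$\bar{x}_B - \delta < a < \bar{x}_B + \delta$$

We find δ.

Using the relation

$$P(|X - a| < \delta) = 2\Phi\!\left(\frac{\delta}{\sigma}\right),$$

where Φ(t) is the Laplace function, we have:

$$P(|\bar{X}_B - a| < \delta) = 2\Phi\!\left(\frac{\delta\sqrt{n}}{\sigma}\right)$$

Denoting

$$t = \frac{\delta\sqrt{n}}{\sigma},$$

we get 2Φ(t) = γ, i.e. Φ(t) = γ/2, and we find the value of t in the table of values of the Laplace function.

From the equality t = δ√n/σ we find δ = tσ/√n, the accuracy of the estimate.

So the confidence interval for a has the form:

$$\bar{x}_B - \frac{t\sigma}{\sqrt{n}} < a < \bar{x}_B + \frac{t\sigma}{\sqrt{n}}$$

If a sample from the general population X is given in grouped form

x_i	x_1	x_2	...	x_m
n_i	n_1	n_2	...	n_m

with n = n_1 + ... + n_m, then the confidence interval will be:

$$\frac{1}{n}\sum_{i=1}^{m} x_i n_i - \frac{t\sigma}{\sqrt{n}} < a < \frac{1}{n}\sum_{i=1}^{m} x_i n_i + \frac{t\sigma}{\sqrt{n}}$$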
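For a sample given as values with frequencies, the only new step is the weighted mean. The sketch below is illustrative: the grouped data and σ = 1 are hypothetical values, and `grouped_mean_ci` is a name introduced here, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def grouped_mean_ci(values, counts, sigma, gamma):
    """Confidence interval for the mean from values x_i with frequencies n_i."""
    n = sum(counts)
    xbar = sum(x * k for x, k in zip(values, counts)) / n  # weighted sample mean
    t = NormalDist().inv_cdf((1 + gamma) / 2)              # solves 2 * Phi(t) = gamma
    delta = t * sigma / sqrt(n)                            # accuracy of the estimate
    return xbar - delta, xbar + delta

# Hypothetical grouped sample: value x_i observed n_i times.
values = [1.0, 2.0, 3.0, 4.0]
counts = [10, 40, 30, 20]

low, high = grouped_mean_ci(values, counts, sigma=1.0, gamma=0.95)
print(f"{low:.3f} .. {high:.3f}")
```

Here n = 100 and the weighted mean is 2.6, so the interval is 2.6 ± 1.96·1/√100.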

Example 6.35. Find the confidence interval for estimating the mathematical expectation a of a normal distribution with a reliability of 0.95, given the sample mean x̄_B = 10.43, the sample size n = 100 and the standard deviation σ = 5.

Let's use the formula