Since the new variable is normally distributed, the lower and upper bounds of the 95% confidence interval for the variable φ are $\varphi - 1.96\sqrt{1/N}$ and $\varphi + 1.96\sqrt{1/N}$.

For small samples, it is recommended to replace 1.96 with the value of t for N − 1 degrees of freedom. This method does not produce negative values and estimates confidence intervals for frequencies more accurately than the Wald method. It is also described in many Russian reference books on medical statistics, which, however, has not led to its widespread use in medical research. Calculating confidence intervals using the angular transformation is not recommended for frequencies approaching 0 or 1.
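As a rough illustration of the angular-transformation interval, the Python sketch below assumes the usual form of the transformation, φ = 2·arcsin(√p) with standard error √(1/N), and converts the limits back to frequencies with sin²(φ/2). The function name `angular_ci` and the clamping of φ to the range from 0 to π are assumptions made for this example; the critical value (1.96 or a tabulated t) is supplied by the caller.

```python
import math

def angular_ci(x, n, crit=1.96):
    """Approximate CI for a frequency via the angular (arcsine) transformation.

    x    -- number of participants with the trait
    n    -- sample size
    crit -- critical value: 1.96 for a 95% CI, or a tabulated t for n - 1 df
    """
    p = x / n
    phi = 2.0 * math.asin(math.sqrt(p))      # transformed frequency
    half_width = crit * math.sqrt(1.0 / n)   # critical value times the SE of phi
    # Assumption: keep phi inside [0, pi] so the back-transform stays in [0, 1].
    phi_low = max(0.0, phi - half_width)
    phi_high = min(math.pi, phi + half_width)
    # Back-transform to the frequency scale: p = sin^2(phi / 2)
    return math.sin(phi_low / 2.0) ** 2, math.sin(phi_high / 2.0) ** 2

# Small-sample example: 1 of 20 participants; the tabulated t for 19 df is about 2.093
low, high = angular_ci(1, 20, crit=2.093)
print(f"{low:.4f} to {high:.4f}")
```

Because the limits are computed on the φ scale and then transformed back, they cannot fall below 0 or above 1, which is the property the text contrasts with the Wald method.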

This is where the description of methods for estimating confidence intervals in most books on the basics of statistics for medical researchers usually ends, and this problem is typical not only for domestic, but also for foreign literature. Both methods are based on the central limit theorem, which implies a large sample.

Given the shortcomings of the above methods, Clopper and Pearson proposed in 1934 a method for calculating the so-called exact confidence interval, which takes into account the binomial distribution of the studied trait. This method is available in many online calculators; however, the confidence intervals obtained in this way are in most cases too wide. At the same time, the method is recommended where a conservative estimate is required. The degree of conservativeness increases as the sample size decreases, especially for N < 15. The use of the binomial distribution function for analysing qualitative data in MS Excel, including for determining confidence intervals, has been described; however, the calculation of confidence intervals for frequencies is not "tabulated" in spreadsheets in a user-friendly form and is therefore probably not used by most researchers.
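For readers without a suitable calculator, the "exact" interval can be sketched directly from the binomial distribution. The Python example below is one possible implementation, not the procedure of any particular package: it finds each limit by bisection on the binomial tail probability. The helper names `binom_cdf`, `_bisect_decreasing` and `clopper_pearson_ci` are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson_ci(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) CI for x events in n observations."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    if x == 0:
        lower = 0.0
    else:
        lower = _bisect_decreasing(
            lambda p: alpha / 2.0 - (1.0 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    if x == n:
        upper = 1.0
    else:
        upper = _bisect_decreasing(lambda p: binom_cdf(x, n, p) - alpha / 2.0)
    return lower, upper

low, high = clopper_pearson_ci(1, 20)
print(f"{low:.4f} to {high:.4f}")
```

The interval never leaves the range from 0 to 1, and its width for small N reflects the conservativeness mentioned above.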

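According to many statisticians, the best estimate of confidence intervals for frequencies is given by the Wilson method, proposed back in 1927 but practically unused in Russian biomedical research. This method not only makes it possible to estimate confidence intervals for both very low and very high frequencies, but is also applicable to a small number of observations. In general form, the confidence interval according to the Wilson formula runs from

$$\frac{p + \frac{z^2}{2N} - z\sqrt{\frac{p(1-p)}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}} \quad\text{to}\quad \frac{p + \frac{z^2}{2N} + z\sqrt{\frac{p(1-p)}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}},$$

where z is the critical value of the normal distribution, N is the number of observations, and p is the frequency of the trait in the sample.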
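The Wilson interval is simple enough to compute without a calculator. The following Python sketch implements the standard Wilson score formula; the function name `wilson_ci` is an assumption for this example, and the critical value z is taken from the standard normal distribution via the standard library.

```python
import math
from statistics import NormalDist

def wilson_ci(x, n, confidence=0.95):
    """Wilson score interval for x events in n observations."""
    # Two-sided critical value of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p = x / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half

for x, n in [(1, 20), (450, 1000)]:
    low, high = wilson_ci(x, n)
    print(f"X={x}, N={n}: {low:.4f} to {high:.4f}")
```

Note that the centre of the interval is shifted from p toward 0.5, which is why the limits stay within the range from 0 to 1 even for very low or very high frequencies.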


The mind is not only in knowledge, but also in the ability to apply knowledge in practice. (Aristotle)

Confidence intervals

General overview

Taking a sample from the population, we will obtain a point estimate of the parameter of interest to us and calculate the standard error in order to indicate the accuracy of the estimate.

In most cases, however, the standard error on its own is not sufficient. It is much more useful to combine this measure of precision with an interval estimate for the population parameter.

This can be done by using knowledge of the theoretical probability distribution of the sample statistic to calculate a confidence interval (CI) for the parameter.

In general, the confidence interval extends the estimate in both directions by some multiple of the standard error of the parameter; the two values (confidence limits) that define the interval are usually separated by a comma and enclosed in parentheses.

Confidence interval for mean

Using the normal distribution

The sample mean has a normal distribution if the sample size is large, so knowledge of the normal distribution can be applied when considering the sample mean.

In particular, 95% of the distribution of the sample means is within 1.96 standard deviations (SD) of the population mean.

When we have only one sample, we call this standard deviation the standard error of the mean (SEM) and calculate the 95% confidence interval for the mean as follows:

$$\left(\bar{x} - 1.96 \times \mathrm{SEM},\ \bar{x} + 1.96 \times \mathrm{SEM}\right),$$

where $\bar{x}$ is the sample mean.

If the experiment is repeated many times, the interval will contain the true population mean 95% of the time.

The confidence interval is therefore usually interpreted as the range of values within which the true population mean (general mean) lies with 95% confidence.

Although it is not quite strict (the population mean is a fixed value and therefore cannot have a probability related to it) to interpret the confidence interval in this way, it is conceptually easier to understand.
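A minimal Python sketch of the normal-based interval for a mean is given below. It assumes the SEM is estimated as the sample standard deviation divided by √n; the function name `mean_ci_normal` and the hemoglobin values are hypothetical and serve only as an illustration.

```python
import math
import statistics

def mean_ci_normal(values, z=1.96):
    """CI for the mean: sample mean plus or minus z times the SEM."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean - z * sem, mean + z * sem

# Hypothetical hemoglobin measurements, g/L
hemoglobin = [118, 112, 121, 109, 125, 116, 114, 120, 117, 119, 113, 122]
low, high = mean_ci_normal(hemoglobin)
print(f"mean = {statistics.mean(hemoglobin):.1f}, 95% CI ({low:.1f}, {high:.1f})")
```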

Using the t-distribution

You can use the normal distribution if you know the value of the variance in the population. Also, when the sample size is small, the sample mean follows a normal distribution if the data underlying the population are normally distributed.

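If the data underlying the population are not normally distributed and/or the population variance is unknown, the sample mean is described by Student's t-distribution.

The 95% confidence interval for the population mean is calculated as follows:

$$\left(\bar{x} - t_{0.05}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{0.05}\,\frac{s}{\sqrt{n}}\right),$$

where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $t_{0.05}$ is the percentage point (percentile) of Student's t-distribution with (n − 1) degrees of freedom that gives a two-tailed probability of 0.05.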

In general, it provides a wider interval than when using a normal distribution, because it takes into account the additional uncertainty that is introduced by estimating the population standard deviation and/or due to the small sample size.

When the sample size is large (of the order of 100 or more), the difference between the two distributions (Student's t and normal) is negligible. Nevertheless, the t-distribution is always used when calculating confidence intervals, even if the sample size is large.

Usually the 95% CI is reported. Other confidence intervals can be calculated, such as the 99% CI for the mean.

Instead of multiplying the standard error by the tabulated value of the t-distribution that corresponds to a two-tailed probability of 0.05, multiply it by the value that corresponds to a two-tailed probability of 0.01. This gives a wider confidence interval than the 95% case because it reflects increased confidence that the interval does indeed include the population mean.
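The effect of switching from the 0.05 to the 0.01 percentage point can be sketched as follows. The Python standard library has no t-distribution, so this example simply takes the tabulated two-tailed critical values for 9 degrees of freedom (2.262 and 3.250) as inputs; the function name `mean_ci_t` and the height data are hypothetical.

```python
import math
import statistics

# Tabulated two-tailed critical values of Student's t for 9 degrees of freedom
T_CRIT_9DF = {0.95: 2.262, 0.99: 3.250}

def mean_ci_t(values, t_crit):
    """CI for the mean: sample mean plus or minus t_crit times s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean - t_crit * sem, mean + t_crit * sem

# Hypothetical heights of 10 children, cm (n - 1 = 9 degrees of freedom)
heights_cm = [128.0, 135.5, 131.0, 140.2, 126.4, 133.8, 129.9, 137.1, 130.5, 132.6]
for level, t_crit in T_CRIT_9DF.items():
    low, high = mean_ci_t(heights_cm, t_crit)
    print(f"{level:.0%} CI: ({low:.1f}, {high:.1f})")
```

The critical value must match the degrees of freedom of the sample at hand; for a different n it has to be looked up again.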

Confidence interval for proportion

The sampling distribution of a proportion is binomial. However, if the sample size n is reasonably large, the sampling distribution of the proportion is approximately normal with mean $\pi$, the population proportion.

We estimate $\pi$ by the sample proportion p = r/n (where r is the number of individuals in the sample with the characteristic of interest), and the standard error is estimated as:

$$\mathrm{SE}(p) = \sqrt{\frac{p(1-p)}{n}}$$

The 95% confidence interval for the proportion is estimated as:

$$\left(p - 1.96\sqrt{\frac{p(1-p)}{n}},\ p + 1.96\sqrt{\frac{p(1-p)}{n}}\right)$$

If the sample size is small (usually when np or n(1 − p) is less than 5), the binomial distribution must be used to calculate exact confidence intervals.

Note that if p is expressed as a percentage, then (1 − p) is replaced by (100 − p).
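The normal-approximation interval for a proportion, together with the rule of thumb about small samples, might be coded as in the Python sketch below. The function name `proportion_ci` and the decision to raise an error when np or n(1 − p) is below 5 are assumptions of this example, not part of the method itself.

```python
import math

def proportion_ci(r, n, z=1.96):
    """Normal-approximation CI for a proportion p = r / n."""
    # n*p equals r and n*(1 - p) equals n - r, so the rule of thumb is checked on counts.
    if min(r, n - r) < 5:
        raise ValueError("np or n(1 - p) is below 5: use the exact binomial method")
    p = r / n
    se = math.sqrt(p * (1.0 - p) / n)  # estimated standard error of p
    return p - z * se, p + z * se

low, high = proportion_ci(30, 50)
print(f"p = {30 / 50:.2f}, 95% CI ({low:.3f}, {high:.3f})")
```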

Interpretation of confidence intervals

When interpreting the confidence interval, we are interested in the following questions:

How wide is the confidence interval?

A wide confidence interval indicates that the estimate is imprecise; a narrow one indicates a precise estimate.

The width of the confidence interval depends on the size of the standard error, which in turn depends on the sample size and, for a numeric variable, on the variability of the data. Small studies of highly variable data therefore give wider confidence intervals than large studies of less variable data.

Does the CI include any values of particular interest?

You can check whether a hypothesized value of the population parameter falls within the confidence interval. If it does, the results are consistent with that value. If not, it is unlikely (for a 95% confidence interval, the chance is at most 5%) that the parameter has this value.

"Katren-Style" continues to publish a cycle of Konstantin Kravchik on medical statistics. In two previous articles, the author touched on the explanation of such concepts as and.

Konstantin Kravchik

Mathematician-analyst. Specialist in the field of statistical research in medicine and the humanities

Moscow

Articles on clinical trials very often contain a mysterious phrase: "confidence interval" (95% CI). For example, an article might say: "Student's t-test was used to assess the significance of differences, with a 95% confidence interval calculated."

What is the value of the "95% confidence interval" and why calculate it?

What is a confidence interval? It is the range within which the true mean value in the population falls. Are there, then, "untrue" means? In a sense, yes. In a previous article we explained that it is impossible to measure the parameter of interest in the entire population, so researchers are content with a limited sample. In this sample (for example, for body weight) there is one mean value (a certain weight), by which we judge the mean value in the entire general population. However, it is unlikely that the mean weight in the sample (especially a small one) will coincide with the mean weight in the general population. It is therefore more correct to calculate and use a range of mean values for the general population.

For example, suppose the 95% confidence interval (95% CI) for hemoglobin is from 110 to 122 g/L. This means that, with 95% probability, the true mean hemoglobin in the general population lies in the range from 110 to 122 g/L. In other words, we do not know the mean hemoglobin in the general population, but we can indicate a range of values for this characteristic with 95% probability.

Confidence intervals are particularly relevant to the difference in means between groups, or what is called the effect size.

Suppose we compared the effectiveness of two iron preparations: one that has been on the market for a long time and one that has just been registered. After the course of therapy, the hemoglobin concentration in the studied groups of patients was assessed, and the statistical program calculated that the difference between the mean values of the two groups lies, with a probability of 95%, in the range from 1.72 to 14.36 g/L (Table 1).

Table 1. Test for independent samples
(groups are compared by hemoglobin level)

This should be interpreted as follows: in patients in the general population who take the new drug, hemoglobin will be higher on average by 1.72–14.36 g/L than in those who took the already known drug.

In other words, in the general population the difference in mean hemoglobin between the groups lies within these limits with 95% probability. It is up to the researcher to judge whether this is a lot or a little. The point is that we are working not with a single mean value but with a range of values, and therefore estimate the difference in a parameter between groups more reliably.

In statistical packages, the researcher can narrow or widen the confidence interval at their discretion. Lowering the confidence level narrows the range of means: for example, at 90% CI the range of means (or mean differences) is narrower than at 95% CI.

Conversely, increasing the confidence level to 99% widens the range of values. When comparing groups, the lower limit of the CI may cross zero. For example, if we widened the confidence interval to 99%, its limits would run from −1 to 16 g/L. This means that in the general population there are groups for which the difference between the means of the studied trait is 0 (M = 0).
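How the interval widens with the confidence level can be seen in a small Python sketch. It uses the normal approximation rather than the t-distribution that a statistical package would apply, and the mean difference (8.0 g/L) and its standard error (3.2 g/L) are hypothetical values chosen only to resemble the example; the function names are likewise assumptions.

```python
from statistics import NormalDist

def ci_multiplier(confidence):
    """Two-sided critical value of the standard normal for a confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def difference_ci(diff, se, confidence):
    """CI for a mean difference: diff plus or minus multiplier times SE."""
    k = ci_multiplier(confidence)
    return diff - k * se, diff + k * se

# Hypothetical mean difference in hemoglobin and its standard error, g/L
diff, se = 8.0, 3.2
for level in (0.90, 0.95, 0.99):
    low, high = difference_ci(diff, se, level)
    includes_zero = low <= 0.0 <= high
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f}), includes zero: {includes_zero}")
```

With these inputs the 90% and 95% intervals lie entirely above zero, while the 99% interval extends below it.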

Confidence intervals can be used to test statistical hypotheses. If the confidence interval includes zero, the null hypothesis that the groups do not differ in the studied parameter cannot be rejected. An example is described above, when we widened the interval to 99%: somewhere in the general population, we found groups that did not differ in any way.

95% confidence interval of the difference in hemoglobin (g/L)

The figure shows the 95% confidence interval of the mean hemoglobin difference between the two groups as a line. The line crosses zero, so a difference of zero between the means is possible, which is consistent with the null hypothesis that the groups do not differ. The difference between the groups ranges from −2 to 5 g/L, which means that hemoglobin may either decrease by 2 g/L or increase by 5 g/L.

The confidence interval is a very important indicator. Thanks to it, you can see if the differences in the groups were really due to the difference in the means or due to a large sample, because with a large sample, the chances of finding differences are greater than with a small one.

In practice, it might look like this. We took a sample of 1000 people, measured the hemoglobin level and found that the confidence interval for the difference in the means lies from 1.2 to 1.5 g/L. The level of statistical significance in this case p

We see that the hemoglobin concentration increased, but almost imperceptibly, therefore, the statistical significance appeared precisely due to the sample size.

Confidence intervals can be calculated not only for averages, but also for proportions (and risk ratios). For example, we are interested in the confidence interval of the proportions of patients who achieved remission while taking the developed drug. Assume that the 95% CI for the proportions, i.e. for the proportion of such patients, is in the range 0.60–0.80. Thus, we can say that our medicine has a therapeutic effect in 60 to 80% of cases.

Any sample gives only an approximate idea of the general population, and all sample statistics (mean, mode, variance, etc.) are approximations, or estimates, of the population parameters, which in most cases cannot be calculated because the general population is inaccessible (Figure 20).

Figure 20. Sampling error

However, one can specify the interval within which the true (population) value of a statistic lies with a certain degree of probability. This interval is called the confidence interval (CI).

Thus, with a probability of 95% the population mean lies within

$$\bar{x} - t \cdot s_{\bar{x}} \quad\text{to}\quad \bar{x} + t \cdot s_{\bar{x}}, \qquad (20)$$

where $\bar{x}$ is the sample mean, $s_{\bar{x}}$ is the standard error of the mean, and t is the tabulated value of Student's criterion for α = 0.05 and f = n − 1 degrees of freedom.

A 99% CI can also be found; in this case t is chosen for α = 0.01.

What is the practical significance of a confidence interval?

    A wide confidence interval indicates that the sample mean does not accurately reflect the population mean. This is usually due to an insufficient sample size, or to its heterogeneity, i.e. large dispersion. Both give a large error in the mean and, accordingly, a wider CI. And this is the reason to return to the research planning stage.

    The upper and lower limits of the CI make it possible to assess whether the results are clinically significant.

Let us dwell in more detail on the statistical and clinical significance of the results of studying group properties. Recall that the task of statistics is to detect at least some differences between general populations on the basis of sample data. The clinician's task is to find those differences (not just any) that will help diagnosis or treatment. Statistical conclusions are not always a basis for clinical conclusions. Thus, a statistically significant decrease in hemoglobin of 3 g/L is not a cause for concern. Conversely, if some problem in the human body is not widespread at the level of the entire population, this is no reason to ignore it.

Let us illustrate this point with an example.

The researchers wondered whether boys who had had a certain infectious disease lag behind their peers in growth. To answer this, a sample study was conducted in which 10 boys who had had this disease took part. The results are presented in Table 23.

Table 23. Statistical results (columns: mean, lower limit, upper limit, reference standards; all in cm)

From these calculations, it follows that the sample mean height of 10-year-old boys who have had a certain infectious disease is close to normal (132.5 cm). However, the lower limit of the confidence interval (126.6 cm) indicates that, at the 95% confidence level, the true mean height of these children may fall within the range of "short stature", i.e. these children may be stunted.

In this example, the results of the confidence interval calculations are clinically significant.

CONFIDENCE INTERVALS FOR FREQUENCIES AND PROPORTIONS

© 2008

National Institute of Public Health, Oslo, Norway

The article describes and discusses the calculation of confidence intervals for frequencies and proportions using the Wald, Wilson, and Clopper-Pearson methods, the angular transformation, and the Wald method with Agresti-Coull correction. The material provides general information about methods for calculating confidence intervals for frequencies and proportions and is intended to encourage the journal's readers not only to use confidence intervals when presenting the results of their own research, but also to read specialized literature before starting work on future publications.

Keywords: confidence interval, frequency, proportion

One of the previous publications briefly described qualitative data and noted that an interval estimate is preferable to a point estimate for describing the frequency of occurrence of the studied characteristic in the general population. Indeed, since studies are conducted using sample data, the projection of the results onto the general population must contain an element of inaccuracy in the sample estimate. The confidence interval is a measure of the accuracy of the estimated parameter. Interestingly, some books on the basics of statistics for physicians ignore the topic of confidence intervals for frequencies completely. In this article, we consider several ways to calculate confidence intervals for frequencies, assuming sampling without replacement, a representative sample, and independence of the observations from each other. In this article, frequency is understood not as an absolute number showing how many times a given value occurs in the data set, but as a relative value that gives the proportion of study participants who have the trait under study.

In biomedical research, 95% confidence intervals are most commonly used. This confidence interval is the region within which the true proportion falls 95% of the time. In other words, it can be said with 95% certainty that the true value of the frequency of occurrence of a trait in the general population will be within the 95% confidence interval.

Most statistical textbooks for medical researchers report that the error of a frequency is calculated using the formula

$$s = \sqrt{\frac{p(1-p)}{N}},$$

where p is the frequency of occurrence of the trait in the sample (a value from 0 to 1) and N is the number of observations. Most Russian scientific articles report the sample frequency (p) together with its error (s) in the form p ± s. It is more appropriate, however, to present a 95% confidence interval for the frequency of the trait in the general population, which runs from

$$p - 1.96 \cdot s \quad\text{to}\quad p + 1.96 \cdot s.$$

In some textbooks, for small samples, it is recommended to replace the value of 1.96 with the value of t for N - 1 degrees of freedom, where N is the number of observations in the sample. The value of t is found in the tables for the t-distribution, which are available in almost all textbooks on statistics. The use of the distribution of t for the Wald method does not provide visible advantages over other methods discussed below, and therefore is not welcomed by some authors.

The above method for calculating confidence intervals for frequencies or fractions is named after Abraham Wald (Abraham Wald, 1902–1950), since it began to be widely used after the publication of Wald and Wolfowitz in 1939. However, the method itself was proposed by Pierre Simon Laplace (1749–1827) as early as 1812.

The Wald method is very popular, but its application is associated with significant problems. The method is not recommended for small sample sizes or when the frequency of the trait tends to 0 or 1 (0% or 100%), and it simply cannot be applied to frequencies of exactly 0 or 1. In addition, the normal approximation used when calculating the error "does not work" when n · p < 5 or n · (1 − p) < 5. More conservative statisticians believe that n · p and n · (1 − p) should be at least 10. A more detailed examination of the Wald method has shown that the confidence intervals it produces are in most cases too narrow, that is, their use mistakenly creates an overly optimistic picture, especially as the frequency of the trait moves away from 0.5, or 50%. Moreover, as the frequency approaches 0 or 1, the confidence interval may take negative values or exceed 1, which is absurd for frequencies. Many authors quite rightly recommend against using this method not only in the cases already mentioned, but also when the frequency of the trait is less than 25% or more than 75%. Thus, despite the simplicity of the calculations, the Wald method can be applied only in a very limited number of cases. Foreign researchers are more categorical in their conclusions and unequivocally recommend not using this method for small samples, yet it is precisely such samples that medical researchers often have to deal with.
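The claim that Wald intervals are too narrow can be checked numerically. The Python sketch below computes the exact coverage probability of the nominal 95% Wald interval for a given sample size and true frequency, by summing the binomial probabilities of all outcomes whose interval contains the true value. The function names `wald_ci` and `wald_coverage` are hypothetical.

```python
import math

def wald_ci(x, n, z=1.96):
    """Wald interval: p plus or minus z * sqrt(p(1 - p) / n)."""
    p = x / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

def wald_coverage(n, p_true, z=1.96):
    """Exact probability that the Wald interval covers p_true for sample size n."""
    covered = 0.0
    for x in range(n + 1):
        low, high = wald_ci(x, n, z)
        if low <= p_true <= high:
            # Binomial probability of observing exactly x events
            covered += math.comb(n, x) * p_true**x * (1.0 - p_true)**(n - x)
    return covered

print(wald_ci(1, 20))             # the lower limit is negative
print(wald_coverage(20, 0.05))    # well below the nominal 0.95
print(wald_coverage(1000, 0.45))  # close to the nominal 0.95
```

For a small sample and a frequency far from 0.5 the actual coverage falls far short of 95%, whereas for a large sample with a frequency near 0.5 it is close to the nominal level.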



In the Wilson formula, z takes the value 1.96 when calculating the 95% confidence interval, N is the number of observations, and p is the frequency of the trait in the sample. This method is available in online calculators, so applying it is not a problem. Some authors do not recommend using this method when n · p < 4 or n · (1 − p) < 4, because the approximation of the distribution of p by the normal distribution is too rough in such a situation; however, foreign statisticians consider the Wilson method applicable to small samples as well.

In addition to the Wilson method, the Wald method with the Agresti-Coull correction is also believed to provide an optimal estimate of the confidence interval for frequencies. The Agresti-Coull correction replaces the sample frequency of the trait (p) in the Wald formula with p′, which is calculated by adding 2 to the numerator and 4 to the denominator, that is, p′ = (X + 2) / (N + 4), where X is the number of study participants who have the trait under study and N is the sample size. This modification produces results very similar to those of the Wilson formula, except when the event rate approaches 0% or 100% and the sample is small. In addition to the above methods for calculating confidence intervals for frequencies, continuity corrections have been proposed for both the Wald method and the Wilson method for small samples, but studies have shown that their use is inappropriate.
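The correction is easy to express in code. The Python sketch below follows the "add 2 to the numerator and 4 to the denominator" rule; it additionally assumes, as in the usual formulation of the method, that the adjusted sample size N + 4 is also used in the standard error, and it truncates the limits to the range from 0 to 1. The function name `agresti_coull_ci` is hypothetical.

```python
import math

def agresti_coull_ci(x, n, z=1.96):
    """Wald interval with the Agresti-Coull correction, p' = (x + 2) / (n + 4)."""
    n_adj = n + 4                 # assumption: adjusted N is also used in the SE
    p_adj = (x + 2) / n_adj
    half = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    # Assumption: limits outside [0, 1] are truncated, since a frequency cannot lie there.
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

for x, n in [(1, 20), (450, 1000)]:
    low, high = agresti_coull_ci(x, n)
    print(f"X={x}, N={n}: {low:.4f} to {high:.4f}")
```

Calculators that add z²/2 and z² instead of the rounded 2 and 4 will give slightly different limits.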

Consider the application of the above methods using two examples. In the first case, we study a large sample of 1,000 randomly selected study participants, of whom 450 have the trait under study (be it a risk factor, an outcome, or any other trait), which gives a frequency of 0.45, or 45%. In the second case, the study uses a small sample, say, only 20 people, and only 1 participant (5%) has the trait. Confidence intervals for the Wald method, the Wald method with Agresti-Coull correction, and the Wilson method were calculated using an online calculator developed by Jeff Sauro (http://www./wald.htm). Continuity-corrected Wilson confidence intervals were calculated using the calculator provided by VassarStats: Web Site for Statistical Computation (http://faculty.vassar.edu/lowry/prop1.html). Calculations using the Fisher angular transformation were performed "manually" using the critical value of t for 19 and 999 degrees of freedom, respectively. The calculation results are presented in the table for both examples.

Table. Confidence intervals calculated in six different ways for the two examples described in the text

Columns: confidence interval calculation method; 95% CI for X = 1, N = 20, P = 0.0500, or 5%; 95% CI for X = 450, N = 1000, P = 0.4500, or 45%

Wald: −0.0455–0.2541 (X = 1, N = 20)
Wald with Agresti-Coull correction: <0.0001–0.2541 (X = 1, N = 20)
Wilson
Wilson with continuity correction
Clopper-Pearson "exact method"
Angular transformation: <0.0001–0.1967 (X = 1, N = 20)

As can be seen from the table, for the small-sample example (X = 1, N = 20) the confidence interval calculated by the "generally accepted" Wald method extends into the negative region, which is impossible for frequencies. Unfortunately, such incidents are not uncommon in Russian literature. The traditional way of presenting data as a frequency and its error partially masks this problem. For example, if the frequency of a trait (in percent) is presented as 2.1 ± 1.4, this is not as "irritating" as 2.1% (95% CI: −0.7; 4.9), although it means the same thing. The Wald method with the Agresti-Coull correction and the calculation using the angular transformation give a lower bound tending to zero. The Wilson method with continuity correction and the "exact method" give wider confidence intervals than the Wilson method. For the large-sample example (X = 450, N = 1000), all methods give approximately the same confidence intervals (differences appear only in the thousandths), which is not surprising, since the frequency of the event in this example does not differ much from 50% and the sample size is quite large.

For readers interested in this problem, we can recommend the works of R. G. Newcombe and of Brown, Cai and Dasgupta, which give the pros and cons of using 7 and 10 different methods for calculating confidence intervals, respectively. Among Russian-language manuals, we recommend a book that, in addition to a detailed description of the theory, presents the Wald and Wilson methods as well as a method for calculating confidence intervals that takes the binomial distribution of frequencies into account. In addition to free online calculators (http://www./wald.htm and http://faculty.vassar.edu/lowry/prop1.html), confidence intervals for frequencies (and not only for frequencies) can be calculated using the CIA program (Confidence Intervals Analysis), which can be downloaded from http://www.medschool.soton.ac.uk/cia/.

The next article will look at univariate ways to compare qualitative data.

Bibliography

Banerjee A. Medical statistics in plain language: an introductory course / A. Banerjee. - M.: Practical Medicine, 2007. - 287 p.
Medical statistics. - M.: Medical Information Agency, 2007. - 475 p.
Glantz S. Medico-biological statistics / S. Glantz. - M.: Practice, 1998.
Data types, distribution verification and descriptive statistics // Human Ecology. - 2008. - No. 1. - P. 52–58.
Zhizhin K. S. Medical statistics: textbook. - Rostov-on-Don: Phoenix, 2007. - 160 p.
Applied medical statistics. - St. Petersburg: Folio, 2003. - 428 p.
Lakin G. F. Biometrics. - M.: Higher School, 1990. - 350 p.
Medic V. A. Mathematical statistics in medicine. - M.: Finance and Statistics, 2007. - 798 p.
Mathematical statistics in clinical research. - M.: GEOTAR-MED, 2001. - 256 p.
Junkerov V. I. Medico-statistical processing of medical research data. - St. Petersburg: VMedA, 2002. - 266 p.
Agresti A. Approximate is better than exact for interval estimation of binomial proportions / A. Agresti, B. Coull // American Statistician. - 1998. - N 52. - P. 119–126.
Altman D. Statistics with confidence / D. Altman, D. Machin, T. Bryant, M. J. Gardner. - London: BMJ Books, 2000. - 240 p.
Brown L. D. Interval estimation for a binomial proportion / L. D. Brown, T. T. Cai, A. Dasgupta // Statistical Science. - 2001. - N 2. - P. 101–133.
Clopper C. J. The use of confidence or fiducial limits illustrated in the case of the binomial / C. J. Clopper, E. S. Pearson // Biometrika. - 1934. - N 26. - P. 404–413.
Garcia-Perez M. A. On the confidence interval for the binomial parameter / M. A. Garcia-Perez // Quality and Quantity. - 2005. - N 39. - P. 467–481.
Motulsky H. Intuitive biostatistics / H. Motulsky. - Oxford: Oxford University Press, 1995. - 386 p.
Newcombe R. G. Two-sided confidence intervals for the single proportion: comparison of seven methods / R. G. Newcombe // Statistics in Medicine. - 1998. - N 17. - P. 857–872.
Sauro J. Estimating completion rates from small samples using binomial confidence intervals: comparisons and recommendations / J. Sauro, J. R. Lewis // Proceedings of the Human Factors and Ergonomics Society Annual Meeting. - Orlando, FL, 2005.
Wald A. Confidence limits for continuous distribution functions / A. Wald, J. Wolfowitz // Annals of Mathematical Statistics. - 1939. - N 10. - P. 105–118.
Wilson E. B. Probable inference, the law of succession, and statistical inference / E. B. Wilson // Journal of the American Statistical Association. - 1927. - N 22. - P. 209–212.

CONFIDENCE INTERVALS FOR PROPORTIONS

A. M. Grjibovski

National Institute of Public Health, Oslo, Norway

The article presents several methods for calculating confidence intervals for binomial proportions, namely the Wald, Wilson, arcsine, Agresti-Coull and exact Clopper-Pearson methods. The paper gives only a general introduction to the problem of confidence interval estimation for a binomial proportion, and its aim is not only to stimulate readers to use confidence intervals when presenting the results of their own empirical research, but also to encourage them to consult statistics books before analysing their own data and preparing manuscripts.

Key words: confidence interval, proportion

Contact Information:

Senior Advisor, National Institute of Public Health, Oslo, Norway

In the previous subsections, we considered the problem of estimating an unknown parameter $a$ by a single number. Such an estimate is called a "point" estimate. In many problems it is required not only to find a suitable numerical value for the parameter $a$, but also to evaluate its accuracy and reliability. We need to know what errors can result from replacing the parameter $a$ by its point estimate $\tilde{a}$, and with what degree of confidence we can expect that these errors will not exceed known limits.

Problems of this kind are especially relevant for a small number of observations, when the point estimate $\tilde{a}$ is largely random and the approximate replacement of $a$ by $\tilde{a}$ can lead to serious errors.

To give an idea of the accuracy and reliability of the estimate $\tilde{a}$, mathematical statistics uses so-called confidence intervals and confidence probabilities.

Suppose an unbiased estimate $\tilde{a}$ of the parameter $a$ has been obtained from experiment. We want to estimate the possible error. Let us assign some sufficiently large probability p (for example, p = 0.9, 0.95, or 0.99) such that an event with probability p can be considered practically certain, and find a value of s for which

$$P(|\tilde{a} - a| < s) = p. \qquad (14.3.1)$$

Then the range of practically possible values of the error that arises when replacing $a$ by $\tilde{a}$ will be ± s; absolute errors larger than this will appear only with the small probability α = 1 − p. Let us rewrite (14.3.1) as:

$$P(\tilde{a} - s < a < \tilde{a} + s) = p. \qquad (14.3.2)$$

Equality (14.3.2) means that with probability p the unknown value of the parameter $a$ falls within the interval

$$I_p = (\tilde{a} - s,\ \tilde{a} + s).$$

One circumstance should be noted here. Previously we repeatedly considered the probability that a random variable falls into a given non-random interval. Here the situation is different: the quantity $a$ is not random, whereas the interval $I_p$ is. Its position on the x-axis, determined by its center $\tilde{a}$, is random; in general the length of the interval, $2\varepsilon$, is also random, since the value of $\varepsilon$ is usually calculated from experimental data. Therefore it is better to interpret $p$ not as the probability that the point $a$ "hits" the interval $I_p$, but as the probability that the random interval $I_p$ covers the point $a$ (Fig. 14.3.1).

Fig. 14.3.1

The probability $p$ is called the confidence probability (confidence level), and the interval $I_p$ is called the confidence interval. The boundaries of the interval $I_p$, namely $a_1 = \tilde{a} - \varepsilon$ and $a_2 = \tilde{a} + \varepsilon$, are called the confidence limits.
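The interpretation of the confidence probability as a coverage frequency can be checked numerically. The following is a minimal sketch and not part of the original text: it assumes a normally distributed sample with a known standard deviation, so that the half-width of the interval is fixed in advance, and counts how often the random interval covers the true value. The function name `coverage` and all of its parameters are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean=0.0, sigma=1.0, n=20, p=0.9, trials=5000, seed=1):
    """Fraction of random intervals (centre - eps, centre + eps) covering true_mean."""
    rng = random.Random(seed)
    # Assumption: sigma is known, so eps = t_p * sigma / sqrt(n) is not random here.
    t_p = NormalDist().inv_cdf((1 + p) / 2)
    eps = t_p * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = fmean(sample)  # the point estimate, random from sample to sample
        if centre - eps < true_mean < centre + eps:
            hits += 1
    return hits / trials


print(coverage())
```

With p = 0.9 the printed fraction should be close to 0.9, up to simulation noise: the point being covered stays fixed while the interval moves from sample to sample.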

Let us give one more interpretation of the concept of a confidence interval: it can be regarded as the interval of values of the parameter $a$ that are compatible with the experimental data and do not contradict them. Indeed, if we agree to consider an event with probability $\alpha = 1 - p$ practically impossible, then those values of the parameter $a$ for which $|\tilde{a} - a| > \varepsilon$ must be recognized as contradicting the experimental data, and those for which $|\tilde{a} - a| < \varepsilon$ as compatible with them.

Suppose there is an unbiased estimate $\tilde{a}$ of the parameter $a$. If we knew the distribution law of the quantity $\tilde{a}$, finding the confidence interval would be quite simple: it would be enough to find a value $\varepsilon$ for which

$$P(|\tilde{a} - a| < \varepsilon) = p.$$

The difficulty is that the distribution law of the estimate $\tilde{a}$ depends on the distribution law of the quantity $X$ and, consequently, on its unknown parameters (in particular, on the parameter $a$ itself).

To get around this difficulty, one can apply the following rough approximation: replace the unknown parameters in the expression for $\varepsilon$ with their point estimates. With a relatively large number of experiments $n$ (about 20 to 30) this technique usually gives results of satisfactory accuracy.

As an example, consider the problem of the confidence interval for the mathematical expectation.

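Suppose $n$ independent experiments have been performed on a random variable $X$ whose characteristics, the mathematical expectation $m$ and the variance $D$, are unknown. For these parameters the following estimates were obtained:

$$\tilde{m} = \frac{1}{n}\sum_{i=1}^{n} X_i; \qquad \tilde{D} = \frac{\sum_{i=1}^{n}(X_i - \tilde{m})^2}{n - 1}. \tag{14.3.4}$$

It is required to construct the confidence interval $I_p$, corresponding to the confidence probability $p$, for the mathematical expectation $m$ of the quantity $X$.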

In solving this problem, we use the fact that the quantity $\tilde{m}$ is, up to the factor $1/n$, the sum of $n$ independent identically distributed random variables $X_i$, so that according to the central limit theorem its distribution law is close to normal for sufficiently large $n$. In practice, even with a relatively small number of terms (of the order of 10 to 20), the distribution law of the sum can be considered approximately normal. We will assume that $\tilde{m}$ is distributed according to the normal law. The characteristics of this law, the mathematical expectation and the variance, are equal to $m$ and $D/n$ respectively (see Chapter 13, Subsection 13.3). Let us assume that the value $D$ is known to us and find a value $\varepsilon_p$ for which

$$P(|\tilde{m} - m| < \varepsilon_p) = p. \tag{14.3.5}$$

Applying formula (6.3.5) of Chapter 6, we express the probability on the left side of (14.3.5) in terms of the normal distribution function:

$$P(|\tilde{m} - m| < \varepsilon_p) = 2\Phi^*\!\left(\frac{\varepsilon_p}{\sigma_{\tilde{m}}}\right) - 1,$$

where $\sigma_{\tilde{m}} = \sqrt{D/n}$ is the standard deviation of the estimate $\tilde{m}$.

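From the equation

$$2\Phi^*\!\left(\frac{\varepsilon_p}{\sigma_{\tilde{m}}}\right) - 1 = p$$

we find the value of $\varepsilon_p$:

$$\varepsilon_p = \sigma_{\tilde{m}}\,\arg\Phi^*\!\left(\frac{1 + p}{2}\right), \tag{14.3.7}$$

where $\arg\Phi^*(x)$ is the inverse function of $\Phi^*(x)$, i.e. the value of the argument for which the normal distribution function is equal to $x$.

The variance $D$, through which the value $\sigma_{\tilde{m}}$ is expressed, is not known to us exactly; as its approximate value we can use the estimate $\tilde{D}$ (14.3.4) and put approximately

$$\sigma_{\tilde{m}} \approx \sqrt{\frac{\tilde{D}}{n}}.$$

Thus the problem of constructing the confidence interval is approximately solved; the interval is

$$I_p = (\tilde{m} - \varepsilon_p;\ \tilde{m} + \varepsilon_p),$$

where $\varepsilon_p$ is defined by formula (14.3.7).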

To avoid inverse interpolation in the tables of the function $\Phi^*(x)$ when calculating $\varepsilon_p$, it is convenient to compile a special table (Table 14.3.1) listing the values of the quantity

$$t_p = \arg\Phi^*\!\left(\frac{1 + p}{2}\right)$$

depending on $p$. For the normal law, the value $t_p$ is the number of standard deviations that must be laid off to the right and left of the center of dispersion so that the probability of falling into the resulting region is equal to $p$.
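Values of this kind can also be computed directly rather than read from a table. The sketch below is an illustration only: it takes the normal distribution function to be the standard normal CDF from Python's standard library, and the helper name `t_p` and the chosen probabilities are assumptions made here for the example.

```python
from statistics import NormalDist


def t_p(p):
    """Number of standard deviations t_p with P(|Z| < t_p) = p for a standard normal Z."""
    # Inverse of the normal distribution function at (1 + p) / 2.
    return NormalDist().inv_cdf((1 + p) / 2)


for p in (0.80, 0.90, 0.95, 0.99):
    print(f"p = {p:.2f}   t_p = {t_p(p):.3f}")
```

For p = 0.8, 0.9, 0.95 and 0.99 this gives approximately 1.282, 1.645, 1.960 and 2.576.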

In terms of $t_p$, the confidence interval is expressed as

$$I_p = (\tilde{m} - t_p\sigma_{\tilde{m}};\ \tilde{m} + t_p\sigma_{\tilde{m}}).$$

Table 14.3.1

Example 1. Twenty experiments were performed on the quantity $X$; the results are shown in Table 14.3.2.

Table 14.3.2

It is required to find an estimate $\tilde{m}$ for the mathematical expectation $m$ of the quantity $X$ and to construct a confidence interval corresponding to the confidence probability $p = 0.8$.

Solution. We have:

Choosing $x = 10$ as the origin of reference, by the third formula of (14.2.14) we find the unbiased estimate $\tilde{D}$:

From Table 14.3.1 we find

Confidence limits:

Confidence interval:

Values of the parameter $m$ lying in this interval are compatible with the experimental data given in Table 14.3.2.
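The approximate procedure of this example can be sketched in a few lines. This is a hedged illustration, not the author's computation: the function name `approx_mean_interval` is an assumption, and the data list is hypothetical because the values of Table 14.3.2 are not reproduced here.

```python
from statistics import NormalDist, fmean, variance


def approx_mean_interval(xs, p):
    """Approximate confidence interval for the mean, using the normal law for the estimate."""
    n = len(xs)
    m_est = fmean(xs)                  # point estimate of the mathematical expectation
    d_est = variance(xs)               # unbiased variance estimate (divides by n - 1)
    sigma_m = (d_est / n) ** 0.5       # approximate standard deviation of the mean estimate
    t = NormalDist().inv_cdf((1 + p) / 2)
    return m_est - t * sigma_m, m_est + t * sigma_m


# Hypothetical measurements, used only to show the call.
data = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9, 10.0, 10.1]
low, high = approx_mean_interval(data, 0.8)
print(f"({low:.3f}; {high:.3f})")
```

The interval is symmetric about the sample mean, and its half-width is the product of the normal quantile and the estimated standard deviation of the mean.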

In a similar way, a confidence interval can be constructed for the variance.

Suppose $n$ independent experiments have been performed on a random variable $X$ with unknown parameters $m$ and $D$, and the unbiased estimate

$$\tilde{D} = \frac{\sum_{i=1}^{n}(X_i - \tilde{m})^2}{n - 1} \tag{14.3.11}$$

has been obtained for the variance $D$. It is required to construct an approximate confidence interval for the variance.

From formula (14.3.11) it can be seen that the quantity $\tilde{D}$ is the sum of $n$ random variables of the form $\frac{(X_i - \tilde{m})^2}{n - 1}$. These variables are not independent, since each of them includes the quantity $\tilde{m}$, which depends on all the others. However, it can be shown that as $n$ increases the distribution law of their sum also approaches the normal law. In practice, for $n$ of about 20 to 30 it can already be considered normal.

Let us assume that this is so, and find the characteristics of this law: the mathematical expectation and the variance. Since the estimate $\tilde{D}$ is unbiased, $M[\tilde{D}] = D$.

The calculation of the variance $D[\tilde{D}]$ involves relatively complex computations, so we give its expression without derivation:

$$D[\tilde{D}] = \frac{1}{n}\mu_4 - \frac{n - 3}{n(n - 1)}D^2, \tag{14.3.12}$$

where $\mu_4$ is the fourth central moment of the quantity $X$.

To use this expression, one needs to substitute into it values of $\mu_4$ and $D$ (at least approximate ones). Instead of $D$ one can use its estimate $\tilde{D}$. In principle, the fourth central moment $\mu_4$ can also be replaced by its estimate, for example by a value of the form

$$\mu_4^* = \frac{\sum_{i=1}^{n}(X_i - \tilde{m})^4}{n},$$

but such a replacement gives extremely low accuracy, since in general, with a limited number of experiments, high-order moments are determined with large errors. In practice, however, it often happens that the form of the distribution law of the quantity $X$ is known in advance and only its parameters are unknown. Then we can try to express $\mu_4$ in terms of $D$.

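Let us take the most common case, when the quantity $X$ is distributed according to the normal law. Then its fourth central moment is expressed in terms of the variance (see Chapter 6, Subsection 6.2):

$$\mu_4 = 3D^2,$$

and formula (14.3.12) gives

$$D[\tilde{D}] = \frac{3}{n}D^2 - \frac{n - 3}{n(n - 1)}D^2,$$

or

$$D[\tilde{D}] = \frac{2}{n - 1}D^2. \tag{14.3.14}$$

Replacing the unknown $D$ in (14.3.14) by its estimate $\tilde{D}$, we get

$$D[\tilde{D}] \approx \frac{2}{n - 1}\tilde{D}^2,$$

whence

$$\sigma_{\tilde{D}} \approx \sqrt{\frac{2}{n - 1}}\,\tilde{D}. \tag{14.3.16}$$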

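The moment $\mu_4$ can be expressed in terms of $D$ in some other cases as well, when the distribution of the quantity $X$ is not normal but its form is known. For example, for the law of uniform density (see Chapter 5) we have

$$\mu_4 = \frac{(\beta - \alpha)^4}{80}; \qquad D = \frac{(\beta - \alpha)^2}{12},$$

where $(\alpha, \beta)$ is the interval on which the law is given.

Consequently,

$$\mu_4 = 1.8\,D^2.$$

By formula (14.3.12) we get

$$D[\tilde{D}] = \frac{0.8n + 1.2}{n(n - 1)}D^2,$$

from which we find approximately

$$\sigma_{\tilde{D}} \approx \sqrt{\frac{0.8n + 1.2}{n(n - 1)}}\,\tilde{D}.$$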

In cases where the form of the distribution law of the quantity $X$ is unknown, it is still recommended to use formula (14.3.16) for a rough estimate of $\sigma_{\tilde{D}}$, unless there are special grounds for believing that this law differs greatly from the normal one (has a noticeable positive or negative kurtosis).

If an approximate value of $\sigma_{\tilde{D}}$ has been obtained in one way or another, a confidence interval for the variance can be constructed in the same way as for the mathematical expectation:

$$I_p = (\tilde{D} - t_p\sigma_{\tilde{D}};\ \tilde{D} + t_p\sigma_{\tilde{D}}), \tag{14.3.18}$$

where the value $t_p$, depending on the given probability $p$, is found from Table 14.3.1.

Example 2. Find an approximate 80% confidence interval for the variance of the random variable $X$ under the conditions of Example 1, if it is known that $X$ is distributed according to a law close to normal.

Solution. The value of $t_p$ remains the same as before, from Table 14.3.1:

According to the formula (14.3.16)

According to the formula (14.3.18) we find the confidence interval:

The corresponding interval of values of the standard deviation: (0.21; 0.29).
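The approximate interval for the variance follows the same pattern as the one for the mean. The sketch below is an illustration under the assumption that the distribution is close to normal, so that formula (14.3.16) applies; the function name `approx_variance_interval` and the sample values are hypothetical and are not the data of the example.

```python
from statistics import NormalDist, variance


def approx_variance_interval(xs, p):
    """Approximate confidence interval for the variance, normal law assumed (14.3.16)."""
    n = len(xs)
    d_est = variance(xs)                        # unbiased variance estimate
    sigma_d = (2 / (n - 1)) ** 0.5 * d_est      # approximate standard deviation of d_est
    t = NormalDist().inv_cdf((1 + p) / 2)
    return d_est - t * sigma_d, d_est + t * sigma_d


# Hypothetical measurements, used only to show the call.
data = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9, 10.0, 10.1]
low, high = approx_variance_interval(data, 0.8)
print(f"variance: ({low:.4f}; {high:.4f})")
```

Because the interval is built symmetrically around the estimate, its lower limit can become very small or even negative for small samples, which is one reason the exact method is preferable when the normal law can be assumed.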

14.4. Exact methods for constructing confidence intervals for the parameters of a random variable distributed according to the normal law

In the previous subsection we considered rough approximate methods for constructing confidence intervals for the mathematical expectation and the variance. Here we give an idea of the exact methods for solving the same problem. We emphasize that to find confidence intervals exactly it is absolutely necessary to know in advance the form of the distribution law of the quantity $X$, whereas this is not necessary for the approximate methods.

The idea of the exact methods for constructing confidence intervals is as follows. Any confidence interval is found from a condition expressing the probability that certain inequalities hold, and these inequalities include the estimate $\tilde{a}$ of interest to us. The distribution law of the estimate $\tilde{a}$ in the general case depends on the unknown parameters of the quantity $X$. However, sometimes it is possible to pass in the inequalities from the random variable $\tilde{a}$ to some other function of the observed values $X_1, X_2, \ldots, X_n$ whose distribution law does not depend on the unknown parameters, but only on the number of experiments $n$ and on the form of the distribution law of the quantity $X$. Random variables of this kind play a large role in mathematical statistics; they have been studied in most detail for the case of a normal distribution of the quantity $X$.

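For example, it has been proved that under a normal distribution of the quantity $X$ the random variable

$$T = \sqrt{n}\,\frac{\tilde{m} - m}{\sqrt{\tilde{D}}} \tag{14.4.1}$$

obeys the so-called Student's distribution law with $n - 1$ degrees of freedom; the density of this law has the form

$$S_{n-1}(t) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{(n - 1)\pi}\;\Gamma\!\left(\frac{n - 1}{2}\right)}\left(1 + \frac{t^2}{n - 1}\right)^{-\frac{n}{2}}, \tag{14.4.2}$$

where $\Gamma(x)$ is the known gamma function:

$$\Gamma(x) = \int_0^\infty u^{x - 1}e^{-u}\,du.$$

It has also been proved that the random variable

$$V = \frac{(n - 1)\tilde{D}}{D} \tag{14.4.3}$$

has the $\chi^2$ distribution with $n - 1$ degrees of freedom (see Chapter 7), the density of which is expressed by the formula

$$k_{n-1}(v) = \begin{cases} \dfrac{1}{2^{\frac{n-1}{2}}\,\Gamma\!\left(\frac{n - 1}{2}\right)}\,v^{\frac{n-1}{2} - 1}e^{-\frac{v}{2}}, & v > 0, \\ 0, & v < 0. \end{cases} \tag{14.4.4}$$

Without dwelling on the derivations of distributions (14.4.2) and (14.4.4), we will show how they can be applied in constructing confidence intervals for the parameters $m$ and $D$.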

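Suppose $n$ independent experiments have been performed on a random variable $X$ distributed according to the normal law with unknown parameters $m$ and $D$. For these parameters the estimates

$$\tilde{m} = \frac{1}{n}\sum_{i=1}^{n} X_i; \qquad \tilde{D} = \frac{\sum_{i=1}^{n}(X_i - \tilde{m})^2}{n - 1}$$

have been obtained. It is required to construct confidence intervals for both parameters corresponding to the confidence probability $p$.

Let us first construct a confidence interval for the mathematical expectation. It is natural to take this interval symmetric with respect to $\tilde{m}$; denote by $\varepsilon_p$ half the length of the interval. The value of $\varepsilon_p$ must be chosen so that the following condition is satisfied:

$$P(|\tilde{m} - m| < \varepsilon_p) = p. \tag{14.4.5}$$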

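Let us try to pass on the left side of equality (14.4.5) from the random variable $\tilde{m}$ to the random variable $T$ distributed according to Student's law. To do this, we multiply both sides of the inequality $|\tilde{m} - m| < \varepsilon_p$ by the positive quantity $\sqrt{n}/\sqrt{\tilde{D}}$:

$$P\!\left(\frac{\sqrt{n}\,|\tilde{m} - m|}{\sqrt{\tilde{D}}} < \frac{\varepsilon_p}{\sqrt{\tilde{D}/n}}\right) = p,$$

or, using the notation (14.4.1),

$$P\!\left(|T| < \frac{\varepsilon_p}{\sqrt{\tilde{D}/n}}\right) = p.$$

Let us find a number $t_p$ such that $P(|T| < t_p) = p$. The value $t_p$ can be found from the condition

$$\int_{-t_p}^{t_p} S_{n-1}(t)\,dt = p. \tag{14.4.8}$$

It can be seen from formula (14.4.2) that $S_{n-1}(t)$ is an even function, so (14.4.8) gives

$$2\int_0^{t_p} S_{n-1}(t)\,dt = p. \tag{14.4.9}$$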

Equality (14.4.9) determines the value $t_p$ depending on $p$. If one has at one's disposal a table of values of the integral

$$2\int_0^{x} S_{n-1}(t)\,dt,$$

then $t_p$ can be found by inverse interpolation in the table. However, it is more convenient to compile a table of values of $t_p$ in advance. Such a table is given in the Appendix (Table 5). It lists the values of $t_p$ depending on the confidence probability $p$ and the number of degrees of freedom $n - 1$. Having determined $t_p$ from Table 5 and setting

$$\varepsilon_p = t_p\sqrt{\frac{\tilde{D}}{n}},$$

we find half the width of the confidence interval $I_p$ and the interval itself:

$$I_p = \left(\tilde{m} - t_p\sqrt{\frac{\tilde{D}}{n}};\ \tilde{m} + t_p\sqrt{\frac{\tilde{D}}{n}}\right).$$
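When no table of Student's distribution is at hand, the value $t_p$ can be obtained numerically from condition (14.4.9). The following is a minimal sketch using only the Python standard library: it integrates the density (14.4.2) by Simpson's rule and solves for the upper limit by bisection. The function names, the number of integration steps and the number of bisection iterations are assumptions chosen for illustration, not a recommended implementation.

```python
import math
from statistics import fmean, variance


def student_density(t, k):
    """Density of Student's law with k = n - 1 degrees of freedom, as in (14.4.2)."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + t * t / k) ** (-(k + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def student_t_p(p, k):
    """Solve 2 * integral_0^t S_k(x) dx = p for t by bisection."""
    def two_sided(t):
        return 2 * simpson(lambda x: student_density(x, k), 0.0, t)

    lo, hi = 0.0, 1.0
    while two_sided(hi) < p:      # widen the bracket until it contains the root
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_mean_interval(xs, p):
    """Confidence interval for the mean of a normal sample via Student's law."""
    n = len(xs)
    m_est = fmean(xs)
    eps = student_t_p(p, n - 1) * math.sqrt(variance(xs) / n)
    return m_est - eps, m_est + eps


print(round(student_t_p(0.8, 19), 3))
```

For 19 degrees of freedom and p = 0.8 the computed value should be close to 1.328, the figure quoted from the table in Example 2 below.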

Example 1. Five independent experiments were performed on a random variable $X$, normally distributed with unknown parameters $m$ and $\sigma$. The results of the experiments are given in Table 14.4.1.

Table 14.4.1

Find an estimate $\tilde{m}$ for the mathematical expectation and construct a 90% confidence interval $I_p$ for it (i.e., the interval corresponding to the confidence probability $p = 0.9$).

Solution. We have:

From Table 5 of the Appendix, for $n - 1 = 4$ and $p = 0.9$ we find the value of $t_p$, whence

The confidence interval will be

Example 2. Under the conditions of Example 1 of Subsection 14.3, assuming the quantity $X$ to be normally distributed, find the exact confidence interval.

Solution. From Table 5 of the Appendix, for $n - 1 = 19$ and $p = 0.8$ we find $t_p = 1.328$; hence

Comparing with the solution of Example 1 of Subsection 14.3 ($\varepsilon_p = 0.072$), we see that the discrepancy is very small. If we keep the accuracy to the second decimal place, the confidence intervals found by the exact and approximate methods coincide:

Let us move on to constructing a confidence interval for the variance. Consider the unbiased variance estimate

$$\tilde{D} = \frac{\sum_{i=1}^{n}(X_i - \tilde{m})^2}{n - 1}$$

and express the random variable $\tilde{D}$ through the quantity $V$ (14.4.3), which has the $\chi^2$ distribution (14.4.4):

$$\tilde{D} = V\,\frac{D}{n - 1}.$$

Knowing the distribution law of the quantity $V$, one can find the interval $I_p$ into which it falls with the given probability $p$.

The distribution law $k_{n-1}(v)$ of the quantity $V$ has the form shown in Fig. 14.4.1.

Fig. 14.4.1

The question arises: how should the interval $I_p$ be chosen? If the distribution law of the quantity $V$ were symmetric (like the normal law or Student's distribution), it would be natural to take the interval $I_p$ symmetric with respect to the mathematical expectation. In this case the law $k_{n-1}(v)$ is asymmetric. Let us agree to choose the interval $I_p$ so that the probabilities of the quantity $V$ falling outside the interval to the right and to the left (shaded areas in Fig. 14.4.1) are the same and equal to

$$\frac{\alpha}{2} = \frac{1 - p}{2}.$$

To construct an interval $I_p$ with this property, we use Table 4 of the Appendix: for a given tail probability $q$ it contains numbers $\chi^2$ such that

$$P(V > \chi^2) = q$$

for a quantity $V$ having the $\chi^2$ distribution with $r$ degrees of freedom. In our case $r = n - 1$. Fix $r = n - 1$ and find in the corresponding row of Table 4 two values of $\chi^2$: one corresponding to the probability $\alpha/2$, the other to the probability $1 - \alpha/2$. Let us denote these

values by $\chi_1^2$ and $\chi_2^2$. The interval $I_p$ has $\chi_2^2$ as its left end and $\chi_1^2$ as its right end.

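Now let us find, from the interval $I_p$, the required confidence interval $I_p^D$ for the variance with boundaries $D_1$ and $D_2$, which covers the point $D$ with probability $p$:

$$P(D_1 < D < D_2) = p.$$

Let us construct an interval $I_p^D = (D_1, D_2)$ that covers the point $D$ if and only if the quantity $V$ falls into the interval $I_p$. Let us show that the interval

$$I_p^D = \left(\frac{(n - 1)\tilde{D}}{\chi_1^2};\ \frac{(n - 1)\tilde{D}}{\chi_2^2}\right) \tag{14.4.13}$$

satisfies this condition. Indeed, the inequalities

$$\frac{(n - 1)\tilde{D}}{\chi_1^2} < D; \qquad D < \frac{(n - 1)\tilde{D}}{\chi_2^2}$$

are equivalent to the inequalities

$$V < \chi_1^2; \qquad V > \chi_2^2,$$

and these inequalities hold with probability $p$. Thus the confidence interval for the variance has been found and is expressed by formula (14.4.13).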

Example 3. Find the confidence interval for the variance under the conditions of Example 2 of Subsection 14.3, if it is known that the quantity $X$ is normally distributed.

Solution. We have $p = 0.8$; $\alpha = 0.2$; $\alpha/2 = 0.1$. From Table 4 of the Appendix we find, for $r = n - 1 = 19$,

By formula (14.4.13) we find the confidence interval for the variance:

Corresponding interval for standard deviation: (0.21; 0.32). This interval only slightly exceeds the interval (0.21; 0.29) obtained in Example 2 of Subsection 14.3 by the approximate method.
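The exact interval (14.4.13) can also be evaluated without a table by integrating the density (14.4.4) numerically. The sketch below is an illustration only: it uses Simpson's rule and bisection from the standard library, assumes at least 3 degrees of freedom so that the density vanishes at zero, and takes the variance estimate 0.064 with n = 20 as an assumed input chosen for the demonstration. All function names are hypothetical.

```python
import math


def chi2_density(v, r):
    """Chi-square density with r degrees of freedom, as in (14.4.4); r >= 3 assumed."""
    if v <= 0:
        return 0.0
    return v ** (r / 2 - 1) * math.exp(-v / 2) / (2 ** (r / 2) * math.gamma(r / 2))


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def chi2_upper_point(q, r):
    """Value x with P(V > x) = q, found by bisection on the integrated density."""
    def cdf(x):
        return simpson(lambda v: chi2_density(v, r), 0.0, x)

    target = 1 - q                  # P(V < x)
    lo, hi = 0.0, float(r)
    while cdf(hi) < target:         # widen the bracket until it contains the root
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_variance_interval(d_est, n, p):
    """Confidence interval (14.4.13) for the variance of a normal sample."""
    r = n - 1
    alpha = 1 - p
    chi_1 = chi2_upper_point(alpha / 2, r)        # right end of the interval for V
    chi_2 = chi2_upper_point(1 - alpha / 2, r)    # left end of the interval for V
    return r * d_est / chi_1, r * d_est / chi_2


# Assumed inputs for illustration: variance estimate 0.064, n = 20, p = 0.8.
d_low, d_high = exact_variance_interval(0.064, 20, 0.8)
print(round(math.sqrt(d_low), 2), round(math.sqrt(d_high), 2))
```

With these assumed inputs the square roots of the two limits come out near 0.21 and 0.32, the same interval for the standard deviation as quoted above.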

  • Figure 14.3.1 shows a confidence interval that is symmetric about $\tilde{a}$. In general, as we will see later, this is not required.