Tabular values of Student's t-test. Basic statistics and Student's t-test


The Fisher distribution is the distribution of the random variable

F = (X1 / k1) / (X2 / k2)

where the random variables X1 and X2 are independent and have chi-square distributions with k1 and k2 degrees of freedom, respectively. The pair (k1, k2) is the pair of "numbers of degrees of freedom" of the Fisher distribution: k1 is the number of degrees of freedom of the numerator and k2 is the number of degrees of freedom of the denominator. The distribution of the random variable F is named after the great English statistician R. Fisher (1890-1962), who actively used it in his work.

The Fisher distribution is used to test hypotheses about the adequacy of the model in regression analysis, about the equality of variances, and in other problems of applied statistics.

Student's table of critical values.


Number of degrees of freedom, f    Student's t-test value at p=0.05
1    12.706
2    4.303
3    3.182
4    2.776
5    2.571
6    2.447
7    2.365
8    2.306
9    2.262
10    2.228
11    2.201
12    2.179
13    2.160
14    2.145
15    2.131
16    2.120
17    2.110
18    2.101
19    2.093
20    2.086
21    2.080
22    2.074
23    2.069
24    2.064
25    2.060
26    2.056
27    2.052
28    2.048
29    2.045
30    2.042
31    2.040
32    2.037
33    2.035
34    2.032
35    2.030
36    2.028
37    2.026
38    2.024
40-41    2.021
42-43    2.018
44-45    2.015
46-47    2.013
48-49    2.011
50-51    2.009
52-53    2.007
54-55    2.005
56-57    2.003
58-59    2.002
60-61    2.000
62-63    1.999
64-65    1.998
66-67    1.997
68-69    1.995
70-71    1.994
72-73    1.993
74-75    1.993
76-77    1.992
78-79    1.991
80-89    1.990
90-99    1.987
100-119    1.984
120-139    1.980
140-159    1.977
160-179    1.975
180-199    1.973
200    1.972
∞    1.960

Student's t-test is the general name for a class of methods of statistical hypothesis testing (statistical tests) based on Student's distribution. The most common applications of the t-test involve checking the equality of means in two samples.

1. History of the development of the t-test

This criterion was developed by William Gosset to assess the quality of beer at Guinness. Because of his obligations to the company not to disclose trade secrets, Gosset's article was published in 1908 in the journal Biometrika under the pseudonym "Student".

2. What is the Student's t-test used for?

Student's t-test is used to determine the statistical significance of differences between means. It can be applied both to independent samples (for example, a group of patients with diabetes mellitus and a group of healthy people) and to related samples (for example, heart rate in the same patients before and after taking an antiarrhythmic drug).

3. When can the Student's t-test be used?

To apply Student's t-test, the original data must have a normal distribution. When the two-sample test is applied to independent samples, the condition of equality of variances (homoscedasticity) must also be satisfied.

If these conditions are not met, the sample means should be compared using methods of nonparametric statistics, the best known of which are the Mann-Whitney U-test (as a two-sample test for independent samples) and the sign test and Wilcoxon test (used for dependent samples).

4. How to calculate Student's t-test?

To compare means, Student's t-test is calculated using the following formula:

t = (M1 - M2) / √(m1² + m2²)

where M1 is the arithmetic mean of the first compared population (group), M2 is the arithmetic mean of the second compared population (group), m1 is the mean error (standard error) of the first arithmetic mean, and m2 is the mean error of the second arithmetic mean.
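For readers who prefer code, here is a minimal sketch of this formula in Python (the use of Python and the helper name student_t are our own illustration, not part of the original article):

```python
import math

def student_t(M1, m1, M2, m2):
    # t = (M1 - M2) / sqrt(m1^2 + m2^2), where m1 and m2 are the mean errors
    return (M1 - M2) / math.sqrt(m1 ** 2 + m2 ** 2)
```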

5. How to interpret the value of Student's t-test?

The resulting value of Student's t-test must be interpreted correctly. To do this, we need to know the number of subjects in each group (n1 and n2). The number of degrees of freedom f is found by the following formula:

f = (n1 + n2) - 2

After that, we determine the critical value of Student's t-test for the required significance level (for example, p = 0.05) and for the given number of degrees of freedom f from the table (see the table above).

We compare the critical and calculated values of the criterion:

  • If the calculated value of Student's t-test is equal to or greater than the critical value found in the table, we conclude that the differences between the compared values are statistically significant.
  • If the calculated value of Student's t-test is less than the tabular value, the differences between the compared values are not statistically significant.
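The whole decision procedure can be sketched in a few lines of Python, assuming SciPy is available (the helper name compare_means is hypothetical, chosen only for this illustration):

```python
import math
from scipy import stats

def compare_means(M1, m1, n1, M2, m2, n2, alpha=0.05):
    # Two-sample comparison from the group means and their mean errors
    t_calc = abs(M1 - M2) / math.sqrt(m1 ** 2 + m2 ** 2)
    f = (n1 + n2) - 2                        # degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / 2, f)   # two-sided critical value
    return t_calc, t_crit, t_calc >= t_crit  # True means the differences are significant
```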

6. An example of calculating the Student's t-test

To study the effectiveness of a new iron preparation, two groups of patients with anemia were selected. In the first group, patients received a new drug for two weeks, and in the second group they received a placebo. After that, the level of hemoglobin in peripheral blood was measured. In the first group, the average hemoglobin level was 115.4±1.2 g/l, and in the second - 103.7±2.3 g/l (data are presented in the format M±m), the compared populations have a normal distribution. The number of the first group was 34, and the second - 40 patients. It is necessary to draw a conclusion about the statistical significance of the obtained differences and the effectiveness of the new iron preparation.

Solution: to assess the significance of the differences, we use Student's t-test, calculated as the difference between the means divided by the square root of the sum of the squared errors:

t = (115.4 - 103.7) / √(1.2² + 2.3²)

After performing the calculations, the value of the t-test is 4.51. We find the number of degrees of freedom as (34 + 40) - 2 = 72. We compare the obtained value of Student's t-test, 4.51, with the critical value at p = 0.05 given in the table: 1.993. Since the calculated value of the criterion is greater than the critical one, we conclude that the observed differences are statistically significant (significance level p < 0.05).
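The example is easy to verify in Python (SciPy assumed); the script below should reproduce the values 4.51, 72 and 1.993 from the text:

```python
import math
from scipy import stats

t = (115.4 - 103.7) / math.sqrt(1.2 ** 2 + 2.3 ** 2)  # about 4.51
f = (34 + 40) - 2                                      # 72 degrees of freedom
t_crit = stats.t.ppf(1 - 0.05 / 2, f)                  # about 1.993
print(round(t, 2), f, round(t_crit, 3), t >= t_crit)   # the differences are significant
```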

Testing a statistical hypothesis allows one to draw a rigorous conclusion about the characteristics of the general population based on sample data. There are different hypotheses; one of them is the hypothesis about the mean (mathematical expectation). Its essence is to draw a correct conclusion, using only the available sample, about where the general mean may or may not lie (we will never know the exact truth, but we can narrow the search).

The general approach to testing hypotheses has already been described, so straight to the point. Assume first that the sample is drawn from a normal population of random variables X with general mean μ and variance σ² (I know, I know that this does not happen, but you do not need to interrupt me!). The arithmetic mean of such a sample is obviously itself a random variable. If we drew many such samples and calculated their means, those means would also be normally distributed with mathematical expectation μ and standard error σ_x̄ = σ/√n.

Then the random variable

z = (x̄ - μ) / σ_x̄

has a standard normal distribution. The question arises: will the general mean, with a probability of 95%, lie within ±1.96·s_x̄ of the sample mean? In other words, are the distributions of the random variables

(x̄ - μ) / σ_x̄   and   (x̄ - μ) / s_x̄

equivalent?

For the first time this question was raised (and solved) by a chemist who worked at the Guinness brewery in Dublin, Ireland. The chemist's name was William Sealy Gosset, and he took beer samples for chemical analysis. At some point, apparently, William began to have vague doubts about the distribution of the means: it turned out to be a little more spread out than a normal distribution should be.

Having put together the mathematical justification and calculated the values of the distribution function he had discovered, the Dublin chemist William Gosset wrote a note that was published in the March 1908 issue of the journal Biometrika (editor-in-chief Karl Pearson). Because Guinness strictly forbade giving away the secrets of brewing, Gosset signed with the pseudonym Student.

Although K. Pearson had already invented the chi-square distribution, the general idea of normality still dominated, and no one expected that the distribution of sample estimates might not be normal. Therefore W. Gosset's article remained practically unnoticed and forgotten. Only Ronald Fisher appreciated Gosset's discovery: Fisher used the new distribution in his work and gave it the name Student's t-distribution; the criterion for testing hypotheses accordingly became Student's t-test. Thus there was a "revolution" in statistics, which stepped into the era of the analysis of sample data. That was a brief digression into history.

Let's see what W. Gosset could have seen. Let's generate 20 thousand normal samples of 6 observations each, with mean (μ) 50 and standard deviation (σ) 10. Then we normalize the sample means using the general variance:

z = (x̄ - μ) / (σ/√n)

We group the resulting 20 thousand means into intervals of length 0.1, calculate the frequencies, and plot the actual (Norm) and theoretical (ENorm) frequency distributions of the sample means on a chart.

The points (observed frequencies) almost coincide with the line (theoretical frequencies). This is understandable, because the data are taken from the same general population, and the differences are just sampling errors.

Let's do a new experiment: now we normalize the means using the sample variance:

t = (x̄ - μ) / (s/√n)

Let's count the frequencies again and plot them on the chart as points, leaving the line of the standard normal distribution for comparison. Let us denote the empirical frequencies of the means by the letter t.

It can be seen that this time the distributions are not quite the same. Close, yes, but not identical: the tails have become "heavier".

Gosset-Student did not have the latest version of MS Excel, but this is exactly the effect he noticed. Why is this so? The explanation is that the random variable

t = (x̄ - μ) / s_x̄

depends not only on the sampling error in the numerator, but also on the standard error of the mean in the denominator, which is itself a random variable.
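Gosset's experiment is easy to repeat today; here is a sketch in Python with NumPy (the parameters are those used in the text above, the rest is our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, n_samples = 50, 10, 6, 20_000

samples = rng.normal(mu, sigma, size=(n_samples, n))
means = samples.mean(axis=1)

# Experiment 1: normalize the sample means by the known (general) sigma
z = (means - mu) / (sigma / np.sqrt(n))
# Experiment 2: normalize by the sample standard deviation instead
s = samples.std(axis=1, ddof=1)
t = (means - mu) / (s / np.sqrt(n))

# The t values have noticeably heavier tails than the z values
print(np.mean(np.abs(z) > 1.96))   # close to 0.05
print(np.mean(np.abs(t) > 1.96))   # noticeably larger, roughly 0.10-0.11 for n = 6
```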

Let's figure out what distribution such a random variable should have. First, we have to recall (or learn) something from mathematical statistics. There is Fisher's theorem, which states that in a sample from a normal distribution:

1. the sample mean x̄ and the sample variance s² are independent quantities;

2. the ratio of the sample variance to the general variance, multiplied by the number of degrees of freedom, has a χ² (chi-square) distribution with the same number of degrees of freedom, i.e.

k·s²/σ² ~ χ²_k

where k is the number of degrees of freedom (in English, degrees of freedom, d.f.).

Many other results in the statistics of normal models are based on this law.

Let's return to the distribution of the mean. Divide the numerator and denominator of the expression

t = (x̄ - μ) / s_x̄

by σ_x̄. We get

t = [(x̄ - μ)/σ_x̄] / (s_x̄/σ_x̄) = [(x̄ - μ)/σ_x̄] / √(s²/σ²)

The numerator is a standard normal random variable (let us denote it ξ (xi)). The denominator can be expressed from Fisher's theorem:

s²/σ² = χ²_k / k

Then the original expression takes the form

t = ξ / √(χ²_k / k)

This is the Student ratio in general form. Its distribution function can now be derived directly, because the distributions of both random variables in this expression are known. Let's leave this pleasure to the mathematicians.

The Student's t-distribution function has a formula that is quite difficult to understand, so there is no point in parsing it here. In any case, nobody uses it directly, because the probabilities are given in special Student distribution tables (sometimes called tables of Student coefficients) or are computed by PC formulas.

So, armed with new knowledge, you will be able to understand the official definition of Student's distribution.
A random variable obeying Student's distribution with k degrees of freedom is the following ratio of independent random variables:

t = ξ / √(χ²_k / k)

where ξ is distributed according to the standard normal law, and χ²_k follows a χ² distribution with k degrees of freedom.

Thus, the formula of Student's t-test for the arithmetic mean,

t = (x̄ - μ) / s_x̄ = (x̄ - μ) / (s/√n),

is a special case of the Student ratio.

It follows from the formula and definition that the distribution of Student's t-test depends only on the number of degrees of freedom.

For k > 30, the t-distribution practically does not differ from the standard normal distribution.

Unlike the chi-square, the t-test can be one-tailed or two-tailed. Usually the two-tailed version is used, assuming that deviations can occur in either direction from the mean. But if the problem allows deviations in only one direction, it is reasonable to apply a one-tailed test. This slightly increases the power, because at a fixed significance level the critical value moves slightly closer to zero.

Conditions for applying Student's t-test

Although Student's discovery in its time made a revolution in statistics, the t-test is still rather limited in applicability, because it is derived from the assumption of a normal distribution of the original data. If the data are not normal (which is usually the case), the t-statistic no longer has Student's distribution. However, thanks to the central limit theorem, the mean even of non-normal data quickly acquires a bell-shaped distribution.

Consider, for example, data that has a pronounced skew to the right, like a chi-square distribution with 5 degrees of freedom.

Now let's create 20 thousand samples and observe how the distribution of means changes depending on their size.

The difference is quite noticeable in small samples of up to 15-20 observations, but then it quickly disappears. Thus, non-normality of the distribution is, of course, not good, but it is not critical.

Most of all, the t-criterion is “afraid” of outliers, i.e. abnormal deviations. Let's take 20 thousand normal samples of 15 observations and add one random outlier to some of them.

The picture is discouraging: the actual frequencies of the means differ greatly from the theoretical ones. Using the t-distribution in such a situation becomes a very risky undertaking.

So, in samples that are not too small (from about 15 observations), the t-test is relatively robust to non-normality of the original data. But outliers in the data strongly distort the distribution of the t-statistic, which in turn can lead to errors of statistical inference, so anomalous observations should be removed. Often, all values falling outside ±2 standard deviations from the mean are removed from the sample.
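A small illustrative helper for the ±2 standard deviation rule mentioned above (a sketch only; the name drop_outliers is ours, and in practice the screening rule should be chosen with care):

```python
import numpy as np

def drop_outliers(x, k=2.0):
    # Keep only the observations lying within k standard deviations of the mean
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std(ddof=1)
    return x[np.abs(x - m) <= k * s]
```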

An example of testing the hypothesis of mathematical expectation using Student's t-test in MS Excel

Excel has several functions related to the t-distribution. Let's consider them.

T.DIST - the "classical" left-tailed Student's t-distribution. The input is the value of the t-criterion, the number of degrees of freedom, and a flag (0 or 1) that determines what is to be calculated: the density or the value of the cumulative distribution function. The output is, respectively, the density or the probability that the random variable will be less than the t-value given in the argument.

T.DIST.2T - the two-tailed distribution. The arguments are the absolute value of the t-criterion and the number of degrees of freedom. The output is the probability of obtaining this or an even greater value of the t-criterion, i.e. the actual significance level (p-level).

T.DIST.RT - the right-tailed t-distribution. Thus, 1 - T.DIST(2; 5; 1) = T.DIST.RT(2; 5) = 0.05097. If the t-criterion is positive, the resulting probability is the p-level.

T.INV - used to calculate the left-tailed inverse of the t-distribution. The arguments are the probability and the number of degrees of freedom; the output is the t-value corresponding to this probability. The probability is counted from the left, so for the left tail the significance level α itself is used, and for the right tail 1 - α.

T.INV.2T - the inverse of the two-tailed Student's distribution, i.e. the (absolute) t-value. The significance level α is again given as input, but this time it is split between the two tails. Thus, T.INV(1 - 0.025; 5) = T.INV.2T(0.05; 5) = 2.57058.

T.TEST - a function for testing the hypothesis of equality of mathematical expectations in two samples. It replaces a whole set of calculations: it is enough to specify two ranges of data and a couple more parameters. The output is the p-level.

CONFIDENCE.T - calculation of the confidence interval of the mean using the t-distribution.
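For readers working outside Excel, rough SciPy counterparts of these functions look as follows (a sketch assuming scipy.stats; Excel's exact behaviour is not reproduced in every detail):

```python
from scipy import stats

df = 5
stats.t.pdf(2, df)        # T.DIST(2; 5; 0): density
stats.t.cdf(2, df)        # T.DIST(2; 5; 1): left-tailed probability
stats.t.sf(2, df)         # T.DIST.RT(2; 5): right-tailed probability, about 0.05097
2 * stats.t.sf(2, df)     # T.DIST.2T(2; 5): two-tailed p-level
stats.t.ppf(0.975, df)    # T.INV(0.975; 5) = T.INV.2T(0.05; 5), about 2.57058
```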

Let's consider a training example. A company packs cement in bags of 50 kg. Due to chance, some deviation from the expected mass is allowed in an individual bag, but the general mean should remain 50 kg. The quality control department randomly weighed 9 bags and obtained the following results: the average weight (x̄) was 50.3 kg, the standard deviation (s) was 0.5 kg.

Is the result consistent with the null hypothesis that the general mean is 50 kg? In other words, could such a result be obtained by pure chance if the equipment works properly and produces an average filling of 50 kg? If the hypothesis is not rejected, the observed difference fits within the range of random fluctuations; if the hypothesis is rejected, then most likely the settings of the bag-filling apparatus have drifted and need to be checked and adjusted.

In the generally accepted notation, the brief statement of the problem looks like this.

H0: μ = 50 kg

H1: μ ≠ 50 kg

There is reason to assume that bag weights follow a normal distribution (or do not differ much from it), so to test the hypothesis about the mathematical expectation we can use Student's t-test. Random deviations can occur in either direction, so a two-tailed t-test is needed.

First, we apply the antediluvian approach: manually calculating the t-test and comparing it with the critical table value. The calculated t-test:

t = (x̄ - μ0) / (s / √n) = (50.3 - 50) / (0.5 / √9) = 1.8

Now let's determine whether the resulting number goes beyond the critical level at the significance level α = 0.05. Let's use the Student's t-distribution table (available in any textbook on statistics).

The columns give the probability in the right tail of the distribution, the rows the number of degrees of freedom. We are interested in a two-sided t-test with a significance level of 0.05, which corresponds to the t-value for half of the significance level on the right: 1 - 0.05/2 = 0.975. The number of degrees of freedom is the sample size minus 1, i.e. 9 - 1 = 8. At the intersection we find the tabular value of the t-test: 2.306. If we used the standard normal distribution, the critical point would be 1.96, but here it is larger because the t-distribution for small samples is flatter, with heavier tails.

We compare the actual value (1.8) with the tabular one (2.306). The calculated criterion turned out to be less than the tabular one; therefore, the available data do not contradict the hypothesis H0 that the general mean is 50 kg (but they do not prove it either). That is all we can find out using tables. One can, of course, also try to find the p-level, but it will be approximate; and, as a rule, it is the p-level that is used to test hypotheses. So let's move on to Excel.
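Before moving on to Excel, the same manual calculation can be checked in Python (SciPy assumed); it should reproduce t = 1.8 and the critical value 2.306:

```python
import math
from scipy import stats

n, mean, s, mu0 = 9, 50.3, 0.5, 50.0
t = (mean - mu0) / (s / math.sqrt(n))       # 1.8
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 1)   # about 2.306 for 8 degrees of freedom
print(t, t_crit, abs(t) >= t_crit)          # the hypothesis is not rejected
```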

There is no ready-made function for calculating this t-test in Excel. But that is not a problem, because the formula of Student's t-test is quite simple and can easily be built right in an Excel cell.

We get the same 1.8. Let us first find the critical value. We take alpha 0.05, and the test is two-sided, so we need the inverse two-tailed t-distribution function T.INV.2T.

The resulting value cuts off the critical region. The observed t-test does not fall into it, so the hypothesis is not rejected.

However, this is the same way of testing the hypothesis as with a table value. It is more informative to calculate the p-level, i.e. the probability of obtaining the observed or an even greater deviation from the mean of 50 kg if this hypothesis is correct. We need the two-tailed Student distribution function T.DIST.2T.

The p-level equals 0.1096, which is greater than the allowable significance level of 0.05, so we do not reject the hypothesis. But now we can judge the weight of the evidence: the p-level turned out to be quite close to the level at which the hypothesis is rejected, and that suggests, for example, that the sample was too small to detect a significant deviation.

Suppose after a while the control department again decided to check how the bag fill standard was maintained. This time, for greater reliability, not 9, but 25 bags were selected. It is intuitively clear that the spread of the average will decrease, and, therefore, the chances of finding a failure in the system become greater.

Suppose the sample gave the same values of the mean and standard deviation as the first time (50.3 and 0.5, respectively). Let's calculate the t-test: t = (50.3 - 50) / (0.5 / √25) = 3.0.


The critical value for 24 degrees of freedom and α = 0.05 is 2.064. The picture below shows that the t-test falls into the area of ​​the hypothesis rejection.

We can conclude that, with a confidence probability of more than 95%, the general mean differs from 50 kg. To be more convincing, let's look at the p-level (the last line in the table). The probability of obtaining a mean with this or an even greater deviation from 50, if the hypothesis is correct, is 0.0062, or 0.62%, which is practically impossible in a single measurement. In general, we reject the hypothesis as unlikely.
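A short Python check of both p-levels (SciPy assumed; the helper p_level is our own illustration):

```python
import math
from scipy import stats

def p_level(n, mean, s, mu0=50.0):
    t = (mean - mu0) / (s / math.sqrt(n))
    return 2 * stats.t.sf(abs(t), n - 1)   # two-tailed p-level

print(p_level(9, 50.3, 0.5))    # about 0.1096: the hypothesis is not rejected
print(p_level(25, 50.3, 0.5))   # about 0.0062: the hypothesis is rejected
```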

Calculating a Confidence Interval Using Student's t-Distribution

Another statistical method closely related to hypothesis testing is the calculation of confidence intervals. If the value corresponding to the null hypothesis falls within the obtained interval, the null hypothesis is not rejected; otherwise, the hypothesis is rejected with the corresponding confidence level. In some cases analysts do not test hypotheses in the classical form at all, but only calculate confidence intervals. This approach allows one to extract even more useful information.

Let's calculate the confidence intervals for the mean for 9 and for 25 observations. To do this, we use the Excel function CONFIDENCE.T. Here, oddly enough, everything is quite simple: in the function arguments you only need to specify the significance level α, the sample standard deviation, and the sample size. The output is the half-width of the confidence interval, that is, the amount to be laid off on both sides of the mean. After performing the calculations and drawing a visual diagram, we get the following.

As can be seen, with a sample of 9 observations the value 50 falls within the confidence interval (the hypothesis is not rejected), while with 25 observations it does not (the hypothesis is rejected). At the same time, in the experiment with 25 bags it can be argued that, with a probability of 97.5%, the general mean exceeds 50.09 kg (the lower limit of the confidence interval is 50.094 kg). And that is quite valuable information.
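The same confidence intervals can be sketched in Python (SciPy assumed; the helper below plays the role of CONFIDENCE.T and is our own illustration):

```python
import math
from scipy import stats

def t_confidence_halfwidth(alpha, s, n):
    # Half-width of the confidence interval of the mean, based on the t-distribution
    return stats.t.ppf(1 - alpha / 2, n - 1) * s / math.sqrt(n)

for n in (9, 25):
    h = t_confidence_halfwidth(0.05, 0.5, n)
    print(n, round(50.3 - h, 3), round(50.3 + h, 3))
# n = 9:  roughly (49.916, 50.684), so the value 50 lies inside the interval
# n = 25: roughly (50.094, 50.506), so the value 50 lies outside the interval
```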

Thus, we solved the same problem in three ways:

1. The old-fashioned approach: comparing the calculated and tabular values of the t-criterion.
2. A more modern one: calculating the p-level, which adds a degree of confidence when rejecting the hypothesis.
3. An even more informative one: calculating the confidence interval and obtaining the minimum value of the general mean.

It is important to remember that the t-test is a parametric method, because it is based on the normal distribution (which has two parameters: the mean and the variance). Therefore, for its successful application, at least approximate normality of the original data and the absence of outliers are important.

Finally, I propose to watch a video on how to carry out calculations related to Student's t-test in Excel.

Student distribution table

Probability integral (normal) tables are used for large samples from an infinitely large general population. But already at n < 100 a discrepancy appears between the tabulated data and the limiting probability, and at n < 30 the error becomes significant. The discrepancy is caused mainly by the nature of the distribution of the units of the general population. With a large sample size, the particular form of the distribution in the general population does not matter, because the distribution of deviations of the sample statistic from the general characteristic always turns out to be normal for large samples. In small samples (n < 30), the form of the distribution of the general population affects the distribution of the sampling errors; therefore, to calculate the sampling error with a small number of observations (fewer than 100 units), the selection must be made from a population that has a normal distribution. The theory of small samples was developed by the English statistician W. Gosset (who wrote under the pseudonym Student) at the beginning of the 20th century. In 1908 he constructed a special distribution that makes it possible, even for small samples, to relate t to the confidence probability F(t). For n > 100, Student distribution tables give the same results as the Laplace probability integral tables; for 30 < n < 100 the differences are minor. Therefore, in practice, small samples are taken to be samples of fewer than 30 units (a sample of more than 100 units is, of course, considered large).

The use of small samples is in some cases dictated by the nature of the population being studied. Thus, in breeding work a "pure" experiment is easier to achieve on a small number of plots; a production or economic experiment that involves economic costs is also carried out on a small number of trials. As already noted, in the case of a small sample, both the confidence probabilities and the confidence limits of the general mean can be calculated only for a normally distributed general population.

The probability density of Student's distribution is described by the function

f(t, n) = B_n · (1 + t² / (n - 1))^(-n/2)

where t is the current variable, n is the sample size, and B_n is a quantity that depends only on n.

Student's distribution has only one parameter: the number of degrees of freedom, d.f. (sometimes denoted k). Like the normal distribution, it is symmetric about the point t = 0, but it is flatter. As the sample size, and hence the number of degrees of freedom, increases, Student's distribution quickly approaches the normal one. The number of degrees of freedom is equal to the number of individual values of the variable that must be known in order to determine the desired characteristic. For example, to calculate the variance, the mean must already be known; therefore, when calculating the variance, d.f. = n - 1 is used.
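As a sanity check of the density formula above, it can be compared with the t-density implemented in SciPy; in this sketch B_n is expressed through gamma functions, which is an assumption consistent with the standard Student density (with k = n - 1 degrees of freedom):

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def student_pdf(t, n):
    k = n - 1
    log_B = gammaln((k + 1) / 2) - gammaln(k / 2) - 0.5 * np.log(k * np.pi)
    return np.exp(log_B) * (1 + t ** 2 / k) ** (-n / 2)

t = np.linspace(-4, 4, 9)
print(np.allclose(student_pdf(t, 10), stats.t.pdf(t, 9)))   # True
```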

Student distribution tables are published in two versions:

1. similarly to the probability integral tables, the values of t and the cumulative probabilities F(t) are given for different numbers of degrees of freedom;

2. the values of t are given for the most commonly used confidence probabilities 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, and 0.99 (or for 1 - 0.70 = 0.3, 1 - 0.80 = 0.2, ..., 1 - 0.99 = 0.01) and for different numbers of degrees of freedom. A table of this kind is given in the appendix (Tables 1-20), together with the values of t, Student's test, at a significance level of 0.7.

In the course of the example, we will use fictitious information so that the reader can make the necessary transformations on their own.

So, for example, suppose that in the course of a study we examined the effect of drug A on the content of substance B (in mmol/g) in tissue C and on the concentration of substance D in the blood (in mmol/l) in patients divided, according to some criterion E, into 3 groups of equal size (n = 10). The results of this fictitious study are shown in the table:

Table columns: content of substance B, mmol/g; concentration of substance D, mmol/l; increase in concentration.
We want to warn you that samples of size 10 are considered by us for ease of presentation of data and calculations; in practice, such a sample size is usually not enough to form a statistical conclusion.

As an example, consider the data of the 1st column of the table.

Descriptive statistics

Sample mean

The arithmetic mean, which is very often referred to simply as the "average", is obtained by adding up all the values and dividing this sum by the number of values in the set. This can be shown with an algebraic formula: a set of n observations of a variable x can be written as x1, x2, x3, ..., xn.

The formula for the arithmetic mean of the observations (read "x bar") is:

x̄ = (x1 + x2 + ... + xn) / n

x̄ = (12 + 13 + 14 + 15 + 14 + 13 + 13 + 10 + 11 + 16) / 10 = 13.1

Sample variance

One way to measure the scatter of the data is to determine how far each observation deviates from the arithmetic mean. Obviously, the greater the deviations, the greater the variability of the observations. However, we cannot use the average of these deviations as a measure of scatter, because positive deviations cancel negative ones (their sum is zero). To solve this problem, we square each deviation and average the squared deviations; this quantity is called the variance (dispersion). Take n observations x1, x2, x3, ..., xn with mean x̄. The variance of these observations, usually denoted s², is

s² = Σ(xᵢ - x̄)² / (n - 1)

The sample variance of this indicator is s² = 3.2.

Standard deviation

The standard (root-mean-square) deviation is the positive square root of the variance. For a sample of n observations it is

s = √( Σ(xᵢ - x̄)² / (n - 1) )

We can think of the standard deviation as a sort of mean deviation of the observations from the mean. It is calculated in the same units (dimensions) as the original data.

s = √s² = √3.2 = 1.79

The coefficient of variation

If you divide the standard deviation by the arithmetic mean and express the result as a percentage, you get the coefficient of variation.

CV = (1.79 / 13.1) × 100% = 13.7%

Sample mean error

The sample mean error: s / √n = 1.79 / √10 = 0.57
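All of the descriptive statistics above can be reproduced with a short Python/NumPy sketch using the ten values of the first column quoted earlier (the use of Python is our own illustration):

```python
import numpy as np

x = np.array([12, 13, 14, 15, 14, 13, 13, 10, 11, 16], dtype=float)

mean = x.mean()              # 13.1
s2 = x.var(ddof=1)           # sample variance, about 3.2
s = x.std(ddof=1)            # standard deviation, about 1.79
cv = s / mean * 100          # coefficient of variation, about 13.7 %
sem = s / np.sqrt(len(x))    # sample mean error, about 0.57
print(mean, s2, s, cv, sem)
```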

Student's coefficient t (one-sample t-test)

It is used to test the hypothesis that the sample mean differs from some known value m:

t = (x̄ - m) / (s / √n)

The number of degrees of freedom is calculated as f=n-1.

In this case, the 95% confidence interval for the mean lies between 11.82 and 14.38.

At the boundaries of the 95% confidence interval, m = 11.82 or m = 14.38, i.e. |13.1 - 11.82| = |13.1 - 14.38| = 1.28.

Accordingly, in this case, for the number of degrees of freedom f = 10 - 1 = 9 and a confidence level of 95%, t = 2.26.
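A sketch of the one-sample comparison itself in Python (SciPy assumed); it should reproduce the confidence interval above and give a t value exceeding the tabular 2.26:

```python
import numpy as np
from scipy import stats

x = np.array([12, 13, 14, 15, 14, 13, 13, 10, 11, 16], dtype=float)

t, p = stats.ttest_1samp(x, 11)                  # one-sample t-test against m = 11
lo, hi = stats.t.interval(0.95, len(x) - 1,
                          loc=x.mean(),
                          scale=stats.sem(x))    # about (11.82, 14.38)
print(round(t, 2), round(p, 4), lo, hi)          # t about 3.71, p < 0.05
```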

The Basic Statistics and Tables dialog

In the Basic Statistics and Tables module, choose Descriptive statistics.

The Descriptive statistics dialog box will open.

In the Variables field, choose Group 1.

Pressing OK, we obtain result tables with descriptive statistics of the selected variables.

The One-sample t-test dialog box will open.

Suppose we know that the average content of substance B in tissue C is 11.

The results table with descriptive statistics and Student's t-test is as follows:

We had to reject the hypothesis that the average content of substance B in tissue C is 11.

Since the calculated value of the criterion is greater than the tabular value (2.26), the null hypothesis is rejected at the chosen significance level, and the differences between the sample and the known value are recognized as statistically significant. Thus, the conclusion about the existence of differences drawn with Student's criterion is confirmed by this method.
