Discrete features for constructing variational distribution series. Construction of an interval variation series for continuous quantitative data

Once statistical observation data describing a phenomenon have been collected, the first step is to put them in order, i.e. to systematize them.

The English statistician W. J. Reichmann remarked figuratively of unordered data sets that facing a mass of unsummarized data is like being thrown into a thicket without a compass. How, then, are statistical data systematized in the form of distribution series?

A statistical distribution series is an ordered statistical population (Table 17). The simplest kind is a ranked series, i.e. the values of a varying attribute listed in ascending or descending order. Such a series does not reveal the patterns in the data: around which value most observations cluster, how far the others deviate from it, or what the overall shape of the distribution is. For this purpose the data are grouped to show how often individual values occur in the total number of observations (Scheme 1).

Table 17

General form of statistical distribution series

Scheme 1. Scheme of statistical distribution series

A distribution of population units by a characteristic that has no quantitative expression is called an attribute series (for example, the distribution of enterprises by line of production).

Distribution series of population units by characteristics that have a quantitative expression are called variation series. In such series the values of the characteristic (the variants) are arranged in ascending or descending order.

A variation series has two elements: variants and frequencies. A variant is a single value of the grouping characteristic; a frequency is a number showing how many times each variant occurs.

Mathematical statistics adds one more element of the variation series, the relative frequency. It is defined as the ratio of the frequency of a given interval to the sum of all frequencies and is expressed as a fraction of one, in percent (%) or per mille (‰).

Thus, a variation series is a distribution series in which the variants are arranged in ascending or descending order and their frequencies or relative frequencies are given. Variation series are either discrete (discontinuous) or interval (continuous).

Discrete variation series are distribution series in which the variant, as the value of a quantitative characteristic, can take only certain separate values. The variants differ from one another by one or more units.

So, the number of parts produced per shift by a specific worker can be expressed only by one specific number (6, 10, 12, etc.). An example of a discrete variation series is the distribution of workers by the number of parts produced (Table 18).

Table 18

Discrete distribution series

Interval (continuous) variation series are distribution series in which the values of the variants are given as intervals, because the values of the characteristic can differ from one another by an arbitrarily small amount. When a variation series is built for a continuous characteristic, it is impossible to list every value of the variants, so the population is distributed over intervals. The intervals may or may not be equal. For each of them the frequency or relative frequency is given (Table 19).

In interval distribution series with unequal intervals, characteristics such as the distribution density and the relative distribution density of an interval are calculated. The first is the ratio of the frequency to the width of the interval; the second is the ratio of the relative frequency to the width of the same interval. For the above example, the distribution density in the first interval is 3 : 5 = 0.6, and the relative density in this interval is 7.5 : 5 = 1.5%.
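The two densities can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the inputs (frequency 3, relative frequency 7.5%, interval width 5) are the figures quoted for the first interval of the example.

```python
def distribution_density(frequency, width):
    """Absolute density: frequency per unit of interval width."""
    return frequency / width


def relative_density(relative_frequency, width):
    """Relative density: relative frequency per unit of interval width."""
    return relative_frequency / width


# First interval of the example: 3 units, 7.5 % of the population, width 5
abs_density = distribution_density(3, 5)
rel_density = relative_density(7.5, 5)

print(abs_density)  # 0.6
print(rel_density)  # 1.5 (percent per unit of width)
```

Dividing by the width is what makes intervals of different length comparable, which is why these characteristics are used for series with unequal intervals.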

Table 19

Interval distribution series

The description of changes in a variable attribute is carried out using distribution series.

A statistical distribution series is an ordered distribution of the units of a statistical population into separate groups according to a certain varying attribute.

Statistical series built on a qualitative attribute are called attributive. If the distribution series is based on a quantitative attribute, the series is a variation series.

Variation series are in turn divided into discrete and interval series. A discrete distribution series is based on a discrete (discontinuous) attribute that takes specific numerical values (the number of offenses, the number of citizens applying for legal assistance). An interval distribution series is based on a continuous attribute that can take any value within a given range (the age of a convict, the term of imprisonment, etc.).

Any statistical distribution series contains two mandatory elements: variants and frequencies. Variants ($x_i$) are the individual values that the attribute takes in the distribution series. Frequencies ($f_i$) are numbers showing how many times particular variants occur in the distribution series. The sum of all frequencies is called the size (volume) of the population.

Frequencies expressed in relative units (fractions or percentages) are called relative frequencies ($w_i$). The sum of the relative frequencies equals one if they are expressed as fractions, or 100 if they are expressed as percentages. Relative frequencies make it possible to compare variation series of different population sizes. They are determined by the formula

$$w_i = \frac{f_i}{\sum_i f_i}$$

To build a discrete series, all the individual values of the attribute that occur in the series are ranked, and then the number of repetitions of each value is counted. The distribution series is drawn up as a table of two rows (or columns), one containing the variants $x_i$ and the other the frequencies $f_i$.

Consider an example of constructing a discrete variational series.

Example 3.1. According to the Ministry of Internal Affairs, the crimes registered in city N were committed by minors of the following ages:

17 13 15 16 17 15 15 14 16 13 14 17 14 15 15 16 16 15 14 15 15 14 16 16 14 17 16 15 16 15 13 15 15 13 15 14 15 13 17 14.

Construct a discrete distribution series.

Solution .

First, it is necessary to rank the data on the age of minors, i.e. write them down in ascending order.

13 13 13 13 13 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 16 16 16 16 16 16 16 16 17 17 17 17 17



Table 3.1

Age, years ($x_i$)          13   14   15   16   17   Total
Number of minors ($f_i$)     5    8   14    8    5      40

Thus, the frequencies reflect the number of people of a given age, for example, 5 people are 13 years old, 8 people are 14 years old, and so on.
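The ranking and counting steps of Example 3.1 can be sketched in Python. This is a minimal illustration rather than part of the original solution: the names `ages` and `series` are chosen here, and relative frequencies are added next to the frequencies.

```python
from collections import Counter

# Ages of the minors from Example 3.1, in the order they were recorded
ages = [17, 13, 15, 16, 17, 15, 15, 14, 16, 13, 14, 17, 14, 15, 15, 16, 16, 15, 14, 15,
        15, 14, 16, 16, 14, 17, 16, 15, 16, 15, 13, 15, 15, 13, 15, 14, 15, 13, 17, 14]

counts = Counter(ages)        # frequency f_i of every variant x_i
n = sum(counts.values())      # size of the population

# Discrete series: (variant, frequency, relative frequency), variants in ascending order
series = [(x, counts[x], counts[x] / n) for x in sorted(counts)]

for x, f, w in series:
    print(f"{x:>3} {f:>3} {w:.3f}")
```

Sorting the distinct values plays the role of ranking, and `Counter` does the tallying that is otherwise done by hand on the ranked list.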

Interval distribution series are built in the same way as an equal-interval grouping by a quantitative attribute: first the optimal number of groups into which the population will be divided is determined, then the boundaries of the intervals are set for the groups, and finally the frequencies are counted.

Let us illustrate the construction of an interval distribution series using the following example.

Example 3.2 .

Build an interval series for the following statistical population: salaries of the lawyers of an office, thousand rubles:

16.0 22.2 25.1 24.3 30.5 32.0 17.0 23.0 19.8 27.5 22.0 18.9 31.0 21.5 26.0 27.4

Solution.

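Let us take the optimal number of equal-interval groups for this population to be $k = 4$ (there are $n = 16$ variants). Each group then holds on average

$$\frac{n}{k} = \frac{16}{4} = 4$$

observations, and the width of each interval is

$$h = \frac{x_{\max} - x_{\min}}{k} = \frac{32.0 - 16.0}{4} = 4 \text{ thousand rubles}.$$

The boundaries of the intervals are determined by the formulas

$$x_i^{\text{low}} = x_{\min} + (i - 1)h, \qquad x_i^{\text{up}} = x_{\min} + ih,$$

where $x_i^{\text{low}}$ and $x_i^{\text{up}}$ are the lower and upper boundaries of the $i$-th interval, respectively.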

Omitting the intermediate calculations of the interval boundaries, we enter their values (variants) and the number of lawyers (frequencies) with salaries within each interval in Table 3.2, which presents the resulting interval series.

Table 3.2
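The omitted calculations for Example 3.2 can be reproduced with a short Python sketch. Two assumptions are made here and are not stated in the example itself: a value equal to an upper boundary would go to the next interval, and the maximum is kept in the last interval. For this data set no value falls on an inner boundary, so the result does not depend on the first assumption.

```python
salaries = [16.0, 22.2, 25.1, 24.3, 30.5, 32.0, 17.0, 23.0,
            19.8, 27.5, 22.0, 18.9, 31.0, 21.5, 26.0, 27.4]

k = 4                                    # number of groups chosen in the example
x_min, x_max = min(salaries), max(salaries)
h = (x_max - x_min) / k                  # interval width
bounds = [x_min + i * h for i in range(k + 1)]

freqs = [0] * k
for x in salaries:
    # Position of the interval; the maximum value is kept in the last one
    i = min(int((x - x_min) // h), k - 1)
    freqs[i] += 1

for i in range(k):
    print(f"{bounds[i]:.0f}-{bounds[i + 1]:.0f}: {freqs[i]}")
```

The loop gives frequencies of 4, 4, 5 and 3 for the intervals 16-20, 20-24, 24-28 and 28-32 thousand rubles.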

Statistical distribution series can be analyzed graphically. A graph of a distribution series makes the distribution pattern of the studied population visible by depicting it as a polygon, a histogram or a cumulate. Let us look at each of these charts.

A polygon is a polyline whose segments connect the points with coordinates ($x_i$; $f_i$). A polygon is typically used to display discrete distribution series. To build it, the ranked individual values of the attribute $x_i$ are plotted on the x-axis and the corresponding frequencies on the y-axis. Connecting the plotted points with line segments gives a polyline called the frequency polygon.

To illustrate the construction of a polygon, take the discrete series built in Example 3.1 (Figure 3.1). The abscissa shows the age of the juvenile convicts and the ordinate the number of juvenile convicts of that age. The polygon shows that the largest group, 14 people, is aged 15.

Figure 3.1 - Frequency polygon of the discrete series.

A polygon can also be built for an interval series, in which case the midpoints of the intervals are plotted along the abscissa axis, and the corresponding frequencies are plotted along the ordinate axis.

A histogram is a stepped figure made up of rectangles whose bases are the intervals of the attribute values and whose heights equal the corresponding frequencies. The histogram is used only for interval distribution series. If the intervals are unequal, the y-axis shows not the frequencies but the ratio of each frequency to the width of its interval. A histogram can be converted into a distribution polygon by connecting the midpoints of the tops of its columns with line segments.

To illustrate the construction of a histogram, let's take the results of constructing an interval series from Example 3.2 - Figure 3.2.

Figure 3.2 - Histogram of the distribution of lawyers' salaries.

A cumulative curve (cumulate) is also used to represent variation series graphically. It shows the series of cumulative frequencies and connects the points with coordinates ($x_i$; $f_i^{\text{cum}}$). The cumulative frequencies are obtained by successively summing the frequencies of the distribution series and show the number of population units whose attribute value does not exceed the given one. The calculation of the cumulative frequencies for the interval series of Example 3.2 is shown in Table 3.3.

Table 3.3

To build the cumulate of a discrete distribution series, the ranked individual values of the attribute are plotted along the abscissa and the corresponding cumulative frequencies along the ordinate. For an interval series, the first point has an abscissa equal to the lower boundary of the first interval and an ordinate of 0; every subsequent point is placed at the upper boundary of its interval. The cumulate built from the data of Table 3.3 is shown in Figure 3.3.

Figure 3.3 - The cumulative distribution curve of lawyers' salaries.
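The cumulative frequencies and the points of the cumulate can be obtained as follows. This is a sketch under one assumption: the frequencies 4, 4, 5, 3 were recounted here from the raw salaries of Example 3.2 for the intervals 16-20, 20-24, 24-28 and 28-32, since the table itself is not reproduced.

```python
from itertools import accumulate

bounds = [16, 20, 24, 28, 32]   # interval boundaries, thousand rubles
freqs = [4, 4, 5, 3]            # assumed frequencies, recounted from the raw data

# Successive summation of the frequencies
cumulative = list(accumulate(freqs))

# First point: lower boundary of the first interval with ordinate 0;
# every later point: upper boundary of an interval with its cumulative frequency
points = [(bounds[0], 0)] + list(zip(bounds[1:], cumulative))

print(cumulative)  # [4, 8, 13, 16]
print(points)      # [(16, 0), (20, 4), (24, 8), (28, 13), (32, 16)]
```

The last cumulative frequency always equals the size of the population, which is a quick check on the table.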

Control questions

1. The concept of a statistical distribution series, its main elements.

2. Types of statistical distribution series. Their brief description.

3. Discrete and interval distribution series.

4. Technique for constructing discrete distribution series.

5. Technique for constructing interval distribution series.

6. Graphical representation of discrete distribution series.

7. Graphical representation of interval distribution series.

Tasks

Task 1. The following data show the grades of 25 students of a group in TGP for the session: 5, 4, 4, 4, 3, 2, 5, 3, 4, 4, 4, 3, 2, 5, 2, 5, 5, 2, 3, 3, 5, 4, 2, 3, 3. Construct a discrete variation series of the distribution of students by the grades received in the session. For the resulting series, calculate the relative frequencies, cumulative frequencies and cumulative relative frequencies. Draw conclusions.

Task 2. The colony contains 1000 convicts, their age distribution is presented in the table:

Show this series graphically. Draw your own conclusions.

Task 3. The following data are available on the terms of imprisonment of prisoners:

5; 4; 2; 1; 6; 3; 4; 3; 2; 2; 3; 1; 17; 6; 2; 8; 5; 11; 9; 3; 5; 6; 4; 3; 10; 5; 25; 1; 12; 3; 3; 4; 9; 6; 5; 3; 4; 3; 5; 12; 4; 13; 2; 4; 6; 4; 14; 3; 11; 5; 4; 13; 2; 4; 6; 4; 14; 3; 11; 5; 4; 3; 12; 6.

Build an interval series of the distribution of prisoners by terms of imprisonment. Draw your own conclusions.

Task 4. The following data are available on the distribution of convicts in the region over the study period by age group:

Draw this series graphically, draw conclusions.

Higher professional education

"RUSSIAN ACADEMY OF NATIONAL ECONOMY AND PUBLIC ADMINISTRATION UNDER THE PRESIDENT OF THE RUSSIAN FEDERATION"

(Kaluga branch)

Department of Natural Science and Mathematical Disciplines

TEST

Subject: "Statistics"

Student: Mayboroda Galina Yurievna

Correspondence department, Faculty of State and Municipal Management, group G-12-V

Lecturer: Hamer G.V., PhD, Associate Professor

Kaluga, 2013

Task 1.
Task 1.1.
Task 1.2.
Task 1.3.
Task 1.4.

Task 2.
Task 2.1.
Task 2.2.
Task 2.3.
Task 2.4.

Task 3.
Task 3.1.
Task 3.2.
Task 3.3.
Task 3.4.

Task 4.
Task 4.1.
Task 4.2.
Task 4.3.
Task 4.4.

List of used sources.

Task 1.

Task 1.1.

There are the following data on the output and the amount of profit by the enterprises of the region (table 1).

Table 1

Data on the output and profit of the enterprises

Output, million rubles   Profit, million rubles   Output, million rubles   Profit, million rubles
63.0                     6.7                      56.0                     7.2
48.0                     6.2                      81.0                     9.6
39.0                     6.5                      55.0                     6.3
28.0                     3.0                      76.0                     9.1
72.0                     8.2                      54.0                     6.0
61.0                     7.6                      53.0                     6.4
47.0                     5.9                      68.0                     8.5
37.0                     4.2                      52.0                     6.5
25.0                     2.8                      44.0                     5.0
60.0                     7.9                      51.0                     6.4
46.0                     5.5                      50.0                     5.8
34.0                     3.8                      65.0                     6.7
21.0                     2.1                      49.0                     6.1
58.0                     8.0                      42.0                     4.8
45.0                     5.7                      32.0                     4.6

Using the original data:

1. Build a statistical distribution series of enterprises by output, forming five groups with equal intervals. Plot the graphs of the distribution series: polygon, histogram, cumulate. Determine the mode and the median graphically.

2. Calculate the characteristics of the distribution series of enterprises by output: arithmetic mean, variance, standard deviation, coefficient of variation. Draw a conclusion.

3. Using the method of analytical grouping, establish whether there is a correlation between the value of output and the amount of profit per enterprise, and determine its nature.

4. Measure the closeness of the correlation between the value of output and the amount of profit with the empirical correlation ratio. Draw general conclusions.

Solution:

Let us build the statistical distribution series.

To construct an interval variation series that characterizes the distribution of enterprises by output, the width and the boundaries of the intervals must be calculated.

For a series with equal intervals, the interval width $h$ is determined by the formula

$$h = \frac{x_{\max} - x_{\min}}{k},$$

where $x_{\max}$ and $x_{\min}$ are the largest and smallest values of the attribute in the studied set of enterprises, and $k$ is the number of groups of the interval series.

The number of groups is specified in the assignment: $k = 5$.

$x_{\max} = 81$ million rubles, $x_{\min} = 21$ million rubles.

Calculation of the interval width:

$$h = \frac{81 - 21}{5} = 12 \text{ million rubles}.$$

By successively adding the interval width h = 12 million rubles to the lower boundary of each interval, we obtain the following groups:

Group 1: 21 - 33 million rubles;

Group 2: 33 - 45 million rubles;

Group 3: 45 - 57 million rubles;

Group 4: 57 - 69 million rubles;

Group 5: 69 - 81 million rubles.
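The same boundaries follow from a short calculation. The sketch below uses only the figures given above (minimum 21, maximum 81, five groups); the variable names are illustrative.

```python
x_min, x_max, k = 21, 81, 5

h = (x_max - x_min) / k   # interval width, million rubles

# (lower, upper) boundary of each group, obtained by adding h step by step
groups = [(x_min + i * h, x_min + (i + 1) * h) for i in range(k)]

for number, (low, up) in enumerate(groups, start=1):
    print(f"Group {number}: {low:.0f} - {up:.0f} million rubles")
```

The upper boundary of the last group coincides with the maximum value, which confirms that the width was computed from the full range.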

To construct the interval series, the number of enterprises in each group (the group frequencies) must be counted.

The grouping of enterprises by output is shown in auxiliary Table 2. The profit column of this table is needed to build the analytical grouping (item 3 of the task).

Table 2

Table for constructing the interval distribution series and the analytical grouping

Group by output, million rubles   Output, million rubles   Profit, million rubles
21-33                             21.0                     2.1
                                  25.0                     2.8
                                  28.0                     3.0
                                  32.0                     4.6
Total                             106.0                    12.5
33-45                             34.0                     3.8
                                  37.0                     4.2
                                  39.0                     6.5
                                  42.0                     4.8
                                  44.0                     5.0
Total                             196.0                    24.3
45-57                             45.0                     5.7
                                  46.0                     5.5
                                  47.0                     5.9
                                  48.0                     6.2
                                  49.0                     6.1
                                  50.0                     5.8
                                  51.0                     6.4
                                  52.0                     6.5
                                  53.0                     6.4
                                  54.0                     6.0
                                  55.0                     6.3
                                  56.0                     7.2
Total                             606.0                    74.0
57-69                             58.0                     8.0
                                  60.0                     7.9
                                  61.0                     7.6
                                  63.0                     6.7
                                  65.0                     6.7
                                  68.0                     8.5
Total                             375.0                    45.4
69-81                             72.0                     8.2
                                  76.0                     9.1
                                  81.0                     9.6
Total                             229.0                    26.9
Total for all groups              1512.0                   183.1

The group "Total" rows of Table 2 form the final Table 3, which presents the interval distribution series of enterprises by output.

Table 3

Distribution series of enterprises by output

Conclusion. The grouping shows that the distribution of enterprises by output is not uniform. Enterprises with an output of 45 to 57 million rubles are the most common (12 enterprises). The least common are enterprises with an output of 69 to 81 million rubles (3 enterprises).
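The group frequencies and the group totals of the auxiliary table can be recomputed from the raw pairs of Table 1. The sketch below is illustrative: it assumes the rule that a value equal to an upper boundary goes to the next group, with the maximum kept in the last group, and the variable names are chosen here.

```python
# (output, profit) pairs of the 30 enterprises, million rubles
data = [(63, 6.7), (48, 6.2), (39, 6.5), (28, 3.0), (72, 8.2), (61, 7.6),
        (47, 5.9), (37, 4.2), (25, 2.8), (60, 7.9), (46, 5.5), (34, 3.8),
        (21, 2.1), (58, 8.0), (45, 5.7), (56, 7.2), (81, 9.6), (55, 6.3),
        (76, 9.1), (54, 6.0), (53, 6.4), (68, 8.5), (52, 6.5), (44, 5.0),
        (51, 6.4), (50, 5.8), (65, 6.7), (49, 6.1), (42, 4.8), (32, 4.6)]

x_min, h, k = 21, 12, 5

freq = [0] * k
output_total = [0] * k
profit_total = [0.0] * k

for output, profit in data:
    i = min((output - x_min) // h, k - 1)   # the maximum (81) stays in group 5
    freq[i] += 1
    output_total[i] += output
    profit_total[i] += profit

for i in range(k):
    low = x_min + i * h
    print(f"{low}-{low + h}: f={freq[i]}, output={output_total[i]}, "
          f"profit={profit_total[i]:.1f}")
```

The frequencies come out as 4, 5, 12, 6 and 3, and the totals agree with the "Total" rows of the auxiliary table.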

Let's build graphs of the distribution series.

A polygon is often used to represent discrete series. To construct it in a rectangular coordinate system, the values of the argument, i.e. the variants, are plotted on the abscissa (for an interval variation series the midpoint of each interval is taken as the argument) and the frequencies on the ordinate. The points whose coordinates are the corresponding pairs of numbers from the variation series are then plotted and connected in sequence by straight line segments. The polygon is shown in Figure 1.

A histogram is a bar chart. It makes it possible to assess the symmetry of the distribution. The histogram is shown in Figure 2.

Figure 1 - Polygon of the distribution of enterprises by output

Figure 2 - Histogram of the distribution of enterprises by output (the mode is marked)

The mode (Mo) is the value of the attribute that occurs most often in the studied population.

For an interval series, the mode can be determined graphically from the histogram (Figure 2). The tallest rectangle is selected; in this case it is the modal one (45-57 million rubles). The upper right vertex of the modal rectangle is connected to the upper right corner of the preceding rectangle, and the upper left vertex of the modal rectangle to the upper left corner of the following rectangle. From the point where these two lines intersect, a perpendicular is dropped to the abscissa. The abscissa of the intersection point is the mode of the distribution.

Mo ≈ 52 million rubles.

Conclusion. In the considered set of enterprises, enterprises with an output of about 52 million rubles are the most common.
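For equal intervals, the intersection point of the two lines in the graphical construction can also be computed with the usual interpolation formula for the mode. The sketch below is an assumption-laden illustration: the function name is made up here, and the neighbouring frequencies 5 and 6 are taken from a recount of the grouped data rather than from the original Table 3.

```python
def interval_mode(lower, width, f_modal, f_prev, f_next):
    """Mode of an equal-interval series by linear interpolation.

    lower   - lower boundary of the modal interval
    width   - interval width
    f_modal - frequency of the modal interval
    f_prev  - frequency of the preceding interval
    f_next  - frequency of the following interval
    """
    d_prev = f_modal - f_prev
    d_next = f_modal - f_next
    return lower + width * d_prev / (d_prev + d_next)


# Modal interval 45-57 (f = 12), neighbours 33-45 (f = 5) and 57-69 (f = 6)
mode = interval_mode(lower=45, width=12, f_modal=12, f_prev=5, f_next=6)
print(round(mode, 2))  # 51.46
```

The computed value of roughly 51.5 million rubles is close to the figure of about 52 read from the histogram; a reading taken from a drawing is only approximate.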

The cumulate is a broken line built from the cumulative frequencies (calculated in Table 4). It starts at the lower boundary of the first interval (21 million rubles), and each cumulative frequency is plotted at the upper boundary of its interval. The cumulate is shown in Figure 3.

Figure 3 - Cumulate of the distribution of enterprises by output (the median is marked)

Median Me is the value of the feature that falls in the middle of the ranked series. There are the same number of population units on both sides of the median.

In an interval series, the median can be determined graphically from a cumulative curve. To determine the median from a point on the cumulative frequency scale corresponding to 50% (30:2 = 15), a straight line is drawn parallel to the abscissa axis until it intersects with the cumulate. Then, from the point of intersection of the specified straight line with the cumulate, a perpendicular is lowered to the abscissa axis. The abscissa of the intersection point is the median.

Me ≈ 52 million rubles.

Conclusion. In the considered set of enterprises, half of the enterprises have an output of no more than 52 million rubles, and the other half no less than 52 million rubles.
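The point where the horizontal line at half the population crosses the cumulate can be computed by linear interpolation inside the median interval. This is a sketch, not the original calculation: the function name is illustrative, and the cumulative frequency 9 before the interval 45-57 and its frequency 12 are recounted from the grouped data.

```python
def interval_median(lower, width, n, cum_before, f_median):
    """Median of an interval series by linear interpolation.

    lower      - lower boundary of the median interval
    width      - interval width
    n          - size of the population
    cum_before - cumulative frequency before the median interval
    f_median   - frequency of the median interval
    """
    return lower + width * (n / 2 - cum_before) / f_median


# Cumulative frequencies 4, 9, 21, 27, 30: the level 30 / 2 = 15 is reached in 45-57
median = interval_median(lower=45, width=12, n=30, cum_before=9, f_median=12)
print(median)  # 51.0
```

Interpolation gives 51 million rubles, one unit below the value of about 52 quoted from the graph, which is within the accuracy one can expect from reading a drawing.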




When constructing an interval distribution series, three questions must be answered:

1. How many intervals should be taken?
2. What is the length of the intervals?
3. In what order are population units included in the boundaries of the intervals?

1. The number of intervals can be determined by the Sturges formula:

$$k = 1 + 3.322 \lg n, \qquad (2.5)$$

where $n$ is the number of units in the population.

2. The interval length, or interval step, is usually determined by the formula

$$h = \frac{R}{k},$$

where $R = x_{\max} - x_{\min}$ is the range of variation.

3. The order in which population units are included in the boundaries of an interval may vary, but it must be strictly defined when the interval series is built.

For example, the rule [ ): units equal to the lower boundary are included in the interval, while units equal to the upper boundary are not included and pass to the next interval. The exception is the last interval [ ], whose upper boundary includes the last number of the ranked series.
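The inclusion rule can be written as a small function. The sketch below is one possible way to express it; the function name and the boundaries 0, 10, 20, 30 are hypothetical and serve only to show how boundary values are treated.

```python
from bisect import bisect_right


def interval_index(x, bounds):
    """Index of the interval that receives x under the [ ) rule.

    A value equal to an inner boundary goes to the next interval;
    the last interval is closed on both sides.
    """
    if not bounds[0] <= x <= bounds[-1]:
        raise ValueError("value lies outside the series")
    # bisect_right counts the boundaries that are <= x
    return min(bisect_right(bounds, x) - 1, len(bounds) - 2)


bounds = [0, 10, 20, 30]            # hypothetical boundaries of three intervals

print(interval_index(0, bounds))    # 0: the lower boundary is included
print(interval_index(10, bounds))   # 1: an inner boundary passes to the next interval
print(interval_index(30, bounds))   # 2: the last interval includes its upper boundary
```

Fixing the rule in one place like this guarantees that every unit lands in exactly one interval.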

The boundaries of the intervals are:

  • closed - with both boundary values of the attribute specified;
  • open - with only one boundary value specified (up to some number or over some number).

To help assimilate the theoretical material, we introduce the initial information for a cross-cutting task.

There are conditional data on the average number of sales managers, the quantity of a homogeneous product sold by them, the individual market price of this product, and the sales volume of 30 firms in one of the regions of the Russian Federation in the first quarter of the reporting year (Table 2.1).

Table 2.1

Initial information for the cross-cutting task

Columns: number of managers, persons; quantity of goods sold, pcs.; price, thousand rubles; sales volume, million rubles.

Based on the initial information, as well as additional information, we will set up individual tasks. Then we present the methodology for solving them and the solutions themselves.

Cross-cutting task. Task 2.1

Using the initial data of Table 2.1, build a discrete series of the distribution of firms by the quantity of goods sold (Table 2.2).

Solution:

Table 2.2

Discrete series of distribution of firms by the number of goods sold in one of the regions of the Russian Federation in the first quarter of the reporting year

Cross-cutting task. Task 2.2

Build a ranked series of the 30 firms by the average number of managers.

Solution:

15; 17; 18; 20; 20; 20; 22; 22; 24; 25; 25; 25; 27; 27; 27; 28; 29; 30; 32; 32; 33; 33; 33; 34; 35; 35; 38; 39; 39; 45.

Cross-cutting task. Task 2.3

Using the initial data of Table 2.1:

1. Construct an interval series of the distribution of firms by the number of managers.
2. Calculate the relative frequencies of the distribution series of firms.
3. Draw conclusions.

Solution:

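Using the Sturges formula (2.5), we calculate the number of intervals:

$$k = 1 + 3.322 \lg 30 = 1 + 3.322 \cdot 1.477 = 5.907 \approx 6.$$

Thus, we take 6 intervals (groups).

The interval length, or interval step, is calculated by the formula

$$h = \frac{R}{k} = \frac{45 - 15}{6} = 5.$$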
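Both calculations can be verified in Python. The sketch uses the coefficient 3.322 of the Sturges formula together with the extreme values 15 and 45 of the ranked series; the function name is illustrative.

```python
import math


def sturges(n):
    """Number of intervals suggested by the Sturges formula."""
    return 1 + 3.322 * math.log10(n)


n = 30                    # number of firms
k_raw = sturges(n)        # about 5.907
k = round(k_raw)          # rounded to a whole number of groups

x_min, x_max = 15, 45     # smallest and largest number of managers
h = (x_max - x_min) / k   # interval step

print(round(k_raw, 3), k, h)  # 5.907 6 5.0
```

The formula only suggests a number of groups; rounding it to a whole number that also gives a convenient step, as here, is a matter of judgment.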

Note. The order of inclusion of population units in the boundaries of an interval is as follows: [ ), i.e. units equal to the lower boundary are included in the interval, while units equal to the upper boundary are not included and pass to the next interval. The exception is the last interval [ ], whose upper boundary includes the last number of the ranked series.

We build the interval series (Table 2.3).

Interval series of the distribution of firms by the average number of managers in one of the regions of the Russian Federation in the first quarter of the reporting year

Conclusion. The most numerous group of firms is the group with an average number of managers of 25-30 people, which includes 8 firms (27%); the smallest group with an average number of managers of 40-45 people includes only one firm (3%).
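The frequencies and percentage shares behind this conclusion can be recounted from the ranked series of Task 2.2. The sketch assumes six intervals of width 5 starting at 15 and the [ ) inclusion rule with a closed last interval; the variable names are chosen here.

```python
# Ranked series of the average number of managers (Task 2.2)
managers = [15, 17, 18, 20, 20, 20, 22, 22, 24, 25, 25, 25, 27, 27, 27,
            28, 29, 30, 32, 32, 33, 33, 33, 34, 35, 35, 38, 39, 39, 45]

x_min, h, k = 15, 5, 6

freqs = [0] * k
for x in managers:
    i = min((x - x_min) // h, k - 1)   # 45 stays in the last interval 40-45
    freqs[i] += 1

# Relative frequencies in percent, rounded to whole numbers
shares = [round(100 * f / len(managers)) for f in freqs]

for i in range(k):
    low = x_min + i * h
    print(f"{low}-{low + h}: {freqs[i]} firms ({shares[i]}%)")
```

The group 25-30 has 8 firms (27%) and the group 40-45 a single firm (3%), in line with the conclusion above.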

Using the initial data of Table 2.1 and the interval series of the distribution of firms by the number of managers (Table 2.3), build an analytical grouping of the relationship between the number of managers and the sales volume of the firms and, on its basis, draw a conclusion about the presence (or absence) of a relationship between these attributes.

Solution:

The analytical grouping is built on the factor attribute. In our problem the factor attribute (x) is the number of managers, and the resultant attribute (y) is the sales volume (Table 2.4).

Now let us build the analytical grouping (Table 2.5).

Conclusion. The analytical grouping shows that as the number of sales managers increases, the average sales volume per firm in the group also increases, which indicates a direct relationship between these attributes.

Table 2.4

Auxiliary table for building the analytical grouping

Columns: number of managers, persons, x; firm number; sales volume, million rubles, y. Average sales volume, million rubles: 9.97; 10.22; 10.61 (group 4, 7 firms); 10.31 (all 30 firms).

Table 2.5

Dependence of sales volumes on the number of company managers in one of the regions of the Russian Federation in the first quarter of the reporting year

CONTROL QUESTIONS
1. What is the essence of statistical observation?
2. Name the stages of statistical observation.
3. What are the organizational forms of statistical observation?
4. Name the types of statistical observation.
5. What is a statistical summary?
6. Name the types of statistical reports.
7. What is a statistical grouping?
8. Name the types of statistical groupings.
9. What is a distribution series?
10. Name the structural elements of the distribution series.
11. What is the procedure for constructing a distribution series?

The most important stage in the study of socio-economic phenomena and processes is the systematization of primary data and, on this basis, obtaining a summary characteristic of the entire object using generalizing indicators, which is achieved by summarizing and grouping primary statistical material.

A statistical summary is a set of sequential operations that generalize the individual facts forming a population in order to identify the typical features and patterns of the phenomenon under study as a whole. A statistical summary includes the following steps:

  • choice of grouping feature;
  • determination of the order of formation of groups;
  • development of a system of statistical indicators to characterize groups and the object as a whole;
  • development of layouts of statistical tables for presenting summary results.

Statistical grouping is the division of the units of the studied population into homogeneous groups according to characteristics that are essential for them. Grouping is the most important statistical method of summarizing data and the basis for the correct calculation of statistical indicators.

The following types of grouping are distinguished: typological, structural and analytical. What they have in common is that the units of the object are divided into groups according to some attribute.

A grouping attribute is the attribute by which the units of the population are divided into separate groups. The conclusions of a statistical study depend on the correct choice of the grouping attribute. Significant, theoretically substantiated attributes (quantitative or qualitative) must be used as the basis for grouping.

Quantitative grouping attributes have a numerical expression (trading volume, a person's age, family income, etc.), while qualitative grouping attributes reflect the state of a population unit (sex, marital status, industry affiliation of an enterprise, its form of ownership, etc.).

After the basis of the grouping is determined, the question of the number of groups into which the study population should be divided should be decided. The number of groups depends on the objectives of the study and the type of indicator underlying the grouping, the volume of the population, the degree of variation of the trait.

For example, a grouping of enterprises by form of ownership distinguishes municipal property, federal property and the property of the subjects of the federation. If the grouping is carried out by a quantitative attribute, special attention must be paid to the number of units of the object under study and to the degree of variation of the grouping attribute.

Once the number of groups is determined, the grouping intervals must be defined. An interval is the set of values of a varying attribute that lie within certain limits. Each interval has its width and its upper and lower boundaries, or at least one of them.

The lower boundary of an interval is the smallest value of the attribute in the interval, and the upper boundary is the largest value of the attribute in the interval. The width of the interval is the difference between the upper and lower boundaries.

Grouping intervals are either equal or unequal in width. If the variation of the attribute lies within relatively narrow limits and the distribution is uniform, a grouping with equal intervals is built. The width of an equal interval is determined by the formula

$$h = \frac{X_{\max} - X_{\min}}{n},$$

where $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the attribute in the population, and $n$ is the number of groups.

The simplest grouping, in which each selected group is characterized by one indicator, is a distribution series.

Statistical distribution series - this is an ordered distribution of population units into groups according to a certain attribute. Depending on the trait underlying the formation of a distribution series, attributive and variation distribution series are distinguished.

attributive they call the distribution series built according to qualitative characteristics, that is, signs that do not have a numerical expression (distribution by type of labor, by sex, by profession, etc.). Attribute distribution series characterize the composition of the population according to one or another essential feature. Taken over several periods, these data allow us to study the change in the structure.

Variation rows called distribution series built on a quantitative basis. Any variational series consists of two elements: variants and frequencies. Options the individual values ​​of the attribute that it takes in the variation series are called, that is, the specific value of the variable attribute.

Frequencies called the number of individual variant or each group of the variation series, that is, these are numbers that show how often certain variants occur in the distribution series. The sum of all frequencies determines the size of the entire population, its volume. Frequencies frequencies are called, expressed in fractions of a unit or as a percentage of the total. Accordingly, the sum of the frequencies is equal to 1 or 100%.

Depending on the nature of the variation of the trait, three forms of the variation series are distinguished: a ranked series, a discrete series, and an interval series.

Ranked variation series - this is the distribution of individual units of the population in ascending or descending order of the trait under study. Ranking makes it easy to divide quantitative data into groups, immediately detect the smallest and largest values ​​of a feature, highlight the values ​​that are most often repeated.

A discrete variation series characterizes the distribution of population units according to a discrete attribute that takes only integer values, for example the tariff category, the number of children in a family, or the number of employees at an enterprise.

If an attribute varies continuously, so that within certain limits it can take any value ("from - to"), an interval variation series should be built for it. Examples are the amount of income, length of service, and the value of an enterprise's fixed assets.

Examples of solving problems on the topic "Statistical summary and grouping"

Task 1 . There is information on the number of books received by students by subscription for the past academic year.

Build a ranked and a discrete variation distribution series and label the elements of each series.

Solution

This population consists of the variants of the number of books received by students. Let us count how many times each variant occurs and present the results as a ranked variation series and a discrete variation series.
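The construction can be sketched as follows. The source data for this task did not survive in the text, so the list below is purely hypothetical and serves only to show how the ranked series leads to the discrete one.

```python
from itertools import groupby

# Hypothetical data: number of books received by each of 10 students.
books = [3, 5, 2, 3, 4, 3, 5, 2, 4, 3]

# Ranked series: every observation, in ascending order.
ranked = sorted(books)

# Discrete series: each variant x paired with its frequency f.
# groupby collects runs of equal values, which works because the data are sorted.
discrete = [(x, len(list(run))) for x, run in groupby(ranked)]

print("ranked:", ranked)
print("x f")
for x, f in discrete:
    print(x, f)
```

In the discrete series the first column holds the variants and the second the frequencies; the frequencies sum to the number of students.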

Task 2. There are data on the value of fixed assets of 50 enterprises, in thousand rubles.

Build a distribution series, highlighting 5 groups of enterprises (at equal intervals).

Solution

For the solution, we find the largest and smallest values of the fixed assets of the enterprises. These are 30.0 and 10.2 thousand rubles.

Find the size of the interval: h = (30.0 - 10.2) : 5 = 3.96 thousand rubles.

Then the first group will include enterprises whose fixed assets range from 10.2 thousand rubles to 10.2 + 3.96 = 14.16 thousand rubles. There are 9 such enterprises. The second group will include enterprises whose fixed assets range from 14.16 thousand rubles to 14.16 + 3.96 = 18.12 thousand rubles. There are 16 such enterprises. The numbers of enterprises in the third, fourth and fifth groups are found in the same way.
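The interval boundaries for this task can be reproduced with a short sketch. The minimum, maximum and number of groups come from the task; the sample values and the boundary rule (a value equal to a boundary goes to the higher group, and the last group includes the maximum) are assumptions, since the text does not state how boundary values are assigned.

```python
from bisect import bisect_right

x_min, x_max, k = 10.2, 30.0, 5      # values from Task 2
h = (x_max - x_min) / k              # interval size, about 3.96

# Group boundaries, rounded to two decimals to suppress floating-point noise.
edges = [round(x_min + i * h, 2) for i in range(k + 1)]

def group_index(x):
    """Return the 0-based group of x (assumed rule: lower bound inclusive,
    the last group also includes x_max)."""
    return min(bisect_right(edges, x) - 1, k - 1)

# Hypothetical values, used only to show how enterprises are counted per group.
sample = [10.2, 12.5, 14.16, 17.9, 21.0, 25.3, 30.0]
counts = [0] * k
for x in sample:
    counts[group_index(x)] += 1

print(edges)
print(counts)
```

With the full data set of 50 enterprises in place of `sample`, `counts` would hold the frequencies of the interval series.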

The resulting distribution series is placed in the table.

Task 3 . For a number of light industry enterprises, the following data were obtained:

Make a grouping of enterprises according to the number of workers, forming 6 groups at equal intervals. Count for each group:

1. number of enterprises
2. number of workers
3. volume of manufactured products per year
4. average actual output per worker
5. amount of fixed assets
6. average size of fixed assets of one enterprise
7. average value of manufactured products by one enterprise

Record the results of the calculation in tables. Draw your own conclusions.

Solution

For the solution, we find the smallest and largest values of the average number of workers at an enterprise. These are 43 and 256.

Find the size of the interval: h = (256-43): 6 = 35.5

Then the first group will include enterprises with an average number of workers from 43 to 43 + 35.5 = 78.5 people. There are 5 such enterprises. The second group will include enterprises with an average number of workers from 78.5 to 78.5 + 35.5 = 114 people. There are 12 such enterprises. The numbers of enterprises in the third, fourth, fifth and sixth groups are found in the same way.

We put the resulting distribution series in a table and calculate the required indicators for each group.
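The group indicators requested in the task can be computed along the following lines. The boundaries use the values from the solution (43, 256, six groups); the enterprise records, the field names and the boundary rule are hypothetical, because the original data table is not reproduced in the text.

```python
from bisect import bisect_right

x_min, x_max, k = 43, 256, 6                  # values from Task 3
h = (x_max - x_min) / k                       # interval size, 35.5
edges = [x_min + i * h for i in range(k + 1)]

# Hypothetical records: (number of workers, annual output, fixed assets).
enterprises = [
    (43, 120.0, 80.0),
    (60, 150.0, 100.0),
    (100, 300.0, 210.0),
    (150, 500.0, 320.0),
    (256, 900.0, 600.0),
]

groups = [{"enterprises": 0, "workers": 0, "output": 0.0, "assets": 0.0}
          for _ in range(k)]

for workers, output, assets in enterprises:
    # Assumed rule: lower bound inclusive, last group includes the maximum.
    i = min(bisect_right(edges, workers) - 1, k - 1)
    groups[i]["enterprises"] += 1
    groups[i]["workers"] += workers
    groups[i]["output"] += output
    groups[i]["assets"] += assets

for g in groups:
    n = g["enterprises"]
    if n:  # averages are defined only for non-empty groups
        g["output_per_worker"] = g["output"] / g["workers"]
        g["assets_per_enterprise"] = g["assets"] / n
        g["output_per_enterprise"] = g["output"] / n

for number, g in enumerate(groups, start=1):
    print(number, g)
```

Each dictionary corresponds to one row of the summary table: the totals cover items 1, 2, 3 and 5 of the task, and the three averages cover items 4, 6 and 7.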

Conclusion : As can be seen from the table, the second group of enterprises is the most numerous. It includes 12 enterprises. The smallest are the fifth and sixth groups (two enterprises each). These are the largest enterprises (in terms of the number of workers).

Since the second group is the most numerous, the volume of output per year by the enterprises of this group and the volume of fixed assets are much higher than others. At the same time, the average actual output of one worker at the enterprises of this group is not the highest. The enterprises of the fourth group are in the lead here. This group also accounts for a fairly large amount of fixed assets.

In conclusion, we note that the average size of fixed assets and the average value of the output of one enterprise are directly proportional to the size of the enterprise (in terms of the number of workers).

2023 "kingad.ru" - ultrasound examination of human organs