Construct an interval distribution series. Construction of an interval variation series for continuous quantitative data

Grouping is the division of a population into groups that are homogeneous with respect to some characteristic.

Purpose of the service. Using the online calculator you can:

  • build a variation series, build a histogram and polygon;
  • find indicators of variation (average, mode (including graphically), median, range of variation, quartiles, deciles, quartile differentiation coefficient, coefficient of variation and other indicators);

Instructions. To group a series, you must select the type of variation series obtained (discrete or interval) and indicate the amount of data (number of rows). The resulting solution is saved in a Word file (see example of grouping statistical data).


If the grouping has already been carried out and a discrete or interval variation series is given, use the online calculator Variation Indices. The hypothesis about the type of distribution is tested using the service Studying the distribution form.

Types of statistical groupings

Variation series. When a discrete random variable is observed, the same value can occur several times. Each such value $x_i$ of the random variable is recorded together with $n_i$, the number of times it appears in $n$ observations; $n_i$ is the frequency of that value.
In the case of a continuous random variable, grouping is used in practice.
  1. Typological grouping is the division of a qualitatively heterogeneous population under study into classes, socio-economic types, or homogeneous groups of units. To build this grouping, use the Discrete variation series parameter.
  2. Structural grouping is a grouping in which a homogeneous population is divided into groups that characterize its structure according to some varying characteristic. To build this grouping, use the Interval series parameter.
  3. Analytical grouping is a grouping that reveals the relationships between the phenomena being studied and their characteristics (see analytical grouping of series).

Principles for constructing statistical groupings

A series of observations arranged in ascending order is called a variation series. The grouping characteristic is the characteristic by which a population is divided into separate groups; it is called the basis of the grouping. A grouping can be based on either quantitative or qualitative characteristics.
Once the basis of the grouping has been determined, the number of groups into which the population under study should be divided must be decided.

When statistical data are processed on a computer, the units are grouped using standard procedures.
One such procedure uses the Sturges formula to determine the optimal number of groups:

$k = 1 + 3.322 \lg N$

where k is the number of groups, N is the number of population units, and lg is the base-10 logarithm.

The length of the partial intervals is calculated as $h = (x_{max} - x_{min})/k$.

Then the number of observations falling into each interval is counted and taken as the frequency $n_i$. Small frequencies, whose values are less than 5 ($n_i < 5$), should be combined; in that case the corresponding intervals must be merged as well.
The midpoints of the intervals, $x_i = (c_{i-1} + c_i)/2$, are taken as the new values.
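The steps above (number of groups, interval length, interval midpoints) can be sketched in Python. This is a minimal illustration, not the calculator's actual implementation; the function names and the sample bounds 10 and 40 are hypothetical.

```python
import math

def sturges_groups(n_units: int) -> int:
    """Number of groups k = 1 + 3.322 * lg(N), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n_units))

def interval_width(x_min: float, x_max: float, k: int) -> float:
    """Length of each partial interval, h = (x_max - x_min) / k."""
    return (x_max - x_min) / k

def interval_midpoints(x_min: float, h: float, k: int) -> list:
    """Midpoints (c_{i-1} + c_i) / 2 of k equal intervals starting at x_min."""
    return [x_min + h * (i + 0.5) for i in range(k)]

# Hypothetical sample: 30 units with values between 10 and 40.
k = sturges_groups(30)
h = interval_width(10.0, 40.0, k)
mids = interval_midpoints(10.0, h, k)
print(k, h, mids)
```

Merging intervals whose frequency is below 5 is left out of the sketch, because the text does not specify which neighbouring interval should absorb them.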


Posted on http://www.allbest.ru/

TASK 1

The following data is available on the wages of employees at the enterprise:

Table 1.1

Wages, conventional monetary units

It is required to construct an interval distribution series and use it to find:

1) average salary;

2) average linear deviation;

4) standard deviation;

5) range of variation;

6) oscillation coefficient;

7) linear coefficient of variation;

8) simple coefficient of variation;

10) median;

11) asymmetry coefficient;

12) Pearson asymmetry index;

13) kurtosis coefficient.

Solution

As is known, variants (observed values) arranged in ascending order form a discrete variation series. When the number of variants is large (more than 10), interval series are constructed even in the case of discrete variation.

If an interval series with equal intervals is compiled, the range of variation is divided by the specified number of intervals. If the resulting value is an integer and unambiguous (which is rare), the interval length is taken equal to this number. In other cases the value is rounded, always upward, so that the last retained digit is even. Obviously, when the interval length increases, the range of variation expands by an amount equal to the number of intervals multiplied by the difference between the rounded and the initial interval length.

a) If the expansion of the range of variation is insignificant, it is either added to the largest value or subtracted from the smallest value of the characteristic;

b) If the expansion of the range of variation is noticeable, then, to avoid shifting the center of the range, it is split approximately in half, adding one part to the largest value and subtracting the other from the smallest value of the characteristic.

If an interval series with unequal intervals is compiled, the process is simpler, but the interval lengths must still be expressed as numbers whose last digit is even, which greatly simplifies subsequent calculations of numerical characteristics.

The sample size is n = 30.

Let's create an interval distribution series using the Sturges formula:

$K = 1 + 3.32 \lg n$,

where K is the number of groups;

$K = 1 + 3.32 \lg 30 = 5.90 \approx 6$

We find the range of the characteristic, the wages of workers at the enterprise (x), using the formula

$R = x_{max} - x_{min}$, and divide it by 6; $R = 195 - 112 = 83$

Then the length of the interval is $l = 83 : 6 = 13.83$

The first interval begins at 112. Adding $l = 13.83$ to 112, we get its upper boundary, 125.83, which is also the beginning of the second interval, and so on; the end of the sixth interval is 195.

When finding frequencies, you should be guided by the rule: “if the value of a feature coincides with the boundary of the internal interval, then it should be attributed to the previous interval.”

We obtain an interval series of frequencies and cumulative frequencies.

Table 1.2

Thus, 3 employees have wages from 112 to 125.83 conventional monetary units. The highest wages, from 181.15 to 195 conventional monetary units, are received by only 6 employees.

To calculate the numerical characteristics, we transform the interval series into a discrete one, taking the midpoints of the intervals as the variants:

Table 1.3

Column total of $(x_i - \bar{x})^2 n_i$: 14131.83

Using the weighted arithmetic mean formula, we obtain

$$\bar{x} = \frac{\sum x_i n_i}{\sum n_i} = 159.485 \text{ conventional monetary units}$$

Average linear deviation:

$$L = \frac{\sum |x_i - \bar{x}|\, n_i}{\sum n_i} = 17.765 \text{ conventional monetary units}$$

where $x_i$ is the value of the characteristic being studied for the i-th unit of the population,

$\bar{x}$ is the average value of the studied characteristic,

$n_i$ is the frequency of the i-th value.

Standard deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 n_i}{\sum n_i}} = 21.704 \text{ conventional monetary units}$$

Dispersion (variance):

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 n_i}{\sum n_i}$$

Relative range of variation (oscillation coefficient): $c = R : \bar{x}$

Relative linear deviation: $q = L : \bar{x}$

Coefficient of variation: $V = \sigma : \bar{x}$

The oscillation coefficient shows the relative spread of the extreme values of a characteristic around the arithmetic mean, and the coefficient of variation characterizes the degree of homogeneity of the population.

$c = R : \bar{x} = 83 / 159.485 \cdot 100\% = 52.043\%$

Thus, the range between the extreme values amounts to 52.04% of the average wage of employees at the enterprise, i.e. it is 47.96% (= 100% - 52.04%) less than the average wage.

$q = L : \bar{x} = 17.765 / 159.485 \cdot 100\% = 11.139\%$

$V = \sigma : \bar{x} = 21.704 / 159.485 \cdot 100\% = 13.609\%$

The coefficient of variation is less than 33%, which indicates a weak variation in wages of workers at the enterprise, i.e. that the average value is a typical characteristic of workers’ wages (the population is homogeneous).
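The variation indicators of Task 1 can be re-derived with a short Python sketch. Table 1.2 itself is not reproduced in the text, so the frequencies below are an assumption assembled from what the solution states: 3 workers in the first interval, the cumulative sum 3+3+5+7 up to the median interval, 6 workers in the last interval, and a total of 30 (which leaves 6 for the fifth interval). The last boundary is taken as 195, as in the text.

```python
import math

# Interval boundaries: start 112, step 13.83, last boundary taken as 195.
edges = [112, 125.83, 139.66, 153.49, 167.32, 181.15, 195]
# Assumed frequencies (see the lead-in); they sum to the sample size 30.
freqs = [3, 3, 5, 7, 6, 6]

mids = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
n = sum(freqs)

mean = sum(x * f for x, f in zip(mids, freqs)) / n
lin_dev = sum(abs(x - mean) * f for x, f in zip(mids, freqs)) / n
sq_dev_total = sum((x - mean) ** 2 * f for x, f in zip(mids, freqs))
sigma = math.sqrt(sq_dev_total / n)

value_range = edges[-1] - edges[0]
oscillation = value_range / mean * 100       # c = R : mean
linear_coeff = lin_dev / mean * 100          # q = L : mean
variation_coeff = sigma / mean * 100         # V = sigma : mean

print(round(mean, 3), round(lin_dev, 3), round(sigma, 3))
print(round(oscillation, 3), round(linear_coeff, 3), round(variation_coeff, 3))
```

Under these assumptions the sketch agrees with the figures quoted in the solution (mean 159.485, average linear deviation 17.765, standard deviation 21.704).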

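In interval distribution series, the mode is determined by the formula

$$Mo = x_{Mo} + h_{Mo} \cdot \frac{n_{Mo} - n_{Mo-1}}{(n_{Mo} - n_{Mo-1}) + (n_{Mo} - n_{Mo+1})}$$

where $n_{Mo}$ is the frequency of the modal interval, i.e. the interval containing the largest number of variants;

$n_{Mo-1}$ is the frequency of the interval preceding the modal one;

$n_{Mo+1}$ is the frequency of the interval following the modal one;

$h_{Mo}$ is the length of the modal interval;

$x_{Mo}$ is the lower limit of the modal interval.

To determine the median in an interval series we use the formula

$$Me = x_{Me} + h_{Me} \cdot \frac{\frac{1}{2}\sum n_i - S_{Me-1}}{n_{Me}}$$

where $S_{Me-1}$ is the cumulative (accumulated) frequency of the interval preceding the median one;

$x_{Me}$ is the lower limit of the median interval;

$n_{Me}$ is the frequency of the median interval;

$h_{Me}$ is the length of the median interval.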

The median interval is the interval whose accumulated frequency (= 3 + 3 + 5 + 7) exceeds half the sum of frequencies: (153.49; 167.32).
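A minimal Python sketch of the interval mode and median formulas follows. The frequencies are the same assumption as for the other Task 1 figures (reconstructed from the counts quoted in the solution, since Table 1.2 is not reproduced), and the resulting mode and median are values computed by the sketch, not quoted from the source.

```python
def grouped_mode(edges, freqs):
    """Mode of an interval series (standard interpolation inside the modal interval)."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # modal interval index
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    h = edges[i + 1] - edges[i]
    return edges[i] + h * (freqs[i] - f_prev) / ((freqs[i] - f_prev) + (freqs[i] - f_next))

def grouped_median(edges, freqs):
    """Median of an interval series (interpolation inside the median interval)."""
    half = sum(freqs) / 2
    cumulative = 0                      # accumulated frequency before interval i
    for i, f in enumerate(freqs):
        if cumulative + f >= half:      # first interval reaching half the total
            h = edges[i + 1] - edges[i]
            return edges[i] + h * (half - cumulative) / f
        cumulative += f

edges = [112, 125.83, 139.66, 153.49, 167.32, 181.15, 195]
freqs = [3, 3, 5, 7, 6, 6]              # assumed frequencies, total 30

mode = grouped_mode(edges, freqs)
median = grouped_median(edges, freqs)
print(round(mode, 2), round(median, 2))
```

Both values fall inside the interval (153.49; 167.32), which the solution names as the median interval.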

Let's calculate asymmetry and kurtosis, for which we will create a new worksheet:

Table 1.4 (columns: factual data; calculation data)

Let's calculate the third-order central moment:

$$\mu_3 = \frac{\sum (x_i - \bar{x})^3 n_i}{\sum n_i}$$

Therefore, the asymmetry is equal to

$$As = \frac{\mu_3}{\sigma^3}$$

Since $|As| = 0.3553 > 0.25$, the asymmetry is considered significant.

Let's calculate the fourth-order central moment:

$$\mu_4 = \frac{\sum (x_i - \bar{x})^4 n_i}{\sum n_i}$$

Therefore, the kurtosis is equal to

$$Ex = \frac{\mu_4}{\sigma^4} - 3$$

Since $Ex < 0$, the distribution is flat-topped (platykurtic).

The degree of asymmetry can also be determined using the Pearson asymmetry coefficient (As):

$$As = \frac{\bar{x} - Mo}{\sigma}$$

where $\bar{x}$ is the arithmetic mean of the distribution series; $Mo$ is the mode; $\sigma$ is the standard deviation.

With a symmetric (normal) distribution $\bar{x} = Mo$, and therefore the asymmetry coefficient is zero. If As > 0, then $\bar{x}$ is greater than the mode, and therefore there is right-sided asymmetry.

If As < 0, then $\bar{x}$ is less than the mode, and therefore there is left-sided asymmetry. The asymmetry coefficient can vary from -3 to +3.

The distribution is not symmetrical, but has left-sided asymmetry.
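The moment-based asymmetry and kurtosis can be checked with the Python sketch below. As before, the frequencies are an assumption reconstructed from the counts quoted in the solution (Tables 1.2 and 1.4 are not reproduced in the text), and the excess is computed as the fourth moment ratio minus 3.

```python
import math

edges = [112, 125.83, 139.66, 153.49, 167.32, 181.15, 195]
freqs = [3, 3, 5, 7, 6, 6]              # assumed frequencies, total 30
mids = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
n = sum(freqs)
mean = sum(x * f for x, f in zip(mids, freqs)) / n

def central_moment(order: int) -> float:
    """Weighted central moment of the given order."""
    return sum((x - mean) ** order * f for x, f in zip(mids, freqs)) / n

sigma = math.sqrt(central_moment(2))
asym = central_moment(3) / sigma ** 3          # asymmetry coefficient
excess = central_moment(4) / sigma ** 4 - 3    # kurtosis (excess)

print(round(asym, 4), round(excess, 3))
```

With these inputs the asymmetry is negative and its absolute value matches the 0.3553 quoted above, which is consistent with the stated left-sided asymmetry and the flat-topped shape.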

TASK 2

What should the sample size be so that with probability 0.954 the sampling error does not exceed 0.04 if, based on previous surveys, the variance is known to be 0.24?

Solution

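The sample size for non-repetitive sampling is calculated using the formula:

$$n = \frac{t^2 \sigma^2 N}{\Delta_x^2 N + t^2 \sigma^2}$$

where t is the confidence coefficient (with a probability of 0.954 it is equal to 2.0; determined from tables of probability integrals),

$\sigma^2 = 0.24$ is the variance;

$N = 10{,}000$ people is the size of the general population;

$\Delta_x = 0.04$ is the maximum error of the sample mean.

$$n = \frac{2^2 \cdot 0.24 \cdot 10000}{0.04^2 \cdot 10000 + 2^2 \cdot 0.24} = \frac{9600}{16.96} \approx 566$$

With a probability of 95.4%, it can be stated that the sample size ensuring a sampling error of no more than 0.04 should be at least 566 families.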
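The sample-size calculation can be reproduced with a few lines of Python. The function name is hypothetical; the inputs are the ones given in the task (t = 2, variance 0.24, population 10,000, maximum error 0.04).

```python
def sample_size_non_repetitive(t: float, variance: float,
                               population: int, max_error: float) -> float:
    """n = t^2 * sigma^2 * N / (delta^2 * N + t^2 * sigma^2)."""
    numerator = t ** 2 * variance * population
    denominator = max_error ** 2 * population + t ** 2 * variance
    return numerator / denominator

n_required = sample_size_non_repetitive(t=2.0, variance=0.24,
                                         population=10_000, max_error=0.04)
print(round(n_required, 2))
```

The formula yields a fractional value slightly above 566; the solution reports it rounded to 566.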

TASK 3

The following data is available on income from the main activities of the enterprise, million rubles.

To analyze the time series, determine the following indicators:

1) chain and basic:

Absolute increases;

Growth rates;

Rates of increase;

2) average:

Level of the time series;

Absolute increase;

Growth rate;

Rate of increase;

3) absolute value of a 1% increase.

Solution

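1. The absolute increase ($\Delta y$) is the difference between a given level of the series and the previous (or base) level:

chain: $\Delta y = y_i - y_{i-1}$,

basic: $\Delta y = y_i - y_0$,

where $y_i$ is the level of the series,

i is the number of the level,

$y_0$ is the base year level.

2. The growth rate ($T_g$) is the ratio of a given level of the series to the previous one (or to the base year 2001), expressed in %:

chain: $T_g = \frac{y_i}{y_{i-1}} \cdot 100\%$;

basic: $T_g = \frac{y_i}{y_0} \cdot 100\%$

3. The rate of increase ($T_i$) is the ratio of the absolute increase to the previous (or base) level, expressed in %:

chain: $T_i = \frac{y_i - y_{i-1}}{y_{i-1}} \cdot 100\%$;

basic: $T_i = \frac{y_i - y_0}{y_0} \cdot 100\%$

4. The absolute value of a 1% increase (A) is the ratio of the chain absolute increase to the chain rate of increase, expressed in %:

$A = \frac{y_i - y_{i-1}}{T_i} = \frac{y_{i-1}}{100}$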

The average level of the series is calculated using the arithmetic mean formula:

$$\bar{y} = \frac{\sum y_i}{n}$$

Average level of income from core activities for 4 years:

The average absolute increase is calculated by the formula:

$$\overline{\Delta y} = \frac{y_n - y_0}{n - 1}$$

where n is the number of levels of the series.

On average, income from core activities increased by 3.333 million rubles per year.

The average annual growth rate is calculated using the geometric mean formula:

$$\bar{T}_g = \sqrt[n-1]{\frac{y_n}{y_0}} \cdot 100\%$$

where $y_n$ is the final level of the series,

$y_0$ is the initial level of the series.

$\bar{T}_g = 102.174\%$

The average annual rate of increase is calculated by the formula:

$\bar{T}_i = \bar{T}_g - 100\% = 102.174\% - 100\% = 2.174\%$.

Thus, on average, income from the main activities of the enterprise increased by 2.174% per year.
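The chain, basic and average indicators of a time series can be sketched in Python as follows. The income table of the task is not reproduced in the text, so the four levels below are hypothetical; they were chosen only so that the average absolute increase (3.333) and the average growth rate (about 102.174%) agree with the figures quoted in the solution.

```python
# Hypothetical levels of the series for 4 years, million rubles (not from the source).
levels = [150.0, 152.0, 157.0, 160.0]
base = levels[0]
pairs = list(zip(levels, levels[1:]))           # (previous, current) levels

chain_abs = [cur - prev for prev, cur in pairs]            # chain absolute increases
basic_abs = [cur - base for cur in levels[1:]]             # basic absolute increases
chain_growth = [cur / prev * 100 for prev, cur in pairs]   # chain growth rates, %
basic_growth = [cur / base * 100 for cur in levels[1:]]    # basic growth rates, %
chain_increase = [g - 100 for g in chain_growth]           # chain rates of increase, %
basic_increase = [g - 100 for g in basic_growth]           # basic rates of increase, %
one_percent = [prev / 100 for prev, _ in pairs]            # absolute value of 1% increase

n = len(levels)
avg_level = sum(levels) / n
avg_abs = (levels[-1] - levels[0]) / (n - 1)
avg_growth = (levels[-1] / levels[0]) ** (1 / (n - 1)) * 100
avg_increase = avg_growth - 100

print(chain_abs, basic_abs)
print(round(avg_abs, 3), round(avg_growth, 3), round(avg_increase, 3))
```

Note that the average indicators depend only on the first and last levels, so any intermediate values would give the same averages.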

TASK 4

Calculate:

1. Individual price indices;

2. General trade turnover index;

3. Aggregate price index;

4. Aggregate index of the physical volume of sales of goods;

5. Break down the absolute increase in the value of trade turnover by factors (due to changes in prices and the number of goods sold);

6. Draw brief conclusions on all obtained indicators.

Solution

1. According to the condition, the individual price indices for products A, B, C were:

$i_{pA} = 1.20$; $i_{pB} = 1.15$; $i_{pC} = 1.00$.

2. We calculate the general trade turnover index using the formula:

$I_w = \frac{\sum p_1 q_1}{\sum p_0 q_0} = 1470 / 1045 \cdot 100\% = 140.67\%$

Trade turnover increased by 40.67% (140.67% - 100%).

3. Aggregate price index:

$I_p = \frac{\sum p_1 q_1}{\sum p_0 q_1} = 1470 / 1333.478 \cdot 100\% = 110.24\%$

On average, commodity prices increased by 10.24%.

The additional expenditure of buyers due to the price increase:

$\Delta w(p) = \sum p_1 q_1 - \sum p_0 q_1 = 1470 - 1333.478 = 136.522$ million rubles.

As a result of rising prices, buyers had to spend an additional 136.522 million rubles.

4. General index of the physical volume of trade turnover:

$I_q = \frac{\sum p_0 q_1}{\sum p_0 q_0} = 1333.478 / 1045 \cdot 100\% = 127.61\%$

The physical volume of trade turnover increased by 27.61%.

5. Let's determine the overall change in trade turnover in the second period compared to the first period:

$\Delta w = 1470 - 1045 = 425$ million rubles,

due to price changes:

$\Delta w(p) = 1470 - 1333.478 = 136.522$ million rubles,

due to changes in physical volume:

$\Delta w(q) = 1333.478 - 1045 = 288.478$ million rubles.

The turnover of goods increased by 40.67%. Prices on average for 3 goods increased by 10.24%. The physical volume of trade turnover increased by 27.61%.

In general, sales volume increased by 425 million rubles, including due to rising prices it increased by 136.522 million rubles, and due to an increase in sales volumes - by 288.478 million rubles.
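The index calculations of Task 4 can be verified with the Python sketch below. It uses only the three aggregate sums that appear in the solution (1470, 1333.478 and 1045 million rubles); the variable names are illustrative.

```python
sum_p1q1 = 1470.0      # turnover of the reporting period in current prices
sum_p0q1 = 1333.478    # reporting-period quantities in base-period prices
sum_p0q0 = 1045.0      # turnover of the base period

turnover_index = sum_p1q1 / sum_p0q0 * 100    # I_w
price_index = sum_p1q1 / sum_p0q1 * 100       # I_p
volume_index = sum_p0q1 / sum_p0q0 * 100      # I_q

delta_total = sum_p1q1 - sum_p0q0             # overall change in turnover
delta_price = sum_p1q1 - sum_p0q1             # part due to price changes
delta_volume = sum_p0q1 - sum_p0q0            # part due to physical volume

print(round(turnover_index, 2), round(price_index, 2), round(volume_index, 2))
print(round(delta_total, 3), round(delta_price, 3), round(delta_volume, 3))
```

Two consistency checks follow from the definitions: the turnover index equals the product of the price and volume indices, and the overall change equals the sum of the two factor contributions.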

TASK 5

The following data is available for 10 factories in one industry.

Plant number

Product output, thousand pcs. (X)

Based on the given data:

1) to confirm the provisions of logical analysis about the presence of a linear correlation between the factor characteristic (volume of output) and the resultant characteristic (electricity consumption), plot the initial data on the graph of the correlation field and draw conclusions about the form of the relationship, indicate its formula;

2) determine the parameters of the connection equation and plot the resulting theoretical line on the graph of the correlation field;

3) calculate the linear correlation coefficient,

4) explain the meaning of the indicators obtained in paragraphs 2) and 3);

5) using the resulting model, make a forecast about the possible energy consumption at a plant with a production volume of 4.5 thousand units.

Solution

The values of the factor characteristic, the volume of production, are denoted by $x_i$; the values of the resultant characteristic, electricity consumption, by $y_i$; the points with coordinates (x, y) are plotted on the correlation field OXY.

The points of the correlation field lie along a straight line. Therefore, the relationship is linear, and we look for the regression equation in the form of a straight line $\hat{y}_x = ax + b$. To find it, we use the system of normal equations, written here in terms of averages:

$$\begin{cases} a\,\overline{x^2} + b\,\bar{x} = \overline{xy} \\ a\,\bar{x} + b = \bar{y} \end{cases}$$

Let's create a calculation table.

Using the averages found, we compose the system and solve it with respect to the parameters a and b.

So, we get the regression equation of y on x: $\hat{y}_x = 3.57692x + 3.19231$

We draw the regression line on the correlation field.

Substituting the x values from column 2 into the regression equation, we obtain the calculated values $\hat{y}_x$ (column 7) and compare them with the y data, which is reflected in column 8. The correctness of the calculations is confirmed by the coincidence of the average values of y and $\hat{y}_x$.

The linear correlation coefficient evaluates the closeness of the relationship between the characteristics x and y and is calculated using the formula

$$r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}$$

The slope a of the regression line (the coefficient of x) characterizes the direction of the identified dependence between the characteristics: for a > 0 they change in the same direction, for a < 0 in opposite directions. Its absolute value is a measure of the change in the resultant characteristic when the factor characteristic changes by one unit of measurement.

The free term of the regression line reveals the direction, and its absolute value is a quantitative measure, of the influence of all other factors on the resultant characteristic.

If < 0, then the resource of the factor characteristic of an individual object is used with less efficiency, and when > 0, with greater efficiency than the average for the entire set of objects.

Let's conduct a post-regression analysis.

1. The coefficient of x in the regression equation is 3.57692 > 0; therefore, as production output increases (decreases), electricity consumption increases (decreases). An increase in production output by 1 thousand units gives an average increase in electricity consumption of 3.57692 thousand kWh.

2. The free term of the direct regression is equal to 3.19231, therefore, the influence of other factors increases the impact of product output on electricity consumption in absolute terms by 3.19231 thousand kWh.

3. The correlation coefficient of 0.8235 reveals a very close dependence of electricity consumption on product output.

It is easy to make predictions using the regression model equation. To do this, the values ​​of x - the volume of production - are substituted into the regression equation and electricity consumption is predicted. In this case, the values ​​of x can be taken not only within a given range, but also outside it.

Let's make a forecast about the possible energy consumption at a plant with a production volume of 4.5 thousand units.

$\hat{y}_x = 3.57692 \cdot 4.5 + 3.19231 = 19.28845$ thousand kWh.
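The regression steps can be sketched in Python. The table of the 10 factories is not reproduced in the text, so the fitting functions are demonstrated on a small hypothetical data set that lies exactly on the line y = 2x + 1; only the final forecast uses the coefficients reported in the solution.

```python
import math

def fit_line(xs, ys):
    """Solve the normal equations in averages for y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    a = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    b = mean_y - a * mean_x
    return a, b

def correlation(xs, ys):
    """Linear correlation coefficient r = (mean(xy) - mean(x)*mean(y)) / (sx * sy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sy = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return (mean_xy - mean_x * mean_y) / (sx * sy)

# Hypothetical demonstration data (not the factory data from the task).
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
a, b = fit_line(xs, ys)
r = correlation(xs, ys)

# Forecast with the coefficients reported in the solution.
forecast = 3.57692 * 4.5 + 3.19231
print(a, b, round(r, 6), round(forecast, 5))
```

The forecast line reproduces the substitution made in the solution for a production volume of 4.5 thousand units.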



If the random variable under study is continuous, ranking and grouping the observed values often fails to reveal the characteristic features of its variation. The reason is that individual values of a continuous random variable can differ from each other by arbitrarily small amounts, so identical values rarely occur in the observed data, and the frequencies of the variants differ little from each other.

It is also impractical to construct a discrete series for a discrete random variable with a large number of possible values. In such cases, an interval variation series of the distribution should be constructed.

To construct such a series, the entire range of variation of the observed values of the random variable is divided into a series of partial intervals, and the frequency with which the values fall into each partial interval is counted.

An interval variation series is an ordered set of intervals of variation of the values of a random variable together with the corresponding frequencies or relative frequencies with which the values fall into each of them.

To build an interval series you need to:

  1. determine the number of partial intervals;
  2. determine the width of the intervals;
  3. set the upper and lower limits of each interval;
  4. group the observation results.

1. The number and width of the grouping intervals have to be chosen in each specific case based on the goals of the study, the sample size, and the degree of variation of the characteristic in the sample.

The approximate number of intervals k can be estimated from the sample size n alone in one of the following ways:

  • using the Sturges formula: $k = 1 + 3.32 \lg n$;
  • using table 1.

Table 1

2. Intervals of equal width are generally preferred. To determine the interval width h, calculate:

  • the range of variation R of the sample values: $R = x_{max} - x_{min}$,

where $x_{max}$ and $x_{min}$ are the maximum and minimum variants of the sample;

  • the width of each interval h, determined by the formula $h = R/k$.

3. The lower limit of the first interval, $x_{l1}$, is chosen so that the minimum sample variant $x_{min}$ falls approximately in the middle of this interval: $x_{l1} = x_{min} - 0.5h$.

The subsequent intervals are obtained by adding the length h of the partial interval to the end of the previous interval:

$x_{li} = x_{l,i-1} + h$.

The construction of the interval scale, based on calculating the interval boundaries, continues as long as the value $x_{li}$ satisfies the relation:

$x_{li} < x_{max} + 0.5h$.

4. In accordance with the interval scale, the values of the characteristic are grouped: for each partial interval, the sum of the frequencies $n_i$ of the variants falling into the i-th interval is calculated. An interval includes the values of the random variable that are greater than or equal to its lower limit and less than its upper limit.
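The four steps above can be sketched in Python. The sample and the choice k = 5 are hypothetical, and the function name is illustrative; the sketch follows the rule that the first lower limit is $x_{min} - 0.5h$ and that each interval is closed on the left and open on the right.

```python
def interval_scale(data, k):
    """Build interval boundaries: first lower limit x_min - h/2, step h = R/k."""
    x_min, x_max = min(data), max(data)
    h = (x_max - x_min) / k
    lower_limits = []
    x_low = x_min - 0.5 * h
    while x_low < x_max + 0.5 * h:      # continue while the relation holds
        lower_limits.append(x_low)
        x_low += h
    return lower_limits + [lower_limits[-1] + h], h

def group(data, edges):
    """Count values with lower <= x < upper for each interval."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical sample (not from the source).
sample = [10, 12, 13, 15, 15, 16, 18, 20]
edges, h = interval_scale(sample, k=5)
counts = group(sample, edges)
print(edges, counts)
```

Because the scale starts half an interval below the minimum and ends half an interval above the maximum, it produces one interval more than k, and both extreme values fall inside intervals rather than on outer boundaries.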

Polygon and histogram

For clarity, various statistical distribution graphs are constructed.

Based on the data of a discrete variation series, a polygon of frequencies or relative frequencies is constructed.

A frequency polygon is a broken line whose segments connect the points $(x_1; n_1), (x_2; n_2), ..., (x_k; n_k)$. To construct a frequency polygon, the variants $x_i$ are plotted on the abscissa axis and the corresponding frequencies $n_i$ on the ordinate axis. The points $(x_i; n_i)$ are connected by straight segments, which gives the frequency polygon (Fig. 1).

A polygon of relative frequencies is a broken line whose segments connect the points $(x_1; W_1), (x_2; W_2), ..., (x_k; W_k)$. To construct a polygon of relative frequencies, the variants $x_i$ are plotted on the abscissa axis and the corresponding relative frequencies $W_i$ on the ordinate axis. The points $(x_i; W_i)$ are connected by straight segments, which gives the polygon of relative frequencies.

For a continuous characteristic, it is advisable to build a histogram.

A frequency histogram is a stepped figure consisting of rectangles whose bases are the partial intervals of length h and whose heights are equal to the ratio $n_i/h$ (the frequency density).

To construct a frequency histogram, the partial intervals are laid out on the abscissa axis, and segments parallel to the abscissa axis are drawn above them at a height of $n_i/h$.
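The coordinates needed for a histogram and a polygon can be computed as in the Python sketch below. The grouped series is hypothetical and the sketch only prepares the numbers; drawing them is left to whatever plotting tool is used.

```python
# Hypothetical interval series: equal intervals of length 10 with their frequencies.
edges = [0, 10, 20, 30]
freqs = [4, 10, 6]

h = edges[1] - edges[0]
n = sum(freqs)

heights = [f / h for f in freqs]                  # histogram heights n_i / h
rel_freqs = [f / n for f in freqs]                # relative frequencies W_i
mids = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
polygon_points = list(zip(mids, freqs))           # points (x_i; n_i) of a polygon

total_area = sum(height * h for height in heights)
print(heights, rel_freqs, polygon_points, total_area)
```

With heights equal to the frequency density, the total area of the rectangles equals the sample size, which is a convenient check on the construction.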

When constructing an interval distribution series, three questions are resolved:

  • 1. How many intervals should be taken?
  • 2. What is the length of the intervals?
  • 3. What is the procedure for including population units within the boundaries of the intervals?

1. The number of intervals can be determined by the Sturges formula:

$$n = 1 + 3.322 \lg N \quad (2.5)$$

where n is the number of intervals and N is the number of population units.

2. The interval length, or interval step, is usually determined by the formula

$$h = \frac{R}{n}$$

where R is the range of variation.

3. The order of including population units within the interval boundaries may vary, but when constructing an interval series it must be strictly defined.

For example: [ ), in which population units equal to the lower boundary are included in the interval, while units equal to the upper boundary are not included and are transferred to the next interval. The exception to this rule is the last interval, whose upper limit includes the last number of the ranked series.

The interval boundaries are:

  • closed - with two extreme values ​​of the attribute;
  • open - with one extreme value of the attribute (to such and such a number or over such and such a number).

To help assimilate the theoretical material, we introduce the initial information for solving a cross-cutting task.

There are conditional data on the average number of sales managers, the quantity of similar goods sold by them, the individual market price for this product, as well as the sales volume of 30 companies in one of the regions of the Russian Federation in the first quarter of the reporting year (Table 2.1).

Table 2.1

Initial information for a cross-cutting task (columns: number of managers, persons; quantity of goods sold, pcs.; price, thousand rubles; sales volume, million rubles)

Based on the initial information, as well as additional information, we will set up individual tasks. Then we will present the methodology for solving them and the solutions themselves.

Cross-cutting task. Task 2.1

Using the initial data from Table 2.1, it is required to construct a discrete series of distribution of firms by quantity of goods sold (Table 2.2).

Solution:

Table 2.2

Discrete series of distribution of firms by quantity of goods sold in one of the regions of the Russian Federation in the first quarter of the reporting year

Cross-cutting task. Task 2.2

It is required to construct a ranked series of the 30 firms by the average number of managers.

Solution:

15; 17; 18; 20; 20; 20; 22; 22; 24; 25; 25; 25; 27; 27; 27; 28; 29; 30; 32; 32; 33; 33; 33; 34; 35; 35; 38; 39; 39; 45.

Cross-cutting task. Task 2.3

Using the initial data from Table 2.1, it is required to:

  • 1. Construct an interval series of distribution of firms by number of managers.
  • 2. Calculate the frequencies of the distribution series of firms.
  • 3. Draw conclusions.

Solution:

Using the Sturges formula (2.5), let's calculate the number of intervals:

$$n = 1 + 3.322 \lg 30 = 5.91$$

Thus, we take 6 intervals (groups).

We calculate the interval length, or interval step, using the formula

$$h = \frac{R}{n} = \frac{45 - 15}{6} = 5$$

Note. The order of including population units within the interval boundaries is as follows: [ ), in which population units equal to the lower boundary are included in the interval, while units equal to the upper boundary are not included and are transferred to the next interval. The exception to this rule is the last interval [ ], whose upper limit includes the last number of the ranked series.

We build an interval series (Table 2.3).

Table 2.3

Interval series of distribution of firms by the average number of managers in one of the regions of the Russian Federation in the first quarter of the reporting year

Conclusion. The largest group of firms is the one with an average number of managers of 25-30 people, which includes 8 firms (27%); the smallest group, with an average number of managers of 40-45 people, includes only one firm (3%).
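The interval series of Task 2.3 can be rebuilt in Python from the ranked series given in Task 2.2. The sketch assumes the inclusion rule from the note (left-closed intervals, with the last interval also closed on the right); the variable names are illustrative.

```python
# Ranked series of 30 firms by the average number of managers (from Task 2.2).
managers = [15, 17, 18, 20, 20, 20, 22, 22, 24, 25, 25, 25, 27, 27, 27,
            28, 29, 30, 32, 32, 33, 33, 33, 34, 35, 35, 38, 39, 39, 45]

k = 6                                            # number of intervals (Sturges)
x_min, x_max = min(managers), max(managers)
h = (x_max - x_min) / k                          # interval step
edges = [x_min + i * h for i in range(k + 1)]

counts = [0] * k
for x in managers:
    # [lower, upper) for all intervals; the maximum goes into the last one.
    i = min(int((x - x_min) // h), k - 1)
    counts[i] += 1

shares = [round(c / len(managers) * 100) for c in counts]
for i in range(k):
    print(f"{edges[i]:.0f}-{edges[i + 1]:.0f}: {counts[i]} firms ({shares[i]}%)")
```

The counts for the 25-30 and 40-45 groups agree with the conclusion above (8 firms and 1 firm).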

Cross-cutting task. Task 2.4

Using the initial data from Table 2.1, as well as the interval series of distribution of firms by number of managers (Table 2.3), it is required to build an analytical grouping of the relationship between the number of managers and the sales volume of firms and, based on it, to draw a conclusion about the presence (or absence) of a relationship between these characteristics.

Solution:

Analytical grouping is built on the factor characteristic. In our problem, the factor characteristic (x) is the number of managers, and the resultant characteristic (y) is the sales volume (Table 2.4).

Let's now build the analytical grouping (Table 2.5).

Conclusion. The data of the constructed analytical grouping show that as the number of sales managers increases, the average sales volume per firm in the group also increases, which indicates a direct relationship between these characteristics.

Table 2.4

Auxiliary table for constructing an analytical grouping (columns: number of managers, persons, x; company number; sales volume, million rubles, y)

Legible averages of sales volume from the table, million rubles: 9.97; 10.61 (a group of 7 firms); 10.31 (all 30 firms).

Table 2.5

Dependence of sales volumes on the number of company managers in one of the regions of the Russian Federation in the first quarter of the reporting year

TEST QUESTIONS
  • 1. What is the essence of statistical observation?
  • 2. Name the stages of statistical observation.
  • 3. What are the organizational forms of statistical observation?
  • 4. Name the types of statistical observation.
  • 5. What is a statistical summary?
  • 6. Name the types of statistical reports.
  • 7. What is statistical grouping?
  • 8. Name the types of statistical groupings.
  • 9. What is a distribution series?
  • 10. Name the structural elements of the distribution row.
  • 11. What is the procedure for constructing a distribution series?

The results of grouping are presented in the form of distribution series.

A distribution series is one of the types of groupings.

A distribution series is an ordered distribution of the units of the population under study into groups according to a certain varying characteristic.

Depending on the characteristic underlying the formation of a distribution series, attributive and variation distribution series are distinguished:

  • Attributive series are distribution series constructed according to qualitative characteristics.
  • Variation series are distribution series constructed in ascending or descending order of the values of a quantitative characteristic.
A variation distribution series consists of two columns:

The first column gives the quantitative values of the varying characteristic, which are called variants. A discrete variant is expressed as an integer. An interval variant is given as a range "from ... to ...". Depending on the type of variants, a discrete or an interval variation series can be constructed.
The second column contains the number of occurrences of each variant, expressed as frequencies or relative frequencies:

Frequencies are absolute numbers showing how many times a given value of the characteristic occurs in the population. The sum of all frequencies must equal the number of units in the entire population.

Relative frequencies are frequencies expressed as a fraction or a percentage of the total. The sum of all relative frequencies must equal 1 when expressed as fractions of one, or 100% when expressed as percentages.

Graphic representation of distribution series

Distribution series are presented visually using graphs.

The following types of graph are used:
  • Polygon
  • Histogram
  • Cumulate (cumulative curve)
  • Ogive

Polygon

When constructing a polygon, the values of the varying characteristic are plotted on the horizontal axis (x-axis), and the frequencies or relative frequencies are plotted on the vertical axis (y-axis).

The polygon in Fig. 6.1 is based on data from the micro-census of the population of Russia in 1994.

Fig. 6.1. Household size distribution

Condition: Data are provided on the distribution of 25 employees of an enterprise by tariff category (pay grade):
4; 2; 4; 6; 5; 6; 4; 1; 3; 1; 2; 5; 2; 6; 3; 1; 2; 3; 4; 5; 4; 6; 2; 3; 4
Task: Construct a discrete variation series and depict it graphically as a distribution polygon.
Solution:
In this example, the variants are the employees' tariff categories. To determine the frequencies, count the number of employees in each tariff category.

The polygon is used for discrete variation series.

To construct the distribution polygon (Fig. 1), we plot the values of the varying characteristic (the variants) on the abscissa (x) axis, and the frequencies or relative frequencies on the ordinate (y) axis.
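
The counting step of this example can be sketched in Python. The input list is copied from the condition of the problem; the variable names are illustrative, and the sketch only computes the series and the polygon vertices rather than drawing the figure.

```python
from collections import Counter

# Tariff categories of the 25 employees, copied from the condition above.
categories = [4, 2, 4, 6, 5, 6, 4, 1, 3, 1, 2, 5, 2,
              6, 3, 1, 2, 3, 4, 5, 4, 6, 2, 3, 4]

counts = Counter(categories)
total = len(categories)

# Each row of the discrete variation series:
# (variant, frequency, relative frequency in percent).
series = [(x, counts[x], 100 * counts[x] / total) for x in sorted(counts)]

for variant, freq, share in series:
    print(f"category {variant}: frequency {freq}, relative frequency {share:.0f}%")

# Vertices of the distribution polygon: variant on x, frequency on y.
polygon_points = [(variant, freq) for variant, freq, _ in series]
print(polygon_points)
```

The frequencies sum to 25 and the relative frequencies to 100%, which is a quick check that no observation was lost. Joining `polygon_points` with straight line segments gives the polygon.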

If the values of a characteristic are expressed as intervals, the series is called an interval series.
Interval distribution series are depicted graphically as a histogram, cumulate or ogive.

Statistical table

Condition: Data are provided on the size of the deposits of 20 individuals in one bank (thousand rubles): 60; 25; 12; 10; 68; 35; 2; 17; 51; 9; 3; 130; 24; 85; 100; 152; 6; 18; 7; 42.
Task: Construct an interval variation series with equal intervals.
Solution:

  1. The initial population consists of 20 units (N = 20).
  2. Using Sturges' formula, we determine the required number of groups: n = 1 + 3.322 · lg 20 ≈ 5.32, which rounds to 5
  3. We calculate the width of the equal interval: i = (152 - 2) / 5 = 30 thousand rubles
  4. We divide the initial population into 5 groups with an interval width of 30 thousand rubles.
  5. We present the grouping results in the table:

When a continuous characteristic is recorded this way, the same value appears twice: as the upper limit of one interval and as the lower limit of the next. Such a value is assigned to the group in which it is the upper limit.
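
The steps of this solution, including the rule for values that fall on a shared limit, can be sketched as follows. The deposit values are copied from the condition; the function names `sturges_groups` and `interval_series` are illustrative, and rounding the Sturges result to the nearest integer is an assumption that matches the worked example.

```python
import math

# Deposit sizes in thousand rubles, copied from the condition above.
deposits = [60, 25, 12, 10, 68, 35, 2, 17, 51, 9,
            3, 130, 24, 85, 100, 152, 6, 18, 7, 42]


def sturges_groups(n_units):
    """Number of groups by Sturges' formula, rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n_units))


def interval_series(values, k):
    """Split values into k equal-width intervals and count the frequencies."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    limits = [lo + j * width for j in range(k)] + [hi]
    freqs = [0] * k
    for v in values:
        # A value equal to a shared limit goes to the interval it closes,
        # i.e. the first interval whose upper limit is >= v.
        j = next(j for j in range(k) if v <= limits[j + 1])
        freqs[j] += 1
    return limits, freqs


k = sturges_groups(len(deposits))
limits, freqs = interval_series(deposits, k)

for j in range(k):
    print(f"{limits[j]:g}-{limits[j + 1]:g} thousand rubles: {freqs[j]}")
```

For this data the interval limits are 2, 32, 62, 92, 122 and 152 thousand rubles, and none of the deposits falls exactly on an interior limit, so the boundary rule does not change the counts here.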

Histogram

To construct a histogram, the interval limits are marked on the abscissa axis, and rectangles are built on them whose heights are proportional to the frequencies (or relative frequencies).

Fig. 6.2 shows a histogram of the distribution of the Russian population in 1997 by age group.

Fig. 6.2. Distribution of the Russian population by age groups

Condition: The distribution of 30 employees of the company by monthly salary is given

Task: Display the interval variation series graphically in the form of a histogram and cumulate.
Solution:

  1. The width of the open (first) interval is taken to be equal to the width of the adjacent (second) interval: 7000 - 5000 = 2000 rubles. Using this width, we find the lower limit of the first interval: 5000 - 2000 = 3000 rubles.
  2. To construct the histogram in a rectangular coordinate system, we plot along the abscissa axis the segments that correspond to the intervals of the variation series.
    These segments serve as the bases of the rectangles, and the corresponding frequencies (or relative frequencies) serve as their heights.
  3. Let's build the histogram:

To construct a cumulate, the accumulated frequencies (or accumulated relative frequencies) must be calculated first. They are obtained by successively adding the frequencies of the preceding intervals and are denoted S. An accumulated frequency shows how many units of the population have a value of the characteristic no greater than the one under consideration.

Cumulate

The distribution of a characteristic in a variation series by accumulated frequencies (or accumulated relative frequencies) is depicted using a cumulate.

A cumulate, or cumulative curve, unlike a polygon, is constructed from accumulated frequencies or accumulated relative frequencies. The values of the characteristic are placed on the abscissa axis, and the accumulated frequencies or accumulated relative frequencies on the ordinate axis (Fig. 6.3).

Fig. 6.3. Cumulate of the household size distribution

4. Let's calculate the accumulated frequencies:
The accumulated frequency of the first interval is calculated as 0 + 4 = 4; of the second, 4 + 12 = 16; of the third, 4 + 12 + 8 = 24, and so on.

When constructing a cumulate, the accumulated frequency (or accumulated relative frequency) of each interval is assigned to its upper limit.
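
The arithmetic of steps 1 and 4 can be sketched in Python. Only the numbers quoted in the text are used: the limits of the second interval and the frequencies of the first three intervals. The remaining intervals are not reproduced in the text, so the sketch leaves them out, and the variable names are illustrative.

```python
from itertools import accumulate

# Limits of the second salary interval, as quoted in step 1 (rubles).
second_lower, second_upper = 5000, 7000

# Close the open first interval by borrowing the width of the second one.
width = second_upper - second_lower
first_lower = second_lower - width

# Frequencies of the first three intervals, as quoted in step 4.
# The remaining intervals are not given in the text and are omitted.
frequencies = [4, 12, 8]
cumulative = list(accumulate(frequencies))

# Cumulate vertices: start at the lower limit of the first interval with 0,
# then attach each accumulated frequency to the UPPER limit of its interval.
# Only the two upper limits stated in the text are known, so zip() stops there.
known_upper_limits = [second_lower, second_upper]
cumulate_points = [(first_lower, 0)] + list(zip(known_upper_limits, cumulative))

print(cumulative)
print(cumulate_points)
```

Swapping the two coordinates of each vertex gives the corresponding points of an ogive.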

Ogive

An ogive is constructed in the same way as a cumulate, the only difference being that the accumulated frequencies are placed on the abscissa axis and the values of the characteristic on the ordinate axis.

A variety of the cumulate is the concentration curve, or Lorenz curve. To construct it, a percentage scale from 0 to 100 is plotted on both axes of a rectangular coordinate system. The accumulated relative frequencies are plotted on the abscissa axis, and the accumulated shares (in percent) of the total volume of the characteristic are plotted on the ordinate axis.

A uniform distribution of the characteristic corresponds to the diagonal of the square on the graph (Fig. 6.4). With an uneven distribution, the graph is a concave curve that departs further from the diagonal as the concentration of the characteristic increases.
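
As a rough illustration, the points of a concentration curve can be computed for the bank deposit data used earlier in the text. The sketch assumes ungrouped data sorted in ascending order, which is a simplification of the grouped construction described above; the function name `lorenz_points` is illustrative.

```python
from itertools import accumulate

# Deposit sizes in thousand rubles, from the interval series example.
deposits = [60, 25, 12, 10, 68, 35, 2, 17, 51, 9,
            3, 130, 24, 85, 100, 152, 6, 18, 7, 42]


def lorenz_points(values):
    """Return (accumulated % of units, accumulated % of total volume) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    for k, running in enumerate(accumulate(ordered), start=1):
        points.append((100 * k / n, 100 * running / total))
    return points


points = lorenz_points(deposits)
for x, y in points[::5]:
    print(f"{x:5.1f}% of depositors hold {y:5.1f}% of all deposits")
```

For this data the smaller half of the deposits accounts for only about 12.6% of the total volume, so the curve lies well below the diagonal, which indicates a noticeable concentration.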

Fig. 6.4. Concentration curve
