Data analysis using the least squares method. Least squares method in Excel

Least squares method

In the final lesson of the topic we will get acquainted with the best-known application of FNP (functions of several variables), which finds the widest use in various fields of science and practice. It could be physics, chemistry, biology, economics, sociology, psychology, and so on and so forth. By the will of fate I often deal with the economy, and therefore today I will arrange for you a trip to an amazing country called Econometrics =) ...How can you not want that?! It's very good there – you just need to make up your mind! ...But what you probably definitely want is to learn how to solve problems by the least squares method. And especially diligent readers will learn to solve them not only accurately, but also VERY QUICKLY ;-) But first, the general statement of the problem plus an accompanying example:

Let us study indicators in a certain subject area that have a quantitative expression. At the same time, there is every reason to believe that the indicator y depends on the indicator x. This assumption can be either a scientific hypothesis or based on basic common sense. Let's leave science aside, however, and explore more appetizing areas – namely, grocery stores. Let's denote by:

x – retail area of a grocery store, sq. m.,
y – annual turnover of a grocery store, million rubles.

It is absolutely clear that the larger the store area, the greater in most cases its turnover will be.

Suppose that after carrying out observations/experiments/calculations/dances with a tambourine we have numerical pairs (x1; y1), (x2; y2), ..., (xn; yn) at our disposal.

With grocery stores, I think everything is clear: x1 is the area of the 1st store, y1 is its annual turnover, x2 is the area of the 2nd store, y2 is its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials – a fairly accurate assessment of turnover can be obtained by means of mathematical statistics. However, let's not get distracted – the commercial espionage course is paid separately =)

Tabular data can also be written in the form of points (xi; yi) and depicted in the familiar Cartesian coordinate system.

Let's answer an important question: How many points are needed for a qualitative study?

The more, the better. The minimum acceptable set consists of 5-6 points. In addition, when the amount of data is small, "anomalous" results must not be included in the sample. So, for example, a small elite store can earn orders of magnitude more than "its colleagues", thereby distorting the general pattern that you need to find!



To put it very simply: we need to select a function whose graph passes as close as possible to the points. This function is called the approximating function (approximation means "bringing closer") or the theoretical function. Generally speaking, an obvious "contender" immediately appears here – a high-degree polynomial whose graph passes through ALL the points. But this option is complicated and often simply incorrect (the graph will "loop" all the time and poorly reflect the main trend).

Thus, the sought function must be quite simple and at the same time reflect the dependence adequately. As you might guess, one of the methods for finding such functions is called the least squares method. First, let's look at its essence in general terms. Let some function f(x) approximate the experimental data:


How do we evaluate the accuracy of this approximation? Let us calculate the differences (deviations) f(xi) − yi between the functional and experimental values (see the drawing). The first thought that comes to mind is to estimate how large the sum of these differences is, but the problem is that the differences can be negative, and in such a summation the deviations will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it is tempting to take the sum of the moduli of the deviations:

|f(x1) − y1| + |f(x2) − y2| + ... + |f(xn) − yn|, or collapsed: Σ|f(xi) − yi| (in case anyone doesn't know: Σ is the summation sign, and i is an auxiliary "counter" variable that takes values from 1 to n).

By approximating the experimental points with different functions, we will obtain different values of this sum, and obviously the function for which this sum is smaller is more accurate.

Such a method exists, and it is called the least modulus method. However, in practice the least squares method has become much more widespread; in it, possible negative values are eliminated not by the modulus but by squaring the deviations:



(f(x1) − y1)² + (f(x2) − y2)² + ... + (f(xn) − yn)² = Σ(f(xi) − yi)², after which efforts are aimed at selecting a function f such that this sum of squared deviations is as small as possible. Actually, this is where the name of the method comes from.

And now we return to another important point: as noted above, the selected function should be quite simple – but there are also many such functions: linear y = ax + b, hyperbolic y = a/x + b, exponential y = a·e^(bx), logarithmic y = a·ln x + b, quadratic y = ax² + bx + c, etc. And, of course, here I would immediately like to "reduce the field of activity". Which class of functions should be chosen for research? A primitive but effective technique:

– The easiest way is to plot the points on a drawing and analyze their location. If they tend to run along a straight line, then you should look for the equation of a line y = ax + b with optimal values of a and b. In other words, the task is to find SUCH coefficients that the sum of squared deviations is the smallest.

If the points are located, for example, along a hyperbola, then it is clear that a linear function will give a poor approximation. In this case, we look for the most "favorable" coefficients of the hyperbola equation y = a/x + b – those that give the minimum sum of squares.

Now note that in both cases we are talking about functions of two variables whose arguments are the sought parameters of the dependence:

And essentially we need to solve a standard problem – to find the minimum of a function of two variables.

Let's recall our example: suppose that the "store" points tend to lie along a straight line and there is every reason to believe a linear dependence of turnover on retail area. Let's find SUCH coefficients "a" and "b" that the sum of squared deviations is the smallest. Everything is as usual – first the 1st-order partial derivatives. According to the linearity rule, you can differentiate right under the summation sign:

If you want to use this information for an essay or term paper, I will be very grateful for a link in your list of sources – you will find such detailed calculations in few places:

Let's create a standard system by equating both partial derivatives to zero:

Σ2(a·xi + b − yi)·xi = 0
Σ2(a·xi + b − yi) = 0

We cancel the "two" in each equation and, in addition, "break up" the sums:

Note: independently analyze why "a" and "b" can be taken out of the summation sign. By the way, formally this can also be done with the sum Σb = b·n.

Let's rewrite the system in the "applied" form:

a·Σxi² + b·Σxi = Σxi·yi
a·Σxi + b·n = Σyi

after which the algorithm for solving our problem begins to emerge:

Do we know the coordinates of the points? We do. Can we find the sums? Easily. We set up the simplest system of two linear equations in two unknowns ("a" and "b"). We solve the system, for example, by Cramer's method, and obtain a stationary point. Checking the sufficient condition for an extremum, we can verify that at this point the function attains precisely a minimum. The check involves additional calculations, and therefore we will leave it behind the scenes (if necessary, the missing frame can be viewed here). We draw the final conclusion:

The found function y = ax + b approximates the experimental points best (at least compared to any other linear function). Roughly speaking, its graph passes as close as possible to these points. In the tradition of econometrics, the resulting approximating function is also called the paired linear regression equation.

The problem under consideration is of great practical importance. In our example situation, the equation allows you to predict what turnover ("y") the store will have at one or another value of the retail area (one or another value of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.
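By the way, the whole algorithm fits into a dozen lines of code. Here is a minimal sketch in C++ (the language of the code fragments later on this page); the function name fit_line is mine, and the routine simply accumulates the four sums and solves the 2x2 system by Cramer's method:

    #include <utility>
    #include <vector>

    // Least squares fit of y = a*x + b: accumulate the four sums and solve
    //   a*sum(x^2) + b*sum(x) = sum(x*y)
    //   a*sum(x)   + b*n      = sum(y)
    // by Cramer's rule. The determinant vanishes only if all x coincide.
    std::pair<double, double> fit_line(const std::vector<double>& x,
                                       const std::vector<double>& y) {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        const double n = (double)x.size();
        for (size_t i = 0; i < x.size(); i++) {
            sx  += x[i];
            sy  += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        const double det = sxx * n - sx * sx;
        const double a = (sxy * n - sx * sy) / det;   // slope
        const double b = (sxx * sy - sx * sxy) / det; // intercept
        return {a, b};
    }

A forecast is then simply a*x + b at the x of interest.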

I will analyze just one problem with “real” numbers, since there are no difficulties in it - all calculations are at the level of the 7th-8th grade school curriculum. In 95 percent of cases, you will be asked to find just a linear function, but at the very end of the article I will show that it is no more difficult to find the equations of the optimal hyperbola, exponential and some other functions.

In fact, all that remains is to distribute the promised goodies, so that you can learn to solve such examples not only accurately but also quickly. We carefully study the standard example:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experimental) data. Make a drawing on which to plot the experimental points and the graph of the approximating function in a Cartesian rectangular coordinate system. Find the sum of squared deviations between the empirical and theoretical values. Find out whether another function would approximate the experimental points better (from the point of view of the least squares method).

Please note that the "x" values here are natural numbers, and this has a characteristic meaning, which I will talk about a little later; but they can, of course, also be fractional. In addition, depending on the content of a particular task, both the "x" and the "y" values can be entirely or partially negative. Well, we have been given a "faceless" task, and we begin its solution:

We find the coefficients of the optimal function as the solution of the system derived above:

For the purpose of more compact notation, the "counter" variable can be omitted, since it is already clear that the summation is carried out from 1 to n.

It is more convenient to calculate the required sums in tabular form:


Calculations can be carried out on a calculator, but it is much better to use Excel – both faster and without errors; watch a short video:

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But this is luck – in practice, systems are often not a gift, and in such cases Cramer's method saves the day:
the main determinant Δ ≠ 0, which means the system has a unique solution.

Let's check. I understand that you don't want to, but why skip errors where they absolutely cannot be missed? Let us substitute the found solution into the left-hand side of each equation of the system:

The right-hand sides of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the desired approximating function is found: of all linear functions, it is the one that best approximates the experimental data.

Unlike the direct dependence of a store's turnover on its area, the found dependence is inverse (the principle "the more, the less"), and this fact is immediately revealed by the negative slope. The function tells us that with an increase of a certain indicator by 1 unit, the value of the dependent indicator decreases on average by 0.65 units. As they say, the higher the price of buckwheat, the less of it is sold.

To plot the graph of the approximating function, we find two of its values:

and execute the drawing:

The constructed straight line is called the trend line (namely, a linear trend line; in the general case a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think this term needs no additional comments.

Let's calculate the sum of squared deviations between the empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the crimson segments (two of which are so small that they are not even visible).

Let's summarize the calculations in a table:


Again, they can be done manually; just in case, I’ll give an example for the 1st point:

but it is much more effective to do it in the already known way:

We repeat once again: what is the meaning of the obtained result? Of all linear functions, it is for the found function that the indicator (the sum of squared deviations) is the smallest, that is, within its family it is the best approximation. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function approximates the experimental points better?

Let's find the corresponding sum of squared deviations - to distinguish, I will denote them by the letter “epsilon”. The technique is exactly the same:

And again, just in case, the calculations for the 1st point:

In Excel we use the standard function EXP (syntax can be found in Excel Help).

Conclusion: the sum of squared deviations turned out to be larger for the exponential function, which means that it approximates the experimental points worse than the straight line does.

But here it should be noted that "worse" does not yet mean "bad". Now I have built a graph of this exponential function – and it also passes close to the points, so much so that without analytical research it is difficult to say which function is more accurate.

This concludes the solution, and I return to the question of the natural values ​​of the argument. In various studies, usually economic or sociological, natural “X’s” are used to number months, years or other equal time intervals. Consider, for example, the following problem:

The following data is available on the store’s retail turnover for the first half of the year:

Using analytical alignment along a straight line, determine the volume of turnover for July.

Yes, no problem: we number the months 1, 2, 3, 4, 5, 6 and use the usual algorithm, as a result of which we obtain an equation – the only thing is that, when it comes to time, the letter "t" is usually used (although this is not critical). The obtained equation shows that in the first half of the year trade turnover increased on average by 27.74 units per month. Let's get the forecast for July (month No. 7): ... den. units.
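On top of the fit_line sketch from the beginning of the page this is a one-liner; the monthly figures below are made up for illustration and are not the problem's data:

    #include <cstdio>
    #include <utility>
    #include <vector>

    // fit_line from the earlier sketch:
    std::pair<double, double> fit_line(const std::vector<double>& x,
                                       const std::vector<double>& y);

    int main() {
        // Hypothetical half-year turnover figures (NOT the problem's data).
        std::vector<double> t        = {1, 2, 3, 4, 5, 6};
        std::vector<double> turnover = {450, 475, 510, 535, 560, 590};
        auto [a, b] = fit_line(t, turnover);
        std::printf("forecast for July: %.2f\n", a*7 + b); // month No. 7
        return 0;
    }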

And there are countless tasks like this. Those who wish can use an additional service, namely my Excel calculator (demo version), which solves the analyzed problem almost instantly! A working version of the program is available in exchange or for a symbolic fee.

At the end of the lesson, brief information about finding dependencies of some other types. Actually, there’s not much to tell, since the fundamental approach and solution algorithm remain the same.

Let us assume that the arrangement of the experimental points resembles a hyperbola. Then, to find the coefficients of the best hyperbola y = a/x + b, you need to find the minimum of the function S(a, b) = Σ(a/xi + b − yi)² – anyone who wishes can carry out the detailed calculations and arrive at a similar system:

a·Σ(1/xi²) + b·Σ(1/xi) = Σ(yi/xi)
a·Σ(1/xi) + b·n = Σyi

From a formal technical point of view, it is obtained from the "linear" system (let's denote it with an asterisk, (*)) by replacing "x" with 1/x. It remains to calculate the sums, after which the optimal coefficients "a" and "b" are close at hand.
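In code, the reduction to the linear case is a single pass over the data: feed 1/x into the same linear routine. A sketch reusing the fit_line function from the beginning of the page (the wrapper name is mine):

    #include <utility>
    #include <vector>

    // fit_line from the earlier sketch:
    std::pair<double, double> fit_line(const std::vector<double>& x,
                                       const std::vector<double>& y);

    // Fit y = a/x + b by substituting u = 1/x and reusing the linear fit.
    std::pair<double, double> fit_hyperbola(const std::vector<double>& x,
                                            const std::vector<double>& y) {
        std::vector<double> u(x.size());
        for (size_t i = 0; i < x.size(); i++) u[i] = 1.0 / x[i]; // assumes x[i] != 0
        return fit_line(u, y);   // returns (a, b) of y = a*(1/x) + b
    }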

If there is every reason to believe that the points are located along a logarithmic curve y = a·ln x + b, then to find the optimal values of a and b we find the minimum of the function S(a, b) = Σ(a·ln xi + b − yi)². Formally, in the system (*), "x" needs to be replaced with ln x:

When performing calculations in Excel, use the function LN. I confess that it would not be particularly difficult for me to create calculators for each of the cases under consideration, but it would still be better if you "programmed" the calculations yourself. The lesson videos will help.

With an exponential dependence the situation is a little more complicated. To reduce the matter to the linear case, we take the logarithm of the function y = a·e^(bx) and use the properties of the logarithm:

ln y = ln a + b·x

Now, comparing the obtained function with the linear function, we come to the conclusion that in the system (*), "y" must be replaced by ln y, and the intercept by ln a. For convenience, let's denote z = ln y and c = ln a.

Please note that the system is solved with respect to c = ln a and b, and therefore, after finding the roots, you must not forget to find the coefficient a itself: a = e^c.
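The same trick in code, assuming the exponential is written as y = a·e^(bx) (the form suggested by the EXP hint above); the helper name is mine:

    #include <cmath>
    #include <utility>
    #include <vector>

    // fit_line from the earlier sketch:
    std::pair<double, double> fit_line(const std::vector<double>& x,
                                       const std::vector<double>& y);

    // Fit y = a*exp(b*x): since ln y = ln a + b*x, fit a line to (x, ln y).
    std::pair<double, double> fit_exponential(const std::vector<double>& x,
                                              const std::vector<double>& y) {
        std::vector<double> z(y.size());
        for (size_t i = 0; i < y.size(); i++) z[i] = std::log(y[i]); // assumes y[i] > 0
        auto [b, c] = fit_line(x, z);   // slope b, intercept c = ln a
        return {std::exp(c), b};        // recover a = e^c; return (a, b)
    }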

To approximate the experimental points with the optimal parabola y = ax² + bx + c, one should find the minimum of a function of three variables S(a, b, c) = Σ(a·xi² + b·xi + c − yi)². After performing the standard actions, we get the following "working" system:
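In standard notation, differentiating this sum with respect to a, b and c and equating the derivatives to zero gives:

a·Σxi^4 + b·Σxi^3 + c·Σxi^2 = Σxi^2·yi
a·Σxi^3 + b·Σxi^2 + c·Σxi = Σxi·yi
a·Σxi^2 + b·Σxi + c·n = Σyi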

Yes, of course, there are more sums here, but there are no difficulties at all when using your favorite application. And finally, I'll tell you how to quickly perform a check in Excel and build the desired trend line: create a scatter plot, select any of the points with the mouse, right-click and choose the option "Add trend line". Next, select the chart type and, on the "Options" tab, activate the option "Show equation on chart". OK

As always, I want to end the article with some beautiful phrase, and I almost typed "Be in trend!" But I changed my mind in time. And not because it is stereotyped. I don't know about others, but I don't really want to follow the promoted American and especially European trend =) Therefore, I wish each of you to stick to your own line!

http://www.grandars.ru/student/vysshaya-matematika/metod-naimenshih-kvadratov.html

The least squares method is one of the most common and most well-developed methods for estimating the parameters of linear econometric models, owing to its simplicity and efficiency. At the same time, some caution should be observed when using it, since models constructed with it may fail to satisfy a number of requirements on the quality of their parameters and, as a result, may not reflect the patterns of the process's development "well" enough.

Let us consider in more detail the procedure for estimating the parameters of a linear econometric model using the least squares method. In general, such a model can be represented by equation (1.2):

y_t = a_0 + a_1·x_1t + ... + a_n·x_nt + ε_t.

The initial data for estimating the parameters a_0, a_1, ..., a_n are the vector of values of the dependent variable y = (y_1, y_2, ..., y_T)' and the matrix of values of the independent variables, in which the first column, consisting of ones, corresponds to the coefficient a_0 of the model.

The least squares method received its name from the basic principle that the parameter estimates obtained with it must satisfy: the sum of squares of the model errors should be minimal.

Examples of solving problems using the least squares method

Example 2.1. A trading enterprise has a network of 12 stores, information on whose activities is presented in Table 2.1.

The management of the enterprise would like to know how the size of the annual turnover depends on the retail space of the store.

Table 2.1

Store No.   Annual turnover, million rubles   Retail area, thousand m²
 1          19.76                             0.24
 2          38.09                             0.31
 3          40.95                             0.55
 4          41.08                             0.48
 5          56.29                             0.78
 6          68.51                             0.98
 7          75.01                             0.94
 8          89.05                             1.21
 9          91.13                             1.29
10          91.26                             1.12
11          99.84                             1.29
12          108.55                            1.49

Least squares solution. Let us denote by y_t the annual turnover of the t-th store, million rubles, and by x_1t the retail area of the t-th store, thousand m².

Fig.2.1. Scatterplot for Example 2.1

To determine the form of the functional relationship between the variables x_1t and y_t, we will construct a scatter diagram (Fig. 2.1).

Based on the scatter diagram, we can conclude that annual turnover depends positively on retail area (i.e., y increases as the area increases). The most suitable form of the functional relationship is linear.

The information for further calculations is presented in Table 2.2. Using the least squares method, we estimate the parameters of the linear one-factor econometric model y_t = a_0 + a_1·x_1t + ε_t.

Table 2.2

t     y_t      x_1t    y_t^2        x_1t^2    x_1t·y_t
1     19.76    0.24    390.4576     0.0576    4.7424
2     38.09    0.31    1450.8481    0.0961    11.8079
3     40.95    0.55    1676.9025    0.3025    22.5225
4     41.08    0.48    1687.5664    0.2304    19.7184
5     56.29    0.78    3168.5641    0.6084    43.9062
6     68.51    0.98    4693.6201    0.9604    67.1398
7     75.01    0.94    5626.5001    0.8836    70.5094
8     89.05    1.21    7929.9025    1.4641    107.7505
9     91.13    1.29    8304.6769    1.6641    117.5577
10    91.26    1.12    8328.3876    1.2544    102.2112
11    99.84    1.29    9968.0256    1.6641    128.7936
12    108.55   1.49    11783.1025   2.2201    161.7395
Sum   819.52   10.68   65008.554    11.4058   858.3991
Mean  68.29    0.89

Thus, â_1 = 67.8871, and the intercept is â_0 = ȳ − â_1·x̄_1 ≈ 68.29 − 67.8871·0.89 ≈ 7.87.

Therefore, with an increase in retail area by 1 thousand m², other things being equal, the average annual turnover increases by 67.8871 million rubles.
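As a cross-check, the estimates can be recomputed directly from the totals of Table 2.2. A short C++ sketch (variable names are mine):

    #include <cstdio>

    int main() {
        // Totals of Table 2.2 (n = 12 stores).
        const double n = 12, sx = 10.68, sy = 819.52, sxx = 11.4058, sxy = 858.3991;
        const double a1 = (n*sxy - sx*sy) / (n*sxx - sx*sx); // slope, ~67.887 (the text's 67.8871)
        const double a0 = (sy - a1*sx) / n;                  // intercept, ~7.87
        std::printf("a0 = %.4f, a1 = %.4f\n", a0, a1);
        return 0;
    }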

Example 2.2. The company's management noticed that the annual turnover depends not only on the store's retail area (see Example 2.1) but also on the average number of visitors. The relevant information is presented in Table 2.3.

Table 2.3

Solution. Let us denote by x_2t the average number of visitors to the t-th store per day, thousand people.

To determine the form of the functional relationship between the variables x_2t and y_t, we will construct a scatter diagram (Fig. 2.2).

Based on the scatter diagram, we can conclude that annual turnover depends positively on the average number of visitors per day (i.e., y increases as this number grows). The form of the functional dependence is linear.

Fig. 2.2. Scatterplot for Example 2.2

Table 2.4

t     x_2t    x_2t^2      y_t·x_2t     x_1t·x_2t
1     8.25    68.0625     163.02       1.98
2     10.24   104.8576    390.0416     3.1744
3     9.31    86.6761     381.2445     5.1205
4     11.01   121.2201    452.2908     5.2848
5     8.54    72.9316     480.7166     6.6612
6     7.51    56.4001     514.5101     7.3598
7     12.36   152.7696    927.1236     11.6184
8     10.81   116.8561    962.6305     13.0801
9     9.89    97.8121     901.2757     12.7581
10    13.72   188.2384    1252.0872    15.3664
11    12.27   150.5529    1225.0368    15.8283
12    13.92   193.7664    1511.0160    20.7408
Sum   127.83  1410.1435   9160.9934    118.9728
Mean  10.65

In general, it is necessary to determine the parameters of the two-factor econometric model

y_t = a_0 + a_1·x_1t + a_2·x_2t + ε_t.

The information required for further calculations is presented in Table 2.4.

Let us estimate the parameters of a linear two-factor econometric model using the least squares method.

Thus, we obtain the estimates â_1 = 61.6583 and â_2 = 2.2748 (the intercept â_0 is found, as before, from the sample means).

The estimate of the coefficient â_1 = 61.6583 shows that, other things being equal, with an increase in retail area by 1 thousand m², the annual turnover will increase by an average of 61.6583 million rubles.

The estimate of the coefficient â_2 = 2.2748 shows that, other things being equal, with an increase in the average number of visitors by 1 thousand people per day, the annual turnover will increase by an average of 2.2748 million rubles.
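For the two-factor model, the least squares estimates solve a 3x3 system of normal equations assembled from the totals of Tables 2.2 and 2.4. A C++ sketch (variable names are mine) that builds the system and solves it by Gaussian elimination:

    #include <cstdio>

    int main() {
        // Totals of Tables 2.2 and 2.4 (n = 12).
        double n = 12, sx1 = 10.68, sx2 = 127.83, sy = 819.52;
        double s11 = 11.4058, s22 = 1410.1435, s12 = 118.9728;
        double s1y = 858.3991, s2y = 9160.9934;

        // Normal equations of y = a0 + a1*x1 + a2*x2; unknowns (a0, a1, a2).
        double m[3][4] = {
            { n,   sx1, sx2, sy  },
            { sx1, s11, s12, s1y },
            { sx2, s12, s22, s2y },
        };
        // Gaussian elimination without pivoting (fine for this small, well-posed system).
        for (int k = 0; k < 3; k++)
            for (int i = k+1; i < 3; i++) {
                double f = m[i][k] / m[k][k];
                for (int j = k; j < 4; j++) m[i][j] -= f * m[k][j];
            }
        double a[3];
        for (int i = 2; i >= 0; i--) {   // back substitution
            a[i] = m[i][3];
            for (int j = i+1; j < 3; j++) a[i] -= m[i][j] * a[j];
            a[i] /= m[i][i];
        }
        // Expect a1 near 61.6583 and a2 near 2.2748, the values quoted in the text.
        std::printf("a0 = %.4f, a1 = %.4f, a2 = %.4f\n", a[0], a[1], a[2]);
        return 0;
    }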

Example 2.3. Using the information presented in Tables 2.2 and 2.4, estimate the parameter of the one-factor econometric model ỹ_t = a·x̃_2t + ε_t,

where ỹ_t is the centered value of the annual turnover of the t-th store, million rubles, and x̃_2t is the centered value of the average daily number of visitors to the t-th store, thousand people (see Examples 2.1-2.2).

Solution. Additional information required for the calculations is presented in Table 2.5.

Table 2.5

t     y_t - ȳ    x_2t - x̄_2    (x_2t - x̄_2)^2    (y_t - ȳ)(x_2t - x̄_2)
1     -48.53     -2.40          5.7720             116.6013
2     -30.20     -0.41          0.1702             12.4589
3     -27.34     -1.34          1.8023             36.7084
4     -27.21     0.36           0.1278             -9.7288
5     -12.00     -2.11          4.4627             25.3570
6     0.22       -3.14          9.8753             -0.6809
7     6.72       1.71           2.9156             11.4687
8     20.76      0.16           0.0248             3.2692
9     22.84      -0.76          0.5814             -17.4130
10    22.97      3.07           9.4096             70.4503
11    31.55      1.62           2.6163             51.0267
12    40.26      3.27           10.6766            131.5387
Sum                             48.4344            431.0566

Using formula (2.35), we obtain â = 431.0566 / 48.4344 ≈ 8.90.

Thus, the estimated model is ỹ_t ≈ 8.90·x̃_2t.

http://www.cleverstudents.ru/articles/mnk.html

Example.

Experimental data on the values of the variables x and y are given in the table.

As a result of their alignment, the following function is obtained:

Using the least squares method, approximate these data by the linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

Solution.

In our example n = 5. We fill out the table for the convenience of calculating the sums that appear in the formulas for the sought coefficients.

The values ​​in the fourth row of the table are obtained by multiplying the values ​​of the 2nd row by the values ​​of the 3rd row for each number i.

The values ​​in the fifth row of the table are obtained by squaring the values ​​in the 2nd row for each number i.

The values ​​in the last column of the table are the sums of the values ​​across the rows.

We use the least squares formulas to find the coefficients a and b, substituting into them the corresponding values from the last column of the table:

Hence, y = 0.165x + 2.184 is the sought approximating straight line.

It remains to find out which of the two lines – y = 0.165x + 2.184 or the one obtained earlier by alignment – better approximates the original data, that is, to compare their estimates by the least squares method.

Proof.

In order for the function S(a, b) = Σ(yi − (a·xi + b))² to take the smallest value at the found a and b, it suffices that at this point the matrix of the quadratic form of the second-order differential of the function S be positive definite. Let's show it.

The second-order differential has the form:

d²S = (∂²S/∂a²)·da² + 2·(∂²S/∂a∂b)·da·db + (∂²S/∂b²)·db²

That is,

∂²S/∂a² = 2·Σxi², ∂²S/∂a∂b = 2·Σxi, ∂²S/∂b² = 2n

Therefore, the matrix of the quadratic form has the form

M = ( 2·Σxi²   2·Σxi
      2·Σxi    2n    )

and the values of the elements do not depend on a and b.

Let us show that the matrix is positive definite. For this, the angular minors must be positive.

Angular minor of the first order: 2·Σxi² > 0. The inequality is strict, since the points do not all coincide. Angular minor of the second order: 4·(n·Σxi² − (Σxi)²) > 0, and this inequality is also strict (by the Cauchy–Bunyakovsky inequality) as long as not all xi are equal.


Introduction

I am a mathematician and a programmer. The biggest leap I took in my career was when I learned to say: "I do not understand anything!" Now I am not ashamed to tell a luminary of science who is giving me a lecture that I do not understand what he, the luminary, is telling me. And it is very difficult. Yes, admitting your ignorance is difficult and embarrassing. Who likes to admit that he does not know the basics of something? Due to my profession, I have to attend a large number of presentations and lectures where, I admit, in the vast majority of cases I want to sleep, because I do not understand anything. And I do not understand because the huge problem of the current situation in science lies in mathematics: it assumes that all listeners are familiar with absolutely all areas of mathematics (which is absurd). Admitting that you do not know what a derivative is (we will talk about what it is a little later) is shameful.

But I've learned to say that I don't know what multiplication is. Yes, I don't know what a subalgebra over a Lie algebra is. Yes, I don’t know why quadratic equations are needed in life. By the way, if you are sure that you know, then we have something to talk about! Mathematics is a series of tricks. Mathematicians try to confuse and intimidate the public; where there is no confusion, there is no reputation, no authority. Yes, it is prestigious to speak in as abstract a language as possible, which is complete nonsense.

Do you know what a derivative is? Most likely you will tell me about the limit of the difference quotient. In the first year of mathematics and mechanics at St. Petersburg State University, Viktor Petrovich Khavin defined the derivative for me as the coefficient of the first term of the Taylor series of a function at a point (that was a separate gymnastics: to define the Taylor series without derivatives). I laughed at this definition for a long time until I finally understood what it was about. The derivative is nothing more than a simple measure of how similar the function we are differentiating is to the functions y=x, y=x^2, y=x^3.

I now have the honor of lecturing to students who are afraid of mathematics. If you are afraid of mathematics, we are on the same path. As soon as you try to read some text and it seems to you that it is overly complicated, know that it is poorly written. I assert that there is not a single area of mathematics that cannot be discussed "on the fingers" without losing accuracy.

Assignment for the near future: I assigned my students to understand what a linear quadratic regulator is. Don’t be shy, spend three minutes of your life and follow the link. If you don’t understand anything, then we are on the same path. I (a professional mathematician-programmer) didn’t understand anything either. And I assure you, you can figure this out “on your fingers.” At the moment I don't know what it is, but I assure you that we will be able to figure it out.

So, the first lecture that I am going to give my students after they come running to me in horror, saying that a linear-quadratic regulator is a terrible thing you will never master in your life, is the least squares method. Can you solve linear equations? If you are reading this text, then most likely not.

So, given two points (x0, y0), (x1, y1), for example, (1,1) and (3,2), the task is to find the equation of the line passing through these two points:

illustration

This line should have an equation of the following form:

alpha*x + beta = y

Here alpha and beta are unknown to us, but two points of this line are known:

alpha*x0 + beta = y0
alpha*x1 + beta = y1

We can write this system in matrix form:

[[x0, 1], [x1, 1]] * (alpha, beta)^T = (y0, y1)^T

Here we should make a lyrical digression: what is a matrix? A matrix is ​​nothing more than a two-dimensional array. This is a way of storing data; no further meanings should be attached to it. It depends on us exactly how to interpret a certain matrix. Periodically I will interpret it as a linear mapping, periodically as a quadratic form, and sometimes simply as a set of vectors. This will all be clarified in context.

Let's replace the concrete matrices with their symbolic representation: A*x = b, where x = (alpha, beta)^T.

Then (alpha, beta) can be easily found: x = A^{-1} * b.

More specifically for our previous data: alpha = 1/2, beta = 1/2.

Which leads to the following equation of the line passing through the points (1,1) and (3,2): y = x/2 + 1/2.

Okay, everything is clear here. Let's find the equation of the line passing through three points: (x0,y0), (x1,y1) and (x2,y2):

Oh-oh-oh, but we have three equations for two unknowns! A standard mathematician will say that there is no solution. What will the programmer say? He will first rewrite the previous system of equations in the following form:

alpha*i + beta*j = b, where i = (x0, x1, x2)^T, j = (1, 1, 1)^T, b = (y0, y1, y2)^T.

In our case, the vectors i, j, b are three-dimensional, therefore (in the general case) there is no solution to this system. Any vector (alpha*i + beta*j) lies in the plane spanned by the vectors (i, j). If b does not belong to this plane, then there is no solution (equality cannot be achieved in the equation). What to do? Let's look for a compromise. Let's denote by e(alpha, beta) exactly how far we have not achieved equality:

e(alpha, beta) = alpha*i + beta*j - b

And we will try to minimize this error:

Why square?

We are looking not just for the minimum of the norm, but for the minimum of the square of the norm. Why? The minimum point itself is the same, while the square gives a smooth function (a quadratic function of the arguments (alpha, beta)), whereas simply the length gives a cone-shaped function, non-differentiable at the minimum point. Brr. A square is more convenient.

Obviously, the error is minimized when the vector e is orthogonal to the plane spanned by the vectors i and j.

Illustration

In other words: we are looking for a straight line such that the sum of the squared lengths of the distances from all points to this straight line is minimal:

UPDATE: I have a problem here; the distance to the straight line should be measured vertically, not by orthogonal projection. This commenter is right.

Illustration

In completely different words (carefully, poorly formalized, but it should be clear): we take all possible lines between all pairs of points and look for the average line between all:

Illustration

Another explanation is straightforward: we attach a spring between every data point (here we have three) and the straight line that we are looking for, and the straight line of the equilibrium state is exactly what we are looking for.

Minimum quadratic form

So, given the vector b and the plane spanned by the column vectors of the matrix A (in this case (x0, x1, x2) and (1, 1, 1)), we are looking for the vector e with the minimum square of length. Obviously, the minimum is achievable only for a vector e orthogonal to the plane spanned by the column vectors of the matrix A:

In other words, we are looking for a vector x = (alpha, beta) such that:

A^T * (A*x - b) = 0, i.e. A^T*A*x = A^T*b

Let me remind you that this vector x = (alpha, beta) is the minimum of the quadratic function ||e(alpha, beta)||^2:

Here it would be useful to remember that a matrix can also be interpreted as a quadratic form; for example, the identity matrix ((1,0),(0,1)) can be interpreted as the function x^2 + y^2:

quadratic form

All this gymnastics is known under the name linear regression.
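As a sanity check of the recipe x = (A^T A)^{-1} A^T b, here is a self-contained C++ sketch for three points: (1,1) and (3,2) are from the text, the third point (4,3) is made up for illustration.

    #include <cstdio>

    int main() {
        // Three points: (1,1) and (3,2) from the text; (4,3) is hypothetical.
        double xs[3] = {1, 3, 4}, ys[3] = {1, 2, 3};
        // Columns of A are i = (x0,x1,x2) and j = (1,1,1); form AtA and Atb.
        double a11 = 0, a12 = 0, a22 = 3, b1 = 0, b2 = 0; // a22 = j·j = 3 points
        for (int k = 0; k < 3; k++) {
            a11 += xs[k]*xs[k];   // i·i
            a12 += xs[k];         // i·j
            b1  += xs[k]*ys[k];   // i·b
            b2  += ys[k];         // j·b
        }
        // Solve the 2x2 system AtA*(alpha,beta) = Atb by Cramer's rule.
        double det   = a11*a22 - a12*a12;
        double alpha = (b1*a22 - a12*b2) / det;
        double beta  = (a11*b2 - a12*b1) / det;
        std::printf("y = %g*x + %g\n", alpha, beta);
        return 0;
    }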

Laplace's equation with Dirichlet boundary condition

Now the simplest real task: there is a certain triangulated surface, it is necessary to smooth it. For example, let's load a model of my face:

The original commit is available. To minimize external dependencies, I took the code of my software renderer, already described on Habr. To solve the linear system I use OpenNL, an excellent solver, which, however, is very difficult to install: you need to copy two files (.h+.c) into the folder with your project. All the smoothing is done by the following code:

    for (int d=0; d<3; d++) {
        nlNewContext();
        nlSolverParameteri(NL_NB_VARIABLES, verts.size());
        nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE);
        nlBegin(NL_SYSTEM);
        nlBegin(NL_MATRIX);
        // one row per vertex: the new position must stay near the old one
        for (int i=0; i<(int)verts.size(); i++) {
            nlBegin(NL_ROW);
            nlCoefficient(i, 1);
            nlRightHandSide(verts[i][d]);
            nlEnd(NL_ROW);
        }
        // one row per triangle edge: its endpoints are pulled together
        for (unsigned int i=0; i<faces.size(); i++) {
            std::vector<int> &face = faces[i]; // type reconstructed: the scrape lost this line
            for (int j=0; j<3; j++) {
                nlBegin(NL_ROW);
                nlCoefficient(face[ j ], 1);
                nlCoefficient(face[(j+1)%3], -1);
                nlEnd(NL_ROW);
            }
        }
        nlEnd(NL_MATRIX);
        nlEnd(NL_SYSTEM);
        nlSolve();
        for (int i=0; i<(int)verts.size(); i++) {
            verts[i][d] = nlGetVariable(i);
        }
    }

The x, y and z coordinates are separable; I smooth them separately. That is, I solve three systems of linear equations, each with the number of variables equal to the number of vertices in my model. The first n rows of the matrix A have only one 1 per row, and the first n rows of the vector b have the original model coordinates. That is, I tie a spring between the new position of a vertex and its old position: the new ones should not move too far from the old ones.

All subsequent rows of the matrix A (faces.size()*3 = the number of edges of all triangles in the mesh) have one occurrence of 1 and one occurrence of -1, and the corresponding components of the vector b are zero. This means I put a spring on each edge of our triangular mesh: all edges try to make their starting and ending vertices coincide.

Once again: all vertices are variables, and they cannot move far from their original position, but at the same time they try to become similar to each other.

Here's the result:

Everything would be fine: the model is really smoothed, but it has moved away from its original boundary. Let's change the code a little:

    for (int i=0; i<(int)verts.size(); i++) {
        float scale = border[i] ? 1000 : 1; // boundary vertices get a much stiffer spring
        nlBegin(NL_ROW);
        nlCoefficient(i, scale);
        nlRightHandSide(scale*verts[i][d]);
        nlEnd(NL_ROW);
    }

In our matrix A, for the vertices that are on the boundary, I add not a row of the form v_i = verts[i][d], but 1000*v_i = 1000*verts[i][d]. What does it change? It changes our quadratic form of the error. Now a unit deviation of a boundary vertex will cost not one unit, as before, but 1000*1000 units. That is, we hung a stronger spring on the boundary vertices; the solution will prefer to stretch the others more strongly. Here's the result:

Let's double the spring strength between the vertices:
nlCoefficient(face[ j ], 2); nlCoefficient(face[(j+1)%3], -2);

It is logical that the surface has become smoother:

And now even a hundred times stronger:

What is this? Imagine that we have dipped a wire ring in soapy water. The resulting soap film will try to have as little curvature as possible while touching the border, our wire ring. This is exactly what we got by fixing the boundary and asking for a smooth surface inside. Congratulations, we have just solved the Laplace equation with Dirichlet boundary conditions. Sounds cool? But in reality, you just need to solve one system of linear equations.

Poisson's equation

Let's remember another cool name.

Let's say I have an image like this:

Looks good to everyone, but I don’t like the chair.

I'll cut the picture in half:



And I will select a chair with my hands:

Then I will pull everything that is white in the mask to the left part of the picture, and at the same time throughout the whole picture I will require that the difference between two neighboring pixels equal the difference between the two corresponding neighboring pixels of the right picture:

    for (int i=0; i
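The fragment above breaks off; a minimal sketch of the constraints just described, using the same OpenNL calls as the earlier fragments (the array names left, right, mask and the row-major w*h pixel layout, one system per color channel, are my assumptions, not the original code):

    // Hypothetical reconstruction: one row pins each masked pixel to the left
    // image; one row per horizontal neighbour pair matches the gradient of the
    // right image. Pixels are indexed i = x + y*w.
    for (int i=0; i<w*h; i++) {
        if (mask[i]) {            // pull masked pixels to the left picture
            nlBegin(NL_ROW);
            nlCoefficient(i, 1);
            nlRightHandSide(left[i]);
            nlEnd(NL_ROW);
        }
        if (i%w < w-1) {          // gradient constraint with the right-hand neighbour
            nlBegin(NL_ROW);
            nlCoefficient(i,   1);
            nlCoefficient(i+1, -1);
            nlRightHandSide(right[i] - right[i+1]);
            nlEnd(NL_ROW);
        }
    }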

Here's the result:

Code and pictures available

The least squares method (in regression contexts, OLS, Ordinary Least Squares) is a mathematical method used to solve various problems, based on minimizing the sum of squared deviations of certain functions from the desired variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find solutions of ordinary (not overdetermined) nonlinear systems of equations, and to approximate point values of some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.


History

Until the beginning of the 19th century, scientists did not have definite rules for solving a system of equations in which the number of unknowns is less than the number of equations; until that time, ad hoc techniques were used that depended on the type of equations and on the ingenuity of the calculators, and therefore different calculators, starting from the same observational data, came to different conclusions. Gauss (1795) was the first to use the method, and Legendre (1805) independently discovered and published it under its modern name (French: Méthode des moindres quarrés). Laplace connected the method with probability theory, and the American mathematician Adrain (1808) considered its probability-theoretic applications. The method was spread and improved by the further research of Encke, Bessel, Hansen and others.

The essence of the least squares method

Let $x$ be a set of $n$ unknown variables (parameters) and $f_i(x)$, $i=1,\ldots,m$, $m>n$, a set of functions of these variables. The task is to select values of $x$ such that the values of these functions are as close as possible to certain values $y_i$. Essentially, we are talking about the "solution" of the overdetermined system of equations $f_i(x)=y_i$, $i=1,\ldots,m$, in the indicated sense of maximum proximity of the left and right sides of the system. The essence of the least squares method is to take as the "measure of proximity" the sum of squared deviations of the left and right sides $|f_i(x)-y_i|$. Thus, the essence of LSM can be expressed as follows:

$$\sum_i e_i^2=\sum_i \bigl(y_i-f_i(x)\bigr)^2\rightarrow \min_x.$$

If the system of equations has a solution, then the minimum of the sum of squares equals zero, and the exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations is greater than the number of sought variables, then the system has no exact solution and the least squares method allows one to find some "optimal" vector $x$ in the sense of the maximum proximity of the vectors $y$ and $f(x)$, or the maximum proximity of the deviation vector $e$ to zero (proximity is understood in the sense of Euclidean distance).

Example - system of linear equations

In particular, the least squares method can be used to "solve" a system of linear equations

$$Ax=b,$$

where $A$ is a rectangular matrix of size $m\times n$, $m>n$ (i.e., the number of rows of the matrix $A$ is greater than the number of sought variables).

In the general case such a system of equations has no solution. Therefore, the system can be "solved" only in the sense of choosing a vector $x$ that minimizes the "distance" between the vectors $Ax$ and $b$. To do this, one can apply the criterion of minimizing the sum of squared differences of the left and right sides of the system's equations, that is, $(Ax-b)^T(Ax-b)\rightarrow\min$. It is easy to show that solving this minimization problem leads to the following system of equations:

$$A^TAx=A^Tb\quad\Rightarrow\quad x=(A^TA)^{-1}A^Tb.$$

OLS in regression analysis (data approximation)

Let there be $n$ values of some variable $y$ (these could be the results of observations, experiments, etc.) and associated variables $x$. The task is to approximate the relationship between $y$ and $x$ by some function $f(x,b)$, known up to some unknown parameters $b$; that is, to find the best values of the parameters $b$ that bring the values $f(x,b)$ as close as possible to the actual values $y$. In fact, this comes down to the "solution" of an overdetermined system of equations with respect to $b$:

$$f(x_t,b)=y_t,\quad t=1,\ldots,n.$$

In regression analysis, and in particular in econometrics, probabilistic models of the dependence between variables are used:

$$y_t=f(x_t,b)+\varepsilon_t,$$

where $\varepsilon_t$ are the so-called random errors of the model.

Accordingly, deviations of the observed values $y$ from the model values $f(x,b)$ are assumed in the model itself. The essence of the (ordinary, classical) least squares method is to find parameters $b$ for which the sum of squared deviations (errors; for regression models they are often called regression residuals) $e_t$ is minimal:

$$\hat b_{OLS}=\arg\min_b RSS(b),$$

where $RSS$ (Residual Sum of Squares) is defined as:

$$RSS(b)=e^Te=\sum_{t=1}^n e_t^2=\sum_{t=1}^n \bigl(y_t-f(x_t,b)\bigr)^2.$$

In the general case this problem can be solved by numerical optimization (minimization) methods. In this case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, one must find the stationary points of the function $RSS(b)$ by differentiating it with respect to the unknown parameters $b$, equating the derivatives to zero and solving the resulting system of equations:

$$\sum_{t=1}^n \bigl(y_t-f(x_t,b)\bigr)\frac{\partial f(x_t,b)}{\partial b}=0.$$

OLS in the case of linear regression

Let the regression dependence be linear:

$$y_t=\sum_{j=1}^k b_j x_{tj}+\varepsilon_t.$$

Let $y$ be the column vector of observations of the explained variable, and $X$ the $(n\times k)$ matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model is:

$$y=Xb+\varepsilon.$$

Then the vector of estimates of the explained variable and the vector of regression residuals will be

$$\hat y=Xb,\qquad e=y-\hat y=y-Xb.$$

Accordingly, the sum of squares of the regression residuals will be

$$RSS=e^Te=(y-Xb)^T(y-Xb).$$

Differentiating this function with respect to the parameter vector $b$ and equating the derivatives to zero, we obtain a system of equations (in matrix form):

$$(X^TX)b=X^Ty.$$

In deciphered matrix form, this system of equations looks as follows:

$$\begin{pmatrix}\sum x_{t1}^2&\sum x_{t1}x_{t2}&\sum x_{t1}x_{t3}&\ldots&\sum x_{t1}x_{tk}\\ \sum x_{t2}x_{t1}&\sum x_{t2}^2&\sum x_{t2}x_{t3}&\ldots&\sum x_{t2}x_{tk}\\ \sum x_{t3}x_{t1}&\sum x_{t3}x_{t2}&\sum x_{t3}^2&\ldots&\sum x_{t3}x_{tk}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ \sum x_{tk}x_{t1}&\sum x_{tk}x_{t2}&\sum x_{tk}x_{t3}&\ldots&\sum x_{tk}^2\end{pmatrix}\begin{pmatrix}b_1\\b_2\\b_3\\\vdots\\b_k\end{pmatrix}=\begin{pmatrix}\sum x_{t1}y_t\\\sum x_{t2}y_t\\\sum x_{t3}y_t\\\vdots\\\sum x_{tk}y_t\end{pmatrix},$$

where all sums are taken over all admissible values of $t$.

If a constant is included in the model (as usual), then $x_{t1}=1$ for all $t$; therefore, in the upper left corner of the matrix of the system of equations stands the number of observations $n$, and the remaining elements of the first row and first column are simply the sums of the values of the variables, $\sum x_{tj}$, while the first element of the right-hand side of the system is $\sum y_t$.

The solution of this system of equations gives the general formula of the least squares estimates for the linear model:

$$\hat b_{OLS}=(X^TX)^{-1}X^Ty=\left(\tfrac{1}{n}X^TX\right)^{-1}\tfrac{1}{n}X^Ty=V_x^{-1}C_{xy}.$$

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when dividing by $n$, arithmetic means appear instead of sums). If in a regression model the data are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.

An important property of the OLS estimates for models with a constant: the line of the constructed regression passes through the center of gravity of the sample data, that is, the equality holds:

$$\bar y=\hat b_1+\sum_{j=2}^k \hat b_j\bar x_j.$$

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) is equal to the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of the minimum sum of squared deviations from it.

The simplest special cases

In the case of paired linear regression $y_t=a+bx_t+\varepsilon_t$, when the linear dependence of one variable on another is estimated, the calculation formulas simplify (one can do without matrix algebra). The system of equations has the form:

$$\begin{pmatrix}1&\bar x\\\bar x&\overline{x^2}\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}=\begin{pmatrix}\bar y\\\overline{xy}\end{pmatrix}.$$

From here it is easy to find the coefficient estimates:

$$\hat b=\frac{\operatorname{Cov}(x,y)}{\operatorname{Var}(x)}=\frac{\overline{xy}-\bar x\,\bar y}{\overline{x^2}-\bar x^2},\qquad \hat a=\bar y-\hat b\,\bar x.$$

Despite the fact that in the general case models with a constant are preferable, in some cases it is known from theoretical considerations that the constant $a$ must be zero. For example, in physics the relationship between voltage and current is $U=I\cdot R$; when measuring voltage and current, it is necessary to estimate the resistance. In this case we are dealing with the model $y=bx$. Then, instead of a system of equations, we have the single equation

$$\Bigl(\sum x_t^2\Bigr)b=\sum x_t y_t.$$

Therefore, the formula for estimating the single coefficient has the form

$$\hat b=\frac{\sum_{t=1}^n x_t y_t}{\sum_{t=1}^n x_t^2}=\frac{\overline{xy}}{\overline{x^2}}.$$
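For the $U=I\cdot R$ example this is a three-line estimator; a C++ sketch (the function name and the idea of passing raw measurement vectors are mine):

    #include <vector>

    // No-intercept least squares for U = R*I: R = sum(I*U) / sum(I^2).
    double estimate_resistance(const std::vector<double>& current,
                               const std::vector<double>& voltage) {
        double siu = 0, sii = 0;
        for (size_t k = 0; k < current.size(); k++) {
            siu += current[k] * voltage[k];
            sii += current[k] * current[k];
        }
        return siu / sii;
    }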

The case of a polynomial model

If the data are fitted by a polynomial regression function of one variable $f(x)=b_0+\sum_{i=1}^k b_i x^i$, then, treating the powers $x^i$ as independent factors for each $i$, one can estimate the parameters of the model using the general formula for estimating the parameters of a linear model. To do this, it suffices to take into account that with such an interpretation $x_{ti}x_{tj}=x_t^i x_t^j=x_t^{i+j}$ and $x_{tj}y_t=x_t^j y_t$. Consequently, the matrix equations in this case take the form:

$$\begin{pmatrix}n&\sum x_t&\ldots&\sum x_t^k\\ \sum x_t&\sum x_t^2&\ldots&\sum x_t^{k+1}\\ \vdots&\vdots&\ddots&\vdots\\ \sum x_t^k&\sum x_t^{k+1}&\ldots&\sum x_t^{2k}\end{pmatrix}\begin{pmatrix}b_0\\b_1\\\vdots\\b_k\end{pmatrix}=\begin{pmatrix}\sum y_t\\\sum x_t y_t\\\vdots\\\sum x_t^k y_t\end{pmatrix}.$$

Statistical properties of OLS estimators

First of all, we note that for linear models the OLS estimates are linear estimates, as follows from the formula above. For the OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis be fulfilled: the mathematical expectation of the random error, conditional on the factors, must be zero. This condition is satisfied, in particular, if

  1. the mathematical expectation of random errors is zero, and
  2. factors and random errors are independent random variables.

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large amount of data does not allow obtaining high-quality estimates in this case). In the classical case, a stronger assumption is made, the determinism of the factors as opposed to the randomness of the error, which automatically means that the exogeneity condition is satisfied. In the general case, for the consistency of the estimates it is sufficient that the exogeneity condition be satisfied together with the convergence of the matrix $V_x$ to some non-singular matrix as the sample size increases to infinity.

In order for the (ordinary) least squares estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must be satisfied:

  1. constant (identical) variance of the random errors in all observations (no heteroscedasticity): $V(\varepsilon_t)=\sigma^2$;
  2. no correlation (autocorrelation) of the random errors in different observations.

These assumptions can be formulated for the covariance matrix of the random error vector: $V(\varepsilon)=\sigma^2 I$.

A linear model satisfying these conditions is called classical. The OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian literature the Gauss-Markov theorem is more often cited). As is easy to show, the covariance matrix of the vector of coefficient estimates equals:

$$V(\hat b_{OLS})=\sigma^2 (X^TX)^{-1}.$$

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, have minimal variance); that is, in the class of linear unbiased estimates the OLS estimates are the best. The diagonal elements of this matrix, the variances of the coefficient estimates, are important parameters of the quality of the obtained estimates. However, it is impossible to calculate the covariance matrix directly, because the variance of the random errors is unknown. It can be proven that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity:

$$s^2=RSS/(n-k).$$

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence the variance of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients.

It should be noted that if the classical assumptions are not satisfied, the OLS parameter estimates are not the most efficient. They can be improved by minimizing, instead of the plain sum of squares, a generalized quadratic form $e^TWe$, where $W$ is some symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known, for symmetric matrices (or operators) there is a decomposition $W=P^TP$. Therefore, the specified functional can be represented as $e^TP^TPe=(Pe)^TPe=e_*^Te_*$, that is, as the sum of squares of some transformed "residuals". Thus, we can distinguish a whole class of least squares methods: LS methods (Least Squares).

It has been proven (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors) the most efficient (in the class of linear unbiased estimates) are the estimates of so-called generalized least squares (GLS, Generalized Least Squares): the LS method with the weight matrix equal to the inverse of the covariance matrix of the random errors, $W=V_\varepsilon^{-1}$.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

$$\hat b_{GLS}=(X^TV^{-1}X)^{-1}X^TV^{-1}y.$$

The covariance matrix of these estimates, accordingly, equals

$$V(\hat b_{GLS})=(X^TV^{-1}X)^{-1}.$$

In fact, the essence of GLS lies in a certain (linear) transformation ($P$) of the original data and the application of ordinary least squares to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors) we have so-called weighted least squares (WLS). In this case, the weighted sum of squares of the model residuals is minimized, that is, each observation receives a "weight" inversely proportional to the variance of the random error in that observation:

$$e^TWe=\sum_{t=1}^n \frac{e_t^2}{\sigma_t^2}.$$

In fact, the data are transformed by weighting the observations (dividing by a quantity proportional to the assumed standard deviation of the random errors), and ordinary least squares is applied to the weighted data.
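In code, weighted least squares is just a rescaling before the ordinary fit. A C++ sketch for the paired model y = a + b·x (names are mine; sigma[t] is the assumed standard deviation of the error in observation t): each observation enters the sums with weight 1/σ_t², which is exactly what dividing the row by σ_t accomplishes.

    #include <utility>
    #include <vector>

    // Weighted least squares for y = a + b*x: minimize sum(((y - a - b*x)/sigma)^2).
    std::pair<double, double> wls_line(const std::vector<double>& x,
                                       const std::vector<double>& y,
                                       const std::vector<double>& sigma) {
        double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
        for (size_t t = 0; t < x.size(); t++) {
            double w = 1.0 / (sigma[t]*sigma[t]);  // weight ~ 1/variance
            sw   += w;
            swx  += w * x[t];
            swy  += w * y[t];
            swxx += w * x[t] * x[t];
            swxy += w * x[t] * y[t];
        }
        // Weighted normal equations: a*sw + b*swx = swy; a*swx + b*swxx = swxy.
        double det = sw*swxx - swx*swx;
        double b = (sw*swxy - swx*swy) / det;    // slope
        double a = (swxx*swy - swx*swxy) / det;  // intercept
        return {a, b};
    }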

    Example 1. Approximation by a polynomial of degree 2.

    Let us approximate the function by a polynomial of degree 2. To do this, we calculate the coefficients of the normal system of equations:

    , ,

    Let's create a normal least squares system, which has the form:

    The solution of the system is easy to find: , , .

    Thus, a polynomial of the 2nd degree is found: .
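
    A sketch of how such a quadratic fit is computed by forming and solving the normal system (the data below are hypothetical, since the example's numeric values were given as images):

```python
import numpy as np

# Hypothetical data points (the original example's values are not reproduced here)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 7.2, 13.1, 21.0])

# Columns 1, x, x^2 for the model y ≈ c0 + c1*x + c2*x^2
A = np.vander(x, 3, increasing=True)
c = np.linalg.solve(A.T @ A, A.T @ y)   # solve the normal equations
print(c)   # coefficients c0, c1, c2 of the 2nd-degree polynomial
```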


    Example 2. Finding the optimal degree of a polynomial.


    Example 3. Derivation of a normal system of equations for finding the parameters of the empirical dependence.

    Let us derive a system of equations for determining the coefficients and of the function , which provides the root-mean-square approximation of a given function by points. Let us compose the function and write down the necessary condition for its extremum:

    Then the normal system will take the form:

    We obtain a linear system of equations in the unknown parameters and , which is easily solved.


    Example.

    Experimental data on the values of the variables x and y are given in the table.

    As a result of smoothing them, the function is obtained

    Using the least squares method, approximate these data by the linear dependence y = ax + b (find the parameters a and b). Determine which of the two lines better (in the sense of the least squares method) smooths the experimental data. Make a drawing.

    The essence of the least squares method (LSM).

    The task is to find the coefficients of the linear dependence for which the function of the two variables a and b takes the smallest value. That is, for these a and b, the sum of the squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

    Thus, solving the example comes down to finding the extremum of a function of two variables.

    Deriving formulas for finding coefficients.

    A system of two equations with two unknowns is compiled and solved. We find the partial derivatives of the function with respect to the variables a and b and equate these derivatives to zero.

    We solve the resulting system of equations by any method (for example, by substitution or by Cramer's rule) and obtain formulas for finding the coefficients by the least squares method (LSM).

    For these a and b, the function takes the smallest value. The proof of this fact is given below, at the end of the page.

    That is the whole least squares method. The formula for finding the parameter a contains the sums , , , and the parameter n, the number of experimental data points. We recommend calculating these sums separately.

    The coefficient b is found after a has been calculated.
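
    As a sketch of this recipe (with hypothetical data, since the article's table values are given as images), the coefficients follow directly from the four sums:

```python
import numpy as np

# Hypothetical experimental data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.4, 2.6, 2.6, 2.8, 3.0])

n = len(x)
sum_x, sum_y = x.sum(), y.sum()
sum_xy, sum_x2 = (x * y).sum(), (x * x).sum()

# Least squares formulas for the straight line y = a*x + b
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n
print(a, b)
```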

    It's time to remember the original example.

    Solution.

    In our example, n = 5. We fill out a table for the convenience of calculating the sums that enter into the formulas for the required coefficients.

    The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row for each number i.

    The values in the fifth row of the table are obtained by squaring the values in the 2nd row for each number i.

    The values in the last column of the table are the sums of the values across the rows.

    We use the least squares formulas to find the coefficients a and b, substituting into them the corresponding values from the last column of the table:

    Hence, y = 0.165x + 2.184 is the desired approximating straight line.

    It remains to find out which of the lines, y = 0.165x + 2.184 or , better approximates the original data, that is, to make an estimate using the least squares method.

    Error estimation of the least squares method.

    To do this, calculate the sums of the squared deviations of the original data from each of these lines, and ; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

    Since , the straight line y = 0.165x + 2.184 better approximates the original data.

    Graphic illustration of the least squares (LS) method.

    Everything is clearly visible on the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is , and the pink dots are the original data.

    Why is this needed, why all these approximations?

    I personally use it for data smoothing and for interpolation and extrapolation problems (in the original example, one might be asked to find the value of the observed quantity y at x = 3 or at x = 6 using the least squares method). But we will talk more about this later in another section of the site.


    Proof.

    For the function to take its smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function be positive definite. Let us show this.

    The second order differential has the form:

    That is

    Therefore, the matrix of the quadratic form has the form

    and the values of its elements do not depend on a and b.

    Let us show that the matrix is positive definite. For this, its leading principal minors must be positive.

    The first-order leading principal minor: . The inequality is strict, since the points do not all coincide. In what follows we will assume this.

    The second-order leading principal minor:

    Let's prove that by the method of mathematical induction.

    Conclusion: the found values a and b correspond to the smallest value of the function and are therefore the required parameters of the least squares method.



    Developing a forecast using the least squares method. Example of problem solution

    Extrapolation is a research method based on extending past and present trends, patterns, and relationships to the future development of the object being forecast. Extrapolation methods include the moving average method, the exponential smoothing method, and the least squares method.

    The essence of the least squares method is to minimize the sum of squared deviations between the observed and the calculated values. The calculated values are found from the chosen equation, the regression equation. The smaller the distance between the actual values and the calculated ones, the more accurate the forecast based on the regression equation.

    The basis for choosing a curve is a theoretical analysis of the essence of the phenomenon being studied, whose change is reflected by the time series. Considerations about the nature of the growth of the levels of the series are sometimes also taken into account. Thus, if output is expected to grow in an arithmetic progression, then smoothing is performed with a straight line; if the growth turns out to follow a geometric progression, then smoothing must be done with an exponential function.

    The working formula of the least squares method: Y_(t+1) = a*X + b, where t + 1 is the forecast period; Y_(t+1) is the forecast value of the indicator; a and b are coefficients; X is the time index.

    Calculation of coefficients a and b is carried out using the following formulas:

    where Y_f are the actual values of the time series and n is the number of levels in the time series.
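
    A minimal sketch of such a trend forecast (the series below is hypothetical, standing in for the task's data table):

```python
import numpy as np

# Hypothetical time series of the indicator
y = np.array([1.60, 1.50, 1.55, 1.48, 1.52, 1.50, 1.45, 1.51, 1.47, 1.49])
t = np.arange(1, len(y) + 1)   # time index X = 1, 2, ..., n

a, b = np.polyfit(t, y, 1)     # fit the linear trend Y = a*X + b

# Forecast the next three periods
for step in range(1, 4):
    print(len(y) + step, round(a * (len(y) + step) + b, 3))
```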

    Smoothing a time series by the least squares method serves to reflect the pattern of development of the phenomenon being studied. In the analytical expression of the trend, time is treated as the independent variable, and the levels of the series act as a function of this independent variable.

    The development of a phenomenon depends not on how many years have passed since the starting point, but on what factors influenced its development, in what direction, and with what intensity. From this it is clear that the development of a phenomenon over time is the result of the action of these factors.

    Correctly establishing the type of curve, that is, the type of analytical dependence on time, is one of the most difficult tasks of predictive analysis.

    The selection of the type of function that describes the trend, the parameters of which are determined by the least squares method, is carried out in most cases empirically, by constructing a number of functions and comparing them with each other according to the value of the mean square error, calculated by the formula:

    where Y_f are the actual values of the time series; Y_r are the calculated (smoothed) values of the series; n is the number of levels in the time series; p is the number of parameters in the formula describing the trend.
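
    A sketch of this comparison criterion, assuming fitted values `y_calc` from some trend model with `p` parameters (the function name is illustrative):

```python
import numpy as np

def trend_error(y_actual, y_calc, p):
    """Mean square error of a trend model with p fitted parameters:
    sqrt( sum((Y_f - Y_r)^2) / (n - p) )."""
    y_actual, y_calc = np.asarray(y_actual), np.asarray(y_calc)
    n = len(y_actual)
    return np.sqrt(np.sum((y_actual - y_calc) ** 2) / (n - p))

# Competing trend models are then compared by this value: the smaller, the better.
```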

    Disadvantages of the least squares method:

    • when attempting to describe the economic phenomenon being studied with a mathematical equation, the forecast is accurate only for a short period of time, and the regression equation should be recalculated as new information becomes available;
    • the complexity of selecting a regression equation that is solvable using standard computer programs.

    An example of using the least squares method to develop a forecast

    Task. Data are given characterizing the unemployment rate in the region, %.

    • Construct a forecast of the unemployment rate in the region for November, December, January using the following methods: moving average, exponential smoothing, least squares.
    • Calculate the errors in the resulting forecasts using each method.
    • Compare the results and draw conclusions.

    Least squares solution

    To solve this, we will draw up a table in which we will make the necessary calculations:

    ε = 28.63/10 = 2.86%, so the forecast accuracy is high.

    Conclusion: comparing the results obtained by the moving average method, the exponential smoothing method, and the least squares method, we can say that the average relative error for the exponential smoothing method falls within the 20-50% range. This means that the forecast accuracy in this case is only satisfactory.

    In the first and third cases, the forecast accuracy is high, since the average relative error is less than 10%. But the moving average method gave more reliable results (forecast for November: 1.52%, for December: 1.53%, for January: 1.49%), since the average relative error when using this method is the smallest, 1.13%.


    LSM program

    Enter data

    Data and approximation y = a + b x

    i is the number of the experimental point;
    `x_i` is the value of the fixed parameter at point `i`;
    `y_i` is the value of the measured parameter at point `i`;
    `ω_i` is the weight of the measurement at point `i`;
    `y_i, calc.` is the regression-calculated value of `y` at point `i`;
    `Δy_i` is the difference between the measured and the regression-calculated value of `y` at point `i`;
    `S_(x_i)` is the error estimate of `x_i` when measuring `y` at point `i`.

    Data and approximation y = k x

    i  `x_i`  `y_i`  `ω_i`  `y_i, calc.`  `Δy_i`  `S_(x_i)`


    User's manual for the LSM online program.

    In the data field, enter the values of `x` and `y` at one experimental point on each separate line. The values must be separated by whitespace (a space or a tab).

    The third value can be the weight of the point `w`. If the weight of a point is not specified, it is taken to be one. In the vast majority of cases, the weights of the experimental points are unknown or are not calculated; that is, all experimental data are treated as equivalent. Sometimes the weights in the studied range of values are clearly not equivalent and can even be calculated theoretically. For example, in spectrophotometry weights can be calculated using simple formulas, although this is mostly neglected to reduce labor costs.

    Data can be pasted via the clipboard from a spreadsheet in an office suite such as Excel from Microsoft Office or Calc from Open Office. To do this, in the spreadsheet, select the range of data to copy, copy to the clipboard, and paste the data into the data field on this page.

    To calculate using the least squares method, at least two points are needed to determine the two coefficients: `b`, the tangent of the angle of inclination of the line, and `a`, the value intercepted by the line on the `y` axis.

    To estimate the error of the calculated regression coefficients, you need to set the number of experimental points to more than two.

    Least squares method (LSM).

    The greater the number of experimental points, the more accurate the statistical estimate of the coefficients (due to a decrease in the Student coefficient) and the closer the estimate is to the estimate for the general population.

    Obtaining values at each experimental point is often associated with significant labor costs, so a compromise number of experiments is often carried out, giving an acceptable estimate without excessive labor costs. As a rule, the number of experimental points for a linear least squares dependence with two coefficients is chosen in the region of 5-7 points.

    A Brief Theory of Least Squares for Linear Relationships

    Let's say we have a set of experimental data in the form of pairs of values ​​[`y_i`, `x_i`], where `i` is the number of one experimental measurement from 1 to `n`; `y_i` - the value of the measured quantity at point `i`; `x_i` - the value of the parameter we set at point `i`.

    As an example, consider Ohm's law. By changing the voltage (potential difference) across a section of an electrical circuit, we measure the current passing through that section. Physics gives us a dependence found experimentally:

    `I = U/R`,
    where `I` is the current strength; `R` - resistance; `U` - voltage.

    In this case, `y_i` is the current value being measured, and `x_i` is the voltage value.

    As another example, consider the absorption of light by a solution of a substance.

    `A = ε l C`,
    where `A` is the optical density of the solution; `ε` is the molar absorption coefficient of the solute; `l` is the path length of the light through the cuvette with the solution; `C` is the concentration of the dissolved substance.

    In this case, `y_i` is the measured value of optical density `A`, and `x_i` is the concentration value of the substance that we specify.

    We will consider the case when the relative error in setting `x_i` is significantly less than the relative error in measuring `y_i`. We will also assume that all measured values `y_i` are random and normally distributed, i.e., obey the normal distribution law.

    In the case of a linear dependence of `y` on `x`, we can write the theoretical dependence:
    `y = a + b x`.

    From a geometric point of view, the coefficient `b` denotes the tangent of the angle of inclination of the line to the `x` axis, and the coefficient `a` - the value of `y` at the point of intersection of the line with the `y` axis (at `x = 0`).

    Finding the regression line parameters.

    In an experiment, the measured values ​​of `y_i` cannot exactly lie on the theoretical straight line due to measurement errors, which are always inherent in real life. Therefore, a linear equation must be represented by a system of equations:
    `y_i = a + b x_i + ε_i` (1),
    where `ε_i` is the unknown measurement error of `y` in the `i`-th experiment.

    Dependence (1) is also called a regression, i.e., a statistically significant dependence of two quantities on each other.

    The task of restoring the dependence is to find the coefficients `a` and `b` from the experimental points [`y_i`, `x_i`].

    To find the coefficients `a` and `b`, the least squares method (LSM) is usually used. It is a special case of the maximum likelihood principle.

    Let's rewrite (1) in the form `ε_i = y_i - a - b x_i`.

    Then the sum of squared errors will be
    `Φ = sum_(i=1)^(n) ε_i^2 = sum_(i=1)^(n) (y_i - a - b x_i)^2`. (2)

    The principle of least squares is to minimize the sum (2) with respect to the parameters `a` and `b`.

    The minimum is achieved when the partial derivatives of the sum (2) with respect to the coefficients `a` and `b` are equal to zero:
    `frac(partial Φ)(partial a) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial a) = 0`
    `frac(partial Φ)(partial b) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial b) = 0`

    Expanding the derivatives, we obtain a system of two equations with two unknowns:
    `sum_(i=1)^(n) (2a + 2bx_i - 2y_i) = sum_(i=1)^(n) (a + bx_i - y_i) = 0`
    `sum_(i=1)^(n) (2bx_i^2 + 2ax_i - 2x_iy_i) = sum_(i=1)^(n) (bx_i^2 + ax_i - x_iy_i) = 0`

    We open the brackets and move the sums that do not depend on the required coefficients to the other side, obtaining a system of linear equations:
    `sum_(i=1)^(n) y_i = a n + b sum_(i=1)^(n) x_i`
    `sum_(i=1)^(n) x_iy_i = a sum_(i=1)^(n) x_i + b sum_(i=1)^(n) x_i^2`

    Solving the resulting system, we find formulas for the coefficients `a` and `b`:

    `a = frac(sum_(i=1)^(n) y_i sum_(i=1)^(n) x_i^2 - sum_(i=1)^(n) x_i sum_(i=1)^(n) x_iy_i)(n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.1)

    `b = frac(n sum_(i=1)^(n) x_iy_i - sum_(i=1)^(n) x_i sum_(i=1)^(n) y_i)(n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.2)

    These formulas have a solution when `n > 1` (the line can be constructed from at least 2 points) and when the determinant `D = n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2 != 0`, i.e., when the `x_i` points in the experiment are not all the same (i.e., when the line is not vertical).

    Estimation of errors of regression line coefficients

    For a more accurate assessment of the error in calculating the coefficients `a` and `b`, a large number of experimental points is desirable. When `n = 2`, it is impossible to estimate the error of the coefficients, because the approximating line will uniquely pass through two points.

    The error of a random variable `V` is determined by the law of error accumulation:
    `S_V^2 = sum_(i=1)^p (frac(partial f)(partial z_i))^2 S_(z_i)^2`,
    where `p` is the number of parameters `z_i` with error `S_(z_i)`, which affect the error `S_V`;
    `f` is a function of the dependence of `V` on `z_i`.

    Let us write down the law of error accumulation for the error of coefficients `a` and `b`
    `S_a^2 = sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial a )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 `,
    `S_b^2 = sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial b )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 `,
    because `S_(x_i)^2 = 0` (we stipulated earlier that the error in `x` is negligible).

    `S_y^2 = S_(y_i)^2` - error (variance, squared standard deviation) in the measurement of `y`, assuming that the error is uniform for all values ​​of `y`.

    Substituting formulas for calculating `a` and `b` into the resulting expressions we get

    `S_a^2 = S_y^2 frac(sum_(i=1)^(n) (sum_(i=1)^(n) x_i^2 - x_i sum_(i=1)^(n) x_i)^2)(D^2) = S_y^2 frac((n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2) sum_(i=1)^(n) x_i^2)(D^2) = S_y^2 frac(sum_(i=1)^(n) x_i^2)(D)` (4.1)

    `S_b^2 = S_y^2 frac(sum_(i=1)^(n) (n x_i - sum_(i=1)^(n) x_i)^2)(D^2) = S_y^2 frac(n (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2))(D^2) = S_y^2 frac(n)(D)` (4.2)

    In most real experiments, the value of `S_y` is not measured; to measure it, one would have to carry out several parallel measurements (experiments) at one or several points of the plan, which increases the time (and possibly the cost) of the experiment. Therefore, it is usually assumed that the deviation of `y` from the regression line can be considered random. The estimate of the variance of `y` in this case is calculated using the formula:

    `S_y^2 = S_(y, rest)^2 = frac(sum_(i=1)^n (y_i - a - b x_i)^2) (n-2)`.

    The `n-2` divisor appears because our number of degrees of freedom has decreased due to the calculation of two coefficients using the same sample of experimental data.

    This estimate is also called the residual variance relative to the regression line `S_(y, rest)^2`.

    The significance of the coefficients is assessed using Student's t-test:

    `t_a = frac(|a|) (S_a)`, `t_b = frac(|b|) (S_b)`

    If the calculated criteria `t_a`, `t_b` are less than the tabulated criteria `t(P, n-2)`, then it is considered that the corresponding coefficient is not significantly different from zero with a given probability `P`.
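
    A sketch pulling formulas (3.1)-(4.2) and the t-test together (hypothetical data; scipy's quantile function stands in for the printed Student tables):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.3, 2.6, 2.6, 2.9, 3.0, 3.1])
n = len(x)

D = n * np.sum(x**2) - np.sum(x)**2
a = (np.sum(y) * np.sum(x**2) - np.sum(x) * np.sum(x * y)) / D   # intercept, (3.1)
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / D              # slope, (3.2)

S_y2 = np.sum((y - a - b * x)**2) / (n - 2)   # residual variance
S_a = np.sqrt(S_y2 * np.sum(x**2) / D)        # (4.1)
S_b = np.sqrt(S_y2 * n / D)                   # (4.2)

t_a, t_b = abs(a) / S_a, abs(b) / S_b
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 2)     # two-sided t(P, n-2) for P = 0.95
print(t_a > t_crit, t_b > t_crit)             # significance of a and b
```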

    To assess the quality of the description of a linear relationship, one can compare the residual variance `S_(y, rest)^2` with the variance of `y` relative to the mean, `S_(bar y)`, using the Fisher criterion.

    `S_(bar y) = frac(sum_(i=1)^n (y_i - bar y)^2)(n-1) = frac(sum_(i=1)^n (y_i - (sum_(i=1)^n y_i)/n)^2)(n-1)` is the sample estimate of the variance of `y` relative to the mean.

    To assess the effectiveness of the regression equation to describe the dependence, the Fisher coefficient is calculated
    `F = S_(bar y) / S_(y, rest)^2`,
    which is compared with the tabulated Fisher coefficient `F(P, n-1, n-2)`.

    If `F > F(P, n-1, n-2)`, then the difference between describing the relationship `y = f(x)` with the regression equation and describing it with the mean is statistically significant with probability `P`; that is, the regression describes the dependence better than the spread of `y` around the mean.
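
    And a self-contained sketch of that F-comparison (hypothetical data; scipy again stands in for the tables):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data
y = np.array([2.3, 2.6, 2.6, 2.9, 3.0, 3.1])
n = len(x)

b, a = np.polyfit(x, y, 1)                          # slope b, intercept a
S_y_rest2 = np.sum((y - a - b * x)**2) / (n - 2)    # residual variance
S_bar_y = np.sum((y - y.mean())**2) / (n - 1)       # variance of y about the mean

F = S_bar_y / S_y_rest2
F_crit = stats.f.ppf(0.95, n - 1, n - 2)            # tabulated F(P, n-1, n-2)
print(F > F_crit)   # True: the regression beats the mean description
```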


    Least squares method

    The least squares method refers to the determination of the unknown parameters a, b, c, … of an accepted functional dependence

    y = f(x,a,b,c,…),

    which would provide a minimum of the mean square (variance) of the error

    , (24)

    where x_i, y_i are the pairs of numbers obtained from the experiment.

    Since the condition for an extremum of a function of several variables is that its partial derivatives equal zero, the parameters a, b, c, … are determined from the system of equations:

    ; ; ; … (25)
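
    For a nonlinear f, system (25) is rarely solved by hand; a sketch of the same idea using scipy's curve-fitting routine (the model function below is a hypothetical example):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    """A hypothetical accepted functional dependence y = f(x, a, b, c)."""
    return a * np.exp(-b * x) + c

# Synthetic experimental pairs (x_i, y_i) for the demonstration
x = np.linspace(0.0, 4.0, 20)
y = f(x, 2.5, 1.3, 0.5) + 0.05 * np.random.default_rng(0).standard_normal(20)

# curve_fit minimizes the sum of squared errors over a, b, c,
# i.e., it solves the extremum conditions (25) numerically
params, cov = curve_fit(f, x, y)
print(params)   # estimates of a, b, c
```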

    It must be remembered that the least squares method is used to select the parameters after the type of the function y = f(x) has been defined.

    If, from theoretical considerations, no conclusions can be drawn about what the empirical formula should be, then one has to be guided by visual representations, primarily by graphical representations of the observed data.

    In practice, one is most often limited to the following types of functions:

    1) linear ;

    2) quadratic .

    The essence of the least squares method is to find the parameters of a trend model that best describes the tendency of development of a random phenomenon in time or space (a trend is a line characterizing the tendency of this development). The task of the least squares method (LSM) comes down to finding not just some trend model, but the best or optimal model. The model is optimal if the sum of squared deviations between the observed actual values and the corresponding calculated trend values is minimal (smallest):

    where is the squared deviation between the observed actual value and the corresponding calculated trend value; is the actual (observed) value of the phenomenon being studied; is the calculated value of the trend model; is the number of observations of the phenomenon being studied.

    LSM is rarely used on its own. As a rule, it is most often used only as an auxiliary technique in correlation studies. It should be remembered that the information basis of LSM can only be a reliable statistical series, and the number of observations should not be fewer than 4; otherwise, the smoothing procedures of LSM may lose their meaning.

    The LSM toolkit boils down to the following procedures:

    First procedure. It is determined whether there is any tendency at all for the resultant attribute to change when the selected factor-argument changes; in other words, whether there is a connection between "y" and "x".

    Second procedure. It is determined which line (trajectory) can best describe or characterize this trend.

    Third procedure. The parameters of the regression equation characterizing the chosen line are calculated.

    Example. Let's say we have information about the average sunflower yield for the farm under study (Table 9.1).

    Table 9.1

    Observation number | Productivity, c/ha

    Since the level of technology in sunflower production in our country has remained virtually unchanged over the past 10 years, fluctuations in yield during the analyzed period apparently depended very much on fluctuations in weather and climatic conditions. Is this really true?

    First LSM procedure. We test the hypothesis that there is a trend in sunflower yield depending on changes in weather and climatic conditions over the analyzed 10 years.

    In this example, it is advisable to take the sunflower yield as "y" and the number of the observed year in the analyzed period as "x". The hypothesis of a relationship between "x" and "y" can be tested in two ways: manually or using computer programs. Of course, with computer technology available this problem practically solves itself, but in order to better understand the LSM toolkit, it is advisable to test the hypothesis manually, when only a pen and an ordinary calculator are at hand. In such cases, the existence of a trend is best checked visually, by the location of the graphical image of the analyzed series of dynamics: the correlation field.

    The correlation field in our example is located around a slowly increasing line. This in itself indicates the existence of a certain trend in the change of sunflower yield. One cannot speak of the presence of any tendency only when the correlation field looks like a circle, a ring, a strictly vertical or strictly horizontal cloud, or consists of chaotically scattered points. In all other cases, the hypothesis of a relationship between "x" and "y" is confirmed, and the research continues.

    Second LSM procedure. It is determined which line (trajectory) can best describe or characterize the trend of changes in sunflower yield over the analyzed period.

    If computer technology is available, the selection of the optimal trend occurs automatically. In "manual" processing, the optimal function is selected, as a rule, visually, by the location of the correlation field: based on the appearance of the graph, the equation of the line that best fits the empirical trend (the actual trajectory) is selected.

    As is known, in nature there is a huge variety of functional dependencies, so it is extremely difficult to visually analyze even a small part of them. Fortunately, in real economic practice most relationships can be described quite accurately by a parabola, a hyperbola, or a straight line. In this regard, with the "manual" option of selecting the best function, one can limit oneself to these three models.

    Hyperbola:

    Second-order parabola:

    It is easy to see that in our example, the trend in sunflower yield changes over the analyzed 10 years is best characterized by a straight line, so the regression equation will be the equation of a straight line.

    Third LSM procedure. The parameters of the regression equation characterizing this line are calculated; in other words, an analytical formula is determined that describes the best trend model.

    Finding the values of the parameters of the regression equation, in our case the parameters and , is the core of the LSM. This process comes down to solving the system of normal equations.

    (9.2)

    This system of equations is solved quite easily by the Gauss method. As a result of the solution, in our example, the values of the parameters and are found. Thus, the found regression equation has the following form:
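
    A sketch of this third procedure in matrix form (hypothetical yield series; the normal system (9.2) for a straight-line trend is assembled and solved directly):

```python
import numpy as np

# Hypothetical sunflower yields, c/ha, for the 10 observed years
y = np.array([9.1, 9.9, 9.6, 10.4, 10.8, 10.2, 11.1, 11.5, 11.3, 12.0])
t = np.arange(1, 11)
n = len(t)

# Normal system (9.2) for the straight line y = a0 + a1*t:
#   [ n       sum(t)   ] [a0]   [ sum(y)   ]
#   [ sum(t)  sum(t^2) ] [a1] = [ sum(t*y) ]
A = np.array([[n, t.sum()], [t.sum(), (t**2).sum()]])
rhs = np.array([y.sum(), (t * y).sum()])
a0, a1 = np.linalg.solve(A, rhs)   # Gaussian elimination under the hood
print(f"y = {a0:.3f} + {a1:.3f} * t")
```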
