Standard deviation. Calculation of Variation Indices

This article covers the calculation of variance, standard deviation and, above all, the coefficient of variation. The last of these deserves special attention: anyone starting out with a spreadsheet editor should be able to quickly calculate the relative spread of a set of values.

What is the coefficient of variation and why is it needed?

A short theoretical excursion helps to understand the nature of the coefficient of variation. This indicator reflects the spread of the data relative to the average value. In other words, it is the ratio of the standard deviation to the mean. The coefficient of variation is usually expressed as a percentage and is used to describe the homogeneity of a time series.

The coefficient of variation is particularly useful when you need to make a forecast based on data from a given sample. It helps to identify the series of values that are most useful for subsequent forecasting and to exclude unimportant factors from the sample. If the coefficient is 0%, the series is perfectly homogeneous: all its values are equal to one another. If the coefficient exceeds 33%, the series is heterogeneous, with individual values differing significantly from the sample average.

How to find the standard deviation?

Since calculating the coefficient of variation in Excel requires the standard deviation, it is worth recalling how this parameter is calculated.

From school algebra we know that the standard deviation is the square root of the variance. It shows how far, on average, the individual values of a sample deviate from their mean. It is an absolute measure of the variability of the characteristic being studied, expressed in the same units as the data, which makes it easy to interpret.

Calculating the coefficient in Excel

Unfortunately, Excel has no built-in function that calculates the coefficient of variation directly. This does not mean that you have to do the calculation in your head: you can combine existing functions in a formula that you enter manually.

To calculate the coefficient of variation in Excel, divide the standard deviation by the sample mean. The formula looks like this: =STDEV(data range)/AVERAGE(data range). Enter this formula into the cell in which you want to get the result.
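As a cross-check outside the spreadsheet, the same ratio can be sketched in Python. This is only an illustration, not part of the Excel walkthrough: the function name `coefficient_of_variation` and the `sales` data are hypothetical. `statistics.stdev` divides by n − 1, as Excel's STDEV does.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    return stdev(values) / avg * 100

sales = [10, 12, 11, 13, 14]  # hypothetical data
print(f"{coefficient_of_variation(sales):.1f}%")  # prints 13.2%
```

A series of identical values gives 0%, which matches the description of a perfectly homogeneous series.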

Do not forget that since the coefficient is expressed as a percentage, the cell with the formula will need to be formatted accordingly. You can do this as follows:

  1. Open the Home tab.
  2. In the Number group, open the number format list and select Percentage.

Alternatively, you can set the percentage format by right-clicking the selected cell. In the context menu that appears, choose “Format Cells” and set the required format.

Select Percentage and, if necessary, enter the number of decimal places

Perhaps the above algorithm may seem complicated to some. In fact, calculating the coefficient is as simple as adding two natural numbers. Once you complete this task in Excel, you will never return to tedious, complex solutions in a notebook.

Still can't make a qualitative comparison of the degree of data scatter? Confused by the size of the sample? Then get down to business right now and master in practice all the theoretical material that was presented above! Let statistical analysis and forecast development no longer make you feel afraid and negative. Save your energy and time with

CALCULATION OF VARIATION INDICATORS

PRACTICAL WORK 3

Goal of the work: obtaining practical skills in calculating various indicators (measures) of variation depending on the objectives set by the study.

Work order:

1. Determine the type and form (simple or weighted) of variation indicators.

3. Formulate conclusions.

1. Determination of the type and form of variation indicators.

Variation indicators are divided into two groups: absolute and relative. The absolute ones are the range of variation, quartile deviation, average linear deviation, variance and standard deviation. The relative ones are the oscillation coefficient, the coefficient of variation, the relative linear deviation, the relative quartile variation index, etc.

Range of variation (R) is the simplest measure of variation of a characteristic and is determined by the following formula:

$$R = x_{\max} - x_{\min},$$

where $x_{\max}$ is the highest value of the varying characteristic and $x_{\min}$ is its smallest value.

Quartile deviation (Q) is used to characterize the variation of a characteristic in the population. It can be used instead of the range of variation to avoid the disadvantages associated with using extreme values:

$$Q = \frac{Q_3 - Q_1}{2},$$

where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution, respectively.

Quartiles are the values of the characteristic in the ranked distribution series chosen so that 25% of the population units are smaller than $Q_1$, 25% lie between $Q_1$ and $Q_2$, 25% lie between $Q_2$ and $Q_3$, and the remaining 25% exceed $Q_3$.

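Quartiles 1 and 3 are determined by the formulas:

$$Q_1 = x_{Q_1} + i\,\frac{\tfrac{1}{4}\sum f - S_{Q_1-1}}{f_{Q_1}},$$

where $x_{Q_1}$ is the lower limit of the interval in which the first quartile is located; $i$ is the width of that interval; $S_{Q_1-1}$ is the sum of the accumulated frequencies of the intervals preceding the interval in which the first quartile is located; $f_{Q_1}$ is the frequency of the interval in which the first quartile is located.

$$Q_2 = Me,$$

where Me is the median of the series;

$$Q_3 = x_{Q_3} + i\,\frac{\tfrac{3}{4}\sum f - S_{Q_3-1}}{f_{Q_3}}.$$

The symbols have the same meaning as for $Q_1$.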
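The interval-series calculation can be sketched in Python. This is an illustration that assumes the usual linear interpolation inside the quartile interval, with equal interval widths; the function name `interval_quartile`, its parameters and the sample frequencies are hypothetical.

```python
def interval_quartile(lower_bounds, width, freqs, k):
    """Estimate the k-th quartile (k = 1, 2 or 3) of an equal-width interval series."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    target = k * total / 4   # position of the quartile: k/4 of all units
    cumulative = 0           # accumulated frequency of the preceding intervals
    for lower, freq in zip(lower_bounds, freqs):
        if freq > 0 and cumulative + freq >= target:
            # interpolate linearly inside the interval that contains the quartile
            return lower + width * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("quartile interval not found")

# Hypothetical series: intervals 0-10, 10-20, 20-30, 30-40
lower_bounds = [0, 10, 20, 30]
freqs = [5, 10, 15, 10]
q1 = interval_quartile(lower_bounds, 10, freqs, 1)  # 15.0
q3 = interval_quartile(lower_bounds, 10, freqs, 3)  # 30.0
print(q1, q3, (q3 - q1) / 2)  # last value is the quartile deviation, 7.5
```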

In symmetric or moderately asymmetric distributions $Q \approx \tfrac{2}{3}\sigma$. Since the quartile deviation is not affected by the deviations of all values of the characteristic, its use should be limited to cases where determining the standard deviation is difficult or impossible.
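For ungrouped data, the range and the quartile deviation can be sketched as follows. The function names are hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which is only one of several quartile conventions, so results on small samples may differ from a textbook calculation. The last lines check the two-thirds relationship for the standard normal distribution.

```python
from statistics import NormalDist, quantiles

def variation_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

def quartile_deviation(values):
    """Half the distance between the third and first quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) / 2

data = [1, 2, 3, 4, 5, 6, 7]      # hypothetical data
print(variation_range(data))      # 6
print(quartile_deviation(data))   # 2.0

# For a standard normal distribution (sigma = 1) the quartile deviation
# equals the 75th percentile, which is close to 2/3.
normal_q = NormalDist().inv_cdf(0.75)
print(round(normal_q, 4))         # 0.6745
```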

Average linear deviation ($\bar{d}$) is the average of the absolute deviations of the individual values of the characteristic from their mean. It can be calculated using the arithmetic mean formula, either unweighted or weighted, depending on the absence or presence of frequencies in the distribution series:

$$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}$$ (unweighted average linear deviation),

$$\bar{d} = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i}$$ (weighted average linear deviation).
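Both forms of the average linear deviation fit in one short Python function. This is a sketch with hypothetical names; when no frequencies are passed, every value gets a weight of 1, which reduces the weighted form to the unweighted one.

```python
def average_linear_deviation(values, freqs=None):
    """Mean absolute deviation from the (weighted) arithmetic mean."""
    if freqs is None:
        freqs = [1] * len(values)   # unweighted case: each value counted once
    total = sum(freqs)
    avg = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(abs(x - avg) * f for x, f in zip(values, freqs)) / total

print(average_linear_deviation([2, 4, 6, 8]))           # 2.0
print(average_linear_deviation([1, 2, 3], [1, 2, 1]))   # 0.5
```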

Variance ($\sigma^2$) is the average square of the deviations of individual values of a characteristic from their mean. It is calculated using unweighted and weighted formulas:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$$ (unweighted),

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$ (weighted).

Standard deviation ($\sigma$), the most common indicator of variation, is the square root of the variance: $\sigma = \sqrt{\sigma^2}$.
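A minimal Python sketch of the variance and standard deviation, dividing by the number of units (population form) as in the definition above. The function names and data are hypothetical; the grouped call uses the same data as the ungrouped one, collapsed into values and frequencies.

```python
from math import sqrt

def variance(values, freqs=None):
    """Average squared deviation from the (weighted) mean."""
    if freqs is None:
        freqs = [1] * len(values)   # unweighted case
    total = sum(freqs)
    avg = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - avg) ** 2 * f for x, f in zip(values, freqs)) / total

def standard_deviation(values, freqs=None):
    """Square root of the variance."""
    return sqrt(variance(values, freqs))

ungrouped = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(ungrouped), standard_deviation(ungrouped))   # 4.0 2.0
# Same data as a frequency table
print(variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))           # 4.0
```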

The range of variation, the quartile deviation, the average linear deviation and the standard deviation are named quantities, expressed in the same units as the characteristic being averaged. The variance is expressed in squared units and therefore has no directly interpretable unit of measurement.

To compare the variability of different characteristics in the same population, or of the same characteristic in several populations, relative indicators of variation are calculated. The basis for comparison is the arithmetic mean. Relative indicators are most often expressed as percentages; they provide a comparative assessment of variation and also characterize the homogeneity of the population.

Oscillation coefficient (relative range of variation) is calculated using the formula:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%.$$

Linear coefficient of variation (relative linear deviation):

$$V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%.$$

Relative quartile variation index:

$$V_Q = \frac{Q}{Me} \cdot 100\% \quad \text{or} \quad V_Q = \frac{Q_3 - Q_1}{2\,Q_2} \cdot 100\%.$$

The coefficient of variation:

$$V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%.$$

The most commonly used indicator of relative variability in statistics is the coefficient of variation. It is used not only for a comparative assessment of variation, but also as a characteristic of the homogeneity of the population. The greater the coefficient of variation, the greater the spread of values around the average and the greater the heterogeneity of the population. There is a scale for determining the degree of homogeneity of a population depending on the values of the coefficient of variation (17; P.61).
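The three mean-based relative indicators (oscillation coefficient, linear coefficient of variation, coefficient of variation) can be computed together. The sketch below uses a hypothetical function name and data, and the population standard deviation (`statistics.pstdev`).

```python
from statistics import mean, pstdev

def relative_variation(values):
    """Relative indicators of variation, each as a percentage of the mean."""
    avg = mean(values)
    spread = max(values) - min(values)                          # range of variation
    linear_dev = sum(abs(x - avg) for x in values) / len(values)  # average linear deviation
    return {
        "oscillation": spread / avg * 100,
        "linear": linear_dev / avg * 100,
        "variation": pstdev(values) / avg * 100,
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
for name, value in relative_variation(data).items():
    print(f"{name}: {value:.1f}%")   # 140.0%, 30.0%, 40.0%
```

Here the coefficient of variation is 40%, above the 33% level commonly used as the boundary of homogeneity.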

To obtain an approximate idea of the shape of the distribution, distribution graphs (polygon and histogram) are constructed.

Statistical research encounters a wide variety of distributions. Homogeneous populations usually yield unimodal (single-peaked) distributions. A multimodal distribution indicates that the population being studied is heterogeneous; the appearance of two or more peaks means the data should be regrouped to identify more homogeneous groups.

Determining the general nature of a distribution involves assessing its degree of homogeneity and calculating indicators of asymmetry and kurtosis. A distribution is symmetrical if the frequencies of any two values equally spaced on either side of the distribution center are equal. For symmetric distributions, the arithmetic mean, mode and median are equal. The simplest indicator of asymmetry is therefore based on the relationship between these indicators of the distribution center: the greater the difference between them, the greater the asymmetry of the series.

To characterize the asymmetry in the central part of the distribution, that is, of the bulk of the units, or to compare the degree of asymmetry of several distributions, the relative asymmetry index of K. Pearson is calculated:

$$As = \frac{\bar{x} - Mo}{\sigma}.$$

The value of As can be positive or negative. A positive value indicates right-sided asymmetry (the right branch relative to the maximum ordinate is more elongated than the left). In the case of right-sided asymmetry, the indicators of the distribution center are related as $Mo < Me < \bar{x}$. A negative sign indicates left-sided asymmetry (Fig. 1). In this case the relationship is $Mo > Me > \bar{x}$.



Fig. 1. Distribution:

1 – with left-sided asymmetry; 2 – with right-sided asymmetry.
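Pearson's index can be sketched in Python, assuming the form (mean − mode) / standard deviation. The function name and data are hypothetical; `statistics.mode` returns the first mode it encounters if the data have several, so the sketch is meant for unimodal data.

```python
from statistics import mean, mode, pstdev

def pearson_asymmetry(values):
    """Pearson's relative asymmetry index: (mean - mode) / standard deviation."""
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are equal; asymmetry is undefined")
    return (mean(values) - mode(values)) / sigma

right_skewed = [1, 2, 2, 2, 3, 4, 8]   # long right tail, mode 2
left_skewed = [8, 7, 7, 7, 6, 5, 1]    # mirror image, mode 7
print(round(pearson_asymmetry(right_skewed), 3))   # positive
print(round(pearson_asymmetry(left_skewed), 3))    # negative
```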

Another indicator, proposed by the Swedish mathematician Lindbergh, is calculated using the formula:

$$As = P - 50,$$

where P is the percentage of those values of the characteristic that exceed the arithmetic mean.

The most accurate and widespread indicator is based on the third-order central moment (in a symmetric distribution its value is zero):

$$As = \frac{\mu_3}{\sigma^3},$$

where $\mu_3$ is the third-order central moment:

$$\mu_3 = \frac{\sum (x_i - \bar{x})^3 f_i}{\sum f_i};$$

$\sigma$ is the standard deviation.
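The moment-based asymmetry indicator can be sketched as follows, assuming the form μ₃ / σ³ with central moments computed as weighted averages. The helper names and data are hypothetical; omitting the frequencies treats the data as ungrouped.

```python
def central_moment(values, order, freqs=None):
    """Central moment of the given order, optionally weighted by frequencies."""
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    avg = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - avg) ** order * f for x, f in zip(values, freqs)) / total

def moment_asymmetry(values, freqs=None):
    """Third central moment divided by the cube of the standard deviation."""
    sigma = central_moment(values, 2, freqs) ** 0.5
    return central_moment(values, 3, freqs) / sigma ** 3

print(moment_asymmetry([1, 2, 3, 4, 5]))                       # 0.0 for symmetric data
print(moment_asymmetry([1, 2, 2, 2, 3, 4, 8]) > 0)             # True: right-sided asymmetry
print(moment_asymmetry([1, 2, 3, 4, 8], [1, 3, 1, 1, 1]) > 0)  # same data, grouped
```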

The use of this indicator makes it possible not only to determine the magnitude of asymmetry, but also to answer the question about the presence or absence of asymmetry in the distribution of a characteristic in the general population. An assessment of the degree of significance of this indicator is given using the mean square error, which depends on the volume of observations n and is calculated by the formula:

.

If the ratio $|As| / \sigma_{As} > 3$, where $\sigma_{As}$ is the mean square error of the asymmetry indicator, the asymmetry is significant and the distribution of the characteristic in the population is not symmetrical. If $|As| / \sigma_{As} < 3$, the asymmetry is insignificant and its presence can be explained by random circumstances.

For symmetric distributions, the kurtosis (peakedness) indicator is calculated. Lindbergh proposed the following indicator for assessing kurtosis:

$$Ex = P - 38.29,$$

where P is the proportion (%) of the values lying within half a standard deviation on either side of the arithmetic mean.
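The constant in Lindbergh's kurtosis indicator can be checked numerically: in a normal distribution about 38.29% of values lie within half a standard deviation of the mean. The sketch below assumes the indicator has the form P − 38.29; the function name and data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

# Share of a normal distribution within half a standard deviation of the mean, in percent
NORMAL_SHARE = (NormalDist().cdf(0.5) - NormalDist().cdf(-0.5)) * 100
print(round(NORMAL_SHARE, 2))   # 38.29

def lindbergh_kurtosis(values):
    """Percentage of values within half a sigma of the mean, minus 38.29."""
    avg = mean(values)
    half_sigma = pstdev(values) / 2
    inside = sum(1 for x in values if abs(x - avg) <= half_sigma)
    return inside / len(values) * 100 - 38.29

peaked = [0, 5, 5, 5, 5, 5, 10]        # hypothetical data clustered at the center
print(lindbergh_kurtosis(peaked) > 0)  # True: more central values than in a normal curve
```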

The most accurate indicator uses the fourth-order central moment:

$$Ex = \frac{\mu_4}{\sigma^4} - 3,$$

where $\mu_4$ is the fourth-order central moment:

$$\mu_4 = \frac{\sum (x_i - \bar{x})^4}{n}$$ for ungrouped data;

$$\mu_4 = \frac{\sum (x_i - \bar{x})^4 f_i}{\sum f_i}$$ for grouped data.

Figure 2 shows two distributions: one is peaked (the kurtosis is positive), the other is flat-topped (the kurtosis is negative). Kurtosis measures how far the top of the empirical distribution rises above or falls below the top of the normal distribution curve. In a normal distribution the ratio $\mu_4 / \sigma^4 = 3$.



Fig. 2. Distribution:

1, 4 – normal; 2 – peaked; 3 – flat-topped
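A moment-based kurtosis calculation can be sketched in Python, assuming the form μ₄ / σ⁴ − 3 so that a normal distribution gives zero. The function name and data are hypothetical; frequencies are optional and default to 1 for ungrouped data.

```python
def excess_kurtosis(values, freqs=None):
    """Fourth central moment over sigma^4, minus 3 (zero for a normal distribution)."""
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    avg = sum(x * f for x, f in zip(values, freqs)) / total
    mu2 = sum((x - avg) ** 2 * f for x, f in zip(values, freqs)) / total  # variance
    mu4 = sum((x - avg) ** 4 * f for x, f in zip(values, freqs)) / total
    return mu4 / mu2 ** 2 - 3

flat = [1, 2, 3, 4, 5]              # evenly spread values: flat-topped
peaked = [0, 5, 5, 5, 5, 5, 10]     # values clustered at the center: peaked
print(round(excess_kurtosis(flat), 2))     # -1.3
print(round(excess_kurtosis(peaked), 2))   # 0.5
```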

The mean square error of kurtosis is calculated using the formula:

,

where n is the number of observations.

If $|Ex| / \sigma_{Ex} > 3$, where $\sigma_{Ex}$ is the mean square error of kurtosis, then the kurtosis is significant; if $|Ex| / \sigma_{Ex} < 3$, it is not significant.

Assessing the significance of the asymmetry and kurtosis indicators allows us to conclude whether the empirical distribution under study can be classified as normal.

2. Let us consider the methodology for calculating variation indices.

Variation is measured using relative values called coefficients of variation, defined as the ratio of a measure of deviation to the average value. The coefficient of variation is used not only for a comparative assessment of the variation of population units, but also as a characteristic of the homogeneity of the population. Its values typically lie between 0 and 100%, although they can exceed 100% when the spread is large relative to the mean. The closer the coefficient is to zero, the more typical the average is of the population being studied, and the better the statistical data have been selected. The population is considered quantitatively homogeneous if the coefficient of variation does not exceed 33% (for distributions close to normal). The following relative indicators of variation are distinguished:

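The coefficient of variation:

$$V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%, \qquad (5.9)$$

where $\sigma$ is the standard deviation and $\bar{x}$ is the arithmetic mean.

Linear coefficient of variation:

$$V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%, \qquad (5.10)$$

where $\bar{d}$ is the average linear deviation.

Oscillation coefficient:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%, \qquad (5.11)$$

where $R$ is the range of variation.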

Let's calculate the coefficients of variation for a group of organizations by road freight turnover (Table 5.1) using formulas (5.9), (5.10) and (5.11).

The coefficient of variation exceeds 33%; therefore, the population is heterogeneous.

The linear coefficient of variation is 30.7%. Consequently, the average absolute deviation of the organizations' values amounts to 30.7% of the average value.

The oscillation coefficient shows that the difference between the maximum and minimum values across the organizations is about 1.078 times the average value.

Let us determine the coefficients of variation for the grouping of residential floor area (on average per inhabitant) (Table 5.3).

Calculating the coefficient of variation using formula (5.9) gives a value that does not exceed 33%; therefore, the population is homogeneous.

Calculating the linear coefficient of variation using formula (5.10) gives 5.56%. This means that the average absolute deviation of the residential floor areas amounts to 5.56% of the average value.

The oscillation coefficient found using formula (5.11) shows that the difference between the maximum and minimum residential floor areas does not exceed the average value.

In statistics, variation in the values of an indicator within a population means the differences in its level between individual units of that population at the same period or moment of the study. When differences in the values of an indicator are analyzed for the same object, that is, for the same unit of the population at different periods or points in time, they are called not variation but fluctuations or changes over time.

Such fluctuations are studied with their own methods of analysis, which differ from the methods of variation analysis. The objective cause of variation is the difference in the operating conditions of the individual objects in the population. For example, the performance of a trading enterprise is influenced by the level of competition, taxes, the use of advanced technologies, the condition of equipment, and so on.

Variability is characteristic of almost all natural phenomena and aspects of social life. However, there are also non-varying indicators, which arise when certain things are fixed by legal acts. For example, the number of general directors of an enterprise cannot vary; by law there must be one. Such non-varying characteristics are, as a rule, not a subject of statistical research.

The variability of characteristics has practical consequences. For example, varying the range of standard part sizes makes it possible to create an optimal assortment, whereas a high level of variation within one standard size indicates a high defect rate and the need for corrective measures. Significant variation in turnover or prices may indicate market monopolization or poor inventory management and likewise calls for action.

Thus social life, which from the point of view of statistics is a mass aggregate, objectively exhibits variability in its characteristics, and this makes it important to study the phenomenon with special indicators in order to manage it effectively. The coefficient of variation is one such indicator, and it belongs to the group of relative indicators of variation.

The coefficient is a relative indicator: the ratio of the standard deviation to the average value of the characteristic being studied, usually expressed as a percentage. It reflects the relationship between the influence of the factors that cause variability and the general conditions, common to all elements of the population, that give rise to the typical value of the characteristic, its average. The coefficient of variation is used to study the variability of different characteristics of the same population and the variability in different populations that have different average values.

Many people are faced with the variability of the characteristic being studied in individual units of the population, its fluctuation relative to a certain value, that is, its variation. This is something that should be taken into account in order to obtain the most reliable information about the progress of a particular scientific research.

When determining the interval over which a parameter varies, most researchers resort not to absolute but to relative indicators. Among the latter, the coefficient of variation is the most widely used; if the value under study is normally distributed, it serves as a criterion for the homogeneity of the population. This indicator shows the degree of scatter of the parameter values regardless of their scale and unit of measurement.

The coefficient of variation is calculated by dividing the standard deviation by the arithmetic mean of the variable and expressing the result as a percentage. The result can range from zero to infinity, increasing as the variation of the characteristic increases. If the value is less than 33.3%, the variation is weak; if it is greater, the variation is strong. In the latter case, the data set is heterogeneous and its mean is considered atypical, so it cannot serve as a generalizing indicator. For such a population it is worth using other indicators.

The coefficient of variation not only characterizes the homogeneity of a population but also serves for comparative assessment. For example, it is used when the variability of a characteristic has to be compared across populations whose average values differ. In that case the absolute scatter of the data does not allow an objective comparison. Because the coefficient of variation characterizes the relative variability of a variable, it can serve as a relative measure of fluctuations in the parameter being studied.

However, there are some limitations. The degree of variability can be assessed only for a specific characteristic and for a population of a specific composition. Moreover, equal coefficients may correspond to strong variation in one case and weak variation in another, if the characteristics differ or the studies are conducted on different populations. Such a result arises from objective causes, and this should be taken into account when processing experimental data.

The coefficient of variation is widely used in various fields of science and technology. In particular, it is actively used when assessing fluctuations in parameters in economics and sociology. At the same time, the coefficient cannot be used to assess the variability of variables that can change sign. In that case the mean may be close to zero or negative, and the calculation yields a misleading value: either it is inflated because the mean is close to zero, or it has a negative sign. In the latter case, it is worth checking the correctness of the calculations performed.
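The limitation for sign-changing variables is easy to see numerically. In this sketch (hypothetical function name and data) the three series have a similar absolute spread, but the two that cross zero have means near zero, so the ratio becomes very large in magnitude or negative. A mean of exactly zero would raise a division error.

```python
from statistics import mean, pstdev

def cv_percent(values):
    """Population standard deviation as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100

prices = [98, 100, 101, 102, 99]   # all positive, mean 100
changes = [-2, -1, 0, 1, 3]        # changes sign, mean 0.2
losses = [-3, -1, 0, 1, 2]         # changes sign, mean -0.2

print(f"{cv_percent(prices):.1f}%")    # about 1.4%
print(f"{cv_percent(changes):.1f}%")   # several hundred percent
print(f"{cv_percent(losses):.1f}%")    # negative
```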

Thus, the coefficient of variation is a parameter that allows you to evaluate the degree of dispersion and the relative variability of values around the average. Using this indicator helps to identify the most significant factors; focusing on them makes it possible to achieve the goals set and solve the problems at hand.