
What is the coefficient of variation for? Determination of Variation Indices

Variation is the fact that units of a population or group take different values of a characteristic. It results from the combined influence of many factors on each unit. Synonyms for variation are variability and fluctuation.

Variation is one of the most important categories of statistics. Phenomena that vary fall within the scope of statistics; invariant, constant phenomena are not studied by it.

Almost all phenomena of natural origin are subject to variability (for example, chemical processes, or the variability of hereditary characteristics from person to person). Some phenomena, particularly those fixed by law, do not vary (for example, the minimum wage).

It is necessary to emphasize the importance of the study of variation in statistical science:

1. Identifying the variability of a phenomenon makes it possible to assess how strongly it depends on other factors, which themselves vary. In other words, it shows how stable the phenomenon is under external influences.

2. Variation makes it possible to assess the homogeneity of the phenomenon studied, that is, how typical the average calculated for it is.

A variation series is the sequence of distinct values (variants) of a characteristic, written in ascending order together with their frequencies.

Depending on the type of characteristic, variation series are either discrete or interval. Likewise, depending on the amount of source data and the range of admissible values of a one-dimensional quantitative characteristic, frequency distributions are divided into discrete and interval ones. If there are many distinct values (more than 10–15), they are grouped: a certain number of grouping intervals is chosen, which yields an interval frequency distribution.

The first step in constructing an interval variation series is choosing the principle on which the intervals are built. This choice depends on how homogeneous the population is: for a homogeneous population, equal intervals are used. Whether the population is homogeneous is decided by a substantive analysis of the phenomena being studied.
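The grouping step can be sketched in Python. This is a minimal illustration, not a prescribed procedure. It assumes equal-width intervals (the homogeneous case described above) and leaves the number of intervals `k` to the analyst. The function name `equal_interval_distribution` and the data are hypothetical.

```python
def equal_interval_distribution(values, k):
    """Split the observed range into k equal-width intervals and count values in each.

    Intervals are half-open [lower, upper), except the last one, which also
    includes the maximum so that every observation is counted.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; there is nothing to group")
    width = (hi - lo) / k
    counts = [0] * k
    for x in values:
        # Index of the interval containing x; the maximum goes to the last interval.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]


# Hypothetical data: 10 observations grouped into 3 equal intervals.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
distribution = equal_interval_distribution(data, 3)
for lower, upper, freq in distribution:
    print(f"{lower:.1f} - {upper:.1f}: {freq}")
```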

In statistical analysis, the variability of a phenomenon is described by a set of characteristics called the system of variation indicators. It includes:

absolute indicators of variation:

1) range of variation;

2) averages (group and overall):

- power means;

- structural averages;


3) average linear deviation;

4) variances (group, intergroup and total) and standard deviation;

relative indicators of variation:

1) oscillation coefficient;

2) coefficients of variation (including linear);

3) coefficients of determination (empirical and theoretical).

The range of variation reflects the limits within which a characteristic varies, in other words, the amplitude of variation. It is calculated as the difference between the maximum and the minimum value of the characteristic:

$$R = x_{\max} - x_{\min},$$

where $x_{\max}$ is the greatest value of the characteristic;

$x_{\min}$ is the smallest value of the characteristic.

Variance (dispersion) is the average of the squared deviations of the individual values of a characteristic from their mean:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}.$$

For a variation series (values $x$ with frequencies $f$), the variance is calculated with the weighted formula (see Table 2):

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}.$$

It is often convenient to express the spread in the same units as the values themselves. In that case the standard deviation is used instead of the variance. It is the square root of the variance (see Table 2):

$$\sigma = \sqrt{\sigma^2}.$$

The measures of spread discussed above (range of variation, variance, standard deviation) are absolute values, and they do not always show how variable a characteristic is. Some tasks require relative measures of spread. The main one is the coefficient of variation (V), the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$V = \frac{\sigma}{\bar{x}} \cdot 100\%.$$

The coefficient of variation allows:

Compare the variation of the same trait in different groups of objects;

Identify the degree of difference in the same characteristic of the same group of objects at different times;

Compare the variation of different characteristics in the same groups of objects.

If the coefficient of variation does not exceed 33%, the population under study is considered homogeneous.
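The indicators above can be computed for ungrouped data with a short Python sketch. It assumes the population form of the variance (division by n), treats the 33% threshold as a rule of thumb, and uses hypothetical data. `variation_indicators` is an illustrative name, not a standard function.

```python
def variation_indicators(values):
    """Absolute and relative variation indicators for ungrouped data.

    Uses the population form of the variance (division by n), matching the
    definition of variance as the average squared deviation.
    """
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)              # R = x_max - x_min
    variance = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    sd = variance ** 0.5                                 # standard deviation
    cv = sd / mean * 100                                 # coefficient of variation, %
    return {
        "mean": mean,
        "range": value_range,
        "variance": variance,
        "sd": sd,
        "cv_percent": cv,
        "homogeneous": cv <= 33,                         # 33% rule of thumb from the text
    }


result = variation_indicators([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data
print(result)
```

Here the coefficient of variation is 40%, so by the 33% rule this small set would be called heterogeneous.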

Let's look at an example of the method for calculating the standard deviation and variance of a characteristic.

EXAMPLE 5. As a result of a random check of tea packaging, the following data was obtained:

Weight of a pack of tea, g | Number of packs of tea, pcs.

52 and above | 3

Calculate the average mass of a pack of tea, the standard deviation, and the variance of the characteristic.

For calculations we use the formulas from Table 2.

It is convenient to arrange all calculations as a table. To find the midpoint of the interval in each group, i.e. its average value, we move from the interval series to a discrete one. The interval width is 1 (for example, 50 − 49 = 1), so the midpoint of the first group is (48 + 49) / 2 = 48.5. For the second and third groups it is 49.5 and 50.5 respectively, and so on.

Mass, g | Number of packs, f | Midpoint, x | x·f | x − x̄ | (x − x̄)² | (x − x̄)²·f
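Only the last row of the Example 5 table (52 g and above, 3 packs) survives in this text. The sketch below therefore uses hypothetical frequencies for the other groups. It also assumes the open-ended last interval is closed at 53 g, so that every interval has width 1. It follows the table layout above: midpoints, x·f, deviations and weighted squared deviations.

```python
# Hypothetical reconstruction of Example 5: interval bounds in grams and numbers of packs.
# Only the last frequency (3 packs) comes from the text; the others are invented.
intervals = [(48, 49), (49, 50), (50, 51), (51, 52), (52, 53)]
freqs = [2, 5, 8, 7, 3]

# Move from the interval series to a discrete one: use the interval midpoints.
mids = [(lower + upper) / 2 for lower, upper in intervals]
n = sum(freqs)

mean = sum(x * f for x, f in zip(mids, freqs)) / n                    # weighted mean
variance = sum((x - mean) ** 2 * f for x, f in zip(mids, freqs)) / n  # weighted variance
sd = variance ** 0.5                                                  # standard deviation

print("Mass  f  Middle  X*f  X-mean  (X-mean)^2  (X-mean)^2*f")
for (lower, upper), f, x in zip(intervals, freqs, mids):
    d = x - mean
    print(f"{lower}-{upper}  {f}  {x}  {x * f}  {d:.2f}  {d * d:.4f}  {d * d * f:.4f}")
print(f"mean = {mean:.2f} g, variance = {variance:.4f}, sd = {sd:.2f} g")
```

With these invented frequencies the mean mass is 50.66 g, the variance 1.2544 and the standard deviation 1.12 g. The real figures depend on the missing rows of the table.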

The same document provides rules for determining the coefficient of variation. Several methods have been developed for determining the NMCC (the initial maximum contract price): the normative, tariff, design-and-estimate and cost methods, and the method of comparable market prices. The method of comparable market prices has the highest priority and is recommended for determining the starting price. It involves comparing commercial offers submitted by potential suppliers at the customer's request, and the coefficient of variation is used to carry out this analysis. The coefficient of variation, expressed as a percentage, measures the relative spread of the offered prices: it shows the average price spread (the standard deviation) as a share of the mean price. Its values are interpreted as follows:

  1. Less than 10%: the price spread is considered insignificant.
  2. From 10% to 20%: the spread is considered average.
  3. From 20% to 33%: the spread is considered significant.
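The scale above can be written as a small lookup. This is a sketch with two assumptions. Values exactly at 10% and 20% are placed in the higher band. The label for values above 33% is taken from the homogeneity rule used elsewhere in this article. `price_spread` is a hypothetical name.

```python
def price_spread(cv_percent):
    """Interpret a coefficient of variation (in %) of commercial price offers.

    Band edges follow the scale in the text; assigning values exactly at
    10% and 20% to the higher band is an assumption.
    """
    if cv_percent < 10:
        return "insignificant spread"
    if cv_percent < 20:
        return "average spread"
    if cv_percent <= 33:
        return "significant spread"
    return "above 33%: offers are not homogeneous"


for v in (5.98, 15.0, 25.0, 40.0):
    print(v, price_spread(v))
```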

The coefficient of variation

To check whether the measured values follow the normal distribution law, the ratio of the asymmetry (skewness) index to its error and the ratio of the kurtosis index to its error are used.

Asymmetry index. The asymmetry index (A) and its error ($m_A$) are calculated as

$$A = \frac{\sum_{i=1}^{n} (a_i - a)^3}{n\sigma^3}, \qquad m_A = \sqrt{\frac{6(n-1)}{(n+1)(n+3)}},$$

where σ is the standard deviation, a is the arithmetic mean, n is the number of measurements of the parameter, and $a_i$ is the value measured at the i-th step.

Kurtosis index. The kurtosis index (E) and its error ($m_E$) are calculated as

$$E = \frac{\sum_{i=1}^{n} (a_i - a)^4}{n\sigma^4} - 3, \qquad m_E = \sqrt{\frac{24n(n-2)(n-3)}{(n-1)^2(n+3)(n+5)}},$$

with the same notation. If A < 0, the distribution is skewed to the left: its tail toward values smaller than the arithmetic mean is longer.
If E < 0, the distribution is flatter than the normal one: the data are less concentrated around the arithmetic mean.
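A minimal Python sketch of these indicators for a list of measurements follows. It assumes σ is computed with division by n, so that A and E are moment ratios. For the errors it uses m_A = √(6(n − 1)/((n + 1)(n + 3))) and m_E = √(24n(n − 2)(n − 3)/((n − 1)²(n + 3)(n + 5))). These are common textbook formulas, but the source does not confirm them. The function name and the measurements are hypothetical.

```python
import math


def shape_indicators(a):
    """Skewness A, kurtosis E and their standard errors for ungrouped measurements.

    sigma is taken as the population standard deviation (division by n);
    the error formulas are textbook expressions for a normal population.
    """
    n = len(a)
    mean = sum(a) / n
    m2 = sum((x - mean) ** 2 for x in a) / n   # second central moment = sigma^2
    m3 = sum((x - mean) ** 3 for x in a) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in a) / n   # fourth central moment
    skew = m3 / m2 ** 1.5                      # A = m3 / sigma^3
    kurt = m4 / m2 ** 2 - 3                    # E = m4 / sigma^4 - 3
    m_skew = math.sqrt(6 * (n - 1) / ((n + 1) * (n + 3)))
    m_kurt = math.sqrt(24 * n * (n - 2) * (n - 3) / ((n - 1) ** 2 * (n + 3) * (n + 5)))
    return skew, m_skew, kurt, m_kurt


A, mA, E, mE = shape_indicators([1, 2, 3, 4, 5])   # hypothetical measurements
print(f"A = {A:.3f} (error {mA:.3f}), E = {E:.3f} (error {mE:.3f})")
print("ratios:", abs(A) / mA, abs(E) / mE)
```

For these evenly spaced values A is 0 (the set is symmetric) and E = −1.3, i.e. flatter than normal.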


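The variance of a sample is computed as $D = \frac{1}{n}\sum (X - \bar{X})^2$, where X are the individual values and X̅ is the arithmetic mean of the sample. Note: Excel has dedicated functions for calculating variance (VAR.P and VAR.S).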


This way of calculating the variance has a drawback: it is biased, i.e. its expected value is not equal to the true variance (for a sample, dividing by n − 1 instead of n removes the bias). Still, this is not a serious problem.
As the sample size increases, it still approaches its theoretical analogue, i.e. is asymptotically unbiased. Therefore, when working with large sample sizes, you can use the formula above.
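Python's standard `statistics` module shows the difference between the biased and the bias-corrected variance: `pvariance` divides by n and `variance` divides by n − 1. The two helper functions and the sample are illustrative.

```python
import statistics


def biased_variance(xs):
    """Average squared deviation: divides by n (biased for samples)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def unbiased_variance(xs):
    """Sample variance with Bessel's correction: divides by n - 1."""
    n = len(xs)
    return biased_variance(xs) * n / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(biased_variance(data), statistics.pvariance(data))   # both divide by n
print(unbiased_variance(data), statistics.variance(data))  # both divide by n - 1
```

For n = 8 the two estimates differ by the factor 8/7. As n grows, the factor n/(n − 1) approaches 1, which is the asymptotic unbiasedness mentioned above.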
It helps to translate the formula from symbols into words: the variance is the average squared deviation. The mean is calculated first. Then the difference between each original value and the mean is taken and squared, the squares are summed, and the sum is divided by the number of values in the population.

What characterizes the coefficient of variation

To determine the variance of the normal law of error distribution in this case, the following formula is used:

$$\sigma^2 = \frac{\sum_{i=1}^{n} (a_i - a)^2}{n - 1},$$

where σ² is the variance, a is the arithmetic mean, n is the number of measurements of the parameter, and $a_i$ is the value measured at the i-th step.

Standard deviation. The standard deviation shows the absolute deviation of the measured values from the arithmetic mean.
By the formula for the accuracy of a linear combination, the standard error of the arithmetic mean is

$$m_a = \frac{\sigma}{\sqrt{n}},$$

where σ is the standard deviation and n is the number of measurements of the parameter.

Coefficient of variation. The coefficient of variation is a relative measure of the deviation of the measured values from the arithmetic mean:

$$V = \frac{\sigma}{a} \cdot 100\%,$$

where V is the coefficient of variation, σ is the standard deviation, and a is the arithmetic mean.
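For a series of repeated measurements, these three quantities can be computed as follows. The sketch assumes division by n − 1 in the variance, the usual choice for a limited number of measurements. The readings are hypothetical.

```python
import math

# Hypothetical repeated measurements of one parameter.
a_i = [10.1, 9.9, 10.0, 10.2, 9.8]
n = len(a_i)

a = sum(a_i) / n                                             # arithmetic mean
sigma = math.sqrt(sum((x - a) ** 2 for x in a_i) / (n - 1))  # assumes n - 1 in the variance
m_a = sigma / math.sqrt(n)                                   # standard error of the mean
V = sigma / a * 100                                          # coefficient of variation, %

print(f"a = {a:.3f}, sigma = {sigma:.4f}, m_a = {m_a:.4f}, V = {V:.2f}%")
```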

Variation (statistics)

To complete the description, we need to know how far each student's height is from the average. The first step is to calculate the variance. In statistics, the variance (denoted σ², sigma squared) is the sum of the squared differences between each member of the series (X) and the arithmetic mean (μ), divided by the number of members of the population (N).

As a formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}.$$

The values obtained from this formula are in squared units (in our case, square centimeters), and describing height in square centimeters is, you will agree, absurd. So we take the square root and obtain the standard deviation:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}.$$

Thus the standard deviation is the square root of the variance.

Coefficient of variation in statistics: calculation examples

The difference between an individual value and the mean measures its deviation. It is squared so that all deviations become positive, and so that positive and negative deviations do not cancel each other out when summed. Then the arithmetic mean of the squared deviations is calculated. Hence the name, average squared deviation: the deviations are squared and their average is taken.


The whole definition fits into three words: average squared deviation. However, unlike the arithmetic mean or an index, the variance is rarely used on its own. It is rather an auxiliary, intermediate indicator needed for other kinds of statistical analysis.


It does not even have a convenient unit of measurement. Judging by the formula, it is measured in the square of the unit of the original data, which makes it hard to interpret directly.

Statistical parameters

Four commercial price offers were received: 2500, 2800, 2450 and 2600 rubles. First, calculate the arithmetic mean of the price: (2500 + 2800 + 2450 + 2600) / 4 = 2587.5 rubles. Next, calculate the standard deviation, and finally the coefficient of variation. The resulting coefficient is less than 33%, so all the collected data are suitable for calculating the initial (maximum) contract price (NMCC). The calculation of the NMCC and of the coefficient of variation is documented in a report, which becomes a mandatory part of the procurement documentation. The coefficient of variation is thus an important tool for assessing how consistent the price quotes received from suppliers are. When preparing documentation, customers need to take into account the rules for calculating this indicator and the features of its application.
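The price-offer example can be reproduced in Python. The source does not say whether the standard deviation divides by n or by n − 1. This sketch assumes the sample form (`statistics.stdev`, division by n − 1).

```python
import statistics

offers = [2500, 2800, 2450, 2600]  # price offers in rubles, from the example

mean_price = statistics.mean(offers)
sd_price = statistics.stdev(offers)       # sample standard deviation (n - 1): an assumption
cv = sd_price / mean_price * 100          # coefficient of variation, %

print(f"mean = {mean_price} rub., sd = {sd_price:.2f} rub., V = {cv:.2f}%")
print("suitable for NMCC" if cv <= 33 else "not homogeneous")
```

With `statistics.pstdev` (division by n) the coefficient comes out at about 5.2% instead of about 6.0%. Either way it is far below 33%.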

What is the coefficient of variation for?

How can one prove that a pattern found in experimental data is reliable rather than the result of coincidence or experimenter error? This is a question every new researcher faces. Descriptive statistics provides tools for solving such problems. It has two large parts: describing the data, and comparing them in groups or in series with each other.

One of the main statistical indicators of a sequence of numbers is the coefficient of variation. Finding it takes a fair amount of calculation, and Microsoft Excel tools make this much easier.

This indicator is the ratio of the standard deviation to the arithmetic mean. The result obtained is expressed as a percentage.

Excel has no separate function for this indicator. It does, however, have functions for the standard deviation and the arithmetic mean of a series of numbers, and these are used to find the coefficient of variation.

Step 1: Calculate Standard Deviation

The standard deviation, also called the root-mean-square deviation, is the square root of the variance. It is calculated with the STDEV function. Starting with Excel 2010, this function is split into two, depending on whether the calculation is based on the whole population or on a sample: STDEV.P and STDEV.S.

The syntax for these functions looks like this:

= STDEV(Number1, Number2, …)
= STDEV.P(Number1, Number2, …)
= STDEV.S(Number1, Number2, …)


Step 2: Calculate the arithmetic mean

The arithmetic mean is the sum of all values in a number series divided by their count. Excel has a separate function for it, AVERAGE: = AVERAGE(Number1, Number2, …).


Step 3: Finding the Coefficient of Variation

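Now we have all the data needed to calculate the coefficient of variation itself. Divide the cell with the standard deviation by the cell with the arithmetic mean, and apply the percentage format to the result.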


Thus we calculated the coefficient of variation by referring to the cells in which the standard deviation and the arithmetic mean had already been calculated. It can also be done without calculating these values separately, with a single formula of the form =STDEV.S(range)/AVERAGE(range).


The distinction is conventional. If the coefficient of variation is less than 33%, the set of numbers is considered homogeneous; otherwise it is usually described as heterogeneous.

As you can see, Excel greatly simplifies a laborious statistical calculation such as finding the coefficient of variation. The application does not yet have a function that computes this indicator in a single step, but the STDEV and AVERAGE functions make the task much easier. Even someone without a deep knowledge of statistics can therefore do it in Excel.

CALCULATION OF VARIATION INDICATORS

PRACTICAL WORK 3

Goal of the work: obtaining practical skills in calculating various indicators (measures) of variation depending on the objectives set by the study.

Work order:

1. Determine the type and form (simple or weighted) of the variation indicators.

2. Calculate the variation indicators.

3. Formulate conclusions.

1. Determination of the type and form of variation indicators.

Variation indicators are divided into two groups: absolute and relative. The absolute ones include: range of variation, quartile deviation, average linear deviation, dispersion and standard deviation. Relative indicators are coefficients of oscillation, variation, relative linear deviation, relative quartile variation, etc.

The range of variation (R) is the simplest measure of variation of a characteristic and is determined by the formula

$$R = x_{\max} - x_{\min},$$

where $x_{\max}$ is the highest value of the varying characteristic;

$x_{\min}$ is the smallest value of the varying characteristic.

The quartile deviation (Q) is used to characterize the variation of a characteristic in a population. It can be used instead of the range of variation to avoid the drawbacks of relying on extreme values:

$$Q = \frac{Q_3 - Q_1}{2},$$

where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution, respectively.

Quartiles are the values of the characteristic in a ranked distribution series chosen so that 25% of the population units are smaller than $Q_1$, 25% lie between $Q_1$ and $Q_2$, 25% lie between $Q_2$ and $Q_3$, and the remaining 25% exceed $Q_3$.

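The first and third quartiles are determined by the formulas:

$$Q_1 = x_{Q_1} + i \cdot \frac{\frac{1}{4}\sum f - S_{Q_1 - 1}}{f_{Q_1}},$$

where $x_{Q_1}$ is the lower limit of the interval containing the first quartile;

$i$ is the width of that interval;

$S_{Q_1 - 1}$ is the accumulated frequency of the intervals preceding the interval containing the first quartile;

$f_{Q_1}$ is the frequency of the interval containing the first quartile.

The second quartile coincides with the median, $Q_2 = Me$, where Me is the median of the series;

$$Q_3 = x_{Q_3} + i \cdot \frac{\frac{3}{4}\sum f - S_{Q_3 - 1}}{f_{Q_3}},$$

where the symbols have the same meaning as for $Q_1$.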
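The interpolation formulas for Q_1, the median and Q_3 share one pattern, so a single helper can compute all three for an equal-width interval series. The function name `grouped_quantile` and the frequencies are hypothetical.

```python
def grouped_quantile(lowers, width, freqs, p):
    """Interpolated quantile of an equal-width interval series.

    p = 0.25 gives Q1, 0.5 the median, 0.75 Q3, following
    x_lower + i * (p * sum(f) - S_prev) / f.
    """
    target = p * sum(freqs)
    cum = 0                       # accumulated frequency of the preceding intervals
    for lower, f in zip(lowers, freqs):
        if cum + f >= target:     # the quantile falls in this interval
            return lower + width * (target - cum) / f
        cum += f
    raise ValueError("p must not exceed 1")


# Hypothetical series: lower interval bounds 48..52, width 1, frequencies below.
lowers = [48, 49, 50, 51, 52]
freqs = [2, 5, 8, 7, 3]

q1 = grouped_quantile(lowers, 1, freqs, 0.25)
me = grouped_quantile(lowers, 1, freqs, 0.50)
q3 = grouped_quantile(lowers, 1, freqs, 0.75)
Q = (q3 - q1) / 2                 # quartile deviation
print(f"Q1 = {q1:.3f}, Me = {me:.3f}, Q3 = {q3:.3f}, Q = {Q:.3f}")
```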

In symmetric or moderately asymmetric distributions, Q ≈ (2/3)σ. Since the quartile deviation is not affected by all values of the characteristic, its use should be limited to cases where the standard deviation is difficult or impossible to determine.

The average linear deviation ($\bar{d}$) is the average of the absolute deviations of the values of the characteristic from their mean. It is calculated with the arithmetic mean formula, unweighted or weighted depending on whether the distribution series has frequencies:

$$\bar{d} = \frac{\sum |x - \bar{x}|}{n}$$ – unweighted average linear deviation,

$$\bar{d} = \frac{\sum |x - \bar{x}| f}{\sum f}$$ – weighted average linear deviation.
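The unweighted and weighted forms give the same result when the weights are simply the repeat counts of the values. This sketch with hypothetical data shows it.

```python
def linear_deviation(values):
    """Unweighted average linear deviation: mean absolute deviation from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def weighted_linear_deviation(values, freqs):
    """Weighted average linear deviation for a series with frequencies."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / n


raw = [2, 4, 4, 4, 5, 5, 7, 9]                                      # hypothetical ungrouped data
print(linear_deviation(raw))                                        # 1.5
print(weighted_linear_deviation([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))  # same data as a series: 1.5
```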

The variance (σ²) is the average of the squared deviations of the individual values of a characteristic from their mean. It is calculated with the simple (unweighted) and the weighted formulas:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$ – unweighted,

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$ – weighted.

The standard deviation (σ) is the most common indicator of variation. It is the square root of the variance: $\sigma = \sqrt{\sigma^2}$.

The range of variation, the quartile deviation, and the average linear and standard deviations are named quantities: they have the same unit as the characteristic being averaged. The variance has no directly interpretable unit, since it is measured in the square of the characteristic's unit.

To compare the variability of different characteristics in the same population, or of the same characteristic in several populations, relative indicators of variation are calculated, with the arithmetic mean as the basis of comparison. They are usually expressed as percentages. Besides giving a comparative assessment of variation, they also characterize the homogeneity of the population.

The oscillation coefficient (relative range of variation) is calculated by the formula:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%.$$

The linear coefficient of variation (relative linear deviation):

$$V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%.$$

The relative quartile variation index:

$$V_Q = \frac{Q}{Me} \cdot 100\%$$

or

$$V_Q = \frac{Q}{\bar{x}} \cdot 100\%.$$

The coefficient of variation:

$$V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%.$$

The coefficient of variation is the most commonly used relative measure of variability in statistics. It serves both for comparing variation and as a characteristic of the homogeneity of a population. The greater the coefficient of variation, the greater the spread of values around the mean and the more heterogeneous the population. There is a scale for judging the degree of homogeneity of a population from the value of the coefficient of variation (17; p. 61).
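The following sketch computes the four relative indicators for one hypothetical data set. Two choices in it are assumptions. The quartiles of ungrouped data are taken with `statistics.quantiles(..., method="inclusive")`, since quartile conventions differ. The relative quartile variation is taken as Q relative to the median.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical ungrouped data
n = len(data)
mean = sum(data) / n
R = max(data) - min(data)                        # range of variation
d = sum(abs(x - mean) for x in data) / n         # average linear deviation
sigma = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
q1, me, q3 = statistics.quantiles(data, n=4, method="inclusive")
Q = (q3 - q1) / 2                                # quartile deviation

V_R = R / mean * 100          # oscillation coefficient
V_d = d / mean * 100          # linear coefficient of variation
V_Q = Q / me * 100            # relative quartile variation (Q relative to the median)
V_sigma = sigma / mean * 100  # coefficient of variation

print(f"V_R = {V_R:.1f}%, V_d = {V_d:.1f}%, V_Q = {V_Q:.1f}%, V = {V_sigma:.1f}%")
```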

To get an approximate idea of the shape of the distribution, distribution graphs (a polygon and a histogram) are plotted.

In statistical practice one encounters a wide variety of distributions. Homogeneous populations usually have unimodal (single-peaked) distributions. A multimodal distribution indicates that the population is heterogeneous: two or more peaks signal the need to regroup the data into more homogeneous groups. Determining the general nature of a distribution involves assessing its homogeneity and calculating indicators of asymmetry and kurtosis. A distribution is symmetric if any two values equally distant from its center on either side have equal frequencies. In a symmetric distribution, the arithmetic mean, the mode and the median coincide. Accordingly, the simplest indicator of asymmetry is based on the relationship between these measures of the center: the greater the difference between them, the greater the asymmetry of the series.

To characterize the asymmetry in the central part of the distribution, i.e. in the bulk of the units, or to compare the degree of asymmetry of several distributions, K. Pearson's relative asymmetry index is calculated:

$$A_s = \frac{\bar{x} - Mo}{\sigma}.$$

The index $A_s$ can be positive or negative. A positive value indicates right-sided asymmetry: the right branch of the curve, relative to the maximum ordinate, is longer than the left one. With right-sided asymmetry, the measures of the center are related as $Mo < Me < \bar{x}$. A negative value indicates left-sided asymmetry (Fig. 1), in which case $\bar{x} < Me < Mo$.
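Here is a short sketch of Pearson's index, assuming the form A_s = (x̄ − Mo)/σ with the population standard deviation. The data are hypothetical and chosen to have a single mode.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical ungrouped data with one mode
mean = statistics.fmean(data)
mode = statistics.mode(data)
median = statistics.median(data)
sigma = statistics.pstdev(data)    # population standard deviation

As = (mean - mode) / sigma         # Pearson's relative asymmetry index
side = "right-sided" if As > 0 else "left-sided" if As < 0 else "symmetric"
print(f"mode = {mode}, median = {median}, mean = {mean}, As = {As:.2f} ({side})")
```

Here Mo < Me < x̄ (4 < 4.5 < 5), matching the positive value of the index.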



Fig. 1. Distribution:

1 – with left-sided asymmetry; 2 – with right-sided asymmetry.

Another indicator, proposed by the Swedish mathematician Lindberg, is calculated by the formula

$$A_s = P - 50,$$

where P is the percentage of values of the characteristic that exceed the arithmetic mean.

The most accurate and widespread indicator is based on the third-order central moment, which equals zero in a symmetric distribution:

$$A_s = \frac{\mu_3}{\sigma^3},$$

where $\mu_3$ is the third-order central moment,

$$\mu_3 = \frac{\sum (x - \bar{x})^3 f}{\sum f},$$

and σ is the standard deviation.

This indicator makes it possible not only to measure the asymmetry but also to test whether the characteristic is asymmetrically distributed in the general population. Its significance is assessed with the standard error, which depends on the number of observations n:

$$\sigma_{A_s} = \sqrt{\frac{6(n-1)}{(n+1)(n+3)}}.$$

If $|A_s| / \sigma_{A_s} > 3$, the asymmetry is significant and the distribution of the characteristic in the population is not symmetric. If $|A_s| / \sigma_{A_s} < 3$, the asymmetry is not significant, and its presence can be explained by various random circumstances.
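For a grouped series, the moment-based asymmetry index and its significance test can be sketched as follows. The sketch assumes the standard error σ_As = √(6(n − 1)/((n + 1)(n + 3))) and treats a ratio |A_s|/σ_As above 3 as significant. The midpoints and frequencies are hypothetical.

```python
import math

# Hypothetical grouped series: interval midpoints and frequencies.
mids = [48.5, 49.5, 50.5, 51.5, 52.5]
freqs = [2, 5, 8, 7, 3]
n = sum(freqs)

mean = sum(x * f for x, f in zip(mids, freqs)) / n
sigma = math.sqrt(sum((x - mean) ** 2 * f for x, f in zip(mids, freqs)) / n)
mu3 = sum((x - mean) ** 3 * f for x, f in zip(mids, freqs)) / n   # third central moment

As = mu3 / sigma ** 3
se_As = math.sqrt(6 * (n - 1) / ((n + 1) * (n + 3)))              # standard error of As
ratio = abs(As) / se_As
print(f"As = {As:.3f}, error = {se_As:.3f}, ratio = {ratio:.2f}")
print("significant" if ratio > 3 else "not significant")
```

For these data the slight left-sided asymmetry (about −0.15) is well within random variation.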

For symmetric distributions, the kurtosis (excess, or peakedness) is calculated. Lindberg proposed the following indicator of kurtosis:

$$E_k = P - 38.29,$$

where P is the percentage of values lying within half a standard deviation on either side of the arithmetic mean.

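The most accurate indicator uses the fourth-order central moment:

$$E_k = \frac{\mu_4}{\sigma^4} - 3,$$

where $\mu_4$ is the fourth-order central moment:

$$\mu_4 = \frac{\sum (x - \bar{x})^4}{n}$$ – for ungrouped data;

$$\mu_4 = \frac{\sum (x - \bar{x})^4 f}{\sum f}$$ – for grouped data.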

Figure 2 shows two distributions: one is peaked (positive kurtosis), the other flat-topped (negative kurtosis). Kurtosis measures how far the peak of the empirical distribution rises above or falls below the peak of the normal curve. In a normal distribution, $\mu_4 / \sigma^4 = 3$.



Fig. 2. Distribution:

1, 4 – normal; 2 – peaked; 3 – flat-topped

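The standard error of kurtosis is calculated by the formula

$$\sigma_{E_k} = \sqrt{\frac{24n(n-2)(n-3)}{(n-1)^2(n+3)(n+5)}},$$

where n is the number of observations.

If $|E_k| / \sigma_{E_k} > 3$, the kurtosis is significant; if $|E_k| / \sigma_{E_k} < 3$, it is not significant.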
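A matching sketch for the kurtosis of a grouped series follows. It assumes E_k = μ_4/σ⁴ − 3 and the standard error σ_Ek = √(24n(n − 2)(n − 3)/((n − 1)²(n + 3)(n + 5))). It uses the same "ratio above 3" rule as for asymmetry. The data are hypothetical.

```python
import math

mids = [48.5, 49.5, 50.5, 51.5, 52.5]   # hypothetical interval midpoints
freqs = [2, 5, 8, 7, 3]                 # hypothetical frequencies
n = sum(freqs)

mean = sum(x * f for x, f in zip(mids, freqs)) / n
mu2 = sum((x - mean) ** 2 * f for x, f in zip(mids, freqs)) / n   # variance
mu4 = sum((x - mean) ** 4 * f for x, f in zip(mids, freqs)) / n   # fourth central moment

Ek = mu4 / mu2 ** 2 - 3
se_Ek = math.sqrt(24 * n * (n - 2) * (n - 3) / ((n - 1) ** 2 * (n + 3) * (n + 5)))
ratio = abs(Ek) / se_Ek
print(f"Ek = {Ek:.3f}, error = {se_Ek:.3f}, ratio = {ratio:.2f}")
print("significant" if ratio > 3 else "not significant")
```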

Assessing the significance of the asymmetry and kurtosis indicators allows us to conclude whether the empirical distribution can be regarded as normal.

2. Let's consider the methodology for calculating variation indices.

In statistics, the variation of an indicator in a population means the difference between its levels in individual units of the population in the same period or at the same moment. When the values of an indicator are compared for the same object (the same unit of the population) over different periods or moments, this is no longer called variation but fluctuation, or change over time.


Such fluctuations are studied with methods of their own, which differ from those of variation analysis. The objective cause of variation is the difference in the conditions under which the individual objects of a population operate. For example, the performance of a trading company is affected by the level of competition, taxes, the use of advanced technology, the condition of its equipment, and so on. Variation is characteristic of almost all natural phenomena and aspects of social life. There are, however, invariant indicators, which arise when certain phenomena are fixed by legal acts. For example, the number of general directors of a company cannot vary: by law there must be one. Such invariant objects are, as a rule, not the subject of statistical research. In practice, the variation of characteristics matters a great deal. For example, varying the range of standard part sizes makes it possible to create an optimal assortment, while a high level of variation within one standard size indicates a high defect rate and the need for corrective measures. Substantial variation in turnover or prices may indicate market monopolization or poor inventory management and call for appropriate action. All this shows that social life, which from the statistical point of view is a mass population, objectively exhibits variability in many characteristics and elements. This is why variation needs to be studied with special indicators, in order to find the best ways of managing it. The coefficient of variation is one such indicator, and it belongs to the group of relative indicators of variation.
The coefficient of variation is a relative indicator: the ratio of the standard deviation to the mean of the characteristic studied, usually expressed as a percentage. It relates the strength of the factors that cause variability to the general conditions shared by all elements of the population, which produce the typical value of the characteristic, its mean. The coefficient of variation is used to compare the variability of different characteristics of the same population, and the variability in different populations with different means.