Formula for the variance of a random variable. Variance and standard deviation

The variance of a random variable measures how widely its values are spread. A low variance means the values cluster close together; a high variance means they are widely scattered. Variance is a basic tool of statistics. For example, comparing the variance of two groups (such as male and female patients) is part of testing whether a variable has a significant effect. Variance is also used when building statistical models, since a very low variance can be a sign that the model is overfitting the data.

Steps

Calculating sample variance

  1. Record the sample values. In most cases, statisticians only have access to samples of a population. For example, statisticians do not usually analyze the maintenance cost of every car in Russia; they analyze a random sample of several thousand cars. Such a sample gives an estimate of the average cost per car, but the estimate will most likely not match the true value exactly.

    • For example, let's analyze the number of buns sold in a cafe over 6 days, taken in random order. The sample looks like this: 17, 15, 23, 7, 9, 13. This is a sample, not a population, because we do not have data on buns sold for each day the cafe is open.
    • If you are given a population rather than a sample of values, continue to the next section.
  2. Write down the formula for sample variance. Variance is a measure of the spread of the values of a quantity. The closer the variance is to zero, the more tightly the values are grouped. When working with a sample, use the following formula:

    • $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
    • $s^2$ – the sample variance. Variance is measured in squared units.
    • $x_i$ – each value in the sample.
    • $\sum$ – the summation sign: subtract $\bar{x}$ from each value $x_i$, square the difference, and add up the results.
    • $\bar{x}$ – the sample mean.
    • $n$ – the number of values in the sample.
  3. Calculate the sample mean. It is denoted $\bar{x}$. The sample mean is a simple arithmetic mean: add up all the values in the sample and divide the result by the number of values.

    • In our example, add the values in the sample: 15 + 17 + 23 + 7 + 9 + 13 = 84
      Now divide the result by the number of values in the sample (6 in our example): 84 ÷ 6 = 14.
      Sample mean $\bar{x}$ = 14.
    • The sample mean is the central value around which the sample values are distributed. If the values cluster around the sample mean, the variance is small; otherwise it is large.
  4. Subtract the sample mean from each value in the sample. Calculate the difference $x_i - \bar{x}$ for each value $x_i$. Each result shows how far that value lies from the sample mean.

    • In our example:
      $x_1 - \bar{x} = 17 - 14 = 3$
      $x_2 - \bar{x} = 15 - 14 = 1$
      $x_3 - \bar{x} = 23 - 14 = 9$
      $x_4 - \bar{x} = 7 - 14 = -7$
      $x_5 - \bar{x} = 9 - 14 = -5$
      $x_6 - \bar{x} = 13 - 14 = -1$
    • The results are easy to check: their sum must equal zero. This follows from the definition of the mean: the negative deviations (from values below the mean) exactly offset the positive deviations (from values above the mean).
  5. Square each difference. As noted above, the sum of the differences $x_i - \bar{x}$ is always zero, so the average deviation is always zero and says nothing about the spread of the values. To solve this problem, square each difference $x_i - \bar{x}$. The squares are never negative, so their sum is zero only when every value equals the mean.

    • In our example:
      $(x_1 - \bar{x})^2 = 3^2 = 9$
      $(x_2 - \bar{x})^2 = 1^2 = 1$
      $(x_3 - \bar{x})^2 = 9^2 = 81$
      $(x_4 - \bar{x})^2 = (-7)^2 = 49$
      $(x_5 - \bar{x})^2 = (-5)^2 = 25$
      $(x_6 - \bar{x})^2 = (-1)^2 = 1$
    • You have now found the squared difference $(x_i - \bar{x})^2$ for each value in the sample.
  6. Calculate the sum of the squared differences. That is, find the part of the formula written as $\sum (x_i - \bar{x})^2$. Here the sign $\sum$ means the sum of the squared differences over every value $x_i$ in the sample. You have already found the squared differences $(x_i - \bar{x})^2$; now just add them up.

    • In our example: 9 + 1 + 81 + 49 + 25 + 1 = 166.
  7. Divide the result by n - 1, where n is the number of values in the sample. In the past, statisticians simply divided by n; that gives the mean of the squared deviations, which describes the spread of the given sample itself. But remember that any sample is only a small part of the population. If you take another sample and repeat the calculation, you will get a different result. Dividing by n - 1 rather than n gives an unbiased estimate of the population variance, which is what you are actually interested in. Division by n - 1 has become standard, so it is built into the formula for sample variance.

    • In our example, the sample contains 6 values, that is, n = 6.
      Sample variance: $s^2 = \frac{166}{6 - 1} = 33.2$
  8. Understand the difference between variance and standard deviation. The formula contains a square, so variance is measured in squared units of the quantity being analyzed. Such a quantity can be awkward to work with; in that case use the standard deviation, which is the square root of the variance. That is why the sample variance is denoted $s^2$ and the sample standard deviation is denoted $s$.

    • In our example, the sample standard deviation is: s = √33.2 ≈ 5.76.
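The eight steps above can be retraced in a few lines of Python. This is only a minimal sketch using the bun-sales sample from the text; the variable names are illustrative, and the final check relies on the standard library's `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

# Buns sold on six randomly chosen days (the sample from the text).
sample = [17, 15, 23, 7, 9, 13]

n = len(sample)
mean = sum(sample) / n                       # step 3: sample mean
deviations = [x - mean for x in sample]      # step 4: x_i - mean
squared = [d ** 2 for d in deviations]       # step 5: squared deviations
sum_of_squares = sum(squared)                # step 6: sum of squares
sample_variance = sum_of_squares / (n - 1)   # step 7: divide by n - 1
sample_std = math.sqrt(sample_variance)      # step 8: standard deviation

print(mean, sum_of_squares, sample_variance, round(sample_std, 2))

# Cross-check against the standard library implementation.
assert math.isclose(sample_variance, statistics.variance(sample))
```

The deviations in step 4 sum to zero, which is why they have to be squared before averaging.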

    Calculating Population Variance

    1. Analyze a population of values. A population includes all values of the quantity under consideration. For example, if you are studying the age of residents of the Leningrad region, the population includes the age of every resident of that region. When working with a population, it is a good idea to enter its values in a table. Consider the following example:

      • A room contains 6 aquariums. The aquariums hold the following numbers of fish:
        $x_1 = 5$
        $x_2 = 5$
        $x_3 = 8$
        $x_4 = 12$
        $x_5 = 15$
        $x_6 = 18$
    2. Write down the formula for population variance. Since the population includes all values of the quantity, the formula below gives the exact value of the population variance. To distinguish population variance from sample variance (which is only an estimate), statisticians use different symbols:

      • $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$
      • $\sigma^2$ – the population variance (read "sigma squared"). Variance is measured in squared units.
      • $x_i$ – each value in the population.
      • $\sum$ – the summation sign: subtract $\mu$ from each value $x_i$, square the difference, and add up the results.
      • $\mu$ – the population mean.
      • $n$ – the number of values in the population.
    3. Calculate the population mean. For a population, the mean is denoted $\mu$ (mu). It is a simple arithmetic mean: add up all the values in the population and divide the result by the number of values.

      • Keep in mind that averages are not always calculated as the arithmetic mean.
      • In our example, the population mean: $\mu = \frac{5 + 5 + 8 + 12 + 15 + 18}{6} = 10.5$
    4. Subtract the population mean from each value in the population. The closer a difference is to zero, the closer that value is to the population mean. Finding the difference between each value and the mean gives a first idea of how the values are distributed.

      • In our example:
        $x_1 - \mu = 5 - 10.5 = -5.5$
        $x_2 - \mu = 5 - 10.5 = -5.5$
        $x_3 - \mu = 8 - 10.5 = -2.5$
        $x_4 - \mu = 12 - 10.5 = 1.5$
        $x_5 - \mu = 15 - 10.5 = 4.5$
        $x_6 - \mu = 18 - 10.5 = 7.5$
    5. Square each result obtained. The differences are both positive and negative; plotted on a number line, they lie to the right and left of the population mean. This is no good for calculating variance, because positive and negative numbers cancel each other out. So square each difference to get only non-negative numbers.

      • In our example, compute $(x_i - \mu)^2$ for each population value (from i = 1 to i = 6):
        $(-5.5)^2 = 30.25$
        $(-5.5)^2 = 30.25$
        $(-2.5)^2 = 6.25$
        $(1.5)^2 = 2.25$
        $(4.5)^2 = 20.25$
        $(7.5)^2 = 56.25$
      • To find the mean of these results, add them up and divide by n: $\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2}{n}$, where $x_n$ is the last value in the population.
      • Written with the summation sign, this is $\frac{\sum (x_i - \mu)^2}{n}$, the formula for the population variance. In our example: $\sigma^2 = \frac{145.5}{6} = 24.25$.
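The population calculation can be sketched the same way. The only difference from the sample case is the divisor n instead of n - 1. The data are the aquarium counts from the text; the variable names are illustrative, and `statistics.pvariance` is the standard library's population variance.

```python
import statistics

# Number of fish in each of the six aquariums (the whole population).
fish = [5, 5, 8, 12, 15, 18]

n = len(fish)
mu = sum(fish) / n                                 # population mean
squared_deviations = [(x - mu) ** 2 for x in fish]
population_variance = sum(squared_deviations) / n  # divide by n, not n - 1

print(mu, squared_deviations, population_variance)

# The standard library gives the same value.
assert population_variance == statistics.pvariance(fish)
```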

The range of variation is the difference between the maximum and minimum values of a characteristic:

$R = x_{\max} - x_{\min}$

In our example, the range of variation in the shift output of workers is R = 105 - 95 = 10 parts in the first team and R = 125 - 75 = 50 parts in the second team (5 times more). This suggests that the output of the first team is more "stable", but the second team has more reserves for increasing output: if all its workers reached the team's maximum output, it could produce 3 * 125 = 375 parts, whereas the first team could produce only 105 * 3 = 315 parts.
If the extreme values of a characteristic are not typical of the population, quartile or decile ranges are used instead. The quartile range RQ = Q3 - Q1 covers 50% of the population, the first decile range RD1 = D9 - D1 covers 80% of the data, and the second decile range RD2 = D8 - D2 covers 60%.
The disadvantage of the range of variation is that it does not reflect all the fluctuations of the characteristic.
The simplest summary measure reflecting all fluctuations of a characteristic is the average linear deviation, which is the arithmetic mean of the absolute deviations of individual values from their mean. For ungrouped data:

$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}$,
for grouped data
$\bar{d} = \frac{\sum |x_i - \bar{x}| f_i}{\sum f_i}$,
where $x_i$ is the value of the characteristic in a discrete series or the midpoint of the interval in an interval distribution, and $f_i$ is its frequency.
In these formulas the differences in the numerator are taken in absolute value; otherwise, by the property of the arithmetic mean, the numerator would always equal zero. Therefore the average linear deviation is rarely used in statistical practice, only where summing values without regard to sign makes economic sense. It is used, for example, to analyze the composition of the workforce, the profitability of production, and foreign trade turnover.
The variance of a characteristic is the mean of the squared deviations of its values from their mean:
simple variance
$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$,
weighted variance
$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$.
The formula for calculating variance can be simplified:

$\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$

Thus, the variance equals the difference between the mean of the squares of the values and the square of their mean:
$\sigma^2 = \overline{x^2} - \bar{x}^2$.
However, because the deviations are squared, the variance gives a distorted idea of their size, so the standard deviation is calculated from it; it shows how much, on average, individual values deviate from their mean. It is the square root of the variance:
for ungrouped data
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$,
for variation series
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}}$
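As a quick numerical check of these measures, the sketch below computes the range, the average linear deviation, and the variance both by definition and by the shortcut "mean of squares minus square of the mean". The data are simply the six aquarium counts reused from earlier in the text; any ungrouped data set would do.

```python
import math

# Ungrouped data reused from the aquarium example (an arbitrary choice).
values = [5, 5, 8, 12, 15, 18]
n = len(values)
mean = sum(values) / n

value_range = max(values) - min(values)                        # R
mean_abs_deviation = sum(abs(x - mean) for x in values) / n    # average linear deviation
variance_by_definition = sum((x - mean) ** 2 for x in values) / n

# Shortcut: mean of the squares minus the square of the mean.
mean_of_squares = sum(x * x for x in values) / n
variance_by_shortcut = mean_of_squares - mean ** 2

std_deviation = math.sqrt(variance_by_definition)

print(value_range, mean_abs_deviation, variance_by_definition,
      variance_by_shortcut, round(std_deviation, 3))
```

Both routes to the variance agree, and the average linear deviation and standard deviation come out in the same units as the data.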

The smaller the variance and standard deviation, the more homogeneous the population and the more reliable (typical) the mean.
The average linear deviation and the standard deviation are dimensional quantities, i.e. they are expressed in the units of the characteristic; they are similar in meaning and close in value.
It is recommended to calculate absolute measures of variation using tables.
Table 3 - Calculation of variation characteristics (based on data on the shift output of the team's workers)

[Table not recovered. Column headings: Number of workers; Midpoint of the interval; Calculated values; Total.]

From the table the following are calculated: the average shift output of workers, the average linear deviation, the variance of output, and the standard deviation of the output of individual workers from the average output.

1 Calculation of dispersion using the method of moments

Calculating variance involves cumbersome arithmetic (especially when the mean is a large number with several decimal places). The calculations can be simplified by using a shortcut formula and the properties of variance.
Variance has the following properties:

  1. If all values of a characteristic are decreased or increased by the same amount A, the variance does not change:

$\sigma^2_{(x \pm A)} = \sigma^2_x$,

  2. If all values of a characteristic are divided by the same number h, the variance decreases by a factor of $h^2$: $\sigma^2_{(x/h)} = \frac{\sigma^2_x}{h^2}$, so $\sigma^2_x = h^2 \sigma^2_{(x/h)}$.
Using these properties, first reducing all values of the population by A and then dividing by the interval width h, we obtain the formula for calculating variance in a variation series with equal intervals by the method of moments:
$\sigma^2 = h^2 (m_2 - m_1^2)$,
where $\sigma^2$ is the variance calculated by the method of moments;
h – the width of the interval of the variation series;
$x' = \frac{x - A}{h}$ – the new (transformed) values;
A – a constant, taken as the midpoint of the interval with the highest frequency, or the value with the highest frequency;
$m_1^2 = \left(\frac{\sum x' f}{\sum f}\right)^2$ – the square of the first-order moment;
$m_2 = \frac{\sum x'^2 f}{\sum f}$ – the second-order moment.
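The method of moments can be illustrated with a short sketch. The interval midpoints and frequencies below are hypothetical (the original table is not available); the point is only that the transformed calculation gives the same variance as the direct weighted formula.

```python
import math

# Hypothetical variation series with equal intervals (not the original data):
midpoints = [85, 95, 105, 115]    # midpoints of the intervals
frequencies = [2, 5, 2, 1]        # number of units in each interval

h = 10    # interval width
A = 95    # midpoint of the interval with the highest frequency
total = sum(frequencies)

# Transformed values x' = (x - A) / h are small whole numbers.
transformed = [(x - A) / h for x in midpoints]

m1 = sum(t * f for t, f in zip(transformed, frequencies)) / total       # first-order moment
m2 = sum(t * t * f for t, f in zip(transformed, frequencies)) / total   # second-order moment
variance_moments = h ** 2 * (m2 - m1 ** 2)

# Direct weighted variance for comparison.
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / total
variance_direct = sum((x - mean) ** 2 * f
                      for x, f in zip(midpoints, frequencies)) / total

print(m1, m2, variance_moments, variance_direct)
```

The transformation pays off when the calculation is done by hand: the squares of -1, 0, 1, 2 are much easier to handle than the squares of deviations from a fractional mean.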
Let us calculate the variance by the method of moments using the data on the shift output of the team's workers.
Table 4 - Calculation of variance using the method of moments

[Table not recovered. Column headings: Groups of workers by output, pcs.; Number of workers; Midpoint of the interval; Calculated values. The calculation procedure that followed the table ended with computing the variance.]

2 Calculation of the variance of an alternative characteristic

Among the characteristics studied in statistics are those that take only two mutually exclusive values. These are alternative characteristics. They are assigned two numerical values: 1 and 0. The frequency of the value 1, denoted p, is the proportion of units possessing the characteristic. The difference 1 - p = q is the frequency of the value 0. Thus,

$x_i$: 1, 0
$f_i$: p, q

The arithmetic mean of the alternative characteristic:
$\bar{x} = \frac{1 \cdot p + 0 \cdot q}{p + q} = p$, because p + q = 1.

The variance of the alternative characteristic:
$\sigma^2 = \frac{(1 - p)^2 p + (0 - p)^2 q}{p + q} = q^2 p + p^2 q = pq(q + p) = pq$, because 1 - p = q.
Thus, the variance of an alternative characteristic equals the product of the proportion of units possessing the characteristic and the proportion of units not possessing it.
If the values 1 and 0 occur equally often, i.e. p = q, the variance reaches its maximum pq = 0.25.
The variance of an alternative attribute is used in sample surveys, for example, of product quality.
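A small sketch of such a quality check, with hypothetical numbers (100 inspected items, 8 of them defective): the variance computed directly from the 0/1 data coincides with pq, and pq is largest when p = 0.5.

```python
import math
import statistics

# Hypothetical inspection result: 1 = defective, 0 = acceptable.
items = [1] * 8 + [0] * 92

p = sum(items) / len(items)    # proportion of units with the characteristic
q = 1 - p                      # proportion of units without it

variance_from_data = statistics.pvariance(items)
variance_pq = p * q

print(p, q, variance_from_data, variance_pq)

# The product p * q over a grid of proportions k/100 peaks at k = 50.
best_k = max(range(101), key=lambda k: k * (100 - k))
print(best_k / 100, (best_k / 100) * (1 - best_k / 100))
```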

3 Between-group variance. Variance addition rule

Variance, unlike other measures of variation, is an additive quantity. That is, in a population divided into groups according to a factor characteristic x, the variance of the resulting characteristic y can be decomposed into the variance within each group (within-group) and the variance between groups (between-group). Then, along with studying the variation of a characteristic over the whole population, it becomes possible to study the variation in each group, as well as between the groups.

The total variance measures the variation of the characteristic y over the whole population under the influence of all the factors that caused this variation. It equals the mean of the squared deviations of individual values of y from the overall mean and can be calculated as a simple or weighted variance.
The between-group variance characterizes the variation of the resulting characteristic y caused by the factor characteristic x on which the grouping is based. It characterizes the variation of the group means and equals the mean of the squared deviations of the group means from the overall mean:
$\delta^2 = \frac{\sum (\bar{y}_i - \bar{y})^2 n_i}{\sum n_i}$,
where $\bar{y}_i$ is the arithmetic mean of the i-th group;
$n_i$ – number of units in the i-th group (frequency of the i-th group);
$\bar{y}$ – the overall mean of the population.
The within-group variance reflects random variation, i.e. the part of the variation that is caused by unaccounted factors and does not depend on the factor characteristic underlying the grouping. It characterizes the variation of individual values around the group means; it equals the mean of the squared deviations of individual values of y within a group from the arithmetic mean of that group (the group mean) and is calculated as a simple or weighted variance for each group:
$\sigma_i^2 = \frac{\sum (y - \bar{y}_i)^2}{n_i}$ or $\sigma_i^2 = \frac{\sum (y - \bar{y}_i)^2 f}{\sum f}$,
where $n_i$ is the number of units in the group.
Based on the within-group variances of each group, one can determine the overall mean of the within-group variances:
$\overline{\sigma^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$.
The relationship between the three variances is called the variance addition rule, according to which the total variance equals the sum of the between-group variance and the mean of the within-group variances:
$\sigma^2 = \delta^2 + \overline{\sigma^2}$
Example. When studying the influence of workers' wage category (qualification) on their labor productivity, the following data were obtained.
Table 5 – Distribution of workers by average hourly output.

Workers of the 4th category (group mean 10 pcs.)
No.   Output, pcs.   Deviation from group mean   Squared deviation
1     7              7 - 10 = -3                 9
2     9              9 - 10 = -1                 1
3     9              -1                          1
4     10             0                           0
5     12             2                           4
6     13             3                           9

Workers of the 5th category (group mean 15 pcs.)
No.   Output, pcs.   Deviation from group mean   Squared deviation
1     14             14 - 15 = -1                1
2     14             -1                          1
3     15             0                           0
4     17             2                           4

In this example, workers are divided into two groups according to the factor characteristic x, qualification, which is measured by their category. The resulting characteristic, output, varies both under its influence (between-group variation) and because of other random factors (within-group variation). The goal is to measure these variations using three variances: total, between-group, and within-group.
The empirical coefficient of determination shows the proportion of the variation in the resulting characteristic y that is due to the factor characteristic x. The rest of the total variation in y is caused by other factors.
In the example, the empirical coefficient of determination is:
$\eta^2 = \frac{\delta^2}{\sigma^2} = \frac{6}{9} = 0.667$, or 66.7%.
This means that 66.7% of the variation in worker productivity is due to differences in qualifications, and 33.3% is due to other factors.
The empirical correlation ratio shows how close the connection is between the grouping characteristic and the resulting characteristic. It is calculated as the square root of the empirical coefficient of determination:
$\eta = \sqrt{\eta^2} = \sqrt{\frac{\delta^2}{\sigma^2}}$
The empirical correlation ratio, like $\eta^2$, takes values from 0 to 1.
If there is no connection, then $\eta = 0$. In this case $\delta^2 = 0$, that is, the group means are equal to each other and there is no between-group variation. This means that the grouping characteristic (the factor) does not contribute to the total variation.
If the connection is functional, then $\eta = 1$. In this case the variance of the group means equals the total variance ($\delta^2 = \sigma^2$), that is, there is no within-group variation. This means that the grouping characteristic completely determines the variation of the resulting characteristic.
The closer the correlation ratio is to one, the closer the connection between the characteristics is to a functional dependence.
To assess the closeness of the connection qualitatively, the Chaddock scale is used.

In the example, $\eta = \sqrt{0.667} \approx 0.82$, which indicates a close connection between worker productivity and qualification.
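The whole decomposition for this example can be reproduced with the sketch below. It uses the outputs of the two groups of workers from Table 5; the helper functions `mean` and `pvar` are illustrative names, not part of the original text.

```python
import math

# Hourly output (pcs.) of workers by wage category, from Table 5.
groups = {
    "4th category": [7, 9, 9, 10, 12, 13],
    "5th category": [14, 14, 15, 17],
}

def mean(values):
    return sum(values) / len(values)

def pvar(values):
    """Population variance: mean of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

all_values = [y for g in groups.values() for y in g]
n_total = len(all_values)
grand_mean = mean(all_values)

total_variance = pvar(all_values)
# Mean of within-group variances, weighted by group size.
within = sum(pvar(g) * len(g) for g in groups.values()) / n_total
# Between-group variance: spread of group means around the grand mean.
between = sum((mean(g) - grand_mean) ** 2 * len(g)
              for g in groups.values()) / n_total

eta_squared = between / total_variance    # empirical coefficient of determination
eta = math.sqrt(eta_squared)              # empirical correlation ratio

print(total_variance, between, within, round(eta_squared, 3), round(eta, 2))
```

The total variance (9) splits into a between-group part (6) and a mean within-group part (3), which is exactly the variance addition rule.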

Mathematical expectation and variance are the most commonly used numerical characteristics of a random variable. They describe the most important features of a distribution: its location and its degree of scatter. In many practical problems a complete description of a random variable, its distribution law, either cannot be obtained or is not needed. In these cases one settles for an approximate description of the random variable by its numerical characteristics.

The mathematical expectation is often called simply the mean value of a random variable. The variance of a random variable characterizes the scatter of the random variable around its mathematical expectation.

Expectation of a discrete random variable

Let us approach the concept of mathematical expectation through the mechanical interpretation of the distribution of a discrete random variable. Let a unit mass be distributed among the points $x_1, x_2, \dots, x_n$ of the x-axis, with each point carrying the corresponding mass $p_1, p_2, \dots, p_n$. We need to choose one point on the x-axis that characterizes the position of the whole system of points, taking their masses into account. It is natural to take the center of mass of the system. This is the weighted average of the random variable X, in which the abscissa of each point $x_i$ enters with a "weight" equal to the corresponding probability. The mean value of the random variable X obtained in this way is called its mathematical expectation.

The mathematical expectation of a discrete random variable is the sum of the products of all its possible values and their probabilities: $E(X) = \sum_{i=1}^{n} x_i p_i$.

Example 1. A lottery in which every ticket wins has been organized. There are 1000 prizes: 400 of 10 rubles, 300 of 20 rubles, 200 of 100 rubles and 100 of 200 rubles. What is the average prize for someone who buys one ticket?

Solution. We find the average prize by dividing the total amount of prizes, 10*400 + 20*300 + 100*200 + 200*100 = 50,000 rubles, by 1000 (the total number of prizes). This gives 50000/1000 = 50 rubles. But the expression for the average prize can also be written in the following form:
$\frac{10 \cdot 400 + 20 \cdot 300 + 100 \cdot 200 + 200 \cdot 100}{1000} = 10 \cdot 0.4 + 20 \cdot 0.3 + 100 \cdot 0.2 + 200 \cdot 0.1 = 50$
On the other hand, under these conditions the size of the prize is a random variable that takes the values 10, 20, 100 and 200 rubles with probabilities 0.4, 0.3, 0.2 and 0.1 respectively. Therefore, the expected average prize equals the sum of the products of the prize sizes and the probabilities of receiving them.
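The two ways of computing the average prize in Example 1 can be compared directly. The sketch below uses the prize counts from the example; the dictionary layout and variable names are just one convenient choice.

```python
import math

# Prize size in rubles -> number of such prizes (Example 1).
prizes = {10: 400, 20: 300, 100: 200, 200: 100}
tickets = sum(prizes.values())

# Way 1: total prize money divided by the number of tickets.
average_prize = sum(size * count for size, count in prizes.items()) / tickets

# Way 2: expectation as the sum of value * probability.
probabilities = {size: count / tickets for size, count in prizes.items()}
expectation = sum(size * p for size, p in probabilities.items())

print(average_prize, probabilities, expectation)
```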

Example 2. A publisher decides to publish a new book. He plans to sell it for 280 rubles, of which he will receive 200, the bookstore 50 and the author 30. The table gives the publishing costs and the probability of selling a given number of copies.

Find the publisher's expected profit.

Solution. The random variable "profit" equals the difference between sales income and costs. For example, if 500 copies are sold, the income from sales is 200 * 500 = 100,000 rubles and the publishing costs are 225,000 rubles, so the publisher makes a loss of 125,000 rubles. The following table summarizes the values of the random variable (profit) and their probabilities:

Copies sold   Profit x_i   Probability p_i   x_i p_i
500           -125000      0.20              -25000
1000          -50000       0.40              -20000
2000          100000       0.25              25000
3000          250000       0.10              25000
4000          400000       0.05              20000
Total:                     1.00              25000

Thus, we obtain the mathematical expectation of the publisher's profit:

$E(X) = \sum x_i p_i = 25000$ rubles.

Example 3. The probability of a hit with one shot is p = 0.2. Determine the number of shells needed for the mathematical expectation of the number of hits to equal 5.

Solution. From the same formula for mathematical expectation that we have used so far, we express x, the number of shells:

$E(X) = x \cdot p$, so $x = \frac{E(X)}{p} = \frac{5}{0.2} = 25$.

Example 4. Determine the mathematical expectation of the random variable X, the number of hits in three shots, if the probability of a hit with each shot is p = 0.4.

Hint: find the probabilities of the values of the random variable using Bernoulli's formula.

Properties of mathematical expectation

Let's consider the properties of mathematical expectation.

Property 1. The mathematical expectation of a constant equals that constant: $E(C) = C$.

Property 2. A constant factor can be taken outside the expectation sign: $E(CX) = C \cdot E(X)$.

Property 3. The mathematical expectation of the sum (difference) of random variables equals the sum (difference) of their mathematical expectations: $E(X \pm Y) = E(X) \pm E(Y)$.

Property 4. The mathematical expectation of a product of independent random variables equals the product of their mathematical expectations: $E(XY) = E(X) \cdot E(Y)$.

Property 5. If all values of a random variable X are decreased (increased) by the same number C, its mathematical expectation decreases (increases) by the same number: $E(X \pm C) = E(X) \pm C$.

When you can’t limit yourself only to mathematical expectation

In most cases, the mathematical expectation alone cannot adequately characterize a random variable.

Let the random variables X and Y be given by the following distribution laws:

Value of X   Probability
-0.1         0.1
-0.01        0.2
0            0.4
0.01         0.2
0.1          0.1

Value of Y   Probability
-20          0.3
-10          0.1
0            0.2
10           0.1
20           0.3

The mathematical expectations of these variables are the same, both equal to zero:

$E(X) = -0.1 \cdot 0.1 - 0.01 \cdot 0.2 + 0 \cdot 0.4 + 0.01 \cdot 0.2 + 0.1 \cdot 0.1 = 0$
$E(Y) = -20 \cdot 0.3 - 10 \cdot 0.1 + 0 \cdot 0.2 + 10 \cdot 0.1 + 20 \cdot 0.3 = 0$

However, their distributions differ. The random variable X takes only values that differ little from its mathematical expectation, while the random variable Y can take values that deviate from it substantially. A similar example: the average wage says nothing about the share of high-paid and low-paid workers. In other words, the mathematical expectation does not tell us what deviations from it are possible, even on average. For that, you need to find the variance of the random variable.

Variance of a discrete random variable

The variance of a discrete random variable X is the mathematical expectation of the square of its deviation from its mathematical expectation:

$D(X) = E\left[(X - E(X))^2\right] = \sum (x_i - E(X))^2 p_i$

The standard deviation of a random variable X is the arithmetic (non-negative) square root of its variance:

$\sigma(X) = \sqrt{D(X)}$.

Example 5. Calculate the variances and standard deviations of the random variables X and Y whose distribution laws are given in the tables above.

Solution. The mathematical expectations of X and Y, as found above, are equal to zero. With E(X) = E(Y) = 0 the variance formula gives:

$D(X) = (-0.1)^2 \cdot 0.1 + (-0.01)^2 \cdot 0.2 + 0^2 \cdot 0.4 + 0.01^2 \cdot 0.2 + 0.1^2 \cdot 0.1 = 0.00204$
$D(Y) = (-20)^2 \cdot 0.3 + (-10)^2 \cdot 0.1 + 0^2 \cdot 0.2 + 10^2 \cdot 0.1 + 20^2 \cdot 0.3 = 260$

Then the standard deviations of X and Y are

$\sigma(X) = \sqrt{0.00204} \approx 0.0452$, $\sigma(Y) = \sqrt{260} \approx 16.12$.

Thus, with the same mathematical expectations, the variance of X is very small while the variance of Y is substantial. This is a consequence of the difference in their distributions.
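Example 5 can be checked with a short sketch. A distribution law is represented here as a dictionary mapping each value to its probability; the helper names `expectation` and `variance` are illustrative.

```python
import math

# Distribution laws of X and Y from the tables above: value -> probability.
X = {-0.1: 0.1, -0.01: 0.2, 0: 0.4, 0.01: 0.2, 0.1: 0.1}
Y = {-20: 0.3, -10: 0.1, 0: 0.2, 10: 0.1, 20: 0.3}

def expectation(dist):
    """E(X): sum of value * probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """D(X): expectation of the squared deviation from E(X)."""
    m = expectation(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

for name, dist in (("X", X), ("Y", Y)):
    d = variance(dist)
    print(name, round(expectation(dist), 10), round(d, 5), round(math.sqrt(d), 4))
```

Floating-point rounding can leave a tiny non-zero residue in the expectations, which is why the printed values are rounded.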

Example 6. An investor has 4 alternative investment projects. The table summarizes the expected profit in each project with the corresponding probabilities.

Project 1      Project 2       Project 3        Project 4
500, P=1       1000, P=0.5     500, P=0.5       500, P=0.5
               0, P=0.5        1000, P=0.25     10500, P=0.25
                               0, P=0.25        -9500, P=0.25

Find the mathematical expectation, variance and standard deviation for each alternative.

Solution. Let us show how these values are calculated for the 3rd alternative:

$E(X) = 500 \cdot 0.5 + 1000 \cdot 0.25 + 0 \cdot 0.25 = 500$
$D(X) = (500 - 500)^2 \cdot 0.5 + (1000 - 500)^2 \cdot 0.25 + (0 - 500)^2 \cdot 0.25 = 125000$
$\sigma(X) = \sqrt{125000} \approx 353.55$

The table summarizes the values found for all alternatives.

Project   E(X)   D(X)       σ(X)
1         500    0          0
2         500    250000     500
3         500    125000     353.55
4         500    50000000   7071.07

All alternatives have the same mathematical expectation. This means that in the long run they all yield the same income. The standard deviation can be interpreted as a measure of risk: the higher it is, the riskier the investment. An investor who wants to avoid risk will choose project 1, since it has the smallest standard deviation (0). An investor who prefers risk and the chance of a high return in a short period will choose the project with the largest standard deviation, project 4.
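The comparison of the four projects can be sketched as follows. One assumption is made explicitly: the third outcome of project 4 is taken to be a loss of 9500 (that is, -9500), since this is the value for which all four expectations are equal, as the text states.

```python
import math

# Outcomes of each project as (profit, probability) pairs.
# Assumption: the last outcome of project 4 is -9500 (a loss).
projects = {
    1: [(500, 1.0)],
    2: [(1000, 0.5), (0, 0.5)],
    3: [(500, 0.5), (1000, 0.25), (0, 0.25)],
    4: [(500, 0.5), (10500, 0.25), (-9500, 0.25)],
}

def summarize(outcomes):
    """Return (expectation, variance, standard deviation) of a project."""
    e = sum(x * p for x, p in outcomes)
    d = sum((x - e) ** 2 * p for x, p in outcomes)
    return e, d, math.sqrt(d)

results = {k: summarize(v) for k, v in projects.items()}
for k, (e, d, s) in results.items():
    print(k, e, d, round(s, 2))

# Least and most risky projects by standard deviation.
safest = min(results, key=lambda k: results[k][2])
riskiest = max(results, key=lambda k: results[k][2])
print(safest, riskiest)
```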

Dispersion properties

Let us present the properties of variance.

Property 1. The variance of a constant is zero: $D(C) = 0$.

Property 2. A constant factor can be taken outside the variance sign by squaring it:

$D(CX) = C^2 D(X)$.

Property 3. The variance of a random variable equals the mathematical expectation of its square minus the square of its mathematical expectation:

$D(X) = E(X^2) - [E(X)]^2$,

where $E(X^2) = \sum x_i^2 p_i$.

Property 4. The variance of the sum (difference) of independent random variables equals the sum of their variances: $D(X \pm Y) = D(X) + D(Y)$.

Example 7. A discrete random variable X takes only two values: -3 and 7. In addition, its mathematical expectation is known: E(X) = 4. Find the variance of this random variable.

Solution. Let p denote the probability with which the random variable takes the value $x_1 = -3$. Then the probability of the value $x_2 = 7$ is 1 - p. Write the equation for the mathematical expectation:

$E(X) = x_1 p + x_2 (1 - p) = -3p + 7(1 - p) = 4$,

from which we get the probabilities: p = 0.3 and 1 - p = 0.7.

Distribution law of the random variable:

X   -3    7
p   0.3   0.7

We calculate the variance of this random variable using the formula from property 3 of variance:

$D(X) = E(X^2) - [E(X)]^2 = (-3)^2 \cdot 0.3 + 7^2 \cdot 0.7 - 4^2 = 2.7 + 34.3 - 16 = 21$.

Find the mathematical expectation of the random variable yourself

Example 8. A discrete random variable X takes only two values. It takes the larger value, 3, with probability 0.4. In addition, the variance of the random variable is known: D(X) = 6. Find the mathematical expectation of the random variable.

Example 9. An urn contains 6 white and 4 black balls. 3 balls are drawn from the urn. The number of white balls among those drawn is a discrete random variable X. Find the mathematical expectation and variance of this random variable.

Solution. The random variable X can take the values 0, 1, 2, 3. The corresponding probabilities can be calculated using the probability multiplication rule. Distribution law of the random variable:

X   0      1      2     3
p   1/30   3/10   1/2   1/6

Hence the mathematical expectation of this random variable:

E(X) = 0 · 1/30 + 1 · 3/10 + 2 · 1/2 + 3 · 1/6 = 3/10 + 1 + 1/2 = 1.8.

The variance of this random variable is:

D(X) = E(X²) − [E(X)]² = 0.3 + 2 + 1.5 − 3.24 = 0.56.
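The distribution law in Example 9 can be reproduced exactly with fractions. The sketch below counts combinations (an equivalent alternative to the multiplication rule mentioned in the solution) and then applies property 3 of variance; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

white, black, drawn = 6, 4, 3
ways_total = comb(white + black, drawn)    # all ways to draw 3 balls out of 10

# P(X = k): choose k white balls and drawn - k black balls.
distribution = {
    k: Fraction(comb(white, k) * comb(black, drawn - k), ways_total)
    for k in range(drawn + 1)
}

expectation = sum(k * p for k, p in distribution.items())
mean_of_squares = sum(k * k * p for k, p in distribution.items())
variance = mean_of_squares - expectation ** 2    # property 3 of variance

print(distribution)
print(expectation, float(expectation), variance, float(variance))
```

Using `Fraction` avoids rounding, so the results match the hand calculation exactly (9/5 and 14/25).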

Expectation and variance of a continuous random variable

For a continuous random variable, the mechanical interpretation of the mathematical expectation keeps the same meaning: it is the center of mass of a unit mass distributed continuously along the x-axis with density f(x). Unlike a discrete random variable, whose argument $x_i$ changes in jumps, the argument of a continuous random variable changes continuously. But the mathematical expectation of a continuous random variable is likewise related to its average value.

To find the mathematical expectation and variance of a continuous random variable, you need to evaluate definite integrals: $E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$ and $D(X) = \int_{-\infty}^{+\infty} (x - E(X))^2 f(x)\,dx$. If the density function is given, it enters the integrand directly. If the distribution function is given, differentiate it to find the density function.

The average of all possible values of a continuous random variable, weighted by the density, is called its mathematical expectation, denoted by E(X) or M(X).
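As an illustration of these integrals, the sketch below evaluates them numerically for a hypothetical density f(x) = 2x on the interval [0, 1] (chosen only because the exact answers, E(X) = 2/3 and D(X) = 1/18, are easy to verify by hand). The `integrate` helper is a plain midpoint rule written for this example, not a library function.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x

a, b = 0.0, 1.0
total_mass = integrate(density, a, b)                    # should be close to 1
expectation = integrate(lambda x: x * density(x), a, b)  # E(X)
variance = integrate(lambda x: (x - expectation) ** 2 * density(x), a, b)  # D(X)

print(round(total_mass, 6), round(expectation, 6), round(variance, 6))
```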

The main summary measures of variation in statistics are the variance (dispersion) and the standard deviation.

Dispersion this arithmetic mean squared deviations of each characteristic value from the overall average. The variance is usually called the mean square of deviations and is denoted by  2. Depending on the source data, the variance can be calculated using the simple or weighted arithmetic mean:

 unweighted (simple) variance;

 variance weighted.

Standard deviation this is a generalizing characteristic of absolute sizes variations signs in the aggregate. It is expressed in the same units of measurement as the attribute (in meters, tons, percentage, hectares, etc.).

The standard deviation is the square root of the variance and is denoted by :

 standard deviation unweighted;

 weighted standard deviation.

The standard deviation is a measure of the reliability of the mean. The smaller the standard deviation, the better the arithmetic mean reflects the entire represented population.

The calculation of the standard deviation is preceded by the calculation of the variance.

The procedure for calculating the weighted variance is as follows:

1) determine the weighted arithmetic mean: x̅ = ∑x_i f_i / ∑f_i;

2) calculate the deviations of the variants from the mean: x_i − x̅;

3) square the deviation of each variant from the mean: (x_i − x̅)²;

4) multiply the squared deviations by the weights (frequencies): (x_i − x̅)² f_i;

5) sum the resulting products: ∑(x_i − x̅)² f_i;

6) divide the resulting sum by the sum of the weights: σ² = ∑(x_i − x̅)² f_i / ∑f_i.
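The six steps above can be sketched in a few lines of Python. The function name `weighted_variance` and the small data set are hypothetical and serve only to illustrate the order of the operations.

```python
from math import sqrt

def weighted_variance(values, weights):
    """Weighted variance computed step by step, as in the procedure above."""
    total_weight = sum(weights)
    mean = sum(x * f for x, f in zip(values, weights)) / total_weight  # step 1
    deviations = [x - mean for x in values]                            # step 2
    squared = [d ** 2 for d in deviations]                             # step 3
    weighted = [s * f for s, f in zip(squared, weights)]               # step 4
    return sum(weighted) / total_weight                                # steps 5 and 6

# Hypothetical data (not from the text): three variants and their frequencies.
values = [2, 4, 6]
weights = [1, 2, 1]

variance = weighted_variance(values, weights)
std_dev = sqrt(variance)  # the standard deviation is the square root of the variance
print(variance, std_dev)
```

With these numbers the weighted mean is 4, the weighted squared deviations are 4, 0 and 4, and the variance is 8 / 4 = 2.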

Example 2.1

Let's calculate the weighted arithmetic mean:

The values ​​of deviations from the mean and their squares are presented in the table. Let's define the variance:

The standard deviation will be equal to:

If the source data are presented as an interval distribution series, first determine a discrete value of the characteristic for each interval (its midpoint), and then apply the method described above.

Example 2.2

Let us show the calculation of variance for an interval series using data on the distribution of the sown area of ​​a collective farm according to wheat yield.

The arithmetic mean is:

Let's calculate the variance:

6.3. Calculation of variance using a formula based on individual data

The technique for calculating the variance is laborious, and with large values of the variants and frequencies it can become cumbersome. The calculations can be simplified by using the properties of the dispersion.

The dispersion has the following properties.

1. Reducing or increasing the weights (frequencies) of a varying characteristic by the same factor does not change the dispersion.

2. Decreasing or increasing each value of the characteristic by the same constant A does not change the dispersion.

3. Decreasing or increasing each value of the characteristic by a factor of k decreases or increases the variance by a factor of k², and the standard deviation σ by a factor of k.

4. The dispersion of a characteristic relative to an arbitrary value A exceeds the dispersion relative to the arithmetic mean by the square of the difference between the mean and that arbitrary value:

σ_A² = σ² + (x̅ − A)², where σ_A² = ∑(x_i − A)² / n.

If A = 0, we arrive at the following equality:

σ² = ∑x_i² / n − x̅²,

that is, the variance of the characteristic is equal to the difference between the mean square of the characteristic values and the square of the mean.

Each property can be used independently or in combination with others when calculating variance.
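The four properties can be checked numerically. The sketch below uses hypothetical data (`xs`, `fs`) and hypothetical constants `A` and `k`; it is a spot check on one data set, not a proof.

```python
from math import isclose

def variance(values, weights=None):
    """Variance of the values; weights are optional frequencies."""
    if weights is None:
        weights = [1] * len(values)
    total = sum(weights)
    mean = sum(x * f for x, f in zip(values, weights)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, weights)) / total

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
fs = [2, 1, 4, 1, 2]
A, k = 10, 3

base = variance(xs)
mean = sum(xs) / len(xs)

# Property 1: scaling every frequency by the same factor leaves the variance unchanged.
prop1 = isclose(variance(xs, fs), variance(xs, [k * f for f in fs]))
# Property 2: shifting every value by the constant A leaves the variance unchanged.
prop2 = isclose(variance([x + A for x in xs]), base)
# Property 3: scaling every value by k multiplies the variance by k squared.
prop3 = isclose(variance([k * x for x in xs]), k ** 2 * base)
# Property 4: the mean squared deviation about A exceeds the variance by (mean - A) squared.
about_a = sum((x - A) ** 2 for x in xs) / len(xs)
prop4 = isclose(about_a, base + (mean - A) ** 2)

print(prop1, prop2, prop3, prop4)
```

For this data the variance is 2, the mean is 3, and the mean squared deviation about A = 10 is 51 = 2 + 7².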

With this formula, the procedure for calculating the variance is simple:

1) determine the arithmetic mean: x̅ = ∑x_i / n;

2) square the arithmetic mean: x̅²;

3) square each variant of the series: x_i²;

4) find the sum of the squares of the variants: ∑x_i²;

5) divide the sum of the squares of the variants by their number, i.e. determine the mean square: ∑x_i² / n;

6) determine the difference between the mean square of the characteristic and the square of the mean: σ² = ∑x_i² / n − x̅².
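A minimal sketch of this procedure, using hypothetical values and a hypothetical function name `variance_shortcut`, is given below. As a cross-check it compares the result with `statistics.pvariance` from the Python standard library, which computes the population variance from the definition.

```python
from statistics import pvariance

def variance_shortcut(values):
    """Variance as the mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n                 # step 1
    mean_squared = mean ** 2               # step 2
    squares = [x ** 2 for x in values]     # step 3
    sum_of_squares = sum(squares)          # step 4
    mean_of_squares = sum_of_squares / n   # step 5
    return mean_of_squares - mean_squared  # step 6

# Hypothetical values (not from the text).
values = [2, 4, 4, 4, 5, 5, 7, 9]

shortcut = variance_shortcut(values)
reference = pvariance(values)
print(shortcut, reference)
```

Here the mean is 5, the sum of squares is 232, the mean square is 29, and the variance is 29 − 25 = 4. With floating-point data of very large magnitude this formula can lose precision, so the definition-based calculation may be preferable in that case.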

Example 3.1 The following data is available on worker productivity:

Let's make the following calculations:

According to the sample survey, depositors were grouped according to the size of their deposit in the city’s Sberbank:

Determine:

1) the range of variation;

2) the average deposit size;

3) the average linear deviation;

4) the dispersion;

5) the standard deviation;

6) the coefficient of variation of the deposits.

Solution:

This distribution series contains open intervals. In such series, the width of the first group's interval is conventionally taken to be equal to the width of the next interval, and the width of the last group's interval is taken to be equal to the width of the preceding one.

The width of the second group's interval is 200, so the width of the first group's interval is also 200. The width of the penultimate group's interval is 200, so the last interval also has a width of 200.

1) The range of variation is the difference between the largest and smallest values of the characteristic:

R = x_max − x_min = 1200 − 200 = 1000.

The range of variation in the deposit size is 1000 rubles.

2) The average deposit size is determined using the weighted arithmetic mean formula.

Let us first determine the discrete value of the attribute in each interval. To do this, using the simple arithmetic mean formula, we find the midpoints of the intervals.

The midpoint of the first interval is (200 + 400) / 2 = 300, that of the second is 500, and so on.

Let's enter the calculation results in the table:

Deposit amount, rub.   Number of depositors, f   Middle of the interval, x   xf
200-400                32                        300                         9600
400-600                56                        500                         28000
600-800                120                       700                         84000
800-1000               104                       900                         93600
1000-1200              88                        1100                        96800
Total                  400                       -                           312000

The average deposit in the city's Sberbank is 780 rubles:

x̅ = ∑xf / ∑f = 312000 / 400 = 780.
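The midpoints and the weighted mean can be reproduced with a short script. The interval bounds and frequencies are taken from the table above; the variable names are chosen here for illustration.

```python
# Interval bounds (rubles) and number of depositors, taken from the table above.
intervals = [(200, 400), (400, 600), (600, 800), (800, 1000), (1000, 1200)]
frequencies = [32, 56, 120, 104, 88]

# Discrete value of the characteristic for each interval: its midpoint.
midpoints = [(low + high) / 2 for low, high in intervals]

# Column xf and the weighted arithmetic mean.
products = [x * f for x, f in zip(midpoints, frequencies)]
mean_deposit = sum(products) / sum(frequencies)

print(midpoints)
print(sum(products), mean_deposit)  # 312000.0 780.0
```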

3) The average linear deviation is the arithmetic mean of the absolute deviations of the individual values of the characteristic from the overall mean:

d̅ = ∑|x − x̅| f / ∑f.

The procedure for calculating the average linear deviation in an interval distribution series is as follows:

1. Calculate the weighted arithmetic mean x̅, as shown in item 2).

2. Determine the absolute deviations from the mean: |x − x̅|.

3. Multiply the absolute deviations by the frequencies: |x − x̅| f.

4. Find the sum of the weighted deviations, ignoring the sign: ∑|x − x̅| f.

5. Divide the sum of the weighted deviations by the sum of the frequencies: d̅ = ∑|x − x̅| f / ∑f.

It is convenient to use the calculation data table:

Deposit amount, rub.   Number of depositors, f   Middle of the interval, x   x − x̅   |x − x̅|   |x − x̅| f
200-400                32                        300                         -480     480       15360
400-600                56                        500                         -280     280       15680
600-800                120                       700                         -80      80        9600
800-1000               104                       900                         120      120       12480
1000-1200              88                        1100                        320      320       28160
Total                  400                       -                           -        -         81280

The average linear deviation of the size of the deposit of Sberbank clients is 203.2 rubles.
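The same calculation can be sketched in Python, using the midpoints and frequencies from the table. The variable names are illustrative only.

```python
from math import isclose

# Midpoints (rubles) and frequencies from the calculation table.
midpoints = [300, 500, 700, 900, 1100]
frequencies = [32, 56, 120, 104, 88]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

abs_deviations = [abs(x - mean) for x in midpoints]               # |x - mean|
weighted = [d * f for d, f in zip(abs_deviations, frequencies)]   # |x - mean| * f
mean_abs_deviation = sum(weighted) / n                            # average linear deviation

print(sum(weighted), mean_abs_deviation)
```

The sum of the weighted absolute deviations should come out as 81280, which gives 81280 / 400 = 203.2 rubles.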

4) Dispersion is the arithmetic mean of the squared deviations of each attribute value from the arithmetic mean.

The variance in an interval distribution series is calculated using the formula:

σ² = ∑(x − x̅)² f / ∑f.

The procedure for calculating the variance in this case is as follows:

1. Determine the weighted arithmetic mean x̅, as shown in item 2).

2. Find the deviations from the mean: x − x̅.

3. Square the deviation of each variant from the mean: (x − x̅)².

4. Multiply the squared deviations by the weights (frequencies): (x − x̅)² f.

5. Sum the resulting products: ∑(x − x̅)² f.

6. Divide the resulting sum by the sum of the weights (frequencies): σ² = ∑(x − x̅)² f / ∑f.

Let's put the calculations in a table:

Deposit amount, rub.   Number of depositors, f   Middle of the interval, x   x − x̅   (x − x̅)²   (x − x̅)² f
200-400                32                        300                         -480     230400     7372800
400-600                56                        500                         -280     78400      4390400
600-800                120                       700                         -80      6400       768000
800-1000               104                       900                         120      14400      1497600
1000-1200              88                        1100                        320      102400     9011200
Total                  400                       -                           -        -          23040000
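The table columns, the variance and the standard deviation can be reproduced with the sketch below. It uses the midpoints and frequencies from the table; the variable names are illustrative.

```python
from math import sqrt

# Midpoints (rubles) and frequencies from the calculation table.
midpoints = [300, 500, 700, 900, 1100]
frequencies = [32, 56, 120, 104, 88]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

squared_dev = [(x - mean) ** 2 for x in midpoints]                   # (x - mean)^2
weighted_sq = [s * f for s, f in zip(squared_dev, frequencies)]      # (x - mean)^2 * f

variance = sum(weighted_sq) / n  # 23040000 / 400
std_dev = sqrt(variance)

print(sum(weighted_sq), variance, std_dev)  # 23040000.0 57600.0 240.0
```

The standard deviation is expressed in rubles, the same unit as the deposits, while the variance is in squared rubles.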