Statistics-1


LESSON 4
MEASURES OF DISPERSION


Why dispersion?
Measures of central tendency (Mean, Median, Mode, etc.) indicate the central position of a series. They indicate the general magnitude of the data but fail to reveal all the peculiarities and characteristics of the series. In other words, they fail to reveal the degree of spread or the extent of variability among the individual items of the distribution. This can be explained by certain other measures, known as ‘Measures of Dispersion’ or Variation.
We can understand variation with the help of the following example :

----------------------------------------------
Series I        Series II        Series III
----------------------------------------------
10              2                10
10              8                12
10              20               8
----------------------------------------------
∑X = 30         ∑X = 30          ∑X = 30
----------------------------------------------

In all three series, the value of arithmetic mean is 10. On the basis of this average, we can say that the series are alike. If we carefully examine the composition of three series, we find the following differences:
(i) In case of 1st series, three items are equal; but in 2nd and 3rd series, the items are unequal and do not follow any specific order.
(ii) The magnitude of deviation, item-wise, is different for the 1st, 2nd and 3rd series. But all these deviations cannot be ascertained if the value of simple mean is taken into consideration.
(iii) Although the arithmetic mean of all three series is 10, their medians may differ. This can be understood as follows:

I                   II                 III
10                2                  8
10 Median     8 Median       10 Median
10                20                 12

The value of the Median is 10 in the 1st series, 8 in the 2nd series and 10 in the 3rd series. Therefore, the values of the Mean and the Median are not identical in every series: in the 2nd series the Mean is 10 but the Median is 8.
(iv) Even though the average remains the same, the nature and extent of the distribution of the size of the items may vary. In other words, the structure of the frequency distributions may differ even though their means are identical.
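The comparison above can be checked with a few lines of Python. This is only an illustrative sketch using the three series from the table; the dictionary keys and the `largest_minus_smallest` label are names chosen here, not part of the lesson.

```python
from statistics import mean, median

# The three series from the table above.
series = {
    "I": [10, 10, 10],
    "II": [2, 8, 20],
    "III": [10, 12, 8],
}

summary = {}
for name, values in series.items():
    summary[name] = {
        "mean": mean(values),
        "median": median(values),
        # Largest minus smallest item: a first, crude look at the spread.
        "largest_minus_smallest": max(values) - min(values),
    }
    print(name, summary[name])
```

All three series report a mean of 10, while the median and the spread differ, which is the point made in (i) to (iv).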


What is Dispersion?
The simplest meaning that can be attached to the word ‘dispersion’ is a lack of uniformity in the sizes or quantities of the items of a group or series. According to Reiglemen, “Dispersion is the extent to which the magnitudes or quantities of the items differ, the degree of diversity.” The word dispersion may also be used to indicate the spread of the data.
In all these definitions, we can find the basic property of dispersion as a value that indicates the extent to which all other values are dispersed about the central value in a particular distribution.

Properties of a good measure of Dispersion
There are certain pre-requisites for a good measure of dispersion:
1. It should be simple to understand.
2. It should be easy to compute.
3. It should be rigidly defined.
4. It should be based on each individual item of the distribution.
5. It should be capable of further algebraic treatment.
6. It should have sampling stability.
7. It should not be unduly affected by the extreme items.

Types of Dispersion
The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series gives the marks of students in a particular subject, the absolute dispersion will also be a value in marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of absolute dispersion.
‘Relative’ or ‘Coefficient’ of dispersion is the ratio or the percentage of a measure of absolute dispersion to an appropriate average. The basic advantage of this measure is that two or more series can be compared with each other despite the fact they are expressed in different units. Theoretically, ‘Absolute measure’ of dispersion is better. But from a practical point of view, relative or coefficient of dispersion is considered better as it is used to make comparison between series.

Methods of Dispersion
Methods of studying dispersion are divided into two types :
(i) Mathematical Methods: We can study the ‘degree’ and ‘extent’ of variation by these methods. In this category, commonly used measures of dispersion are :
(a) Range
(b) Quartile Deviation
(c) Average Deviation
(d) Standard deviation and coefficient of variation.
(ii) Graphic Methods: Where we want to study only the extent of variation, i.e., whether it is higher or lower, a Lorenz curve is used.

Mathematical Methods

(a) Range
It is the simplest method of studying dispersion. Range is the difference between the smallest value and the largest value of a series. While computing range, we do not take into account frequencies of different groups.
Formula: Absolute Range = L – S
Coefficient of Range = $\frac{L - S}{L + S}$
where, L represents largest value in a distribution
S represents smallest value in a distribution
We can understand the computation of range with the help of examples of different series:
(i) Raw Data: Marks out of 50 in a subject of 12 students in a class are given as follows:
12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12 and 35.
In the example, the highest marks obtained by a candidate are ‘35’ and the lowest marks obtained by a candidate are ‘12’. Therefore, we can calculate the range:
L = 35 and S = 12
Absolute Range = L – S = 35 – 12 = 23 marks
Coefficient of Range = $\frac{L - S}{L + S} = \frac{35 - 12}{35 + 12} = \frac{23}{47} = 0.489$
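The same computation can be sketched in Python. The function name `range_measures` is a hypothetical helper introduced here for illustration; it simply applies the two formulas given above.

```python
def range_measures(values):
    """Return (absolute range, coefficient of range) for raw data."""
    largest, smallest = max(values), min(values)
    absolute = largest - smallest
    coefficient = (largest - smallest) / (largest + smallest)
    return absolute, coefficient

# Marks of the 12 students from the raw-data example.
marks = [12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12, 35]
absolute, coefficient = range_measures(marks)
print(absolute, round(coefficient, 3))  # 23 0.489
```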

(ii) Discrete Series

-----------------------------------------------------------
Marks of the students in           No. of students
Statistics (out of 50)
         (X)                             (f)
-----------------------------------------------------------
Smallest    10                            4
            12                           10
            18                           16
Largest     20                           15
-----------------------------------------------------------
                                   Total = 45
-----------------------------------------------------------

Absolute Range = L – S = 20 – 10 = 10 marks
Coefficient of Range = $\frac{L - S}{L + S} = \frac{20 - 10}{20 + 10} = \frac{10}{30} = 0.333$

(iii) Continuous Series

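------------------------------------------
      X             Frequencies
------------------------------------------
   10 – 15               4
   15 – 20              10
   20 – 25              26
   25 – 30               8
------------------------------------------

S = 10 (lower limit of the first class) and L = 30 (upper limit of the last class)
Absolute Range = L – S = 30 – 10 = 20 marks
Coefficient of Range = $\frac{L - S}{L + S} = \frac{30 - 10}{30 + 10} = \frac{20}{40} = 0.5$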
Range is the simplest method of studying dispersion, and both the ‘absolute’ and the ‘relative’ range take little time to compute. However, range does not take into account all the values of a series, i.e., it considers only the extreme items, and the middle items are not given any importance. Therefore, range cannot tell us anything about the character of the distribution. Range also cannot be computed in the case of an ‘open-end’ distribution, i.e., a distribution where the lower limit of the first group and the upper limit of the last group are not given.
The concept of range is useful in the field of quality control and to study the variations in the prices of the shares etc.

(b) Quartile Deviations (Q.D.)
The concept of ‘Quartile Deviation’ takes into account only the values of the ‘Upper quartile’ (Q3) and the ‘Lower quartile’ (Q1). It is based on the ‘inter-quartile range’ and is also called the ‘semi-inter-quartile range’. It is a better method when we are interested in knowing the range within which a certain proportion of the items fall.
‘Quartile Deviation’ can be obtained as :
(i) Inter-quartile range = Q3 – Q1
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2}$
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

Calculation of Inter-quartile Range, semi-quartile Range and Coefficient of Quartile Deviation in case of Raw Data
Suppose the values of X are : 20, 12, 18, 25, 32, 10
In case of quartile deviation, it is necessary to calculate the values of Q1 and Q3 by arranging the given data in ascending or descending order.
Therefore, the arranged data are (in ascending order):
X = 10, 12, 18, 20, 25, 32
No. of items = 6
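Q1 = the value of the $\frac{N + 1}{4}$th item = the value of the $\frac{6 + 1}{4}$th item = 1.75th item
= the value of 1st item + 0.75 (value of 2nd item – value of 1st item)
= 10 + 0.75 (12 – 10) = 10 + 0.75(2) = 10 + 1.50 = 11.50
Q3 = the value of the $\frac{3(N + 1)}{4}$th item
= the value of 3(7/4)th item = the value of 5.25th item
= the value of 5th item + 0.25 (value of 6th item – value of 5th item)
= 25 + 0.25 (32 – 25) = 25 + 0.25 (7) = 25 + 1.75 = 26.75

Therefore,
(i) Inter-quartile range = Q3 – Q1 = 26.75 – 11.50 = 15.25
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{15.25}{2} = 7.625$
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{15.25}{26.75 + 11.50} = \frac{15.25}{38.25} = 0.399$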
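A minimal Python sketch of this calculation follows. It relies on `statistics.quantiles` from the standard library with `method="exclusive"`, which places the quartiles at the (N + 1)/4 and 3(N + 1)/4 positions used above; other libraries may use different conventions and give slightly different quartiles. The function name and dictionary keys are illustrative choices.

```python
from statistics import quantiles

def quartile_measures(values):
    """Quartile-based measures of dispersion for raw data."""
    # method="exclusive" interpolates at the i*(N + 1)/4-th items.
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return {
        "Q1": q1,
        "Q3": q3,
        "inter_quartile_range": iqr,
        "semi_quartile_range": iqr / 2,
        "coefficient": iqr / (q3 + q1),
    }

result = quartile_measures([20, 12, 18, 25, 32, 10])
print(result)
```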

Calculation of Inter-quartile Range, semi-quartile Range and Coefficient of Quartile Deviation in discrete series
Suppose a series consists of the salaries (Rs.) and number of the workers in a factory:

----------------------------------------

Salaries (Rs.)     No. of workers

----------------------------------------
60                    4
100                  20
120                  21
140                  16
160                  9

----------------------------------------

In the problem, we will first compute the values of Q3 and Q1

-------------------------------------------------------------------------------------
Salaries (Rs.)      No. of workers      Cumulative frequencies
(x)                 (f)                 (c.f.)
-------------------------------------------------------------------------------------
60                  4                   4
100                 20                  24   (Q1 lies in this cumulative frequency)
120                 21                  45
140                 16                  61   (Q3 lies in this cumulative frequency)
160                 9                   70
-------------------------------------------------------------------------------------
                    N = ∑f = 70
-------------------------------------------------------------------------------------

Calculation of Q1 :
Q1 = size of the $\frac{N + 1}{4}$th item = size of the $\frac{70 + 1}{4}$th item = 17.75th item
17.75 lies in the cumulative frequency 24, which corresponds to the value Rs. 100.
Q1 = Rs. 100

Calculation of Q3 :
Q3 = size of the $\frac{3(N + 1)}{4}$th item = size of the $\frac{3(70 + 1)}{4}$th item = 53.25th item
53.25 lies in the cumulative frequency 61, which corresponds to the value Rs. 140.
Q3 = Rs. 140

(i) Inter-quartile range = Q3 – Q1 = Rs. 140 – Rs. 100 = Rs. 40
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{140 - 100}{2}$ = Rs. 20
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{140 - 100}{140 + 100} = \frac{40}{240} = 0.167$
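The cumulative-frequency lookup used for a discrete series can be sketched as follows. The helper `item_from_cumulative` is a hypothetical name; it returns the first value whose cumulative frequency reaches the required item position, which is how Q1 and Q3 were located above.

```python
from itertools import accumulate

def item_from_cumulative(values, frequencies, position):
    """Return the value whose cumulative frequency first reaches `position`."""
    for value, cumulative in zip(values, accumulate(frequencies)):
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the total frequency")

salaries = [60, 100, 120, 140, 160]
workers = [4, 20, 21, 16, 9]
n = sum(workers)                                                # 70

q1 = item_from_cumulative(salaries, workers, (n + 1) / 4)       # 17.75th item
q3 = item_from_cumulative(salaries, workers, 3 * (n + 1) / 4)   # 53.25th item

inter_quartile_range = q3 - q1
semi_quartile_range = inter_quartile_range / 2
coefficient_qd = inter_quartile_range / (q3 + q1)
print(q1, q3, inter_quartile_range, semi_quartile_range, round(coefficient_qd, 3))
```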

Calculation of Inter-quartile range, semi-quartile range and Coefficient of Quartile Deviation in case of continuous series
We are given the following data :

---------------------------------------------

Salaries (Rs.)           No. of Workers

---------------------------------------------

10 – 20                   4
20 – 30                   6
30 – 40                   10
40 – 50                   5

--------------------------------------------

Total = 25
In this example, the values of Q3 and Q1 are obtained as follows:

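---------------------------------------------------------------
Salaries (Rs.)      No. of workers      Cumulative frequencies
(x)                 (f)                 (c.f.)
---------------------------------------------------------------
10 – 20             4                   4
20 – 30             6                   10
30 – 40             10                  20
40 – 50             5                   25
---------------------------------------------------------------
                    N = ∑f = 25
---------------------------------------------------------------

Q1 = $l_1 + \frac{\frac{N}{4} - cf_0}{f} \times i$
Therefore, $\frac{N}{4} = \frac{25}{4} = 6.25$. It lies in the cumulative frequency 10, which corresponds to class 20 – 30.
Therefore, Q1 group is 20 – 30.
where, l1 = 20, f = 6, i = 10, and cf0 = 4
Q1 = $20 + \frac{6.25 - 4}{6} \times 10 = 20 + 3.75$ = Rs. 23.75
Q3 = $l_1 + \frac{\frac{3N}{4} - cf_0}{f} \times i$
Therefore, $\frac{3N}{4} = \frac{3 \times 25}{4}$ = 18.75, which lies in the cumulative frequency 20, which corresponds to class 30 – 40. Therefore, Q3 group is 30 – 40.
where, l1 = 30, i = 10, cf0 = 10, and f = 10
Q3 = $30 + \frac{18.75 - 10}{10} \times 10 = 30 + 8.75$ = Rs. 38.75
Therefore :
(i) Inter-quartile range = Q3 – Q1 = Rs. 38.75 – Rs. 23.75 = Rs. 15.00
(ii) Semi-quartile range = $\frac{Q_3 - Q_1}{2} = \frac{15}{2}$ = Rs. 7.50
(iii) Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{38.75 - 23.75}{38.75 + 23.75} = \frac{15}{62.5} = 0.24$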
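For a continuous series, the interpolation inside the quartile class can be sketched in Python as below. `grouped_quantile` is an illustrative helper, not a library function; its `target` argument is N/4 for Q1 and 3N/4 for Q3, and the class limits are passed as (lower, upper) pairs.

```python
def grouped_quantile(classes, frequencies, target):
    """Interpolate within the class whose cumulative frequency first reaches `target`.

    Implements l1 + (target - cf0) / f * i, where cf0 is the cumulative
    frequency of the classes before the selected one and i is its width.
    """
    cf0 = 0
    for (lower, upper), f in zip(classes, frequencies):
        if cf0 + f >= target:
            return lower + (target - cf0) / f * (upper - lower)
        cf0 += f
    raise ValueError("target exceeds the total frequency")

classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 6, 10, 5]
n = sum(frequencies)                                   # 25

q1 = grouped_quantile(classes, frequencies, n / 4)     # target 6.25
q3 = grouped_quantile(classes, frequencies, 3 * n / 4) # target 18.75
coefficient_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, q3 - q1, (q3 - q1) / 2, coefficient_qd)
```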

Advantages of Quartile Deviation
Some of the important advantages are :
(i) It is easy to calculate. We are required simply to find the values of Q1 and Q3 and then apply the formulae for the absolute quartile deviation and the coefficient of quartile deviation.
(ii) It gives better results than the range method. While calculating range, we consider only the extreme values, which makes the measure of dispersion erratic; in the case of quartile deviation, we take into account the middle 50% of the items.
(iii) The quartile deviation is not affected by the extreme items.
Disadvantages
(i) It is completely dependent on the central items. If these values are irregular and abnormal the result is bound to be affected.

(ii) All the items of the frequency distribution are not given equal importance in finding the values of Q1 and Q3.
(iii) Because it does not take into account all the items of the series, it is considered to be inaccurate.
Similarly, sometimes we calculate a percentile range, say between the 90th and 10th percentiles, as it gives a slightly better measure of dispersion in certain cases.
(i) Absolute percentile range = P90 – P10.
(ii) Coefficient of percentile range = $\frac{P_{90} - P_{10}}{P_{90} + P_{10}}$
This method of calculating dispersion can generally be applied in the case of open-end series, where the extreme values are not given importance.

(c) Average Deviation
Average deviation is defined as the value obtained by taking the average of the deviations of the various items from a measure of central tendency (Mean, Median or Mode), ignoring negative signs. Generally, the measure of central tendency from which the deviations are taken is specified in the problem. If nothing is mentioned regarding the measure of central tendency, then the deviations are taken from the median, because the sum of the deviations (after ignoring negative signs) is minimum when they are taken from the median.
Computation in case of raw data
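(i) Absolute Average Deviation about Mean or Median or Mode = $\frac{\sum |d|}{N}$
where: N = Number of observations,
|d| = deviations taken from Mean or Median or Mode ignoring signs.
(ii) Coefficient of A.D. = $\frac{\text{Average Deviation about Mean or Median or Mode}}{\text{Mean or Median or Mode}}$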
Steps to Compute Average Deviation :
(i) Calculate the value of Mean or Median or Mode
(ii) Take deviations from the given measure of central tendency and denote them by d.
(iii) Ignore the negative signs of the deviations, denote the resulting values by |d| and add them to find ∑|d|.
(iv) Apply the formula to get Average Deviation about Mean or Median or Mode.
Example : Suppose the values are 5, 5, 10, 15, 20. We want to calculate Average Deviation and Coefficient of Average Deviation about Mean or Median or Mode.
Solution : Average Deviation about mean (Absolute and Coefficient).

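Mean = $\frac{\sum X}{N} = \frac{55}{5} = 11$, where N = 5 and ∑X = 55

--------------------------------------------------------------------------------------------
(X)             Deviation from mean (d)          Deviations after ignoring signs | d |
--------------------------------------------------------------------------------------------
5               – 6                              6
5               – 6                              6
10              – 1                              1
15              + 4                              4
20              + 9                              9
--------------------------------------------------------------------------------------------
∑X = 55                                          ∑| d | = 26
--------------------------------------------------------------------------------------------

Average Deviation about Mean = $\frac{\sum |d|}{N} = \frac{26}{5} = 5.2$
Coefficient of Average Deviation about Mean = $\frac{5.2}{11} = 0.473$

Average Deviation (Absolute and Coefficient) about Median
Median = 10 (the 3rd item of the ordered values). The deviations from the median, ignoring signs, are 5, 5, 0, 5 and 10, so ∑| d | = 25.
Average Deviation about Median = $\frac{\sum |d|}{N} = \frac{25}{5} = 5$
Coefficient of Average Deviation about Median = $\frac{5}{10} = 0.5$

Average Deviation (Absolute and Coefficient) about Mode
Mode = 5 (the value that occurs most often). The deviations from the mode, ignoring signs, are 0, 0, 5, 10 and 15, so ∑| d | = 30.
Average Deviation about Mode = $\frac{\sum |d|}{N} = \frac{30}{5} = 6$
Coefficient of Average Deviation about Mode = $\frac{6}{5} = 1.2$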
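The steps for raw data can be sketched in Python as follows. The function `average_deviation` and the `results` dictionary are illustrative names; the central values come from the standard-library `statistics` module.

```python
from statistics import mean, median, mode

def average_deviation(values, centre):
    """Average of the absolute deviations of `values` from `centre`."""
    return sum(abs(x - centre) for x in values) / len(values)

values = [5, 5, 10, 15, 20]
centres = {"mean": mean(values), "median": median(values), "mode": mode(values)}

results = {}
for label, centre in centres.items():
    ad = average_deviation(values, centre)
    # Coefficient of A.D. = A.D. divided by the central value it was taken from.
    results[label] = (centre, ad, ad / centre)
    print(label, centre, ad, round(ad / centre, 3))
```

Note that the average deviation is smallest (5.0) when taken about the median, as stated in the definition above.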
Average deviation in case of discrete and continuous series
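Average Deviation about Mean or Median or Mode = $\frac{\sum f|d|}{N}$
where N = No. of items (∑f),
f = frequencies of the different values or class-intervals,
|d| = deviations from Mean or Median or Mode after ignoring signs.
Coefficient of A.D. about Mean or Median or Mode = $\frac{\text{A.D. about Mean or Median or Mode}}{\text{Mean or Median or Mode}}$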
Example: Suppose we want to calculate coefficient of Average Deviation about Mean from the following discrete series:

X      Frequency
10    5
15    10
20    15
25    10
30     5

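Solution: First of all, we shall calculate the value of the arithmetic mean.
Calculation of Arithmetic Mean
$\bar{X} = \frac{\sum fX}{N} = \frac{50 + 150 + 300 + 250 + 150}{45} = \frac{900}{45} = 20$

The deviations from the mean, ignoring signs, are 10, 5, 0, 5 and 10, so ∑f| d | = 50 + 50 + 0 + 50 + 50 = 200.
Average Deviation about Mean = $\frac{\sum f|d|}{N} = \frac{200}{45} = 4.44$
Coefficient of Average Deviation about Mean = $\frac{4.44}{20} = 0.222$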
Now suppose we want to calculate the coefficient of Average Deviation about Median from the following data:

Class Interval     Frequency
10 – 14               5
15 – 19               10
20 – 24               15

25 – 29               10
30 – 34               5

First of all we shall calculate the value of the Median, but for this it is necessary to find the ‘real limits’ of the given class-intervals. This is done by subtracting 0.5 from all the lower limits and adding 0.5 to all the upper limits of the given classes. Hence, the real limits are: 9.5 – 14.5, 14.5 – 19.5, 19.5 – 24.5, 24.5 – 29.5 and 29.5 – 34.5.
Calculation of Median
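The worked table for this example is not reproduced in the text, so the following Python sketch only follows the stated procedure: it finds the median from the real class limits by the usual interpolation, then takes frequency-weighted absolute deviations of the mid-points from that median. All variable names are illustrative.

```python
real_lower_limits = [9.5, 14.5, 19.5, 24.5, 29.5]
width = 5
frequencies = [5, 10, 15, 10, 5]
n = sum(frequencies)                                    # 45

# Median = l1 + (N/2 - cf0) / f * i, located through cumulative frequencies.
cf0 = 0
for lower, f in zip(real_lower_limits, frequencies):
    if cf0 + f >= n / 2:
        median = lower + (n / 2 - cf0) / f * width
        break
    cf0 += f

# Mid-points represent each class when deviations are taken.
mid_points = [lower + width / 2 for lower in real_lower_limits]
sum_f_abs_d = sum(f * abs(m - median) for m, f in zip(mid_points, frequencies))

average_deviation = sum_f_abs_d / n
coefficient_ad = average_deviation / median
print(median, sum_f_abs_d, round(average_deviation, 2), round(coefficient_ad, 3))
```

Under these assumptions the median works out to 22, ∑f|d| to 200, the average deviation to about 4.44 and its coefficient to about 0.202.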

Advantages of Average Deviations
1. Average deviation takes into account all the items of a series and hence, it provides sufficiently representative results.
2. It simplifies calculations since all signs of the deviations are taken as positive.

3. Average Deviation may be calculated either by taking deviations from Mean or Median or Mode.
4. Average Deviation is not affected by extreme items.
5. It is easy to calculate and understand.
6. Average deviation is used to make healthy comparisons.
Disadvantages of Average Deviations
1. It is illogical and mathematically unsound to assume all negative signs as positive signs.
2. Because the method is not mathematically sound, the results obtained by this method are not reliable.
3. This method is unsuitable for making comparisons either of the series or structure of the series.
This method is more effective in reports presented to the general public or to groups who are not familiar with statistical methods.

(d) Standard Deviation
The standard deviation, which is denoted by the Greek letter σ (read as sigma), is extremely useful in judging the representativeness of the mean. The concept of standard deviation, which was introduced by Karl Pearson, has practical significance because it is free from the defects that exist in the range, quartile deviation and average deviation.
Standard deviation is calculated as the square root of the average of the squared deviations taken from the actual mean. It is also called the root mean square deviation. The square of the standard deviation, i.e., σ², is called the ‘variance’.
Calculation of standard deviation in case of raw data
There are four ways of calculating standard deviation for raw data:
(i) When actual values are considered;
(ii) When deviations are taken from actual mean;
(iii) When deviations are taken from assumed mean; and
(iv) When ‘step deviations’ are taken from assumed mean.

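(i) When the actual values are considered:
$\sigma = \sqrt{\frac{\sum X^2}{N} - (\bar{X})^2}$
or $\sigma^2 = \frac{\sum X^2}{N} - (\bar{X})^2$
where, N = Number of the items,
X = Given values of the series,
$\bar{X}$ = Arithmetic mean of the series
We can also write the formula as follows :
$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}$ where, $\bar{X} = \frac{\sum X}{N}$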
Steps to calculate σ
(i) Compute simple mean of the given values,
(ii) Square the given values and aggregate them
(iii) Apply the formula to find the value of standard deviation
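Example: Suppose the values are given as 2, 4, 6, 8, 10. We want to apply the formula

$\sigma = \sqrt{\frac{\sum X^2}{N} - (\bar{X})^2}$
Solution: We are required to calculate the values of N, $\bar{X}$ and ∑X². They are calculated as follows :

X      X²
2      4
4      16
6      36
8      64
10     100
N = 5     ∑X = 30     ∑X² = 220

$\bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6$
$\sigma = \sqrt{\frac{220}{5} - (6)^2} = \sqrt{44 - 36} = \sqrt{8} = 2.828$
Variance $\sigma^2 = \frac{\sum X^2}{N} - (\bar{X})^2$
= 44 – 36 = 8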

(ii) When the deviations are taken from actual mean
$\sigma = \sqrt{\frac{\sum x^2}{N}}$ where, N = no. of items and x = (X – $\bar{X}$)
Steps to Calculate σ
(i) Compute the deviations of given values from actual mean i.e., (X – $\bar{X}$) and represent them by x.
(ii) Square these deviations and aggregate them to obtain ∑x².
(iii) Use the formula, $\sigma = \sqrt{\frac{\sum x^2}{N}}$
Example : We are given values as 2, 4, 6, 8, 10. We want to find out standard deviation.

X                      (X – $\bar{X}$) = x          x²
2                       2 – 6 = – 4             (– 4)² = 16
4                       4 – 6 = – 2             (– 2)² = 4
6                       6 – 6 = 0               (0)² = 0
8                       8 – 6 = + 2             (2)² = 4
10                      10 – 6 = + 4            (4)² = 16
N = 5                                           ∑x² = 40

$\sigma = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.828$

(iii) When the deviations are taken from assumed mean
$\sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2}$
where, N = no. of items,
dx = deviations from assumed mean i.e., (X – A).
A = assumed mean
Steps to Calculate :
(i) We consider any value as assumed mean. The value may be given in the series or may not be given in the series.
(ii) We take deviations from the assumed value i.e., (X – A), to obtain dx for the series and aggregate them to find ∑dx.
(iii) We square these deviations to obtain dx2 and aggregate them to find ∑dx2.
(iv) Apply the formula given above to find standard deviation.

Example : Suppose the values are given as 2, 4, 6, 8 and 10. We can obtain the standard deviation as:

-----------------------------------------------------------------
                       X             dx = (X – A)             dx²
-----------------------------------------------------------------
                       2            – 2 = (2 – 4)                4
assumed mean (A)       4              0 = (4 – 4)                0
                       6            + 2 = (6 – 4)                4
                       8            + 4 = (8 – 4)                16
                      10            + 6 = (10 – 4)               36
-----------------------------------------------------------------
                   N = 5          ∑dx = 10              ∑dx² = 60
-----------------------------------------------------------------

$\sigma = \sqrt{\frac{60}{5} - \left(\frac{10}{5}\right)^2} = \sqrt{12 - 4} = \sqrt{8} = 2.828$

(iv) When step deviations are taken from assumed mean
$\sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2} \times i$
where, i = common factor, N = number of items, dx (step-deviations) = $\frac{X - A}{i}$
Steps to Calculate :
(i) We consider any value as assumed mean from the given values or from outside.
(ii) We take deviation from the assumed mean i.e. (X – A).
(iii) We divide the deviations obtained in step (ii) with a common factor to find step deviations and represent them as dx and aggregate them to obtain ∑dx.
(iv) We square the step deviations to obtain dx2 and aggregate them to find ∑dx2.
Example : We continue with the same example to understand the computation of Standard Deviation.

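X          d = (X – A)      dx = d / i, with i = 2      dx²
2          – 2              – 1                         1
A = 4        0                0                         0
6          + 2              + 1                         1
8          + 4              + 2                         4
10         + 6              + 3                         9

N = 5                       ∑dx = 5                     ∑dx² = 15

$\sigma = \sqrt{\frac{\sum dx^2}{N} - \left(\frac{\sum dx}{N}\right)^2} \times i$ where N = 5, i = 2, ∑dx = 5, and ∑dx² = 15
$\sigma = \sqrt{\frac{15}{5} - \left(\frac{5}{5}\right)^2} \times 2 = \sqrt{3 - 1} \times 2 = 1.414 \times 2 = 2.828$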
Note : The value of the standard deviation is identical by all four methods. Therefore, any of the four formulae can be applied to find the value of standard deviation, but the suitability of a formula depends on the magnitude of the items in a question.
Coefficient of Standard Deviation = $\frac{\sigma}{\bar{X}}$
In the above example, σ = 2.828 and $\bar{X}$ = 6
Therefore, coefficient of standard deviation = $\frac{2.828}{6} = 0.471$
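The agreement of the four methods can be verified with a short Python sketch. The assumed mean A = 4 and the common factor i = 2 are taken from the examples above; the variable names are illustrative.

```python
from math import sqrt, isclose

values = [2, 4, 6, 8, 10]
n = len(values)
mean = sum(values) / n                                   # 6.0

# (i) Actual values: sqrt(sum(X^2)/N - mean^2)
sd_actual = sqrt(sum(x * x for x in values) / n - mean ** 2)

# (ii) Deviations from the actual mean: sqrt(sum(x^2)/N)
sd_from_mean = sqrt(sum((x - mean) ** 2 for x in values) / n)

# (iii) Deviations from an assumed mean A
A = 4
dx = [x - A for x in values]
sd_assumed = sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2)

# (iv) Step deviations from A with common factor i
i = 2
steps = [(x - A) / i for x in values]
sd_step = sqrt(sum(s * s for s in steps) / n - (sum(steps) / n) ** 2) * i

methods = [sd_actual, sd_from_mean, sd_assumed, sd_step]
all_equal = all(isclose(sd, sd_actual) for sd in methods)
coefficient_sd = sd_actual / mean
print([round(sd, 3) for sd in methods], all_equal, round(coefficient_sd, 3))
```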

Coefficient of Variation or C.V.
= $\frac{\sigma}{\bar{X}} \times 100$
Generally, the coefficient of variation is used to compare two or more series. If the coefficient of variation (C.V.) is higher for one series than for another, that series has more variation and is less stable or consistent in its composition. If the coefficient of variation is lower than that of the other series, the series is more stable or consistent. From the point of view of consistency, the series with the lower coefficient of variation or coefficient of standard deviation is the better one.
Example : Suppose we want to compare two firms where the salaries of the employees are given as follows:

                              Firm A     Firm B
No. of workers                100        100
Mean salary (Rs.)             100        80
Standard deviation (Rs.)      40         45

Solution : We can compare these firms either with the help of coefficient of standard deviation or coefficient of variation. If we use coefficient of variation, then we shall apply the formula :
C.V. = $\frac{\sigma}{\bar{X}} \times 100$

Firm A : $\bar{X}$ = 100, σ = 40, so C.V. = $\frac{40}{100} \times 100$ = 40%
Firm B : $\bar{X}$ = 80, σ = 45, so C.V. = $\frac{45}{80} \times 100$ = 56.25%

Because the coefficient of variation is lower for firm A than for firm B, firm A is less variable and more stable.
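The comparison of the two firms can be sketched in Python as below. `coefficient_of_variation` is an illustrative helper that applies the C.V. formula; the firm figures are those of the example.

```python
def coefficient_of_variation(mean, standard_deviation):
    """C.V. as a percentage: standard deviation relative to the mean."""
    return standard_deviation * 100 / mean

firms = {"A": (100, 40), "B": (80, 45)}   # (mean salary, standard deviation) in Rs.
cv = {name: coefficient_of_variation(m, sd) for name, (m, sd) in firms.items()}

# The firm with the lowest C.V. is the more consistent one.
most_stable = min(cv, key=cv.get)
print(cv, most_stable)
```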
Calculation of standard-deviation in discrete and continuous series
We use the same formula for calculating standard deviation for a discrete series and a continuous series. The only difference is that in a discrete series, values and frequencies are given whereas in a continuous series, class-intervals and frequencies are given. When the mid-points of these class-intervals are obtained, a continuous series takes shape of a discrete series. X denotes values in a discrete series and mid points in a continuous series.
When the deviations are taken from actual mean
We use the same formula for calculating standard deviation for a discrete and a continuous series:

$\sigma = \sqrt{\frac{\sum fx^2}{N}}$
where N = Number of items (∑f),
f = Frequencies corresponding to different values or class-intervals,
x = Deviations from actual mean (X – $\bar{X}$),
X = Values in a discrete series and mid-points in a continuous series.

Steps to calculate σ
(i) Compute the arithmetic mean by applying the required formula.
(ii) Take deviations from the arithmetic mean and represent these deviations by x.
(iii) Square the deviations to obtain values of x².
(iv) Multiply the frequencies of different class-intervals with x² to find fx². Aggregate the fx² column to obtain ∑fx².
(v) Apply the formula to obtain the value of standard deviation.
If we want to calculate variance, then we can compute $\sigma^2 = \frac{\sum fx^2}{N}$
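Example : We can understand the procedure by taking an example in which N = 45 and ∑fx² = 1500 :

$\sigma = \sqrt{\frac{\sum fx^2}{N}}$ where, N = 45, ∑fx² = 1500
$\sigma = \sqrt{\frac{1500}{45}} = \sqrt{33.33} = 5.77$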
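The data table for this example is not shown in the text. The sketch below assumes, as a labelled guess, that it is the discrete series used earlier for average deviation (X = 10, 15, 20, 25, 30 with frequencies 5, 10, 15, 10, 5), because that series reproduces N = 45 and ∑fx² = 1500. The variable names are illustrative.

```python
from math import sqrt

# ASSUMPTION: this series is borrowed from the earlier average-deviation example;
# it matches the totals N = 45 and sum(f * x^2) = 1500 quoted in the text.
values = [10, 15, 20, 25, 30]
frequencies = [5, 10, 15, 10, 5]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(values, frequencies)) / n

# x = X - mean; aggregate f * x^2 over all values.
sum_fx2 = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies))

variance = sum_fx2 / n
sigma = sqrt(variance)
print(n, mean, sum_fx2, round(sigma, 2))
```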
When the deviations are taken from assumed mean
In some cases, the value of the simple mean may be a fraction; then it becomes time consuming to take deviations from it and square them. Alternatively, we can take deviations from an assumed mean.
$\sigma = \sqrt{\frac{\sum f\,dx^2}{N} - \left(\frac{\sum f\,dx}{N}\right)^2}$
where N = Number of the items (∑f),
dx = deviations from assumed mean (X – A),
f = frequencies of the different groups,
A = assumed mean and
X = Values or mid-points.
Steps to calculate σ
(i) Take the assumed mean from the given values or mid-points.
(ii) Take deviations from the assumed mean and represent them by dx.
(iii) Square the deviations to get dx².
(iv) Multiply f with dx of different groups to obtain fdx and add them up to get ∑fdx.
(v) Multiply f with dx² of different groups to obtain fdx² and add them up to get ∑fdx².
(vi) Apply the formula to get the value of standard deviation.

When step deviations are taken from assumed mean
$\sigma = \sqrt{\frac{\sum f\,dx^2}{N} - \left(\frac{\sum f\,dx}{N}\right)^2} \times i$
where i = common factor and dx = step deviations, i.e., $\frac{X - A}{i}$
Steps to calculate σ
(i) Take deviations from the assumed mean of the calculated mid-points and divide all deviations by a common factor (i) and represent these values by dx.
(ii) Square these step deviations dx to obtain dx² for different groups.
(iii) Multiply f with dx of different groups to find fdx and add them to obtain ∑fdx.
(iv) Multiply f with dx² of different groups to find fdx² for different groups and add them to obtain ∑fdx².
(v) Apply the formula to find standard deviation.

Advantages of Standard Deviation
(i) Standard deviation is the best measure of dispersion because it takes into account all the items and is capable of further algebraic treatment and statistical analysis.
(ii) It is possible to calculate the combined standard deviation of two or more series.
(iii) This measure is most suitable for making comparisons among two or more series about variability.
Disadvantages
(i) It is difficult to compute.
(ii) It assigns more weights to extreme items and less weights to items that are nearer to mean. It is because of this fact that the squares of the deviations which are large in size would be proportionately greater than the squares of those deviations which are comparatively small.

Mathematical properties of standard deviation (σ)
(i) If deviations of given items are taken from the arithmetic mean and squared, then the sum of the squared deviations is minimum, i.e., $\sum (X - \bar{X})^2$ = Minimum.
(ii) If different values are increased or decreased by a constant, the standard deviation will remain the same. If different values are multiplied or divided by a constant, then the standard deviation will be multiplied or divided by that constant.
(iii) Combined standard deviation can be obtained for two or more series with the formula given below:
$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$
where:

N1 represents number of items in first series,
N2 represents number of items in second series,
$\sigma_1^2$ represents variance of first series,
$\sigma_2^2$ represents variance of second series,
d1 represents the difference between $\bar{X}_1$ and $\bar{X}_{12}$, i.e., $d_1 = \bar{X}_1 - \bar{X}_{12}$,
d2 represents the difference between $\bar{X}_2$ and $\bar{X}_{12}$, i.e., $d_2 = \bar{X}_2 - \bar{X}_{12}$,
$\bar{X}_1$ represents arithmetic mean of first series,
$\bar{X}_2$ represents arithmetic mean of second series,
$\bar{X}_{12}$ represents combined arithmetic mean of both the series.

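Example : Find the combined standard deviation of two series from the information given below :
                              First Series              Second Series
No. of items                    10                         15
Arithmetic mean                 15                         20
Standard deviation               4                          5

Solution : Since we are considering two series, the combined standard deviation is computed by the following formula :

$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$
where: N1 = 10, N2 = 15, $\bar{X}_1$ = 15, $\bar{X}_2$ = 20, σ1 = 4, σ2 = 5
$\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{(10 \times 15) + (15 \times 20)}{10 + 15}$
or $\bar{X}_{12} = \frac{150 + 300}{25} = \frac{450}{25} = 18$
$d_1 = \bar{X}_1 - \bar{X}_{12} = 15 - 18 = -3$ and $d_2 = \bar{X}_2 - \bar{X}_{12} = 20 - 18 = 2$
By applying the formula of combined standard deviation, we get :
$\sigma_{12} = \sqrt{\frac{10(4)^2 + 15(5)^2 + 10(-3)^2 + 15(2)^2}{10 + 15}}$
$= \sqrt{\frac{160 + 375 + 90 + 60}{25}}$
$= \sqrt{\frac{685}{25}} = \sqrt{27.4} = 5.23$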
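The combined standard deviation formula can be sketched in Python as follows. `combined_mean_and_sd` is an illustrative helper name; it takes the size, mean and standard deviation of each of the two series.

```python
from math import sqrt

def combined_mean_and_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined arithmetic mean and standard deviation of two series."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    combined_variance = (
        n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2
    ) / (n1 + n2)
    return combined_mean, sqrt(combined_variance)

combined_mean, combined_sd = combined_mean_and_sd(10, 15, 4, 15, 20, 5)
print(combined_mean, round(combined_sd, 2))  # 18.0 5.23
```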
(iv) Standard deviation of the first N natural numbers can be computed as :
$\sigma = \sqrt{\frac{N^2 - 1}{12}}$ where, N represents number of items.
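As a quick check of this property, the formula can be compared with a direct computation in Python. The choice N = 10 is arbitrary, and `statistics.pstdev` (the population standard deviation, dividing by N) is used for the direct value.

```python
from math import sqrt, isclose
from statistics import pstdev

n = 10                                      # arbitrary illustrative choice
by_formula = sqrt((n ** 2 - 1) / 12)
by_definition = pstdev(range(1, n + 1))     # natural numbers 1, 2, ..., n

print(round(by_formula, 4), round(by_definition, 4))
```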

(v) For a symmetrical (normal) distribution
$\bar{X} \pm \sigma$ covers 68.27% of items.
$\bar{X} \pm 2\sigma$ covers 95.45% of items.
$\bar{X} \pm 3\sigma$ covers 99.73% of items.

Example : You are heading a rationing department in a State affected by food shortage. Local investigators submit the following report:

Daily calorie value of food available per adult during current period :
Area      Mean       Standard deviation
A         2,500      400
B         2,000      200

The estimated requirement of an adult is taken at 2,800 calories daily and the absolute minimum is 1,350. Comment on the reported figures, and determine which area, in your opinion, needs more urgent attention.
Solution : We know that $\bar{X} \pm \sigma$ covers 68.27% of items, $\bar{X} \pm 2\sigma$ covers 95.45% of items and $\bar{X} \pm 3\sigma$ covers 99.73% of items. In the given problem, if we take into consideration 99.73% of items, i.e., almost the whole population, the limits would be $\bar{X} \pm 3\sigma$.

For Area A these limits are :
$\bar{X} + 3\sigma$ = 2,500 + (3 × 400) = 3,700
$\bar{X} - 3\sigma$ = 2,500 – (3 × 400) = 1,300
For Area B these limits are :
$\bar{X} + 3\sigma$ = 2,000 + (3 × 200) = 2,600
$\bar{X} - 3\sigma$ = 2,000 – (3 × 200) = 1,400

It is clear from above limits that in Area A there are some persons who are getting 1300 calories, i.e. below the minimum which is 1,350. But in case of area B there is no one who is getting less than the minimum. Hence area A needs more urgent attention.
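The comparison of the two areas can be sketched in Python. The helper `three_sigma_limits` and the dictionary layout are illustrative; the figures are those reported in the example.

```python
def three_sigma_limits(mean, standard_deviation):
    """Lower and upper limits of mean ± 3σ (about 99.73% of items)."""
    return mean - 3 * standard_deviation, mean + 3 * standard_deviation

absolute_minimum = 1350
areas = {"A": (2500, 400), "B": (2000, 200)}   # (mean, standard deviation) in calories

below_minimum = {}
for name, (mean, sd) in areas.items():
    lower, upper = three_sigma_limits(mean, sd)
    # An area needs attention if its lower limit falls under the absolute minimum.
    below_minimum[name] = lower < absolute_minimum
    print(name, lower, upper, below_minimum[name])
```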

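(vi) The relationship between quartile deviation, average deviation and standard deviation is given (approximately, for a symmetrical or moderately skewed distribution) as:
Quartile deviation = 2/3 Standard deviation
Average deviation = 4/5 Standard deviation
(vii) We can also compute corrected standard deviation by using the following formula :
$\text{Correct } \sigma = \sqrt{\frac{\text{Corrected } \sum X^2}{N} - (\text{Corrected } \bar{X})^2}$
(a) Compute corrected $\bar{X} = \frac{\text{Corrected } \sum X}{N}$
where, corrected ∑X = ∑X + correct items – wrong items
where, ∑X = N$\bar{X}$.
(b) Compute corrected ∑X² = ∑X² + (Each correct item)² – (Each wrong item)²
where, ∑X² = Nσ² + N$\bar{X}^2$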

Example : (a) Find out the coefficient of variation of a series for which the following results are given :
N = 50, X’ = 25, X’2 = 500 where: X’ = deviation from the assumed average 5.
(b) For a frequency distribution of marks in statistics of 100 candidates, (grouped in class inervals of 0 – 10, 10 – 20) the mean and standard deviation were found to be 45 and 20. Later it was discovered that the score 54 was misread as 64 in obtaining frequency distribution. Find out the correct mean and correct standard deviation of the frequency destribution.
(c) Can coefficient of variation be greater than 100%? If so, when?
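Solution : (a) We want to calculate the coefficient of variation, which is $\text{C.V.} = \dfrac{\sigma}{\bar{X}} \times 100$

Therefore, we are required to calculate mean and standard deviation.
Calculation of simple mean

$\bar{X} = A + \dfrac{\sum X'}{N}$, where, $A = 5$, $N = 50$, $\sum X' = 25$
$\bar{X} = 5 + \dfrac{25}{50} = 5.5$

Calculation of standard deviation
$\sigma = \sqrt{\dfrac{\sum X'^2}{N} - \left(\dfrac{\sum X'}{N}\right)^2} = \sqrt{\dfrac{500}{50} - \left(\dfrac{25}{50}\right)^2} = \sqrt{10 - 0.25} = \sqrt{9.75} = 3.1225$
Calculation of Coefficient of variation
$\text{C.V.} = \dfrac{3.1225}{5.5} \times 100 = 56.77\%$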
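The calculation in part (a) starts from deviations about an assumed average rather than from the raw items. A minimal Python sketch of that route is given below; the function name `cv_from_assumed_mean` and its parameter names are illustrative, and the inputs are the totals given in the question (N = 50, assumed average 5, sum of deviations 25, sum of squared deviations 500).

```python
import math


def cv_from_assumed_mean(n, assumed_mean, sum_dev, sum_dev_sq):
    """Mean, standard deviation and C.V. (%) from deviations about an assumed mean."""
    mean = assumed_mean + sum_dev / n
    # The standard deviation does not depend on the choice of origin.
    sd = math.sqrt(sum_dev_sq / n - (sum_dev / n) ** 2)
    cv = sd / mean * 100
    return mean, sd, cv


mean, sd, cv = cv_from_assumed_mean(n=50, assumed_mean=5, sum_dev=25, sum_dev_sq=500)
print(f"mean = {mean:.2f}, sd = {sd:.4f}, C.V. = {cv:.2f}%")
```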

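(b) Given $\bar{X} = 45$, $\sigma = 20$, $N = 100$, wrong value = 64, correct value = 54
Since this is a case of continuous series, we will apply the formulae for mean and standard deviation that are applicable in a continuous series.

Calculation of correct Mean
$\bar{X} = \dfrac{\sum fX}{N}$ or $N\bar{X} = \sum fX$
By substituting the values, we get $\sum fX = 100 \times 45 = 4500$
Correct $\sum fX = 4500 - 64 + 54 = 4490$
∴ Correct $\bar{X} = \dfrac{4490}{100} = 44.9$

Calculation of correct σ
$\sigma = \sqrt{\dfrac{\sum fX^2}{N} - \bar{X}^2}$ or $\sigma^2 = \dfrac{\sum fX^2}{N} - \bar{X}^2$
where, $\sigma = 20$, $N = 100$, $\bar{X} = 45$
$(20)^2 = \dfrac{\sum fX^2}{100} - (45)^2$
or $400 = \dfrac{\sum fX^2}{100} - 2025$
or $400 + 2025 = \dfrac{\sum fX^2}{100}$
or $2425 \times 100 = \sum fX^2 = 242500$
∴ Correct $\sum fX^2 = 242500 - (64)^2 + (54)^2 = 242500 - 4096 + 2916 = 242500 - 1180 = 241320$
Correct $\sigma = \sqrt{\dfrac{241320}{100} - (44.9)^2} = \sqrt{2413.2 - 2016.01} = \sqrt{397.19} = 19.93$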

(c) The formula for the computation of the coefficient of variation is $\text{C.V.} = \dfrac{\sigma}{\bar{X}} \times 100$
Hence, the coefficient of variation can be greater than 100% only when the value of the standard deviation is greater than the value of the mean.
This will happen when the data contain a large number of small items and a few items that are quite large. In such a case the value of the simple mean will be pulled down and the value of the standard deviation will go up. Similarly, if there are negative items in a series, the value of the mean will come down, whereas the standard deviation will not be reduced, because the deviations are squared.
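A small numerical illustration of part (c) may help. The data set below is hypothetical (it does not come from the lesson): nine small items and one large item, chosen so that the arithmetic is easy to follow. The sketch uses Python's standard `statistics` module, with `pstdev` giving the standard deviation with N as the divisor.

```python
import statistics

# Hypothetical data: many small items and one very large item.
data = [1] * 9 + [91]

mean = statistics.mean(data)   # (9 * 1 + 91) / 10 = 10
sd = statistics.pstdev(data)   # divisor N; sqrt(7290 / 10) = 27
cv = sd / mean * 100

print(f"mean = {mean}, sd = {sd}, C.V. = {cv:.0f}%")
```

Here the standard deviation (27) exceeds the mean (10), so the coefficient of variation comes out well above 100%.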
Example : In a distribution of 10 observations, the value of mean and standard deviation are given as 20 and 8. By mistake, two values are taken as 2 and 6 instead of 4 and 8. Find out the value of correct mean and variance.
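Solution : We are given: $N = 10$, $\bar{X} = 20$, $\sigma = 8$
Wrong values = 2 and 6 and Correct values = 4 and 8
Calculation of correct Mean
$\bar{X} = \dfrac{\sum X}{N}$
$\sum X = 10 \times 20 = 200$
But this $\sum X$ is incorrect. Therefore we shall find the correct $\sum X$.
Correct $\sum X = 200 - 2 - 6 + 4 + 8 = 204$
Correct Mean $= \dfrac{204}{10} = 20.4$
Calculation of correct variance
$\sigma = \sqrt{\dfrac{\sum X^2}{N} - \bar{X}^2}$
or $\sigma^2 = \dfrac{\sum X^2}{N} - \bar{X}^2$
or $(8)^2 = \dfrac{\sum X^2}{10} - (20)^2$
or $64 + 400 = \dfrac{\sum X^2}{10}$
or $\sum X^2 = 4640$
But this is wrong and hence we shall compute the correct $\sum X^2$
Correct $\sum X^2 = 4640 - 2^2 - 6^2 + 4^2 + 8^2$
$= 4640 - 4 - 36 + 16 + 64 = 4680$
Correct $\sigma^2 = \dfrac{4680}{10} - (20.4)^2 = 468 - 416.16 = 51.84$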
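The correction procedure used in the two examples above (recover the totals from the reported mean and standard deviation, swap the wrong items for the correct ones, then recompute) can be sketched in Python as follows. The function name `corrected_mean_and_variance` and its arguments are illustrative, not part of the lesson. The second call uses σ = 8, the value squared in the working of the last example.

```python
import math


def corrected_mean_and_variance(n, mean, sd, wrong, correct):
    """Correct a reported mean and variance after some items were misrecorded.

    n, mean, sd : reported size, mean and standard deviation (divisor N)
    wrong       : the values that were wrongly used
    correct     : the values that should have been used
    """
    sum_x = n * mean                      # reported sum of X
    sum_x2 = n * (sd ** 2 + mean ** 2)    # reported sum of X squared
    sum_x = sum_x - sum(wrong) + sum(correct)
    sum_x2 = sum_x2 - sum(w * w for w in wrong) + sum(c * c for c in correct)
    new_mean = sum_x / n
    new_variance = sum_x2 / n - new_mean ** 2
    return new_mean, new_variance


# Marks example: N = 100, mean 45, sd 20, score 54 misread as 64.
m1, v1 = corrected_mean_and_variance(100, 45, 20, wrong=[64], correct=[54])
print(f"correct mean = {m1:.1f}, correct sd = {math.sqrt(v1):.2f}")

# Ten observations: mean 20, sd 8, values 2 and 6 recorded instead of 4 and 8.
m2, v2 = corrected_mean_and_variance(10, 20, 8, wrong=[2, 6], correct=[4, 8])
print(f"correct mean = {m2:.1f}, correct variance = {v2:.2f}")
```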

Revisionary Problems
Example : Compute (a) Inter-quartile range, (b) Semi-quartile range, and (c) Coefficient of quartile deviation from the following data :

--------------------------------------------------------------------------------------

Farm Size (acres)   No. of farms    Farm Size (acres)    No. of farms

--------------------------------------------------------------------------------------
0 – 40                        394                      161 – 200                     169
41 – 80                      461                       201 – 240                    113
81 – 120                    391                       241 and over               148
121 – 160                  334

-------------------------------------------------------------------------------------

Solution :
In this case, the real limits of the class intervals are obtained by subtracting 0.5 from the lower limits of each class and adding 0.5 to the upper limits of each class. This adjustment is necessary to calculate median and quartiles of the series.

-------------------------------------------------------------------------------

Farm Size (acres)   No. of farms    Cumulative frequency (c.f.)

-------------------------------------------------------------------------------
– 0.5 – 40.5                394                     394
40.5 – 80.5                 461                     855
80.5 – 120.5               391                     1246
120.5 – 160.5             334                     1580
160.5 – 200.5             169                     1749
200.5 – 240.5             113                     1862
240.5 and over          148                     2010

-------------------------------------------------------------------------------

                             N = 2010

-------------------------------------------------------------------------------

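$Q_1$ = Size of $\dfrac{N}{4}$th observation
= $\dfrac{2010}{4}$ = 502.5th observation
$Q_1$ lies in the cumulative frequency of the group 40.5 – 80.5, where $l_1 = 40.5$, $f = 461$, $i = 40$, $cf_0 = 394$, $\dfrac{N}{4} = 502.5$
∴ $Q_1 = l_1 + \dfrac{\frac{N}{4} - cf_0}{f} \times i = 40.5 + \dfrac{502.5 - 394}{461} \times 40 = 40.5 + 9.4 = 49.9$ acres
Similarly, $Q_3$ = Size of $\dfrac{3N}{4}$th observation
= $\dfrac{3 \times 2010}{4}$ = 1507.5th observation
$Q_3$ lies in the cumulative frequency of the group 121 – 160, where the real limits of the class interval are 120.5 – 160.5 and $l_1 = 120.5$, $i = 40$, $f = 334$, $\dfrac{3N}{4} = 1507.5$, $cf_0 = 1246$
∴ $Q_3 = l_1 + \dfrac{\frac{3N}{4} - cf_0}{f} \times i = 120.5 + \dfrac{1507.5 - 1246}{334} \times 40 = 120.5 + 31.3 = 151.8$ acres
Inter-quartile range = $Q_3 - Q_1 = 151.8 - 49.9 = 101.9$ acres
Semi-quartile range = $\dfrac{Q_3 - Q_1}{2} = \dfrac{101.9}{2} = 50.95$ acres
Coefficient of quartile deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{151.8 - 49.9}{151.8 + 49.9} = \dfrac{101.9}{201.7} = 0.505$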
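The interpolation used for the quartiles can be sketched as a small Python function. This is an illustration under stated assumptions, not the lesson's own procedure: the helper name `grouped_quantile` is hypothetical, every class is treated as having width 40, and the open-ended last class is given a nominal lower limit only. Neither quartile falls in the first or last class, so these assumptions do not affect the results here.

```python
def grouped_quantile(lower_limits, width, freqs, position):
    """Interpolate the value at a cumulative position within grouped data.

    Implements l1 + (position - cf0) / f * i, where cf0 is the cumulative
    frequency of the classes before the one containing the position.
    """
    cf0 = 0
    for lower, f in zip(lower_limits, freqs):
        if cf0 + f >= position:
            return lower + (position - cf0) / f * width
        cf0 += f
    raise ValueError("position exceeds total frequency")


# Real lower limits of the classes and their frequencies (farm-size data).
lower_limits = [-0.5, 40.5, 80.5, 120.5, 160.5, 200.5, 240.5]
freqs = [394, 461, 391, 334, 169, 113, 148]
n = sum(freqs)

q1 = grouped_quantile(lower_limits, 40, freqs, n / 4)
q3 = grouped_quantile(lower_limits, 40, freqs, 3 * n / 4)
iqr = q3 - q1
semi_iqr = iqr / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
print(f"semi-quartile range = {semi_iqr:.2f}, coefficient of Q.D. = {coeff_qd:.3f}")
```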

Example : Calculate mean and coefficient of mean deviation about mean from the following data :

----------------------------------------------
Marks less than        No. of students

----------------------------------------------
10                         4
20                         10
30                         20
40                         40
50                         50
60                         56
70                         60

----------------------------------------------

Solution :
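In this question, we are given a less-than type series along with the cumulative frequencies. Therefore, we must first find the class intervals and frequencies before calculating the mean and the coefficient of mean deviation about the mean. Taking the classes as 0 – 10, 10 – 20, ..., 60 – 70, each frequency is the difference between successive cumulative frequencies.

----------------------------------------------------------------------------
Marks        m.p. (X)      f        fX       |D| = |X – 35|      f|D|
----------------------------------------------------------------------------
0 – 10          5           4        20           30              120
10 – 20        15           6        90           20              120
20 – 30        25          10       250           10              100
30 – 40        35          20       700            0                0
40 – 50        45          10       450           10              100
50 – 60        55           6       330           20              120
60 – 70        65           4       260           30              120
----------------------------------------------------------------------------
                        N = 60   ∑fX = 2100                  ∑f|D| = 680
----------------------------------------------------------------------------

Mean $\bar{X} = \dfrac{\sum fX}{N} = \dfrac{2100}{60} = 35$
M.D. about mean = $\dfrac{\sum f|D|}{N} = \dfrac{680}{60} = 11.33$
Coefficient of M.D. about mean = $\dfrac{\text{M.D.}}{\bar{X}} = \dfrac{11.33}{35} = 0.324$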
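The steps for this example (turn the cumulative frequencies into class frequencies, then compute the mean and the mean deviation about it) can be sketched in Python. The sketch assumes equal classes of width 10 with the first class starting at 0, which the question implies but does not state outright. The variable names are illustrative.

```python
# "Less than" limits and cumulative frequencies from the question.
upper_limits = [10, 20, 30, 40, 50, 60, 70]
cum_freqs = [4, 10, 20, 40, 50, 56, 60]
width = 10  # assumption: equal class width, first class is 0 - 10

# Class frequency = difference between successive cumulative frequencies.
freqs = [c - p for c, p in zip(cum_freqs, [0] + cum_freqs[:-1])]
midpoints = [u - width / 2 for u in upper_limits]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
mean_dev = sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / n
coeff_md = mean_dev / mean

print(f"frequencies = {freqs}")
print(f"mean = {mean:.2f}, M.D. = {mean_dev:.2f}, coefficient = {coeff_md:.3f}")
```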
Example : Calculate standard deviation from the following data :

Class Interval       frequency
– 30 to – 20            5
– 20 to – 10            10
– 10 to 0                 15
0 to 10                  10
10 to 20                 5
                            N = 45
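The lesson does not show the working for this example. One way to carry it out is sketched below in Python, using class mid-points and the usual formula with N as the divisor. The variable names are illustrative, and the approach (direct method on mid-points) is an assumption about how the example is meant to be solved.

```python
import math

# (lower limit, upper limit, frequency) for each class in the table above.
classes = [(-30, -20, 5), (-20, -10, 10), (-10, 0, 15), (0, 10, 10), (10, 20, 5)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n
sd = math.sqrt(variance)

print(f"N = {n}, mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```

Negative class limits need no special treatment: the deviations are squared, so the standard deviation stays positive.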

Example : For two firms A and B belonging to same industry, the following details are available :

----------------------------------------------------------------------------------
                                                            Firm A             Firm B

---------------------------------------------------------------------------------
Number of Employees :                                100                200
Average wage per month :                            Rs. 240          Rs. 170
Standard deviation of the wage per month :   Rs. 6             Rs. 8

---------------------------------------------------------------------------------
Find (i) Which firm pays out the larger amount as monthly wages?
(ii) Which firm shows greater variability in the distribution of wages?
(iii) Find the average monthly wage and the standard deviation of wages of all employees of both the firms taken together.

Solution : (i) To find out which firm pays the larger amount, we have to find $\sum X$ for each firm.
$\bar{X} = \dfrac{\sum X}{N}$ or $\sum X = N\bar{X}$

Firm A : $N = 100$, $\bar{X} = 240$, $\sum X = 100 \times 240 = 24000$
Firm B : $N = 200$, $\bar{X} = 170$, $\sum X = 200 \times 170 = 34000$
Hence firm B pays the larger amount as monthly wages.

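(ii) To find out which firm shows greater variability in the distribution of wages, we have to calculate the coefficient of variation, $\text{C.V.} = \dfrac{\sigma}{\bar{X}} \times 100$.
Firm A : C.V. = $\dfrac{6}{240} \times 100 = 2.5\%$
Firm B : C.V. = $\dfrac{8}{170} \times 100 = 4.71\%$
Since the coefficient of variation is greater for firm B, it shows greater variability in the distribution of wages.
(iii) Combined wages : $\bar{X}_{12} = \dfrac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$
where, $N_1 = 100$, $\bar{X}_1 = 240$, $N_2 = 200$, $\bar{X}_2 = 170$
Hence $\bar{X}_{12} = \dfrac{(100 \times 240) + (200 \times 170)}{100 + 200} = \dfrac{58000}{300} = 193.3$

Combined Standard Deviation :
$\sigma_{12} = \sqrt{\dfrac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1d_1^2 + N_2d_2^2}{N_1 + N_2}}$
where $N_1 = 100$, $N_2 = 200$, $\sigma_1 = 6$, $\sigma_2 = 8$, $d_1 = \bar{X}_1 - \bar{X}_{12} = 240 - 193.3 = 46.7$
and $d_2 = \bar{X}_2 - \bar{X}_{12} = 170 - 193.3 = -23.3$
$\sigma_{12} = \sqrt{\dfrac{100(6)^2 + 200(8)^2 + 100(46.7)^2 + 200(-23.3)^2}{100 + 200}}$
$= \sqrt{\dfrac{3600 + 12800 + 218089 + 108578}{300}} = \sqrt{\dfrac{343067}{300}} = \sqrt{1143.56} = 33.82$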
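The combined mean and combined standard deviation for the two firms can be checked with a short Python sketch. The function name `combined_mean_sd` and the tuple layout are illustrative. The function works for any number of groups, but only the two firms from the example are used here.

```python
import math


def combined_mean_sd(groups):
    """Combined mean and standard deviation of several groups.

    groups: list of (n, mean, sd) tuples, with sd computed using divisor N.
    """
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    # Each group contributes its own variance plus the squared distance
    # of its mean from the combined mean (the d terms in the formula).
    variance = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / total_n
    return mean, math.sqrt(variance)


firm_a = (100, 240, 6)  # employees, average wage, sd of wages
firm_b = (200, 170, 8)

mean_12, sd_12 = combined_mean_sd([firm_a, firm_b])
print(f"combined mean = {mean_12:.2f}, combined sd = {sd_12:.2f}")
```

Most of the combined standard deviation comes from the gap between the two firms' average wages, not from the spread within either firm.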

Example : From the following frequency distribution of heights of 360 boys in the age-group 10 – 20 years calculate the :
(i) arithmetic mean;
(ii) coefficient of variation; and
(iii) quartile deviation

----------------------------------------------------------------------------

Height (cms)    No. of boys        Height (cms)      No. of boys

----------------------------------------------------------------------------
126 – 130          31                    146 – 150             60
131 – 135          44                    151 – 155             55
136 – 140          48                    156 – 160             43
141 – 145          51                    161 – 165             28

Solution : Calculation of $\bar{X}$, Q.D., and C.V.

---------------------------------------------------------------------------------------

Heights        m.p.                         (X – 143)/5
                   X              f                dx           fdx          fdx²          c.f.

---------------------------------------------------------------------------------------
126 – 130    128             31            – 3          – 93             279            31
131 – 135    133             44             – 2         – 88            176             75
136 – 140    138             48             – 1         – 48            48              123
141 – 145    143             51               0             0             0               174
146 – 150    148             60             + 1        + 60            60               234
151 – 155    153             55             + 2        + 110         220              289
156 – 160    158             43             + 3        + 129         387               332
161 – 165    163             28             + 4        + 112         448               360

---------------------------------------------------------------------------------------
                N = 360                                 ∑fdx = 182     ∑fdx² = 1618

---------------------------------------------------------------------------------------

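(i) $\bar{X} = A + \dfrac{\sum fdx}{N} \times i$, where, $N = 360$, $A = 143$, $i = 5$, $\sum fdx = 182$
$\bar{X} = 143 + \dfrac{182}{360} \times 5 = 143 + 2.53 = 145.53$ cms
(ii) C.V. = $\dfrac{\sigma}{\bar{X}} \times 100$
$\sigma = \sqrt{\dfrac{\sum fdx^2}{N} - \left(\dfrac{\sum fdx}{N}\right)^2} \times i$
$= \sqrt{\dfrac{1618}{360} - \left(\dfrac{182}{360}\right)^2} \times 5 = \sqrt{4.494 - 0.256} \times 5 = \sqrt{4.238} \times 5 = 10.29$ cms
C.V. = $\dfrac{10.29}{145.53} \times 100 = 7.07\%$
(iii) Q.D. = $\dfrac{Q_3 - Q_1}{2}$
$Q_1$ = Size of $\dfrac{N}{4}$th observation = $\dfrac{360}{4}$ = 90th observation
$Q_1$ lies in the class 136 – 140. But the real limits of this class are 135.5 – 140.5
$Q_1 = 135.5 + \dfrac{90 - 75}{48} \times 5 = 135.5 + 1.56 = 137.06$
$Q_3$ = Size of $\dfrac{3N}{4}$th observation = $\dfrac{3 \times 360}{4}$ = 270th observation
$Q_3$ lies in the class 151 – 155. But the real limits of this class are 150.5 – 155.5
$Q_3 = 150.5 + \dfrac{270 - 234}{55} \times 5 = 150.5 + 3.27 = 153.77$
Q.D. = $\dfrac{153.77 - 137.06}{2} = \dfrac{16.71}{2} = 8.36$ cms (approx.)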
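The step-deviation method used for the mean and standard deviation in this example can be sketched in Python. The variable names are illustrative. The assumed mean A = 143 and the class interval i = 5 are the ones used in the working table above.

```python
import math

# Mid-points and frequencies of the height classes.
midpoints = [128, 133, 138, 143, 148, 153, 158, 163]
freqs = [31, 44, 48, 51, 60, 55, 43, 28]
A, i = 143, 5  # assumed mean and class interval

n = sum(freqs)
# Step deviations dx = (X - A) / i; exact integers here because every
# mid-point differs from A by a multiple of i.
dx = [(x - A) // i for x in midpoints]
sum_fdx = sum(f * d for f, d in zip(freqs, dx))
sum_fdx2 = sum(f * d * d for f, d in zip(freqs, dx))

mean = A + sum_fdx / n * i
sd = math.sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2) * i
cv = sd / mean * 100

print(f"N = {n}, sum fdx = {sum_fdx}, sum fdx^2 = {sum_fdx2}")
print(f"mean = {mean:.2f}, sd = {sd:.2f}, C.V. = {cv:.2f}%")
```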