# Mathematics

## Statistics-1

### 4 LESSON 4 MEASURES OF DISPERSION


**Why dispersion?**

Measures of central tendency (mean, median, mode, etc.) indicate the central position of a series. They indicate the general magnitude of the data but fail to reveal all the peculiarities and characteristics of the series. In other words, they fail to reveal the degree of spread, or the extent of variability, among the individual items of the distribution. This can be explained by certain other measures, known as ‘Measures of Dispersion’ or Variation.

We can understand variation with the help of the following example :

| | Series I | Series II | Series III |
|---|---|---|---|
| | 10 | 2 | 10 |
| | 10 | 8 | 12 |
| | 10 | 20 | 8 |
| **∑X** | **30** | **30** | **30** |

In all three series, the value of arithmetic mean is 10. On the basis of this average, we can say that the series are alike. If we carefully examine the composition of three series, we find the following differences:

(i) In case of 1st series, three items are equal; but in 2nd and 3rd series, the items are unequal and do not follow any specific order.

(ii) The magnitude of deviation, item-wise, is different for the 1st, 2nd and 3rd series. But all these deviations cannot be ascertained if the value of simple mean is taken into consideration.

(iii) In these three series the arithmetic mean is 10 in each case, but the median may differ from one series to another. This can be seen by arranging each series in ascending order:

| | I | II | III |
|---|---|---|---|
| | 10 | 2 | 8 |
| Median | 10 | 8 | 10 |
| | 10 | 20 | 12 |

The value of the median is 10 in the 1st series, 8 in the 2nd series and 10 in the 3rd series. Therefore, the values of the mean and the median are not identical in every series.

(iv) Even though the average remains the same, the nature and extent of the distribution of the size of the items may vary. In other words, the structure of the frequency distributions may differ even though their means are identical.
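The comparison of the three series can be checked with a short Python sketch. It uses only the standard library; the helper name `summarise` is an illustrative assumption and not part of the lesson.

```python
from statistics import mean, median

# The three series from the example above.
series = {
    "I": [10, 10, 10],
    "II": [2, 8, 20],
    "III": [10, 12, 8],
}

def summarise(values):
    """Return (mean, median, range) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)

for name, values in series.items():
    m, med, rng = summarise(values)
    print(f"Series {name}: mean={m}, median={med}, range={rng}")
```

All three series report the same mean, while the median and the spread between the smallest and largest items differ, which is the point the example makes.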

**What is Dispersion?**

The simplest meaning that can be attached to the word ‘dispersion’ is a lack of uniformity in the sizes or quantities of the items of a group or series. According to Reiglemen, “Dispersion is the extent to which the magnitudes or quantities of the items differ, the degree of diversity.” The word dispersion may also be used to indicate the spread of the data.

In all these definitions, we can find the basic property of dispersion as a value that indicates the extent to which all other values are dispersed about the central value in a particular distribution.

**Properties of a good measure of Dispersion**

There are certain pre-requisites for a good measure of dispersion:

1. It should be simple to understand.

2. It should be easy to compute.

3. It should be rigidly defined.

4. It should be based on each individual item of the distribution.

5. It should be capable of further algebraic treatment.

6. It should have sampling stability.

7. It should not be unduly affected by the extreme items.

**Types of Dispersion**

The measures of dispersion can be either ‘absolute’ or ‘relative’. Absolute measures of dispersion are expressed in the same units in which the original data are expressed. For example, if the series is expressed as marks of the students in a particular subject, the absolute dispersion will also be expressed in marks. The only difficulty is that if two or more series are expressed in different units, the series cannot be compared on the basis of absolute dispersion.

‘Relative’ or ‘Coefficient’ of dispersion is the ratio or the percentage of a measure of absolute dispersion to an appropriate average. The basic advantage of this measure is that two or more series can be compared with each other despite the fact they are expressed in different units. Theoretically, ‘Absolute measure’ of dispersion is better. But from a practical point of view, relative or coefficient of dispersion is considered better as it is used to make comparison between series.

**Methods of Dispersion**

Methods of studying dispersion are divided into two types:

**(i) Mathematical Methods:** We can study the ‘degree’ and ‘extent’ of variation by these methods. In this category, the commonly used measures of dispersion are:

(a) Range

(b) Quartile Deviation

(c) Average Deviation

(d) Standard Deviation and Coefficient of Variation

**(ii) Graphic Methods:** Where we want to study only the extent of variation, i.e., whether it is higher or lower, a Lorenz curve is used.

**Mathematical Methods**

**(a) Range**

It is the simplest method of studying dispersion. Range is the difference between the largest value and the smallest value of a series. While computing range, we do not take into account the frequencies of the different groups.

Formula: Absolute Range = L – S

Coefficient of Range = $\dfrac{L - S}{L + S}$

where, L represents the largest value in a distribution

S represents the smallest value in a distribution

We can understand the computation of range with the help of examples of different series,

**(i) Raw Data:** Marks out of 50 in a subject of 12 students in a class are given as follows:

12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12 and 35.

In this example, the highest marks obtained by a candidate are 35 and the lowest marks obtained by a candidate are 12. Therefore, we can calculate the range:

L = 35 and S = 12

Absolute Range = L – S = 35 – 12 = 23 marks

Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{35 - 12}{35 + 12} = \dfrac{23}{47} = 0.49$ approx.

**(ii) Discrete Series**

| Marks of the students in Statistics (out of 50) (X) | No. of students (f) |
|---|---|
| 10 (smallest) | 4 |
| 12 | 10 |
| 18 | 16 |
| 20 (largest) | 15 |
| | **Total = 45** |

Absolute Range = 20 – 10 = 10 marks

Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{20 - 10}{20 + 10} = \dfrac{10}{30} = 0.33$ approx.

**(iii) Continuous Series**

| X | Frequencies |
|---|---|
| 10 – 15 | 4 |
| 15 – 20 | 10 |
| 20 – 25 | 26 |
| 25 – 30 | 8 |

Here S = 10 (lower limit of the first class) and L = 30 (upper limit of the last class).

Absolute Range = L – S = 30 – 10 = 20 marks

Coefficient of Range = $\dfrac{L - S}{L + S} = \dfrac{30 - 10}{30 + 10} = \dfrac{20}{40} = 0.5$

Range is the simplest method of studying dispersion. It takes less time to compute the ‘absolute’ and ‘relative’ range. Range does not take into account all the values of a series, i.e., it considers only the extreme items, and the middle items are not given any importance. Therefore, range cannot tell us anything about the character of the distribution. Range cannot be computed in the case of an ‘open-end’ distribution, i.e., a distribution where the lower limit of the first group and the upper limit of the last group are not given.

The concept of range is useful in the field of quality control and to study the variations in the prices of the shares etc.
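As a minimal sketch of the two range formulas, the following Python function returns the absolute range and the coefficient of range. The function name `range_measures` is an assumption chosen for illustration; for a continuous series the class limits are passed in place of individual values.

```python
def range_measures(values):
    """Return (absolute range, coefficient of range) for an iterable of numbers."""
    largest, smallest = max(values), min(values)
    absolute = largest - smallest                 # L - S
    coefficient = absolute / (largest + smallest) # (L - S) / (L + S)
    return absolute, coefficient

# Raw data: marks of 12 students.
marks = [12, 18, 20, 12, 16, 14, 30, 32, 28, 12, 12, 35]
print(range_measures(marks))

# Continuous series: use the class limits (frequencies are ignored).
class_limits = [(10, 15), (15, 20), (20, 25), (25, 30)]
limits = [x for pair in class_limits for x in pair]
print(range_measures(limits))
```

Frequencies never enter the calculation, which mirrors the remark that range ignores everything except the two extreme items.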

**(b) Quartile Deviations (Q.D.)**

The concept of ‘Quartile Deviation’ takes into account only the values of the ‘upper quartile’ (Q3) and the ‘lower quartile’ (Q1). Quartile Deviation is half of the inter-quartile range and is therefore also called the ‘semi-interquartile range’. It is a better method when we are interested in knowing the range within which a certain proportion of the items fall.

‘Quartile Deviation’ can be obtained as:

(i) Inter-quartile range = Q3 – Q1

(ii) Semi-quartile range = $\dfrac{Q_3 - Q_1}{2}$

(iii) Coefficient of Quartile Deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

**Calculation of Inter-quartile Range, Semi-quartile Range and Coefficient of Quartile Deviation in case of raw data**

Suppose the values of X are: 20, 12, 18, 25, 32, 10

In the case of quartile deviation, it is necessary to calculate the values of Q1 and Q3 after arranging the given data in ascending or descending order.

Therefore, the arranged data are (in ascending order):

X = 10, 12, 18, 20, 25, 32

No. of items = 6

Q1 = the value of the $\dfrac{N+1}{4}$th item $= \dfrac{6+1}{4}$ = 1.75th item

= the value of 1st item + 0.75 (value of 2nd item – value of 1st item)

= 10 + 0.75 (12 – 10) = 10 + 0.75(2) = 10 + 1.50 = 11.50

Q3 = the value of the $\dfrac{3(N+1)}{4}$th item

= the value of the 3(7/4)th item = the value of the 5.25th item

= 25 + 0.25 (32 – 25) = 25 + 0.25 (7) = 26.75

Therefore,

(i) Inter-quartile range = Q3 – Q1 = 26.75 – 11.50 = 15.25

(ii) Semi-quartile range = $\dfrac{Q_3 - Q_1}{2} = \dfrac{15.25}{2} = 7.625$

(iii) Coefficient of Quartile Deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{15.25}{38.25} = 0.40$ approx.
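The positional rule used above can be sketched in Python as follows. The function name `quartile` is hypothetical; the sketch assumes the k(N+1)/4 position rule with linear interpolation between neighbouring items, as in the worked example.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the k(N+1)/4 position rule."""
    data = sorted(values)
    position = k * (len(data) + 1) / 4   # 1-based position, may be fractional
    lower = int(position)                # whole part of the position
    fraction = position - lower          # fractional part used for interpolation
    if lower < 1:
        return data[0]
    if lower >= len(data):
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

x = [20, 12, 18, 25, 32, 10]
q1, q3 = quartile(x, 1), quartile(x, 3)
print("Q1 =", q1, "Q3 =", q3)
print("Inter-quartile range =", q3 - q1)
print("Semi-quartile range =", (q3 - q1) / 2)
print("Coefficient of Q.D. =", (q3 - q1) / (q3 + q1))
```

Other textbooks and software packages use slightly different position rules, so quartiles computed elsewhere may not agree exactly with these values.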

**Calculation of Inter-quartile Range, Semi-quartile Range and Coefficient of Quartile Deviation in discrete series**

Suppose a series consists of the salaries (Rs.) and the number of workers in a factory:

| Salaries (Rs.) | No. of workers |
|---|---|
| 60 | 4 |
| 100 | 20 |
| 120 | 21 |
| 140 | 16 |
| 160 | 9 |

In this problem, we first compute the values of Q3 and Q1.

| Salaries (Rs.) (x) | No. of workers (f) | Cumulative frequencies (c.f.) |
|---|---|---|
| 60 | 4 | 4 |
| 100 | 20 | 24 (Q1 lies in this cumulative frequency) |
| 120 | 21 | 45 |
| 140 | 16 | 61 (Q3 lies in this cumulative frequency) |
| 160 | 9 | 70 |
| | **N = ∑f = 70** | |

**Calculation of Q1:**

Q1 = size of the $\dfrac{N+1}{4}$th item = size of the $\dfrac{70+1}{4}$th item = 17.75th item

17.75 lies in the cumulative frequency 24, which corresponds to the value Rs. 100.

Q1 = Rs. 100

**Calculation of Q3:**

Q3 = size of the $\dfrac{3(N+1)}{4}$th item = size of the $\dfrac{3(70+1)}{4}$th item = 53.25th item

53.25 lies in the cumulative frequency 61, which corresponds to the value Rs. 140.

Q3 = Rs. 140

(i) Inter-quartile range = Q3 – Q1 = Rs. 140 – Rs. 100 = Rs. 40

(ii) Semi-quartile range = $\dfrac{Q_3 - Q_1}{2} = \dfrac{40}{2}$ = Rs. 20

(iii) Coefficient of Quartile Deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{140 - 100}{140 + 100} = \dfrac{40}{240} = 0.167$ approx.
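For a discrete series the quartile is the value whose cumulative frequency first reaches the required position. A minimal sketch follows; the name `discrete_quartile` is an assumption, and the values are assumed to be listed in ascending order.

```python
from itertools import accumulate

def discrete_quartile(values, freqs, k):
    """Value whose cumulative frequency first reaches the k(N+1)/4-th item.

    `values` must be in ascending order, with `freqs` aligned to them.
    """
    n = sum(freqs)
    position = k * (n + 1) / 4
    for value, cum in zip(values, accumulate(freqs)):
        if cum >= position:
            return value
    return values[-1]

salaries = [60, 100, 120, 140, 160]
workers = [4, 20, 21, 16, 9]
q1 = discrete_quartile(salaries, workers, 1)
q3 = discrete_quartile(salaries, workers, 3)
print("Q1 =", q1, "Q3 =", q3)
print("Semi-quartile range =", (q3 - q1) / 2)
print("Coefficient of Q.D. =", (q3 - q1) / (q3 + q1))
```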

**Calculation of Inter-quartile Range, Semi-quartile Range and Coefficient of Quartile Deviation in case of continuous series**

We are given the following data:

| Salaries (Rs.) | No. of workers |
|---|---|
| 10 – 20 | 4 |
| 20 – 30 | 6 |
| 30 – 40 | 10 |
| 40 – 50 | 5 |
| | Total = 25 |

In this example, the values of Q3 and Q1 are obtained as follows:

| Salaries (Rs.) (x) | No. of workers (f) | Cumulative frequencies (c.f.) |
|---|---|---|
| 10 – 20 | 4 | 4 |
| 20 – 30 | 6 | 10 |
| 30 – 40 | 10 | 20 |
| 40 – 50 | 5 | 25 |
| | N = 25 | |

$$Q_1 = l_1 + \frac{\frac{N}{4} - cf_0}{f} \times i$$

Therefore, $\dfrac{N}{4} = \dfrac{25}{4} = 6.25$. It lies in the cumulative frequency 10, which corresponds to the class 20 – 30.

Therefore, the Q1 group is 20 – 30,

where, l1 = 20, f = 6, i = 10, and cf0 = 4

$$Q_1 = 20 + \frac{6.25 - 4}{6} \times 10 = 20 + 3.75 = \text{Rs. } 23.75$$

$$Q_3 = l_1 + \frac{\frac{3N}{4} - cf_0}{f} \times i$$

Therefore, $\dfrac{3N}{4} = \dfrac{75}{4} = 18.75$, which lies in the cumulative frequency 20, which corresponds to the class 30 – 40. Therefore, the Q3 group is 30 – 40,

where, l1 = 30, i = 10, cf0 = 10, and f = 10

$$Q_3 = 30 + \frac{18.75 - 10}{10} \times 10 = 30 + 8.75 = \text{Rs. } 38.75$$

Therefore:

(i) Inter-quartile range = Q3 – Q1 = Rs. 38.75 – Rs. 23.75 = Rs. 15.00

(ii) Semi-quartile range = $\dfrac{Q_3 - Q_1}{2} = \dfrac{15}{2}$ = Rs. 7.50

(iii) Coefficient of Quartile Deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{15}{62.5} = 0.24$
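The interpolation formula for a continuous series can be sketched as below. The function name `grouped_quartile` is illustrative; classes are assumed to be contiguous (lower, upper) pairs in ascending order.

```python
def grouped_quartile(classes, freqs, k):
    """k-th quartile of a continuous series: l1 + ((kN/4 - cf0) / f) * i."""
    n = sum(freqs)
    target = k * n / 4
    cf0 = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf0 + f >= target:
            # lower = l1, (upper - lower) = class width i
            return lower + (target - cf0) / f * (upper - lower)
        cf0 += f
    raise ValueError("quartile position exceeds total frequency")

classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 6, 10, 5]
q1 = grouped_quartile(classes, freqs, 1)
q3 = grouped_quartile(classes, freqs, 3)
print("Q1 =", q1, "Q3 =", q3)
print("Coefficient of Q.D. =", (q3 - q1) / (q3 + q1))
```

The same function gives the median when called with `k = 2`, since the median is the second quartile.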

**Advantages of Quartile Deviation**

Some of the important advantages are :

(i) It is easy to calculate. We are required simply to find the values of Q1 and Q3 and then apply the formulae for the absolute quartile deviation and the coefficient of quartile deviation.

(ii) It gives better results than the range. While calculating range, we consider only the extreme values, which make dispersion erratic; in the case of quartile deviation, we take into account the middle 50% of the items.

(iii) The quartile deviation is not affected by the extreme items.

**Disadvantages**

(i) It is completely dependent on the central items. If these values are irregular and abnormal, the result is bound to be affected.

(ii) All the items of the frequency distribution are not given equal importance in finding the values of Q1 and Q3.

(iii) Because it does not take into account all the items of the series, it is considered to be inaccurate.

Similarly, we sometimes calculate a percentile range, say between the 90th and 10th percentiles, as it gives a slightly better measure of dispersion in certain cases.

(i) Absolute percentile range = P90 – P10.

(ii) Coefficient of percentile range = $\dfrac{P_{90} - P_{10}}{P_{90} + P_{10}}$

This method of calculating dispersion can generally be applied in the case of open-end series, where the extreme values are not considered important.

**(c) Average Deviation**

Average deviation is defined as a value which is obtained by taking the average of the deviations of the various items from a measure of central tendency (Mean, Median or Mode), ignoring negative signs. Generally, the measure of central tendency from which the deviations are taken is specified in the problem. If nothing is mentioned regarding the measure of central tendency, then deviations are taken from the median, because the sum of the deviations (after ignoring negative signs) is minimum when they are taken from the median.

**Computation in case of raw data**

(i) Absolute Average Deviation about Mean or Median or Mode = $\dfrac{\sum |d|}{N}$

where: N = Number of observations,

|d| = deviations taken from Mean or Median or Mode ignoring signs.

(ii) Coefficient of A.D. = $\dfrac{\text{A.D. about Mean or Median or Mode}}{\text{Mean or Median or Mode}}$

**Steps to Compute Average Deviation:**

(i) Calculate the value of Mean or Median or Mode.

(ii) Take deviations from the given measure of central tendency; they are shown as d.

(iii) Ignore the negative signs of the deviations, which are then shown as |d|, and add them to find ∑|d|.

(iv) Apply the formula to get Average Deviation about Mean or Median or Mode.

**Example:** Suppose the values are 5, 5, 10, 15, 20. We want to calculate the Average Deviation and the Coefficient of Average Deviation about Mean or Median or Mode.

**Solution:** Average Deviation about Mean (Absolute and Coefficient).

| (X) | Deviation from mean (d) | Deviations after ignoring signs $\lvert d\rvert$ |
|---|---|---|
| 5 | – 6 | 6 |
| 5 | – 6 | 6 |
| 10 | – 1 | 1 |
| 15 | + 4 | 4 |
| 20 | + 9 | 9 |
| **∑X = 55** | | **∑\|d\| = 26** |

$\bar{X} = \dfrac{\sum X}{N} = \dfrac{55}{5} = 11$, where N = 5 and ∑X = 55

Average Deviation about Mean = $\dfrac{\sum |d|}{N} = \dfrac{26}{5} = 5.2$

Coefficient of Average Deviation about Mean = $\dfrac{5.2}{11} = 0.473$ approx.

**Average Deviation (Absolute and Coefficient) about Median**

The median of 5, 5, 10, 15, 20 is 10, so the deviations ignoring signs are 5, 5, 0, 5, 10 and ∑|d| = 25.

Average Deviation about Median = $\dfrac{25}{5} = 5$

Coefficient of Average Deviation about Median = $\dfrac{5}{10} = 0.5$

**Average Deviation (Absolute and Coefficient) about Mode**

The mode of the series is 5, so the deviations ignoring signs are 0, 0, 5, 10, 15 and ∑|d| = 30.

Average Deviation about Mode = $\dfrac{30}{5} = 6$

Coefficient of Average Deviation about Mode = $\dfrac{6}{5} = 1.2$

**Average deviation in case of discrete and continuous series**

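Average Deviation about Mean or Median or Mode = $\dfrac{\sum f|d|}{N}$

where N = No. of items (∑f),

f = frequencies of the different values or classes,

|d| = deviations from Mean or Median or Mode after ignoring signs.

Coefficient of A.D. about Mean or Median or Mode = $\dfrac{\text{A.D. about Mean or Median or Mode}}{\text{Mean or Median or Mode}}$

**Example:** Suppose we want to calculate the coefficient of Average Deviation about Mean from the following discrete series:

| X | Frequency |
|---|---|
| 10 | 5 |
| 15 | 10 |
| 20 | 15 |
| 25 | 10 |
| 30 | 5 |

**Solution:** First of all, we shall calculate the value of the arithmetic mean.

**Calculation of Arithmetic Mean**

| X | f | fX | $\lvert d\rvert = \lvert X - 20\rvert$ | $f\lvert d\rvert$ |
|---|---|---|---|---|
| 10 | 5 | 50 | 10 | 50 |
| 15 | 10 | 150 | 5 | 50 |
| 20 | 15 | 300 | 0 | 0 |
| 25 | 10 | 250 | 5 | 50 |
| 30 | 5 | 150 | 10 | 50 |
| | N = 45 | ∑fX = 900 | | ∑f\|d\| = 200 |

$\bar{X} = \dfrac{\sum fX}{N} = \dfrac{900}{45} = 20$

Average Deviation about Mean = $\dfrac{\sum f|d|}{N} = \dfrac{200}{45} = 4.44$ approx.

Coefficient of Average Deviation about Mean = $\dfrac{4.44}{20} = 0.222$ approx.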
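The average deviation formulas for raw data and for a frequency distribution can be sketched with one Python function. The name `average_deviation` and its optional `freqs` argument are assumptions made for illustration; when no frequencies are given, each item is counted once.

```python
from statistics import mean, median

def average_deviation(values, centre, freqs=None):
    """Return (A.D., coefficient of A.D.) about the given central value."""
    if freqs is None:
        freqs = [1] * len(values)          # raw data: every item has frequency 1
    n = sum(freqs)
    ad = sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n
    return ad, ad / centre

raw = [5, 5, 10, 15, 20]
print(average_deviation(raw, mean(raw)))    # about the mean
print(average_deviation(raw, median(raw)))  # about the median

x = [10, 15, 20, 25, 30]
f = [5, 10, 15, 10, 5]
weighted_mean = sum(xi * fi for xi, fi in zip(x, f)) / sum(f)
print(average_deviation(x, weighted_mean, f))
```

For a continuous series the same function can be used with the class mid-points passed as `values`.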

In case we want to calculate coefficient of Average Deviation about Median from the following data:

**Class Interval Frequency**

10 – 14 5

15 – 19 10

20 – 24 15

25 – 29 10

30 – 34 5

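First of all, we shall calculate the value of the Median, but it is necessary to find the ‘real limits’ of the given class intervals. This is done by subtracting 0.5 from all the lower limits and adding 0.5 to all the upper limits of the given classes. Hence, the real limits are: 9.5 – 14.5, 14.5 – 19.5, 19.5 – 24.5, 24.5 – 29.5 and 29.5 – 34.5.

**Calculation of Median**

The cumulative frequencies are 5, 15, 30, 40 and 45, so N = 45 and $\dfrac{N}{2} = 22.5$, which lies in the class 19.5 – 24.5. With l1 = 19.5, cf0 = 15, f = 15 and i = 5:

Median = $19.5 + \dfrac{22.5 - 15}{15} \times 5 = 19.5 + 2.5 = 22$

Taking deviations of the mid-points 12, 17, 22, 27 and 32 from the median gives ∑f|d| = 200, so that

Average Deviation about Median = $\dfrac{200}{45} = 4.44$ approx. and Coefficient of Average Deviation about Median = $\dfrac{4.44}{22} = 0.202$ approx.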

**Advantages of Average Deviations**

1. Average deviation takes into account all the items of a series and hence, it provides sufficiently representative results.

2. It simplifies calculations since all signs of the deviations are taken as positive.

3. Average Deviation may be calculated either by taking deviations from Mean or Median or Mode.

4. Average Deviation is less affected by extreme items than the standard deviation.

5. It is easy to calculate and understand.

6. Average deviation is used to make healthy comparisons.

**Disadvantages of Average Deviations**

1. It is illogical and mathematically unsound to assume all negative signs as positive signs.

2. Because the method is not mathematically sound, the results obtained by this method are not reliable.

3. This method is unsuitable for making comparisons either of the series or structure of the series.

This method is more effective during the reports presented to the general public or to groups who are not familiar with statistical methods.

**(d) Standard Deviation**

The standard deviation, which is denoted by the Greek letter σ (read as sigma), is extremely useful in judging the representativeness of the mean. The concept of standard deviation, which was introduced by Karl Pearson, has practical significance because it is free from the defects which exist in the range, quartile deviation and average deviation.

Standard deviation is calculated as the square root of the average of the squared deviations taken from the actual mean. It is also called the root mean square deviation. The square of the standard deviation, i.e., σ^{2}, is called the ‘variance’.

**Calculation of standard deviation in case of raw data**

There are four ways of calculating standard deviation for raw data:

(i) When actual values are considered;

(ii) When deviations are taken from actual mean;

(iii) When deviations are taken from assumed mean; and

(iv) When ‘step deviations’ are taken from assumed mean.

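**(i) When the actual values are considered:**

$$\sigma = \sqrt{\frac{\sum X^{2}}{N} - (\bar{X})^{2}} \quad \text{or} \quad \sigma^{2} = \frac{\sum X^{2}}{N} - (\bar{X})^{2}$$

where, N = Number of the items,

X = Given values of the series,

$\bar{X}$ = Arithmetic mean of the series

We can also write the formula as follows:

$$\sigma = \sqrt{\frac{\sum X^{2}}{N} - \left(\frac{\sum X}{N}\right)^{2}} \quad \text{where } \bar{X} = \frac{\sum X}{N}$$

**Steps to calculate σ**

(i) Compute the simple mean of the given values.

(ii) Square the given values and aggregate them.

(iii) Apply the formula to find the value of standard deviation.

**Example:** Suppose the values are given as 2, 4, 6, 8, 10. We want to apply the formula

$$\sigma = \sqrt{\frac{\sum X^{2}}{N} - (\bar{X})^{2}}$$

**Solution:** We are required to calculate the values of N, $\bar{X}$ and ∑X^{2}. They are calculated as follows:

| X | X^{2} |
|---|---|
| 2 | 4 |
| 4 | 16 |
| 6 | 36 |
| 8 | 64 |
| 10 | 100 |
| **N = 5, ∑X = 30** | **∑X^{2} = 220** |

$\bar{X} = \dfrac{30}{5} = 6$

$\sigma = \sqrt{\dfrac{220}{5} - (6)^{2}} = \sqrt{44 - 36} = \sqrt{8} = 2.828$

Variance $\sigma^{2} = \dfrac{220}{5} - (6)^{2} = 44 - 36 = 8$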

**(ii) When the deviations are taken from actual mean**

$$\sigma = \sqrt{\frac{\sum x^{2}}{N}}$$

where, N = no. of items and x = (X – $\bar{X}$)

**Steps to Calculate σ**

(i) Compute the deviations of the given values from the actual mean, i.e., (X – $\bar{X}$), and represent them by x.

(ii) Square these deviations and aggregate them.

(iii) Use the formula $\sigma = \sqrt{\dfrac{\sum x^{2}}{N}}$.

**Example:** We are given the values 2, 4, 6, 8, 10. We want to find out the standard deviation.

| X | (X – $\bar{X}$) = x | x^{2} |
|---|---|---|
| 2 | 2 – 6 = – 4 | (– 4)^{2} = 16 |
| 4 | 4 – 6 = – 2 | (– 2)^{2} = 4 |
| 6 | 6 – 6 = 0 | 0 |
| 8 | 8 – 6 = + 2 | (2)^{2} = 4 |
| 10 | 10 – 6 = + 4 | (4)^{2} = 16 |
| **N = 5** | | **∑x^{2} = 40** |

$\sigma = \sqrt{\dfrac{40}{5}} = \sqrt{8} = 2.828$

**(iii) When the deviations are taken from assumed mean**

$$\sigma = \sqrt{\frac{\sum dx^{2}}{N} - \left(\frac{\sum dx}{N}\right)^{2}}$$

where, N = no. of items,

dx = deviations from assumed mean, i.e., (X – A),

A = assumed mean

**Steps to Calculate:**

(i) We consider any value as assumed mean. The value may be given in the series or may not be given in the series.

(ii) We take deviations from the assumed value i.e., (X – A), to obtain dx for the series and aggregate them to find ∑dx.

(iii) We square these deviations to obtain dx^{2} and aggregate them to find ∑dx^{2}.

(iv) Apply the formula given above to find standard deviation.

**Example:** Suppose the values are given as 2, 4, 6, 8 and 10. We can obtain the standard deviation as:

| X | dx = (X – A) | dx^{2} |
|---|---|---|
| 2 | – 2 = (2 – 4) | 4 |
| 4 (assumed mean A) | 0 = (4 – 4) | 0 |
| 6 | + 2 = (6 – 4) | 4 |
| 8 | + 4 = (8 – 4) | 16 |
| 10 | + 6 = (10 – 4) | 36 |
| **N = 5** | **∑dx = 10** | **∑dx^{2} = 60** |

$\sigma = \sqrt{\dfrac{60}{5} - \left(\dfrac{10}{5}\right)^{2}} = \sqrt{12 - 4} = \sqrt{8} = 2.828$

**(iv) When step deviations are taken from assumed mean**

$$\sigma = \sqrt{\frac{\sum dx^{2}}{N} - \left(\frac{\sum dx}{N}\right)^{2}} \times i$$

where, i = common factor, N = number of items, dx (step deviations) = $\dfrac{X - A}{i}$

**Steps to Calculate:**

(i) We consider any value as the assumed mean, from the given values or from outside.

(ii) We take deviations from the assumed mean, i.e., (X – A).

(iii) We divide the deviations obtained in step (ii) by a common factor to find step deviations, represent them as dx and aggregate them to obtain ∑dx.

(iv) We square the step deviations to obtain dx^{2} and aggregate them to find ∑dx^{2}.

**Example:** We continue with the same example to understand the computation of Standard Deviation.

| X | d = (X – A) | dx = d / i, with i = 2 | dx^{2} |
|---|---|---|---|
| 2 | – 2 | – 1 | 1 |
| 4 (A) | 0 | 0 | 0 |
| 6 | + 2 | + 1 | 1 |
| 8 | + 4 | + 2 | 4 |
| 10 | + 6 | + 3 | 9 |
| N = 5 | | ∑dx = 5 | ∑dx^{2} = 15 |

$\sigma = \sqrt{\dfrac{\sum dx^{2}}{N} - \left(\dfrac{\sum dx}{N}\right)^{2}} \times i$, where N = 5, i = 2, ∑dx = 5, and ∑dx^{2} = 15

$\sigma = \sqrt{\dfrac{15}{5} - \left(\dfrac{5}{5}\right)^{2}} \times 2 = \sqrt{3 - 1} \times 2 = 1.414 \times 2 = 2.828$

Note: We can notice an important point, namely that the value of the standard deviation is identical by all four methods. Therefore, any of the four formulae can be applied to find the value of standard deviation. But the suitability of a formula depends on the magnitude of the items in a question.
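The agreement of the four methods can be checked numerically. The sketch below is an illustration only: the helper `sd_assumed_mean` is a hypothetical name, and the assumed mean A = 4 and common factor i = 2 are the ones used in the examples above.

```python
from math import sqrt, isclose

values = [2, 4, 6, 8, 10]
n = len(values)
mean = sum(values) / n

# (i) Actual values: sqrt(sum(X^2)/N - mean^2)
sd_actual = sqrt(sum(x * x for x in values) / n - mean ** 2)

# (ii) Deviations from the actual mean: sqrt(sum(x^2)/N)
sd_mean_dev = sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_assumed_mean(data, a, i=1):
    """Methods (iii) and (iv): deviations from assumed mean a, divided by i."""
    dx = [(x - a) / i for x in data]
    m = len(dx)
    return sqrt(sum(d * d for d in dx) / m - (sum(dx) / m) ** 2) * i

sd_assumed = sd_assumed_mean(values, a=4)     # (iii) assumed mean
sd_step = sd_assumed_mean(values, a=4, i=2)   # (iv) step deviations

print(sd_actual, sd_mean_dev, sd_assumed, sd_step)
```

Any other choice of assumed mean or common factor should give the same result, because the correction term in the formula removes the effect of the shift and the factor i restores the scale.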

Coefficient of Standard Deviation = $\dfrac{\sigma}{\bar{X}}$

In the above example, σ = 2.828 and $\bar{X}$ = 6

Therefore, coefficient of standard deviation = $\dfrac{2.828}{6} = 0.471$

**Coefficient of Variation or C.V.**

C.V. = $\dfrac{\sigma}{\bar{X}} \times 100$

Generally, the coefficient of variation is used to compare two or more series. If the coefficient of variation (C.V.) is higher for one series than for another, that series shows more variation and less stability or consistency in its composition. If the coefficient of variation is lower than that of the other series, the series is more stable or consistent. The series with the lower coefficient of variation or coefficient of standard deviation is therefore regarded as the better one.

**Example:** Suppose we want to compare two firms where the salaries of the employees are given as follows:

| | Firm A | Firm B |
|---|---|---|
| No. of workers | 100 | 100 |
| Mean salary (Rs.) | 100 | 80 |
| Standard deviation (Rs.) | 40 | 45 |

**Solution:** We can compare these firms either with the help of the coefficient of standard deviation or the coefficient of variation. If we use the coefficient of variation, then we apply the formula C.V. = $\dfrac{\sigma}{\bar{X}} \times 100$:

| Firm A | Firm B |
|---|---|
| $\bar{X}$ = 100, σ = 40 | $\bar{X}$ = 80, σ = 45 |
| C.V. = $\dfrac{40}{100} \times 100 = 40\%$ | C.V. = $\dfrac{45}{80} \times 100 = 56.25\%$ |

Because the coefficient of variation is lower for firm A than for firm B, firm A is less variable and more stable.

**Calculation of standard-deviation in discrete and continuous series**

We use the same formula for calculating standard deviation for a discrete series and a continuous series. The only difference is that in a discrete series, values and frequencies are given, whereas in a continuous series, class intervals and frequencies are given. When the mid-points of these class intervals are obtained, a continuous series takes the shape of a discrete series. X denotes values in a discrete series and mid-points in a continuous series.

**When the deviations are taken from actual mean**

$$\sigma = \sqrt{\frac{\sum f x^{2}}{N}}$$

where N = Number of items (∑f)

f = Frequencies corresponding to different values or class intervals.

x = Deviations from actual mean (X – $\bar{X}$).

X = Values in a discrete series and mid-points in a continuous series.

**Steps to calculate σ**

(i) Compute the arithmetic mean by applying the required formula.

(ii) Take deviations from the arithmetic mean and represent these deviations by x.

(iii) Square the deviations to obtain values of x^{2}.

(iv) Multiply the frequencies of different class intervals with x^{2} to find fx^{2}. Aggregate the fx^{2} column to obtain ∑fx^{2}.

(v) Apply the formula to obtain the value of standard deviation.

If we want to calculate variance, then we can compute $\sigma^{2} = \dfrac{\sum f x^{2}}{N}$.

**Example:** We can understand the procedure by taking an example:

$\sigma = \sqrt{\dfrac{\sum f x^{2}}{N}}$ where, N = 45, ∑fx^{2} = 1500

$\sigma = \sqrt{\dfrac{1500}{45}} = \sqrt{33.33} = 5.77$

**When the deviations are taken from assumed mean**

In some cases, the value of the simple mean may be fractional; it then becomes time consuming to take deviations and square them. Alternatively, we can take deviations from an assumed mean.

$$\sigma = \sqrt{\frac{\sum f\,dx^{2}}{N} - \left(\frac{\sum f\,dx}{N}\right)^{2}}$$

where N = Number of the items,

dx = deviations from assumed mean (X – A),

f = frequencies of the different groups,

A = assumed mean and

X = Values or mid-points.

**Steps to calculate σ**

(i) Take the assumed mean from the given values or mid-points.

(ii) Take deviations from the assumed mean and represent them by dx.

(iii) Square the deviations to get dx^{2}.

(iv) Multiply f with dx of different groups to obtain fdx and add them up to get ∑fdx.

(v) Multiply f with dx^{2} of different groups to obtain fdx^{2} and add them up to get ∑fdx^{2}.

(vi) Apply the formula to get the value of standard deviation.

**Steps to calculate σ (when step deviations are taken from assumed mean)**

In this case the formula becomes $\sigma = \sqrt{\dfrac{\sum f\,dx^{2}}{N} - \left(\dfrac{\sum f\,dx}{N}\right)^{2}} \times i$, where i is the common factor.

(i) Take deviations of the calculated mid-points from the assumed mean, divide all deviations by a common factor (i) and represent these values by dx.

(ii) Square these step deviations dx to obtain dx^{2} for different groups.

(iii) Multiply f with dx of different groups to find fdx and add them to obtain ∑fdx.

(iv) Multiply f with dx^{2} of different groups to find fdx^{2} for different groups and add them to obtain ∑fdx^{2}.

(v) Apply the formula to find standard deviation.
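The three grouped-data procedures above can be sketched with a single Python function. The name `grouped_sd` and its parameters are hypothetical. The data are an assumption: they are borrowed from the earlier discrete series (X = 10, 15, 20, 25, 30 with frequencies 5, 10, 15, 10, 5), which reproduces the totals N = 45 and ∑fx² = 1500 quoted in the example.

```python
from math import sqrt, isclose

def grouped_sd(mids, freqs, assumed_mean=None, step=1):
    """Standard deviation of a frequency distribution.

    With assumed_mean=None the actual mean is used (so sum(f*dx) is zero);
    otherwise deviations are taken from assumed_mean and divided by step.
    """
    n = sum(freqs)
    if assumed_mean is None:
        assumed_mean = sum(f * x for x, f in zip(mids, freqs)) / n
    dx = [(x - assumed_mean) / step for x in mids]
    sum_fdx = sum(f * d for f, d in zip(freqs, dx))
    sum_fdx2 = sum(f * d * d for f, d in zip(freqs, dx))
    return sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2) * step

x = [10, 15, 20, 25, 30]   # values (or mid-points of classes)
f = [5, 10, 15, 10, 5]     # frequencies, N = 45

print(grouped_sd(x, f))                           # actual mean
print(grouped_sd(x, f, assumed_mean=15))          # assumed mean
print(grouped_sd(x, f, assumed_mean=15, step=5))  # step deviations
```

All three calls should print the same value, about 5.77, which matches $\sqrt{1500/45}$.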

**Advantages of Standard Deviation**

(i) Standard deviation is the best measure of dispersion because it takes into account all the items and is capable of further algebraic treatment and statistical analysis.

(ii) It is possible to calculate a combined standard deviation for two or more series.

(iii) This measure is most suitable for making comparisons among two or more series about variability.

**Disadvantages**

(i) It is difficult to compute.

(ii) It assigns more weights to extreme items and less weights to items that are nearer to mean. It is because of this fact that the squares of the deviations which are large in size would be proportionately greater than the squares of those deviations which are comparatively small.

**Mathematical properties of standard deviation (σ)**

(i) If deviations of the given items are taken from the arithmetic mean and squared, then the sum of the squared deviations is minimum, i.e., $\sum (X - \bar{X})^{2}$ = Minimum.

(ii) If the different values are increased or decreased by a constant, the standard deviation will remain the same. If the different values are multiplied or divided by a constant, then the standard deviation will be multiplied or divided by that constant.

(iii) Combined standard deviation can be obtained for two or more series with the formula given below:

$$\sigma_{12} = \sqrt{\frac{N_{1}\sigma_{1}^{2} + N_{2}\sigma_{2}^{2} + N_{1}d_{1}^{2} + N_{2}d_{2}^{2}}{N_{1} + N_{2}}}$$

where:

N_{1} represents number of items in first series,

N_{2} represents number of items in second series,

$\sigma_{1}^{2}$ represents variance of first series,

$\sigma_{2}^{2}$ represents variance of second series,

d_{1} represents the difference between $\bar{X}_{1}$ and $\bar{X}_{12}$,

d_{2} represents the difference between $\bar{X}_{2}$ and $\bar{X}_{12}$,

$\bar{X}_{1}$ represents arithmetic mean of first series,

$\bar{X}_{2}$ represents arithmetic mean of second series,

$\bar{X}_{12}$ represents combined arithmetic mean of both the series.

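**Example:** Find the combined standard deviation of two series from the information given below:

| | First Series | Second Series |
|---|---|---|
| No. of items | 10 | 15 |
| Arithmetic means | 15 | 20 |
| Standard deviation | 4 | 5 |

**Solution:** Since we are considering two series, the combined standard deviation is computed by the following formula:

$$\sigma_{12} = \sqrt{\frac{N_{1}\sigma_{1}^{2} + N_{2}\sigma_{2}^{2} + N_{1}d_{1}^{2} + N_{2}d_{2}^{2}}{N_{1} + N_{2}}}$$

where: N_{1} = 10, N_{2} = 15, $\bar{X}_{1}$ = 15, $\bar{X}_{2}$ = 20, σ_{1} = 4, σ_{2} = 5

$\bar{X}_{12} = \dfrac{N_{1}\bar{X}_{1} + N_{2}\bar{X}_{2}}{N_{1} + N_{2}} = \dfrac{(10 \times 15) + (15 \times 20)}{10 + 15}$

or $\bar{X}_{12} = \dfrac{150 + 300}{25} = \dfrac{450}{25} = 18$

d_{1} = $\bar{X}_{1} - \bar{X}_{12}$ = 15 – 18 = – 3 and d_{2} = $\bar{X}_{2} - \bar{X}_{12}$ = 20 – 18 = 2

By applying the formula of combined standard deviation, we get:

$\sigma_{12} = \sqrt{\dfrac{10(4)^{2} + 15(5)^{2} + 10(-3)^{2} + 15(2)^{2}}{10 + 15}}$

$= \sqrt{\dfrac{160 + 375 + 90 + 60}{25}}$

$= \sqrt{\dfrac{685}{25}} = \sqrt{27.4} = 5.23$ approx.

**(iv) Standard deviation of the first N natural numbers can be computed as:**

$\sigma = \sqrt{\dfrac{N^{2} - 1}{12}}$ where, N represents number of items.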
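Both properties can be checked with a short Python sketch. The function names `combined_sd` and `natural_sd` are assumptions for illustration; `pstdev` from the standard library computes the population standard deviation and is used only as a cross-check.

```python
from math import sqrt, isclose
from statistics import pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Return (combined mean, combined standard deviation) of two series."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - combined_mean, mean2 - combined_mean
    total = n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2
    return combined_mean, sqrt(total / (n1 + n2))

def natural_sd(n):
    """Standard deviation of the first n natural numbers: sqrt((n^2 - 1)/12)."""
    return sqrt((n * n - 1) / 12)

print(combined_sd(10, 15, 4, 15, 20, 5))
print(natural_sd(10), pstdev(list(range(1, 11))))
```

The second line prints the closed-form value next to a direct calculation on the numbers 1 to 10, and the two should agree.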

**(v) For a symmetrical (normal) distribution**

$\bar{X} \pm \sigma$ covers 68.27% of items.

$\bar{X} \pm 2\sigma$ covers 95.45% of items.

$\bar{X} \pm 3\sigma$ covers 99.73% of items.

**Example :** You are heading a rationing department in a State affected by food shortage. Local investigators submit the following report:

**Daily calorie value of food available per adult during current period:**

| Area | Mean | Standard deviation |
|---|---|---|
| A | 2,500 | 400 |
| B | 2,000 | 200 |

The estimated requirement of an adult is taken at 2,800 calories daily and the absolute minimum is 1,350. Comment on the reported figures, and determine which area, in your opinion, needs more urgent attention.

**Solution:** We know that $\bar{X} \pm \sigma$ covers 68.27% of items, $\bar{X} \pm 2\sigma$ covers 95.45% of items and $\bar{X} \pm 3\sigma$ covers 99.73% of items. In the given problem, if we take into consideration 99.73%, i.e., almost the whole population, the limits would be $\bar{X} \pm 3\sigma$.

For Area A these limits are:

$\bar{X} + 3\sigma$ = 2,500 + (3 × 400) = 3,700

$\bar{X} - 3\sigma$ = 2,500 – (3 × 400) = 1,300

For Area B these limits are:

$\bar{X} + 3\sigma$ = 2,000 + (3 × 200) = 2,600

$\bar{X} - 3\sigma$ = 2,000 – (3 × 200) = 1,400

It is clear from above limits that in Area A there are some persons who are getting 1300 calories, i.e. below the minimum which is 1,350. But in case of area B there is no one who is getting less than the minimum. Hence area A needs more urgent attention.
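The comparison of the two areas can be reproduced with a few lines of Python. This is a minimal sketch; the function name `three_sigma_limits` is an assumption, and the limits are meaningful only to the extent that the calorie figures are roughly normally distributed.

```python
def three_sigma_limits(mean, sd):
    """Return (lower, upper) limits of mean ± 3 standard deviations."""
    return mean - 3 * sd, mean + 3 * sd

areas = {"A": (2500, 400), "B": (2000, 200)}
minimum = 1350  # absolute minimum daily calories per adult

for name, (mean, sd) in areas.items():
    low, high = three_sigma_limits(mean, sd)
    status = "below minimum" if low < minimum else "at or above minimum"
    print(f"Area {name}: {low} to {high} ({status})")
```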

**(vi) Relationship between quartile deviation, average deviation and standard deviation is given as:**

Quartile deviation = 2/3 Standard deviation

Average deviation = 4/5 Standard deviation

These relations hold approximately for a symmetrical or moderately skewed distribution.

**(vii) We can also compute corrected standard deviation by using the following formula:**

Correct σ = $\sqrt{\dfrac{\text{Corrected } \sum X^{2}}{N} - (\text{Corrected } \bar{X})^{2}}$

(a) Compute corrected $\bar{X} = \dfrac{\text{Corrected } \sum X}{N}$

where, corrected ∑X = ∑X + correct items – wrong items

where, ∑X = N$\bar{X}$.

(b) Compute corrected ∑X^{2} = ∑X^{2} + (Each correct item)^{2} – (Each wrong item)^{2}

where, ∑X^{2} = Nσ^{2} + N$\bar{X}^{2}$

**Example:** (a) Find out the coefficient of variation of a series for which the following results are given:

N = 50, ∑X’ = 25, ∑X’^{2} = 500 where: X’ = deviation from the assumed average 5.

(b) For a frequency distribution of marks in statistics of 100 candidates (grouped in class intervals of 0 – 10, 10 – 20, and so on), the mean and standard deviation were found to be 45 and 20. Later it was discovered that the score 54 was misread as 64 in obtaining the frequency distribution. Find out the correct mean and correct standard deviation of the frequency distribution.

(c) Can coefficient of variation be greater than 100%? If so, when?

**Solution:** (a) We want to calculate the coefficient of variation, which is $\dfrac{\sigma}{\bar{X}} \times 100$.

Therefore, we are required to calculate the mean and the standard deviation.

**Calculation of simple mean**

$\bar{X} = A + \dfrac{\sum X'}{N}$ where, A = 5, N = 50, ∑X’ = 25

$\bar{X} = 5 + \dfrac{25}{50} = 5.5$

**Calculation of standard deviation**

$\sigma = \sqrt{\dfrac{\sum X'^{2}}{N} - \left(\dfrac{\sum X'}{N}\right)^{2}} = \sqrt{\dfrac{500}{50} - \left(\dfrac{25}{50}\right)^{2}} = \sqrt{10 - 0.25} = \sqrt{9.75} = 3.1225$

**Calculation of Coefficient of variation**

C.V. = $\dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{3.1225}{5.5} \times 100 = 56.77\%$

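(b) Given $\bar{X}$ = 45, σ = 20, N = 100, wrong value = 64, correct value = 54

Since this is a case of continuous series, we will apply the formulae for mean and standard deviation that are applicable in a continuous series.

**Calculation of correct Mean**

$\bar{X} = \dfrac{\sum fX}{N}$ or N$\bar{X}$ = ∑fX

By substituting the values, we get ∑fX = 100 × 45 = 4500

Correct ∑fX = 4500 – 64 + 54 = 4490

Correct $\bar{X} = \dfrac{4490}{100} = 44.9$

**Calculation of correct σ**

$\sigma = \sqrt{\dfrac{\sum fX^{2}}{N} - (\bar{X})^{2}}$ or $\sigma^{2} = \dfrac{\sum fX^{2}}{N} - (\bar{X})^{2}$

where, σ = 20, N = 100, $\bar{X}$ = 45

$(20)^{2} = \dfrac{\sum fX^{2}}{100} - (45)^{2}$

or $400 = \dfrac{\sum fX^{2}}{100} - 2025$

or $400 + 2025 = \dfrac{\sum fX^{2}}{100}$

or 2425 × 100 = ∑fX^{2} = 242500

∴ Correct ∑fX^{2} = 242500 – (64)^{2} + (54)^{2} = 242500 – 4096 + 2916 = 242500 – 1180 = 241320

Correct $\sigma = \sqrt{\dfrac{241320}{100} - (44.9)^{2}} = \sqrt{2413.2 - 2016.01} = \sqrt{397.19} = 19.93$ approx.

**(c) The formula for the computation of coefficient of variation is** C.V. = $\dfrac{\sigma}{\bar{X}} \times 100$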

Hence, coefficient of variation can be greater than 100% only when the value of standard deviation is greater than the value of mean.

This will happen when data contains a large number of small items and few items are quite large. In such a case the value of simple mean will be pulled down and the value of standard deviation will go up. Similarly, if there arc negative items in a series, the value of mean will come down and the value of standard deviation shall not be affecied because of squaring the deviations.**Example :** In a distribution of 10 observations, the value of mean and standard deviation are given as 20 and 8. By mistake, two values are taken as 2 and 6 instead of 4 and 8. Find out the value of correct mean and variance.**Solution :** We are given: N – 10, = 20, σ = 3

Wrong values = 2 and 6 and Correct values = 4 and 8

**Calculation of correct Mean**

$\bar{X} = \dfrac{\sum X}{N}$

∑X = 10 × 20 = 200

But ∑X is incorrect. Therefore we shall find correct ∑X.

Correct ∑X = 200 – 2 – 6 + 4 + 8 = 204

Correct Mean = $\dfrac{204}{10} = 20.4$

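**Calculation of correct variance**

$\sigma = \sqrt{\dfrac{\sum X^{2}}{N} - \bar{X}^{2}}$

or $\sigma^{2} = \dfrac{\sum X^{2}}{N} - \bar{X}^{2}$

or $(8)^{2} = \dfrac{\sum X^{2}}{10} - (20)^{2}$

or $64 + 400 = \dfrac{\sum X^{2}}{10}$

or ∑X^{2} = 4640

But this is wrong and hence we shall compute correct ∑X^{2}

Correct ∑X^{2} = 4640 – 2^{2} – 6^{2} + 4^{2} + 8^{2}

= 4640 – 4 – 36 + 16 + 64 = 4680

Correct $\sigma^{2} = \dfrac{4680}{10} - (20.4)^{2} = 468 - 416.16 = 51.84$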
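The same correction can be sketched in Python for any number of wrongly recorded values. This is a minimal illustration, not part of the original lesson; the function name `corrected_mean_variance` and its parameters are assumptions made here.

```python
from math import sqrt

def corrected_mean_variance(n, mean, sigma, wrong, correct):
    """Recover sum(X) and sum(X^2) from the reported mean and sigma,
    swap the wrong values for the correct ones, and recompute."""
    sum_x = n * mean                       # from mean = sum(X) / N
    sum_x2 = n * (sigma ** 2 + mean ** 2)  # from sigma^2 = sum(X^2)/N - mean^2
    sum_x += sum(correct) - sum(wrong)
    sum_x2 += sum(c * c for c in correct) - sum(w * w for w in wrong)
    new_mean = sum_x / n
    new_variance = sum_x2 / n - new_mean ** 2
    return new_mean, new_variance

# Values from this example: N = 10, mean = 20, sigma = 8,
# with 2 and 6 recorded instead of 4 and 8.
m, v = corrected_mean_variance(10, 20, 8, wrong=[2, 6], correct=[4, 8])
print(round(m, 2), round(v, 2), round(sqrt(v), 2))
```

For this example the sketch gives a corrected mean of 20.4 and a corrected variance of about 51.84. Passing `wrong=[64]` and `correct=[54]` with N = 100, mean 45 and σ = 20 handles the single-value case in the same way.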

**Revisionary Problems**

**Example :** Compute (a) Inter-quartile range, (b) Semi-quartile range, and (c) Coefficient of quartile deviation from the following data :

| Farm Size (acres) | No. of farms |
|---|---|
| 0 – 40 | 394 |
| 41 – 80 | 461 |
| 81 – 120 | 391 |
| 121 – 160 | 334 |
| 161 – 200 | 169 |
| 201 – 240 | 113 |
| 241 and over | 148 |

**Solution :**

In this case, the real limits of the class intervals are obtained by subtracting 0.5 from the lower limits of each class and adding 0.5 to the upper limits of each class. This adjustment is necessary to calculate median and quartiles of the series.

| Farm Size (acres) | No. of farms | Cumulative frequency (c.f.) |
|---|---|---|
| – 0.5 – 40.5 | 394 | 394 |
| 40.5 – 80.5 | 461 | 855 |
| 80.5 – 120.5 | 391 | 1246 |
| 120.5 – 160.5 | 334 | 1580 |
| 160.5 – 200.5 | 169 | 1749 |
| 200.5 – 240.5 | 113 | 1862 |
| 240.5 and over | 148 | 2010 |

**N = 2010**

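Q1 = Size of $\left(\dfrac{N}{4}\right)$th observation

= $\dfrac{2010}{4}$ = 502.5th observation

Q1 lies in the cumulative frequency of the group 40.5 – 80.5, where l1 = 40.5, f = 461, i = 40, cf0 = 394, $\dfrac{N}{4}$ = 502.5

$Q_1 = l_1 + \dfrac{\frac{N}{4} - cf_0}{f} \times i = 40.5 + \dfrac{502.5 - 394}{461} \times 40 = 40.5 + 9.4 = 49.9$ acres

Similarly, Q3 = Size of $\left(\dfrac{3N}{4}\right)$th observation

= $\dfrac{3 \times 2010}{4}$ = 1507.5th observation

Q3 lies in the cumulative frequency of the group 121 – 160, where the real limits of the class interval are 120.5 – 160.5 and l1 = 120.5, i = 40, f = 334, $\dfrac{3N}{4}$ = 1507.5, c.f. = 1246

$Q_3 = 120.5 + \dfrac{1507.5 - 1246}{334} \times 40 = 120.5 + 31.3 = 151.8$ acres

Inter-quartile range = Q3 – Q1 = 151.8 – 49.9 = 101.9 acres

Semi-quartile range = $\dfrac{Q_3 - Q_1}{2} = \dfrac{101.9}{2} = 50.95$ acres

Coefficient of quartile deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{151.8 - 49.9}{151.8 + 49.9} = \dfrac{101.9}{201.7} \approx 0.505$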
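The quartile interpolation used in this example can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper name `grouped_quantile` is hypothetical, and a common class width of 40 is passed in, which is valid here only because both quartiles fall in classes of that width (the first class is slightly wider and the last class is open-ended).

```python
def grouped_quantile(lower_limits, width, freqs, position):
    """Interpolate the value at a cumulative position (e.g. N/4 for Q1)
    inside the class whose cumulative frequency first reaches it."""
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= position:
            return lower + (position - cum) / f * width
        cum += f
    raise ValueError("position exceeds total frequency")

# Real lower limits and frequencies from the farm-size table.
lowers = [-0.5, 40.5, 80.5, 120.5, 160.5, 200.5, 240.5]
freqs = [394, 461, 391, 334, 169, 113, 148]
n = sum(freqs)

q1 = grouped_quantile(lowers, 40, freqs, n / 4)
q3 = grouped_quantile(lowers, 40, freqs, 3 * n / 4)
iqr = q3 - q1
semi_iqr = iqr / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(round(q1, 1), round(q3, 1), round(iqr, 1))
print(round(semi_iqr, 2), round(coeff_qd, 3))
```

The results agree with Q1 = 49.9 and Q3 = 151.8 acres to one decimal place; small differences in later digits come from rounding the quartiles before subtracting.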

**Example :** Calculate mean and coefficient of mean deviation about mean from the following data :

| Marks less than | No. of students |
|---|---|
| 10 | 4 |
| 20 | 10 |
| 30 | 20 |
| 40 | 40 |
| 50 | 50 |
| 60 | 56 |
| 70 | 60 |

**Solution :**

In this question, we are given a less-than type series along with the cumulative frequencies. Therefore, we are first required to find out the class intervals and frequencies for calculating the mean and the coefficient of mean deviation about mean.
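The conversion described above, from less-than cumulative frequencies to class frequencies, can be sketched in Python. This is a minimal illustration that assumes the classes are 0 – 10, 10 – 20, and so on up to 60 – 70, each of width 10 (the text does not state the lowest class limit); the variable names are chosen here for illustration.

```python
upper_limits = [10, 20, 30, 40, 50, 60, 70]
cum_freqs = [4, 10, 20, 40, 50, 56, 60]

# Class frequency = difference between successive cumulative frequencies.
freqs = [c - p for c, p in zip(cum_freqs, [0] + cum_freqs[:-1])]

# Mid-points, assuming classes 0-10, 10-20, ... of width 10.
mids = [u - 5 for u in upper_limits]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n
# Mean deviation about the mean: sum of f * |X - mean| divided by N.
md = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n
coeff_md = md / mean

print(freqs)
print(mean, round(md, 2), round(coeff_md, 3))
```

Under these assumptions the class frequencies are 4, 6, 10, 20, 10, 6 and 4, and the mean works out to 35 marks.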

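With class intervals 0 – 10, 10 – 20, …, 60 – 70, frequencies 4, 6, 10, 20, 10, 6, 4 (N = 60) and mid-points 5, 15, …, 65, the mean is $\bar{X} = \dfrac{\sum fX}{N} = \dfrac{2100}{60} = 35$ marks.

M.D. about mean = $\dfrac{\sum f|D|}{N} = \dfrac{680}{60} \approx 11.33$ marks, where $D = X - \bar{X}$

Coefficient of M.D. about mean = $\dfrac{\text{M.D.}}{\bar{X}} = \dfrac{11.33}{35} \approx 0.324$

**Example :** Calculate standard deviation from the following data :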

| Class Interval | Frequency |
|---|---|
| – 30 to – 20 | 5 |
| – 20 to – 10 | 10 |
| – 10 to 0 | 15 |
| 0 to 10 | 10 |
| 10 to 20 | 5 |

N = 45
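The worked solution for this example is not shown in the text. As a hedged sketch, the standard deviation can be computed in Python directly from the class mid-points, using $\sigma = \sqrt{\sum f(X - \bar{X})^{2} / N}$. Taking the mid-point of each class as its representative value is the usual assumption for a continuous series; the variable names are illustrative.

```python
from math import sqrt

classes = [(-30, -20), (-20, -10), (-10, 0), (0, 10), (10, 20)]
freqs = [5, 10, 15, 10, 5]

# Mid-point of each class interval (negative limits need no special handling).
mids = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
sigma = sqrt(variance)

print(n, mean, round(variance, 2), round(sigma, 2))
```

Under this assumption the mean is –5 and the standard deviation is about 11.55.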

**Example :** For two firms A and B belonging to same industry, the following details are available :

| | Firm A | Firm B |
|---|---|---|
| Number of Employees | 100 | 200 |
| Average wage per month | Rs. 240 | Rs. 170 |
| Standard deviation of the wage per month | Rs. 6 | Rs. 8 |

Find (i) Which firm pays out larger amount as monthly wages?

(ii) Which firm shows greater variability in the distribution of wages?

(iii) Find average monthly wages and the standard deviation of wages of all employees for both the firms.

**Solution :** (i) For finding out which firm pays larger amount, we have to find out ∑X.

$\bar{X} = \dfrac{\sum X}{N}$ or $\sum X = N\bar{X}$

Firm A : N = 100, $\bar{X}$ = 240, ∑X = 100 × 240 = 24000

Firm B : N = 200, $\bar{X}$ = 170, ∑X = 200 × 170 = 34000

Hence firm B pays larger amount as monthly wages.

(ii) For finding out which firm shows greater variability in the distribution of wages, we have to calculate coefficient of variation.

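Firm A : C.V. = $\dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{6}{240} \times 100 = 2.5\%$

Firm B : C.V. = $\dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{8}{170} \times 100 \approx 4.71\%$

Since the coefficient of variation is greater for firm B, it shows greater variability in the distribution of wages.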

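(iii) Combined wages : $\bar{X}_{12} = \dfrac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$

where, $N_1$ = 100, $\bar{X}_1$ = 240, $N_2$ = 200, $\bar{X}_2$ = 170

Hence $\bar{X}_{12} = \dfrac{100 \times 240 + 200 \times 170}{100 + 200} = \dfrac{58000}{300} \approx 193.3$

**Combined Standard Deviation :**

$\sigma_{12} = \sqrt{\dfrac{N_1\sigma_1^{2} + N_2\sigma_2^{2} + N_1 d_1^{2} + N_2 d_2^{2}}{N_1 + N_2}}$

where $N_1$ = 100, $N_2$ = 200, $\sigma_1$ = 6, $\sigma_2$ = 8, $d_1 = \bar{X}_1 - \bar{X}_{12}$ = 240 – 193.3 = 46.7

and $d_2 = \bar{X}_2 - \bar{X}_{12}$ = 170 – 193.3 = – 23.3

$\sigma_{12} = \sqrt{\dfrac{100(6)^{2} + 200(8)^{2} + 100(46.7)^{2} + 200(-23.3)^{2}}{100 + 200}}$

$= \sqrt{\dfrac{3600 + 12800 + 218089 + 108578}{300}} = \sqrt{\dfrac{343067}{300}} \approx 33.82$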
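The combined mean and combined standard deviation for the two firms can be checked with a short Python sketch. The function name `combine` is an assumption made for illustration; it implements the standard combined-group formulas, with $d_1$ and $d_2$ taken as the deviations of each group mean from the combined mean.

```python
from math import sqrt

def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Return the combined mean and combined standard deviation of two groups."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean  # deviations of group means from combined mean
    variance = (n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2) / n
    return mean, sqrt(variance)

# Firm A: 100 employees, mean Rs. 240, s.d. Rs. 6
# Firm B: 200 employees, mean Rs. 170, s.d. Rs. 8
combined_mean, combined_sd = combine(100, 240, 6, 200, 170, 8)
print(round(combined_mean, 1), round(combined_sd, 2))
```

Because the sketch keeps the unrounded combined mean, the result may differ in the last decimal from a hand calculation that rounds $d_1$ and $d_2$ to one decimal place.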

**Example :** From the following frequency distribution of heights of 360 boys in the age-group 10 – 20 years calculate the :

(i) arithmetic mean;

(ii) coefficient of variation; and

(iii) quartile deviation

| Height (cms) | No. of boys |
|---|---|
| 126 – 130 | 31 |
| 131 – 135 | 44 |
| 136 – 140 | 48 |
| 141 – 145 | 51 |
| 146 – 150 | 60 |
| 151 – 155 | 55 |
| 156 – 160 | 43 |
| 161 – 165 | 28 |

**Solution : Calculation of $\bar{X}$, Q.D., and C.V.**

| Heights | m.p. X | f | dx = (X – 143)/5 | fdx | fdx^{2} | c.f. |
|---|---|---|---|---|---|---|
| 126 – 130 | 128 | 31 | – 3 | – 93 | 279 | 31 |
| 131 – 135 | 133 | 44 | – 2 | – 88 | 176 | 75 |
| 136 – 140 | 138 | 48 | – 1 | – 48 | 48 | 123 |
| 141 – 145 | 143 | 51 | 0 | 0 | 0 | 174 |
| 146 – 150 | 148 | 60 | + 1 | + 60 | 60 | 234 |
| 151 – 155 | 153 | 55 | + 2 | + 110 | 220 | 289 |
| 156 – 160 | 158 | 43 | + 3 | + 129 | 387 | 332 |
| 161 – 165 | 163 | 28 | + 4 | + 112 | 448 | 360 |

**N = 360, ∑fdx = 182, ∑fdx^{2} = 1618**

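(i) $\bar{X} = A + \dfrac{\sum fdx}{N} \times i$ where, N = 360, A = 143, i = 5, ∑fdx = 182

$\bar{X} = 143 + \dfrac{182}{360} \times 5 = 143 + 2.53 = 145.53$ cm

(ii) C.V. = $\dfrac{\sigma}{\bar{X}} \times 100$

$\sigma = \sqrt{\dfrac{\sum fdx^{2}}{N} - \left(\dfrac{\sum fdx}{N}\right)^{2}} \times i = \sqrt{\dfrac{1618}{360} - \left(\dfrac{182}{360}\right)^{2}} \times 5$

$= \sqrt{4.4944 - 0.2556} \times 5 = \sqrt{4.2388} \times 5 \approx 10.29$ cm

C.V. = $\dfrac{10.29}{145.53} \times 100 \approx 7.07\%$

(iii) Q.D. = $\dfrac{Q_3 - Q_1}{2}$

Q1 = Size of $\dfrac{N}{4}$th observation = $\dfrac{360}{4}$ = 90th observation

Q1 lies in the class 136 – 140. But the real limits of this class are 135.5 – 140.5

$Q_1 = 135.5 + \dfrac{90 - 75}{48} \times 5 = 135.5 + 1.56 = 137.06$ cm

Q3 = Size of $\dfrac{3N}{4}$th observation = $\dfrac{3 \times 360}{4}$ = 270th observation

Q3 lies in the class 151 – 155. But the real limits of this class are 150.5 – 155.5

$Q_3 = 150.5 + \dfrac{270 - 234}{55} \times 5 = 150.5 + 3.27 = 153.77$ cm

Q.D. = $\dfrac{153.77 - 137.06}{2} = \dfrac{16.71}{2} \approx 8.36$ cm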
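The step-deviation calculation of the mean, standard deviation and coefficient of variation for this example can be sketched in Python. This is a minimal illustration using the assumed mean A = 143 and class width i = 5 from the solution; the variable names are chosen here and are not part of the original text.

```python
from math import sqrt

mids = [128, 133, 138, 143, 148, 153, 158, 163]  # class mid-points X
freqs = [31, 44, 48, 51, 60, 55, 43, 28]         # number of boys f
A, i = 143, 5                                     # assumed mean and class width

n = sum(freqs)
dx = [(x - A) // i for x in mids]                 # step deviations (exact integers here)
sum_fdx = sum(f * d for f, d in zip(freqs, dx))
sum_fdx2 = sum(f * d * d for f, d in zip(freqs, dx))

mean = A + sum_fdx / n * i
sigma = sqrt(sum_fdx2 / n - (sum_fdx / n) ** 2) * i
cv = sigma / mean * 100

print(n, sum_fdx, sum_fdx2)
print(round(mean, 2), round(sigma, 2), round(cv, 2))
```

Recomputing the column totals this way is also a convenient check on the table: the frequencies sum to 360 and the fdx column sums to 182.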
