Its all about Data: Central Tendency and Variability

Statisticians refer to the mean and median as measures of central tendency.

Mean and Median :-

Mean : The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. For example, five men weigh 100 pounds, 100 pounds, 130 pounds, 140 pounds, and 150 pounds. The mean is (100 + 100 + 130 + 140 + 150)/5 = 620/5 = 124 pounds.
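The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original example.

```python
# Weights in pounds, taken from the example above.
weights = [100, 100, 130, 140, 150]

# Mean = sum of the observations divided by the number of observations.
mean_weight = sum(weights) / len(weights)

print(mean_weight)  # 124.0
```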

Median : To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values. For example, five men weigh 100 pounds, 100 pounds, 130 pounds, 140 pounds, and 150 pounds. The median is 130 pounds, since 130 pounds is the middle weight.
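The sort-then-pick-the-middle rule can be sketched as a small Python function. The function name is illustrative, the input is assumed to be non-empty, and the sixth weight in the second call is a made-up value used only to show the even case.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the middle value.
        return ordered[mid]
    # Even number of observations: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([100, 100, 130, 140, 150]))       # 130
print(median([100, 100, 130, 140, 150, 160]))  # 135.0 (160 is hypothetical)
```

Python's standard library offers the same calculation as statistics.median.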

Mean vs. the Median :-

As measures of central tendency, the mean and the median each have advantages and disadvantages.

The median may be a better indicator of the most typical value if a set of scores has an outlier. An outlier is an extreme value that differs greatly from other values.

However, when the sample is large and does not include outliers, the mean usually provides a better measure of central tendency.
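To see how an outlier affects the two measures, the sketch below compares the five weights used earlier with a hypothetical variant in which the heaviest man weighs 400 pounds instead of 150. The 400-pound value is invented purely for illustration.

```python
from statistics import mean, median

weights = [100, 100, 130, 140, 150]       # pounds, from the earlier example
with_outlier = [100, 100, 130, 140, 400]  # hypothetical: 150 replaced by an outlier

print(mean(weights), median(weights))            # 124 130
print(mean(with_outlier), median(with_outlier))  # 174 130
```

The outlier pulls the mean up by 50 pounds, while the median does not move.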

Statisticians use summary measures to describe the amount of variability or spread in a set of data.

Variance is the average squared deviation from the population mean, as defined by the following formula :
σ² := ∑ (X_i - µ)²/N

where σ² is the variance, µ is the mean, X_i is the ith element from the population, and N is the number of elements in the population.
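As a worked example of the formula, the sketch below treats the five weights from the mean example as a complete population. That is an assumption made only so the population formula applies; the variable names are illustrative.

```python
weights = [100, 100, 130, 140, 150]  # pounds, treated here as the whole population

N = len(weights)
mu = sum(weights) / N                                  # population mean: 124.0
squared_deviations = [(x - mu) ** 2 for x in weights]  # (X_i - µ)² for each element
variance = sum(squared_deviations) / N                 # average squared deviation

print(squared_deviations)  # [576.0, 576.0, 36.0, 256.0, 676.0]
print(variance)            # 424.0
```

The standard library function statistics.pvariance computes the same population variance.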

The standard deviation is the square root of the variance. Thus, the standard deviation of a population is:
σ := sqrt [σ²] = sqrt [ ∑ (X_i - µ)²/N ]

where σ is standard deviation, σ² is the variance, µ is the mean, X_i is the ith element from the population, and N is the number of elements in the population.
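Continuing with the same five weights (again treated as a full population for illustration), a short sketch of the standard deviation, cross-checked against the standard library's statistics.pstdev:

```python
import math
from statistics import pstdev

weights = [100, 100, 130, 140, 150]  # pounds

N = len(weights)
mu = sum(weights) / N
variance = sum((x - mu) ** 2 for x in weights) / N  # 424.0, in squared pounds
sigma = math.sqrt(variance)                         # back in pounds

print(round(sigma, 2))                       # 20.59
print(math.isclose(sigma, pstdev(weights)))  # True
```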

Conceptual Difference Between Standard Deviation and Variance :

The variance of a data set measures the dispersion of the data about the mean. Although it is mathematically well defined, it is hard to interpret directly because it is expressed in squared units (for weights in pounds, the variance is in squared pounds). The standard deviation, as the square root of the variance, gives a value that is in the same units as the original values, which makes it much easier to work with and easier to interpret in conjunction with the concept of the normal curve.

Standard Deviation, Variance and the Normal Curve :

A normal curve is a theoretical, bell-shaped graphical representation of a data set in which the values are distributed symmetrically about the mean, with the majority of the values falling fairly close to the mean. On the normal curve, the interval within one standard deviation of the mean in either direction covers about 68 percent of the population being measured. This share does not depend on the variance: less variance makes the curve narrower and taller, so the one-standard-deviation interval is narrower in absolute terms but still holds about 68 percent of the data, while more variance makes the curve wider and flatter.

68–95–99.7 rule:

In statistics, the 68–95–99.7 rule, also known as the three-sigma rule or empirical rule, states that nearly all values lie within 3 standard deviations of the mean in a normal distribution.
About 68.27% of the values lie within 1 standard deviation of the mean. Similarly, about 95.45% of the values lie within 2 standard deviations of the mean. Nearly all (99.73%) of the values lie within 3 standard deviations of the mean.
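These percentages can be reproduced from the normal cumulative distribution function. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the helper name share_within and the example values of µ and σ are illustrative assumptions.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal(mu, sigma) population within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%

# The share depends only on k, not on the particular mean or standard deviation.
print(f"{share_within(1, mu=124, sigma=20):.2%}")  # 68.27%
```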

[Figure: pictorial representation of the 68–95–99.7 rule on a normal curve]

Standard Scores (z-Scores) :

A standard score (z-score) indicates how many standard deviations an element is from the mean. A standard score can be calculated from the following formula.

z := (X - µ)/σ

where z is the z-score, X is the value of the element, µ is the mean of the population, and σ is the standard deviation.
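As an illustration of the formula, the sketch below computes z-scores for some of the five weights used earlier, treating them as the whole population. The function name z_score is an assumption made for this example.

```python
import math

weights = [100, 100, 130, 140, 150]  # pounds, treated as the population

mu = sum(weights) / len(weights)                                      # 124.0
sigma = math.sqrt(sum((x - mu) ** 2 for x in weights) / len(weights)) # about 20.59

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

print(round(z_score(150, mu, sigma), 2))  # 1.26
print(round(z_score(100, mu, sigma), 2))  # -1.17
print(z_score(124, mu, sigma))            # 0.0
```

The 150-pound man is about 1.26 standard deviations above the mean, and each 100-pound man is about 1.17 standard deviations below it.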

Here is how to interpret z-scores :

A z-score less than 0 represents an element less than the mean.

A z-score greater than 0 represents an element greater than the mean.

A z-score equal to 0 represents an element equal to the mean.

A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.

A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.

If the number of elements in the set is large and the values are approximately normally distributed, about 68% of the elements have a z-score between -1 and 1; about 95% have a z-score between -2 and 2; and about 99.7% have a z-score between -3 and 3.
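The shares quoted above can also be checked empirically. The sketch below draws a large simulated data set from a normal distribution, converts every element to a z-score, and counts how many fall inside each band. The mean of 124, the standard deviation of 20, the sample size, and the seed are arbitrary choices for illustration, and the exact shares will vary slightly with the seed.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable
data = [random.gauss(124, 20) for _ in range(20_000)]  # hypothetical data set

mu = fmean(data)
sigma = pstdev(data)
z_scores = [(x - mu) / sigma for x in data]

def share_within(k):
    """Share of elements whose z-score lies strictly between -k and k."""
    return sum(1 for z in z_scores if -k < z < k) / len(z_scores)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```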

Sunday, May 19, 2013
