Arithmetic mean vs Geometric mean
Posted by Kelvin on 24 Aug 2010 at 06:29 pm | Tagged as: programming
I've been brushing up on some basic statistics, and ran into this interesting bit of information.
We're all familiar with the average of a set of values, also known as the mean.
Arithmetic Mean
It turns out that there's more than one way to calculate the mean of a data set. The method we usually think of as "the average" is the arithmetic mean.
The arithmetic mean is calculated by adding up all the numbers in a data set and dividing the result by the total number of data points.
Example: Arithmetic mean of 11, 13, 17 and 1,000 = (11 + 13 + 17 + 1,000) / 4 = 260.25
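Here is a minimal Python sketch of the same calculation. The variable names are just for illustration; the only library call is mean from the standard statistics module.

```python
from statistics import mean

values = [11, 13, 17, 1000]

# Add up all the values and divide by how many there are.
arithmetic_mean = sum(values) / len(values)

print(arithmetic_mean)  # 260.25
print(mean(values))     # the standard library computes the same thing
```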
Geometric Mean
There is another way to calculate the mean, known as the geometric mean.
This is calculated by multiplying all the numbers in the data set together and taking the nth root of the result, where n is the number of data points.
Example: Geometric mean of 11, 13, 17 and 1,000 = 4th root of (11 x 13 x 17 x 1,000) = 39.5
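The same example as a short Python sketch. It assumes Python 3.8 or later for math.prod, and the variable names are my own.

```python
import math

values = [11, 13, 17, 1000]

# Multiply everything together: 11 * 13 * 17 * 1000 = 2,431,000
product = math.prod(values)

# Take the nth root, where n is the number of values (here the 4th root).
geometric_mean = product ** (1 / len(values))

print(round(geometric_mean, 1))  # 39.5
```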
Geometric Mean and Logarithm
Another way to think of the geometric mean is as the average of the logarithms of the values in a data set, converted back out of log form by raising the base to that average.
Let's work this out with the numbers 2 and 32.
So, the first way of calculating the geometric mean is by multiplication, then nth root:
sqrt(2 x 32) = sqrt(64) = 8
We take the square root because there are two numbers (n = 2), and the 2nd root is the square root.
Now, the second way of calculating the geometric mean is by expressing the numbers as powers of a common base. In this case, we'll choose base 2.
2 = 2^1
32 = 2^5
2^1 x 2^5 = 2^6 (= 64)
the square root of 2^6 is 2^3 (= 8)
Another way of arriving at the same result is to take the average of the exponents, i.e. (1 + 5) / 2 = 3, and then raise the base to that average, i.e. 2^3 = 8.
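Both routes can be checked side by side in Python. This is only a sketch with names of my own choosing. It also tries natural logs, to show that the choice of base should not matter as long as you convert back with the same base.

```python
import math

values = [2, 32]
n = len(values)

# Route 1: multiply, then take the nth root.
by_root = math.prod(values) ** (1 / n)

# Route 2: average the base-2 exponents, then raise 2 to that average.
exponents = [math.log2(v) for v in values]
by_log2 = 2 ** (sum(exponents) / n)

# Same idea with natural logs: average ln(x), then apply exp().
by_ln = math.exp(sum(math.log(v) for v in values) / n)

print(by_root, by_log2, by_ln)
```

All three values should agree at 8, apart from floating-point rounding in the natural-log version.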
When to use which
If you look again at the first example given above,
Arithmetic mean of 11, 13, 17 and 1,000 = (11 + 13 + 17 + 1,000) / 4 = 260.25
Geometric mean of 11, 13, 17 and 1,000 = 4th root of (11 x 13 x 17 x 1,000) = 39.5
A geometric mean, unlike an arithmetic mean, tends to dampen the effect of very high or low values, which might bias the mean if a straight average (arithmetic mean) were calculated.
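To see the dampening effect, here is a rough sketch that computes both means with and without the 1,000 value. It assumes Python 3.8 or later for statistics.geometric_mean. The "shift" variables are my own way of measuring how far each mean moves.

```python
from statistics import mean, geometric_mean

typical = [11, 13, 17]
with_outlier = typical + [1000]

for label, data in [("without 1,000", typical), ("with 1,000", with_outlier)]:
    print(f"{label}: arithmetic={mean(data):.2f}, geometric={geometric_mean(data):.2f}")

# How many times larger each mean becomes once the outlier is added.
arithmetic_shift = mean(with_outlier) / mean(typical)
geometric_shift = geometric_mean(with_outlier) / geometric_mean(typical)

print(f"arithmetic mean grows {arithmetic_shift:.1f}x, geometric mean grows {geometric_shift:.1f}x")
```

The arithmetic mean jumps by a much larger factor than the geometric mean, which is the dampening described above.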
As stated on eHow.com:
Statisticians use arithmetic means to represent data with no significant outliers. This type of mean is good for representing average temperatures, because all the temperatures for January 22 in Chicago will be between -50 and 50 degrees F. A temperature of 10,000 degrees F is just not going to happen. Things like batting averages and average race car speeds are also represented well using arithmetic means.
Geometric means are used in cases where the differences among data points are logarithmic or vary by multiples of 10. Biologists use geometric means to describe the sizes of bacterial populations, which can be 20 organisms one day and 20,000 the next. Economists can use geometric means to describe income distributions. You and most of your neighbors might make around $65,000 per year, but what if the guy up on the hill makes $65 million per year? The arithmetic mean of the income in your neighborhood would be misleading here, so a geometric mean would be more suitable.
The geometric mean is often used to evaluate data covering several orders of magnitude. If your data covers a narrow range, or if it is clustered around high values with a tail toward low values (i.e. skewed to the left), the geometric mean may not be appropriate.
The geometric mean is more appropriate than the arithmetic mean for describing proportional growth, both exponential growth (constant proportional growth) and varying growth; in business, the geometric mean of yearly growth is known as the compound annual growth rate (CAGR).
The geometric mean of the per-period growth factors (1 + growth rate) gives the equivalent constant growth rate that would yield the same final amount.
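A small sketch of the growth-rate case follows. The three yearly rates and the starting amount are made-up numbers for illustration only.

```python
import math

# Hypothetical year-over-year growth rates: +10%, +50%, -20%.
rates = [0.10, 0.50, -0.20]
factors = [1 + r for r in rates]
years = len(factors)

start = 1000.0
final = start * math.prod(factors)

# Geometric mean of the growth factors gives the equivalent constant factor.
constant_factor = math.prod(factors) ** (1 / years)
cagr = constant_factor - 1

# Compounding at that constant rate reproduces the real final amount...
replayed = start * constant_factor ** years

# ...while compounding at the arithmetic mean of the rates overshoots it.
naive_rate = sum(rates) / years
overstated = start * (1 + naive_rate) ** years

print(f"final={final:.2f} replayed={replayed:.2f} overstated={overstated:.2f}")
print(f"CAGR={cagr:.2%} vs arithmetic mean of rates={naive_rate:.2%}")
```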
Do not use the geometric mean on data that is already log-transformed, such as pH or decibels (dB).
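One way to read this advice: a plain arithmetic mean of log-transformed values already corresponds to a geometric mean of the underlying quantities, so taking a geometric mean on top would apply the log step twice. The sketch below illustrates this with decibels, using two made-up power ratios and the usual dB = 10 * log10(ratio) definition.

```python
import math
from statistics import mean, geometric_mean

# Hypothetical raw power ratios and their decibel (log-transformed) values.
power_ratios = [10, 1000]
decibels = [10 * math.log10(p) for p in power_ratios]

# Arithmetic mean of the dB values, converted back to a power ratio.
mean_db = mean(decibels)
back_to_ratio = 10 ** (mean_db / 10)

# It lands on the geometric mean of the raw ratios.
print(back_to_ratio, geometric_mean(power_ratios))
```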
Practical Applications
Many!
For example, according to this article:
Many wastewater dischargers, as well as regulators who monitor swimming beaches and shellfish areas, must test for and report fecal coliform bacteria concentrations. Often, the data must be summarized as a "geometric mean" (a type of average) of all the test results obtained during a reporting period. Typically, public health regulations identify a precise geometric mean concentration at which shellfish beds or swimming beaches must be closed.
A geometric mean, unlike an arithmetic mean, tends to dampen the effect of very high or low values, which might bias the mean if a straight average (arithmetic mean) were calculated. This is helpful when analyzing bacteria concentrations, because levels may vary anywhere from 10 to 10,000 fold over a given period. As explained below, geometric mean is really a log-transformation of data to enable meaningful statistical evaluations.
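As a rough illustration of that kind of reporting, the sketch below computes a geometric mean through logs and compares it with a limit. The sample counts, the limit and the function name are all hypothetical and not taken from any real regulation. Going through logs also avoids building a huge product when there are many large counts.

```python
import math

def geometric_mean_via_logs(samples):
    """Average the natural logs, then exponentiate. Needs positive values."""
    if any(s <= 0 for s in samples):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(s) for s in samples) / len(samples))

# Hypothetical test results for one reporting period (organisms per 100 mL).
counts = [10, 40, 200, 10000, 25]
# Hypothetical closure threshold, for illustration only.
limit = 200

geo = geometric_mean_via_logs(counts)
arith = sum(counts) / len(counts)

print(f"geometric mean:  {geo:.1f} -> exceeds limit: {geo > limit}")
print(f"arithmetic mean: {arith:.1f} -> exceeds limit: {arith > limit}")
```

With these made-up numbers, the single 10,000 reading pushes the arithmetic mean over the limit while the geometric mean stays under it.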