Biased estimators: focus on dispersion indicators
One of the aims of a statistical study is to estimate quantities (e.g., mean, variance, standard deviation) that describe a population, in order to study its characteristics. To estimate these quantities, we define mathematical functions called estimators, which are then applied to data sampled from the study population. For example, to estimate the mean age of patients in a study population, denoted $\mu = E(X)$ where X is the random variable representing patient age, we apply the arithmetic mean estimator to a sample $X_1, \dots, X_n$ of size n:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
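As a minimal illustration, the arithmetic mean estimator can be written as a plain Python function and applied to a sample. The function name `sample_mean` and the ages below are hypothetical and serve only as an example. They are not study data.

```python
from typing import Sequence


def sample_mean(xs: Sequence[float]) -> float:
    """Arithmetic mean estimator: (1/n) * sum of the observations."""
    if len(xs) == 0:
        raise ValueError("the sample must contain at least one observation")
    return sum(xs) / len(xs)


# Hypothetical sample of patient ages (illustrative values, not study data).
ages = [34.0, 51.0, 47.0, 62.0, 29.0]
print(sample_mean(ages))  # prints 44.6
```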
Estimation of the variance of a random variable
To continue with the previous example, suppose the objective is to estimate the variance of patient age in the population, $\sigma^2 = V(X)$. To do this, we need to construct a variance estimator. A first, naive proposal is to average the squared deviations from the sample mean $\bar{X}_n$:
$$\hat{\sigma}^2_n = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2$$
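Despite its intuitive appearance, this estimator can be shown to be biased. Its expectation is $\frac{n-1}{n}\sigma^2$, where $\sigma^2$ is the population variance, so on average it underestimates the variance. Multiplying by $\frac{n}{n-1}$ removes this bias and gives the corrected variance estimator:
$$S^2_n = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2$$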
This estimator can be shown to be an unbiased estimator of the variance of the age of patients in the study population*.
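The two variance estimators differ only in their denominator. The sketch below implements both, and its function names are hypothetical. Python's standard `statistics` module exposes the same two formulas: `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1. Both are used here as a cross-check.

```python
import statistics
from typing import Sequence


def _sum_sq_dev(xs: Sequence[float]) -> float:
    """Sum of squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def naive_variance(xs: Sequence[float]) -> float:
    """Naive estimator: divides by n (biased, expectation (n-1)/n * sigma^2)."""
    return _sum_sq_dev(xs) / len(xs)


def corrected_variance(xs: Sequence[float]) -> float:
    """Corrected estimator: divides by n - 1 (unbiased for sigma^2)."""
    if len(xs) < 2:
        raise ValueError("at least two observations are needed")
    return _sum_sq_dev(xs) / (len(xs) - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(naive_variance(data))      # 4.0
print(corrected_variance(data))  # 32/7, about 4.571
print(statistics.pvariance(data), statistics.variance(data))  # same two values
```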
To illustrate this, simulations were carried out. The following graph shows the estimates obtained for the different sample sizes considered.
The results show that for small sample sizes ($n \le 10$), the bias of the naive variance estimator can be quite large. For larger sample sizes ($n \ge 100$), however, the bias becomes negligible. The corrected variance estimator is unbiased whatever the sample size. Since the correction is simple, it is advisable to use the corrected estimator whatever the sample size.
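The article does not give its simulation settings. The sketch below shows one possible way to set up such a study, under several hypothetical assumptions: normally distributed data with a known variance, a fixed number of replications and a fixed random seed. The function name is also hypothetical. Averaging each estimator over many simulated samples approximates its expectation, so the gap from the true variance approximates the bias.

```python
import random


def simulate_variance_bias(n: int, sigma: float = 1.0, reps: int = 10_000,
                           seed: int = 0) -> tuple[float, float]:
    """Return the Monte Carlo means of the naive and corrected variance
    estimators for samples of size n drawn from N(0, sigma^2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    naive_sum = corrected_sum = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        naive_sum += ss / n              # naive estimator, divisor n
        corrected_sum += ss / (n - 1)    # corrected estimator, divisor n - 1
    return naive_sum / reps, corrected_sum / reps


for n in (5, 10, 100):
    naive, corrected = simulate_variance_bias(n)
    # Theory: naive -> (n-1)/n * sigma^2, corrected -> sigma^2 (here 1.0).
    print(f"n={n:>3}  naive={naive:.3f}  corrected={corrected:.3f}  "
          f"theory(naive)={(n - 1) / n:.3f}")
```

The last column prints the known expectation $(n-1)/n\,\sigma^2$ of the naive estimator, so the simulated values can be compared with it directly.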
Estimation of the standard deviation of a random variable
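We have seen that the corrected estimator $S^2_n$ is an unbiased estimator of the variance of a random variable. To estimate the standard deviation $\sigma$, it would then be tempting to take its square root, $S_n = \sqrt{S^2_n}$. However, $S_n$ can be shown to be a biased estimator of the population standard deviation. Because the square root is concave, Jensen's inequality gives $E(S_n) \le \sqrt{E(S^2_n)} = \sigma$, with equality only in degenerate cases, so $S_n$ tends to underestimate $\sigma$.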
It can also be shown that the square root of the naive variance estimator, $\hat{\sigma}_n = \sqrt{\hat{\sigma}^2_n}$, is an even more biased estimator*.
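A short argument, which stops short of the full demonstration, explains the ordering between the two estimators. The naive estimator $\hat{\sigma}_n$ is a fixed multiple of $S_n$, the square root of the corrected variance estimator. $S_n$ in turn underestimates $\sigma$ on average, by Jensen's inequality, since the square root is concave:

```latex
\hat{\sigma}_n = \sqrt{\frac{n-1}{n}}\; S_n
\quad\Longrightarrow\quad
E(\hat{\sigma}_n) = \sqrt{\frac{n-1}{n}}\; E(S_n) \;<\; E(S_n) \;\le\; \sigma
```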
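Unlike the variance, it is impossible to define a standard deviation estimator that is unbiased whatever the distribution of the variable X. However, when X follows a normal distribution, the bias of $S_n$ (the square root of the corrected variance estimator) can be expressed exactly. It is a multiplicative bias, denoted $c_4(n)$ by convention*:
$$E(S_n) = c_4(n)\,\sigma, \qquad c_4(n) = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}$$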
Thus, if X follows a normal distribution, the following unbiased estimator of the standard deviation can be computed:
$$\frac{S_n}{c_4(n)}$$
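Under the normality assumption, $c_4(n)$ can be computed from the gamma function. In the sketch below, the function names are hypothetical. It uses `math.lgamma` to evaluate the ratio of gamma functions on the log scale, which avoids overflow for large n. It then divides the usual sample standard deviation (`statistics.stdev`, divisor n - 1) by $c_4(n)$.

```python
import math
import statistics
from typing import Sequence


def c4(n: int) -> float:
    """Bias constant for S_n under normality: E(S_n) = c4(n) * sigma.

    c4(n) = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2)
    """
    if n < 2:
        raise ValueError("c4 is defined for n >= 2")
    # Ratio of gamma functions computed on the log scale to avoid overflow.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)


def unbiased_sd(xs: Sequence[float]) -> float:
    """S_n / c4(n): unbiased for sigma only if the data are normal."""
    return statistics.stdev(xs) / c4(len(xs))


for n in (2, 5, 10, 50, 100):
    print(n, round(c4(n), 4))
```

$c_4(n)$ increases towards 1 as n grows, starting from $c_4(2) = \sqrt{2/\pi} \approx 0.798$. This is consistent with the correction mattering mainly for small samples.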
To illustrate the three standard deviation estimators presented above, a simulation study similar to the previous one was carried out. The following graph shows the estimates for the different sample sizes considered.
The results show that for small sample sizes ($n \le 10$), the bias of the estimator $S_n$ can be quite large. For larger sample sizes ($n \ge 50$), however, the bias becomes negligible. The estimator corrected by the quantity $c_4(n)$, namely $S_n / c_4(n)$, is unbiased whatever the sample size, under the normality assumption. The naive estimator $\hat{\sigma}_n$ is of no interest, because it is always more biased than $S_n$.
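A simulation of the kind described can be sketched as follows. The settings are again hypothetical assumptions: normal data with known $\sigma$, a fixed number of replications and a fixed seed. The function name is also hypothetical. The sketch averages the three standard deviation estimators over many simulated samples, and recomputes $c_4(n)$ inline so that the block runs on its own.

```python
import math
import random


def simulate_sd_bias(n: int, sigma: float = 1.0, reps: int = 10_000,
                     seed: int = 0) -> dict[str, float]:
    """Monte Carlo means of three SD estimators for N(0, sigma^2) samples of size n."""
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), valid under normality.
    c4 = math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))
    totals = {"sigma_hat": 0.0, "S": 0.0, "S_over_c4": 0.0}
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        s = math.sqrt(ss / (n - 1))                # S_n, divisor n - 1
        totals["sigma_hat"] += math.sqrt(ss / n)   # naive estimator, divisor n
        totals["S"] += s
        totals["S_over_c4"] += s / c4              # corrected for normal data
    return {name: total / reps for name, total in totals.items()}


for n in (5, 10, 50):
    print(n, {k: round(v, 3) for k, v in simulate_sd_bias(n).items()})
```

Such a run is best used to check the ordering of the three estimators and the approximate size of the gaps between them. The exact figures depend on the chosen settings.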
Because the correction relies on a relatively complex quantity, $c_4(n)$, the biased estimator $S_n$ of the standard deviation is usually retained for large sample sizes. For small sample sizes, however, it is important to apply the correction.
The aim of this brief article is to raise awareness of the notion of bias in an estimator and to encourage vigilance. The example of the standard deviation estimator shows quite effectively that intuition can sometimes play tricks in statistics.
*Demonstration available on request
For further information on the longer version of this article, contact our experts at email@example.com