Central limit theorem (CLT): What it is and how it works

The central limit theorem (CLT) is a statistical result stating that, given a sufficiently large sample size from a population with finite variance, the distribution of the sample mean is approximately normal and centered on the mean of the entire population, regardless of the actual distribution of the data. As the sample size increases, the sample mean also tends to fall closer to the population mean. Let's look at what the central limit theorem is, what it is for, and its key components.

What is the central limit theorem (CLT)?

In probability theory, the central limit theorem (CLT) states that the distribution of the sample mean approaches a normal distribution (i.e., a "bell curve") as the sample size increases, assuming that all samples are identical in size and regardless of the actual shape of the population distribution. In other words, given a sufficiently large sample size from a population with finite variance, the means of samples drawn from the same population will cluster around the mean of the entire population. Furthermore, the variance of these sample means is approximately the population variance divided by the sample size, so the sample means concentrate ever more tightly around the population mean as the sample size increases, in line with the law of large numbers. Although this concept was first developed by Abraham de Moivre in 1733, it did not receive its current name until 1920, when the famous Hungarian mathematician George Pólya called it the central limit theorem.

Figure: Formula of the central limit theorem. Source: Inchcalculator.com.
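
For reference, the textbook statement of the theorem can be written out as follows. This is the standard form for independent, identically distributed observations and is not necessarily the exact formula shown in the figure; the symbols X_i, n, \mu, and \sigma are notation assumed here for the observations, the sample size, the population mean, and the population standard deviation.

```latex
% Sample mean of n independent, identically distributed observations
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i ,
\qquad \mathbb{E}[X_i] = \mu , \quad \operatorname{Var}(X_i) = \sigma^2 < \infty

% Central limit theorem: the standardized sample mean converges in distribution
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
\qquad \text{as } n \to \infty

% Practical reading for large n: mean \mu, standard error \sigma / \sqrt{n}
\bar{X}_n \;\approx\; \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right),
\qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
```

The last line is the form most often used in practice: the sample means are centered on the population mean, and their spread shrinks in proportion to the square root of the sample size.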

What is the central limit theorem (CLT) for?

According to the central limit theorem, the mean of a sample of data will tend to be closer to the mean of the entire population in question as the sample size increases, regardless of the actual distribution of the data. In other words, the result holds whether the population distribution is normal or not. As a general rule, a sample size between 30 and 50 is considered sufficient for the CLT to hold, which means that the distribution of sample means is approximately normal. The larger each sample, the more closely the distribution of sample means resembles a normal distribution. Note, however, that the approximation is often still reasonable for much smaller sample sizes, such as n = 8 or n = 5.
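
The effect of sample size described above can be checked with a small simulation. The following is a minimal sketch, not a definitive implementation: it assumes NumPy is available, and the choice of an exponential population (strongly skewed, with mean 1 and standard deviation 1), the sample sizes 5, 30, and 200, and the helper name sample_mean_stats are all illustrative assumptions.

```python
import numpy as np

def sample_mean_stats(n, num_samples=20_000, seed=0):
    """Draw num_samples samples of size n from Exponential(1) and summarize their means."""
    rng = np.random.default_rng(seed)
    # Each row is one sample of size n; the population has mean 1 and standard deviation 1.
    means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)
    centered = means - means.mean()
    std = means.std()
    # Skewness is 0 for a perfectly symmetric distribution such as the normal.
    skewness = np.mean(centered**3) / std**3
    return means.mean(), std, skewness

# Hypothetical sample sizes chosen to span "small", "rule of thumb", and "large".
results = {n: sample_mean_stats(n) for n in (5, 30, 200)}
for n, (mean, std, skew) in results.items():
    print(f"n={n:>3}  mean={mean:.3f}  std={std:.3f}  "
          f"1/sqrt(n)={1 / np.sqrt(n):.3f}  skewness={skew:.3f}")
```

In a run of this sketch, the mean of the sample means should stay close to 1 for every n, the standard deviation should track 1/sqrt(n), and the skewness should shrink toward 0 as n grows, which is the sense in which the sample means become "more normal".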

Figure: Illustration of the central limit theorem for a skewed population of values. Source: ResearchGate

Key components of the central limit theorem

The central limit theorem rests on several key features. These characteristics largely revolve around the samples, the sample size, and the population of data.

  1. Sampling is successive. This means that some units in a sample may also have appeared in samples selected on previous occasions.
  2. Sampling is random. All units must be selected at random so that each has the same statistical chance of being selected.
  3. Samples must be independent. Selections or results from one sample should not influence future samples or the results of other samples.
  4. Samples must be limited. It is often said that a sample should not exceed 10% of a population if sampling is done without replacement. In general, larger population sizes justify the use of larger sample sizes.
  5. Sample size increases. The normal approximation described by the central limit theorem improves as the sample size grows.
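
The components listed above can be sketched in a few lines of code. This is an illustrative example under stated assumptions rather than a prescribed procedure: the log-normal population of 10,000 values, the sample size of 100, the 2,000 repetitions, and the helper name draw_sample_means are all hypothetical, and the 10% check simply encodes the rule of thumb from item 4.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical finite population of 10,000 skewed values, for illustration only.
population = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

def draw_sample_means(population, n, num_samples, rng):
    """Return the means of num_samples simple random samples of size n."""
    if n > 0.10 * len(population):
        # Rule of thumb from the list: keep each sample within 10% of the population.
        raise ValueError("sample exceeds 10% of the population")
    # Each sample is drawn at random without replacement, and separate draws
    # do not influence one another, so units may recur across samples.
    return np.array([
        rng.choice(population, size=n, replace=False).mean()
        for _ in range(num_samples)
    ])

means = draw_sample_means(population, n=100, num_samples=2_000, rng=rng)
expected_se = population.std() / np.sqrt(100)  # sigma / sqrt(n)

print(f"population mean = {population.mean():.3f}")
print(f"mean of means   = {means.mean():.3f}")
print(f"std of means    = {means.std():.3f} (sigma/sqrt(n) = {expected_se:.3f})")
```

Because each sample here is only 1% of the population, sampling without replacement behaves almost like independent sampling, so the spread of the sample means should be close to sigma/sqrt(n).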