An easy way to demonstrate the central limit theorem with Python.


The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean approaches a normal distribution (the “bell curve”) as the sample size gets larger, whatever the shape of the population distribution, provided the population has a finite variance.
In other words, the larger each sample is, the more closely a histogram of the sample means resembles a normal distribution. Taking more samples does not change that shape; it only makes the histogram smoother.
A sample size of 30 or more is often considered sufficient for the CLT to hold, although strongly skewed or heavy-tailed populations may need more. A sufficiently large sample also estimates the characteristics of a population, such as its mean, more accurately.
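
The effect of sample size can be checked numerically. The sketch below is only an illustration under stated assumptions: the population of squared standard-normal values, the helper names `skewness` and `sample_mean_skewness`, the fixed seed, and the sample sizes 2, 30 and 300 are all choices made for this example. It measures the skewness of the sample means, which is 0 for a perfectly symmetric distribution such as the normal one.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) so the run is reproducible

def skewness(values):
    """Sample skewness: mean of the cubed z-scores (0 for a symmetric distribution)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.mean(z**3)

# right-skewed population: squared standard-normal values
population = rng.standard_normal(200_000)**2

def sample_mean_skewness(sample_size, n_experiments=10_000):
    """Draw n_experiments samples of the given size and return the skewness of their means."""
    idx = rng.integers(0, population.size, size=(n_experiments, sample_size))
    sample_means = population[idx].mean(axis=1)
    return skewness(sample_means)

skew_by_size = {n: sample_mean_skewness(n) for n in (2, 30, 300)}

for n, sk in skew_by_size.items():
    print(f"sample size {n:3d}: skewness of sample means = {sk:.3f}")
```

The skewness should shrink toward 0 as the sample size grows, which is the CLT at work: the number of experiments stays the same, only the size of each sample changes.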



Creating data with a skewed, non-Gaussian distribution (squared standard-normal values, i.e. chi-squared with one degree of freedom):



import matplotlib.pyplot as plt
import numpy as np

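# population: squared standard-normal values (right-skewed, all non-negative)
N = 1000000
data = np.random.randn(N)**2
# alternative population: a sine wave (bimodal distribution)
# data = np.sin(np.linspace(0,10*np.pi,N))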

# show the distribution
plt.plot(data,'.')
plt.show()

plt.hist(data,40)
plt.show()


Distribution of sample means:



# repeatedly draw random samples and store the mean of each one

samplesize   = 30
numberOfExps = 500
samplemeans  = np.zeros(numberOfExps)

for expi in range(numberOfExps):
    # get a sample and compute its mean
    sampleidx = np.random.randint(0,N,samplesize)
    samplemeans[expi] = np.mean(data[ sampleidx ])
    

# and show its distribution
plt.hist(samplemeans,30)
plt.xlabel('Mean estimate')
plt.ylabel('Count')
plt.show()
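
Beyond the bell shape, the CLT also predicts how wide the histogram of sample means should be: its standard deviation (the standard error) is approximately the population standard deviation divided by the square root of the sample size. The following is a minimal sketch of that check, not part of the original walkthrough; it rebuilds the same kind of data with a seeded generator (the seed and the names `predicted_se` and `observed_se` are assumptions made for this example).

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (assumption) for reproducibility

# same kind of population as above: squared standard-normal values
N = 1_000_000
data = rng.standard_normal(N)**2

samplesize = 30
numberOfExps = 500

# draw all samples at once: one row of random indices per experiment
sampleidx = rng.integers(0, N, size=(numberOfExps, samplesize))
samplemeans = data[sampleidx].mean(axis=1)

# CLT prediction for the spread of the sample means vs. what was observed
predicted_se = data.std() / np.sqrt(samplesize)
observed_se = samplemeans.std()

print(f"population mean      : {data.mean():.3f}")
print(f"mean of sample means : {samplemeans.mean():.3f}")
print(f"predicted std. error : {predicted_se:.3f}")
print(f"observed std. error  : {observed_se:.3f}")
```

The two standard errors should agree closely but not exactly, because only 500 experiments are run; increasing `numberOfExps` tightens the match.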



Mixing two non-Gaussian datasets to get a more Gaussian-looking combined signal:


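IMPORTANT: the two datasets must be on comparable scales (similar variance); otherwise the larger one dominates the sum and the combined distribution simply resembles it.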



# create two datasets with non-Gaussian distributions
x = np.linspace(0,6*np.pi,10001)
s = np.sin(x)
u = 2*np.random.rand(len(x))-1

fig,ax = plt.subplots(2,3,figsize=(10,6))
ax[0,0].plot(x,s,'b')
ax[0,0].set_title('Signal')

y,xx = np.histogram(s,200)
ax[1,0].plot(y,'b')
ax[1,0].set_title('Distribution')

ax[0,1].plot(x,u,'m')
ax[0,1].set_title('Signal')

y,xx = np.histogram(u,200)
ax[1,1].plot(y,'m')
ax[1,1].set_title('Distribution')

ax[0,2].plot(x,s+u,'k')
ax[0,2].set_title('Combined signal')

y,xx = np.histogram(s+u,200)
ax[1,2].plot(y,'k')
ax[1,2].set_title('Combined distribution')

plt.show()
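
How "Gaussian" the mixture is can also be put into a number. One rough option, chosen here purely for illustration, is excess kurtosis: it is 0 for a Gaussian, -1.5 for a sine wave and -1.2 for a uniform distribution. The sketch below assumes a hand-written `excess_kurtosis` helper and a fixed seed, and also includes a deliberately badly scaled mixture (`s + 10*u`) to show why scaling matters.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed (assumption) for reproducibility

def excess_kurtosis(values):
    """Mean of the fourth power of the z-scores minus 3 (0 for a Gaussian)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.mean(z**4) - 3.0

# the same two non-Gaussian signals: a sine wave and uniform noise in [-1, 1)
x = np.linspace(0, 6*np.pi, 10001)
s = np.sin(x)
u = 2*rng.random(len(x)) - 1

kurt = {
    'sine':              excess_kurtosis(s),
    'uniform':           excess_kurtosis(u),
    'sine + uniform':    excess_kurtosis(s + u),
    'sine + 10*uniform': excess_kurtosis(s + 10*u),  # badly scaled mixture
}

for name, value in kurt.items():
    print(f"{name:18s}: excess kurtosis = {value:+.2f}")
```

The properly scaled sum should land closer to 0 than either input, while the badly scaled one stays near the value of the uniform noise that dominates it. With only two signals the sum is still far from exactly Gaussian; mixing more independent signals moves it closer.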


