How to compute central tendency with Python.

Measures of central tendency describe the center of a data set. The three most common measures are the mean, the median and the mode.
- Mean: the sum of all values divided by the number of values.
- Median: the middle value of an ordered data set (for an even number of values, the average of the two middle values).
- Mode: the most frequent value in a data set.
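
As a quick illustration of the three definitions, here is a minimal sketch that uses Python's built-in `statistics` module. The sample values are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical sample, chosen only for illustration
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # (2+3+3+5+7+10) / 6
median_value = statistics.median(values)  # even count: average of the two middle values, 3 and 5
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```

For this sample the mean is 5, the median is 4 and the mode is 3, so the three measures do not have to agree.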

Computing the mean:

The arithmetic mean is the sum of all values divided by the number of values. It is the most commonly used measure of central tendency. The mean is meaningful only for interval and ratio levels of measurement, because it requires equal spacing between adjacent values on the scale.

# import libraries
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

# the distributions
N = 10001   # number of data points
nbins = 30  # number of histogram bins

d1 = np.random.randn(N) - 1
d2 = 3*np.random.randn(N)
d3 = np.random.randn(N) + 1

# need their histograms
y1,x1 = np.histogram(d1,nbins)
x1 = (x1[1:]+x1[:-1])/2

y2,x2 = np.histogram(d2,nbins)
x2 = (x2[1:]+x2[:-1])/2

y3,x3 = np.histogram(d3,nbins)
x3 = (x3[1:]+x3[:-1])/2

# compute the means
mean_d1 = sum(d1) / len(d1)
mean_d2 = np.mean(d2)
mean_d3 = np.mean(d3)

# plot the histograms as lines
plt.plot(x1,y1,'b', x2,y2,'r', x3,y3,'k')

plt.xlabel('Data values')
plt.ylabel('Data counts')
plt.show()

Figure: Computing the mean value
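
Because the three data sets are standard normal samples shifted by -1, stretched by 3, and shifted by +1, their sample means should land close to -1, 0 and 1. The sketch below checks this. It assumes a seeded `np.random.default_rng` generator (not used in the code above) so that the run is reproducible; the seed value is arbitrary.

```python
import numpy as np

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)
N = 10001

# Same construction as above: shifted or stretched standard normal samples
d1 = rng.standard_normal(N) - 1
d2 = 3 * rng.standard_normal(N)
d3 = rng.standard_normal(N) + 1

for name, d, expected in [('d1', d1, -1), ('d2', d2, 0), ('d3', d3, 1)]:
    # sum/len and np.mean are two ways to write the same formula
    manual = d.sum() / len(d)
    print(f'{name}: mean = {np.mean(d):.3f} (manual {manual:.3f}), expected about {expected}')
```

The sample means will not match the expected values exactly; with about ten thousand points they typically differ by a few hundredths.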

Computing the median:

The median can only be used on data that can be ordered – that is, from ordinal, interval and ratio levels of measurement.
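
To make the role of ordering concrete, here is a small sketch that computes the median by sorting and compares it with `np.median`. The helper `manual_median` and the sample values are hypothetical and only serve as an illustration.

```python
import numpy as np

def manual_median(values):
    """Median by sorting: middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 1, 5]        # hypothetical values
even_sample = [7, 1, 5, 3]

print(manual_median(odd_sample), np.median(odd_sample))
print(manual_median(even_sample), np.median(even_sample))
```

With an odd number of values the median is one of the data points (5 here); with an even number it is the average of the two middle values (3 and 5, giving 4).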

# create a log-normal distribution
shift   = 0
stretch = .7
n       = 2000
nbins   = 50

# generate data
data = stretch*np.random.randn(n) + shift
data = np.exp( data )

# and its histogram
y,x = np.histogram(data,nbins)
x = (x[:-1]+x[1:])/2

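# compute mean and median
datamean = np.mean(data)
datamedian = np.median(data)

# plot data (layout is an assumption: raw values on top, histogram below)
fig,ax = plt.subplots(2,1,figsize=(4,6))
ax[0].plot(data,'.',color=[.5,.5,.5])

# histogram with the mean and median marked as vertical lines
ax[1].plot(x,y,'k')
ax[1].axvline(datamean,color='r',linestyle='--',label='Mean')
ax[1].axvline(datamedian,color='b',linestyle='--',label='Median')
ax[1].set_title('Log-normal data histogram')
ax[1].legend()
plt.show()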

Figure: Computing the median value

Computing the mode:

The mode can be used for any level of measurement, but it’s most meaningful for nominal and ordinal levels.
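
For nominal data the mode is the only one of the three measures that applies, since categories can be neither averaged nor ordered. A minimal sketch, assuming a hypothetical list of color labels and using `collections.Counter` and `statistics.mode` from the standard library:

```python
from collections import Counter
import statistics

# Hypothetical nominal data: categories with no numeric meaning or order
colors = ['red', 'blue', 'green', 'blue', 'red', 'blue']

# count how often each category occurs, most frequent first
counts = Counter(colors)
for label, count in counts.most_common():
    print(f'{label} appears {count} times.')

modal_color = statistics.mode(colors)
print(f'The modal value is {modal_color}')
```

If several categories tie for the highest count, `statistics.multimode` (Python 3.8 and later) returns all of them.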

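# mode

# ten standard-normal draws, rounded to the nearest integer
data = np.round(np.random.randn(10))

# count how often each unique value occurs
uniq_data = np.unique(data)
for value in uniq_data:
    print(f'{value} appears {np.sum(data==value)} times.')

print(' ')
# stats.mode returns an array in older SciPy releases and a scalar in newer ones;
# np.ravel handles both cases
modal_value = np.ravel(stats.mode(data)[0])[0]
print(f'The modal value is {modal_value:g}')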

Figure: Computing the mode value
