How to compute probability mass functions with Python.

In probability theory a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value.
It is also known as the discrete density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.

Calculating probability mass function for drawing marbles from a jar:

The following Python code shows probabilities and proportions calculation for case of drawing marbles of different colors - blue, yellow and orange - out of the box.

import matplotlib.pyplot as plt
import numpy as np

# colored marble counts
blue   = 40
yellow = 30
orange = 20
totalMarbs = blue + yellow + orange

# put them all in a jar
jar = np.hstack((1*np.ones(blue),2*np.ones(yellow),3*np.ones(orange)))

# now we draw 500 marbles (with replacement)
numDraws = 500
drawColors = np.zeros(numDraws)

for drawi in range(numDraws):
    # generate a random integer to draw
    randmarble = int(np.random.rand()*len(jar))
    # store the color of that marble
    drawColors[drawi] = jar[randmarble]

# now we need to know the proportion of colors drawn
propBlue = sum(drawColors==1) / numDraws
propYell = sum(drawColors==2) / numDraws
propOran = sum(drawColors==3) / numDraws

# plot those against the theoretical probability[1,2,3],[ propBlue, propYell, propOran ],label='Proportion')
plt.plot([0.5, 1.5],[blue/totalMarbs, blue/totalMarbs],'b',linewidth=3,label='Probability')
plt.plot([1.5, 2.5],[yellow/totalMarbs,yellow/totalMarbs],'b',linewidth=3)
plt.plot([2.5, 3.5],[orange/totalMarbs,orange/totalMarbs],'b',linewidth=3)

plt.xlabel('Marble color')

Calculating probabilities for drawing marbles from a jar

Calculating probability density (technically mass) function:

A probability density function (PDF) differes from probability mass function and associated with continuous rather than discrete random variables.

import matplotlib.pyplot as plt
import numpy as np
# continous signal (technically discrete!)
N = 10004
datats1 = np.cumsum(np.sign(np.random.randn(N)))
datats2 = np.cumsum(np.sign(np.random.randn(N)))

# let's see what they look like

# discretize using histograms
nbins = 50

y,x = np.histogram(datats1,nbins)
x1 = (x[1:]+x[:-1])/2
y1 = y/sum(y)

y,x = np.histogram(datats2,nbins)
x2 = (x[1:]+x[:-1])/2
y2 = y/sum(y)

plt.plot(x1,y1, x2,y2,linewidth=3)
plt.xlabel('Data value')

Calculating probability density (technically mass) function

See also related topics: