# How to compute probability mass functions with Python.

In probability theory, a probability mass function (PMF) is a function that gives the probability that a discrete random variable is exactly equal to some value.
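As a small illustration, take a fair six-sided die (an example chosen here, not taken from the text above). Its PMF assigns 1/6 to each face and 0 to every other value. The helper name `die_pmf` is hypothetical. The sketch uses exact fractions so that the two defining properties can be checked without rounding: every value is non-negative, and the values sum to exactly 1 over the support.

```python
from fractions import Fraction

def die_pmf(x):
    """Hypothetical PMF of a fair six-sided die: 1/6 for x in 1..6, 0 otherwise."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)

support = range(1, 7)

# A valid PMF is non-negative everywhere and sums to exactly 1 over its support
assert all(die_pmf(x) >= 0 for x in support)
total = sum(die_pmf(x) for x in support)

print(total)        # 1
print(die_pmf(3))   # 1/6
print(die_pmf(7))   # 0
```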

It is also known as the discrete density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.
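For the multivariate case, a minimal sketch (assuming two independent fair dice, an illustrative choice) stores the joint PMF as a dictionary keyed by pairs. Summing the joint PMF over one variable gives a marginal PMF, and pushing it through a function such as the sum of the faces gives the PMF of a new scalar random variable. The variable names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF of two independent fair dice: p(a, b) = 1/36 for every pair
faces = range(1, 7)
joint = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

# Marginal PMF of the first die: sum the joint PMF over the second die
marginal_first = defaultdict(Fraction)
for (a, b), p in joint.items():
    marginal_first[a] += p

# PMF of the scalar random variable S = a + b, derived from the joint PMF
pmf_sum = defaultdict(Fraction)
for (a, b), p in joint.items():
    pmf_sum[a + b] += p

print(sum(joint.values()))   # 1
print(marginal_first[1])     # 1/6
print(pmf_sum[7])            # 1/6
```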

## Calculating the probability mass function for drawing marbles from a jar:

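The following Python code draws marbles of three colors (blue, yellow and orange) out of a jar with replacement, and compares the observed proportion of each color with its theoretical probability, which is that color's count divided by the total number of marbles.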

```python
import matplotlib.pyplot as plt
import numpy as np

# colored marble counts
blue = 40
yellow = 30
orange = 20
totalMarbs = blue + yellow + orange

# put them all in a jar, coding the colors as 1 (blue), 2 (yellow), 3 (orange)
jar = np.hstack((1*np.ones(blue), 2*np.ones(yellow), 3*np.ones(orange)))

# now we draw 500 marbles (with replacement)
numDraws = 500
drawColors = np.zeros(numDraws)
for drawi in range(numDraws):
    # generate a random index into the jar
    randmarble = int(np.random.rand()*len(jar))
    # store the color of that marble
    drawColors[drawi] = jar[randmarble]

# now we need to know the proportion of each color drawn
propBlue = np.sum(drawColors==1) / numDraws
propYell = np.sum(drawColors==2) / numDraws
propOran = np.sum(drawColors==3) / numDraws

# plot those proportions (bars) against the theoretical probabilities (black lines)
plt.bar([1,2,3], [propBlue, propYell, propOran], label='Proportion')
plt.plot([0.5, 1.5], [blue/totalMarbs, blue/totalMarbs], 'k', linewidth=3, label='Probability')
plt.plot([1.5, 2.5], [yellow/totalMarbs, yellow/totalMarbs], 'k', linewidth=3)
plt.plot([2.5, 3.5], [orange/totalMarbs, orange/totalMarbs], 'k', linewidth=3)
plt.xticks([1,2,3], labels=('Blue','Yellow','Orange'))
plt.xlabel('Marble color')
plt.ylabel('Proportion/probability')
plt.legend()
plt.show()
```
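The loop above draws one marble at a time. As a vectorized alternative (a sketch, not the original author's method), NumPy's `Generator.choice` can draw all marbles in one call with the exact PMF passed as `p`, and `np.bincount` tallies the color codes. The seed and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Same marble counts as above; the exact PMF is each count divided by the total
counts = np.array([40, 30, 20])          # blue, yellow, orange
pmf = counts / counts.sum()              # 4/9, 3/9, 2/9

# Draw with replacement in one vectorized call (seeded for reproducibility)
rng = np.random.default_rng(seed=0)
num_draws = 500
draws = rng.choice([1, 2, 3], size=num_draws, p=pmf)

# Empirical proportions: count each color code (1..3), then divide by the draws
proportions = np.bincount(draws, minlength=4)[1:] / num_draws

print(pmf.round(3))    # [0.444 0.333 0.222]
print(proportions)     # close to pmf; exact values depend on the seed
```

Passing `p=pmf` samples from the theoretical distribution directly, which is equivalent to drawing uniformly from the 90-marble jar with replacement; with more draws the proportions converge toward the PMF.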

## Calculating a probability density (technically mass) function:

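A probability density function (PDF) differs from a probability mass function in that it is associated with continuous rather than discrete random variables. The example below treats two random walks as "continuous" signals, but because they are sampled at discrete steps and their histogram counts are normalized to sum to 1, the result is technically a probability mass function over the histogram bins.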
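In symbols (standard definitions, written here with $p_X$ for a PMF and $f_X$ for a PDF), a PMF gives probabilities directly and sums to 1, whereas a PDF only yields a probability when integrated over an interval, so its individual values may exceed 1:

$$
p_X(x) = P(X = x), \qquad \sum_{x} p_X(x) = 1
$$

$$
P(a \le X \le b) = \int_a^b f_X(x)\,dx, \qquad \int_{-\infty}^{\infty} f_X(x)\,dx = 1
$$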

```python
import matplotlib.pyplot as plt
import numpy as np

# "continuous" signals: random walks built from +/-1 steps (technically discrete!)
N = 10004
datats1 = np.cumsum(np.sign(np.random.randn(N)))
datats2 = np.cumsum(np.sign(np.random.randn(N)))

# let's see what they look like
plt.plot(np.arange(N), datats1, linewidth=2)
plt.plot(np.arange(N), datats2, linewidth=2)
plt.xlabel('Time step')
plt.ylabel('Value')
plt.show()

# discretize using histograms: bin centers on x, counts normalized to sum to 1 on y
nbins = 50
y, x = np.histogram(datats1, nbins)
x1 = (x[1:] + x[:-1]) / 2
y1 = y / np.sum(y)
y, x = np.histogram(datats2, nbins)
x2 = (x[1:] + x[:-1]) / 2
y2 = y / np.sum(y)

plt.plot(x1, y1, x2, y2, linewidth=3)
plt.legend(('ts1', 'ts2'))
plt.xlabel('Data value')
plt.ylabel('Probability')
plt.show()
```
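The heading calls the result above "technically mass" because dividing histogram counts by their total gives a PMF over the bins. As a sketch (the seeded walk and variable names are assumptions, not from the original), NumPy's `np.histogram(..., density=True)` instead returns a density: its bar heights must be multiplied by the bin widths to sum to 1, and each density equals the corresponding PMF value divided by its bin width.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# A +/-1 random walk similar to datats1 above (assumed setup, seeded here)
walk = np.cumsum(rng.choice([-1, 1], size=10_000))

nbins = 50
counts, edges = np.histogram(walk, bins=nbins)
widths = np.diff(edges)

# PMF-style normalization (as in the code above): bar heights sum to 1
pmf = counts / counts.sum()

# PDF-style normalization: heights are densities, so height * width sums to 1
pdf, _ = np.histogram(walk, bins=nbins, density=True)

print(np.isclose(pmf.sum(), 1.0))              # True
print(np.isclose((pdf * widths).sum(), 1.0))   # True
print(np.allclose(pdf, pmf / widths))          # True
```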