`pymc` is a Python module that implements several MCMC sampling algorithms. Currently, `pymc`'s stable release (2.x) mostly relies on the Gibbs and Metropolis-Hastings samplers, which are not that exciting, but the development version (3.x) has Hamiltonian Monte Carlo (HMC). `pymc` 2.x is available from GitHub:

https://github.com/pymc-devs/pymc/tree/2.3

You can also install it via `pip` or `easy_install`. Note that `pymc` requires a Fortran compiler to build.
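For example (assuming a recent `pip`; the version pin below simply selects the 2.x series rather than a specific release):

```shell
# A Fortran compiler (e.g. gfortran) must be on the PATH for the build step
pip install "pymc<3"
```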

Similar to other examples, we start by making some fake data. In this case, we fit a linear regression model with two predictor variables (x, y) and one outcome variable (z). The relation between them is \[Z_n = \alpha + \beta_x x_n + \beta_y y_n + \epsilon,\] where \(\epsilon\) is a Gaussian noise term.

In [2]:

```
from numpy import *
Nobs = 20
x_true = random.uniform(0,10, size=Nobs)
y_true = random.uniform(-1,1, size=Nobs)
alpha_true = 0.5
beta_x_true = 1.0
beta_y_true = 10.0
eps_true = 0.5
z_true = alpha_true + beta_x_true*x_true + beta_y_true*y_true
z_obs = z_true + random.normal(0, eps_true, size=Nobs)
```
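Before handing the problem to a sampler, it's worth checking that fake data generated this way actually contains the signal. A minimal standalone sketch (plain least squares via `numpy.linalg.lstsq`, not part of the `pymc` workflow; the seed is our own choice for repeatability) should recover parameters close to the true values:

```python
import numpy as np

np.random.seed(42)  # fixed seed so the check is repeatable
Nobs = 20
x = np.random.uniform(0, 10, size=Nobs)
y = np.random.uniform(-1, 1, size=Nobs)
alpha_true, beta_x_true, beta_y_true, eps_true = 0.5, 1.0, 10.0, 0.5
z = alpha_true + beta_x_true * x + beta_y_true * y + np.random.normal(0, eps_true, size=Nobs)

# design matrix with an intercept column: [1, x, y]
A = np.column_stack([np.ones(Nobs), x, y])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)
print(coef)  # roughly [0.5, 1.0, 10.0]
```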

In [3]:

```
%matplotlib inline
from matplotlib import pyplot as plt
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.scatter(x_true, z_obs, c=y_true, marker='o')
plt.colorbar()
plt.xlabel('X')
plt.ylabel('Z')
plt.subplot(1,2,2)
plt.scatter(y_true, z_obs, c=x_true, marker='o')
plt.colorbar()
plt.xlabel('Y')
plt.ylabel('Z')
```

Out[3]: *(figure: scatter plots of Z vs. X colored by Y, and Z vs. Y colored by X)*

As with `PyStan`, we build the model in a hierarchical manner, but instead of writing it out in another language, we use nothing but Python code.

In [20]:

```
import pymc
# define the parameters with their associated priors
alpha = pymc.Uniform('alpha', -100,100, value=median(z_obs))
betax = pymc.Uniform('betax', -100,100, value=std(z_obs)/std(x_true))
betay = pymc.Uniform('betay', -100,100, value=std(z_obs)/std(y_true))
eps = pymc.Uniform('eps', 0, 100, value=0.01)
# Now define the model
@pymc.deterministic
def model(alpha=alpha, betax=betax, betay=betay, x=x_true, y=y_true):
    return alpha + betax*x + betay*y
# pymc parametrizes the width of the normal distribution by tau=1/sigma**2
@pymc.deterministic
def tau(eps=eps):
    return power(eps, -2)
# Lastly relate the model/parameters to the data
data = pymc.Normal('data', mu=model, tau=tau, value=z_obs, observed=True)
```

The `pymc` model is defined solely in terms of python objects, referred to as *stochastics* (in the Bayesian sense, they are random variables). The four parameters are given uniform priors (other, more complicated priors could be used).

We then define the model as a python function with a special decorator: `pymc.deterministic`. This tells `pymc` that `model` is a function that depends on stochastic objects, but is itself a deterministic function of them (it's a function of random variables, not a random variable itself). The same is true for the function `tau`, which simply converts the standard deviation parameter into a precision parameter. The last statement then tells `pymc` that the observed data `z_obs` is drawn from a normal distribution with mean equal to the model, precision equal to \(1/\epsilon ^2\), and value equal to the observed data. The special `observed=True` argument tells pymc that this stochastic's value should remain constant.
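The precision conversion is just arithmetic; a quick standalone check (plain Python/numpy, independent of `pymc`, with a helper name of our own):

```python
import numpy as np

def tau_from_sigma(sigma):
    """Convert a normal standard deviation into the precision tau = 1/sigma**2."""
    return np.power(sigma, -2)

print(tau_from_sigma(0.5))  # 1/0.5**2 = 4.0
```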

With all the definitions in place, we simply feed the pymc objects into a `pymc.MCMC` to create a sampler and run it.

In [34]:

```
sampler = pymc.MCMC([alpha,betax,betay,eps,model,tau,z_obs,x_true,y_true])
sampler.use_step_method(pymc.AdaptiveMetropolis, [alpha,betax,betay,eps],
                        scales={alpha:0.1, betax:0.1, betay:1.0, eps:0.1})
sampler.sample(iter=10000)
```

In [36]:

```
pymc.Matplot.plot(sampler)
```

For each stochastic, `pymc` plots the trace (upper-left panel), the auto-correlation (lower-left panel), and the histogram of the samples (right panel).

You'll see that nothing much happens at first; then the adaptive MH sampler finds a better covariance matrix and starts to sample more efficiently. You can either throw out the first iterations as burn-in, or simply run another round of sampling now that we're confident the sampler is behaving appropriately.
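The autocorrelation shown in the diagnostic plots can be computed directly with numpy. A standalone sketch (the `autocorr` helper is our own, not a `pymc` function), using a synthetic correlated chain in place of a real trace:

```python
import numpy as np

def autocorr(chain, lag):
    """Sample autocorrelation of a 1-D chain at a given lag."""
    c = chain - chain.mean()
    return np.dot(c[:-lag], c[lag:]) / np.dot(c, c)

# synthetic strongly-correlated chain: AR(1) with coefficient 0.9
rng = np.random.RandomState(0)
chain = np.zeros(5000)
for i in range(1, chain.size):
    chain[i] = 0.9 * chain[i - 1] + rng.normal()

print(autocorr(chain, 1))   # close to 0.9: successive samples are highly correlated
print(autocorr(chain, 50))  # near 0: samples 50 steps apart are nearly independent
```

A slowly decaying autocorrelation like this is exactly what the pre-adaptation portion of a trace looks like; after adaptation the curve should fall off much faster.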

In [37]:

```
sampler.sample(iter=10000)
```

Each parameter's samples are stored on the stochastic itself, so we can get summary statistics straight from `pymc`'s internals.

In [49]:

```
alpha.summary()
```
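The kind of numbers `summary()` reports (mean, standard deviation, credible interval) can also be computed by hand from a trace. A standalone sketch, with a synthetic array standing in for `alpha.trace()`:

```python
import numpy as np

rng = np.random.RandomState(1)
trace = rng.normal(0.5, 0.1, size=10000)  # stand-in for alpha.trace()

mean = trace.mean()
sd = trace.std()
lo95, hi95 = np.percentile(trace, [2.5, 97.5])  # central 95% credible interval
print(mean, sd, (lo95, hi95))
```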

In [50]:

```
m_alpha = median(alpha.trace())
m_betax = median(betax.trace())
m_betay = median(betay.trace())
m_eps = median(eps.trace())
```

As point estimates, we take the median of each parameter's trace (including the intrinsic dispersion `eps`). Now we can plot the results, this time correcting for the `y` predictor when plotting versus `x` and vice versa. That way we can make sure the correct trend has been found. We can also draw dashed lines denoting the intrinsic dispersion.

In [51]:

```
plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plt.plot(x_true, z_obs-m_alpha-m_betay*y_true, 'o')
plt.xlabel('X')
plt.ylabel('Z - alpha - beta_y y')
# Now plot the model
xx = array([x_true.min(), x_true.max()])
plt.plot(xx, xx*m_betax)
plt.plot(xx, xx*m_betax + m_eps, '--', color='k')
plt.plot(xx, xx*m_betax - m_eps, '--', color='k')
plt.subplot(1,2,2)
plt.plot(y_true, z_obs-m_alpha-m_betax*x_true, 'o')
plt.xlabel('Y')
plt.ylabel('Z - alpha - beta_x x')
yy = array([y_true.min(), y_true.max()])
plt.plot(yy, yy*m_betay)
plt.plot(yy, yy*m_betay + m_eps, '--', color='k')
plt.plot(yy, yy*m_betay - m_eps, '--', color='k')
```

Out[51]: *(figure: corrected trends Z - alpha - beta_y y vs. X and Z - alpha - beta_x x vs. Y, with the fitted line and dashed intrinsic-dispersion bounds)*

Lastly, the `triangle` plotting package is an easy way to look at the covariances of the various parameters. For this to work, we need the Markov chains in a 2D array indexed by `[i,p]`, so we stack the four traces into the columns of a single array.

In [53]:

```
samples = array([alpha.trace(),betax.trace(),betay.trace(),eps.trace()]).T
samples = samples[0]
import triangle
tmp = triangle.corner(samples[:,:], labels=['alpha','betax','betay','eps'],
                      truths=[alpha_true, beta_x_true, beta_y_true, eps_true])
```
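To see the `[i,p]` shape the corner plot expects, here is a standalone sketch with synthetic stand-ins for the four traces:

```python
import numpy as np

niter = 1000
# stand-ins for alpha.trace(), betax.trace(), betay.trace(), eps.trace()
traces = [np.random.normal(size=niter) for _ in range(4)]

samples = np.array(traces).T  # one row per MCMC step, one column per parameter
print(samples.shape)  # (1000, 4)
```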