Backends

Backends for traces

Available backends

  1. NumPy array (pymc3.backends.NDArray)
  2. Text files (pymc3.backends.Text)
  3. SQLite (pymc3.backends.SQLite)

The NDArray backend holds the entire trace in memory, whereas the Text and SQLite backends store the values while sampling.

Selecting a backend

By default, a NumPy array is used as the backend. To specify a different backend, pass a backend instance to sample.

For example, the following would save the sampling values to CSV files in the directory ‘test’.

>>> import pymc3 as pm
>>> db = pm.backends.Text('test')
>>> trace = pm.sample(..., trace=db)

Selecting values from a backend

After sampling finishes, sample returns a MultiTrace object. Values can be accessed in a few ways. The easiest is to index the trace object with a variable or variable name.

>>> trace['x']  # or trace.x or trace[x]

The call will return the sampling values of x, with the values for all chains concatenated. (For a single call to sample, the number of chains will correspond to the njobs argument.)

To discard the first N values of each chain, slicing syntax can be used.

>>> trace['x', 1000:]

The get_values method offers more control over which values are returned. The call below will discard the first 1000 iterations from each chain and keep the values for each chain as separate arrays.

>>> trace.get_values('x', burn=1000, combine=False)

The chains parameter of get_values can be used to limit the chains that are retrieved.

>>> trace.get_values('x', burn=1000, chains=[0, 2])
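To make the roles of burn, combine, and chains concrete, the following is a minimal pure-Python sketch that mimics the selection logic on plain lists. The function select_values and the chains_data dictionary are hypothetical stand-ins for illustration and are not part of pymc3; the real method returns NumPy arrays.

```python
def select_values(chains_data, burn=0, thin=1, combine=True, chains=None):
    """Mimic burn/thin/combine/chains selection on plain lists.

    chains_data maps a chain number to that chain's list of draws.
    Hypothetical helper for illustration; not the pymc3 implementation.
    """
    if chains is None:
        chains = sorted(chains_data)
    # Drop the first `burn` draws of each chain, then keep every `thin`-th one.
    per_chain = [chains_data[c][burn::thin] for c in chains]
    if combine:
        # Concatenate the selected chains into a single flat list.
        return [value for chain in per_chain for value in chain]
    return per_chain


# Three toy chains with five draws each.
chains_data = {
    0: [0, 1, 2, 3, 4],
    1: [10, 11, 12, 13, 14],
    2: [20, 21, 22, 23, 24],
}

combined = select_values(chains_data, burn=2)
separate = select_values(chains_data, burn=2, combine=False)
subset = select_values(chains_data, burn=2, chains=[0, 2])
```

Here combined is one flat list over all chains, separate keeps one list per chain, and subset contains only the draws from chains 0 and 2.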

MultiTrace objects also support slicing. For example, the following call would return a new trace object without the first 1000 sampling iterations for all traces and variables.

>>> sliced_trace = trace[1000:]

The backend for the new trace is always NDArray, regardless of the type of the original trace. Only the NDArray backend supports a stop value in the slice.

Loading a saved backend

Saved backends can be loaded using the load function in the module for the specific backend.

>>> trace = pm.backends.text.load('test')

Writing custom backends

Backends consist of a class that handles sampling storage and value selection. Three sampling methods of the backend will be called:

  • setup: Before sampling is started, the setup method will be called with two arguments: the number of draws and the chain number. This is useful for setting up any structure for storing the sampling values that requires the above information.
  • record: Record the sampling results for the current draw. This method will be called with a dictionary of values mapped to the variable names. This is the only sampling function that must do something to have a meaningful backend.
  • close: This method is called following sampling and should perform any actions necessary for finalizing and cleaning up the backend.

The base storage class backends.base.BaseTrace provides common model setup that is used by all the PyMC backends.

Several selection methods must also be defined:

  • get_values: This is the core method for selecting values from the backend. It can be called directly and is used by __getitem__ when the backend is indexed with a variable name or object.
  • _slice: Defines how the backend returns a slice of itself. This is called if the backend is indexed with a slice range.
  • point: Returns values for each variable at a single iteration. This is called if the backend is indexed with a single integer.
  • __len__: This should return the number of draws.

When pymc3.sample finishes, it wraps all trace objects in a MultiTrace object that provides a consistent selection interface for all backends. If the traces are stored on disk, then a load function should also be defined that returns a MultiTrace object.

For specific examples, see pymc3.backends.{ndarray,text,sqlite}.py.
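As a rough illustration of the interface described above, the sketch below implements the three sampling methods and the four selection methods on top of plain Python lists. The class name ListTrace is hypothetical. To stay self-contained, it does not subclass backends.base.BaseTrace and takes variable names directly instead of a model, so it shows the shape of a backend rather than a drop-in pymc3 backend.

```python
class ListTrace:
    """Toy single-chain backend that stores draws in Python lists.

    Standalone sketch: it does not subclass pymc3's BaseTrace and takes
    variable names directly instead of a model.
    """

    def __init__(self, varnames):
        self.varnames = list(varnames)
        self.chain = None
        self.samples = {name: [] for name in self.varnames}

    # Sampling methods
    def setup(self, draws, chain):
        # `draws` could be used to preallocate; lists grow on demand here.
        self.chain = chain
        self.samples = {name: [] for name in self.varnames}

    def record(self, point):
        # `point` maps variable names to the values of the current draw.
        for name in self.varnames:
            self.samples[name].append(point[name])

    def close(self):
        # Nothing to flush or release for an in-memory store.
        pass

    # Selection methods
    def get_values(self, varname, burn=0, thin=1):
        return self.samples[varname][burn::thin]

    def _slice(self, idx):
        # Return a new trace holding only the draws selected by the slice.
        sliced = ListTrace(self.varnames)
        sliced.chain = self.chain
        sliced.samples = {name: vals[idx] for name, vals in self.samples.items()}
        return sliced

    def point(self, idx):
        # Values of every variable at a single iteration.
        return {name: vals[idx] for name, vals in self.samples.items()}

    def __len__(self):
        if not self.varnames:
            return 0
        return len(self.samples[self.varnames[0]])


# Simulate what a sampler would do for one chain of four draws.
trace = ListTrace(["x", "y"])
trace.setup(draws=4, chain=0)
for i in range(4):
    trace.record({"x": i, "y": i * 10})
trace.close()
```

A backend that writes to disk would open its file or connection in setup, write in record, and flush and release resources in close.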

ndarray

NumPy array trace backend

Store sampling values in memory as a NumPy array.

class pymc3.backends.ndarray.NDArray(name=None, model=None, vars=None)

NDArray trace object

Parameters:
  • name (str) – Name of backend. This has no meaning for the NDArray backend.
  • model (Model) – If None, the model is taken from the with context.
  • vars (list of variables) – Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:
  • varname (str) –
  • burn (int) –
  • thin (int) –
Returns:

A NumPy array

point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point, sampler_stats=None)

Record results of a sampling iteration.

Parameters:
  • point (dict) – Values mapped to variable names

setup(draws, chain, sampler_vars=None)

Perform chain-specific setup.

Parameters:
  • draws (int) – Expected number of draws
  • chain (int) – Chain number
  • sampler_vars (list of dicts) – Names and dtypes of the variables that are exported by the samplers.

sqlite

SQLite trace backend

Store and retrieve sampling values in a SQLite database file.

Database format

For each variable, a table is created with the following format:

recid (INT), draw (INT), chain (INT), v0 (FLOAT), v1 (FLOAT), v2 (FLOAT) ...

The variable column names are extended to reflect additional dimensions. For example, a variable with the shape (2, 2) would be stored as

recid (INT), draw (INT), chain (INT), v0_0 (FLOAT), v0_1 (FLOAT), v1_0 (FLOAT) ...

The recid column is autoincremented each time a new row is added to the table. The chain column denotes the chain index and starts at 0.
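The table layout can be reproduced with Python's standard sqlite3 module. The sketch below builds a table for a variable of shape (2, 2) and inserts two draws. The helper value_columns, the table name y, and the exact column types are assumptions made for illustration; the schema pymc3 actually creates may differ in detail.

```python
import itertools
import sqlite3


def value_columns(shape):
    """Return column names such as v0, v1 or v0_0, v0_1 for a given shape."""
    if not shape:
        return ["v0"]  # assumption: a scalar gets a single value column
    index_ranges = [range(n) for n in shape]
    return [
        "v" + "_".join(map(str, idx))
        for idx in itertools.product(*index_ranges)
    ]


columns = value_columns((2, 2))
column_list = ", ".join(columns)
value_defs = ", ".join(name + " FLOAT" for name in columns)
placeholders = ", ".join("?" for _ in columns)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE y (recid INTEGER PRIMARY KEY, draw INT, chain INT, "
    + value_defs
    + ")"
)

# Insert two draws for chain 0; recid is assigned automatically.
insert_sql = (
    "INSERT INTO y (draw, chain, " + column_list + ") "
    "VALUES (?, ?, " + placeholders + ")"
)
conn.execute(insert_sql, (0, 0, 1.0, 2.0, 3.0, 4.0))
conn.execute(insert_sql, (1, 0, 5.0, 6.0, 7.0, 8.0))

# Retrieve all draws of chain 0 in sampling order.
rows = conn.execute("SELECT * FROM y WHERE chain = 0 ORDER BY draw").fetchall()
conn.close()
```

Each returned row starts with the automatically assigned record id, followed by the draw index, the chain index, and the flattened values.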

class pymc3.backends.sqlite.SQLite(name, model=None, vars=None)

SQLite trace object

Parameters:
  • name (str) – Name of database file
  • model (Model) – If None, the model is taken from the with context.
  • vars (list of variables) – Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:
  • varname (str) –
  • burn (int) –
  • thin (int) –
Returns:

A NumPy array

point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters:
  • point (dict) – Values mapped to variable names

setup(draws, chain)

Perform chain-specific setup.

Parameters:
  • draws (int) – Expected number of draws
  • chain (int) – Chain number

pymc3.backends.sqlite.load(name, model=None)

Load SQLite database.

Parameters:
  • name (str) – Path to SQLite database file
  • model (Model) – If None, the model is taken from the with context.
Returns:

A MultiTrace instance

text

Text file trace backend

Store sampling values as CSV files.

File format

Sampling values for each chain are saved in a separate file (under a directory specified by the name argument). The rows correspond to sampling iterations. The column names consist of variable names and index labels. For example, the heading

x,y__0_0,y__0_1,y__1_0,y__1_1,y__2_0,y__2_1

represents two variables, x and y, where x is a scalar and y has a shape of (3, 2).
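The heading above can be derived mechanically from the variable names and shapes. The following sketch shows one way to do so; flat_names is a hypothetical helper written for this example, not a documented pymc3 function.

```python
import itertools


def flat_names(varname, shape):
    """Return one column label per element of a variable with the given shape."""
    if not shape:
        return [varname]  # scalars keep the bare variable name
    index_ranges = [range(n) for n in shape]
    return [
        varname + "__" + "_".join(map(str, idx))
        for idx in itertools.product(*index_ranges)
    ]


# Shapes taken from the example heading: x is a scalar, y has shape (3, 2).
shapes = [("x", ()), ("y", (3, 2))]
header = ",".join(
    name for var, shape in shapes for name in flat_names(var, shape)
)
```

The indices vary fastest in the last dimension, which is why y__0_1 comes before y__1_0.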

class pymc3.backends.text.Text(name, model=None, vars=None)

Text trace object

Parameters:
  • name (str) – Name of directory to store text files
  • model (Model) – If None, the model is taken from the with context.
  • vars (list of variables) – Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.

get_values(varname, burn=0, thin=1)

Get values from trace.

Parameters:
  • varname (str) –
  • burn (int) –
  • thin (int) –
Returns:

A NumPy array

point(idx)

Return dictionary of point values at idx for current chain with variable names as keys.

record(point)

Record results of a sampling iteration.

Parameters:
  • point (dict) – Values mapped to variable names

setup(draws, chain)

Perform chain-specific setup.

Parameters:
  • draws (int) – Expected number of draws
  • chain (int) – Chain number

pymc3.backends.text.dump(name, trace, chains=None)

Store values from NDArray trace as CSV files.

Parameters:
  • name (str) – Name of directory to store CSV files in
  • trace (MultiTrace of NDArray traces) – Result of MCMC run with default NDArray backend
  • chains (list) – Chains to dump. If None, all chains are dumped.

pymc3.backends.text.load(name, model=None)

Load Text database.

Parameters:
  • name (str) – Name of directory with files (one per chain)
  • model (Model) – If None, the model is taken from the with context.
Returns:

A MultiTrace instance
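The one-file-per-chain layout can be illustrated with a small round trip that uses only the standard library. The functions dump_chains and load_chains and the file name pattern chain-N.csv are assumptions made for this sketch; they imitate the idea behind dump and load but are not the pymc3 functions, which work with MultiTrace objects.

```python
import csv
import os
import tempfile


def dump_chains(directory, chains):
    """Write each chain's rows to its own CSV file with a header row."""
    os.makedirs(directory, exist_ok=True)
    for chain, (header, rows) in chains.items():
        # Assumed naming scheme: one file per chain, keyed by chain number.
        path = os.path.join(directory, "chain-{}.csv".format(chain))
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(rows)


def load_chains(directory):
    """Read every chain file back into {chain: (header, rows)}."""
    chains = {}
    for filename in sorted(os.listdir(directory)):
        if not (filename.startswith("chain-") and filename.endswith(".csv")):
            continue
        chain = int(filename[len("chain-"):-len(".csv")])
        with open(os.path.join(directory, filename), newline="") as fh:
            reader = csv.reader(fh)
            header = next(reader)
            rows = [[float(value) for value in row] for row in reader]
        chains[chain] = (header, rows)
    return chains


# Two toy chains: x is a scalar, y has shape (2,).
original = {
    0: (["x", "y__0", "y__1"], [[0.1, 1.0, 2.0], [0.2, 3.0, 4.0]]),
    1: (["x", "y__0", "y__1"], [[0.3, 5.0, 6.0], [0.4, 7.0, 8.0]]),
}

with tempfile.TemporaryDirectory() as tmp:
    dump_chains(os.path.join(tmp, "test"), original)
    restored = load_chains(os.path.join(tmp, "test"))
```

Each row of a chain file corresponds to one sampling iteration, so the number of data rows equals the number of draws recorded for that chain.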

tracetab

Functions for converting traces into a table-like format

pymc3.backends.tracetab.trace_to_dataframe(trace, chains=None, varnames=None, include_transformed=False)

Convert trace to Pandas DataFrame.

Parameters:
  • trace (NDArray trace) –
  • chains (int or list of ints) – Chains to include. If None, all chains are used. A single chain value can also be given.
  • varnames (list of variable names) – Variables to be included in the DataFrame. If None, all variables are included.
  • include_transformed (boolean) – If True, transformed variables will be included in the resulting DataFrame.