Backends
Backends for traces
Available backends
- NumPy array (pymc3.backends.NDArray)
- Text files (pymc3.backends.Text)
- SQLite (pymc3.backends.SQLite)
The NDArray backend holds the entire trace in memory, whereas the Text and SQLite backends store the values while sampling.
Selecting a backend
By default, a NumPy array is used as the backend. To specify a different backend, pass a backend instance to sample through its trace argument.
For example, the following would save the sampling values to CSV files in the directory ‘test’.
>>> import pymc3 as pm
>>> db = pm.backends.Text('test')
>>> trace = pm.sample(..., trace=db)
Selecting values from a backend
After sampling finishes, sample returns a MultiTrace object. Values can be accessed in a few ways. The easiest is to index the MultiTrace object with a variable or variable name.
>>> trace['x'] # or trace.x or trace[x]
The call will return the sampling values of x, with the values for all chains concatenated. (For a single call to sample, the number of chains will correspond to the njobs argument.)
To discard the first N values of each chain, slicing syntax can be used.
>>> trace['x', 1000:]
The get_values method offers more control over which values are returned. The call below will discard the first 1000 iterations from each chain and keep the values for each chain as separate arrays.
>>> trace.get_values('x', burn=1000, combine=False)
The chains parameter of get_values can be used to limit the chains that are retrieved.
>>> trace.get_values('x', burn=1000, chains=[0, 2])
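The effect of burn, thin, combine, and chains can be pictured with a toy stand-in built on plain Python lists. This is only a sketch of the selection semantics described above, not PyMC3's implementation; the function name select_values and the dictionary layout are hypothetical.

```python
def select_values(chain_values, burn=0, thin=1, combine=True, chains=None):
    """Toy model of the selection described above (hypothetical helper).

    chain_values maps a chain number to that chain's list of draws.
    """
    if chains is None:
        chains = sorted(chain_values)
    # Drop the first `burn` draws of each chain, then keep every `thin`-th draw.
    per_chain = [chain_values[c][burn::thin] for c in chains]
    if combine:
        # Concatenate the selected chains into one flat list.
        return [value for values in per_chain for value in values]
    # Otherwise keep one list per chain.
    return per_chain


# Three chains with four draws each (made-up numbers for illustration).
draws = {0: [0, 1, 2, 3], 1: [10, 11, 12, 13], 2: [20, 21, 22, 23]}
combined = select_values(draws, burn=2)
separate = select_values(draws, burn=2, combine=False, chains=[0, 2])
```

Here combined holds the post-burn draws of all three chains in one list, while separate holds one list for chain 0 and one for chain 2.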
MultiTrace objects also support slicing. For example, the following call would return a new trace object without the first 1000 sampling iterations for all traces and variables.
>>> sliced_trace = trace[1000:]
The backend for the new trace is always NDArray, regardless of the type of original trace. Only the NDArray backend supports a stop value in the slice.
Loading a saved backend
Saved backends can be loaded using the load function in the module for the specific backend.
>>> trace = pm.backends.text.load('test')
Writing custom backends
Backends consist of a class that handles sampling storage and value selection. Three sampling methods of the backend will be called:
- setup: Before sampling starts, the setup method is called with two arguments: the number of draws and the chain number. This is useful for setting up any structure for storing the sampling values that requires this information.
- record: Record the sampling results for the current draw. This method will be called with a dictionary of values mapped to the variable names. This is the only sampling function that must do something to have a meaningful backend.
- close: This method is called following sampling and should perform any actions necessary for finalizing and cleaning up the backend.
The base storage class backends.base.BaseTrace provides common model setup that is used by all the PyMC backends.
Several selection methods must also be defined:
- get_values: This is the core method for selecting values from the backend. It can be called directly and is used by __getitem__ when the backend is indexed with a variable name or object.
- _slice: Defines how the backend returns a slice of itself. This is called if the backend is indexed with a slice range.
- point: Returns values for each variable at a single iteration. This is called if the backend is indexed with a single integer.
- __len__: This should return the number of draws.
When pymc3.sample finishes, it wraps all trace objects in a MultiTrace object that provides a consistent selection interface for all backends. If the traces are stored on disk, then a load function should also be defined that returns a MultiTrace object.
For specific examples, see pymc3.backends.{ndarray,text,sqlite}.py.
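The methods listed above can be sketched as a small in-memory class. This is a minimal illustration under stated assumptions, not one of the PyMC backends: the class name ListBackend and its constructor arguments are hypothetical, and it deliberately does not subclass backends.base.BaseTrace so that it runs without a model.

```python
class ListBackend:
    """Hypothetical backend that stores draws in Python lists.

    A real backend would subclass pymc3.backends.base.BaseTrace, which
    handles the model setup; that part is left out of this sketch.
    """

    def __init__(self, name, varnames):
        self.name = name
        self.varnames = list(varnames)
        self.chain = None
        self.samples = {}

    # Sampling methods

    def setup(self, draws, chain):
        # Called before sampling with the number of draws and the chain number.
        self.chain = chain
        self.samples = {varname: [] for varname in self.varnames}

    def record(self, point):
        # `point` maps variable names to the values of the current draw.
        for varname in self.varnames:
            self.samples[varname].append(point[varname])

    def close(self):
        # Nothing to finalize for an in-memory store.
        pass

    # Selection methods

    def __len__(self):
        # Number of draws recorded so far (0 before setup is called).
        return len(next(iter(self.samples.values()), []))

    def get_values(self, varname, burn=0, thin=1):
        return self.samples[varname][burn::thin]

    def _slice(self, idx):
        # Return a new backend holding only the draws selected by the slice.
        sliced = ListBackend(self.name, self.varnames)
        sliced.chain = self.chain
        sliced.samples = {v: vals[idx] for v, vals in self.samples.items()}
        return sliced

    def point(self, idx):
        # Values of every variable at a single iteration.
        return {v: vals[idx] for v, vals in self.samples.items()}


backend = ListBackend("demo", ["x"])
backend.setup(draws=3, chain=0)
for value in (0.1, 0.2, 0.3):
    backend.record({"x": value})
backend.close()
```

Only record does real work during sampling here; setup allocates the storage and close is a no-op, which mirrors the division of labour described in the list above.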
ndarray
NumPy array trace backend
Store sampling values in memory as a NumPy array.
class pymc3.backends.ndarray.NDArray(name=None, model=None, vars=None)
NDArray trace object
Parameters:
- name (str) – Name of backend. This has no meaning for the NDArray backend.
- model (Model) – If None, the model is taken from the with context.
- vars (list of variables) – Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
get_values(varname, burn=0, thin=1)
Get values from trace.
Parameters:
- varname (str)
- burn (int)
- thin (int)
Returns: A NumPy array
point(idx)
Return dictionary of point values at idx for current chain with variable names as keys.
record(point, sampler_stats=None)
Record results of a sampling iteration.
Parameters:
- point (dict) – Values mapped to variable names
setup(draws, chain, sampler_vars=None)
Perform chain-specific setup.
Parameters:
- draws (int) – Expected number of draws
- chain (int) – Chain number
- sampler_vars (list of dicts) – Names and dtypes of the variables that are exported by the samplers.
sqlite
SQLite trace backend
Store and retrieve sampling values in SQLite database file.
Database format
For each variable, a table is created with the following format:
recid (INT), draw (INT), chain (INT), v0 (FLOAT), v1 (FLOAT), v2 (FLOAT) ...
The variable column names are extended to reflect additional dimensions. For example, a variable with the shape (2, 2) would be stored as
recid (INT), draw (INT), chain (INT), v0_0 (FLOAT), v0_1 (FLOAT), v1_0 (FLOAT) ...
The recid key is autoincremented each time a new row is added to the table. The chain column denotes the chain index and starts at 0.
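The table layout can be reproduced with Python's standard sqlite3 module. The sketch below is an illustration of the format only, not the backend's own code: the helper value_columns, the table name y, and the inserted numbers are hypothetical, and the key column is named recid as in the first format line above.

```python
import itertools
import sqlite3


def value_columns(shape):
    """Value column names for a variable of the given shape (hypothetical helper)."""
    indices = itertools.product(*(range(n) for n in shape))
    return ["v" + "_".join(str(i) for i in index) for index in indices]


# A (2, 2) variable gets one value column per element.
columns = value_columns((2, 2))

connection = sqlite3.connect(":memory:")
column_sql = ", ".join("{} FLOAT".format(c) for c in columns)
connection.execute(
    "CREATE TABLE y (recid INTEGER PRIMARY KEY AUTOINCREMENT, "
    "draw INT, chain INT, " + column_sql + ")"
)
# Store one draw of chain 0; recid is filled in by SQLite.
connection.execute(
    "INSERT INTO y (draw, chain, v0_0, v0_1, v1_0, v1_1) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (0, 0, 1.0, 2.0, 3.0, 4.0),
)
rows = connection.execute("SELECT * FROM y").fetchall()
connection.close()
```

Each row then carries the key, the draw index, the chain index, and the flattened values, in that order.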
class pymc3.backends.sqlite.SQLite(name, model=None, vars=None)
SQLite trace object
Parameters:
- name (str) – Name of database file
- model (Model) – If None, the model is taken from the with context.
- vars (list of variables) – Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
get_values(varname, burn=0, thin=1)
Get values from trace.
Parameters:
- varname (str)
- burn (int)
- thin (int)
Returns: A NumPy array
point(idx)
Return dictionary of point values at idx for current chain with variable names as keys.
record(point)
Record results of a sampling iteration.
Parameters:
- point (dict) – Values mapped to variable names
setup(draws, chain)
Perform chain-specific setup.
Parameters:
- draws (int) – Expected number of draws
- chain (int) – Chain number
pymc3.backends.sqlite.load(name, model=None)
Load SQLite database.
Parameters:
- name (str) – Path to SQLite database file
- model (Model) – If None, the model is taken from the with context.
Returns: A MultiTrace instance
text
Text file trace backend
Store sampling values as CSV files.
File format
Sampling values for each chain are saved in a separate file (under a directory specified by the name argument). The rows correspond to sampling iterations. The column names consist of variable names and index labels. For example, the heading
x,y__0_0,y__0_1,y__1_0,y__1_1,y__2_0,y__2_1
represents two variables, x and y, where x is a scalar and y has a shape of (3, 2).
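The heading above can be rebuilt from the variable names and shapes. The following is a small sketch of that naming scheme, not the backend's own code; the helper name flat_names is hypothetical.

```python
import itertools


def flat_names(varname, shape):
    """Column names for one variable (hypothetical helper).

    A scalar keeps its plain name; otherwise each element gets the
    variable name, a double underscore, and its indices joined by "_".
    """
    if not shape:
        return [varname]
    indices = itertools.product(*(range(n) for n in shape))
    return [
        "{}__{}".format(varname, "_".join(str(i) for i in index))
        for index in indices
    ]


# x is a scalar and y has a shape of (3, 2), as in the example heading.
header = ",".join(flat_names("x", ()) + flat_names("y", (3, 2)))
```

With these inputs, header matches the example heading shown above.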
class pymc3.backends.text.Text(name, model=None, vars=None)
Text trace object
Parameters:
- name (str) – Name of directory to store text files
- model (Model) – If None, the model is taken from the with context.
- vars (list of variables) – Sampling values will be stored for these variables. If None, model.unobserved_RVs is used.
get_values(varname, burn=0, thin=1)
Get values from trace.
Parameters:
- varname (str)
- burn (int)
- thin (int)
Returns: A NumPy array
point(idx)
Return dictionary of point values at idx for current chain with variable names as keys.
record(point)
Record results of a sampling iteration.
Parameters:
- point (dict) – Values mapped to variable names
setup(draws, chain)
Perform chain-specific setup.
Parameters:
- draws (int) – Expected number of draws
- chain (int) – Chain number
pymc3.backends.text.dump(name, trace, chains=None)
Store values from NDArray trace as CSV files.
Parameters:
- name (str) – Name of directory to store CSV files in
- trace (MultiTrace of NDArray traces) – Result of MCMC run with default NDArray backend
- chains (list) – Chains to dump. If None, all chains are dumped.
pymc3.backends.text.load(name, model=None)
Load Text database.
Parameters:
- name (str) – Name of directory with files (one per chain)
- model (Model) – If None, the model is taken from the with context.
Returns: A MultiTrace instance
tracetab
Functions for converting traces into a table-like format
pymc3.backends.tracetab.trace_to_dataframe(trace, chains=None, varnames=None, include_transformed=False)
Convert trace to Pandas DataFrame.
Parameters:
- trace (NDArray trace)
- chains (int or list of ints) – Chains to include. If None, all chains are used. A single chain value can also be given.
- varnames (list of variable names) – Variables to be included in the DataFrame. If None, all variables are included.
- include_transformed (boolean) – If True, transformed variables will be included in the resulting DataFrame.