Quick Start
You may open the file "Sprinkler.xls" (found in the Example subdirectory of the installation directory) to get a better understanding of the general principles of ProBT-XL.
This file has been created using a Bayes net described in the Wikipedia pages on Bayesian networks.
This flash demo shows how to use this Excel file to do a first inference. Before watching this demonstration, please note how the mouse-sensitive green arrow is used to progress to the next slide in the following figure:

Introduction
ProBT-XL is an interface to the ProBT-Engine with graphical capabilities that allow the design of Bayesian networks under Microsoft Excel. ProBT-XL can be used as an entry point to the Bayesian programming approach, and graphical models defined with ProBT-XL can be used in Bayesian programs.
This tutorial is organized as follows:
The tutorial mainly relies on commented flash files created with Wink.
In this document, we focus on the use of ProBT-XL to define Bayesian networks. Readers interested in Bayesian programming may refer to the ProBT documentation. A short introduction to Bayesian programming is given here.
General presentation of the interface
This presentation makes use of four concepts: Variables, Description, Distribution and Question. They are defined below.
In ProBT-XL, "Variables", "Descriptions", "Distributions" and "Questions" are described in separate worksheets.
A single workbook (a single .xls file) may contain several Description, Distribution and Question worksheets but only one Variables worksheet. Variables are global to all the other worksheets.
Since several distributions are needed to build a graphical model, a typical ProBT-XL file contains several “Distribution” sheets, one “Description” sheet to enter the model, one "Question" sheet to perform the inference and one "Variables" sheet. Sophisticated users may use several description and question sheets for their own purposes.
Variables:
Variables are the categories used to define the model and express the problem. The possible types for variables are: integer, real (discretized or not), interval and label.
Description:
A description represents a joint probability distribution. In Bayesian Programming, the joint distribution is written as a list of distributions known as "Description". For example the joint distribution P(X Y Z) = P(X) P(Y) P(Z|X Y) will be written as the list P(X), P(Y) and P(Z|X Y) in column A of the description sheet. In most cases, it is possible to have a graphical representation of this joint distribution. ProBT-XL offers ways to go from one to the other representation (symbolical or graphical representation).
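To make the decomposition concrete, here is a minimal Python sketch, independent of ProBT-XL, that builds P(X Y Z) from the three listed factors for binary variables. All probability values are invented for illustration only.

```python
from itertools import product

# Hypothetical numbers for binary variables X, Y, Z (not from ProBT-XL).
p_x = {0: 0.7, 1: 0.3}                      # P(X)
p_y = {0: 0.4, 1: 0.6}                      # P(Y)
p_z1_given_xy = {(0, 0): 0.1, (0, 1): 0.5,  # P(Z=1 | X Y)
                 (1, 0): 0.6, (1, 1): 0.9}


def p_z_given_xy(z, x, y):
    """Return P(Z=z | X=x, Y=y) from the table of P(Z=1 | X Y)."""
    p1 = p_z1_given_xy[(x, y)]
    return p1 if z == 1 else 1.0 - p1


def joint(x, y, z):
    """P(X Y Z) = P(X) P(Y) P(Z | X Y)."""
    return p_x[x] * p_y[y] * p_z_given_xy(z, x, y)


# A valid decomposition yields a joint distribution that sums to 1.
total = sum(joint(x, y, z) for x, y, z in product((0, 1), repeat=3))
print(round(total, 10))
```

Each entry of the description list corresponds to one factor of the product computed in joint().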
Distribution:
Distributions are the atoms used to build the joint distribution. Distributions may or may not be conditional. If a distribution sheet is used to represent P(X Y | K L), then X and Y are written in column A and K and L are written in column C of the Distribution sheet. When used in a description sheet, a distribution should be referred to by the name of its Excel sheet.
Using ProBT-XL allows the learning of distributions directly from data stored in Excel ranges or in CSV files.
Question:
Questions are used to define inference problems based on a given Description. For example, to infer the probability distribution on "Rain" knowing that the grass is wet, write the variable "Rain" in column A, and write "GRASSWET" with its value (TRUE) in columns B and C. ProBT-XL allows several instances of the same question to be defined simply by adding more values in columns D, E, F, and so on. This feature is useful for running tests.
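The question above can be sketched outside ProBT-XL as inference by enumeration. The tables below are assumed to be those of the Wikipedia sprinkler example; the values stored in Sprinkler.xls may differ, and the function names are illustrative, not part of ProBT.

```python
# Conditional probability tables from the Wikipedia sprinkler example
# (assumed here; the values stored in Sprinkler.xls may differ).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_ON = {True: 0.01, False: 0.4}          # P(S=T | Rain)
P_WET = {(False, False): 0.0, (False, True): 0.8,  # P(G=T | S, Rain)
         (True, False): 0.9, (True, True): 0.99}


def joint(rain, sprinkler, wet):
    """P(Rain) P(Sprinkler | Rain) P(GrassWet | Sprinkler Rain)."""
    p_s = P_SPRINKLER_ON[rain] if sprinkler else 1.0 - P_SPRINKLER_ON[rain]
    p_g = P_WET[(sprinkler, rain)] if wet else 1.0 - P_WET[(sprinkler, rain)]
    return P_RAIN[rain] * p_s * p_g


def rain_given_wet(wet=True):
    """Answer the question P(Rain | GrassWet=wet) by enumeration."""
    scores = {r: sum(joint(r, s, wet) for s in (True, False))
              for r in (True, False)}
    norm = sum(scores.values())
    return {r: v / norm for r, v in scores.items()}


posterior = rain_given_wet(True)
print(round(posterior[True], 4))  # 0.3577
```

Defining several instances of the question amounts to calling rain_given_wet once per evidence value.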
General interaction principle:
Once the ProBT-XL additional macros are installed, a ProBT-XL menu appears in the main menu each time Excel is launched. This menu is used to create new ProBT-XL sheets. Each type of sheet comes with a special menu, which is described in the tutorial below. The "Back" button is common to all these menus and gives direct access to the previous sheet. The "Hide" and "Show" distributions buttons of the ProBT menu temporarily hide or show all the distributions of a given workbook.
This flash file shows the basic navigation mechanisms to move from one sheet to another.
The ProBT menu is used to create the special data sheets used in ProBT-XL. The menu is also used to "simplify" the current workbook by hiding the definitions of the distributions, leaving only the "Variables" worksheet and the worksheets used to describe and use models. The same menu can be used to display them again. The Flash file demonstrates how to create all the ProBT-XL sheets from an initially empty workbook.
This worksheet is used to declare all the variables used in the workbook. The "Register" button can be used to check whether this declaration is correct. This demonstration shows how to declare a variable of each possible type.
Once the variables have been defined, it is possible to use two graphical widgets of Excel to design a Bayes net in the "description" worksheet as shown here.
Use the "Get Bn" button to transform this graphical representation into a set of template distributions, which can then be instantiated to produce a usable model.
Instead of using the graphical representation of a joint distribution, it is possible to enter a symbolic representation directly. For example, to represent the joint distribution on A and B, P(A B), one can use the decomposition P(A) P(B|A) and write it in column A, as shown in this flash demonstration. By clicking on "Create", ProBT-XL creates the necessary distribution sheets, as well as new variables if they do not exist. The "Draw BN" button can be used to obtain a graphical representation, which may help navigation. It is also possible to use the "Create" feature to modify a joint distribution created with the graphical input.
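The link between the symbolic and the graphical representation can be illustrated with a small parser that turns terms such as P(B|A) into arcs of a Bayes net. This is only a sketch of the idea in plain Python; the functions parse_term and edges_from_description are hypothetical and are not how ProBT-XL implements "Draw BN".

```python
import re


def parse_term(term):
    """Split a term such as 'P(B|A)' into (left variables, right variables)."""
    match = re.fullmatch(r"\s*P\(([^|)]*)(?:\|([^)]*))?\)\s*", term)
    if match is None:
        raise ValueError(f"not a probability term: {term!r}")
    left = match.group(1).split()
    right = (match.group(2) or "").split()
    return left, right


def edges_from_description(terms):
    """Return the (parent, child) arcs implied by a list of terms."""
    edges = []
    for term in terms:
        children, parents = parse_term(term)
        edges.extend((p, c) for p in parents for c in children)
    return edges


# The decomposition P(A B) = P(A) P(B|A) gives a single arc A -> B.
print(edges_from_description(["P(A)", "P(B|A)"]))  # [('A', 'B')]
```

Every conditioning variable becomes a parent of the variables on the left of the bar, which is why a decomposition and a directed graph carry the same structural information.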
Rehash
The rehash field in the description tool bar is set by default to "Variables". This value is suitable when using ProBT-XL to design simple models. When this field is set to "Variables", the system always redefines all the variables and all the distributions before performing an inference. This ensures coherence among all the data sheets. When set to "Distributions", the system only redefines the distributions. When set to "No", the user has to ensure the coherence between all the data sheets (see expert mode).
When set to "EM from range", the joint distribution can be learned from a set of data described in a range or in a file. This feature is described in the Mixture Learning section.
Define
The "Define" button registers the joint distribution. It can be used to check whether all the distributions making up the description are properly defined.
The "Distribution" sheet is used to define the atomic distributions that make up the joint distribution of the description sheet. We first describe simple distributions; conditional distributions are described later on. Cell "F2" contains a list box of the distributions handled by this version of ProBT-XL. In column A, write the name of the variable to which the distribution applies. This variable must have been defined ("Register" in the variable sheet) before defining the distribution.
Uniform distributions can be defined on all types of variables. For integers and reals, it is possible to reduce the support of the distribution to a smaller interval.
Histograms can be defined on all variables except non-discretized reals. Green cells indicate where to place the numerical value of the probability for each case of the variable. The system normalizes these values so that they sum to 1.
Normal distributions can be defined for variables of type Integer and Real. See the Expert section for better insight into what the engine actually does when dealing with this kind of variable.
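The normalization step means that only the relative sizes of the entered values matter. A minimal sketch in plain Python, with invented cell values:

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {case: w / total for case, w in weights.items()}


# Hypothetical values typed into the green cells for a label variable.
cells = {"sunny": 3, "cloudy": 1, "rainy": 1}
histogram = normalize(cells)
print(histogram)  # {'sunny': 0.6, 'cloudy': 0.2, 'rainy': 0.2}
```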
The parametrization of the lognormal is given as in this reference.
Dirac is used to assign the probability 1 to a given value of a variable.
The green cell indicates where to enter the mean of the law.
Laplace
The Laplace distribution applies to variables of type integer. It is mainly used to define the learning strategy. Without learning, it is equivalent to a uniform distribution.
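The exact update used by the engine is not given here. Assuming it follows Laplace's rule of succession (an assumption, not confirmed by this document), the behaviour can be sketched as follows: with no data the estimate is uniform, and each observation shifts probability mass toward the observed value.

```python
from collections import Counter


def laplace_estimate(observations, cardinality):
    """Laplace's rule of succession over the values 0..cardinality-1.

    Assumed form: (count + 1) / (n + cardinality); ProBT may differ.
    """
    counts = Counter(observations)
    n = len(observations)
    return [(counts[v] + 1) / (n + cardinality) for v in range(cardinality)]


print(laplace_estimate([], 4))            # [0.25, 0.25, 0.25, 0.25]
print(laplace_estimate([0, 0, 1, 0], 4))  # [0.5, 0.25, 0.125, 0.125]
```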
Binomial
The parameter is the chance of success of an individual trial; the cardinality of the variable is assumed to be the number of trials.
The "Distribution" sheet is also used to define conditional distributions.
In column A, write the name of the variable on which the distribution applies. In column C, write the conditioning variables.
There are five possible ways to describe a conditional distribution (without considering learning).
1) Tables: the distributions are indexed by the conditioning variables (see the Sprinkler.xls example).
2) Functional Dirac: functions can be described with a functional Dirac, where the distribution on the variables is a Dirac at the value of the function. The syntax used in ProBT-XL to describe these functions is C. ProBT-XL uses TCC to compile them.
3) ProBT formulae: in some cases, it is possible to use formulae instead of parameters in a parametric distribution (see the U_equal_RI.xls example).
4) Bayesian case: the conditioning variables are used as a select case to determine the nature of the distribution (see the burglar example).
5) Sub-inference: a question sheet implicitly defines a distribution, which can be used as is in a description. This sub-model call is described in the inference section.
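The first two options can be pictured with a short sketch. It is written in Python for illustration only (ProBT-XL itself expects C syntax for functional Diracs), and the numbers and helper names are hypothetical.

```python
# 1) Table: one distribution over the left variable per value of the
#    conditioning variable. Hypothetical numbers for P(Sprinkler | Rain).
table = {
    True:  {True: 0.01, False: 0.99},
    False: {True: 0.4,  False: 0.6},
}


def p_sprinkler_given_rain(sprinkler, rain):
    """Look up P(Sprinkler=sprinkler | Rain=rain) in the table."""
    return table[rain][sprinkler]


# 2) Functional Dirac: all the probability mass sits on f(conditioning values).
def functional_dirac(f):
    def distribution(value, *conditioning):
        return 1.0 if value == f(*conditioning) else 0.0
    return distribution


p_sum_given_ab = functional_dirac(lambda a, b: a + b)  # P(S | A B), S = A + B

print(p_sprinkler_given_rain(True, False))  # 0.4
print(p_sum_given_ab(5, 2, 3))              # 1.0
print(p_sum_given_ab(4, 2, 3))              # 0.0
```

A table stores one row per combination of conditioning values, whereas a functional Dirac replaces the whole table by a deterministic rule.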
Another way to define distributions is learning: the system uses data to build the selected distributions or conditional distributions. Learning is permitted on multidimensional distributions. Excel ranges or files can be used as data sources. The system uses the first line to match the names of the variables with those used in the distribution. The CSV format, with ";" as separator, is used when learning from a file.
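As an illustration of the expected data layout, the sketch below reads a ";"-separated file whose first line holds the variable names and estimates a histogram by relative frequency. It uses only the Python standard library and does not call ProBT; the data and the function name learn_histogram are hypothetical, and the actual learning rule depends on the distribution selected in ProBT-XL.

```python
import csv
import io
from collections import Counter

# Hypothetical data file: variable names on the first line, ";" separator.
data = io.StringIO(
    "Rain;GrassWet\nTRUE;TRUE\nFALSE;FALSE\nFALSE;TRUE\nFALSE;FALSE\n"
)


def learn_histogram(handle, variable):
    """Estimate P(variable) as relative frequencies of one CSV column."""
    reader = csv.DictReader(handle, delimiter=";")
    counts = Counter(row[variable] for row in reader)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


print(learn_histogram(data, "Rain"))  # {'TRUE': 0.25, 'FALSE': 0.75}
```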
Multiple Inference : This demonstration shows single and multiple inferences based on the file Sprinkler.xls.
The ProBT Engine has two inference modes: exact and approximate.
a) Exact inference: exact inference is only possible on joint distributions whose variables are of type Integer, Sparse or Label.
b) Approximate inference: it should be used when dealing with Real and Interval variables.
ProBT-XL offers two parameters to deal with approximate inference :
Integration steps
Number of samples
Integration steps sets the number of iterations used for integration during inference.
Number of samples is used to sample the parameters of the resulting distribution. When inference is used to propagate uncertainty, as in the given example, only this parameter is used by the system.
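How the engine uses the number of samples internally is not detailed here. As a generic illustration of uncertainty propagation by sampling (plain Monte Carlo, not ProBT's algorithm), the sketch below shows how the estimate of an output distribution depends on the sample count. The model and the function propagate are hypothetical.

```python
import random
import statistics


def propagate(f, sampler, n_samples, seed=0):
    """Approximate the distribution of f(X) by sampling X n_samples times."""
    rng = random.Random(seed)
    outputs = [f(sampler(rng)) for _ in range(n_samples)]
    return statistics.mean(outputs), statistics.stdev(outputs)


# Hypothetical model: X ~ Normal(2, 0.5) and Y = 3 X + 1,
# so the exact answer is mean 7 and standard deviation 1.5.
for n in (100, 10_000):
    mean, std = propagate(lambda x: 3 * x + 1,
                          lambda rng: rng.gauss(2.0, 0.5), n)
    print(n, round(mean, 2), round(std, 2))
```

Larger sample counts give estimates closer to the exact values, at the price of a longer computation.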
Questions may be used as probability distributions in other models. When using this functionality, take care to define the description properly, with the "Distributions" parameter selected in the "Rehash" box.
ProBT uses EM (Expectation Maximization) to learn the parameters of each distribution that is part of a joint distribution. This general capability is available within ProBT-XL. In the proposed example, it is used to learn the parameters of a mixture of distributions. This feature allows Bayesian classification, since EM is designed to work with incomplete data (here, the class). The description is used to specify the prior knowledge about the mixture. Avoid using uniform distributions to define this prior knowledge: doing so will surely trap EM in a meaningless local optimum.
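The following sketch shows EM on a two-component, one-dimensional Gaussian mixture in plain Python. It is a textbook version written for illustration, not ProBT's implementation, and the data are synthetic. The second call illustrates a closely related pitfall: when the initial components are identical, EM cannot tell the classes apart and the components remain identical.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, means, stds=(1.0, 1.0), weights=(0.5, 0.5), n_iter=50):
    """EM for a two-component 1-D Gaussian mixture; the class is never observed."""
    means, stds, weights = list(means), list(stds), list(weights)
    for _ in range(n_iter):
        # E step: posterior probability of each class for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M step: re-estimate the parameters from the weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # guard against collapse
    return means, stds, weights


# Synthetic data: two classes centred on 0 and 6 (labels are discarded).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(6.0, 1.0) for _ in range(300)])

# Distinct initial means let EM separate the two classes.
print(em_two_gaussians(data, means=(1.0, 5.0))[0])
# Identical initial components stay identical: EM cannot break the symmetry.
print(em_two_gaussians(data, means=(3.0, 3.0))[0])
```

The responsibilities computed in the E step are the class probabilities, which is what makes the learned mixture usable as a classifier.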
Saving Question for external programs :
It is possible to use the definition of a question in an external program. This definition is stored in an XML file, which also contains the joint distribution. Using this definition, it is possible to make multiple inferences just by passing new evidence values to the question. It is also possible to run the question directly on a CSV file that has the names of the evidence variables written on the first line. Here is a short Python example of this last feature. To run this program, one must use a version of Python with ProBT as a plugin. This plugin has been built using Swig and may be obtained from Probayes.
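import _probt  # SWIG-built ProBT plugin; module name assumed from the calls below

def eval_probt_model_on_file(model_file, evidences_file):
    """Use the ProBT model saved in model_file and apply it to all the data stored in evidences_file.
    Return a vector of vectors of probabilities, one per set of evidence values."""
    _probt.load_question(model_file)
    # Assumes apply_on_file returns the probabilities described in the docstring.
    return _probt.apply_on_file(evidences_file)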
A fairly long, uninterrupted sequence of ProBT-XL use.