Open Source Bayesian Network Structure Learning API, Free-BN

I introduce a new open source Bayesian network structure learning API called Free-BN (FBN). FBN is licensed under the Apache 2.0 license. Below, I’ll scratch the surface of FBN and walk you through an example of using it.

Why another Bayesian network structure learning API?

While working on my dissertation, I had a tough time looking for open source APIs for constraint-based structural learning of Bayesian networks. The few open source APIs I found dealing with Bayesian networks written in Java were:

This page here provides a long list of Bayesian network related software/APIs. One of the fruits of my dissertation work (though not reported or included in the dissertation itself) was the development of FBN, a Java API for Bayesian network structure learning.

Some features of FBN

So, what can FBN currently do (related to Bayesian networks)? Here’s a non-exhaustive list.

  • Structural learning
    • constraint-based (PC, TPDA, PDFS)
    • search-and-scoring (K2)
    • mixed-type (SC*, CrUMB+-GA)
  • Exact inference (using PPTC algorithm)
  • Logic sampling

Working with FBN should be relatively easy. It’s meant to be an API (not an application). Currently, FBN can only learn from database sources, although you could extend the API to learn from flat files. FBN is designed around inversion of control (IoC) through dependency injection (DI) and uses the Spring Framework to achieve that design. Because it uses DI and works primarily with interfaces, the API can easily be extended to include other structure learning algorithms.

Walkthrough preliminaries

Before I walk through how to use FBN, here is some background information. The dataset is generated using logic sampling from the Bayesian network reported by (Cooper 1992). This Bayesian network has three variables: X1, X2, and X3. The structure of this Bayesian network is a serial connection: X1 -> X2 -> X3. The local probability models reported are shown in the table below.

P(X1=present)=0.6 P(X1=absent)=0.4
P(X2=present|X1=present)=0.8 P(X2=absent|X1=present)=0.2
P(X2=present|X1=absent)=0.3 P(X2=absent|X1=absent)=0.7
P(X3=present|X2=present)=0.9 P(X3=absent|X2=present)=0.1
P(X3=present|X2=absent)=0.15 P(X3=absent|X2=absent)=0.85
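
To make the data generation step concrete, here is a minimal sketch of logic (forward) sampling from this network: draw X1 first, then X2 given the sampled X1, then X3 given the sampled X2. This is not FBN's DataGenerator; the class name ChainSampler, its method names, and the fixed seed are my own assumptions for illustration. Only the probabilities come from the table above.

```java
import java.util.Random;

/**
 * Hypothetical sketch of logic (forward) sampling from X1 -> X2 -> X3.
 * This is NOT FBN's DataGenerator; all names here are illustrative.
 */
public class ChainSampler {
    // Probabilities of "present", taken from the table in the text.
    static final double P_X1_PRESENT = 0.6;
    static final double P_X2_PRESENT_GIVEN_X1_PRESENT = 0.8;
    static final double P_X2_PRESENT_GIVEN_X1_ABSENT = 0.3;
    static final double P_X3_PRESENT_GIVEN_X2_PRESENT = 0.9;
    static final double P_X3_PRESENT_GIVEN_X2_ABSENT = 0.15;

    /** Maps a boolean state to the label stored in the data table. */
    static String label(boolean present) {
        return present ? "present" : "absent";
    }

    /** Draws one record {x1, x2, x3}, sampling each variable after its parent. */
    static String[] sample(Random rng) {
        boolean x1 = rng.nextDouble() < P_X1_PRESENT;
        boolean x2 = rng.nextDouble()
                < (x1 ? P_X2_PRESENT_GIVEN_X1_PRESENT : P_X2_PRESENT_GIVEN_X1_ABSENT);
        boolean x3 = rng.nextDouble()
                < (x2 ? P_X3_PRESENT_GIVEN_X2_PRESENT : P_X3_PRESENT_GIVEN_X2_ABSENT);
        return new String[] { label(x1), label(x2), label(x3) };
    }

    public static void main(String[] args) {
        Random rng = new Random(42L); // fixed seed (an assumption) so runs are repeatable
        for (int i = 0; i < 5; i++) {
            System.out.println(String.join(",", sample(rng)));
        }
    }
}
```

With enough samples, the fraction of records with x1=present should settle near 0.6, which is a quick sanity check on any generator of this kind.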

The algorithm to learn the Bayesian network from the data will be Three Phase Dependency Analysis (TPDA) (Cheng 2002). TPDA is a constraint-based Bayesian network structure learning algorithm. It has three phases: drafting, thickening, and thinning. TPDA is implemented in FBN and will be used to learn the Bayesian network structure from the data generated using logic sampling.
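
TPDA as described by (Cheng 2002) measures how strongly two variables depend on each other with mutual information. As a rough illustration (not FBN's MutualInformation class), the sketch below computes the mutual information of X1 and X2 directly from the probabilities given earlier. The class name MutualInfoSketch, the use of natural logarithms, and the index convention (0 = present, 1 = absent) are assumptions of mine.

```java
/**
 * Hypothetical sketch: mutual information of two discrete variables
 * from their joint probability table. Not FBN's implementation.
 */
public class MutualInfoSketch {

    /** Returns I(X;Y) in nats for joint[x][y] = P(X=x, Y=y). */
    static double mutualInformation(double[][] joint) {
        int rows = joint.length;
        int cols = joint[0].length;
        double[] px = new double[rows];
        double[] py = new double[cols];
        // Marginals are the row sums and column sums of the joint table.
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                px[i] += joint[i][j];
                py[j] += joint[i][j];
            }
        }
        double mi = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double p = joint[i][j];
                if (p > 0.0) { // zero cells contribute nothing
                    mi += p * Math.log(p / (px[i] * py[j]));
                }
            }
        }
        return mi;
    }

    /** Joint P(X1, X2) = P(X1) * P(X2 | X1); index 0 = present, 1 = absent. */
    static double[][] jointX1X2() {
        double pX1Present = 0.6;
        return new double[][] {
            { pX1Present * 0.8, pX1Present * 0.2 },
            { (1.0 - pX1Present) * 0.3, (1.0 - pX1Present) * 0.7 }
        };
    }

    public static void main(String[] args) {
        System.out.println("I(X1;X2) in nats = " + mutualInformation(jointX1X2()));
    }
}
```

A value clearly above zero signals dependence between X1 and X2, while a table whose cells equal the product of its marginals gives zero. In practice the learner estimates these probabilities from the data rather than reading them from a known network.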

Set up your data source

FBN takes as input data stored in a database that has a JDBC driver. Some examples of such databases are Oracle, MS SQL Server, and MySQL. In this walkthrough, I’ll be showing examples using MySQL.

The data must be stored in two separate tables: one table to specify the variables (denote this as vtable), and one table to hold the actual data (denote this as dtable). The vtable should have the following fields: name, type, and domain. An example of a DDL for a vtable using MySQL is:

create table vtable (
 name varchar(10),
 domain varchar(20),
 type varchar(10)
);

Since we have three binary variables (x1, x2, and x3), we have to insert values into the vtable to describe these variables.

insert into vtable(name, domain, type) values('x1','absent,present', '1');
insert into vtable(name, domain, type) values('x2','absent,present', '1');
insert into vtable(name, domain, type) values('x3','absent,present', '1');

The type is set to 1 for categorical variables. For all types see net.fdm.data.intf.Variable.

Now, we have to create a table to hold the data. The following is a sample MySQL DDL to create such a table.

create table dtable (
 x1 varchar(10),
 x2 varchar(10),
 x3 varchar(10)
);

Now that we have created the dtable, insert data into it.

insert into dtable(x1,x2,x3) values('present','present','present');
insert into dtable(x1,x2,x3) values('present','present','present');
insert into dtable(x1,x2,x3) values('present','present','present');
...
insert into dtable(x1,x2,x3) values('present','absent','absent');
insert into dtable(x1,x2,x3) values('absent','absent','absent');
insert into dtable(x1,x2,x3) values('absent','absent','absent');
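
If your records live in memory (for example, straight out of a sampler), a small helper can turn them into insert statements like the ones above. The following is only a sketch under my own assumptions: the class InsertScriptSketch and its methods are hypothetical and are not part of FBN. It rejects any value outside the domain declared in the vtable, which also keeps arbitrary strings out of the generated SQL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper that formats records as dtable insert statements.
 * Not part of FBN; names are illustrative only.
 */
public class InsertScriptSketch {
    // The domain declared for x1, x2, and x3 in the vtable.
    static final List<String> DOMAIN = Arrays.asList("absent", "present");

    /** Formats one record, rejecting values outside the declared domain. */
    static String toInsert(String x1, String x2, String x3) {
        for (String value : new String[] { x1, x2, x3 }) {
            if (!DOMAIN.contains(value)) {
                throw new IllegalArgumentException("Value not in domain: " + value);
            }
        }
        return String.format(
                "insert into dtable(x1,x2,x3) values('%s','%s','%s');", x1, x2, x3);
    }

    /** Formats every record; each row must hold x1, x2, x3 in that order. */
    static List<String> toScript(String[][] rows) {
        List<String> statements = new ArrayList<>();
        for (String[] row : rows) {
            statements.add(toInsert(row[0], row[1], row[2]));
        }
        return statements;
    }

    public static void main(String[] args) {
        String[][] rows = {
            { "present", "present", "present" },
            { "present", "absent", "absent" },
            { "absent", "absent", "absent" }
        };
        for (String statement : toScript(rows)) {
            System.out.println(statement);
        }
    }
}
```

For large datasets, a JDBC PreparedStatement with batching would be the more usual route. Generating a script is simply the easiest thing to inspect by eye.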

If you download the source code for FBN, the MySQL scripts are located in demo/mysql.sql. The source code to create the Bayesian network and perform logic sampling is located in demo/com/vang/jee/fbn/demo/DataGenerator.java.

Set up your structure learning algorithm

Now it’s time to set up our structure learning algorithm of choice. We can do so in code (using Java) or, the better alternative, “wire up” the algorithm using Spring and XML files. The following code shows how to wire up the TPDA structure learning algorithm using Java.

/**
 * Copyright 2009 Jee Vang 
 * 
 * Licensed under the Apache License, Version 2.0 (the "License"); 
 * you may not use this file except in compliance with the License. 
 * You may obtain a copy of the License at 
 * 
 *  http://www.apache.org/licenses/LICENSE-2.0 
 *  
 *  Unless required by applicable law or agreed to in writing, software 
 *  distributed under the License is distributed on an "AS IS" BASIS, 
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
 *  See the License for the specific language governing permissions and 
 *  limitations under the License. 
 */
package com.vang.jee.fbn.demo;

import java.util.Iterator;

import javax.sql.DataSource;

import net.fbn.data.condcorr.impl.MiCondCorr;
import net.fbn.data.condcorr.impl.MiCondIndepTestImpl;
import net.fbn.data.corr.impl.MutualInformation;
import net.fbn.graph.algo.impl.MWSTImpl;
import net.fbn.graph.factory.impl.UnGraphFactoryImpl;
import net.fbn.graph.intf.Graph;
import net.fbn.learner.struct.cb.tpda.impl.DSeparateA;
import net.fbn.learner.struct.cb.tpda.impl.DSeparateB;
import net.fbn.learner.struct.cb.tpda.impl.SimpleOrientArcsImpl;
import net.fbn.learner.struct.cb.tpda.impl.StDraftImpl;
import net.fbn.learner.struct.cb.tpda.impl.StTPDALearnerImpl;
import net.fbn.learner.struct.cb.tpda.impl.StThickenImpl;
import net.fbn.learner.struct.cb.tpda.impl.StThinImpl;
import net.fbn.learner.struct.intf.StructureLearner;
import net.fdm.data.dao.impl.VariableDaoImpl;
import net.fdm.data.dao.intf.VariableDao;
import net.fdm.data.intf.Variable;

import org.apache.commons.dbcp.BasicDataSource;

/**
 * Demo for structure learning using TPDA.
 * @author Jee Vang
 *
 */
public class TestLearning {
	private DataSource _dataSource;
	private VariableDao _variableDao;
	private StructureLearner _structureLearner;
	
	/**
	 * Gets a structure learner.
	 * @return StructureLearner.
	 */
	public StructureLearner getStructureLearner() {
		if(null == _structureLearner) {
			//set the algorithm to perform TPDA drafting phase
			StDraftImpl draft = new StDraftImpl();
			draft.setMwstAlgo(new MWSTImpl());
			draft.setUnGraphFactory(new UnGraphFactoryImpl());
			
			//thresholds and helper objects used by the TPDA phases
			double delta = 0.01d;
			double theta = 0.001d;
			double epsilon = 0.001d;
			
			MutualInformation mi = new MutualInformation();
			mi.setVariableDao(getVariableDao());
			
			MiCondCorr miCondCorr = new MiCondCorr();
			miCondCorr.setVariableDao(getVariableDao());
			miCondCorr.setCorrMetric(mi);
			
			MiCondIndepTestImpl condIndepTest = new MiCondIndepTestImpl();
			condIndepTest.setVariableDao(getVariableDao());
			condIndepTest.setCondCorrMetric(miCondCorr);
			condIndepTest.setDelta(delta);
			
			DSeparateA dSeparateA = new DSeparateA();
			dSeparateA.setVariableDao(getVariableDao());
			dSeparateA.setCondIndepTest(condIndepTest);
			
			DSeparateB dSeparateB = new DSeparateB();
			dSeparateB.setVariableDao(getVariableDao());
			dSeparateB.setEpsilon(epsilon);
			dSeparateB.setCondIndepTest(condIndepTest);
			
			SimpleOrientArcsImpl orientArcs = new SimpleOrientArcsImpl();
			orientArcs.setCondIndepTest(condIndepTest);
			orientArcs.setEpsilon(epsilon);
			
			//set the algorithm to perform the TPDA thickening phase
			StThickenImpl thicken = new StThickenImpl();
			thicken.setDSeparate(dSeparateA);
			
			//set the algorithm to perform the TPDA thinning phase
			StThinImpl thin = new StThinImpl();
			thin.setDSeparateA(dSeparateA);
			thin.setDSeparateB(dSeparateB);
			
			//now wire up tpda
			StTPDALearnerImpl tpda = new StTPDALearnerImpl();
			tpda.setDelta(delta);
			tpda.setTheta(theta);
			tpda.setCorrMetric(mi);
			tpda.setRemoveInsignificantCorrelations(true);
			tpda.setDraft(draft);
			tpda.setThin(thin);
			tpda.setThicken(thicken);
			tpda.setOrientArcs(orientArcs);
			
			_structureLearner = tpda;
		}
		return _structureLearner;
	}
	
	/**
	 * Gets variable data access object.
	 * @return VariableDao.
	 */
	public VariableDao getVariableDao() {
		if(null == _variableDao) {
			VariableDaoImpl variableDao = new VariableDaoImpl();
			variableDao.setDataSource(getDataSource());
			variableDao.setDataTable("dtable");
			variableDao.setDomainColumnName("domain");
			variableDao.setDomainDelimiter(",");
			variableDao.setTypeColumnName("type");
			variableDao.setVarTable("vtable");
			
			_variableDao = variableDao;
		}
		
		return _variableDao;
	}
	
	/**
	 * Gets a data source.
	 * @return DataSource.
	 */
	public DataSource getDataSource() {
		if(null == _dataSource) {
			String driverClassName = "com.mysql.jdbc.Driver";
			String url = "jdbc:mysql://localhost/bn?user=jee&password=jee";
			
			BasicDataSource dataSource = new BasicDataSource();
			dataSource.setDriverClassName(driverClassName);
			dataSource.setUrl(url);
			
			_dataSource = dataSource;
		}
		
		return _dataSource;
	}
	
	/**
	 * Gets an array of variables.
	 * @return Array of Variable.
	 * @throws Exception
	 */
	public Variable[] getVariables() throws Exception {
		VariableDao variableDao = getVariableDao();
		Variable[] variables = variableDao.getVariables();
		return variables;
	}

	/**
	 * Main method.
	 * @param args
	 * @throws Exception 
	 */
	public static void main(String[] args) throws Exception {
		TestLearning testLearning = new TestLearning();
		Variable[] variables = testLearning.getVariables();
		StructureLearner learner = testLearning.getStructureLearner();
		Graph graph = learner.learn(variables);
		System.out.println("NODES");
		for(Iterator it = graph.getNodes().iterator(); it.hasNext(); ) {
			System.out.println(it.next());
		}
		
		System.out.println("ARCS");
		for(Iterator it = graph.getArcs().iterator(); it.hasNext(); ) {
			System.out.println(it.next());
		}
	}
}

The getDataSource method returns a DataSource pointing to your database (in this case, a MySQL instance). The getVariableDao method provides a reference to the VariableDao object that has access to the variables and data. The getStructureLearner method wires up the TPDA implementation. In the main method, you get a reference to all the variables for which you want to perform Bayesian network structure learning and an instance of the structure learner. You then pass this array of variables into the learner to produce a Graph. The nodes in the graph should be: x1, x2, x3. The arcs in this graph are: x1–x2 and x2–x3. Therefore, the structure is: x1–x2–x3. Clearly, this structure is an undirected graph, and thus does not satisfy the directed acyclic graph (DAG) requirement of a Bayesian network. This is to be expected, since conditional independence tests alone cannot distinguish X1 -> X2 -> X3 from X1 <- X2 <- X3 or X1 <- X2 -> X3; all three imply the same independencies. The source code for this learning example is located in the source distribution under demo/src/com/vang/jee/fbn/demo/TestLearning.java.
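
One way to see why a dependency-analysis learner would keep x1–x2 and x2–x3 but leave out an arc between x1 and x3 is to compute, from the generating network itself, the mutual information of X1 and X3 with and without conditioning on X2. The sketch below does this. The class ChainIndependenceSketch, the natural logarithms, and the index convention (0 = present, 1 = absent) are my own assumptions, and it works on exact probabilities, whereas FBN works on estimates from the data.

```java
/**
 * Hypothetical sketch: in the chain X1 -> X2 -> X3, X1 and X3 are
 * dependent marginally but independent given X2. Not FBN code.
 */
public class ChainIndependenceSketch {
    // Index 0 = present, 1 = absent.
    static final double[] P_X1 = { 0.6, 0.4 };
    static final double[][] P_X2_GIVEN_X1 = { { 0.8, 0.2 }, { 0.3, 0.7 } };   // [x1][x2]
    static final double[][] P_X3_GIVEN_X2 = { { 0.9, 0.1 }, { 0.15, 0.85 } }; // [x2][x3]

    /** Joint P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x2). */
    static double[][][] joint() {
        double[][][] p = new double[2][2][2];
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                for (int c = 0; c < 2; c++)
                    p[a][b][c] = P_X1[a] * P_X2_GIVEN_X1[a][b] * P_X3_GIVEN_X2[b][c];
        return p;
    }

    /** Conditional mutual information I(X1;X3|X2) in nats. */
    static double condMutualInfoX1X3GivenX2(double[][][] p) {
        double[] p2 = new double[2];
        double[][] p12 = new double[2][2];
        double[][] p23 = new double[2][2];
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                for (int c = 0; c < 2; c++) {
                    p2[b] += p[a][b][c];
                    p12[a][b] += p[a][b][c];
                    p23[b][c] += p[a][b][c];
                }
        double cmi = 0.0;
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                for (int c = 0; c < 2; c++) {
                    double v = p[a][b][c];
                    if (v > 0.0) cmi += v * Math.log(v * p2[b] / (p12[a][b] * p23[b][c]));
                }
        return cmi;
    }

    /** Marginal mutual information I(X1;X3) in nats. */
    static double mutualInfoX1X3(double[][][] p) {
        double[] p1 = new double[2];
        double[] p3 = new double[2];
        double[][] p13 = new double[2][2];
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                for (int c = 0; c < 2; c++) {
                    p1[a] += p[a][b][c];
                    p3[c] += p[a][b][c];
                    p13[a][c] += p[a][b][c];
                }
        double mi = 0.0;
        for (int a = 0; a < 2; a++)
            for (int c = 0; c < 2; c++)
                if (p13[a][c] > 0.0) mi += p13[a][c] * Math.log(p13[a][c] / (p1[a] * p3[c]));
        return mi;
    }

    public static void main(String[] args) {
        double[][][] p = joint();
        System.out.println("I(X1;X3)    = " + mutualInfoX1X3(p));
        System.out.println("I(X1;X3|X2) = " + condMutualInfoX1X3GivenX2(p));
    }
}
```

The marginal value is positive, so X1 and X3 look related at first glance, but the conditional value is zero up to floating-point rounding: once X2 is known, X1 tells you nothing more about X3. That is the kind of evidence a constraint-based learner uses to leave out a direct x1–x3 arc.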

How to get the source and dependencies?

The FBN API depends on two other minor projects called Free-Display and Free-GA (FGA). The Free-Display API is used to visualize the Bayesian networks, while the FGA API is used for search-and-scoring methods for Bayesian network structure learning. You may download all these APIs, and they are all licensed under the Apache 2.0 license.

Free-BN source
Free-BN binary
Free-Disp source
Free-Disp binary
Free-GA source
Free-GA binary

I hope this API helps you in your research. Happy research, data mining, and programming! Cheers! Sib ntsib dua mog!

References

  • G. F. Cooper and E. Herskovits. “A Bayesian method for the induction of probabilistic networks from data,” Machine Learning, vol. 9, 1992, pp. 309–347.
  • J. Cheng, R. Greiner, J. Kelly, D. A. Bell, and W. Liu. “Learning Bayesian networks from data: an information-theory based approach,” Artificial Intelligence, vol. 137, 2002, pp. 43–90.

One thought on “Open Source Bayesian Network Structure Learning API, Free-BN”

  1. Great stuff, looks like nicely architected Java as well. One small problem is that your box.net link for the free-ga source is giving the free-disp zip file.

    I am going to try out using this on a data mining experiment I need to run, but we really just use flat files to do things, not databases. Looks like I could write a new class to the VariableDao interface to accomplish such a thing (based on storing the data in memory rather than in a database). Does that sound right?

    Beyond learning the structure I’d like to run some test data through it, I think you said it had an inference algorithm with it, but I didn’t see any examples, do you happen to have any?

    I am mainly trying to compare BN methods with linear SVM’s, and I need an efficient learning implementation (like TPDA, I wrote my own a long time ago, but it was search based) because we have tons (1000’s) of features. Even with a more efficient algorithm not sure the performance will be reasonable.

    If I do add in a new DAO would you be interested in getting that feedback, or perhaps even putting this on github where it could be forked?

    Thanks,

    James

    PS I used to live in DC, and worked with a bunch of folks at GMU several years ago (through a company named IET)!
